Pre/Post deployment script

Hybrid512 · 4 May 2021 13:44

Hi,

I would like to configure vlans for a new machine at deployment time, for any ready machine and any deployment method (MaaS UI/CLI, Juju, or other tools like Terraform). For that, I’d like to use a custom script that does this based on information gathered from my context (machine’s MAC address, machine’s configured vlans, …), but run by the MaaS controller (or even the Region server) and not by the machine itself.

A small example :

I have a new machine that has just been racked in my DC on a switch port that only has the PXE network configured
In MaaS, I pre-created a few subnets which already exist on my fabrics (but might not be autodiscovered for security reasons)
Once in ready status, I configure my new machine’s network with some vlans, …

What I’d like is that, at deploy time, MaaS itself executes a custom script (not the deployed machine through cloud-init, because that machine won’t have permission to communicate with the fabric’s API). The script would get information about the machine being deployed (MAC, name, vlans, …) and automatically configure the switch port with the required vlans before the machine is deployed.
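A minimal sketch of the inputs such a controller-side script would need, pulled from a machine record. The field names (`system_id`, `hostname`, `tag_names`, `interface_set`, `mac_address`, `vlan`, `vid`) are assumed to match the JSON that the MAAS API returns for a machine, and `switch_port_inputs` is a hypothetical helper; check both against your MAAS version.

```python
from typing import Any, Dict


def switch_port_inputs(machine: Dict[str, Any]) -> Dict[str, Any]:
    """Collect what a switch-port script needs from one machine record.

    `machine` is a dict shaped like the (assumed) MAAS machine JSON.
    """
    interfaces = machine.get("interface_set", [])
    # One entry per distinct MAC, so bonded or VLAN sub-interfaces
    # sharing a parent MAC are not listed twice.
    macs = sorted({i["mac_address"] for i in interfaces if i.get("mac_address")})
    # VLAN IDs configured on any interface; vid 0 conventionally means untagged.
    vlan_ids = sorted({i["vlan"]["vid"] for i in interfaces if i.get("vlan")})
    return {
        "system_id": machine.get("system_id"),
        "hostname": machine.get("hostname"),
        "macs": macs,
        "vlan_ids": vlan_ids,
        "tags": list(machine.get("tag_names", [])),
    }
```

The result could be handed to whatever talks to the fabric API; nothing here contacts MAAS or the switch.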

Is it possible ?
How can this be achieved ?

Best regards

Hybrid512 · 4 June 2021 15:41

Well, @billwear, I saw you edited the post to add the “cloud-init” category, but this is not really the case … the main question is “how do you run an action (script, hook, whatever) from the MaaS Region server itself at deploy time, and not from the machine being deployed?”

Best regards

cgrabowski · 7 June 2021 19:57

Hi @Hybrid512, to clarify: by “configure with some vlans”, do you mean attaching the new machine to MAAS-managed VLANs? You can create the VLANs in MAAS itself, but MAAS cannot execute a given script on its own host.

Hybrid512 · 8 June 2021 07:15

Let me clarify with an example :

We will be using Cisco ACI with MaaS.
Cisco ACI is an SDN fabric (I’m simplifying) that we want to use to automate network deployment when deploying bare metal.
To be clear, in MaaS, I’ll have a pool of ready machines that I can deploy but these machines will need a few vlans in order to operate.
We have 2 categories of vlans :

some that are “static” and mandatory that will always be configured for a machine : those ones will be configured directly within MaaS (either with the UI or the CLI/API)
some are “dynamic”: they depend on the end usage of the machine and are not always even created in advance. In that case, we’d like a way to automatically create them based on a few criteria attached to the machine (tags for example), such as executing a pre-deployment hook from MaaS, because the deploying machine itself might not have access to the Cisco ACI API.

We also don’t expect people using MaaS to use the UI for machine deployment; they can use things like Juju or Terraform. So ideally, this would be something configured at the MaaS level that runs whichever tool you use to control MaaS.

Another use case would be a “post-deployment” hook to remove some vlans.
For example, our security team is concerned about the network used for PXE booting.
Even though this network is not configured on the deployed machine (and you have to manually remove it in MaaS to do so), the vlan is still present on the network port of the fabric, so a malicious individual could re-enable it and use that network as a bridge between all MaaS-controlled machines.
This kind of “post-deployment” hook would give us the ability to unconfigure that network through ACI.

Hope I made myself clear about what we would like to achieve.

cgrabowski · 8 June 2021 13:25

Thanks for the reply! While MAAS itself doesn’t have pre/post deployment hooks it can execute locally, you may be able to accomplish this with commissioning scripts https://maas.io/docs/snap/2.9/ui/commission-machines#heading--commissioning-scripts

dandruczyk · 8 June 2021 14:06

The trouble is that commissioning scripts run AFTER boot, so if you need to pre-configure a port’s native (untrunked) vlan ahead of time, that won’t necessarily work. You could partially get around this if the default VLAN in your configuration is PXE-bootable to MAAS as a “provisioning network”, and everything else is trunked and configured as part of a commissioning script that you write. Then at deployment you can either configure a persistent provisioning/mgmt IP on the nodes or just use VLAN interfaces for all connectivity.

cgrabowski · 8 June 2021 14:11

Yeah, that is true. I was suggesting commissioning scripts more for this specific situation, but since that doesn’t address the issue, I’ve moved this to a feature request.

gregoryo2017 · 8 June 2021 23:33

I don’t know if it would help, but we do a bootstrap run of our config management (currently Puppet, sometimes Ansible) initiated by a curtin_userdata entry to do minimal config such as network. The main config management does a full run on first normal boot.

Details: curtin_userdata actually calls the 00 file in https://bitbucket.pawsey.org.au/projects/CLOUD/repos/cloud-baremetal-provisioning/browse which does the above. Tags dictate some of what happens.

Hybrid512 · 9 June 2021 09:03

Thanks for the tip, but this is still run by the deployed machine, and that is not what I want.
I need those scripts to be run before/after deployment by the MaaS server itself.

gregoryo2017 · 9 June 2021 09:29

Could you signal the MAAS server from the partially deployed machine and have it then run those scripts, and then ‘release the signal’ for it to do the final reboot?

Another clumsy thought which you might be able to refine: Have curtin sleep during that first boot, while the scripts on MAAS watch for the machine to become available for ssh, do their work then kill the sleep.
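The “watch for the machine to become available for ssh” part could look roughly like this on the MAAS host. It is only a sketch: `wait_for_port` is a hypothetical helper, it checks that the TCP port accepts connections and nothing more, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22,
                  timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; it does not
            # prove that sshd is ready to authenticate anyone.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and try again.
            time.sleep(interval)
    return False
```

Once it returns True, the controller-side scripts would do their work and then stop the sleep on the machine, as described above.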

Hybrid512 · 9 June 2021 09:32

Here is a small example :

In MaaS, I have a “Ready” machine.
This machine has access to 4 configured vlans :

the untagged network for the PXE
another vlan (let’s call it “vlan1000”) for any usage you like, maybe server admin
2 more vlans for iSCSI with multipath
This machine is configured to only have 3 networks configured once deployed, the vlan1000 and the iSCSI because we don’t want the PXE network to stay plugged once the machine is deployed.

Doing so forces us to have all vlans (PXE + the vlan1000 + iSCSI) configured on the switch port where our machine is attached.
This is not an issue when the machine is in “Ready” state but keeping the PXE vlan configured on this port (even if it is not used at the OS level) is a security concern for us so we want it to be unconfigured from the switch port once the machine is fully deployed.
We can do it by hand, but since we’re using MaaS to automate bare metal deployment, we would like to automate this action too, hence the request for these pre/post deployment scripts.

Basically, the workflow would be this :

machine in “Ready” state with PXE and VLAN1000 configured on the attached switch port
machine starts deployment process
MaaS executes a “pre-deployment hook” that runs a script which configures the iSCSI vlans on the attached switch port (the machine has an “iscsi” tag which we would use to detect that we need to configure such networks)
machine is fully deployed
MaaS executes a “post-deployment hook” that unconfigures the PXE vlan on the attached switch port (that can be done because we know the machine ID and, from it, all the associated information such as MAC addresses, …)

Some time later :

machine is going to be released
MaaS executes a “pre-release hook” which reinstates the PXE vlan on the attached switch port
machine is released
MaaS executes a “post-release hook” which removes the iSCSI vlans on the attached switch port (they might not be needed for another deployment)
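Since MAAS has no such hooks today (per the replies above), one way to approximate this workflow is an external watcher that polls machine status and fires a script on each transition. The sketch below shows only the dispatch logic. The status strings, hook names and `HookDispatcher` class are assumptions for illustration, not MAAS features, and a poller can only react after a status change, so the “pre” hooks run at the start of a phase rather than blocking it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (previous status, new status) -> hook name.
# Status strings are assumed; adjust to what your MAAS actually reports.
TRANSITION_HOOKS: Dict[Tuple[str, str], str] = {
    ("Ready", "Deploying"): "pre-deploy",       # e.g. add iSCSI vlans
    ("Allocated", "Deploying"): "pre-deploy",
    ("Deploying", "Deployed"): "post-deploy",   # e.g. remove PXE vlan
    ("Deployed", "Releasing"): "pre-release",   # e.g. reinstate PXE vlan
    ("Releasing", "Ready"): "post-release",     # e.g. remove iSCSI vlans
}

Handler = Callable[[str, List[str]], None]


class HookDispatcher:
    """Remember each machine's last status and call a handler on changes."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self.handlers = handlers
        self.last_status: Dict[str, str] = {}

    def observe(self, system_id: str, status: str,
                tags: List[str]) -> Optional[str]:
        """Record one polled status; return the hook name fired, if any."""
        previous = self.last_status.get(system_id)
        self.last_status[system_id] = status
        if previous is None or previous == status:
            return None  # first sighting or no change: nothing to do
        hook = TRANSITION_HOOKS.get((previous, status))
        if hook is not None and hook in self.handlers:
            # Tags let the handler decide, e.g. only touch iSCSI vlans
            # when "iscsi" is present.
            self.handlers[hook](system_id, tags)
        return hook
```

A polling loop would call `observe()` for every machine on each pass; the handlers are where the calls to the switch or fabric API would live.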

Best regards

khbkhb · 16 June 2021 15:19

@cgrabowski issues with doing such things with commissioning scripts include:

in a tightly secured environment, the BMC/LOM/iDRAC should not be on the same network or accessible directly from the managed device. So “can’t get there from here.”
the ONLY system which should have access to the management port is the Rack controller. Keeping each “rack” of devices administratively isolated.

Critical operations like:

Secure erase (all “drives”)
Loading firmware
BIOS configuration

Should be executable with the data ports shut off entirely.

Otherwise, a hostile client could change the boot order from PXE to local boot, install trojaned firmware (disk, BMC, networking, etc.), so when the machine is released, MAAS will cheerfully try to PXE boot, leaving the system plenty of time to access the internal network and potentially the external network(s) as well.

Only after the secure erase, reloading known good firmware, and reconfiguration of the BIOS have occurred should the system have network access on its data ports.

Hybrid512 · 3 August 2021 16:22

Where do we see all the feature requests and their assignments ?
This would be a very welcome feature for us, and I’d like to upvote it or help with it.

billwear · 5 August 2021 15:04

@Hybrid512, you nailed it! i’m going to spend the day seeing what i can do about this!

Hybrid512 · 5 August 2021 15:53

@billwear Nice !
I can be a lot more precise for what I’m thinking of … how can I send you my proposal ?
On Discourse or through another way ?

billwear · 5 August 2021 15:56

bill.wear@canonical.com

You can also book some time with me today (or any Thursday, really) to talk about it, if you want.

szeestraten · 29 October 2021 11:12

Any chance of something like this being on the roadmap for 3.1?

szeestraten · 17 November 2021 10:35

Just FYI, this type of feature has been requested since 2017.

See https://lists.ubuntu.com/archives/maas-devel/2017-March/002515.html and https://lists.ubuntu.com/archives/maas-devel/2017-February/002387.html which had a thumbs up from @sabdfl.

Hope it can be considered for upcoming roadmaps as it would make integrating MAAS with other tools and procedures a lot simpler.