About commissioning machines

billwear · 6 February 2024 19:37

Errors or typos? Topics missing? Hard to read? Let us know.

When MAAS commissions a machine, the following sequence of events takes place:

DHCP server is contacted
kernel and initrd are received over TFTP
machine boots
initrd mounts a Squashfs image ephemerally over HTTP
cloud-init runs built-in and custom commissioning scripts
machine shuts down

The commissioning scripts will talk to the region API server to ensure that everything is in order and that eventual deployment will succeed.

MAAS chooses the latest Ubuntu LTS release as the default image for commissioning. If desired, you can select a different image in the “Settings” page of the web UI, by selecting the “General” tab and then scrolling down to the Commissioning section.

Commissioning requires 60 seconds.

NUMA and SR-IOV

If you are using the NUMA architecture, MAAS versions 2.7 and higher guarantee that machines are assigned to a single NUMA node that contains all the machine’s resources. Node boundaries are critical, especially in vNUMA situations. Splitting nodes can create unnecessary latency. You want the NUMA node boundaries to match VM boundaries if at all possible.

You must recommission NUMA/SR-IOV machines that were previously commissioned under version 2.6 or earlier.

When using these nodes, you can specify a node index for interfaces and physical block devices. MAAS will display the NUMA node index and details, depending upon your configuration, to include the count of NUMA nodes, number of CPU cores, memory, NICs, and node spaces for bonds and block devices. You can also filter machines by CPU cores, memory, subnet, VLAN, fabric, space, storage, and RAID, among others.

Commissioning scripts

MAAS runs scripts during enlistment, commissioning and testing to collect data about nodes. Both enlistment and commissioning run all builtin commissioning scripts, though enlistment runs only built-ins. Commissioning also runs any user-uploaded commissioning scripts by default, unless the user manually provides a list of scripts to run. MAAS uses these commissioning scripts to configure hardware and perform other tasks during commissioning, such as updating the firmware. Similarly, MAAS employs hardware testing scripts to evaluate system hardware and report its status.

Scripts can be selected to run from web UI during commissioning, by testing hardware, or from the command line. Note that MAAS only runs built-in commissioning scripts during enlistment. Custom scripts can be run when you explicitly choose to commission a machine. A typical administrator workflow (with machine states), using customised commissioning scripts, can be represented as:

Add machine -> Enlistment (runs built-in commissioning scripts MAAS) -> New -> Commission (runs built-in and custom commissioning scripts) -> Ready -> Deploy

NOTE: Scripts are run in alphabetical order in an ephemeral environment. We recommend running your scripts after any MAAS built-in scripts. This can be done by naming your scripts 99-z*. It is possible to reboot the system during commissioning using a script, however, as the environment is ephemeral, any changes to the environment will be destroyed upon reboot (barring, of course, firmware type updates).

When a machine boots, MAAS first instructs it to run cloud-init to set up SSH keys (during commissioning only), set up NTP, and execute a script that runs other commissioning scripts. Currently, the sequence of MAAS-provided commissioning scripts proceeds like this:

maas-support-info: MAAS gathers information that helps to identify and characterise the machine for debugging purposes, such as the kernel, versioning of various components, etc. *Runs in parallel with other scripts.
maas-lshw: this script pulls system BIOS and vendor info, and generates user-defined tags for later use. *Runs in parallel with other scripts.
20-maas-01-install-lldpd: this script installs the link layer discovery protocol (LLDP) daemon, which will later capture networking information about the machine. This script provides some extensive logging.
maas-list-modaliases: this script figures out what hardware modules are loaded, providing a way to autorun certain scripts based on which modules are loaded. *Runs in parallel with other scripts.
20-maas-02-dhcp-unconfigured-ifaces: MAAS will want to know all the ways the machine is connected to the network. Only PXE comes online during boot; this script brings all the other networks online so they can be recognised. This script provides extensive logging.
maas-get-fruid-api-data: this script gathers information for the Facebook wedge power type. *Runs in parallel with other scripts.
maas-serial-ports: this script lists what serial ports are available on the machine. *Runs in parallel with other scripts.
40-maas-01-network-interfaces: (MAAS v2.9 only) this script is just used to get the IP address, which can then be associated with a VLAN/subnet.
50-maas-01-commissioning: (MAAS v3.0 and higher) this script is the main MAAS tool, gathering information on machine resources, such as storage, network devices, CPU, RAM, details about attached USB and PCI devices, etc. We currently pull this data using lxd: We use a Go binary built from lxd source that just contains the minimum source to gather the resource information we need. This script also checks whether the machine being commissioning is a virtual machine, which may affect how MAAS interacts with it.
50-maas-01-commissioning: (MAAS v2.9 only) this script is the main MAAS tool, gathering information on machine resources, such as storage, network devices, CPU, RAM, etc. We currently pull this data using lxd: We use a Go binary built from lxd source that just contains the minimum source to gather the resource information we need. This script also checks whether the machine being commissioning is a virtual machine, which may affect how MAAS interacts with it.
maas-capture-lldp: this script gathers LLDP network information to be presented on the logs page; this data is not used by MAAS at all. *Runs in parallel with other scripts.
maas-kernel-cmdline: this script is used to update the boot devices; it double-checks that the right boot interface is selected.

Commissioning runs the same dozen or so scripts as enlistment, gathering all the same information, but with these caveats:

Commissioning also runs user-supplied commissioning scripts, if present. Be aware that these scripts run as root, so they can execute any system command.
Commissioning runs test scripts which are not run during enlistment.
Commissioning scripts can send BMC configuration data, and can be used to configure BMC data.
The environment variable BMC_CONFIG_PATH is passed to serially run commissioning scripts; these scripts may write BMC power credentials to BMC_CONFIG_PATH in YAML format, where each key is a power parameter. The first script to write BMC_CONFIG_PATH is the only script allowed to configure the BMC, allowing you to override MAAS’ built-in BMC detection. If the script returns 0, that value will be send to MAAS.
All built-in commissioning scripts have been migrated into the database.
maas-run-remote-scripts is capable of enlisting machines, so enlistment user-data scripts have been removed.
The metadata endpoints http://<MAAS>:5240/<latest or 2012-03-01>/ and http://<MAAS>:5240/<latest or 2012-03-01>/meta-data/ are now available anonymously for use during enlistment.

In both enlistment and commissioning, MAAS uses either the MAC address or the UUID to identify machines. Currently, because some machine types encountered by MAAS do not use unique MAC addresses, we are trending toward using the UUID.

To commission a node, it must have a status of “New”.

You have the option of setting some parameters to change how commissioning runs:

enable_ssh: Optional integer. Controls whether to enable SSH for the commissioning environment using the user’s SSH key(s). ‘1’ == True, ‘0’ == False. Roughly equivalent to the Allow SSH access and prevent machine powering off in the web UI.
skip_bmc_config: Optional integer. Controls whether to skip re-configuration of the BMC for IPMI based machines. ‘1’ == True, ‘0’ == False.
skip_networking: Optional integer. Controls whether to skip re-configuring the networking on the machine after the commissioning has completed. ‘1’ == True, ‘0’ == False. Roughly equivalent to Retain network configuration in the web UI.
skip_storage: Optional integer. Controls whether to skip re-configuring the storage on the machine after the commissioning has completed. ‘1’ == True, ‘0’ == False. Roughly equivalent to Retain storage configuration in the web UI.
commissioning_scripts: Optional string. A comma separated list of commissioning script names and tags to be run. By default all custom commissioning scripts are run. Built-in commissioning scripts always run. Selecting update_firmware or configure_hba will run firmware updates or configure HBA’s on matching machines.
testing_scripts: Optional string. A comma separated list of testing script names and tags to be run. By default all tests tagged commissioning will be run. Set to none to disable running tests.
parameters: Optional string. Scripts selected to run may define their own parameters. These parameters may be passed using the parameter name. Optionally a parameter may have the script name prepended to have that parameter only apply to that specific script.

Commissioning logs

MAAS keeps extensive logs of the commissioning process for each machine. These logs present an extremely detailed, timestamped record of completion and status items from the commissioning process.

Disabling boot methods

It is possible to disable individual boot methods. This must be done via the CLI. When a boot method is disabled MAAS will configure MAAS controlled isc-dhcpd to not respond to the associated boot architecture code^. External DHCP servers must be configured manually.

To allow different boot methods to be in different states on separate physical networks using the same VLAN ID configuration is done on the subnet in the UI or API. When using the API boot methods to be disabled may be specified using the MAAS internal name or boot architecture code^ in octet or hex form.

For MAAS 3.0 and above, the following boot method changes have been implemented:

UEFI AMD64 HTTP(00:10) has been re-enabled.
UEFI ARM64 HTTP(00:13) has been enabled.
UEFI ARM64 TFTP(00:0B) and UEFI ARM64 HTTP(00:13) will now provide a shim and GRUB signed with the Microsoft boot loader keys.
grub.cfg for all UEFI platforms has been updated to replace the deprecated linuxefi and initrdefi commands with the standard linux and initrd commands.
GRUB debug may now be enabled by enabling rackd debug logging.

Automatic selection

When selecting multiple machines, scripts which declare the for_hardware field will only run on machines with matching hardware. To automatically run a script when ‘Update firmware’ or ‘Configure HBA’ is selected, you must tag the script with ‘update_firmware’ or ‘configure_hba’.

Similarly, scripts selected by tag on the command line which specify the for_hardware field will only run on matching hardware.

Interpreting scripts

A script can output its results to a YAML file, and those results will be associated with the hardware type defined within the script. MAAS provides the path for the results file in an environment variable, RESULT_PATH. Scripts should write YAML to this file before exiting.

If the hardware type is storage, for example, and the script accepts a storage type parameter, the result will be associated with a specific storage device.

The YAML file must represent a dictionary with these two fields:

result: The completion status of the script. This status can be passed, failed, degraded, or skipped. If no status is defined, an exit code of 0 indicates a pass while a non-zero value indicates a failure.
results: A dictionary of results. The key may map to a results key defined as embedded YAML within the script. The value of each result must be a string or a list of strings.

Optionally, a script may define what results to return in the YAML file in the metadata fields… The results field should contain a dictionary of dictionaries. The key for each dictionary is a name which is returned by the results YAML. Each dictionary may contain the following two fields:

title - The title for the result, used in the UI.
description - The description of the field used as a tool-tip in the UI.

Here is an example of “degrade detection”:

!/usr/bin/env python3

 --- Start MAAS 1.0 script metadata ---
 name: example
 results:
   memspeed:
     title: Memory Speed
     description: Bandwidth speed of memory while performing random read writes
 --- End MAAS 1.0 script metadata ---

import os
import yaml

memspeed = some_test()

print('Memspeed: %s' % memspeed)
results = {
   'results': {
       'memspeed': memspeed,
   }
}
if memspeed < 100:
   print('WARN: Memory test passed but performance is low!')
   results['status'] = 'degraded'

result_path = os.environ.get("RESULT_PATH")
if result_path is not None:
   with open(result_path, 'w') as results_file:
       yaml.safe_dump(results, results_file)

Tagging scripts

As with general tag management, tags make scripts easier to manage; grouping scripts together for commissioning and testing, for example:

maas $PROFILE node-script add-tag $SCRIPT_NAME tag=$TAG
maas $PROFILE node-script remove-tag $SCRIPT_NAME tag=$TAG

MAAS runs all commissioning scripts by default. However, you can select which custom scripts to run during commissioning by name or tag:

maas $PROFILE machine commission \
commissioning_scripts=$SCRIPT_NAME,$SCRIPT_TAG

You can also select which testing scripts to run by name or tag:

maas $PROFILE machine commission \
testing_scripts=$SCRIPT_NAME,$SCRIPT_TAG

Any testing scripts tagged with commissioning will also run during commissioning.

Debugging scripts

You can individually access the output from both completed and failed scripts.

If you need further details, especially when writing and running your own scripts, you can connect to a machine and examine its logs and environment.

Because scripts operate within an ephemeral version of Ubuntu, enabling this option stops the machine from shutting down, allowing you to connect and probe a script’s status.

As long as you’ve added your SSH key to MAAS, you can connect with SSH to the machine’s IP with a username of ubuntu. Type sudo -i to get root access.

Testing hardware

If you wish, you can tell MAAS to test machine hardware using well-known Linux utilities. MAAS can test machines that have a status of Ready, Broken, or Deployed. You can include testing as part of the commissioning process. When you choose the ‘Commission’ action, MAAS will display the dialog described below. Be aware, though, that if the hardware tests fail, the machine will become unavailable for Deployment.

The majority of testing scripts only work with machines that are backed by physical hardware (e.g. they may be incompatible with VM-based machines).

With MAAS, you can easily write, upload and execute your hardware testing scripts and see the results.

Interpreting logs

MAAS logs test results and allows you to view a summary of tests run against a particular machine. You can also example details on any particular tests:

You can also examine the “raw” log output:

Help interpreting these logs can be found under the Logging section of this documentation.

Testing networks

MAAS provides a comprehensive suite of network and link testing capabilities. MAAS can check whether or not links are connected, detect slow links, and report link and interface speeds via UI or API. In addition, you can test Internet connectivity against a user-provided list of URLs or IP addresses. Bonded NICS will be separated during this testing, so that each side of a redundant interface is fully evaluated.

Network testing also includes customisable network testing and commissioning scripts. There are no particular restrictions on these scripts, allowing you to test a wide variety of possible conditions and situations.

Delayed NW config

Once commissioned, you can configure the machine’s network interface(s). Specifically, when a machine’s status is either “Ready” or “Broken”, interfaces can be added/removed, attached to a fabric and linked to a subnet, and provided an IP assignment mode. Tags can also be assigned to specific network interfaces.