What is the right way to handle commissioning scripts that reboot?

Hi all,

I’m running MAAS 3.4.1.

I have a commissioning script that I want to use to make sure the BIOS options are set correctly. After the node reboots to apply the settings, other commissioning scripts fail in different ways depending on the metadata I give the script and when I run it.

The docs say to try to run it after all the default scripts, so I have tried both 51-set-bios-options.py and zzz-set-bios-options.py, with parallel: disabled. Without parallel disabled, it runs alongside the maas-* scripts and reboots in the middle of them, causing random scripts to time out: they take >10 minutes instead of the few seconds specified.

This is my most recent try, with parallel disabled:

```python
#!/usr/bin/env python3
# --- Start MAAS 1.0 script metadata ---
# name: zzz-set-bios-options
# title: Enforce required BIOS options from netbox
# description: Enforce required BIOS options from netbox
# script_type: commissioning
# recommission: False
# may_reboot: True
# parallel: disabled
# timeout: 00:30:00
# --- End MAAS 1.0 script metadata ---
```

In both cases it runs after 50-maas-01-commissioning and before all of:

maas-list-modaliases
maas-get-fruid-api-data
maas-kernel-cmdline
maas-serial-ports
maas-capture-lldpd
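The order in the logs in this post can be reproduced with a very simple model: scripts with parallel: disabled run one at a time in name order, and only after they have all finished do the remaining scripts start together. This is a sketch inferred from the logs, not taken from the MAAS source, and the parallel modes assigned to the built-in scripts below are assumptions chosen to match what the logs show.

```python
"""Simplified model of the commissioning run order seen in the logs.

Assumption (inferred from the logs, not from the MAAS source): scripts with
`parallel: disabled` run one at a time in name order; only after they have
all finished do the remaining scripts start together.
"""
from typing import Dict, List, Tuple

# parallel mode per built-in script: assumed values chosen to match the logs
BUILTIN: Dict[str, str] = {
    "20-maas-01-install-lldpd": "disabled",
    "20-maas-02-dhcp-unconfigured-ifaces": "disabled",
    "20-maas-03-machine-resources": "disabled",
    "30-maas-01-bmc-config": "disabled",
    "40-maas-01-machine-config-hints": "disabled",
    "50-maas-01-commissioning": "disabled",
    "maas-list-modaliases": "any",
    "maas-get-fruid-api-data": "any",
    "maas-kernel-cmdline": "any",
    "maas-serial-ports": "any",
    "maas-capture-lldpd": "any",
}


def run_order(scripts: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (serial phase in run order, scripts started together afterwards)."""
    serial = sorted(n for n, mode in scripts.items() if mode == "disabled")
    concurrent = sorted(n for n, mode in scripts.items() if mode != "disabled")
    return serial, concurrent


for bios_script in ("zzz-set-bios-options", "05-set-bios-options"):
    serial, _ = run_order({**BUILTIN, bios_script: "disabled"})
    # Serial scripts after the BIOS script only run once the node has rebooted.
    after_reboot = serial[serial.index(bios_script) + 1:]
    print(f"{bios_script}: {len(after_reboot)} serial scripts run after it")
```

Under this model a zzz or 51 prefix still leaves every maas-* script to start after the reboot, while their prerequisites (such as the lldpd install) ran before it. A 05 prefix instead puts the BIOS script first, so all the numbered built-in scripts run on the fresh boot.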

This is the log:

 Tue, 28 May. 2024 19:14:41	Script result - maas-lshw changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:41	Failed commissioning
 Tue, 28 May. 2024 19:14:41	Node changed status - From 'Commissioning' to 'Failed commissioning'
 Tue, 28 May. 2024 19:14:39	Script result - maas-support-info changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:38	Script result - maas-support-info changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:38	Script result - maas-get-fruid-api-data changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:38	Script result - maas-list-modaliases changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:37	Script result - zzz-set-bios-options changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:37	Script result - maas-kernel-cmdline changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:37	Script result - maas-serial-ports changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:37	Script result - maas-capture-lldpd changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:37	Script result - maas-lshw changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:37	Script result - maas-list-modaliases changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:37	Script result - maas-serial-ports changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:37	Script - maas-capture-lldpd failed
 Tue, 28 May. 2024 19:14:37	Script result - maas-kernel-cmdline changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:14:37	Script result - maas-get-fruid-api-data changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:14:18	Gathering information
 Tue, 28 May. 2024 19:14:14	HTTP Request - /images/ubuntu/amd64/hwe-22.04/jammy/stable/squashfs
 Tue, 28 May. 2024 19:13:27	HTTP Request - /images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-initrd
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/x86_64-efi/fs.lst
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/x86_64-efi/crypto.lst
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/x86_64-efi/command.lst
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/x86_64-efi/terminal.lst
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/grub.cfg
 Tue, 28 May. 2024 19:13:26	TFTP Request - /grub/grub.cfg-7c:c2:55:79:e7:88
 Tue, 28 May. 2024 19:13:26	PXE Request - commissioning
 Tue, 28 May. 2024 19:13:26	Performing PXE boot
 Tue, 28 May. 2024 19:13:26	HTTP Request - /images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-kernel
 Tue, 28 May. 2024 19:13:25	TFTP Request - bootx64.efi
 Tue, 28 May. 2024 19:13:25	TFTP Request - bootx64.efi
 Tue, 28 May. 2024 19:13:25	TFTP Request - grubx64.efi
 Tue, 28 May. 2024 19:04:15	Script result - 50-maas-01-commissioning changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:15	Script result - zzz-set-bios-options changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:04:13	Script result - 41-debug-everything changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:13	Script result - 50-maas-01-commissioning changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:04:12	Script result - 30-maas-01-bmc-config changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:12	Script result - 40-maas-01-machine-config-hints changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:04:12	Script result - 40-maas-01-machine-config-hints changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:12	Script result - 41-debug-everything changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:04:10	Script result - 30-maas-01-bmc-config changed status from 'Installing dependencies' to 'Running'
 Tue, 28 May. 2024 19:04:06	Script result - 20-maas-03-machine-resources changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:06	Script result - 30-maas-01-bmc-config changed status from 'Pending' to 'Installing dependencies'
 Tue, 28 May. 2024 19:04:04	Script result - 20-maas-02-dhcp-unconfigured-ifaces changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:04:04	Script result - 20-maas-03-machine-resources changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:03:57	Script result - 20-maas-01-install-lldpd changed status from 'Installing dependencies' to 'Running'
 Tue, 28 May. 2024 19:03:57	Script result - 20-maas-01-install-lldpd changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 19:03:57	Script result - 20-maas-02-dhcp-unconfigured-ifaces changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 19:03:51	Script result - 20-maas-01-install-lldpd changed status from 'Pending' to 'Installing dependencies'
 Tue, 28 May. 2024 19:03:50	Node commissioning - 'cloudinit' running config-reset_rmc with frequency once-per-instance

Here you can see that it fails in maas-capture-lldpd:

```
Traceback (most recent call last):
  File "/tmp/user_data.sh.kENMfO/scripts/commissioning/maas-capture-lldpd", line 52, in <module>
    lldpd_capture("/var/run/lldpd.socket", 60)
  File "/tmp/user_data.sh.kENMfO/scripts/commissioning/maas-capture-lldpd", line 40, in lldpd_capture
    time_ref = getmtime(reference_file)
  File "/usr/lib/python3.10/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/lldpd.socket'
```

I’m pretty sure this is because lldpd is no longer installed after the reboot: it is installed into the ramdisk before the reboot by 20-maas-01-install-lldpd, and that script is not repeated because MAAS thinks it has already run.

I had similar issues with the BIOS modification script, because the way I first wrote it required data provided by 30-maas-01-bmc-config to query the Redfish API. Now I simply assume the BIOS change has completed if that path is specified but not found (not ideal). To make this work I had to add a sleep 1800 command after the reboot, so that the script doesn’t complete before the reboot has happened. Without it, random maas-* scripts fail: MAAS starts the next scripts after the reboot command has been issued but before the node actually reboots, much like when parallel is not set to disabled.
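A minimal sketch of the pattern described above: compare the desired BIOS attributes with the current ones, apply only the differences, reboot, then block so the script cannot report success before the node goes down. read_current and apply_changes are hypothetical callables you would supply, for example Redfish requests to the BMC (under the DMTF Redfish schema, pending BIOS changes are typically sent as a PATCH with an {"Attributes": {...}} body to the settings resource advertised by the Bios resource’s @Redfish.Settings annotation, but check your vendor’s docs). Using systemctl reboot as the reboot command is also an assumption.

```python
"""Sketch of a BIOS-enforcing commissioning script that reboots.

read_current/apply_changes are hypothetical callables you supply (e.g.
Redfish requests to the BMC); the reboot command is also an assumption.
"""
import subprocess
import sys
import time
from typing import Callable, Dict, Sequence

Attributes = Dict[str, object]


def pending_changes(desired: Attributes, current: Attributes) -> Attributes:
    """Attributes whose current value is missing or differs from the desired one."""
    return {k: v for k, v in desired.items() if current.get(k) != v}


def enforce(
    desired: Attributes,
    read_current: Callable[[], Attributes],
    apply_changes: Callable[[Attributes], None],
    reboot_cmd: Sequence[str] = ("systemctl", "reboot"),
    wait_seconds: float = 1800,
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return a script exit code: 0 if compliant, 1 if the reboot never happened."""
    changes = pending_changes(desired, read_current())
    if not changes:
        # Already compliant (e.g. second boot): let commissioning carry on.
        print("BIOS settings already match")
        return 0
    apply_changes(changes)
    run(list(reboot_cmd), check=True)
    # Block until the node actually goes down; otherwise the script would exit
    # as "Passed" and MAAS would start the next scripts mid-reboot.
    sleep(wait_seconds)
    print("node did not reboot within the wait period", file=sys.stderr)
    return 1
```

On the second boot the same check finds nothing to change and returns 0 straight away. If a setting never takes effect, this sketch would reboot again on every pass until the script’s timeout is hit, so a real script may want to guard against that.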

I thought that setting recommission: true would fix this, but it does not. I expected it to recommission the node from the start on the next reboot, which is what I want.

If I change recommission to true, the logs show exactly the same thing, because LLDP still fails, but then all of the earlier scripts go to the Aborted state. That makes me think that if I could make this run AFTER the maas-* scripts I’d be fine, which is why I tried prefixing it with zzz as well as 51.
Alas, no joy.
Here is the log:

 Tue, 28 May. 2024 21:25:50	Script result - maas-lshw changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:50	Failed commissioning
 Tue, 28 May. 2024 21:25:50	Node changed status - From 'Commissioning' to 'Failed commissioning'
 Tue, 28 May. 2024 21:25:48	Script result - maas-support-info changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:47	Script result - maas-serial-ports changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:47	Script result - maas-kernel-cmdline changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:47	Script - maas-capture-lldpd failed
 Tue, 28 May. 2024 21:25:47	Script result - maas-kernel-cmdline changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:47	Script result - maas-support-info changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:47	Script result - maas-list-modaliases changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:47	Script result - maas-serial-ports changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:46	Script result - zzz-set-bios-options changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:46	Script result - maas-lshw changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:46	Script result - maas-list-modaliases changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:46	Script result - maas-get-fruid-api-data changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:46	Script result - maas-capture-lldpd changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:25:46	Script result - maas-get-fruid-api-data changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:25:30	Gathering information
 Tue, 28 May. 2024 21:24:38	HTTP Request - /images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-initrd
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/x86_64-efi/crypto.lst
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/x86_64-efi/command.lst
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/x86_64-efi/fs.lst
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/grub.cfg
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/x86_64-efi/terminal.lst
 Tue, 28 May. 2024 21:24:37	TFTP Request - /grub/grub.cfg-7c:c2:55:79:e7:88
 Tue, 28 May. 2024 21:24:37	PXE Request - commissioning
 Tue, 28 May. 2024 21:24:37	Performing PXE boot
 Tue, 28 May. 2024 21:24:37	HTTP Request - /images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-kernel
 Tue, 28 May. 2024 21:24:36	TFTP Request - bootx64.efi
 Tue, 28 May. 2024 21:24:36	TFTP Request - grubx64.efi
 Tue, 28 May. 2024 21:24:35	TFTP Request - bootx64.efi
 Tue, 28 May. 2024 21:15:24	Script result - 50-maas-01-commissioning changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:15:24	Script result - zzz-set-bios-options changed status from 'Pending' to 'Running'
 Tue, 28 May. 2024 21:15:22	Script result - 41-debug-everything changed status from 'Running' to 'Passed'
 Tue, 28 May. 2024 21:15:22	Script result - 50-maas-01-commissioning changed status from 'Pending' to 'Running'

And the commissioning status:

| Script | Tags | Status | Date | Runtime |
|---|---|---|---|---|
| 20-maas-01-install-lldpd | node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| 20-maas-02-dhcp-unconfigured-ifaces | node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| 20-maas-03-machine-resources | deploy-info, node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| 30-maas-01-bmc-config | bmc-config, node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| 40-maas-01-machine-config-hints | node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| 41-debug-everything | node | Passed | Tue, 28 May. 2024 21:15:22 | 0:00:00 |
| 50-maas-01-commissioning | deploy-info, node | Aborted | Tue, 28 May. 2024 21:25:50 | |
| maas-capture-lldpd | node | Failed | Tue, 28 May. 2024 21:25:47 | 0:00:00 |
| maas-get-fruid-api-data | node | Passed | Tue, 28 May. 2024 21:25:46 | 0:00:00 |
| maas-kernel-cmdline | node | Passed | Tue, 28 May. 2024 21:25:47 | 0:00:00 |
| maas-list-modaliases | deploy-info, node | Passed | Tue, 28 May. 2024 21:25:47 | 0:00:00 |
| maas-lshw | deploy-info, node | Passed | Tue, 28 May. 2024 21:25:50 | 0:00:03 |
| maas-serial-ports | deploy-info, node | Passed | Tue, 28 May. 2024 21:25:47 | 0:00:00 |
| maas-support-info | deploy-info, node | Passed | Tue, 28 May. 2024 21:25:48 | 0:00:01 |
| zzz-nscale-set-bios-options | node | Passed | Tue, 28 May. 2024 21:25:46 | 0:10:22 |

Does anyone have any suggestions? I can’t be the only one to have this issue.

I fixed this in the end by moving the script to a 05 prefix, keeping parallel disabled, and moving lots of functionality into the script from “outside”.

Here are my notes so that they are accessible to my future self when I have forgotten all this.

MAAS commissioning scripts that reboot must:

  • run before all standard MAAS scripts
  • not run in parallel, or they will break the scripts running alongside them.
  • not terminate before the reboot has occurred (use a sleep command to prevent this).
  • create their own BMC users via ipmitool etc. if they need access to features like the Redfish API.
    • I create a random password and store it in a file, similar to the 30-maas-01-bmc-config script, but my script does this on every boot, not just when the user does not already exist. It also overwrites the user on the 2nd boot with a different random password so we can check that the BIOS settings are as we requested.
    • I disable the user at the end of the script too (this requires ipmitool to be reinstalled after the reboot).
  • not use the packages metadata to install dependencies, as those dependencies will not be installed again after the reboot. If you need dependencies after the reboot (e.g. to check that a change has been made), the commissioning script must apt install or curl them itself when it runs, so that they are reinstalled on the 2nd boot.
  • not rely on the output of any prior commissioning scripts for anything that happens after the reboot. Any required data files need to be recreated by the rebooting script on its 2nd run.
  • run before any other commissioning script that creates output that is used by other commissioning scripts that run after the reboot.
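A sketch of the per-boot BMC user from the notes above, built on ipmitool’s user subcommands. The user slot (10), user name (biosbot), LAN channel (1) and password file path are hypothetical placeholders; some BMCs also need channel access granted (ipmitool channel setaccess) before the account works, and whether an IPMI user is also accepted by Redfish depends on the BMC.

```python
"""Sketch: (re)create a temporary BMC user with ipmitool on every boot.

USER_ID, USER_NAME, CHANNEL and PASSWORD_FILE are hypothetical placeholders.
"""
import os
import secrets
import subprocess
from typing import Callable, List

USER_ID = "10"                            # hypothetical free user slot on the BMC
USER_NAME = "biosbot"                     # hypothetical user name
CHANNEL = "1"                             # hypothetical LAN channel number
PASSWORD_FILE = "/tmp/bios-bmc-password"  # hypothetical path in the ramdisk


def new_password() -> str:
    # 8 random bytes -> 16 hex characters: fits the 16-character IPMI 1.5
    # limit and avoids characters some BMCs reject.
    return secrets.token_hex(8)


def setup_commands(password: str) -> List[List[str]]:
    return [
        ["ipmitool", "user", "set", "name", USER_ID, USER_NAME],
        ["ipmitool", "user", "set", "password", USER_ID, password],
        ["ipmitool", "user", "priv", USER_ID, "4", CHANNEL],  # 4 = ADMINISTRATOR
        ["ipmitool", "user", "enable", USER_ID],
    ]


def disable_command() -> List[str]:
    # Run at the end of the script so the account does not outlive commissioning.
    return ["ipmitool", "user", "disable", USER_ID]


def write_password(password: str, path: str = PASSWORD_FILE) -> None:
    # Truncate or create the file; newly created files get owner-only (0600) mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(password)


def create_user(run: Callable[..., object] = subprocess.run, path: str = PASSWORD_FILE) -> str:
    # A fresh password on every boot, so the second boot overwrites the first.
    password = new_password()
    for cmd in setup_commands(password):
        run(cmd, check=True)
    write_password(password, path)
    return password
```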
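A sketch of installing dependencies from inside the script rather than through the packages metadata, so they come back on every boot. It assumes apt-get is usable in the ephemeral environment and the node can reach a package mirror; the command-to-package mapping is only an example.

```python
"""Sketch: install runtime dependencies from inside the script on every boot.

Assumptions: apt-get works in the ephemeral environment, the node can reach
its package mirror, and REQUIRED is only an example mapping.
"""
import shutil
import subprocess
from typing import Callable, Dict, List, Optional

# command the script needs -> Ubuntu package that provides it (example values)
REQUIRED: Dict[str, str] = {"ipmitool": "ipmitool", "curl": "curl"}


def missing_packages(
    required: Dict[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Packages whose command is not on PATH, e.g. after a reboot reset the ramdisk."""
    return sorted({pkg for cmd, pkg in required.items() if which(cmd) is None})


def ensure_installed(
    required: Dict[str, str] = REQUIRED,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Install whatever is missing; call this on every run, not just the first."""
    missing = missing_packages(required, which)
    if missing:
        run(["apt-get", "update"], check=True)
        run(["apt-get", "install", "-y", *missing], check=True)
    return missing
```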

Thank you very much for sharing this, @antony-cleave!
