Rocky 9.3 Deployment

Issue:

Automated deployment of Rocky Linux 9.3 using Ubuntu MAAS as the delivery mechanism failed to boot successfully and hung with a blinking cursor. Packer MAAS Rocky 9 templates were used for the build - https://github.com/canonical/packer-maas/blob/main/rocky9/

Investigation:

Step 1:
My initial investigation of the matter led me to edit the grub boot by booting the machine and selecting “e” to edit the grub environment. I removed the “console=ttyS0” to view the console messages for more troubleshooting. The debug console output revealed more information related to disk timeouts and it eventually dropped into the “dracut” emergency console. This allowed me to unpack the issue further.

Step 2:
The “dracut” emergency console and disk timeout messages indicated a timeout to a specific UUID. I conducted an “ ls -l /dev/disk/by-uuid/ “ which revealed a different UUID to the one which was timing out. I copied the UUID, rebooted the system, followed step 1 and replaced the UUID which was timing out with the one found on disk and it booted perfectly fine. This led me to believe that GRUB was not being executed during Curtin once the machine was deployed. However, trolling through the Ubuntu MAAS installation output showed that Grub was indeed updating and should have taken the new UUID into account. This was not the case.

Step 3:
After updating the correct UUID in the GRUB bootloader ( which is temporary for boot ) I managed to boot Rocky 9.3 successfully. So now I know that the updated UUID is not being updated after Ubuntu MAAS deploys the node. Having a glance through the Packer Rocky Linux 9.3 kickstart template, I discovered “ sed -i ‘s/“GRUB_ENABLE_BLSCFG=.*”/“GRUB_ENABLE_BLSCFG=false”/g’ /etc/default/grub “ but on the node which was deployed with Rocky 9.3 the /etc/default/grub had “GRUB_ENABLE_BLSCFG=true”. What is BLS though? More googling is required. BLS is BootLoaderSpec and is a project run by Fedora - https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault which seems to have found its way into the Rocky Linux project. After a few more hours of reading, I discovered that GRUB is no longer responsible for building the boot loading function but instead hands it off to BLS. Does this mean that Curtin is not configured with BLS support? Again, browsing through the Ubuntu MAAS node installation output, there was no mention of BLS in the logs so I assume Curtin, which is responsible for preseeding installation and configuration of the node, does not support BLS. I assume the reason why they added “GRUB_ENABLE_BLSCFG=false” was that they didn’t want to deal with the complexity or that Curtin did not support it. Something is overriding that variable to “true” and forcing the use of BLS in recent versions of Rocky, Fedora, and Redhat.

Step 4:
Hold on, surely Grub2 should call BLS to update the bootloader? When I execute “ grub2-mkconfig -o /etc/grub2.cfg “ it does not update the BLS entries located at “/boot/loader/entries/”. Could this be a bug? Just so happened that I had a look at the syntax entries for “ grub2-mkconfig --help “ and in there, it mentions the addition of “ --update-bls-cmdline ”. So then I proceeded to run “grub2-mkconfig --update-bls-cmdline -o /etc/grub2.cfg“ which updated the BLS entries. Great. However, this doesn’t solve my problem for Curtin. Curtin is responsible for the configuration.

Solution:

Ubuntu MAAS has a feature called “Curtin” and “Cloud-init” which is responsible for the configuration of a node during OS deployment and cloud-init for configuration post-OS deployment. The preseed file called “ curtin_userdata “ executes commands as various points of the OS deployment, eg: early and late commands.

As a workaround to get Rocky 9.3 deployed and running, I updated the Ubuntu MAAS preseed curtin_userdata file with a late command entry
“ grub2_mkconfig: ["curtin", "in-target", "--", "sh", "-c", "grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg"] “

Suggestions:

Cheers
Tim

1 Like

Hi, thank you for spending time to get to the root of this issue.

I’ve created a bug in packer-maas to track this: https://github.com/canonical/packer-maas/issues/191