MAAS 3.5.3 unable to deploy Rocky 8.10 to bare metal

I'm currently running MAAS 3.5.3 and just got some new hardware in. I have even gone so far as to recreate my previously working Rocky 8.10 image with the latest version of packer-maas, to no avail.

I can deploy in memory just fine, and to a VM as well. I can also install the CentOS 7 and 8 cloud images with no issues. I have a custom CentOS 7.4 image that also deploys. My RHEL 8.8, Rocky 8.8, and Rocky 8.10 images do not.

Here is the tail of the install log; happy to upload more if it helps.

        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: FAIL: installing grub to target devices
        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/configuring-bootloader: FAIL: configuring target system bootloader
        finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
        Traceback (most recent call last):
          File "/curtin/curtin/commands/main.py", line 202, in main
            ret = args.func(args)
          File "/curtin/curtin/commands/curthooks.py", line 1918, in curthooks
            builtin_curthooks(cfg, target, state)
          File "/curtin/curtin/commands/curthooks.py", line 1883, in builtin_curthooks
            setup_grub(cfg, target, osfamily=osfamily,
          File "/curtin/curtin/commands/curthooks.py", line 821, in setup_grub
            install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg)
          File "/curtin/curtin/commands/install_grub.py", line 444, in install_grub
            in_chroot.subp(cmd, env=env, capture=True)
          File "/curtin/curtin/util.py", line 792, in subp
            return subp(*args, **kwargs)
          File "/curtin/curtin/util.py", line 280, in subp
            return _subp(*args, **kwargs)
          File "/curtin/curtin/util.py", line 144, in _subp
            raise ProcessExecutionError(stdout=out, stderr=err,
        curtin.util.ProcessExecutionError: Unexpected error while running command.
        Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpsqh5v72m/target', 'efibootmgr', '--create', '--write-signature', '--label', 'rocky', '--disk', '/dev/nvme0n1', '--part', '1', '--loader', '/EFI/rocky/shimx64.efi']
        Exit code: 5
        Reason: -
        Stdout: ''
        Stderr: Could not prepare Boot variable: No such file or directory
                
        Unexpected error while running command.
        Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpsqh5v72m/target', 'efibootmgr', '--create', '--write-signature', '--label', 'rocky', '--disk', '/dev/nvme0n1', '--part', '1', '--loader', '/EFI/rocky/shimx64.efi']
        Exit code: 5
        Reason: -
        Stdout: ''
        Stderr: Could not prepare Boot variable: No such file or directory
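For anyone comparing logs across machines, here is a minimal sketch that pulls the failing command, exit code, and stderr out of a curtin log tail like the one above. The function name `parse_curtin_failure` is made up for illustration, and it assumes the `Command:` / `Exit code:` / `Stderr:` line layout shown in this log; it is not part of curtin or MAAS.

```python
import ast


def parse_curtin_failure(log_text):
    """Return the first failing command, exit code and stderr found in a log tail.

    Hypothetical helper: it only understands the line prefixes seen in the
    log above and keeps the first occurrence of each field.
    """
    result = {"command": None, "exit_code": None, "stderr": None}
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Command:") and result["command"] is None:
            # The command is printed as a Python list literal.
            result["command"] = ast.literal_eval(line[len("Command:"):].strip())
        elif line.startswith("Exit code:") and result["exit_code"] is None:
            result["exit_code"] = int(line.split(":", 1)[1])
        elif line.startswith("Stderr:") and result["stderr"] is None:
            result["stderr"] = line.split(":", 1)[1].strip()
    return result


sample = """\
Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpsqh5v72m/target', 'efibootmgr', '--create', '--write-signature', '--label', 'rocky', '--disk', '/dev/nvme0n1', '--part', '1', '--loader', '/EFI/rocky/shimx64.efi']
Exit code: 5
Reason: -
Stdout: ''
Stderr: Could not prepare Boot variable: No such file or directory
"""

failure = parse_curtin_failure(sample)
print(failure["command"][6], failure["exit_code"], failure["stderr"])
```

In this log, that makes it easy to see that the step that fails is the `efibootmgr --create` call inside the chroot, not the image copy itself.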

Hi,

Does this help: “Unable to Deploy Centos on Dell Boss N-1 card” (post #9 by jhusakowski)?

That did seem to work, although I really don’t want to go in and manually edit grub to boot that first time.

Would there be a way for a post script to edit that grub config to remove the kernel flags?
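As a rough idea of what such a post script might do, here is a sketch that removes one kernel flag from the `GRUB_CMDLINE_LINUX` lines of a grub defaults file. Everything specific here is an assumption: the thread does not name the flag, so `example_flag` is a placeholder, and the function name `strip_kernel_flag` is made up. It also assumes the flag lives in a file such as `/etc/default/grub`; on RHEL 8-family images the arguments may also be stored in boot loader entries, so the grub config would likely need regenerating afterwards.

```python
import re

# Matches GRUB_CMDLINE_LINUX="..." and GRUB_CMDLINE_LINUX_DEFAULT="..." lines.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX(?:_DEFAULT)?=")(.*)"\s*$')


def strip_kernel_flag(grub_default_text, flag):
    """Return the file text with `flag` (bare or flag=value) removed.

    Hypothetical helper: arguments are split on whitespace, so quoted
    values containing spaces are not handled.
    """
    out = []
    for line in grub_default_text.splitlines():
        match = CMDLINE_RE.match(line)
        if match:
            kept = [
                token
                for token in match.group(2).split()
                if token != flag and not token.startswith(flag + "=")
            ]
            line = match.group(1) + " ".join(kept) + '"'
        out.append(line)
    return "\n".join(out) + "\n"


before = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="console=tty0 example_flag=1 quiet"\n'
after = strip_kernel_flag(before, "example_flag")
print(after)
```

The function only transforms text; reading and writing the real file, and running it at the right point in the deployment, are left out on purpose because the thread does not establish where that hook would live.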

Is this worth a bug post? I don’t think it’s much of a workaround when needing to deploy 5k machines.

What if you use tags in order to specify the kernel parameters to pass to the machines? Would that work for you?
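For reference, the tag approach could look roughly like the sketch below, which only builds the `maas` CLI command lines rather than running them. The `admin` profile comes from the upload command used elsewhere in this thread; the tag name `boss-n1-install`, the option `example_flag=1`, and the system ID `abc123` are placeholders, since the thread does not give the actual values.

```python
import shlex


def tag_commands(profile, tag, kernel_opts, system_id):
    """Build the CLI calls to create a tag with kernel options and assign it.

    Hypothetical helper: it returns argv lists and does not call MAAS.
    """
    create = [
        "maas", profile, "tags", "create",
        f"name={tag}",
        f"kernel_opts={kernel_opts}",
    ]
    assign = ["maas", profile, "tag", "update-nodes", tag, f"add={system_id}"]
    return [create, assign]


commands = tag_commands("admin", "boss-n1-install", "example_flag=1", "abc123")
for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Machines carrying the tag get the kernel options whenever MAAS boots them, so this by itself does not limit the options to the first install.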

I can use the tag for the initial install, but the boot after fails until you remove the flag. I’m not following how to use the flag only for the initial provisioning yet remove it before it reboots.

Right, I suspect the only workaround at the moment is to manually patch the deb/snap with the grub config you need to craft.

That sounds like a bug report then… I didn’t see any open issues on it, and I’m not sure why no one else is having this issue.

Also, what’s the command you are using to upload the rocky custom image to MAAS?

        maas admin boot-resources create name='custom/rocky810' title='Rocky Linux 8.10' architecture='amd64/generic' base_image='rhel/8' filetype='tgz' content@='rocky8.tar.gz'

Same one I’ve been using to deploy Rocky 8.6 and 8.8 to our compute hosts for the past year. :slight_smile:

It’s only these new Dell servers with the “BOSS N-1” cards that are having the issues.