Unable to deploy CentOS on Dell BOSS-N1 card

mistryyy · 20 April 2023 15:35

Hello!
I am trying to deploy a CentOS image to a Dell BOSS-N1 card that’s configured as RAID 1. Ubuntu 18 and 20 deployments work fine, but any CentOS image fails right before the reboot with the following error:

    finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: FAIL: installing grub to target devices
    finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/configuring-bootloader: FAIL: configuring target system bootloader
    finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
    Traceback (most recent call last):
      File "/curtin/curtin/commands/main.py", line 202, in main
        ret = args.func(args)
      File "/curtin/curtin/commands/curthooks.py", line 1886, in curthooks
        builtin_curthooks(cfg, target, state)
      File "/curtin/curtin/commands/curthooks.py", line 1851, in builtin_curthooks
        setup_grub(cfg, target, osfamily=osfamily,
      File "/curtin/curtin/commands/curthooks.py", line 804, in setup_grub
        install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg)
      File "/curtin/curtin/commands/install_grub.py", line 401, in install_grub
        in_chroot.subp(cmd, env=env, capture=True)
      File "/curtin/curtin/util.py", line 787, in subp
        return subp(*args, **kwargs)
      File "/curtin/curtin/util.py", line 275, in subp
        return _subp(*args, **kwargs)
      File "/curtin/curtin/util.py", line 139, in _subp
        raise ProcessExecutionError(stdout=out, stderr=err,
    curtin.util.ProcessExecutionError: Unexpected error while running command.
    Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmp49r58y9m/target', 'efibootmgr', '--create', '--write-signature', '--label', 'centos', '--disk', '/dev/nvme2n1', '--part', '1', '--loader', '/EFI/centos/shimx64.efi']
    Exit code: 5
    Reason: -
    Stdout: ''
    Stderr: Could not prepare Boot variable: No such file or directory
            
    Unexpected error while running command.
    Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmp49r58y9m/target', 'efibootmgr', '--create', '--write-signature', '--label', 'centos', '--disk', '/dev/nvme2n1', '--part', '1', '--loader', '/EFI/centos/shimx64.efi']
    Exit code: 5
    Reason: -
    Stdout: ''
    Stderr: Could not prepare Boot variable: No such file or directory

Any insight on this is appreciated!

Thank you!

alexsander-souza · 20 April 2023 15:58

Hi @mistryyy,

What CentOS version?
Can you show us the full Curtin logs? This error is about efibootmgr not finding the boot partition, so we need to check the partitioning output.
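
One plausible first check, sketched in Python: see whether the paths that the failing efibootmgr call depends on exist at all. This assumes the usual Linux naming rule that a disk whose name ends in a digit gets a `p` before the partition number (so partition 1 of /dev/nvme2n1 is /dev/nvme2n1p1) and that EFI variables are exposed under /sys/firmware/efi/efivars. The helper names are hypothetical and not part of Curtin.

```python
import os


def partition_path(disk: str, part: int) -> str:
    """Return the device node for partition `part` of `disk`.

    Assumption: disks whose name ends in a digit (nvme2n1, mmcblk0)
    use a 'p' separator before the partition number.
    """
    sep = "p" if disk[-1].isdigit() else ""
    return f"{disk}{sep}{part}"


def efi_prerequisites(disk: str, part: int, root: str = "/") -> dict:
    """Map each path the efibootmgr call relies on to whether it exists.

    `root` lets the same check run against a chroot target directory.
    """
    wanted = ["/sys/firmware/efi/efivars", disk, partition_path(disk, part)]
    result = {}
    for path in wanted:
        full = os.path.join(root, path.lstrip("/"))
        result[full] = os.path.exists(full)
    return result


if __name__ == "__main__":
    # Disk and partition number taken from the failing command in the log.
    for path, present in efi_prerequisites("/dev/nvme2n1", 1).items():
        print(f"{'ok     ' if present else 'MISSING'} {path}")
```

Run from the ephemeral environment, a MISSING line would narrow down which part of the command cannot be resolved. It does not by itself explain the efibootmgr failure.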

mistryyy · 20 April 2023 17:01

Hi @alexsander-souza,
Thank you for responding.
I am trying to install CentOS 7; see below for the installation log.
install.log

alexsander-souza · 20 April 2023 19:43

The partitioning looks OK, but there are a few unexpected errors while extracting the image:
tar: setxattrat: Cannot set 'security.selinux' extended attribute for file ...

Did you use packer-maas to build this CentOS 7 image?

mistryyy · 20 April 2023 19:59

I am using the built-in CentOS 7 image from MAAS.

mistryyy · 26 April 2023 20:59

Hi @alexsander-souza,
I tried setting nvme_core.multipath=N under Settings > Kernel Parameters. With it, the machine gets to the reboot step, but it is now stuck at the CentOS dracut prompt. With no kernel parameters, the deployment just fails and the machine stays stuck in the ephemeral environment. Any suggestions? Thanks!

alexsander-souza · 26 April 2023 22:36

My feeling is that this hardware is too “modern” for CentOS 7. Curtin runs the CentOS-supplied efibootmgr to set up the EFI boot entry, and this is failing for some reason. The fact that more recent Ubuntu versions work also suggests this.

I would try running the CentOS installer on this machine manually to see what kind of trickery is needed. An alternative is to ask the CentOS community whether anyone has managed to make this work.

mistryyy · 4 May 2023 20:36

Hi @alexsander-souza,

Thanks for the response.

I have the same issue with other OSes as well, including RHEL and Rocky Linux. Only Ubuntu 18, 20 and 22 seem to work.

It looks like commissioning the machine with the global kernel parameter “nvme_core.multipath=N” (Settings > Kernel Parameters) and then deploying CentOS gets me to the dracut prompt. Once I remove that parameter during the reboot, by hitting “e” at the GRUB menu, the install goes through and the machine gets deployed. However, that parameter breaks the Ubuntu install, so I can’t use it for CentOS deployments.
Thanks!
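
When comparing the working and failing boots, it can help to confirm which value of the parameter the running kernel actually received. A minimal Python sketch, assuming the command line is readable from /proc/cmdline and contains no quoted values with spaces; the function name is hypothetical.

```python
import os


def kernel_param(cmdline: str, name: str):
    """Return the value of `name` on a kernel command line.

    Returns None when the parameter is absent and "" when it is present
    without a value. If it appears more than once, the last one is returned.
    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled.
    """
    value = None
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name:
            value = val
    return value


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as fh:
            current = kernel_param(fh.read(), "nvme_core.multipath")
        print("nvme_core.multipath:", "not set" if current is None else current)
```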

jhusakowski · 10 May 2023 10:40

Hi @mistryyy,

Have you considered specifying kernel options via tags instead of globally? https://maas.io/docs/how-to-customise-machines#heading--create-tags-with-built-in-kernel-options

It seems that you could specify the required kernel parameters in tag options and tag the CentOS machines appropriately.
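
A rough sketch of that approach, written in Python so the commands can be reviewed before anything runs. It builds two `maas` CLI invocations: `tags create` with `kernel_opts`, then `tag update-nodes` to apply the tag to one machine. The profile name, tag name, and system ID are placeholders, and the script only prints the commands.

```python
import shlex


def tag_commands(profile: str, tag: str, kernel_opts: str, system_id: str):
    """Build the maas CLI calls that create a kernel-option tag and apply it."""
    create = ["maas", profile, "tags", "create",
              f"name={tag}", f"kernel_opts={kernel_opts}"]
    assign = ["maas", profile, "tag", "update-nodes", tag, f"add={system_id}"]
    return [create, assign]


if __name__ == "__main__":
    # "admin", "nvme-no-multipath" and "abc123" are placeholders (assumptions).
    for argv in tag_commands("admin", "nvme-no-multipath",
                             "nvme_core.multipath=N", "abc123"):
        print(shlex.join(argv))  # printed for review, not executed
```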

agrebennikov · 16 May 2023 17:46

@jhusakowski I don’t believe this is going to help, even if it is a working solution: the machines form a common pool, and it is not known up front which OS a particular machine is going to be provisioned with. It should rather be something like “assign the tag to the image”.

agrebennikov · 16 May 2023 18:01

Small update: indeed, if I set nvme_core.multipath to 0 in a tag and assign it to the machine, the node is deployed, but it falls into a “dracut-initqueue timeout” during boot. The nvme_core.multipath kernel parameter has to be removed for the machine to boot.

Could this have something to do with the particular version of efibootmgr that we package into the CentOS 7 image?

billwear · 17 May 2023 17:26

I’m just a Technical Author, so take my observations with the appropriate spices, but based on what I see here, there might be a few things to consider:

nvme_core.multipath Kernel Parameter: From your descriptions, it seems like the nvme_core.multipath kernel parameter is causing issues during the boot process. This kernel parameter toggles native multipath support for NVMe devices (N or 0 = off, Y or 1 = on). From your experiments, turning it off lets the CentOS deployment get past the installer, but the deployed system then stalls in dracut until the parameter is removed.
Compatibility Issue with CentOS 7: As pointed out by alexsander-souza, this issue could be related to the hardware being “too modern” for CentOS 7, which means that CentOS 7 might not have the necessary drivers or support for your hardware. You mention having problems with RHEL, Rocky, and so on – everything except Ubuntu. You can see the drivers currently in use by the system with the lsmod command. This command shows the modules currently loaded into your kernel, and if you want specific information on one or more drivers, you can isolate them with modinfo module_name.
Possible efibootmgr Version Issue: agrebennikov’s suggestion that this could be related to the version of efibootmgr packaged in the CentOS 7 image is a possibility. The efibootmgr utility is used to manipulate the EFI Boot Manager, and different versions might have different behaviors or bugs.
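
To make the driver comparison concrete, here is a small Python sketch that filters lsmod output down to the NVMe-related modules, so the list from a working Ubuntu deployment can be compared with the one from CentOS. It assumes the standard lsmod layout (a header row, then one module per line with the name in the first column); the function name is hypothetical.

```python
import shutil
import subprocess


def loaded_modules(lsmod_output: str, prefix: str = "") -> list:
    """Return module names from lsmod output, optionally filtered by prefix."""
    names = []
    for line in lsmod_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    if shutil.which("lsmod"):
        out = subprocess.run(["lsmod"], capture_output=True,
                             text=True, check=True).stdout
        print(loaded_modules(out, "nvme"))
```

Each name it prints can then be passed to modinfo for version and parameter details.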

Here’s what I’d try:

Try a more recent version of CentOS (like CentOS 8 or CentOS Stream) if it’s possible. This may help overcome potential incompatibility issues between your hardware and the OS. Sounds like you might have done this, or at least done this by proxy, trying Rocky, et al. YMMV.
Look further into the nvme_core.multipath kernel parameter. It might be that the RAID controller or NVMe drives you’re using require a specific configuration or driver that’s not included in the CentOS 7 image.
Lastly, you may need to consider customizing your CentOS 7 image to include the specific drivers or software necessary for your hardware, or to use a different version of efibootmgr.

I’m not sure that these suggestions will actually solve your problem, but I think they could help isolate the issue. Again, what do I know, I’m just the writer guy, so take it FWIW.