Not handling >1 kernel version correctly

kevin-reeuwijk · 14 October 2024 11:53

I believe I reported this bug for Ubuntu in the past, and it may have been fixed since, but I just hit the same issue on RHEL:

Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpydu_muv6/target', 'rpm', '-q', '--queryformat', '%{VERSION}-%{RELEASE}.%{ARCH}', 'kernel'] with allowed return codes [0] (capture=True)
Found kver=5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64
Rebuilding initramfs with: ['dracut', '-f', '/boot/initramfs-5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64.img', '5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64']
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpydu_muv6/target', 'dracut', '-f', '/boot/initramfs-5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64.img', '5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64'] with allowed return codes [0] (capture=True)
Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
TIMED subp(['udevadm', 'settle']): 0.016
Running command ['mount', '--make-private', '/tmp/tmpydu_muv6/target/sys/firmware/efi/efivars'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpydu_muv6/target/sys/firmware/efi/efivars'] with allowed return codes [0] (capture=False)
Running command ['mount', '--make-private', '/tmp/tmpydu_muv6/target/sys'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpydu_muv6/target/sys'] with allowed return codes [0] (capture=False)
Running command ['mount', '--make-private', '/tmp/tmpydu_muv6/target/run'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpydu_muv6/target/run'] with allowed return codes [0] (capture=False)
Running command ['mount', '--make-private', '/tmp/tmpydu_muv6/target/proc'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpydu_muv6/target/proc'] with allowed return codes [0] (capture=False)
Running command ['mount', '--make-private', '/tmp/tmpydu_muv6/target/dev'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpydu_muv6/target/dev'] with allowed return codes [0] (capture=False)
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks/updating-initramfs-configuration: FAIL: updating initramfs configuration
finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: FAIL: curtin command curthooks
Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/curtin/curtin/commands/curthooks.py", line 1918, in curthooks
    builtin_curthooks(cfg, target, state)
  File "/curtin/curtin/commands/curthooks.py", line 1863, in builtin_curthooks
    redhat_update_initramfs(target, cfg)
  File "/curtin/curtin/commands/curthooks.py", line 1696, in redhat_update_initramfs
    in_chroot.subp(dracut_cmd, capture=True)
  File "/curtin/curtin/util.py", line 792, in subp
    return subp(*args, **kwargs)
  File "/curtin/curtin/util.py", line 280, in subp
    return _subp(*args, **kwargs)
  File "/curtin/curtin/util.py", line 144, in _subp
    raise ProcessExecutionError(stdout=out, stderr=err,
curtin.util.ProcessExecutionError: Unexpected error while running command.
Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpydu_muv6/target', 'dracut', '-f', '/boot/initramfs-5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64.img', '5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64']
Exit code: 1
Reason: -
Stdout: /etc/dracut.conf.d/50-curtin-storage.conf:add_dracutmodules+=" lvm"
        
Stderr: 
        dracut: WARNING: <key>+=" <values> ": <values> should have surrounding white spaces!
        dracut: WARNING: This will lead to unwanted side effects! Please fix the configuration file.
        
        dracut: Cannot find module directory /lib/modules/5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64/
        dracut: and --no-kernel was not specified
        
Unexpected error while running command.

For the custom image in question, there happen to be two kernel versions installed, because image-builder runs yum update and does not clean up old kernel versions by default. This results in

/lib/modules/5.14.0-427.13.1.el9_4.x86_64
/lib/modules/5.14.0-427.37.1.el9_4.x86_64

Because the rpm query format has no separator between entries, the two versions get concatenated into one big kver value, 5.14.0-427.13.1.el9_4.x86_645.14.0-427.37.1.el9_4.x86_64, and the deployment fails.

As with Ubuntu, it should select only the newest version from /lib/modules and use that, or run dracut for each version separately.
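
To illustrate the "run dracut for each version" option, here is a minimal sketch. It is not curtin's actual code. It assumes the rpm query format gets a trailing `\n` so that each installed kernel lands on its own line, and `split_kvers` and `dracut_cmds` are hypothetical helper names I made up for the example.

```python
def split_kvers(rpm_output):
    """Return one kernel version per non-empty line of rpm output."""
    return [line.strip() for line in rpm_output.splitlines() if line.strip()]


def dracut_cmds(kvers):
    """Build one dracut command per kernel version instead of one for all."""
    return [['dracut', '-f', '/boot/initramfs-%s.img' % kver, kver]
            for kver in kvers]


# Assumption: the query format ends in a literal backslash-n, which rpm
# expands to a newline, so two installed kernels produce two lines.
KVER_CMD = ['rpm', '-q', '--queryformat',
            '%{VERSION}-%{RELEASE}.%{ARCH}\\n', 'kernel']

# Sample output for the two kernels installed on my image.
sample = ('5.14.0-427.13.1.el9_4.x86_64\n'
          '5.14.0-427.37.1.el9_4.x86_64\n')

for cmd in dracut_cmds(split_kvers(sample)):
    print(cmd)
```

Each resulting command would then be run inside the chroot the same way the single dracut command is run today.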

andrew-boatrocker · 10 February 2025 16:44

I’m running into this same bug with MAAS 3.5.3-16341-g.7adb035d6. Is there a fix for it?

r00ta · 10 February 2025 16:50

If you could provide the packer-maas template you are using to reproduce this, we can try to investigate and keep track of the bug.

andrew-boatrocker · 10 February 2025 19:00

I see that there’s a bug report open for it here:

Bug #2065299 “Find redhat kernel version command gives invalid o…” : Bugs : MAAS

This patch fixes it, but it’s a bit of a hack:

--- /var/lib/snapd/snap/maas/current/usr/lib/python3/dist-packages/curtin/commands/curthooks.py 2024-03-13 19:30:49.000000000 -0400
+++ /var/lib/snapd/snap/maas/fixes/curtin/commands/curthooks.py 2025-02-10 12:16:22.798303797 -0500
@@ -1684,11 +1684,13 @@
     if not redhat_update_dracut_config(target, cfg):
         LOG.debug('Skipping redhat initramfs update, no custom storage config')
         return
-    kver_cmd = ['rpm', '-q', '--queryformat',
-                '%{VERSION}-%{RELEASE}.%{ARCH}', 'kernel']
+    #kver_cmd = ['rpm', '-q', '--queryformat',
+    #            '%{VERSION}-%{RELEASE}.%{ARCH}', 'kernel']
+    kver_cmd = ['ls', '-1t', '/lib/modules']
     with util.ChrootableTarget(target) as in_chroot:
         LOG.debug('Finding redhat kernel version: %s', kver_cmd)
-        kver, _err = in_chroot.subp(kver_cmd, capture=True)
+        kver_lines, _err = in_chroot.subp(kver_cmd, capture=True)
+        kver = kver_lines.split()[0]
         LOG.debug('Found kver=%s' % kver)
         initramfs = '/boot/initramfs-%s.img' % kver
         dracut_cmd = ['dracut', '-f', initramfs, kver]
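
Part of why it's a hack: `ls -1t` sorts by modification time, so the first entry is the most recently touched directory, which is not necessarily the highest version. A rough alternative is to pick the highest version by name. The sketch below assumes a simple natural sort (numeric runs compared as integers) is good enough; it is not RPM's real version comparison, and `newest_kver` is a hypothetical helper name.

```python
import re


def _natural_key(version):
    """Split into alternating text and integer parts, e.g. '427.9' -> ['', 427, '.', 9, '']."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', version)]


def newest_kver(kvers):
    """Return the entry with the highest version according to the natural sort."""
    return max(kvers, key=_natural_key)


installed = ['5.14.0-427.13.1.el9_4.x86_64',
             '5.14.0-427.37.1.el9_4.x86_64',
             '5.14.0-427.9.1.el9_4.x86_64']
print(newest_kver(installed))
```

A plain string sort would rank `427.9.1` above `427.37.1`, which is why the numeric parts are converted to integers before comparing.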

To get this failure, I added this to our Rocky 9.3 packer-maas template:

  provisioner "shell" {
    inline = [
      "sudo yum -y install https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-5.14.0-503.23.1.el9_5.x86_64.rpm https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-tools-libs-5.14.0-503.23.1.el9_5.x86_64.rpm https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-modules-core-5.14.0-503.23.1.el9_5.x86_64.rpm https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-core-5.14.0-503.23.1.el9_5.x86_64.rpm https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-modules-5.14.0-503.23.1.el9_5.x86_64.rpm https://dl.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/k/kernel-tools-5.14.0-503.23.1.el9_5.x86_64.rpm",
      "sudo dracut --force --regenerate-all --verbose",
      "sudo grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg",
      "sudo grub2-mkconfig -o /boot/grub2/grub.cfg",
    ]
  }

This will break as soon as Rocky upgrades their kernels and the URLs are no longer valid, so it’s not something I’d use as a test on your end. (We’ve also got a bunch of Ansible stuff and a heavily modified Kickstart file, so also not a good source for testing on your end.) But hopefully you can see the basic principle: Install an older version of Rocky (Rocky 9.3 in our case), and then install a newer kernel to the image with rpm/yum/dnf.

(We’re having to do this because Autodesk Maya only supports Rocky 9.3, but we get an instant kernel panic after deployment on 172 of our nodes with the Rocky 9.3 kernel. So we have to upgrade the kernel on the image before deployment without upgrading the rest of the install.)