UEFI boot issues

I have a Dell R630 server in UEFI boot mode that can’t boot from the local disks after it’s deployed through MAAS.

The boot configuration seems fine:

# efibootmgr -v
BootCurrent: 0003
BootOrder: 0003,0000,0005
Boot0000* ubuntu	HD(1,GPT,6fcd020b-7eeb-43b0-9070-c366c49ddde9,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)
Boot0003* Integrated NIC 1 Port 3 Partition 1	VenHw(56e94a54-7c81-443a-bb9f-c0d240845f54)
Boot0005* Integrated NIC 1 Port 4 Partition 1	VenHw(d5a9b8fe-4303-4bda-a6fc-1aca23b5a2ed)

# blkid | grep sda1
/dev/sda1: LABEL_FATBOOT="efi" LABEL="efi" UUID="1E39-FB3F" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="6fcd020b-7eeb-43b0-9070-c366c49ddde9"

# fdisk /dev/sda
Command (m for help): i
Partition number (1,2, default 2): 1

         Device: /dev/sda1
          Start: 2048
            End: 1050623
        Sectors: 1048576
           Size: 512M
           Type: EFI System
      Type-UUID: C12A7328-F81F-11D2-BA4B-00A0C93EC93B
           UUID: 6FCD020B-7EEB-43B0-9070-C366C49DDDE9

The files it complains about do exist:

# ls -l /boot/efi/EFI/ubuntu/
total 4332
-rwxr-xr-x 1 root root     108 Mar  9 20:11 BOOTX64.CSV
-rwxr-xr-x 1 root root     126 Mar  9 20:11 grub.cfg
-rwxr-xr-x 1 root root 2598792 Mar  9 20:11 grubx64.efi
-rwxr-xr-x 1 root root  860824 Mar  9 20:11 mmx64.efi
-rwxr-xr-x 1 root root  960472 Mar  9 20:11 shimx64.efi

Does anyone have any advice for how to get it to boot from the local disk? I wouldn’t like to depend on the MAAS controller to be up at all times for the servers to boot properly after they are deployed.
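For reference, if the firmware NVRAM entry ever goes missing, it can be recreated from Linux with `efibootmgr`. This is just a sketch: the disk, partition, label, and entry IDs below mirror the `efibootmgr -v` output above and are assumptions for any other machine; it needs root and a UEFI-booted kernel.

```shell
# Sketch: recreate the "ubuntu" NVRAM entry and put it first in the
# boot order. Disk/partition/entry IDs match the output above; adjust
# them for your machine. Requires root and efivarfs mounted.
if command -v efibootmgr >/dev/null 2>&1 && [ -d /sys/firmware/efi ]; then
    efibootmgr --create \
        --disk /dev/sda --part 1 \
        --label ubuntu \
        --loader '\EFI\ubuntu\shimx64.efi' || echo "create failed (need root?)"
    # Try the local-disk entry before the two PXE entries
    efibootmgr --bootorder 0000,0003,0005 || echo "bootorder change failed"
else
    echo "efibootmgr unavailable or not booted via UEFI"
fi
```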

Hi @gtirloni ,

What MAAS version are you using? MAAS requires all the machines it manages to always boot from the first NIC; it’s then MAAS itself that instructs the machine to boot from the first disk. However, if the rack or region controllers are not working it’s not a big issue for you: the PXE attempt will time out and the machine will boot from the first disk in any case.

That said, I suspect there is some misconfiguration of the UEFI/BIOS boot mode in your case. That’s the first thing I would double-check.

I tried every imaginable option available in the BIOS/UEFI settings but nothing helped. I even tried toggling Secure Boot on and off.

The UEFI settings appear to be correct (from the output of efibootmgr).

Since MAAS is able to boot this machine correctly, I wonder what is wrong locally, but I’m out of ideas for what to check next.

When I enter the BIOS and try the “Boot from file” option, it says “No filesystems found”. There are 2 SATA disks in a RAID-1 volume. Mounting that volume shows the /boot/efi filesystem contents are correct (and the partition is correctly marked as EFI type).

I think I should try to install Ubuntu without MAAS and see what happens.

Hardware raid or software raid?

I have tried both: a RAID-1 volume on the PERC controller and software RAID directly in Linux. Both install fine; neither boots successfully on its own.

Did you recommission the machine after you changed the settings in the controller?

Yes, and I have also erased the disks, recommissioned, and redeployed multiple times.

Is there anywhere besides efibootmgr/fdisk/lsblk I could check to see what might be causing the UEFI firmware to look in the wrong place? At this point I’m thinking this could actually be a bug in the Dell firmware but…
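Beyond efibootmgr/fdisk/lsblk, a couple of other places show what the firmware and the kernel each believe about the boot path. A hedged sketch (the paths are standard on Ubuntu; the device name comes from this thread, so adjust it as needed):

```shell
# Raw UEFI variables: Boot#### entries and BootOrder live in efivarfs
if [ -d /sys/firmware/efi/efivars ]; then
    ls /sys/firmware/efi/efivars | grep -i '^Boot' || true
else
    echo "no efivarfs: this kernel was not booted via UEFI"
fi

# The kernel's view of the ESP: the partition type GUID should be the
# EFI System Partition GUID (C12A7328-F81F-11D2-BA4B-00A0C93EC93B)
lsblk -o NAME,PARTTYPE,PARTUUID,FSTYPE /dev/sda 2>/dev/null \
    || echo "/dev/sda not present on this host"

# Controller/disk timeouts often leave traces in the kernel log
dmesg 2>/dev/null | grep -iE 'megaraid|timeout' | tail -n 20
```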

This machine was running Windows Server before (AFAIK), so I wondered if there were some leftover configs somewhere, but I have also reset the BIOS to factory defaults and that didn’t help at all.

Do you have an H730M as the RAID controller? Have you tried removing one disk, configuring the other one with RAID-0 (using AHCI mode), recommissioning the machine, and deploying?

It’s an H730P. I tried the single-disk RAID-0 but it didn’t help.

The server sometimes gets stuck at “Initializing firmware interfaces” for 5 minutes. When that happens (often), the RAID controller shows the 2 boot disks as failed.

I suspect some timeout is happening and causing UEFI to fail with that very misleading error about not finding the boot files.

I switched the controller to HBA mode once again, cleared all the configuration, and did the whole release/commission/deploy dance. Again, the same error as with the RAID-1 volume… and disks in a failed state.

So I think this narrows it down to a bad disk controller and/or bad disks. I have been saying “a server” but the issue is actually happening on a few servers. I redeployed all of them and am now checking the ones that fail with this error. All have the same RAID controller and disk models.

The workaround seems to be a cold boot / power cycle. If I’m lucky, the server doesn’t sit at “Initializing firmware interfaces” for 5 minutes and is then able to boot from the local disk.
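When disks flap between OK and failed like this, querying SMART data through the PERC can help confirm whether the drives themselves are at fault. A hedged sketch using smartmontools’ MegaRAID pass-through (the device ID after `megaraid,` varies per controller, so list the devices first; this assumes smartctl is installed):

```shell
# Sketch: check physical-disk health behind a PERC (LSI MegaRAID-based)
# controller with smartmontools. `smartctl --scan` lists the device
# IDs the controller exposes; "megaraid,0" below is only an example.
if command -v smartctl >/dev/null 2>&1; then
    smartctl --scan || true
    # Health summary of the first physical disk behind the controller
    smartctl -H -d megaraid,0 /dev/sda \
        || echo "query failed (controller/ID mismatch, or need root?)"
else
    echo "smartmontools not installed"
fi
```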

Hardware issues are a huge waste of time every time.

Sorry for the noise and thanks for all the suggestions, much appreciated.

Btw, I also have an old R430 in my racks but I don’t recall if it has an H730P. In the next few days I’ll double-check and get back to you (but I never had such issues, as far as I remember).


Hi @gtirloni ,

I just tried on my R430 (but it has an H730 Mini) with 2 disks in RAID-1 and it worked fine (UEFI boot). Have you had any luck so far?

I used Ubuntu 22.04 for both commissioning and deployment.

Hi @r00ta, thanks a lot for checking this and confirming it should work.

We’ve replaced the boot disks and it seems that fixed the issue.

For anyone having this issue in the future, here are the details:

- Dell R630 (BIOS 2.18.1)
- PERC H730P Mini (firmware
- KINGSTON SKC6002 (rev S4500105): issues booting (disks show as failed)
- KINGSTON SEDC60/450/500/600: no issues

With the SEDCxx disks, “Initializing firmware interfaces” takes <5 seconds.

The UEFI firmware will show that it can’t find the /efi/... files, but the hint is that it isn’t complaining about missing files: it says “no device found”. The mention of “device” should redirect the troubleshooting to the physical disks, not to the filesystem, PXE, or boot options. It took me a while to realize this because I was focused on the filenames it was complaining about, and sometimes I would reboot the server and the issue wasn’t present… only to happen again on the next reboot.

I have 5% less hair on my head, thanks Kingston and Dell :slight_smile:

Out of curiosity, did you try to install Ubuntu from a USB stick on the faulty disks?

Yes, not locally but through the virtual media mount option (painful). I got lucky and Ubuntu installed because on that particular boot the disks weren’t failed. On the next reboot they failed and Ubuntu couldn’t boot.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.