PXE Deployment across tagged subnet failing

Hey folks!

I am having more PXE woes with a box that seems to throw up issues regardless of my efforts and compromises. Here are the details:
ASUS Prime X470-Pro mobo with a 512 GB NVMe boot drive. The onboard Ethernet is an Intel I211-AT. With CSM disabled in the BIOS I am able to enlist, commission and deploy up to the final reboot. At that point I see it POST, followed by these PXE messages:

>>>Checking Media Presence......
>>>Media Present......
>>>Start PXE over IPv4 on MAC: FO-OO-BA-RR-BA-ZZ.
 Downloading NBP file...
Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned not found
error: unknown error.

In the MAAS status for the node, it’s still deploying. The node does actually boot at that point, but I see all the curtin actions related to posting events to MAAS failing, and the deploy never gets marked as Deployed. This seems odd, as presumably the files it’s trying to find should be coming from MAAS. Or is it trying to load them from the boot drive? Perhaps there’s a problem with the PXE boot environment not having access to the NVMe drive? There’s an option in the BIOS for enabling an AMI driver for NVMe, which I’ve tried both on and off with no discernible difference.

With CSM enabled, set to auto, or set to ‘Legacy Only’ in various combinations (I can set UEFI first or legacy only individually for Net, Disk, USB etc. boot), I get a different error, my old friend "Invalid MBR Magic, Treating as RAW". Presumably this is because the drive gets formatted with GPT when partitioned for install, whether or not it’s EFI. I also tried manually partitioning the drive with a 500MB FAT /boot partition followed by a 510GB ext4 root, but that didn’t help.
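One way to test that presumption is to look at the first two sectors of the drive. A GPT disk normally still carries a protective MBR ending in the 0x55AA signature, so it may be worth confirming what is actually on disk. This is only a rough sketch: the function name is made up for illustration, it assumes 512-byte logical sectors (some NVMe drives use 4096), and the device path you pass in (for example /dev/nvme0n1, read as root) is an assumption about the setup.

```python
import sys

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def describe_partition_table(first_two_sectors: bytes) -> str:
    """Classify a disk from the raw bytes of LBA 0 and LBA 1."""
    if len(first_two_sectors) < 2 * SECTOR_SIZE:
        raise ValueError("need at least two sectors of data")
    # A valid MBR (including a GPT protective MBR) ends with bytes 0x55 0xAA.
    mbr_magic_ok = first_two_sectors[510:512] == b"\x55\xaa"
    # The GPT header sits at LBA 1 and starts with the ASCII signature "EFI PART".
    gpt_header_ok = first_two_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART"
    if gpt_header_ok and mbr_magic_ok:
        return "gpt with protective mbr"
    if gpt_header_ok:
        return "gpt without valid mbr magic"
    if mbr_magic_ok:
        return "mbr"
    return "no recognizable partition table"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_table.py /dev/nvme0n1
    with open(sys.argv[1], "rb") as disk:
        print(describe_partition_table(disk.read(2 * SECTOR_SIZE)))
```

If this reports a GPT header together with valid MBR magic, the partition table type alone may not be what triggers the message.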

I’ve stepped through the other threads related to PXE again and I’m pretty much stumped right now. I would welcome any advice on directions of investigation to pursue next. I’ll also be trying a different NIC in there today.

EDIT: Some more info - MAAS is providing DHCP to the untagged subnet

Hi seffyroff,

We need some information to help us identify the issue:

  1. What’s the MAAS version and packaging (deb or snap)?
  2. How is this box connected to the rack controller? (Through a switch? Routers or proxies?)
  3. Did you check maas.log, regiond.log and rackd.log for errors? (These files are located at /var/snap/maas/common/log/ or /var/log/maas/.)
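
For the log check in point 3, something along these lines could pull out the suspicious lines quickly. It is a minimal sketch rather than a MAAS tool: the two directories and three file names are the ones listed above, while the keyword list and the function names are assumptions chosen for illustration.

```python
from pathlib import Path

# Locations and file names as listed above (snap and deb installs).
LOG_DIRS = [Path("/var/snap/maas/common/log"), Path("/var/log/maas")]
LOG_NAMES = ["maas.log", "regiond.log", "rackd.log"]
# Assumption: these substrings are a reasonable first filter, not an official list.
KEYWORDS = ("error", "critical", "traceback")


def find_error_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, case-insensitively."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in keywords)
    ]


def scan_maas_logs(log_dirs=LOG_DIRS, log_names=LOG_NAMES):
    """Map each existing log file path to its matching lines."""
    hits = {}
    for log_dir in log_dirs:
        for name in log_names:
            path = Path(log_dir) / name
            if not path.is_file():
                continue  # skip whichever packaging layout is not installed
            with path.open(errors="replace") as handle:
                matches = find_error_lines(handle)
            if matches:
                hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    for path, matches in scan_maas_logs().items():
        print(f"== {path} ({len(matches)} matching lines)")
        for line in matches[-20:]:  # show only the most recent matches
            print(line)
```

Reading the logs may require root, depending on how MAAS was installed.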

Please attach the full POST and PXE output during the deploy.

Hi Alexander, thanks for your reply.

Well, I think I’ve made a whoopsie. I suspected as much, since I seemed to be having more trouble than usual, and I’ve been down this road before.

In summary, I got the offending box to deploy successfully, but only from one of the subnets MAAS has DHCP control of (the untagged one). If I attempt to deploy the box into a tagged subnet, the error I posted above happens. I am now looking at my rack controller’s config to see what’s going on. It seems like it might be as simple as a subnet routing issue.
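
A quick way to reason about the routing question is to check whether the node's address on the tagged subnet and the controller's address share a subnet; if they don't, anything the node sends back to MAAS has to cross a router. This is a sketch only: every address and CIDR below is a made-up placeholder, and `needs_routing` is a hypothetical helper, not part of MAAS.

```python
import ipaddress


def needs_routing(node_ip: str, controller_ip: str, node_subnet: str) -> bool:
    """Return True if traffic from the node to the controller must leave node_subnet."""
    subnet = ipaddress.ip_network(node_subnet)
    if ipaddress.ip_address(node_ip) not in subnet:
        raise ValueError(f"{node_ip} is not inside {node_subnet}")
    # If the controller is outside the node's subnet, a route (gateway) is required.
    return ipaddress.ip_address(controller_ip) not in subnet


if __name__ == "__main__":
    # Placeholder values: untagged subnet 10.0.0.0/24, tagged subnet 10.0.20.0/24.
    print(needs_routing("10.0.0.15", "10.0.0.2", "10.0.0.0/24"))    # same subnet
    print(needs_routing("10.0.20.15", "10.0.0.2", "10.0.20.0/24"))  # crosses subnets
```

If the tagged case needs routing and no gateway or route exists for it, that would be consistent with the node booting but failing to post events back to MAAS.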
