I have the exact same error; however, my MAAS version is 2.4.2.
I’m seeing this with MAAS 2.6 Beta 4 (installed via apt) but only on one box (a Dell R900).
I ended up replacing my 10G NICs.
I guess this might have to do with the card in this box (bnx2) being unsupported by iPXE (although I thought MAAS doesn’t actually use iPXE but rather PXELINUX, no?).
I wonder if this could be related to PXELINUX: I am able to successfully netboot the troublesome box using a separate PXE server running PXELINUX version 4.06. Apparently, in versions after 5.0, some file dependencies were introduced for pxelinux.0 and the c32 modules (and they have also been architecture-specific since version 6.0).
Nosing around in /var/lib/maas/boot-resources/current/bootloader/, there do seem to be several versions of the boot resources, but they differ from the docs (or at least appear to).
This may be something that’s already handled within the MAAS code, but I haven’t delved in to see. I’m posting this here in case it hadn’t been considered.
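For anyone wanting to check the dependency point above on their own PXE or MAAS host, here is a minimal sketch in Python. It assumes that the extra file newer PXELINUX releases look for next to pxelinux.0 is `ldlinux.c32`; that is my understanding of the Syslinux documentation, not something confirmed in this thread. The function name and the default list are made up for illustration.

```python
from pathlib import Path

# Assumption: newer PXELINUX releases load ldlinux.c32 right after pxelinux.0.
# Extend this tuple with whatever modules your own configuration needs.
CORE_DEPENDENCIES = ("ldlinux.c32",)


def missing_dependencies(boot_dir, required=CORE_DEPENDENCIES):
    """Return the names in `required` that are not regular files in boot_dir."""
    boot_dir = Path(boot_dir)
    return [name for name in required if not (boot_dir / name).is_file()]
```

Calling `missing_dependencies()` on the directory that actually holds pxelinux.0 (somewhere under /var/lib/maas/boot-resources/current/bootloader/ on a MAAS rack, though the exact subdirectory needs to be looked up) returns an empty list when nothing from the list is missing.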
After some more digging, it does seem like the root of the problem is the bnx2 NIC support (or lack thereof)…
Would either of the following work?
1: With my (working) external PXE server, set MAAS as next-server for commissioning and deployment, or does the machine need to PXE boot from MAAS directly?
2: Could I replace the pxelinux.0 that MAAS uses with my older version, or with a new version that has the bnx2 driver compiled in (and probably replace the initrd as well)?
Boot from CD to MaaS
I have the same issue… my version is 2.5.3. I set up the same lab on another server with 2.4.x and it works fine!
I tried upgrading from 2.5 to 2.6 as suggested here, and the issue was resolved.
But for my lab I’ll wait until 2.6 reaches the stable PPA.
Using the stable repository, the issue has not been fixed; it’s still present.
2.6 hasn’t fixed it for me. I think in my case the issue is that the firmware for the bnx2 is not distributable, as it’s under a proprietary license.
I found an old article here describing a workaround: adding the bnx2 firmware into the initrd so that PXE booting works. It’s not clear to me which initrd MAAS uses for PXE boot. Is it /var/lib/maas/boot-resources/current/boot-initrd? Would this approach still work even though the advice is 10 years old?
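Before repacking anything, it may be worth checking whether a given initrd already carries the firmware. The sketch below is only an illustration: it takes a plain-text file listing of an initrd (for example the output of a tool such as `lsinitramfs`) and picks out entries that sit in a `firmware/<driver>/` directory. The function name is hypothetical, and the assumption that the bnx2 firmware lives in a directory named exactly `bnx2` should be verified against your own system.

```python
def firmware_entries(listing, driver="bnx2"):
    """Return paths from an initrd file listing that sit under firmware/<driver>/."""
    hits = []
    for line in listing.splitlines():
        path = line.strip()
        parts = path.split("/")
        if "firmware" not in parts:
            continue
        # Only look at path components after the "firmware" directory, and
        # require an exact match so that "bnx2x" is not mistaken for "bnx2".
        after_firmware = parts[parts.index("firmware") + 1:]
        if driver in after_firmware:
            hits.append(path)
    return hits
```

An empty result for the initrd being served would be consistent with the missing-firmware theory, although it does not answer which initrd MAAS actually hands out.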
Here’s how my boot looks right now:
More head scratching here. I managed to get a second NIC (a modern Intel NIC) installed into one box to test, and I still see the same issue.
Right, I have one of these boxes in front of me now, and am able to reproduce this issue 100% of the time. It seems like some other network issue may be causing this, as I’ve installed a test MAAS region/rack on bare metal on a simple LAN, and the issue still occurs as stated above regardless of whether I’m attempting to boot from the bnx2 card or the second test NIC I’ve installed (an Intel 82574L card). I guess the next steps are to run some Wireshark captures and see what’s actually happening.
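To keep such a capture small, one option is to filter on the booting machine's MAC address, which catches its DHCP, TFTP and HTTP traffic alike without guessing at port numbers. A minimal sketch that builds a pcap-style capture filter from a MAC written in any common notation follows; the function name is made up for illustration.

```python
import re


def capture_filter(mac):
    """Build a pcap capture filter matching all frames to or from `mac`."""
    # Drop separators such as ":" "-" or "." and keep only the hex digits.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    normalized = ":".join(digits[i:i + 2] for i in range(0, 12, 2)).lower()
    return f"ether host {normalized}"
```

The returned string can be pasted into Wireshark's capture filter field or passed to tcpdump as the filter expression.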
I wasn’t able to glean much from Wireshark. 2.4.x, however, successfully commissions/deploys the servers.
I need pod deployment for those machines to complete my environment buildout. Is it possible to restore a 2.5.x package to a PPA, please? I assume some iteration on 2.6 is in the works to fix the compatibility regression that was introduced, but I’d really appreciate an interim solution in the form of a 2.5.x version.
The difference between MAAS 2.4 and MAAS 2.5/2.6 is that 2.4 uses TFTP for everything, whereas in MAAS 2.5/2.6 the PXE process for legacy systems (which seems to be the case here) leverages HTTP boot. That means lpxelinux.0 gets downloaded over TFTP, and the kernel and initrd are then downloaded over HTTP.
This was introduced in MAAS 2.5. What’s interesting here is that MAAS 2.6 has not made any changes in this respect for legacy systems. The changes that MAAS 2.6 has introduced all relate to:
- KVM VMs will now use iPXE
- For EFI machines and arm64 with older firmware, we do something similar: GRUB is downloaded over TFTP and the rest over HTTP.
- For EFI machines with newer firmware, full HTTP boot is supported.
- Changes in the DHCP config to support all of the above.
So I’m surprised this won’t work on 2.6 when it did work on 2.5. While the machine is PXE booting, can you paste the output of rackd.log?
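Since rackd.log can be noisy on a busy rack controller, here is a small sketch for pulling out only the lines that mention the booting machine's IP address. It makes no assumption about the log line format beyond the address appearing somewhere in the line; the function name is hypothetical.

```python
import re


def lines_mentioning(lines, client_ip):
    """Yield lines that mention client_ip, skipping longer addresses that merely contain it."""
    # The lookarounds stop "10.0.0.5" from matching inside "10.0.0.55" or "110.0.0.5".
    pattern = re.compile(r"(?<![\d.])" + re.escape(client_ip) + r"(?!\d)")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")
```

An open file handle can be passed directly as `lines`. On a package-based install the log is usually at /var/log/maas/rackd.log, but treat that path as an assumption and adjust it for your setup.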
It’s possible the issue did occur on 2.5 also, but I haven’t been able to verify that, as 2.5 seems to be unavailable. I was suggesting 2.5 because, to me, it was the last known good version for my environment.
Hi, did you resolve that?
No. Reverting to 2.4.2 allows me to PXE boot those servers, but at the cost of all the 2.5.x+ functionality.