I’m experiencing some problems with MAAS 2.8.2 and PXE boot.
The box (HP BL460c G6) is pretty old, but it seems to work correctly with a standard TFTP server and a CentOS 7.7 image.
In that setup I used a pure TFTP deploy, with no HTTP at all.
In MAAS this seems to be difficult, at least for me.
Here’s what I noticed:
Standard PXE boot switches to HTTP after a couple of TFTP transactions, and everything gets stuck at a random point after a few seconds.
Looking in /var/lib/maas/boot-resources/current, it seems that pxelinux.0 is never used because of the symlink to lpxelinux.0 (Xenial and Bionic Beaver).
Changing both symlinks to point to pxelinux.0 results in ldlinux.c32 not being available during PXE boot.
It’s not clear how to get a readable TFTP server log, even with rackd in debug mode.
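To check where each bootloader file actually points (the pxelinux.0 → lpxelinux.0 symlink described above), a small Python sketch like the following can list the symlinks under the boot-resources directory. It assumes the bootloader files end in `.0` and live somewhere below `/var/lib/maas/boot-resources/current`; the function name is hypothetical, and the directory layout may differ between MAAS versions.

```python
from __future__ import annotations

import os
from pathlib import Path

# Directory from the post; the layout beneath it is assumed, not documented here.
BOOT_RESOURCES = Path("/var/lib/maas/boot-resources/current")


def list_bootloader_links(root: Path = BOOT_RESOURCES, suffix: str = ".0") -> dict[str, str]:
    """Map every symlink under `root` whose name ends in `suffix` to its raw target."""
    links: dict[str, str] = {}
    for path in sorted(root.rglob(f"*{suffix}")):
        if path.is_symlink():
            # os.readlink returns the target as stored in the link (may be relative).
            links[str(path.relative_to(root))] = os.readlink(path)
    return links


if __name__ == "__main__":
    if BOOT_RESOURCES.is_dir():
        for name, target in list_bootloader_links().items():
            print(f"{name} -> {target}")
    else:
        print(f"{BOOT_RESOURCES} not found")
```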
So here’s my question:
Is there a way to force legacy PXE boot in MAAS, with no HTTP involved?
How can I enable the TFTP server log? It looks like it is part of rackd.
I found these bugs:
They look really similar to my problem, but there seems to be no solution yet.
MAAS has always strived to be the fastest bare metal installer that exists. A few years ago we discovered that one area of slowdown was booting. TFTP is a much slower protocol than HTTP, no matter what server you use. We decided the best way to improve performance was to move as much of the boot process over to HTTP as possible, and PXELinux allows us to do that: the system firmware requests the bootloader via TFTP, and then PXELinux takes over and uses HTTP for the rest of the process.
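The speed gap follows from how plain TFTP works: under RFC 1350 the server sends 512-byte DATA blocks in lock-step, waiting for each ACK before sending the next, so a transfer needs at least one network round trip per block. The sketch below computes that lower bound. It ignores server and disk time as well as extensions such as the RFC 7440 windowsize option, and the function name and example figures are hypothetical, not MAAS measurements.

```python
def tftp_min_transfer_seconds(file_bytes: int, rtt_ms: float, blksize: int = 512) -> float:
    """Lower bound on a lock-step TFTP transfer: one round trip per DATA block.

    A transfer always ends with a block shorter than `blksize` (possibly empty,
    when the size is an exact multiple), hence the "+ 1".
    """
    if file_bytes < 0 or blksize <= 0 or rtt_ms < 0:
        raise ValueError("file_bytes and rtt_ms must be >= 0, blksize > 0")
    blocks = file_bytes // blksize + 1
    return blocks * rtt_ms / 1000.0


if __name__ == "__main__":
    size = 60 * 1024 * 1024  # hypothetical 60 MiB initrd
    for blksize in (512, 1468):
        secs = tftp_min_transfer_seconds(size, rtt_ms=1.0, blksize=blksize)
        print(f"blksize={blksize}: at least {secs:.1f} s at 1 ms RTT")
```

For a 60 MiB file at 1 ms RTT the bound is about 123 s with 512-byte blocks and about 43 s with 1468-byte blocks negotiated through the RFC 2348 blksize option, whereas HTTP over TCP keeps many segments in flight and avoids this per-block wait.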
There is no supported way to go back to full TFTP. The following would have to be done:
When dhcpd.conf is written, the path-prefix option should not be set, since it is what tells PXELinux to fetch everything over HTTP.
The rendered pxelinux.cfg would need to be modified to use TFTP again.
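The two changes above could start from something like this Python sketch. It assumes the rendered dhcpd.conf sets the option on a single line of the form `option path-prefix "<url>";` (the exact text MAAS renders is not confirmed here), and it only reports pxelinux.cfg lines that still reference HTTP URLs, because which paths the TFTP server actually serves depends on the deployment. The function names are hypothetical. Since MAAS writes dhcpd.conf itself, hand edits to the live file may be overwritten when it is re-rendered.

```python
from __future__ import annotations

import re

# Assumed shape of the assignment:  option path-prefix "http://10.0.0.1:5248/";
# The option *definition* ("option path-prefix code 210 = text;") is not matched.
PATH_PREFIX_RE = re.compile(
    r'^[ \t]*option[ \t]+path-prefix[ \t]+"[^"]*"[ \t]*;[ \t]*\n?', re.MULTILINE
)
HTTP_URL_RE = re.compile(r"https?://\S+")


def strip_path_prefix(dhcpd_conf: str) -> str:
    """Return the dhcpd.conf text with every path-prefix assignment line removed."""
    return PATH_PREFIX_RE.sub("", dhcpd_conf)


def http_lines(pxelinux_cfg: str) -> list[tuple[int, str]]:
    """List (line number, stripped line) for pxelinux.cfg lines that use an HTTP URL."""
    return [
        (number, line.strip())
        for number, line in enumerate(pxelinux_cfg.splitlines(), start=1)
        if HTTP_URL_RE.search(line)
    ]
```

Running `strip_path_prefix` on a copy of the file shows what the edited configuration would look like before anything is changed on the rack controller.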
TFTP/HTTP requests are logged in a few places:
/var/log/maas/rackd.log
/var/log/maas/http/*.log
As a node event
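To pull only the relevant entries out of the two log files above, a short Python sketch can search them for a string such as `tftp`, or for the booting machine's MAC or IP address. Nothing is assumed about the line format of these logs; the function does a case-insensitive substring match, and the function name and command-line handling are hypothetical. Node events are not covered, since they are not read from these files, and reading /var/log/maas may require root.

```python
from __future__ import annotations

import glob
import re
import sys

# Locations named above; nothing is assumed about the format of the lines inside.
LOG_GLOBS = ["/var/log/maas/rackd.log", "/var/log/maas/http/*.log"]


def matching_lines(patterns: list[str], needle: str) -> list[tuple[str, str]]:
    """Return (file, line) pairs for lines containing `needle`, case-insensitively."""
    rx = re.compile(re.escape(needle), re.IGNORECASE)
    hits: list[tuple[str, str]] = []
    for pattern in patterns:
        for name in sorted(glob.glob(pattern)):
            # errors="replace" keeps going if a log contains non-UTF-8 bytes.
            with open(name, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if rx.search(line):
                        hits.append((name, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Search string: first argument (e.g. the machine's MAC or IP), default "tftp".
    needle = sys.argv[1] if len(sys.argv) > 1 else "tftp"
    for name, line in matching_lines(LOG_GLOBS, needle):
        print(f"{name}: {line}")
```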
Two things you could try to fix this:
Update the system’s firmware
Try using lpxelinux from Focal. We currently use all bootloaders from Bionic and are planning to make this switch soon, but haven’t had time to fully test all bootloaders.
Feel free to file a new bug about this. Please include all MAAS logs, as well as details of the hardware you are experiencing this problem with.
Rewriting dhcpd.conf without any path-prefix didn’t give me a full TFTP boot; that is probably my fault, since I’m not very experienced with DHCP snippets.
The NIC and server firmware are up to date.
The Focal lpxelinux doesn’t work either.
So I’m getting curious whether there is a way to make these old boxes work with MAAS 2.8.2.
It seems the problem may be the same on G7 and G8; for us that means roughly 100 blades, which could really compromise the future of MAAS in our infrastructure.
I had many PXE issues after MAAS initially switched over to HTTP, and I am also using some older hardware in my racks. However, I was able to overcome pretty much all of them by simplifying the network configuration and bringing it more in line with what MAAS expects. I think TFTP-based PXE is less fussy than HTTP, and small changes, like having a rack controller on the same L2 segment as the racks in question, were a big step towards resolving this for me. I’m afraid I don’t recall the specifics, but I hope you can take some reassurance that there is likely a solution to your issue if you rework where things are placed in your deployment.
In your case (and with a healthy dose of willingness to experiment, even if it’s not your “ideal”), the major deviations would likely be:
Ensure that the enlist-able machine(s) you want to commission are in fact set to boot in legacy BIOS mode
Modify the dnsmasq.conf entries to match each enlist-able machine’s architecture (RFC 4578; see this comment for background and more info; you’re likely fine with arch,7 and arch,9)
Modify the dnsmasq.conf entry for dhcp-boot to be lpxelinux.0 (the legacy-BIOS, i.e. non-UEFI, PXE bootloader)
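The dnsmasq entries described above pair a `dhcp-match` on DHCP option 93 (client architecture) with a tagged `dhcp-boot`. The Python sketch below renders such pairs. For reference, RFC 4578 assigns type 0 to Intel x86PC, i.e. legacy BIOS, while types 7 and 9 are both EFI types, so a machine that really boots in legacy BIOS mode would normally identify itself as type 0. The tag names (`arch0`, `arch7`) and the function name are hypothetical.

```python
from __future__ import annotations

# Subset of RFC 4578 client system architecture types (DHCP option 93),
# using the names given in the RFC.
RFC4578_ARCH = {
    0: "Intel x86PC",
    6: "EFI IA32",
    7: "EFI BC",
    9: "EFI x86-64",
}


def dnsmasq_boot_lines(boot_files: dict[int, str]) -> list[str]:
    """Render dnsmasq dhcp-match/dhcp-boot pairs, one tag per client architecture."""
    lines: list[str] = []
    for arch, filename in sorted(boot_files.items()):
        if arch not in RFC4578_ARCH:
            raise ValueError(f"arch type {arch} not in this sketch's table")
        tag = f"arch{arch}"  # hypothetical tag name
        lines.append(f"# {RFC4578_ARCH[arch]}")
        lines.append(f"dhcp-match=set:{tag},option:client-arch,{arch}")
        lines.append(f"dhcp-boot=tag:{tag},{filename}")
    return lines


if __name__ == "__main__":
    # Legacy-BIOS clients (type 0) get lpxelinux.0, per the suggestion above.
    print("\n".join(dnsmasq_boot_lines({0: "lpxelinux.0"})))
```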
That said, if your enlist-able machine(s) support UEFI, definitely give that a go, since things are much smoother (and faster) that way, especially if you change the UEFI boot-order setting to PXE HTTP. However, you mentioned that they are “old” machines, so I outlined the above with their likely lack of UEFI support in mind.
As for logging (namely to watch MAAS attempt to PXE-boot your enlist-able machines), simply log in to your MAAS host (via SSH) and then follow along:
Broadcom 57711
HP ProLiant BL460c G6 servers with Broadcom BCM 57711 10Gbit NICs were reported to have an issue with gpxelinux.0 (gPXE + PXELINUX) as of v4.02. The workaround (implemented in v4.04 as gpxe/gpxelinuxk.0; emphasis on the single “k”) was to:
Using 18.04 LTS, I can commission and deploy CentOS 8.1 with MAAS 2.8.2.
With 16.04 LTS, the kernel panics during commissioning.
20.04 LTS can commission but loops forever during deployment (maybe I need to test it again; it looks odd).
Basically, I changed the bootloader to support that odd card (Broadcom BCM 57711) and then switched back to the iPXE MAAS workflow.
Does this make sense?
Has anyone considered supporting the “undionly” PXE device?
The machines use IPMI LAN 2.0 and boot with the legacy boot option.