PXE not working after upgrading to 2.5.0


#1

Hi all,

I installed MAAS yesterday from the default package repos; apt installed v2.4.0.
I configured it and added a server to it for testing, and everything worked fine.
Later I added the MAAS PPA to update MAAS to 2.5.0. Since then, PXE booting no longer works.

It always hangs after loading lpxelinux.

After the "Boot failed" message, nothing else happens.
In the maas rackd.log I see this:

2019-02-21 13:18:59 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: ens18.
2019-02-21 13:19:09 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2019-02-21 13:28:59 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: ens18.
2019-02-21 13:29:09 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2019-02-21 13:29:43 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10...91
2019-02-21 13:29:43 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10...91

To me it looks like the machine's request to load ldlinux.c32 never reaches the rack controller.
If I’m not mistaken, lpxelinux has to load a config file first, which tells it what to do before it loads anything else? Unfortunately, I couldn’t find where to check this.
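
To see which files each client actually requested, the TFTP lines in rackd.log can be summarized with a short script. This is only a minimal sketch: the regular expression is an assumption based on the log lines quoted above, and the names `tftp_requests` and `missing_followup` are made up for illustration and are not part of MAAS.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the excerpt above, e.g.:
# 2019-02-21 13:29:43 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10...91
TFTP_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<file>\S+) requested by (?P<client>\S+)$"
)


def tftp_requests(lines):
    """Group requested file names by client (MAC or IP, exactly as logged)."""
    by_client = defaultdict(list)
    for line in lines:
        match = TFTP_LINE.match(line.strip())
        if match:  # non-TFTP lines (e.g. the DHCP probe) are skipped
            by_client[match["client"]].append(match["file"])
    return dict(by_client)


def missing_followup(files, loader="lpxelinux.0", followup="ldlinux.c32"):
    """True if the boot loader was fetched but the follow-up file never was."""
    return loader in files and followup not in files
```

Fed with the excerpt above, `tftp_requests` lists lpxelinux.0 twice for the one client, and `missing_followup` returns True for it, which matches what I see: ldlinux.c32 is never requested.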

I already tried:

  • rebooting server
  • dpkg-reconfigure the rack controller
  • maas boot-resources import
  • completely removing the node from MAAS and re-adding it
  • removing the boot resources folder in /var/lib/maas and restarting the server to force all boot resources to be downloaded and recreated

Maybe someone can help with this problem. Thanks in advance.


#2

Further troubleshooting results:

  1. I booted a live CD on the server and connected to the rack controller via TFTP.
    That looks fine so far (I think).

  2. I compared the logs from before and after upgrading to 2.5.0 and noticed the following.

    v2.4.0:

2019-02-20 13:44:51 provisioningserver.rackdservices.tftp: [info] pxelinux.0 requested by 9c:b6:54:99:12:f0
2019-02-20 13:44:51 provisioningserver.rackdservices.tftp: [info] pxelinux.0 requested by 9c:b6:54:99:12:f0

    v2.5.0:

2019-02-21 15:22:50 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10...91
2019-02-21 15:22:50 provisioningserver.rackdservices.tftp: [info] lpxelinux.0 requested by 10...91

On 2.4.0, pxelinux.0 is used instead of lpxelinux.0.
I edited the dhcpd.conf in /var/lib/maas to use pxelinux.0 and restarted maas-dhcpd.
The machine then loaded pxelinux.0, but the result was the same as in the image in my last post. Afterwards I reverted dhcpd.conf to its original state.
You can also see that 2.4.0 logs a MAC address after “requested by”, while 2.5.0 logs an IP address.
Does this mean anything, or is it just a change in the log message?
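
For anyone who wants to repeat the manual TFTP check from step 1 without a tftp client installed, here is a minimal Python sketch of a TFTP read as described in RFC 1350. It is an illustration under assumptions, not what MAAS or the PXE firmware does internally: the function names are made up, it only speaks plain "octet" mode with the default 512-byte blocks, it does not retransmit on timeout, and it does not check that replies come from the same server port throughout.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # default TFTP block size (RFC 1350)


def build_rrq(filename, mode="octet"):
    """Build a read request: opcode, file name, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_ack(block):
    """Build an acknowledgement for one data block."""
    return struct.pack("!HH", OP_ACK, block)


def tftp_get(host, filename, port=69, timeout=5.0):
    """Fetch one file over TFTP and return its contents as bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_rrq(filename), (host, port))
        data = bytearray()
        expected = 1
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            if len(packet) < 4:
                continue  # too short to be a valid TFTP packet
            opcode, block = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                message = packet[4:-1].decode("ascii", "replace")
                raise OSError("TFTP error %d: %s" % (block, message))
            if opcode != OP_DATA:
                continue
            # The server answers from a new port, so ACK to `peer`.
            sock.sendto(build_ack(block), peer)
            if block != expected:
                continue  # duplicate block: acknowledged, not stored again
            data += packet[4:]
            if len(packet) - 4 < BLOCK_SIZE:
                return bytes(data)  # a short block ends the transfer
            expected = (expected + 1) % 65536
    finally:
        sock.close()
```

Calling `tftp_get(rack_controller_address, "ldlinux.c32")`, where `rack_controller_address` is a placeholder for your own rack controller, should either return the file contents or raise an error, which at least shows whether the rack controller serves that file at all.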


#3

It looks like only my test machine is affected by this problem.
I set up another machine for PXE booting, and it booted fine.


#4

I have tried a lot more now.
I ran tcpdump on the rack controller and captured the DHCP request and the TFTP transfer of lpxelinux.0.
I was also able to verify that the hash of the lpxelinux.0 file on the rack controller's disk matches the hash of the data carried in the UDP datagrams of the TFTP transfer.
It is also interesting that the rack controller receives no TFTP request for ldlinux.c32.
The server just downloads lpxelinux.0, and then nothing else happens.
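
The hash comparison can be sketched roughly like this. It is a simplified illustration with some assumptions: the UDP payloads of the TFTP packets are taken to be already extracted from the capture as a list of byte strings, SHA-256 is just the hash picked for the example, and the function names are made up. It also assumes the file is small enough that TFTP block numbers do not wrap around, which holds for a boot loader like lpxelinux.0.

```python
import hashlib
import struct

OP_DATA = 3  # TFTP opcode for DATA packets (RFC 1350)


def reassemble(payloads):
    """Rebuild file contents from the UDP payloads of TFTP DATA packets.

    Retransmitted blocks are kept only once, and blocks are joined in
    block-number order. Non-DATA packets (RRQ, ACK, ...) are ignored.
    """
    blocks = {}
    for payload in payloads:
        if len(payload) < 4:
            continue
        opcode, block = struct.unpack("!HH", payload[:4])
        if opcode == OP_DATA:
            blocks.setdefault(block, payload[4:])
    return b"".join(blocks[number] for number in sorted(blocks))


def sha256_hex(data):
    """Hex digest used to compare the captured data with the file on disk."""
    return hashlib.sha256(data).hexdigest()
```

If `sha256_hex(reassemble(payloads))` equals `sha256_hex` of the bytes read from the file on the rack controller, the transfer itself delivered the file intact, so the hang happens after the download.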

After restarting the machine many, many times, it worked once. On the next try, however, I had the same problem again.

Reading back through my findings, I would say that the problem is caused by the machine itself.
However, that does not fit with the fact that the problem only appeared after the update to 2.5.0 and that everything worked perfectly before.


#5

I downgraded back to 2.4.2 and it’s working again.


#6

I am also having issues with some 10Gb NICs on 2.5 and went back to 2.4; there are too many bugs in 2.5. Here’s my original post: MAAS 2.5 won’t PXE Qlogic 10G nic HP model# NC523SFP


#7

I have the exact same error, however, my MAAS version is 2.4.2.