MAAS never realizes the server is in PXE Boot

So I have a problem where, roughly 50% of the time, the node gets an IP address from the MAAS DHCP server and goes into PXE boot, but never receives the OS, so it’s essentially stuck. There are three possible outcomes:

  • the node gets the IP and instantly gets the OS from PXE
  • the node gets the IP, goes to PXE Boot and after about a minute gets the OS
  • the node gets the IP, goes to PXE Boot and never gets the OS

These outcomes don’t seem to follow any pattern, and any of the three can happen at every stage (commissioning, testing, deploying, etc.). Sometimes manually rebooting the server into PXE via iDRAC helps, but that also seems to be a 50/50 situation.

The servers are managed by MAAS via IPMI 2.0, and while a server is stuck the MAAS UI shows statuses like “powering node on” or “power cycling”. So it seems like MAAS never realizes the node is in PXE waiting for the OS.

I’m looking for suggestions on where I should at least start looking to find the problem.

Servers: PowerEdge M630 (inside a PowerEdge M1000e with iDRAC 8 Enterprise)
MAAS version: 2.6.1 (7832-g17912cdc9-0ubuntu1~18.04.1) upgraded from 2.4.2

Hey, I was having the same issue and ended up replacing all the QLogic 10Gb NICs. Here’s my original post

I have no idea why the QLogic 10Gb NICs stopped working after MAAS 2.4, and nobody from the #MAAS team has had the courtesy to investigate this issue.