So I have a problem where sometimes (about 50% of the time) the node gets an IP from the MAAS DHCP server and goes into PXE boot, but never gets the OS, so it's essentially stuck. One of three things can happen:
- the node gets the IP and instantly gets the OS from PXE
- the node gets the IP, goes to PXE Boot and after about a minute gets the OS
- the node gets the IP, goes to PXE Boot and never gets the OS
These outcomes don't seem to follow any pattern, and any of the three can happen at any stage (commissioning, testing, deploying, etc.). Sometimes manually rebooting the server to PXE via iDRAC helps, but that also seems to be a 50/50 chance.
The servers are managed by MAAS via IPMI 2.0, and when a server is stuck the MAAS UI shows statuses such as "powering node on" or "power cycling". So it seems MAAS never realizes that the node is sitting in PXE waiting for the OS.
I'm looking for suggestions on where I should at least start looking to find the problem.
Servers: PowerEdge M630 (inside PowerEdge M1000e w/iDRAC 8 Ent)
MAAS version: 2.6.1 (7832-g17912cdc9-0ubuntu1~18.04.1) upgraded from 2.4.2