I’m running into inconsistently successful commissions/deploys against rack controllers that host a large number of subnets (they have a lot of interfaces). The rack controller itself is practically IDLE, however PXEboots fail occasionally with netconn_connect error -10 and during boot, Hitting enter will usually continue the PXEboot failure and succeed. However during boot ocasionally connecting to the MAAS metadata endpoint seems to hang and eventually fail out a lot more than expected. During these the rack controller is only service requests for a single subnet. The machines being provisioned are using HA DHCP helper to the HA rack controller(s) for that subnet.
netconn_error
It’ll eventuall end up at an “ubuntu login” prompt, and since it doesn’t show it’s hostname as declared in maas, this teels me the metadata fetch failed…
The only solution is to reset the instance and hope it works…
The issue was a lack of DNS setup for the subnet where this machine was. It resulted in the machine booting but unable to resolve anything and eventually hanging up.
@dandruczyk, thanks for the response. yep, DHCP and DNS will sometimes blitz even experts like yourself. ask me about my 196 machines blowing up DHCP because I forgot about DHCPv6…