I have a mixed IPv4/IPv6 environment.
All IPv4 hosts work correctly.
When IPv6 hosts boot, they get their address over DHCPv6, and their lease is saved in /var/lib/maas/dhcp/dhcpd6.leases.
However, even after they have been commissioned, if I deploy any of them they boot as maas-enlisting-node and commission again, while the UI stays stuck on "Performing PXE boot".
How can I debug this? What criterion does MAAS use to determine which host is booting?
I can see that once a host is Ready, if I set one of its interfaces to get its address over DHCPv6, /var/lib/maas/dhcpd6.conf shows an extra section with the host's MAC address and no mention of DUIDs.
Moreover, the server currently boots with DUID-UUID. Even if I set DUID-LLT in the UEFI settings, the behaviour is unchanged.
I don't see any relevant logs in rackd.log or regiond.log.
The IPv4 and IPv6 subnets are on different VLANs and shouldn't interfere.
The issue you mentioned is now sorted, because I'm using custom kernel parameters to bypass it.
I appreciate your patch :), thank you!
I’m now having issues not with booting hosts, but with MAAS not recognizing them.
It seems that either my host is providing the wrong DUID to the DHCPv6 server, or something else goes wrong: MAAS always thinks it's a new node when I commission or deploy it, until it reads the serial number in the enlistment scripts. At that point the host shuts down and nothing else happens, and the deployment times out.
I'm trying to figure out how MAAS decides whether a node is enlisting or doing something else; some mailing list messages pointed me to DHCPv6 DUIDs.
And a TL;DR of the previous post for clarity: I deploy or commission nodes, but they follow the enlistment process instead of doing the task they're supposed to do. MAAS shows them as still booting until it times out.
I've seen a 2017 bug report about big- vs little-endian encoding of the client's DUID, but that has nothing to do with my case.
While digging with Wireshark, I found that the server's firmware uses a fixed DUID of type 4 (DUID-UUID), while Ubuntu does DHCPv6 with type 2 (DUID-EN, with systemd's PEN https://www.iana.org/assignments/enterprise-numbers/?q=43793).
I'm guessing that MAAS has no reliable way of telling that this is the same host besides looking at the source MAC address? Is that even provided? Wireshark shows it, but I'm not sure MAAS looks at it (I found something relevant here: https://datatracker.ietf.org/doc/rfc6355/).
I can try enabling SLAAC on my subnet and see if it does any better.
That's correct. During the PXE boot process, the machine requests a file from the rack such as /grub/grub.cfg-<the_mac_address>, and MAAS generates the configuration to send to the machine on the fly. In particular, if the MAC address is unknown, it sends the enlistment configuration.
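The decision described above can be pictured as a lookup keyed on the MAC in the requested file name. This is a hypothetical sketch, not MAAS's actual code: the `KNOWN_MACHINES` table, the accepted separators, and the returned purposes are all assumptions made for illustration.

```python
import re

# Hypothetical inventory: MAC address -> what the machine should do next.
KNOWN_MACHINES = {"52:54:00:12:34:56": "deploy"}

# Matches /grub/grub.cfg-<MAC>, accepting ':' or '-' between octets (assumption).
GRUB_CFG_RE = re.compile(
    r"^/grub/grub\.cfg-(?P<mac>[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5})$"
)


def boot_purpose(request_path: str) -> str:
    match = GRUB_CFG_RE.match(request_path)
    if match is None:
        return "enlist"  # hypothetical fallback: no MAC in the file name to look up
    mac = match.group("mac").lower().replace("-", ":")
    # Unknown MAC -> enlistment config; known MAC -> its pending action.
    return KNOWN_MACHINES.get(mac, "enlist")


print(boot_purpose("/grub/grub.cfg-52:54:00:12:34:56"))  # deploy
print(boot_purpose("/grub/grub.cfg-aa:bb:cc:dd:ee:ff"))  # enlist
```

This only models which config file is chosen; whatever ends up inside that file (for example kernel parameters) is a separate question.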
I see the request in Wireshark. The server correctly requests that file with its MAC address.
Even though the server is registered as Ready in MAAS, when I select the deployment process it still boots in enlistment mode.
Where does MAAS decide “on the fly” what it’s going to do?
I see that the server requests the enlistment preseed metadata, then changes address, then asks for the hostname and gets maas-enlisting-node and system id i-maas-enlistment.
Could this be the cause?
When the host boots in enlistment mode, I see an HTTP call: GET /MAAS/api/2.0/machines/?op=is_registered&mac_address=my%mac%address%etc HTTP/1.1
It gets a response of true.
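When comparing request logs against the MAC that MAAS has on record, it can help to build and decode this query programmatically. A minimal sketch with the standard library; `is_registered_path` and `mac_from_request_line` are hypothetical helper names, and the percent-encoded form (`:` becomes `%3A`) is presumably what the redacted `my%mac%address%etc` stands for.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_PATH = "/MAAS/api/2.0/machines/"


def is_registered_path(mac: str) -> str:
    # urlencode() percent-encodes ':' in the MAC as %3A.
    query = urlencode({"op": "is_registered", "mac_address": mac})
    return f"{API_PATH}?{query}"


def mac_from_request_line(line: str):
    """Extract the MAC from a logged 'GET <target> HTTP/1.1' line, or None."""
    _method, target, _version = line.split()
    query = parse_qs(urlsplit(target).query)  # parse_qs decodes %3A back to ':'
    if query.get("op") != ["is_registered"]:
        return None
    return query.get("mac_address", [None])[0]


path = is_registered_path("52:54:00:12:34:56")
print(path)
print(mac_from_request_line(f"GET {path} HTTP/1.1"))  # 52:54:00:12:34:56
```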
So once the node is already enlisting, MAAS recognizes that it's already registered.
I'd say the issue happens before this point, then.
The TFTP file for grub.cfg-my.mac.addr looks like this!
Note that there are two cloud-config-url parameters, and I think that's what's causing it!
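A duplicated parameter like this is easy to miss by eye, so a quick check over the kernel command line (the `linux` line of the served grub.cfg, or /proc/cmdline on the booted machine) can flag it. This is a sketch: `duplicate_params` is a hypothetical helper, `shlex` only approximates the kernel's quoting rules, and the sample command line uses placeholder URLs. Which copy wins depends on whatever consumes the parameter, so any duplicate should be treated as suspect.

```python
import shlex
from collections import defaultdict


def duplicate_params(cmdline: str) -> dict:
    """Return kernel parameters that appear more than once, with every value seen."""
    seen = defaultdict(list)
    # shlex approximates kernel quoting: it strips the double quotes around values.
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        seen[key].append(value if sep else None)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Placeholder command line; 2001:db8::/32 is the IPv6 documentation prefix.
cmdline = (
    "ro quiet cloud-config-url=http://example/?op=get_preseed "
    'cloud-config-url="http://[2001:db8::1]:5248/MAAS/metadata/latest/'
    'enlist-preseed/?op=get_enlist_preseed"'
)
print(duplicate_params(cmdline))
```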
My custom kernel parameter, used to work around the issue @r00ta mentioned, overrides the deployment URL with the enlistment one T.T
I am defeated; this was very hard to debug.
Maybe a warning about overriding such dangerous parameters in the debugging docs could help?
Now, @r00ta, I should ask you how to avoid the IPv6 URL bug without this workaround, as I cannot change the global custom kernel parameter every time.
I think the problem is actually your custom kernel parameter cloud-config-url="http://[my:prefix:141::rack]:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed", because you are telling MAAS to always enlist the node. Once the node is added to MAAS, it uses get_preseed instead of get_enlist_preseed.
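When auditing a grub.cfg, a small helper can tell the two preseed URLs apart by their `op` value, and can build the rack base URL with the square brackets an IPv6 literal needs inside a URL (RFC 3986). This is a sketch under assumptions: port 5248 and the path come from the parameter quoted above, the helper names are hypothetical, and it is not known here whether bracketing is what the IPv6 URL bug was about.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit


def rack_base_url(address: str, port: int = 5248) -> str:
    # RFC 3986: an IPv6 literal must be wrapped in [] inside a URL.
    ip = ipaddress.ip_address(address)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"http://{host}:{port}"


def preseed_kind(cloud_config_url: str) -> str:
    """Classify a cloud-config-url by its op= query parameter."""
    op = parse_qs(urlsplit(cloud_config_url).query).get("op", [""])[0]
    kinds = {"get_enlist_preseed": "enlistment", "get_preseed": "registered node"}
    return kinds.get(op, "unknown")


url = rack_base_url("2001:db8::1") + "/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed"
print(url)
print(preseed_kind(url))  # enlistment
```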