Debugging IPv6 DUIDs

I have a mixed IPv4/IPv6 environment.
All IPv4 hosts work correctly.

When IPv6 hosts boot, they get their address over DHCPv6 and their lease is saved in /var/lib/maas/dhcp/dhcpd6.leases
However, even after they have been commissioned, if I deploy any of them it boots as maas-enlisting-node and commissions again, while the UI stays stuck on performing PXE boot

How can I debug this? What’s the matching criterion that MAAS uses to determine which host is booting?
I can see that once a host is Ready, if I set one of its interfaces to get its address over DHCPv6, /var/lib/maas/dhcpd6.conf shows an extra section with the host’s MAC address and no mention of DUIDs.

Moreover, the server is currently booting with DUID-UUID. Even if I set DUID-LLT in the UEFI settings, the behaviour is unchanged.
I don’t see any relevant logs in rackd.log or regiond.log.
The IPv4 and IPv6 subnets are on different VLANs and shouldn’t interfere.

Hi @rhxto ,

Is this the same bug you reported here https://bugs.launchpad.net/maas/+bug/2045020 ?

Hello

The issue you mentioned is now sorted because I’m using the custom kernel parameters to bypass it.
I appreciate your patch : ), thank you

My issue now is not with booting hosts, but with MAAS not recognizing them.
It seems that either my host is providing a wrong DUID to the DHCPv6 server or something else goes wrong, and MAAS always thinks it’s a new node when I commission or deploy it, until it reads the serial in the enlistment scripts. At that point the host shuts down and nothing else happens. Deployment times out.
I’m trying to figure out how MAAS decides whether the node is enlisting or doing anything else; some mailing list messages led me to DHCPv6 DUIDs.

And a tl;dr of the previous post for clarity: I deploy or commission nodes, but they follow the enlistment process instead of doing the task they’re supposed to do. MAAS shows they’re still booting until it times out.
I’ve seen a bug report from 2017 that talks about big or little endian in the client’s DUID, but that has nothing to do with my case.

While digging with Wireshark I found out that the server’s firmware uses a fixed DUID of type 4 (DUID-UUID, basically) while Ubuntu does DHCPv6 with a type 2 DUID (DUID-EN with systemd’s PEN https://www.iana.org/assignments/enterprise-numbers/?q=43793).
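
For anyone comparing the DUIDs from a capture or from dhcpd6.leases, here is a minimal Python sketch that reads the two-byte type code at the start of a DUID and, for DUID-EN, the four-byte enterprise number that follows it. The function name `describe_duid` and the sample bytes are made up for illustration; the type codes are the standard DHCPv6 ones (1 = LLT, 2 = EN, 3 = LL, 4 = UUID).

```python
import struct

# Standard DHCPv6 DUID type codes.
DUID_TYPES = {1: "DUID-LLT", 2: "DUID-EN", 3: "DUID-LL", 4: "DUID-UUID"}


def describe_duid(hex_string):
    """Return the type (and enterprise number, if any) of a hex-encoded DUID.

    Hypothetical helper: accepts "00:02:..." or "0002..." notation.
    """
    raw = bytes.fromhex(hex_string.replace(":", ""))
    if len(raw) < 2:
        raise ValueError("a DUID needs at least a 2-byte type code")
    # The type code is a 16-bit unsigned integer in network byte order.
    duid_type = struct.unpack("!H", raw[:2])[0]
    info = {"type": duid_type, "name": DUID_TYPES.get(duid_type, "unknown")}
    if duid_type == 2 and len(raw) >= 6:
        # DUID-EN carries a 32-bit IANA enterprise number after the type.
        info["enterprise_number"] = struct.unpack("!I", raw[2:6])[0]
    return info


# Made-up sample values, not taken from a real capture.
print(describe_duid("00:02:00:00:ab:11:01:02:03:04:05:06:07:08"))
print(describe_duid("00:04:" + ":".join(["11"] * 16)))
```

A DUID-EN whose enterprise number decodes to 43793 would match the systemd PEN linked above.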

I’m guessing that MAAS has no reliable way of telling this is the same host besides looking at the source MAC address? Is it even provided? Wireshark shows it, but I’m not sure MAAS looks at it (I found something here: https://datatracker.ietf.org/doc/rfc6355/).

I can try enabling SLAAC on my subnet and see if it does any better.

That’s correct. During the PXE boot process the machine requests a file from the rack such as
/grub/grub.cfg-<the_mac_address> and MAAS generates on the fly the configuration to send to the machine. In particular, if the MAC is unknown, it sends the enlistment configuration.
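
To make the idea concrete, here is a rough Python sketch of a MAC-keyed lookup. It is not MAAS code: the function names, the accepted MAC separators, and the "enlist" / "known-machine" labels are all assumptions for illustration only.

```python
import re

# Assumption: the MAC appears at the end of the path, separated by ":" or "-".
MAC_IN_PATH = re.compile(
    r"grub\.cfg-((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2})$", re.IGNORECASE
)


def mac_from_grub_path(path):
    """Extract a normalised MAC from a /grub/grub.cfg-<mac> request path."""
    match = MAC_IN_PATH.search(path)
    if match is None:
        return None
    return match.group(1).lower().replace("-", ":")


def boot_config_kind(path, known_macs):
    """Hypothetical decision: unknown or missing MAC means enlistment."""
    mac = mac_from_grub_path(path)
    if mac is None or mac not in known_macs:
        return "enlist"
    return "known-machine"


known = {"aa:bb:cc:dd:ee:ff"}  # made-up inventory of registered MACs
print(boot_config_kind("/grub/grub.cfg-AA:BB:CC:DD:EE:FF", known))
print(boot_config_kind("/grub/grub.cfg-00:11:22:33:44:55", known))
```

The point of the sketch is only that the decision is keyed on the MAC in the requested file name, not on the DHCPv6 DUID.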

I see the request in Wireshark. The server correctly requests that file with its MAC address.
Even if the server is registered as Ready in MAAS, when I start a deployment it still boots in enlistment mode.

Where does MAAS decide “on the fly” what it’s going to do?

I see that the server requests the enlistment preseed metadata, then changes address, then asks for the hostname and gets maas-enlisting-node and system id i-maas-enlistment.

Could this be the cause?
When the host boots in enlistment mode I see an http call:
GET /MAAS/api/2.0/machines/?op=is_registered&mac_address=my%mac%address%etc HTTP/1.1
Which gets a response of true.
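
For reference, the same query can be rebuilt outside the boot process to check what MAAS answers for a given MAC. The following is a small Python sketch that only constructs the URL seen in the capture; the helper name, the host, the port, and the MAC in the example are placeholders, and whether the endpoint is reachable from a given machine is an assumption.

```python
from urllib.parse import urlencode, urlunsplit


def is_registered_url(host, port, mac, scheme="http"):
    """Build the is_registered query URL for a MAC (hypothetical helper)."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    netloc = f"[{host}]:{port}" if ":" in host else f"{host}:{port}"
    query = urlencode({"op": "is_registered", "mac_address": mac})
    return urlunsplit((scheme, netloc, "/MAAS/api/2.0/machines/", query, ""))


# Placeholder address and MAC, not values from this thread.
print(is_registered_url("2001:db8::1", 5248, "aa:bb:cc:dd:ee:ff"))
```

The resulting URL can then be fetched with any HTTP client to compare the answer with the one seen in the capture.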

So once the node is already enlisting, MAAS figures it’s already registered.
I’d say the issue is before this then.

The TFTP file for grub.cfg-my.mac.addr looks like this:

menuentry 'Ephemeral' {
    echo   'Booting under MAAS direction...'
    linux  (http,[rack-ipv6-addr]:5248)/images/ubuntu/amd64/ga-22.04/jammy/stable/boot-kernel nomodeset ro root=squash:http://[rack-ipv6-addr]:5248/images/ubuntu/amd64/ga-22.04/jammy/stable/squashfs ip=off ip6=dhcp overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:\{'datasource_list': ['MAAS']\}end_cc cloud-config-url=http://rack-ipv6----80.maas-internal:5248/MAAS/metadata/latest/by-id/y3s463/?op=get_preseed log_host=rack0ipv6 log_port=5247 --- cloud-config-url="http://[rack-ipv6-addr]:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed"  BOOTIF=01-${net_default_mac}
    initrd (http,[rack-ipv6-addr]:5248)/images/ubuntu/amd64/ga-22.04/jammy/stable/boot-initrd
}

Note that there are two cloud-config-url entries, and I think that’s what’s causing it!
The custom kernel parameter I used to work around the issue @r00ta mentioned overrides the deployment URL with the enlistment one T.T
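
A quick way to spot this kind of clash is to scan the kernel command line for keys that appear more than once. Here is a minimal Python sketch, assuming shell-like quoting of the parameters; the function name and the sample command line (with a placeholder host and system ID) are made up for illustration.

```python
import shlex
from collections import defaultdict


def repeated_params(cmdline):
    """Return every key=value parameter whose key occurs more than once."""
    values = defaultdict(list)
    # shlex.split removes the surrounding quotes from quoted values.
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep:  # skip bare flags such as "ro" and the "---" separator
            values[key].append(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Shortened sample modelled on the grub entry above, with placeholder values.
sample = (
    "ro ip=off ip6=dhcp "
    "cloud-config-url=http://rack.example:5248/MAAS/metadata/latest/by-id/abc123/?op=get_preseed "
    '--- cloud-config-url="http://rack.example:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed"'
)
for key, vals in repeated_params(sample).items():
    print(key, vals)
```

On a running machine the same function can be fed the contents of /proc/cmdline.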

I am defeated, this was very hard to debug.
Maybe a warning on overriding such dangerous parameters in the debugging docs could help?

Now, @r00ta, I should ask you how to avoid the IPv6 URL bug without using this workaround, as I cannot change the global custom kernel parameter every time.

I think the problem is actually with your custom kernel parameter cloud-config-url="http://[my:prefix:141::rack]:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed" because you are telling MAAS to always enlist the node. Once the node has been added to MAAS, it should use get_preseed instead of get_enlist_preseed.

Now, I think you should

Are you using deb or snap?

Yeah, I had edited the post and included the part where I realised that…

I’m using deb because snap does not support IPv6 : )
Applying the patch sounds good to me

You can then edit

sudo nano /usr/lib/python3/dist-packages/maasserver/rpc/boot.py

according to the patch in the merge proposal of lp-2045020-preseed-ipv6 (lp:~r00ta/maas) into master

and then restart the region

sudo systemctl restart maas-regiond.service

Wonderful. My debugging nightmare is over and I can start using MAAS.

Grazie mille

Np, thanks for reporting the bug!
