My primary question is: how do I debug this? I have seen this problem twice now, and both times switching to Ubuntu 18.04 (instead of 20.04) for commissioning made it go away. I also tried adding the address of my MAAS instance to the DNS servers for the boot network subnet, but that did not help.
Details.
I have seen this on MAAS 2.9 and 3.0:
root@maas3:~# snap list
Name    Version                 Rev    Tracking       Publisher    Notes
core18  20210722                2128   latest/stable  canonical*   base
core20  20210702                1081   latest/stable  canonical*   base
maas    2.9.2-9165-g.c3e7848d1  12555  2.9/stable     canonical*   -
root@plano101:~# snap list
Name    Version                   Rev    Tracking       Publisher    Notes
core18  20210722                  2128   latest/stable  canonical✓   base
core20  20210702                  1081   latest/stable  canonical✓   base
maas    3.0.0-10029-g.986ea3e45   15003  3.0/stable     canonical✓   -
In the case of 3.0, I was commissioning 7 roughly identical machines. They were all the same make with the same BIOS; some had more disk than others, and some had better CPUs and more memory, but no system was unique. One of the 7 machines consistently refused to commission: it either hung while booting the initrd or failed with the infamous “no datasource” error. The machine that failed was one of 4 identical machines.
I switched to 18.04 for commissioning, and it passed on the first try.
So, to be clear: I am asking for help diagnosing the issue. Are there logs I can or should look at, or do I need to use tcpdump?
Is there anything I should look at on this particular host, now that Ubuntu 20.04 is deployed on it?
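A minimal sketch of the two approaches mentioned above (reading logs versus capturing packets), not a confirmed procedure. It assumes the MAAS snap keeps its logs under `/var/snap/maas/common/log` as `regiond.log`, `rackd.log`, and `maas.log`; the helper names `maas_logs_for` and `capture_pxe` are hypothetical, and the boot interface name is a placeholder you supply.

```shell
#!/bin/sh
# Assumption: the MAAS snap writes its logs under /var/snap/maas/common/log.
# Set LOG_DIR before calling the helpers if your install differs.
LOG_DIR="${LOG_DIR:-/var/snap/maas/common/log}"

# Hypothetical helper: print every line in the region, rack, and MAAS logs
# that mentions one machine (hostname, system ID, or MAC address), prefixed
# with the name of the log file the line came from.
maas_logs_for() {
    machine="$1"
    for f in "$LOG_DIR/regiond.log" "$LOG_DIR/rackd.log" "$LOG_DIR/maas.log"; do
        [ -f "$f" ] || continue   # skip log files that do not exist on this host
        name=$(basename "$f")
        # -F: match literally, so MAC addresses are not treated as regexes
        grep -F -- "$machine" "$f" | sed "s|^|$name: |"
    done
}

# Hypothetical helper: record DHCP (ports 67/68) and TFTP (port 69) traffic
# on the rack controller's boot interface into a pcap file that can be
# opened later in Wireshark. Needs root; the interface is passed as $1.
capture_pxe() {
    iface="$1"
    tcpdump -n -i "$iface" -w "pxe-$(date +%s).pcap" 'port 67 or port 68 or port 69'
}
```

For example, run `maas_logs_for <hostname>` on the region controller after a failed commissioning attempt, or start `capture_pxe <interface>` on the rack controller before powering on the failing machine and compare the capture with one from a machine that commissions successfully.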