Error prevents adding machines to MAAS - 'no datasource found'

A strange error message appeared while I was attempting to add a new machine to MAAS. The PXE boot starts normally and the machine seemingly boots on the correct network. However, the machine doesn’t get added, and a 'no datasource found' error message appears on the machine I’m trying to add.

While searching around, I found out that this used to be a problem with older versions of MAAS, where the machine being added didn’t have a connection with both the rack and the controller. Could this be caused by something similar?

Here is a link to an older conversation about the error, from a user running the 2.3 version of MAAS: https://bugs.launchpad.net/maas/+bug/1779970. I should note that unlike the original poster there, I haven’t added any rack controllers in different networks or anything similar.

For the record, my server is running the 2.7 version of MAAS. I was planning to upgrade to 2.8 after figuring out this error, but of course I don’t mind upgrading sooner if that might help in solving the problem.

Thank you in advance to anyone who wants to try solving this!

As an update, I upgraded to 2.8 and the problem persists. Any help would still be welcomed!

I am having the same problem. Machines randomly get stuck and cannot find a datasource during the first boot for commissioning.
After repeating the step a few times, all machines can finally continue provisioning. Why is this happening?

How can I debug this more? Is it DNS that is not working from MAAS?
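
One way to test the DNS theory is to check, from a host on the same network, whether the name the machines use to reach MAAS actually resolves. Below is a minimal Python sketch using only the standard library. The hostname `maas.example.internal` is a placeholder, and which name the booting machine really looks up is an assumption to verify against your own setup. Note that it uses the resolver of the host it runs on, which may not be the one the commissioning machine uses.

```python
import socket


def resolve(hostname):
    """Return the sorted unique IP addresses hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name lookup failed (unknown host, no resolver reachable, ...).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = "maas.example.internal"  # placeholder: replace with your MAAS host name
    addresses = resolve(name)
    print(f"{name} -> {addresses if addresses else 'did not resolve'}")
```

An empty result would point toward name resolution; a list of addresses suggests the cause lies elsewhere.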

MAAS version: 2.9.0~rc1 (9117-g.ae6569d91)

Do you know what the root cause was? I have the same problem.

Are you able to log in after the failure? If it’s the same problem I am having, it seems curtin is not writing the MAAS datasource to the files in /etc/cloud/cloud.cfg.d/.
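
If you can log in, a quick way to check this is to look for a MAAS datasource entry in the cloud-init drop-in files. Below is a minimal Python sketch, assuming the relevant configuration would mention both `datasource` and `MAAS` somewhere under /etc/cloud/cloud.cfg.d/. The function name and the matching rule are made up for illustration; they are not taken from MAAS, curtin, or cloud-init.

```python
from pathlib import Path


def find_maas_datasource_files(cfg_dir="/etc/cloud/cloud.cfg.d"):
    """Return sorted names of files in cfg_dir that mention a MAAS datasource.

    Hypothetical helper: the match is a plain substring test, which is an
    assumption about what a MAAS datasource entry looks like.
    """
    matches = []
    directory = Path(cfg_dir)
    if not directory.is_dir():
        return matches  # directory missing: nothing to report
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file (e.g. permissions): skip it
        if "datasource" in text and "MAAS" in text:
            matches.append(path.name)
    return matches


if __name__ == "__main__":
    found = find_maas_datasource_files()
    if found:
        print("MAAS datasource mentioned in:", ", ".join(found))
    else:
        print("No MAAS datasource entry found in /etc/cloud/cloud.cfg.d/")
```

Run it as root on the affected machine, since some of these files may not be world-readable. An empty result would be consistent with the datasource never having been written.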

Restarting maas-regiond fixed this for me.

My primary question is: how do I debug this? I have seen this problem twice now, and both times using Ubuntu 18.04 (instead of 20.04) made the problem go away. I also tried adding the address of my MAAS instance to the DNS entry for the boot net subnet, but that did not help.

Details.

I have seen this on MAAS 2.9 and 3.0:

root@maas3:~# snap  list 
Name          Version                 Rev    Tracking       Publisher   Notes
core18        20210722                2128   latest/stable  canonical*  base
core20        20210702                1081   latest/stable  canonical*  base
maas          2.9.2-9165-g.c3e7848d1  12555  2.9/stable     canonical*  -


root@plano101:~# snap list 
Name               Version                  Rev    Tracking         Publisher   Notes
core18             20210722                 2128   latest/stable    canonical✓  base
core20             20210702                 1081   latest/stable    canonical✓  base
maas               3.0.0-10029-g.986ea3e45  15003  3.0/stable       canonical✓  -

In the case of 3.0, I was commissioning 7 roughly identical machines. They were all the same make with the same BIOS, but some had more disk than others and some had better CPUs and more memory. No system was unique, though. One of the 7 machines consistently refused to commission. It either hung while trying to boot the initrd or failed with the infamous “no datasource” error. The one that failed was one of 4 identical machines.

I switched to using 18.04 for commissioning and it passed first try.

So again, just to be clear: I am asking for help diagnosing the issue. Are there logs I can/should look at, or do I need to use tcpdump?

Is there anything that I should look at on this particular host (now that Ubuntu 20.04 is deployed on it)?

I’m facing the same issue while deploying OpenStack with Juju. It powers on the machines (3 HP ProLiant DL360 Gen8 servers), and some of them throw PXE errors. They are then able to boot, but get stuck on DataSourceMAAS. I don’t know whether a DNS issue is involved.