MAAS enlistment with external DHCP server is failing with url_helper.py[ERROR]: Timed Out, no response from URLs

Hello,

I am using MAAS 3.2.8, with an external DHCP server.

It used to work perfectly end to end, from enlistment/commissioning to deployment.

However, enlistment has suddenly stopped working.
It appears to fail during the initial cloud-init step when enlisting.

It hangs at this step for about 2 minutes:

Starting Initial Cloud-init job (metadata service crawler)

and then gives errors such as:

url_helper.py[ERROR]: Timed Out, no response from URLs: ['http://10.x.x.x./MAAS/metadata/2012-03-01/metadata/instance-id']

DataSourceMAAS.py[CRITICAL]: Giving up on md from ['http://10.x.x.x./MAAS/metadata/2012-03-01/metadata/instance-id'] after 126 seconds

util.py[WARNING]: No instance datasource found! Likely bad things to come!

I would like some help debugging why this could happen when it was working previously.

I tried enabling debug, but it did not give me much useful insight.

Any guidance on how to debug this, and on why it might be happening, would greatly help.

Thank you.

I would also like to add that only enlistment is failing; commissioning and deployment work.

One more difference I observed when enlistment fails is that the hostname defaults to ubuntu instead of showing maas-enlisting-node.

I also see that enlistment works fine when I enable DHCP on MAAS; it only fails when I use the external DHCP server.
I would like to debug why enlistment with external DHCP suddenly stopped working.

Hey @vallerul!

Could you check whether this is the same scenario that another user reported in the thread "Enlistment times out, fails with external DHCP server"? If so, does the answer provided by @billwear there work for you?

Hi @r00ta, original poster from the thread you were referring to here: that looks like the same problem I’ve been facing too. I wouldn’t say there was a solution provided in the other thread really as the service is still broken for me and for the moment I’ve stopped working on it (and may very well need to switch to something other than MAAS). If I find a solution though I’ll post back here.

Unfortunately the solution has to be specific to your env as the setup with an external DHCP server is not something we can generalize in this thread without any additional information.

The post I linked contains some info to enable you to start investigating the issue.

If you get back to working on it and want to share some details with this community, we are willing to listen and hopefully help :slight_smile:

Thank you @r00ta

I did check the answers reported in that link.

And the behavior looks the same: I don't see the corresponding access logs after the point where the node hangs waiting for the metadata crawler and errors out.

One difference in my scenario is that the external DHCP server I had configured was working perfectly fine until something changed and it stopped working.

As part of the debugging process, I repeated the steps I initially used to configure the external DHCP server:

  1. Copy the exact dhcpd.conf that MAAS uses when DHCP is enabled in MAAS to the external DHCP server and try enlisting. When I tried this about a year ago, it worked perfectly: MAAS was able to enlist and recognize the server.

  2. I then removed options from that dhcpd.conf one by one until I found the bare minimum needed. This configuration worked for many months, until it broke a few days ago.
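
To see exactly what the trimmed-down file dropped, the two configs can be diffed after stripping comments and whitespace. This is only a minimal Python sketch: the two config strings are made-up placeholders (not real MAAS output), and splitting on `#` is a simplification that would also cut a `#` inside a quoted string.

```python
import difflib

# Hypothetical stand-ins for the MAAS-generated file and the external one.
MAAS_CONF = """
subnet 10.0.0.0 netmask 255.255.255.0 {
    next-server 10.0.0.2;      # placeholder value
    option routers 10.0.0.1;
}
"""

EXTERNAL_CONF = """
subnet 10.0.0.0 netmask 255.255.255.0 {
    option routers 10.0.0.1;
}
"""


def normalize(conf_text: str) -> list[str]:
    """Drop comments and blank lines, and collapse runs of whitespace."""
    lines = []
    for raw in conf_text.splitlines():
        line = " ".join(raw.split("#", 1)[0].split())
        if line:
            lines.append(line)
    return lines


def conf_diff(maas_conf: str, external_conf: str) -> list[str]:
    """Return unified-diff lines; '-' lines exist only in the MAAS config."""
    return list(
        difflib.unified_diff(
            normalize(maas_conf),
            normalize(external_conf),
            fromfile="maas",
            tofile="external",
            lineterm="",
            n=0,
        )
    )


for diff_line in conf_diff(MAAS_CONF, EXTERNAL_CONF):
    print(diff_line)
```

Lines starting with `-` are statements present in the MAAS config but missing from the external one, which gives a concrete list of candidates to add back one at a time.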

I do suspect that it could be related to instance id metadata, as mentioned in this statement:

One possibility could be that your DHCP server is not correctly set up to provide the necessary boot options for MAAS. For MAAS to work with an external DHCP server, the DHCP server needs to be configured with specific boot options that tell the machines where to get their PXE boot images and metadata. The MAAS documentation provides more information about PXE and DHCP for MAAS.

May I please know which boot options these are, whether they are included in the dhcpd.conf that MAAS uses, and whether MAAS dynamically adds something to its dhcpd.conf or leases file that the external DHCP server might be missing when I make a copy?

I see that the client is able to get the PXE images, but I am not able to confirm whether the client can get the metadata, or which DHCP boot options would help with that.
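
One way to narrow this down is to request the metadata URL from a machine on the same subnet as the enlisting node and see whether anything answers at all. A minimal Python sketch using only the standard library; the function name `probe` is my own, and the URL to pass in is whatever appears in the cloud-init error on your node.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Return a one-line description of what happened when fetching url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path
        # to it works; the problem is then at the HTTP level.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered: routing, firewall, wrong address, or wrong port.
        return f"unreachable: {exc}"
```

Calling `probe(...)` with the instance-id URL from the error message and getting any `HTTP ...` result would mean the region controller is reachable from that subnet, whereas `unreachable: ...` would point at addressing or routing (for example the gateway or DNS handed out by the external DHCP server) rather than at MAAS itself.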

Also, I see that the regiond logs are not in the same time zone as the rackd controller's logs. I did configure an NTP server and set MAAS to use only that NTP server, but will that not make regiond use the same NTP server? I am wondering whether metadata retrieval is somehow failing because of this.
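
A different time zone in the logs is only a display difference; what would matter, if anything, is the two clocks disagreeing about the actual instant. A minimal Python sketch for checking that, assuming you can get one offset-aware ISO 8601 timestamp for the same event from each controller (the timestamps in the usage note are made up, not taken from real logs):

```python
from datetime import datetime, timezone


def skew_seconds(stamp_a: str, stamp_b: str) -> float:
    """Absolute gap in seconds between two offset-aware ISO 8601 timestamps."""
    time_a = datetime.fromisoformat(stamp_a).astimezone(timezone.utc)
    time_b = datetime.fromisoformat(stamp_b).astimezone(timezone.utc)
    return abs((time_a - time_b).total_seconds())
```

For example, `skew_seconds("2024-01-01T12:00:00+00:00", "2024-01-01T07:00:00-05:00")` gives `0.0`, because both strings name the same instant in different zones. A gap of minutes between matching events would be worth investigating; a pure time zone offset would not.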

Thank you.

Was the port omitted intentionally, or was there no port?
I am wondering if that's the same issue as in "Cloud-init fails to fetch MAAS datasource from metadata_url missing port".
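
A quick way to check the URL exactly as cloud-init logged it is to parse it and look at the port field. A minimal Python sketch; the helper name is my own, and the URL in the usage note is copied from the error above with the address still redacted. As far as I know MAAS normally listens on port 5240, so a URL without a port would only work if something is serving MAAS on port 80 in your setup, which is an assumption to verify.

```python
from urllib.parse import urlsplit


def has_explicit_port(url: str) -> bool:
    """True if the URL names a port, False if it relies on the scheme default."""
    return urlsplit(url).port is not None
```

With the URL from the log, `has_explicit_port("http://10.x.x.x./MAAS/metadata/2012-03-01/metadata/instance-id")` is `False`, while the same URL with `:5240` after the host gives `True`.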