While cloud-init runs on a successfully configured node (bone-stock image, no curtin customizations), it attempts to retrieve metadata from MAAS BEFORE initializing the network:
2024-10-02 02:18:39,147 - __init__.py[DEBUG]: Detected platform: DataSourceMAAS [None]. Checking for active instance data
2024-10-02 02:18:39,151 - url_helper.py[DEBUG]: [0/1] open 'http://10.10.10.10:5248/MAAS/metadata/2012-03-01/meta-data/instance-id' with {'url': 'http://10.10.10.10:5248/MAAS/metadata/2012-03-01/meta-data/instance-id', 'stream': False, 'allow_redirects': True, 'method': 'GET', 'timeout': 50.0, 'headers': {'User-Agent': 'Cloud-Init/23.4-7.el9_4.6.alma.1', 'Authorization': 'OAuth oauth_nonce="****", oauth_timestamp="1727835519", oauth_version="1.0", oauth_signature_method="PLAINTEXT", oauth_consumer_key="****", oauth_token="****", oauth_signature="****"'}} configuration
2024-10-02 02:18:39,154 - url_helper.py[DEBUG]: Calling 'None' failed [0/120s]: request error [HTTPConnectionPool(host='10.10.10.10', port=5248): Max retries exceeded with url: /MAAS/metadata/2012-03-01/meta-data/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6e41abcd30>: Failed to establish a new connection: [Errno 101] Network is unreachable'))]
It also fails to post status messages. In fact, the cloud-init log contains dozens of reports of network failure. After about 10 pages of these, we finally get down to:
2024-10-02 02:20:45,347 - util.py[DEBUG]: Reading from /sys/class/net/enp129s0f0/address (quiet=False)
2024-10-02 02:20:45,347 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/enp129s0f0/address
...snip repeat for each interface...
2024-10-02 02:20:45,347 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2024-10-02 02:20:45,347 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2024-10-02 02:20:45,348 - util.py[DEBUG]: Reading from /sys/class/net/enp129s0f0/address (quiet=False)
2024-10-02 02:20:45,348 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/enp129s0f0/address
...snip repeat for each interface...
2024-10-02 02:20:45,348 - util.py[DEBUG]: Reading from /usr/lib/python3.9/site-packages/cloudinit/config/schemas/schema-network-config-v1.json (quiet=False)
and then we configure the network. This is clearly out of order, no?
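To put a number on the gap, here is a rough sketch that measures the time between the first "Network is unreachable" error and the first read of the network-config schema. Treating that schema read as the start of network configuration is my assumption based on the excerpt above, and `first_seen` and `ordering_gap` are names I made up for illustration, not anything from cloud-init.

```python
import re
from datetime import datetime

# cloud-init log lines start with a timestamp such as "2024-10-02 02:18:39,154".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ")


def first_seen(lines, needle):
    """Return the timestamp of the first log line containing `needle`, or None."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line:
            return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return None


def ordering_gap(lines):
    """Seconds from the first unreachable-network error to the first read of
    the network-config schema (assumed marker for network configuration)."""
    failed = first_seen(lines, "Network is unreachable")
    configured = first_seen(lines, "schema-network-config")
    if failed is None or configured is None:
        return None
    return (configured - failed).total_seconds()
```

Something like `ordering_gap(open("/var/log/cloud-init.log"))` should work on a node. For the two timestamps quoted above (02:18:39,154 and 02:20:45,348) the gap is a little over two minutes.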
Naive ideas about improvements:
- Configure the network before querying the network for metadata
- Queue all status reports until the network is available
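To make the second idea concrete, here is a minimal sketch of what I mean by queuing. This is not how cloud-init or MAAS reporting actually works; `DeferredReporter`, `send`, and `network_up` are hypothetical names, and I'm assuming the real sender raises an `OSError` (or a subclass) when a post fails.

```python
from collections import deque
from typing import Callable, Deque


class DeferredReporter:
    """Hold status reports in memory until the network is reachable.

    Hypothetical helper for illustration; not part of cloud-init or MAAS.
    """

    def __init__(self, send: Callable[[str], None], network_up: Callable[[], bool]):
        self._send = send              # posts one report; assumed to raise OSError on failure
        self._network_up = network_up  # returns True once the network is usable
        self._pending: Deque[str] = deque()

    def report(self, message: str) -> None:
        """Queue the message, then try to deliver everything queued so far."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Send queued reports in order; return how many were delivered."""
        delivered = 0
        if not self._network_up():
            return delivered
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # keep the report queued and retry on the next flush
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        """Number of reports still waiting to be sent."""
        return len(self._pending)
```

Reports stay in order and nothing is dropped; the trade-off is that the MAAS side sees no status at all until the network comes up, and an in-memory queue is lost if the process restarts between boot stages.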
I’m guessing this is an ordering problem in how MAAS configures cloud-init. If I’m misattributing this, please say so. If there’s anything I can do in the configuration to address it, that would be great to know. If someone wants to point me at the relevant portions of the code, I’m happy to see if I can provide a patch.