Confusing cloud-init log events

I’m running MAAS snap 3.3.3-13184-g.3e9972c19

When I deploy systems, I see warnings in syslog from handlers.py whose event payload contains "result": "SUCCESS":

Jun 12 14:00:50 test cloud-init[714]: Cloud-init v. 23.1.1-0ubuntu0~22.04.1 running 'init-local' at Mon, 12 Jun 2023 14:00:34 +0000. Up 4.50 seconds.
Jun 12 14:00:50 test cloud-init[714]: 2023-06-12 14:00:34,340 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "attempting to read from cache [trust]", "event_type": "start", "origin": "cloudinit", "timestamp": 1686578434.287253}. This was caused by: HTTPConnectionPool(host='192.168.98.2', port=5248): Max retries exceeded with url: /MAAS/metadata/status/exhqds (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe9a512f4f0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
Jun 12 14:00:50 test cloud-init[714]: 2023-06-12 14:00:34,342 - handlers.py[WARNING]: Failed posting event: {"name": "init-local/check-cache", "description": "no cache found", "event_type": "finish", "origin": "cloudinit", "timestamp": 1686578434.287579, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='192.168.98.2', port=5248): Max retries exceeded with url: /MAAS/metadata/status/exhqds (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe9a512cd30>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
Jun 12 14:00:50 test systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
Jun 12 14:00:50 test systemd[1]: message repeated 2 times: [ Condition check resulted in OpenVSwitch configuration for cleanup being skipped.]
Jun 12 14:00:50 test cloud-init[714]: 2023-06-12 14:00:34,523 - handlers.py[WARNING]: Failed posting event: {"name": "init-local", "description": "searching for local datasources", "event_type": "finish", "origin": "cloudinit", "timestamp": 1686578434.5219314, "result": "SUCCESS"}. This was caused by: HTTPConnectionPool(host='192.168.98.2', port=5248): Max retries exceeded with url: /MAAS/metadata/status/exhqds (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe9a512f9d0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
Jun 12 14:00:50 test cloud-init[714]: 2023-06-12 14:00:34,523 - handlers.py[WARNING]: Multiple consecutive failures in WebHookHandler. Cancelling all queued events.
Jun 12 14:00:50 test systemd[1]: Finished Initial cloud-init job (pre-networking).
Jun 12 14:00:50 test systemd[1]: Reached target Preparation for Network.
Jun 12 14:00:50 test systemd[1]: Starting Network Configuration...
Jun 12 14:00:50 test systemd-networkd[732]: lo: Link UP
Jun 12 14:00:50 test systemd-networkd[732]: lo: Gained carrier
Jun 12 14:00:50 test systemd-networkd[732]: Enumeration completed
Jun 12 14:00:50 test systemd[1]: Started Network Configuration.

It seems that MAAS writes a cloud-init config to 90_dpkg_local_cloud_config.cfg that needs to make a network connection:

root@test:~# cat /etc/cloud/cloud.cfg.d/90_dpkg_local_cloud_config.cfg
# written by cloud-init debian package per preseed entry
# cloud-init/local-cloud-config
manage_etc_hosts: true
manual_cache_clean: true
reporting:
  maas:
    consumer_key: trim
    endpoint: http://192.168.98.2:5248/MAAS/metadata/status/exhqds
    token_key: trim
    token_secret: trim
    type: webhook

However, this cloud-init config is set to run via cloud-init/local-cloud-config, which, according to the cloud-init docs, won’t have network access.

I would think that a failed event post would be logged as ERROR, not WARNING, and that the event message wouldn’t contradict itself with "result": "SUCCESS".

Is this expected behavior?

Hey, @nateybobo!

It looks like the root cause is that the MAAS cloud-init config is trying to contact the MAAS metadata API before networking is available, resulting in those errors. Here are some suggestions:

  • You’re correct that since this MAAS config runs during the ‘init-local’ cloud-init stage, network is not yet available so those API calls will fail.

  • Changing the log level to ERROR for those failures would make more sense than WARNING. I agree the successful result seems misleading.

  • This behavior does seem incorrect - the MAAS metadata integration should be deferred to later cloud-init stages where networking is available.

  • I would suggest opening a bug against MAAS detailing the problem. The logs and config file you shared are useful examples to include.

  • As a workaround, you could disable or remove that 90_dpkg_local_cloud_config.cfg file until the behavior is fixed.

  • Making the API endpoint configurable or disabling MAAS metadata integration during init-local may be ways to address it in MAAS.

  • I’m not sure if this is a known limitation or regression - the devs would have good insight once they can review a bug report.
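
On the "result": "SUCCESS" point, each warning appears to wrap two separate things: the event cloud-init tried to report, and the reason the HTTP POST of that event failed. Here is a minimal Python sketch that splits one of the lines you posted into those two parts. The regular expression and the split_warning helper are my own illustration, not part of cloud-init or MAAS, and the pattern assumes the exact "Failed posting event: {...}. This was caused by: ..." layout seen in your log.

```python
import json
import re

# Assumed layout, taken from the log lines in this thread:
#   Failed posting event: {<JSON event>}. This was caused by: <transport error>
PATTERN = re.compile(
    r"Failed posting event: (?P<event>\{.*?\})\. "
    r"This was caused by: (?P<cause>.*)$"
)

# One warning copied from the syslog excerpt above (syslog prefix omitted).
SAMPLE = (
    '2023-06-12 14:00:34,342 - handlers.py[WARNING]: Failed posting event: '
    '{"name": "init-local/check-cache", "description": "no cache found", '
    '"event_type": "finish", "origin": "cloudinit", '
    '"timestamp": 1686578434.287579, "result": "SUCCESS"}. '
    "This was caused by: HTTPConnectionPool(host='192.168.98.2', port=5248): "
    "Max retries exceeded with url: /MAAS/metadata/status/exhqds "
    "(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object "
    "at 0x7fe9a512cd30>: Failed to establish a new connection: "
    "[Errno 101] Network is unreachable'))"
)


def split_warning(line):
    """Return (event_dict, cause_text), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # The event is the JSON payload cloud-init wanted to send to the webhook.
    event = json.loads(match.group("event"))
    # The cause describes why sending that payload failed.
    return event, match.group("cause")


if __name__ == "__main__":
    event, cause = split_warning(SAMPLE)
    print("step reported:", event["name"], "->", event.get("result"))
    print("why the POST failed:", cause)
```

Read this way, "result": "SUCCESS" describes the init-local/check-cache step itself, while the WARNING is about delivering that report to the endpoint.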

Let me know if you need any other help or info for filing the bug! Getting the MAAS team to look at it directly will be the fastest path to improving this initialization process.

I also have the same problem! Between this error and the fact that a proxy is enabled by default, for a reason still unknown to me, MAAS is driving me crazy. I have been dealing with this Errno 101 for three days. How is it possible that people use MAAS in production environments? I have uninstalled MAAS I can’t remember how many times and reinstalled it with a minimal configuration, but the bug is still there.

How do I disable or remove 90_dpkg_local_cloud_config.cfg?

If you want people to help you, first try to be clear in your issue description: what the problem is, what your setup is, what version of MAAS you are running, and what you tried in addition to reinstalling MAAS for days hoping that something would change (sorry to say, but not every issue can be fixed by powering your laptop off and on).

From what I’ve seen up to now, most of the time people get “Network is unreachable” because of their broken network setup. Please check yours.
Then, if everything is OK, look at the MAAS logs.
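
One rough way to check the network from the deployed machine is a plain TCP connect to the host and port of the reporting endpoint. The sketch below uses only the Python standard library. The helper names host_port and endpoint_reachable are hypothetical, and the URL in the comment is the one from the first post, so substitute the endpoint from your own 90_dpkg_local_cloud_config.cfg. It only tests TCP reachability, not whether MAAS accepts the request.

```python
import socket
from urllib.parse import urlsplit


def host_port(url):
    """Extract (host, port) from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def endpoint_reachable(url, timeout=3.0):
    """Try a TCP connect to the URL's host/port; return (ok, detail)."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # On Linux, "[Errno 101] Network is unreachable" surfaces here when
        # the machine has no route to the endpoint's address.
        return False, f"{type(exc).__name__}: {exc}"


# Example call (endpoint taken from the first post; replace with your own):
# print(endpoint_reachable("http://192.168.98.2:5248/MAAS/metadata/status/exhqds"))
```

If this connects once the machine has finished booting, the network path to the rack controller is probably fine and the warnings are limited to the pre-networking stage. If it still reports Errno 101, the routing or interface configuration is the first thing to look at.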

My network is working OK; it is MAAS that breaks when provisioning the latest update of Ubuntu 20.04.

Thank you for providing more info. Do you mean this image: Index of /ephemeral-v3/stable/focal/amd64/20240228 (updated on 28th Feb)? Is it working with 22.04?