MAAS failing since upgrading from 3.4.4/stable to 3.5/stable

Hello there,

I just snap-refreshed our 3.4.4/stable deployment to 3.5/stable, and it has turned into a disaster.
The HTTP server still works, but /MAAS/accounts/login/ returns a 502 error, and all subordinate services such as DNS fail too.

Syslog shows:

Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [error] 1151821#1151821: *880 no live upstreams while connecting to upstream, client: 192.168.122.1, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://regiond-webapp/MAAS/rpc/", host: "192.168.122.1"
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [error] 1151821#1151821: *881 no live upstreams while connecting to upstream, client: 192.168.0.2, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://regiond-webapp/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [error] 1151821#1151821: *882 no live upstreams while connecting to upstream, client: 192.168.0.208, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://regiond-webapp/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [error] 1151821#1151821: *883 no live upstreams while connecting to upstream, client: 192.168.0.208, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://regiond-webapp/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:26 maas maas-http[1151821]: 2024/09/23 10:52:26 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
Sep 23 10:52:26 maas maas-rackd[1151778]: provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://192.168.0.2:5240/MAAS)
Sep 23 10:52:27 maas maas-http[1151821]: 2024/09/23 10:52:27 [error] 1151821#1151821: *884 connect() to unix:/var/snap/maas/36878/maas-regiond-webapp.sock.3 failed (111: Unknown error) while connecting to upstream, client: 192.168.0.2, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/36878/maas-regiond-webapp.sock.3:/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:27 maas maas-http[1151821]: 2024/09/23 10:52:27 [error] 1151821#1151821: *884 connect() to unix:/var/snap/maas/36878/maas-regiond-webapp.sock.2 failed (111: Unknown error) while connecting to upstream, client: 192.168.0.2, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/36878/maas-regiond-webapp.sock.2:/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:27 maas maas-http[1151821]: 2024/09/23 10:52:27 [crit] 1151821#1151821: *884 connect() to unix:/var/snap/maas/36878/maas-regiond-webapp.sock.1 failed (2: No such file or directory) while connecting to upstream, client: 192.168.0.2, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/36878/maas-regiond-webapp.sock.1:/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:27 maas maas-http[1151821]: 2024/09/23 10:52:27 [error] 1151821#1151821: *884 connect() to unix:/var/snap/maas/36878/maas-regiond-webapp.sock.0 failed (111: Unknown error) while connecting to upstream, client: 192.168.0.2, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/36878/maas-regiond-webapp.sock.0:/MAAS/rpc/", host: "192.168.0.2"
Sep 23 10:52:27 maas maas-http[1151821]: 2024/09/23 10:52:27 [alert] 1151821#1151821: connect() failed (2: No such file or directory)
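In the [error]/[crit] lines above, maas-http (nginx) tries each regiond webapp socket in turn and gets either errno 2 (ENOENT: the socket file does not exist) or errno 111 (ECONNREFUSED: the file exists but nothing is accepting connections). Both mean regiond is not serving yet. The following is a minimal Python sketch for tallying these failures per socket, assuming the log lines are available as plain strings (for example, copied from syslog). The function name is hypothetical.

```python
import re
from collections import Counter

# Matches nginx entries of the form
#   connect() to unix:<socket path> failed (<errno>: <message>)
# as seen in the maas-http lines above. Bare "connect() failed" alerts
# without a socket path are deliberately not matched.
UPSTREAM_RE = re.compile(
    r"connect\(\) to unix:(?P<socket>\S+) failed \((?P<errno>\d+): [^)]*\)"
)


def summarize_upstream_failures(lines):
    """Count failed upstream connects per (socket path, errno).

    errno 2   (ENOENT):       the socket file is missing.
    errno 111 (ECONNREFUSED): the socket file exists but no process accepts on it.
    """
    counts = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line)
        if match:
            counts[(match.group("socket"), int(match.group("errno")))] += 1
    return counts
```

If every regiond socket shows up with errno 2 or 111, the problem is on the regiond side, not in nginx.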

The issue persists even after I run snap remove maas --purge and then reinstall 3.5/stable from upstream.

And the situation is getting worse:

root@maas:~# snap install maas --channel=3.5/stable
maas (3.5/stable) 3.5.1-16317-g.409891638 from Canonical✓ installed
root@maas:~# snap start maas
Started.
root@maas:~# snap start maas.supervisor
error: snap "maas" has no service "supervisor"
root@maas:~# lsof -i
COMMAND       PID            USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
systemd-r     724 systemd-resolve   13u  IPv4     24602      0t0  UDP localhost:domain
systemd-r     724 systemd-resolve   14u  IPv4     24603      0t0  TCP localhost:domain (LISTEN)
rsyslogd      740          syslog    7u  IPv4     24129      0t0  UDP *:syslog
rsyslogd      740          syslog    8u  IPv6     24130      0t0  UDP *:syslog
sshd          828            root    3u  IPv4     17298      0t0  TCP *:ssh (LISTEN)
sshd          828            root    4u  IPv6     17300      0t0  TCP *:ssh (LISTEN)
postgres      893        postgres    6u  IPv4     20008      0t0  TCP *:postgresql (LISTEN)
postgres      893        postgres    7u  IPv6     20009      0t0  TCP *:postgresql (LISTEN)
dnsmasq      1017 libvirt-dnsmasq    3u  IPv4     20195      0t0  UDP *:bootps
dnsmasq      1017 libvirt-dnsmasq    5u  IPv4     20198      0t0  UDP maas:domain
dnsmasq      1017 libvirt-dnsmasq    6u  IPv4     20199      0t0  TCP maas:domain (LISTEN)

I can no longer access the web UI.
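The lsof listing above shows PostgreSQL, sshd and dnsmasq, but nothing listening on port 5240, which is where rackd tries to reach the region (http://192.168.0.2:5240/MAAS in the rackd log). The following is a minimal Python sketch of a quick TCP probe for that port. The host and port are taken from this thread's logs, and the function name is hypothetical.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111) and socket timeouts are both
        # subclasses of OSError, so either one counts as "not listening".
        return False
```

For example, is_listening("192.168.0.2", 5240) returns False for as long as rackd keeps logging "Connection refused".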

When you migrate from 3.4 to 3.5, the first startup is slow because the images are moved from the database to disk. Depending on how many images you have, this can take quite a while.
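Because the region may legitimately refuse connections while this migration runs, polling is usually safer than restarting it repeatedly. The following is a minimal Python sketch of a wait loop under these assumptions: probe is any zero-argument callable that returns True once MAAS responds (for example, one that opens a TCP connection to the region on port 5240), and the 30-minute default timeout is arbitrary. The function name is hypothetical.

```python
import time


def wait_until_ready(probe, timeout=1800.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() every `interval` seconds until it returns True.

    Returns True as soon as the probe succeeds, or False once `timeout`
    seconds have passed without success. `sleep` and `clock` can be
    swapped out so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With many images, the migration may need longer than the default timeout, so adjust it to the size of your image store.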

Also, 3.5 now logs to the systemd journal. See https://maas.io/docs/about-maas-logging and extract the maas-regiond logs to understand what’s happening.
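The following is a minimal Python sketch of that extraction step. It assumes the syslog identifier maas-regiond (as it appears in the regiond lines later in this thread) and the unit snap.maas.pebble.service (as named in the systemd lines quoted later). Check both against the linked documentation for your install. It uses only standard journalctl options (-u, -t, --since, --no-pager). The function names are hypothetical.

```python
import shutil
import subprocess


def journalctl_cmd(identifier="maas-regiond",
                   unit="snap.maas.pebble.service", since=None):
    """Build a journalctl argv selecting one MAAS component's entries.

    The default identifier and unit are assumptions based on this thread.
    """
    cmd = ["journalctl", "--no-pager", "-u", unit, "-t", identifier]
    if since:
        cmd += ["--since", since]
    return cmd


def read_logs(**kwargs):
    """Run journalctl and return its output lines (needs journal read access)."""
    if shutil.which("journalctl") is None:
        raise RuntimeError("journalctl not found on this host")
    result = subprocess.run(journalctl_cmd(**kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, read_logs(identifier="maas-rackd", since="today") would select today's rackd entries instead.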


Hello r00ta, thanks for the info.

After running snap restart maas, the log shows:

Sep 23 11:17:40 maas maas.pebble[1157443]: 2024-09-23T10:17:40.883Z [pebble] Exiting on terminated signal.
Sep 23 11:17:40 maas systemd[1]: Stopping Service for snap application maas.pebble...
Sep 23 11:17:40 maas systemd[1]: snap.maas.pebble.service: Deactivated successfully.
Sep 23 11:17:40 maas systemd[1]: Stopped Service for snap application maas.pebble.
Sep 23 11:18:05 maas systemd[1]: Started Service for snap application maas.pebble.
Sep 23 11:18:05 maas maas.pebble[1157548]: 2024-09-23T10:18:05.805Z [pebble] Started daemon.
Sep 23 11:18:05 maas maas.pebble[1157548]: 2024-09-23T10:18:05.806Z [pebble] POST /v1/services 83.655µs 400
Sep 23 11:18:05 maas maas.pebble[1157548]: 2024-09-23T10:18:05.806Z [pebble] Cannot start default services: no default services

Anything in the regiond logs?

Nothing from regiond, but rackd keeps complaining:

Sep 23 11:40:05 maas maas-rackd[1158195]: provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://192.168.0.2:5240/MAAS).
Sep 23 11:40:06 maas maas-rackd[1158195]: provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://192.168.0.2:5240/MAAS).
Sep 23 11:40:07 maas maas-rackd[1158195]: provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://192.168.0.2:5240/MAAS).
Sep 23 11:40:08 maas maas-rackd[1158195]: provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://192.168.0.2:5240/MAAS).
# sudo maas status
Service          Startup   Current   Since
agent            disabled  inactive  -
apiserver        enabled   active    today at 10:39 UTC
bind9            disabled  inactive  -
dhcpd            disabled  inactive  -
dhcpd6           disabled  inactive  -
http             disabled  active    today at 10:40 UTC
ntp              disabled  inactive  -
proxy            disabled  inactive  -
rackd            enabled   active    today at 10:39 UTC
regiond          enabled   active    today at 10:39 UTC
syslog           disabled  inactive  -
temporal         disabled  inactive  -
temporal-worker  disabled  inactive  -
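In this output, every service whose Startup column is enabled is also active. The refused connections on port 5240 therefore cannot be explained by a stopped service. This is consistent with regiond still being busy with the first-start image migration. The following is a minimal Python sketch for flagging mismatches automatically. It assumes the whitespace-separated column layout shown above (Service, Startup, Current, Since). The function names are hypothetical.

```python
def parse_maas_status(text):
    """Parse `maas status` output into {service: (startup, current)}.

    Only the first three columns are used; the free-form "Since" column
    (e.g. "today at 10:39 UTC" or "-") is ignored.
    """
    services = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3:
            services[fields[0]] = (fields[1], fields[2])
    return services


def enabled_but_not_active(services):
    """Names of services that should be running but are not."""
    return sorted(name for name, (startup, current) in services.items()
                  if startup == "enabled" and current != "active")
```

An empty result, as here, points away from service supervision and toward what the running services are doing.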

“the first startup takes some time”

You’re right. MAAS is back now!

But something is still wrong with the rack controller:

Sep 23 12:01:47 maas maas-regiond[1158976]: maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:10'.
Sep 23 12:01:47 maas maas-regiond[1158976]:         Traceback (most recent call last):
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
Sep 23 12:01:47 maas maas-regiond[1158976]:             current_context.run(_inlineCallbacks, r, gen, status)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1740, in _inlineCallbacks
Sep 23 12:01:47 maas maas-regiond[1158976]:             status.deferred.errback()
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
Sep 23 12:01:47 maas maas-regiond[1158976]:             self._startRunCallbacks(fail)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
Sep 23 12:01:47 maas maas-regiond[1158976]:             self._runCallbacks()
Sep 23 12:01:47 maas maas-regiond[1158976]:         --- <exception caught here> ---
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Sep 23 12:01:47 maas maas-regiond[1158976]:             current.result = callback(  # type: ignore[misc]
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/rack_controller.py", line 281, in <lambda>
Sep 23 12:01:47 maas maas-regiond[1158976]:             d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
Sep 23 12:01:47 maas maas-regiond[1158976]:             self.raiseException()
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
Sep 23 12:01:47 maas maas-regiond[1158976]:             raise self.value.with_traceback(self.tb)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Sep 23 12:01:47 maas maas-regiond[1158976]:             current.result = callback(  # type: ignore[misc]
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/rack_controller.py", line 300, in unwatch_if_does_not_exist
Sep 23 12:01:47 maas maas-regiond[1158976]:             f.trap(RackController.DoesNotExist)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
Sep 23 12:01:47 maas maas-regiond[1158976]:             self.raiseException()
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
Sep 23 12:01:47 maas maas-regiond[1158976]:             raise self.value.with_traceback(self.tb)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
Sep 23 12:01:47 maas maas-regiond[1158976]:             result = current_context.run(
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
Sep 23 12:01:47 maas maas-regiond[1158976]:             return g.throw(self.type, self.value, self.tb)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 867, in configure_dhcp
Sep 23 12:01:47 maas maas-regiond[1158976]:             config = yield deferToDatabase(get_dhcp_configuration, rack_controller)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
Sep 23 12:01:47 maas maas-regiond[1158976]:             result = inContext.theWork()  # type: ignore[attr-defined]
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
Sep 23 12:01:47 maas maas-regiond[1158976]:             inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
Sep 23 12:01:47 maas maas-regiond[1158976]:             return self.currentContext().callWithContext(ctx, func, *args, **kw)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
Sep 23 12:01:47 maas maas-regiond[1158976]:             return func(*args, **kw)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 856, in callInContext
Sep 23 12:01:47 maas maas-regiond[1158976]:             return func(*args, **kwargs)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
Sep 23 12:01:47 maas maas-regiond[1158976]:             result = func(*args, **kwargs)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/utils/orm.py", line 771, in call_within_transaction
Sep 23 12:01:47 maas maas-regiond[1158976]:             return func_outside_txn(*args, **kwargs)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/utils/orm.py", line 574, in retrier
Sep 23 12:01:47 maas maas-regiond[1158976]:             return func(*args, **kwargs)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/usr/lib/python3.10/contextlib.py", line 79, in inner
Sep 23 12:01:47 maas maas-regiond[1158976]:             return func(*args, **kwds)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 786, in get_dhcp_configuration
Sep 23 12:01:47 maas maas-regiond[1158976]:             config = get_dhcp_configure_for(
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 639, in get_dhcp_configure_for
Sep 23 12:01:47 maas maas-regiond[1158976]:             peer_name, peer_config, peer_rack = make_failover_peer_config(
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 515, in make_failover_peer_config
Sep 23 12:01:47 maas maas-regiond[1158976]:             peer_address = get_ip_address_for_rack_controller(
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 258, in get_ip_address_for_rack_controller
Sep 23 12:01:47 maas maas-regiond[1158976]:             return get_ip_address_for_interface(interface, vlan, ip_version)
Sep 23 12:01:47 maas maas-regiond[1158976]:           File "/snap/maas/36889/lib/python3.10/site-packages/maasserver/dhcp.py", line 232, in get_ip_address_for_interface
Sep 23 12:01:47 maas maas-regiond[1158976]:             for ip_address in interface.ip_addresses.all():
Sep 23 12:01:47 maas maas-regiond[1158976]:         builtins.AttributeError: 'NoneType' object has no attribute 'ip_addresses'
Sep 23 12:01:47 maas maas-regiond[1158976]:
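The traceback ends in get_ip_address_for_interface, which iterates over interface.ip_addresses while interface is None. The call comes from make_failover_peer_config, so regiond was building the DHCP failover peer configuration for the second rack and found no interface for it. The following is a simplified, hypothetical model of that failure mode, not MAAS’s actual code: Interface, Rack, find_interface_on_vlan and peer_address are illustrative names. It shows how a lookup that returns None produces exactly this AttributeError and how a guard avoids it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interface:
    name: str
    vlan_id: int
    ip_addresses: List[str] = field(default_factory=list)


@dataclass
class Rack:
    hostname: str
    interfaces: List[Interface]


def find_interface_on_vlan(rack: Rack, vlan_id: int) -> Optional[Interface]:
    """Return the rack's first interface on vlan_id, or None if it has none."""
    for iface in rack.interfaces:
        if iface.vlan_id == vlan_id:
            return iface
    return None


def peer_address(rack: Rack, vlan_id: int) -> Optional[str]:
    """Return an IP for the failover peer on vlan_id, or None."""
    iface = find_interface_on_vlan(rack, vlan_id)
    if iface is None:
        # Without this check, `iface.ip_addresses` raises
        # AttributeError: 'NoneType' object has no attribute 'ip_addresses'.
        return None
    return iface.ip_addresses[0] if iface.ip_addresses else None
```

In practice, the fix is in the data (the controller's interfaces and addresses), not in patching MAAS, which is where the rest of the thread goes.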

What’s your topology? How many region+rack, region-only, and rack-only controllers do you have?

All nodes are on the same L2 segment.
Single region.
Controllers: 1 region+rack (main), 1 rack (subordinate)

It looks like one of the controller interfaces has no IP address associated with it. I’d restart all the region and rack controllers and make sure the automatic/recurring commissioning scripts run successfully on all of them.
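To find the interface in question, one option is to check each controller’s interface list for entries that carry no address. The following is a minimal Python sketch. It assumes the input is a list of dicts shaped like MAAS API interface objects, each with a "name" and a "links" list whose entries may contain "ip_address" (for example, the JSON printed by maas <profile> interfaces read <system_id>). Verify that shape against your MAAS version. The function name is hypothetical.

```python
def interfaces_without_ip(interfaces):
    """Return names of interfaces with no link that carries an IP address.

    `interfaces` is assumed to be a list of dicts such as
    {"name": "eth0", "links": [{"mode": "static", "ip_address": "..."}]}.
    A link in "link_up" mode typically has no "ip_address" key.
    """
    missing = []
    for iface in interfaces:
        ips = [link.get("ip_address") for link in iface.get("links", [])]
        if not any(ips):
            missing.append(iface["name"])
    return missing
```

Load the JSON with json.load and pass the resulting list in. Any name returned is a candidate for the missing address behind the DHCP failure.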

Thanks r00ta.
After rebooting all region and rack controllers and deleting all unused fabrics, the issue is resolved.
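Before deleting fabrics, it helps to list only those that nothing references. The following is a minimal Python sketch under these assumptions: fabrics are dicts with "id" and "name" (as returned by maas <profile> fabrics read), and interfaces are dicts whose "vlan" object carries a "fabric_id". The interface list must cover every machine and controller, otherwise a fabric in use could be reported as unused. The function name is hypothetical, and the output should be reviewed by hand before deleting anything.

```python
def unused_fabrics(fabrics, interfaces):
    """Return names of fabrics that no interface's VLAN belongs to.

    Interfaces without a VLAN (vlan is None or missing) are ignored.
    """
    used = {iface["vlan"]["fabric_id"] for iface in interfaces if iface.get("vlan")}
    return sorted(f["name"] for f in fabrics if f["id"] not in used)
```

This sketch does not check whether a fabric still has subnets or IP ranges configured, so confirm that in the UI or API as well.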
