MAAS DHCP service no longer running

I have a server running MAAS as a rack+region controller, installed via snap (version 3.4.1-14343-g.a552d2522).

I use one of this machine's network interfaces as a DHCP server that assigns IPs (via DHCP snippets) to all of the BMCs on one subnet.
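The post doesn't include the actual snippets, so the following is only a hypothetical sketch of what per-BMC reservations might look like. MAAS DHCP snippets are fragments of ISC dhcpd configuration, and this sketch assumes the usual ISC `host` declaration (`hardware ethernet` plus `fixed-address`). The `Bmc` type, the BMC names, MACs and addresses are all invented for illustration. Such reservations only take effect while dhcpd is actually running, which is what has stopped here.

```python
from dataclasses import dataclass


@dataclass
class Bmc:
    """Hypothetical BMC inventory entry: host name, MAC address, reserved IP."""
    name: str
    mac: str
    ip: str


def render_host_stanza(bmc: Bmc) -> str:
    """Render one ISC dhcpd 'host' declaration pinning a MAC to a fixed IP."""
    return (
        f"host {bmc.name} {{\n"
        f"  hardware ethernet {bmc.mac};\n"
        f"  fixed-address {bmc.ip};\n"
        f"}}"
    )


def render_snippet(bmcs: list[Bmc]) -> str:
    """Join one stanza per BMC into a single snippet body."""
    return "\n".join(render_host_stanza(b) for b in bmcs)


if __name__ == "__main__":
    # Invented example inventory, not taken from the poster's setup.
    inventory = [
        Bmc("bmc-node01", "aa:bb:cc:00:00:01", "10.0.50.11"),
        Bmc("bmc-node02", "aa:bb:cc:00:00:02", "10.0.50.12"),
    ]
    print(render_snippet(inventory))
```

The generated text is what you would paste into a snippet in the MAAS UI; MAAS then merges it into the dhcpd configuration it manages.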

When I recently opened the MAAS UI to check the status of a machine, I noticed that all of the machines were reporting IPMI power errors (the BMCs can no longer be reached). An nmap scan of the subnet they should be on shows that the BMCs are not being assigned IPs.

So I ran sudo maas status, and the dhcpd service shows as “stopped”. I tried restarting the MAAS server, but the dhcpd service never starts up.
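If you want to re-check this after each restart without eyeballing the output, a small script can list every service that isn't running. This is a sketch that assumes each line of `maas status` starts with the service name followed by a state word; the exact layout may differ between MAAS versions, and the sample text below is made up rather than captured from this server.

```python
import subprocess


def parse_status(text: str) -> dict[str, str]:
    """Map service name -> lower-cased state word.

    Assumes lines look like '<name> <STATE> [details...]'; the real
    layout of `maas status` may differ between MAAS versions.
    """
    services: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            services[fields[0]] = fields[1].lower()
    return services


def not_running(services: dict[str, str]) -> list[str]:
    """Return the sorted names of services whose state is not 'running'."""
    return sorted(name for name, state in services.items() if state != "running")


def read_live_status() -> str:
    """Run `maas status` and return its stdout (run this script with sudo).

    check=False because the command may exit non-zero when some
    services are stopped, and we still want the output.
    """
    result = subprocess.run(["maas", "status"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical sample text, not output captured from the poster's server.
    sample = (
        "bind9     RUNNING   pid 1001, uptime 0:05:00\n"
        "dhcpd     STOPPED   Not started\n"
        "dhcpd6    STOPPED   Not started\n"
        "rackd     RUNNING   pid 1002, uptime 0:05:00\n"
    )
    print(not_running(parse_status(sample)))  # ['dhcpd', 'dhcpd6']
```

On the real server you would call `not_running(parse_status(read_live_status()))` instead of using the sample.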

The only thing out of the ordinary with the MAAS server is that it is reporting a redundant PSU error/failure. I’m not sure whether that could affect a network interface, but the server is otherwise running fine. I’m only mentioning it because I saw the forum thread “How to enable DHCP server?” (post #3).

I’m not sure whether I can start the service manually, or which logs would be relevant.

You might want to check the regiond and rackd logs to get more info and understand what’s going on.
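A quick way to follow that suggestion on a snap install is to pull every [error] and [critical] line out of the main MAAS logs in one pass. This sketch assumes the logs live under /var/snap/maas/common/log (the usual location for the snap; check yours) and that you run it with sudo, since those files may not be world-readable.

```python
from pathlib import Path

# Assumed default log directory for a snap install of MAAS; adjust if yours differs.
DEFAULT_LOG_DIR = Path("/var/snap/maas/common/log")
LOG_FILES = ["regiond.log", "rackd.log", "maas.log", "http/error.log"]
MARKERS = ("[error]", "[critical]")


def problem_lines(log_dir: Path = DEFAULT_LOG_DIR) -> dict[str, list[str]]:
    """Return, per log file, every line containing an [error] or [critical] marker."""
    found: dict[str, list[str]] = {}
    for name in LOG_FILES:
        path = log_dir / name
        if not path.is_file():
            continue  # skip logs that don't exist on this host
        with path.open(encoding="utf-8", errors="replace") as fh:
            hits = [line.rstrip("\n") for line in fh if any(m in line for m in MARKERS)]
        if hits:
            found[name] = hits
    return found


if __name__ == "__main__":
    for name, hits in problem_lines().items():
        print(f"== {name}: {len(hits)} error/critical lines (last 5 shown)")
        for line in hits[-5:]:
            print("   " + line)
```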

Thanks. I’ll include some info from the logs. The most common entries seem to be related to the following:

In regiond.log:

2024-06-26 13:52:11 twisted.scripts: [info] twistd 22.1.0 (/snap/maas/34087/bin/python3 3.10.12) starting up.
2024-06-26 13:52:11 twisted.scripts: [info] reactor class: twisted.internet.asyncioreactor.AsyncioSelectorReactor.
2024-06-26 13:52:11 maasserver.eventloop_1308.master: [info] Calling start_up to start region process
2024-06-26 13:52:18 maasserver.regiondservices.active_discovery: [info] Active network discovery: Discovery interval set to 3600 seconds.
2024-06-26 13:52:18 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller '<maas_hostname>' (mhq6xb).
2024-06-26 13:52:18 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
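The regiond.log lines above (and the rackd.log lines below) share one layout: date, time, logger name, a bracketed level, then the message. A small parser for that layout makes it easy to pull out only the errors, such as the ScanNetworks failure at 13:52:18. This is a sketch based only on the lines shown in this thread; lines that don't fit the pattern (for example traceback lines) are skipped rather than parsed.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Layout seen in regiond.log and rackd.log:
#   "2024-06-26 13:52:18 maasserver: [error] Error while calling ScanNetworks: ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+): \[(\w+)\] (.*)$")


class Entry(NamedTuple):
    when: datetime
    source: str   # logger name, e.g. "maasserver"
    level: str    # "info", "error", "critical", ...
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None for lines that don't match (e.g. tracebacks)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, source, level, message = m.groups()
    return Entry(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), source, level, message)


def entries_at_level(lines: Iterable[str], level: str) -> list[Entry]:
    """Keep only the parsed entries with the given level."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level == level]


if __name__ == "__main__":
    import sys
    # Pass the log path as the first argument, e.g. .../common/log/regiond.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for e in entries_at_level(fh, "error"):
            print(e.when.isoformat(sep=" "), e.source, e.message)
```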

In rackd.log:

2024-06-26 13:52:14 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:14 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:15 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:15 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:16 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:16 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:16 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:17 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:17 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:17 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:18 provisioningserver.rpc.clusterservice: [info] Region is not advertising RPC endpoints. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS)
2024-06-26 13:52:18 provisioningserver.rpc.clusterservice: [info] Region not available: Connection was refused by other side: 111: Connection refused. (While requesting RPC info at http://<maas_server_ip>:5240/MAAS).
2024-06-26 13:52:29 provisioningserver.rpc.clusterservice: [info] Making connections to event-loops: <maas_hostname>:pid=1554, <maas_hostname>:pid=1555, <maas_hostname>:pid=1556, <maas_hostname>:pid=1557
2024-06-26 13:52:29 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=44422, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=5251, flowInfo=0, scopeID=0))
2024-06-26 13:52:29 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:<maas_dhcp_interface_ip>', port=46712, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:<maas_dhcp_interface_ip>', port=5250, flowInfo=0, scopeID=0))
2024-06-26 13:52:29 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=45196, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=5253, flowInfo=0, scopeID=0))
2024-06-26 13:52:29 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=33558, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:<maas_server_ip>', port=5252, flowInfo=0, scopeID=0))
2024-06-26 13:52:29 provisioningserver.rpc.clusterservice: [info] Event-loop '<maas_hostname>:pid=1555' authenticated.
2024-06-26 13:52:29 provisioningserver.rpc.clusterservice: [info] Event-loop '<maas_hostname>:pid=1554' authenticated.
2024-06-26 13:52:29 provisioningserver.rpc.clusterservice: [info] Event-loop '<maas_hostname>:pid=1557' authenticated.
2024-06-26 13:52:30 provisioningserver.rpc.clusterservice: [info] Event-loop '<maas_hostname>:pid=1556' authenticated.
2024-06-26 13:52:30 provisioningserver.rpc.clusterservice: [info] Rack controller 'mhq6xb' registered (via <maas_hostname>:pid=1555) with MAAS version 3.4.1-14343-g.a552d2522.
2024-06-26 13:52:30 provisioningserver.rpc.clusterservice: [info] Fully connected to all 4 event-loops on all 1 region controllers (<maas_hostname>).
2024-06-26 13:52:30 HTTP11ClientProtocol,client: [critical] Unhandled Error
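Most of this rackd.log excerpt is the same “Region is not advertising RPC endpoints” message repeated. Collapsing identical messages into a count with first/last timestamps makes the start-up sequence easier to follow: in this excerpt there are retries from 13:52:14 to 13:52:18, a refused connection at 13:52:18, and then all four event-loops connected by 13:52:30. The [critical] Unhandled Error at 13:52:30 is probably worth looking up in the full rackd.log. This is a sketch that assumes the same line layout as shown above.

```python
import re
from datetime import datetime
from typing import Iterable

# Same "<date> <time> <source>: [<level>] <message>" layout as rackd.log.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+): \[(\w+)\] (.*)$")

Key = tuple[str, str, str]
Stats = tuple[int, datetime, datetime]


def collapse(lines: Iterable[str]) -> dict[Key, Stats]:
    """Group identical (source, level, message) triples.

    Each value is (count, first_seen, last_seen). Dicts keep insertion
    order, so groups come out in order of first appearance.
    """
    groups: dict[Key, Stats] = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that aren't log entries
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        key = m.group(2, 3, 4)
        if key in groups:
            count, first, _ = groups[key]
            groups[key] = (count + 1, first, ts)
        else:
            groups[key] = (1, ts, ts)
    return groups


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (source, level, message), (count, first, last) in collapse(fh).items():
            print(f"{count:>4}x {first:%H:%M:%S} to {last:%H:%M:%S} [{level}] {message[:100]}")
```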

In maas.log:

2024-06-26T13:52:23.708334+00:00 <maas_hostname> maas.bootsources: [info] Updated boot sources cache.
2024-06-26T13:52:26.487103+00:00 <maas_hostname> maas.networks.monitor: [info] networks-monitoring: Process ID 1307 assumed responsibility.
2024-06-26T13:52:26.487110+00:00 <maas_hostname> maas.networks.monitor: [info] version-update-check: Process ID 1307 assumed responsibility.
2024-06-26T13:52:26.487124+00:00 <maas_hostname> maas.dhcp.probe: [error] Can't initiate DHCP probe; no RPC connection to region.
2024-06-26T13:52:28.728396+00:00 <maas_hostname> maas.service_monitor: [info] Service 'maas-http' has been restarted. Its current state is 'on' and 'running'.
2024-06-26T13:52:28.739469+00:00 <maas_hostname> maas.networks.monitor: [info] certificate-expiration-check: Process ID 1308 assumed responsibility.
2024-06-26T13:52:28.739474+00:00 <maas_hostname> maas.networks.monitor: [info] vault-secrets-cleanup: Process ID 1308 assumed responsibility.
2024-06-26T13:52:28.739478+00:00 <maas_hostname> maas.service_monitor: [info] Service 'maas-syslog' is not on, it will be started.
2024-06-26T13:52:28.739482+00:00 <maas_hostname> maas.service_monitor: [error] Service 'maas-syslog' failed to start. Its current state is 'off' and 'dead'.
2024-06-26T13:52:28.739485+00:00 <maas_hostname> maas.service_monitor: [info] Service 'maas-http' has been restarted. Its current state is 'on' and 'running'.
2024-06-26T13:52:29.523940+00:00 <maas_hostname> maas.rpc.rackcontrollers: [info] Existing rack controller '<maas_hostname>' running version 3.4.1-14343-g.a552d2522 has connected to region '<maas_hostname>'.
2024-06-26T13:52:29.639829+00:00 <maas_hostname> maas.service_monitor: [info] Service 'chrony' has been restarted. Its current state is 'on' and 'running'.
2024-06-26T13:52:30.563655+00:00 <maas_hostname> maas.rpc.rackcontrollers: [info] Existing rack controller '<maas_hostname>' running version 3.4.1-14343-g.a552d2522 has connected to region '<maas_hostname>'.
2024-06-26T13:52:30.739649+00:00 <maas_hostname> maas.rpc.rackcontrollers: message repeated 2 times: [ [info] Existing rack controller '<maas_hostname>' running version 3.4.1-14343-g.a552d2522 has connected to region '<maas_hostname>'.]
2024-06-26T13:52:30.748235+00:00 <maas_hostname> maas.refresh: [info] Refreshing rack controller hardware information.
2024-06-26T13:52:39.537586+00:00 <maas_hostname> maas.service_monitor: [info] Service 'maas-syslog' has been restarted. Its current state is 'on' and 'running'.
2024-06-26T13:52:42.787912+00:00 <maas_hostname> maas.service_monitor: [info] Service 'maas-http' has been restarted. Its current state is 'on' and 'running'.
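maas.log records what the service monitor reports for each service. Reducing those lines to the last reported state per service shows, for example, that maas-syslog failed to start at 13:52:28 but was on and running by 13:52:39. None of the service_monitor lines in this excerpt mention dhcpd, so running the same reduction over the full maas.log may show whether MAAS reported anything about dhcpd at all. This sketch matches only the “Its current state is '…' and '…'” wording seen above.

```python
import re
from typing import Iterable

# maas.log service_monitor lines that carry a state, e.g.
#   "... maas.service_monitor: [info] Service 'maas-http' has been restarted.
#    Its current state is 'on' and 'running'."
STATE_RE = re.compile(r"Service '([^']+)'.*Its current state is '([^']+)' and '([^']+)'")


def latest_states(lines: Iterable[str]) -> dict[str, tuple[str, str]]:
    """Return the last reported pair of state words per service (later lines win)."""
    states: dict[str, tuple[str, str]] = {}
    for line in lines:
        if "maas.service_monitor" not in line:
            continue
        m = STATE_RE.search(line)
        if m:
            states[m.group(1)] = (m.group(2), m.group(3))
    return states


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        states = latest_states(fh)
    for name, (status, process) in sorted(states.items()):
        print(f"{name:<16} {status:<4} {process}")
    if not any(name.startswith("dhcpd") for name in states):
        print("no service_monitor state lines mention dhcpd")
```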

Several errors in http/error.log like this:

2024/06/26 13:52:12 [error] 1323#1323: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1323#1323: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1323#1323: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1323#1323: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1325#1325: *6 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1325#1325: *6 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1325#1325: *6 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1325#1325: *6 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:12 [error] 1325#1325: *7 no live upstreams while connecting to upstream, client: <maas_dhcp_interface_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://regiond-webapp/MAAS/rpc/", host: "<maas_dhcp_interface_ip>"
2024/06/26 13:52:13 [error] 1359#1359: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:13 [error] 1359#1359: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.1:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:13 [error] 1359#1359: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.2:/MAAS/rpc/", host: "<maas_ip>"
2024/06/26 13:52:13 [error] 1359#1359: *1 connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3 failed (111: Unknown error) while connecting to upstream, client: <maas_ip>, server: , request: "GET /MAAS/rpc/ HTTP/1.1", upstream: "http://unix:/var/snap/maas/34087/maas-regiond-webapp.sock.3:/MAAS/rpc/", host: "<maas_ip>"
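Every nginx error in this excerpt is a refused connection (errno 111 is ECONNREFUSED on Linux) to one of the four regiond webapp sockets, and all of them fall at 13:52:12 to 13:52:13, a second or two after regiond’s twistd started at 13:52:11 according to regiond.log. That suggests, but does not prove, that they are start-up errors from nginx trying the sockets before regiond was listening. Counting them per socket over the whole http/error.log would show whether they keep happening later. This is a sketch that matches only the message wording shown above.

```python
import re
from collections import Counter
from typing import Iterable

# nginx entries like:
#   "connect() to unix:/var/snap/maas/34087/maas-regiond-webapp.sock.0 failed (111: Unknown error) ..."
CONNECT_RE = re.compile(r"connect\(\) to (unix:\S+) failed \((\d+): ")


def failed_sockets(lines: Iterable[str]) -> Counter:
    """Count failed upstream connects per (socket, errno)."""
    counts: Counter = Counter()
    for line in lines:
        m = CONNECT_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


def no_live_upstreams(lines: Iterable[str]) -> int:
    """Count entries where nginx had no usable upstream left to try."""
    return sum("no live upstreams" in line for line in lines)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        lines = fh.readlines()
    for (sock, errno_), n in failed_sockets(lines).most_common():
        print(f"{n:>4}x errno {errno_} {sock}")
    print("no live upstreams:", no_live_upstreams(lines))
```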

I’m not sure if I can upload the full logs or not, but let me know if that would help.

You can upload the full logs somewhere and share the link.

maas.log: https://pastebin.com/5b8TbhBD
regiond.log: https://pastebin.com/efYfiQd7
rackd.log: https://pastebin.com/GJ8CcHMn
http/error.log: https://pastebin.com/JZQuzHAc

Is it similar to this situation?