Howdy
This one is driving me nuts a bit.
I have a small(?) installation (MAAS 3.3.4, DEB) with about 25 machines.
Yet listing the machines, both in the GUI and on the CLI, takes an awful lot of time (sometimes more than 10 minutes).
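To put a number on the slow listing, the CLI call can be timed a few times in a row. A minimal Python sketch, assuming a CLI profile has already been created with `maas login` (the profile name `admin` below is hypothetical) and using the `maas <profile> machines read` command:

```python
import subprocess
import sys
import time


def time_command(argv: list[str], runs: int = 3) -> list[float]:
    """Run argv `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
    return durations
```

For example, `time_command(["maas", "admin", "machines", "read"])` shows whether every call is slow or only some of them.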
While a listing is running, rackd seems to time out when contacting regiond (same server):
2023-08-09 15:01:53 provisioningserver.rpc.clusterservice: [info] Rack controller 'yadpqf' registered (via deploy:pid=1167) with MAAS version 3.3.4-13189-g.f88272d1e.
2023-08-09 15:02:03 ClusterClient,client: [info] ClusterClient connection lost (HOST:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=46564, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=5251, flowInfo=0, scopeID=0))
2023-08-09 15:02:03 provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at http://$DEPLOY_B_IP:5240/MAAS).
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback( # type: ignore[misc]
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1299, in _doUpdate
    eventloops, maas_url = yield self._get_rpc_info(urls)
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1558, in _get_rpc_info
    raise config_exc
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1529, in _get_rpc_info
    eventloops, maas_url = yield self._parallel_fetch_rpc_info(urls)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback( # type: ignore[misc]
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1503, in handle_responses
    errors[0].raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1464, in _serial_fetch_rpc_info
    raise last_exc
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1456, in _serial_fetch_rpc_info
    response = yield self._fetch_rpc_info(url, orig_url)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1558, in _get_rpc_info
    raise config_exc
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1529, in _get_rpc_info
    eventloops, maas_url = yield self._parallel_fetch_rpc_info(urls)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback( # type: ignore[misc]
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1503, in handle_responses
    errors[0].raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1464, in _serial_fetch_rpc_info
    raise last_exc
  File "/usr/lib/python3/dist-packages/provisioningserver/rpc/clusterservice.py", line 1456, in _serial_fetch_rpc_info
    response = yield self._fetch_rpc_info(url, orig_url)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.defer.CancelledError: >]
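The traceback says rackd cancelled its HTTP request for RPC info before the region sent any response. One way to check whether regiond's HTTP side is slow independently of rackd is to request that URL yourself and time it. A minimal standard-library sketch; the `/MAAS/rpc/` path is my assumption about what rackd appends to the URL in the log, and the 10-second default timeout is an arbitrary choice:

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> tuple[str, float]:
    """GET `url` once and return (outcome, elapsed seconds).

    outcome is the HTTP status code as a string, or the exception class
    name if the request failed or timed out.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
            outcome = str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        outcome = str(exc.code)
    except OSError as exc:
        # URLError, connection refused, and socket timeouts all derive from OSError.
        outcome = type(exc).__name__
    return outcome, time.monotonic() - start
```

Running `probe("http://$DEPLOY_B_IP:5240/MAAS/rpc/")` (with the placeholder replaced by the real address) in a loop while a machine listing is in progress would show whether the region stops answering during the listing.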
The regiond logs around that time seem benign, though:
2023-08-09 15:01:53 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:$DEPLOY_A_IP:46624'.
2023-08-09 15:01:53 maasserver.ipc: [info] Worker pid:1167 registered RPC connection to ('yadpqf', '$DEPLOY_A_IP', 5251).
2023-08-09 15:01:54 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'deploy (yadpqf)'.
2023-08-09 15:01:54 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'deploy (yadpqf)'.
2023-08-09 15:02:03 RegionServer,3,::ffff:$DEPLOY_A_IP: [info] RegionServer connection lost (HOST:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=5251, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=46564, flowInfo=0, scopeID=0))
2023-08-09 15:02:03 maasserver.ipc: [info] Worker pid:1167 lost RPC connection to ('yadpqf', '$DEPLOY_A_IP', 5251).
2023-08-09 15:02:03 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'deploy (yadpqf)'.
2023-08-09 15:02:03 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'deploy (yadpqf)'.
2023-08-09 15:02:23 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=5251, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=51808, flowInfo=0, scopeID=0))
2023-08-09 15:02:23 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=5251, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=51824, flowInfo=0, scopeID=0))
2023-08-09 15:02:23 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=5251, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:$DEPLOY_A_IP', port=51832, flowInfo=0, scopeID=0))
2023-08-09 15:02:23 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:$DEPLOY_A_IP:51808'.
2023-08-09 15:02:23 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:$DEPLOY_A_IP:51824'.
2023-08-09 15:02:23 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:$DEPLOY_A_IP:51832'.
2023-08-09 15:02:24 maasserver.ipc: [info] Worker pid:1167 registered RPC connection to ('yadpqf', '$DEPLOY_A_IP', 5251).
2023-08-09 15:02:24 maasserver.ipc: [info] Worker pid:1167 registered RPC connection to ('yadpqf', '$DEPLOY_A_IP', 5251).
2023-08-09 15:02:24 maasserver.ipc: [info] Worker pid:1167 registered RPC connection to ('yadpqf', '$DEPLOY_A_IP', 5251).
2023-08-09 15:02:25 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'deploy (yadpqf)'.
2023-08-09 15:02:25 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'deploy (yadpqf)'.
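Lining up only the connection events from both logs makes the pattern easier to see: registered at 15:01:53, lost at 15:02:03, re-established at 15:02:23. A small Python sketch that extracts those events and the gaps between them; the regular expressions only match the line shapes quoted above, and `/var/log/maas/regiond.log` and `/var/log/maas/rackd.log` as the files to feed it are an assumption for a DEB install:

```python
import re
from datetime import datetime

# The "YYYY-MM-DD HH:MM:SS" prefix used by the MAAS log lines above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")
# Connection events worth lining up; checked in this order, first match wins.
EVENTS = {
    "lost": re.compile(r"\blost\b"),
    "established": re.compile(r"\bestablished\b"),
    "registered": re.compile(r"\bregistered\b"),
}


def connection_events(lines):
    """Yield (timestamp, event name) for each log line mentioning a connection event."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # traceback continuation lines carry no timestamp
        for name, pattern in EVENTS.items():
            if pattern.search(line):
                yield datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"), name
                break


def print_timeline(lines):
    """Print each event with the seconds elapsed since the previous one."""
    previous = None
    for ts, name in connection_events(lines):
        gap = "" if previous is None else f" (+{(ts - previous).total_seconds():.0f} s)"
        print(f"{ts:%H:%M:%S} {name}{gap}")
        previous = ts
```

For example, `print_timeline(open("/var/log/maas/regiond.log"))` prints one line per event, so a fixed interval between "registered" and "lost" would stand out.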
The machine is a VM with all of rackd, regiond, and PostgreSQL installed.
It has 16 GB of RAM and 16 cores.
How do I start to debug this?
I already tried setting num_workers: 8 in regiond.conf.
Best regards
-Tobias