Our rackd controllers are not registering with regiond. We took snapshots and backups of a mostly working 3.4.4 system and upgraded to 3.4.7. We had problems with the upgrade and decided to revert for now, using our usual method: restore the database backup, revert the rackd machines to their snapshots, and turn on the cloned copy of regiond. This time it hasn’t worked.
As in the post "Rack controller not connected after upgrade to 3.2 (region endpoints not exposed)", we see no endpoints:
root@maas-rackd-01:~# curl -L https://maas.pawsey.org.au/MAAS/rpc/; echo
{"eventloops": {}}
Using that post as a reference, we found public.maasserver_regioncontrollerprocessendpoint to be empty. We put 4 records back in there from the backup (modified to use current ids), and then things started to work.
root@maas-nimbus-rackd:~# curl -L https://maas.pawsey.org.au/MAAS/rpc
{"eventloops": {"maas:pid=15870": [["$regiondIP", 5252]], "maas:pid=15871": [["$regiondIP", 5251]], "maas:pid=15872": [["$regiondIP", 5253]], "maas:pid=15873": [["$regiondIP", 5250]]}}
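To make these responses easier to compare, here is a minimal Python sketch that flattens the "eventloops" map into (eventloop, address, port) tuples. It assumes only the JSON shape shown in the output above. The function name list_endpoints and the address 10.0.0.1 (standing in for $regiondIP) are made up for illustration.

```python
import json


def list_endpoints(body: str) -> list[tuple[str, str, int]]:
    """Flatten a /MAAS/rpc/ JSON body into sorted (eventloop, address, port) tuples."""
    eventloops = json.loads(body).get("eventloops", {})
    return sorted(
        (name, address, port)
        for name, pairs in eventloops.items()
        for address, port in pairs
    )


if __name__ == "__main__":
    # Sample body shaped like the response above; 10.0.0.1 is a placeholder address.
    sample = (
        '{"eventloops": {"maas:pid=15870": [["10.0.0.1", 5252]], '
        '"maas:pid=15873": [["10.0.0.1", 5250]]}}'
    )
    for name, address, port in list_endpoints(sample):
        print(f"{name} -> {address}:{port}")
    # An empty list corresponds to the broken state: no endpoints exposed.
    print(list_endpoints('{"eventloops": {}}'))
```

An empty result corresponds to the {"eventloops": {}} response we see when the rack controllers cannot connect.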
The controllers page in the web UI started to look better for nearly a minute. Then the controllers all dropped back to dead, the rpc curl returned empty again, and the database entries were gone. We enabled debug logging, and sure enough the logs show the entries being deleted. So we recreated them with current timestamps, on the understanding that they are replaced after 60 seconds. No luck. The debug logs also show attempts to create the entries, but we see no evidence of them in the DB:
2025-05-06 15:22:14 django.db.backends: [debug] (0.000) INSERT INTO "maasserver_regioncontrollerprocessendpoint" ("created", "updated", "process_id", "address", "port") VALUES ('2025-05-06T15:22:13.999466'::timestamp, '2025-05-06T15:22:13.999466'::timestamp, 6819, '$regiondIP'::inet, 5253) RETURNING "maasserver_regioncontrollerprocessendpoint"."id"; args=(datetime.datetime(2025, 5, 6, 15, 22, 13, 999466), datetime.datetime(2025, 5, 6, 15, 22, 13, 999466), 6819, Inet('$regiondIP'), 5253)
2025-05-06 15:22:14 django.db.backends: [debug] (0.061) INSERT INTO "maasserver_regioncontrollerprocessendpoint" ("created", "updated", "process_id", "address", "port") VALUES ('2025-05-06T15:22:14.021752'::timestamp, '2025-05-06T15:22:14.021752'::timestamp, 6819, '$regiondIP'::inet, 5253) RETURNING "maasserver_regioncontrollerprocessendpoint"."id"; args=(datetime.datetime(2025, 5, 6, 15, 22, 14, 21752), datetime.datetime(2025, 5, 6, 15, 22, 14, 21752), 6819, Inet('$regiondIP'), 5253)
I can paste that INSERT into a psql session and the row is created, but then it gets deleted again.
CPU is also pegged at 100% on regiond, and there are over 1000 events in the queue, which look like attempted machine status updates, presumably failing because of the controller issues.
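To pin down exactly when the endpoints vanish relative to the debug log timestamps, something like the following Python sketch could poll the rpc URL and record each change in the number of advertised eventloops. It is only a sketch: the names fetch_count and watch, the 5 second interval and the 24 rounds are our own assumptions, and it relies solely on the JSON shape returned by the curl commands above.

```python
import json
import time
import urllib.request
from datetime import datetime

RPC_URL = "https://maas.pawsey.org.au/MAAS/rpc/"  # same URL as the curl commands in this post


def fetch_count(url: str = RPC_URL) -> int:
    """Return how many eventloops the region currently advertises."""
    # urlopen follows redirects by default, similar to curl -L.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return len(json.load(resp).get("eventloops", {}))


def watch(fetch=fetch_count, interval=5.0, rounds=24, sleep=time.sleep):
    """Poll `fetch` and return a (timestamp, count) pair for every change in the count."""
    changes, last = [], None
    for i in range(rounds):
        count = fetch()
        if count != last:
            # Record only transitions, e.g. 4 -> 0 when the rows are deleted.
            changes.append((datetime.now().isoformat(timespec="seconds"), count))
            last = count
        if i < rounds - 1:
            sleep(interval)
    return changes

# Example use (performs real HTTP requests):
#   for stamp, count in watch():
#       print(stamp, "eventloops:", count)
```

With the defaults this polls for roughly two minutes, which should cover one full create-and-delete cycle if the 60 second figure is right.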
$ maas $USER events query limit=1000 | jq '.events | length'
1000
Please help!