Hello guys!
I will first try to explain what’s happening and then proceed with the question.
I’ve noticed that some of my deploys occasionally fail because the deploy gets stuck at certain steps, for example:
HTTP Request - /images/ubuntu/amd64/ga-22.04/jammy/stable/boot-initrd
or
Node installation - 'cloudinit' searching for network data from DataSourceMAAS
I saw that this happens when some of the MaaS services restart.
Example of one Rack Controller status:
It’s worth saying that sometimes only the proxy service restarts, but almost every time it’s proxy, http and syslog restarting together.
Then I dug into the MaaS code and found that there are some Postgres triggers, for example:
subnet_sys_proxy_subnet_insert
subnet_sys_proxy_subnet_update
subnet_sys_proxy_subnet_delete
These triggers are responsible for restarting the proxy on the regions.
I started searching for something similar for Rack Controllers, but I haven’t found anything yet.
Another thing I noticed is that the table maasserver_regioncontrollerprocessendpoint is updated a lot, with its rows changing constantly. My guess is that the services restart every time MaaS renews its connections between Rack Controllers and Regions (and, consequently, the table rows). Does that make sense?
Example of the table:
maasv2=# select * from maasserver_regioncontrollerprocessendpoint;
id | created | updated | address | port | process_id
--------+-------------------------------+-------------------------------+----------------+------+------------
316390 | 2024-01-10 13:48:20.719268+00 | 2024-01-10 13:48:20.719268+00 | IP_HERE | 5251 | 8440
316398 | 2024-01-10 13:48:40.321683+00 | 2024-01-10 13:48:40.321683+00 | IP_HERE | 5252 | 8483
316367 | 2024-01-10 13:36:29.920189+00 | 2024-01-10 13:36:29.920189+00 | IP_HERE | 5250 | 8421
316368 | 2024-01-10 13:36:30.771181+00 | 2024-01-10 13:36:30.771181+00 | IP_HERE | 5251 | 8425
316369 | 2024-01-10 13:36:31.735024+00 | 2024-01-10 13:36:31.735024+00 | IP_HERE | 5252 | 8428
316370 | 2024-01-10 13:36:32.890751+00 | 2024-01-10 13:36:32.890751+00 | IP_HERE | 5253 | 8430
316384 | 2024-01-10 13:48:09.747454+00 | 2024-01-10 13:48:09.747454+00 | IP_HERE | 5250 | 8431
316387 | 2024-01-10 13:48:17.713554+00 | 2024-01-10 13:48:17.713554+00 | IP_HERE | 5253 | 8438
316391 | 2024-01-10 13:48:27.187671+00 | 2024-01-10 13:48:27.187671+00 | IP_HERE | 5252 | 8435
316400 | 2024-01-10 13:50:05.606275+00 | 2024-01-10 13:50:05.606275+00 | IP_HERE | 5250 | 8488
316401 | 2024-01-10 13:50:07.736542+00 | 2024-01-10 13:50:07.736542+00 | IP_HERE | 5251 | 8491
316402 | 2024-01-10 13:50:09.913182+00 | 2024-01-10 13:50:09.913182+00 | IP_HERE | 5253 | 8493
(12 rows)
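To see what this single snapshot says on its own, here is a small Python sketch that parses the rows pasted above and groups them by port. It only looks at the data in this post. The parsing assumes psql’s default aligned output, and `parse_rows` and `summarize` are hypothetical helper names I made up for the example. My reading that each port is one regiond RPC worker endpoint (so three rows per port would match the 3 regions) is an assumption I have not confirmed in the MaaS code.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the psql output above (address is redacted as IP_HERE in this post).
SNAPSHOT = """\
316390 | 2024-01-10 13:48:20.719268+00 | 2024-01-10 13:48:20.719268+00 | IP_HERE | 5251 | 8440
316398 | 2024-01-10 13:48:40.321683+00 | 2024-01-10 13:48:40.321683+00 | IP_HERE | 5252 | 8483
316367 | 2024-01-10 13:36:29.920189+00 | 2024-01-10 13:36:29.920189+00 | IP_HERE | 5250 | 8421
316368 | 2024-01-10 13:36:30.771181+00 | 2024-01-10 13:36:30.771181+00 | IP_HERE | 5251 | 8425
316369 | 2024-01-10 13:36:31.735024+00 | 2024-01-10 13:36:31.735024+00 | IP_HERE | 5252 | 8428
316370 | 2024-01-10 13:36:32.890751+00 | 2024-01-10 13:36:32.890751+00 | IP_HERE | 5253 | 8430
316384 | 2024-01-10 13:48:09.747454+00 | 2024-01-10 13:48:09.747454+00 | IP_HERE | 5250 | 8431
316387 | 2024-01-10 13:48:17.713554+00 | 2024-01-10 13:48:17.713554+00 | IP_HERE | 5253 | 8438
316391 | 2024-01-10 13:48:27.187671+00 | 2024-01-10 13:48:27.187671+00 | IP_HERE | 5252 | 8435
316400 | 2024-01-10 13:50:05.606275+00 | 2024-01-10 13:50:05.606275+00 | IP_HERE | 5250 | 8488
316401 | 2024-01-10 13:50:07.736542+00 | 2024-01-10 13:50:07.736542+00 | IP_HERE | 5251 | 8491
316402 | 2024-01-10 13:50:09.913182+00 | 2024-01-10 13:50:09.913182+00 | IP_HERE | 5253 | 8493
"""


def parse_ts(value):
    # psql prints the UTC offset as "+00"; older fromisoformat() versions need "+00:00".
    return datetime.fromisoformat(value.replace("+00", "+00:00"))


def parse_rows(text):
    """Turn aligned psql rows (id | created | updated | address | port | process_id) into dicts."""
    rows = []
    for line in text.strip().splitlines():
        row_id, created, updated, address, port, process_id = (c.strip() for c in line.split("|"))
        rows.append({
            "id": int(row_id),
            "created": parse_ts(created),
            "updated": parse_ts(updated),
            "address": address,
            "port": int(port),
            "process_id": int(process_id),
        })
    return rows


def summarize(rows):
    """Group process_ids per port, measure the creation time span, check created == updated."""
    process_ids_by_port = defaultdict(list)
    for row in rows:
        process_ids_by_port[row["port"]].append(row["process_id"])
    created = [row["created"] for row in rows]
    return {
        "process_ids_by_port": {p: sorted(ids) for p, ids in sorted(process_ids_by_port.items())},
        "created_span": max(created) - min(created),
        "never_updated": all(row["created"] == row["updated"] for row in rows),
    }


summary = summarize(parse_rows(SNAPSHOT))
for port, ids in summary["process_ids_by_port"].items():
    print(port, ids)
print("all rows created within", summary["created_span"])
print("created == updated for every row:", summary["never_updated"])
```

For these 12 rows, each port 5250-5253 has three different process_id values, all rows were created within roughly 14 minutes, and created equals updated on every row. That seems to fit rows being inserted fresh rather than updated in place, but it is only an inference from one snapshot.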
My other question is: what causes MaaS to renew these RC <-> Region connections (and update the table all the time)? Why isn’t it stable? Is there something I can do to avoid this behavior?
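To put a number on "all the time", one option is to run the same SELECT twice, a few minutes apart, and diff the two results. Below is a minimal Python sketch of that diff. `Endpoint` and `endpoint_churn` are hypothetical names for illustration, and it assumes that (address, port) identifies a single listening endpoint within one snapshot. That assumption does not hold for the redacted IP_HERE rows in this post, so it needs the real addresses.

```python
from typing import Iterable, NamedTuple


class Endpoint(NamedTuple):
    # The columns of maasserver_regioncontrollerprocessendpoint used here (timestamps dropped).
    id: int
    address: str
    port: int
    process_id: int


def endpoint_churn(before: Iterable[Endpoint], after: Iterable[Endpoint], minutes_between: float) -> dict:
    """Diff two snapshots of the endpoint table taken minutes_between apart."""
    if minutes_between <= 0:
        raise ValueError("minutes_between must be positive")
    before = list(before)
    after = list(after)
    # Row ids present only in the second snapshot, i.e. rows inserted in between.
    before_ids = {e.id for e in before}
    new_row_ids = sorted(e.id for e in after if e.id not in before_ids)
    # Assumption: (address, port) is unique within a snapshot.
    before_by_key = {(e.address, e.port): e.process_id for e in before}
    after_by_key = {(e.address, e.port): e.process_id for e in after}
    # Endpoints that still exist but now point at a different process row.
    new_process = sorted(
        key for key in before_by_key.keys() & after_by_key.keys()
        if before_by_key[key] != after_by_key[key]
    )
    return {
        "new_row_ids": new_row_ids,
        "endpoints_with_new_process": new_process,
        "new_rows_per_minute": len(new_row_ids) / minutes_between,
    }


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real MaaS database.
    first = [Endpoint(1, "10.0.0.1", 5250, 10), Endpoint(2, "10.0.0.1", 5251, 11)]
    second = [Endpoint(3, "10.0.0.1", 5250, 12), Endpoint(2, "10.0.0.1", 5251, 11)]
    print(endpoint_churn(first, second, minutes_between=5))
```

If the rate of new rows lines up with the times the proxy, http and syslog services restart on the Rack Controllers, that would at least support the guess above. It would not prove that one causes the other.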
My setup has 3 regions running with HA and 19 RCs.
Looking forward to your help!
Thanks!