Hello! One of our MAAS environments exhibited a strange sequence of log messages yesterday. I have the logs from syslog, since unfortunately the journal had rolled over by the time I looked:
2026-04-16T15:18:29.264719+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.263Z","msg":"transaction rollback error","error":"sql: transaction has already been committed or rolled back","logging-call-at":"common.go:83","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence/sql.(*SqlStore).txExecute\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/sql/common.go:83\ngo.temporal.io/server/common/persistence/sql.(*sqlTaskManager).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/sql/task.go:149\ngo.temporal.io/server/common/persistence.(*taskManagerImpl).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/task_manager.go:122\ngo.temporal.io/server/common/persistence.(*taskRateLimitedPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_rate_limited_clients.go:523\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:598\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:700\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/backoff/retry.go:143\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:704\ngo.temporal.io/server/service/matching.(*taskQueueDB).UpdateState\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/db.go:177\ngo.temporal.io/server/service/matching.(*taskReader).persistAckLevel\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:304\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:202\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
2026-04-16T15:18:29.264956+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.264Z","msg":"Operation failed with internal error.","error":"Failed to lock task queue. Error: context canceled","operation":"UpdateTaskQueue","logging-call-at":"persistence_metric_clients.go:1314","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence.updateErrorMetric\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:1314\ngo.temporal.io/server/common/persistence.(*metricEmitter).recordRequestMetrics\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:1291\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:596\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:598\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:700\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/backoff/retry.go:143\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:704\ngo.temporal.io/server/service/matching.(*taskQueueDB).UpdateState\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/db.go:177\ngo.temporal.io/server/service/matching.(*taskReader).persistAckLevel\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:304\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:202\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
2026-04-16T15:18:29.265103+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.264Z","msg":"Persistent store operation failure","component":"matching-engine","wf-task-queue-name":"/_sys/4y84xy@agent:main/2","wf-task-queue-type":"Workflow","wf-namespace":"default","worker-build-id":"_unversioned_","store-operation":"update-task-queue","error":"Failed to lock task queue. Error: context canceled","logging-call-at":"task_reader.go:205","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:205\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
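(As an aside, in case anyone wants to pull the same entries on their own controller: the lines above were fished out of syslog with roughly the snippet below, which just filters the maas-temporal error records and pretty-prints their JSON portion. The /var/log/syslog path is the Ubuntu default and an assumption about your setup.)

```python
# Rough sketch: extract maas-temporal error entries from syslog and
# pretty-print the JSON portion of each line.
# /var/log/syslog is the stock Ubuntu location; adjust as needed.
import json

with open("/var/log/syslog") as fh:
    for line in fh:
        if "maas-temporal" not in line or '"level":"error"' not in line:
            continue
        # The JSON document starts at the first "{" after the syslog prefix.
        start = line.find("{")
        if start == -1:
            continue
        try:
            print(json.dumps(json.loads(line[start:]), indent=2))
        except json.JSONDecodeError:
            # Fall back to the raw line if it does not parse cleanly.
            print(line.rstrip())
```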
This sequence seems fairly scary, especially because ever since it occurred we have seen the following error repeating, associated with DHCP server restarts:
2026-04-16T15:20:02.557714+00:00 smaasc1008 maas-agent[763896]: ERR Activity error. ActivityType=apply-dhcp-config-via-omapi Attempt=1 Error="unable to decode the activity function input payload with error: payload item 0: unable to decode: invalid IP address: None for function name: apply-dhcp-config-via-omapi" Namespace=default RunID=b3011ae7-bff5-45ac-942e-10da2118aca2 TaskQueue=4y84xy@agent:main WorkerID=4y84xy@agent:763896 WorkflowID=configure-dhcp:4y84xy
Unfortunately, the DHCP server restarts also seem to be disrupting PXE boot and DHCP lease renewals for MAAS-managed machines, causing various automated workflows to fail.
A well-intentioned but admittedly desperate attempt to recover the system by rebooting failed to effect any change in behavior.
I’m unclear what cleanup is needed: the MAAS controller’s IP address is reported correctly through the web UI, so I’m struggling to determine where the IP address was set to None. Any help would be greatly appreciated - thank you in advance!
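In case it is useful to anyone looking at this: below is the sketch I was planning to use to dump the inputs that were actually scheduled for the apply-dhcp-config-via-omapi activity, using the temporalio Python SDK, to see exactly which field arrives as None. The workflow ID and namespace come from the maas-agent log above, but the localhost:7233 address is just the upstream Temporal default and an assumption on my part; it may well not be how MAAS wires up its embedded Temporal, so treat both the address and the approach as a guess.

```python
# Sketch only: print the scheduled inputs of the apply-dhcp-config-via-omapi
# activity for the configure-dhcp:4y84xy workflow.
# Assumes the embedded Temporal frontend is reachable on localhost:7233
# (the upstream default), which may not match a real MAAS deployment.
import asyncio

from temporalio.client import Client


async def main() -> None:
    client = await Client.connect("localhost:7233", namespace="default")
    handle = client.get_workflow_handle("configure-dhcp:4y84xy")
    async for event in handle.fetch_history_events():
        if not event.HasField("activity_task_scheduled_event_attributes"):
            continue
        attrs = event.activity_task_scheduled_event_attributes
        if attrs.activity_type.name != "apply-dhcp-config-via-omapi":
            continue
        for payload in attrs.input.payloads:
            # With the default data converter the payload data is plain JSON.
            print(payload.data.decode("utf-8", errors="replace"))


asyncio.run(main())
```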