Hello! One of our MAAS environments exhibited a strange sequence of log messages yesterday. I have the logs from syslog, since unfortunately the journal had rolled over by the time I looked:
2026-04-16T15:18:29.264719+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.263Z","msg":"transaction rollback error","error":"sql: transaction has already been committed or rolled back","logging-call-at":"common.go:83","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence/sql.(*SqlStore).txExecute\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/sql/common.go:83\ngo.temporal.io/server/common/persistence/sql.(*sqlTaskManager).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/sql/task.go:149\ngo.temporal.io/server/common/persistence.(*taskManagerImpl).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/task_manager.go:122\ngo.temporal.io/server/common/persistence.(*taskRateLimitedPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_rate_limited_clients.go:523\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:598\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:700\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/backoff/retry.go:143\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:704\ngo.temporal.io/server/service/matching.(*taskQueueDB).UpdateState\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/db.go:177\ngo.temporal.io/server/service/matching.(*taskReader).persistAckLevel\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:304\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:202\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
2026-04-16T15:18:29.264956+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.264Z","msg":"Operation failed with internal error.","error":"Failed to lock task queue. Error: context canceled","operation":"UpdateTaskQueue","logging-call-at":"persistence_metric_clients.go:1314","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence.updateErrorMetric\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:1314\ngo.temporal.io/server/common/persistence.(*metricEmitter).recordRequestMetrics\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:1291\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:596\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_metric_clients.go:598\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:700\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/backoff/retry.go:143\ngo.temporal.io/server/common/persistence.(*taskRetryablePersistenceClient).UpdateTaskQueue\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/persistence/persistence_retryable_clients.go:704\ngo.temporal.io/server/service/matching.(*taskQueueDB).UpdateState\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/db.go:177\ngo.temporal.io/server/service/matching.(*taskReader).persistAckLevel\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:304\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:202\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
2026-04-16T15:18:29.265103+00:00 smaasc1008 maas-temporal[763597]: {"level":"error","ts":"2026-04-16T15:18:29.264Z","msg":"Persistent store operation failure","component":"matching-engine","wf-task-queue-name":"/_sys/4y84xy@agent:main/2","wf-task-queue-type":"Workflow","wf-namespace":"default","worker-build-id":"_unversioned_","store-operation":"update-task-queue","error":"Failed to lock task queue. Error: context canceled","logging-call-at":"task_reader.go:205","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/common/log/zap_logger.go:156\ngo.temporal.io/server/service/matching.(*taskReader).getTasksPump\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/service/matching/task_reader.go:205\ngo.temporal.io/server/internal/goro.(*Group).Go.func1\n\t/build/temporal-dq8CT2/temporal-1.24.2/src/internal/goro/group.go:58"}
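(As an aside, in case anyone wants to pull the same entries on their own controller: the lines above were fished out of syslog with roughly the snippet below, which just filters the maas-temporal error records and pretty-prints their JSON portion. The /var/log/syslog path is the Ubuntu default and an assumption about your setup.)

```python
# Rough sketch: extract maas-temporal error entries from syslog and
# pretty-print the JSON portion of each line.
# /var/log/syslog is the stock Ubuntu location; adjust as needed.
import json

with open("/var/log/syslog") as fh:
    for line in fh:
        if "maas-temporal" not in line or '"level":"error"' not in line:
            continue
        # The JSON document starts at the first "{" after the syslog prefix.
        start = line.find("{")
        if start == -1:
            continue
        try:
            print(json.dumps(json.loads(line[start:]), indent=2))
        except json.JSONDecodeError:
            # Fall back to the raw line if it does not parse cleanly.
            print(line.rstrip())
```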
This sequence seems fairly scary, especially because ever since it occurred we have seen the following error repeating, associated with DHCP server restarts:
2026-04-16T15:20:02.557714+00:00 smaasc1008 maas-agent[763896]: ERR Activity error. ActivityType=apply-dhcp-config-via-omapi Attempt=1 Error="unable to decode the activity function input payload with error: payload item 0: unable to decode: invalid IP address: None for function name: apply-dhcp-config-via-omapi" Namespace=default RunID=b3011ae7-bff5-45ac-942e-10da2118aca2 TaskQueue=4y84xy@agent:main WorkerID=4y84xy@agent:763896 WorkflowID=configure-dhcp:4y84xy
Unfortunately, the DHCP server restarts also seem to be disrupting PXE boot and DHCP lease renewals for MAAS-managed machines, causing various automated workflows to fail.
A well-intentioned but admittedly desperate attempt to recover the system by rebooting failed to effect any change in behavior.
I’m unclear what cleanup is needed: the MAAS controller’s IP address is reported correctly through the web UI, so I’m struggling to determine where the IP address was set to None. Any help would be greatly appreciated - thank you in advance!
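In case it is useful to anyone looking at this: below is the sketch I was planning to use to dump the inputs that were actually scheduled for the apply-dhcp-config-via-omapi activity, using the temporalio Python SDK, to see exactly which field arrives as None. The workflow ID and namespace come from the maas-agent log above, but the localhost:7233 address is just the upstream Temporal default and an assumption on my part; it may well not be how MAAS wires up its embedded Temporal, so treat both the address and the approach as a guess.

```python
# Sketch only: print the scheduled inputs of the apply-dhcp-config-via-omapi
# activity for the configure-dhcp:4y84xy workflow.
# Assumes the embedded Temporal frontend is reachable on localhost:7233
# (the upstream default), which may not match a real MAAS deployment.
import asyncio

from temporalio.client import Client


async def main() -> None:
    client = await Client.connect("localhost:7233", namespace="default")
    handle = client.get_workflow_handle("configure-dhcp:4y84xy")
    async for event in handle.fetch_history_events():
        if not event.HasField("activity_task_scheduled_event_attributes"):
            continue
        attrs = event.activity_task_scheduled_event_attributes
        if attrs.activity_type.name != "apply-dhcp-config-via-omapi":
            continue
        for payload in attrs.input.payloads:
            # With the default data converter the payload data is plain JSON.
            print(payload.data.decode("utf-8", errors="replace"))


asyncio.run(main())
```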