MAAS Rack Controller agent fails to connect to Temporal - "context deadline exceeded" causing PXE boot 502 errors

I have a MAAS 3.7.2 deployment with separate region and rack controllers. The rack controller (10.x.y.51) cannot connect to the Temporal service on the region controller (10.x.y.50), causing the maas-agent service to crash in a restart loop.

As a result, httpproxy.sock is never created, and machines get 502 errors when they attempt to PXE boot.

  - MAAS Version: 3.7.2-17972-g.35e297c4d (both)
  - Temporal Port: 5271 (gRPC frontend)

On the rack controller:

maas-agent Temporal connection failures:

  maas-agent[46264]: ERR Temporal client error error="failed reaching server: context deadline exceeded while waiting for connections to become ready"
  maas.pebble[45464]: Service "agent" stopped unexpectedly with code 1
  maas.pebble[45464]: Service "agent" on-failure action is "restart", waiting ~500ms before restart (backoff 1)

PXE boot 502 errors (machine 10.x.z.18 requesting bootx64.efi):

  maas-rackd[45499]: provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.x.z.18
  maas-http[45608]: [crit] connect() to unix:/run/snap.maas/httpproxy.sock failed (2: No such file or directory)
  maas-http[45612]:  127.0.0.1 - - "GET /images/bootx64.efi HTTP/1.1" 502 166

On the region controller:

Temporal is running and healthy:

  $ sudo netstat -tlnp | grep temporal
  tcp   0   0 127.0.0.1:9000   0.0.0.0:*   LISTEN   temporal-serv
  tcp6  0   0 :::5271          :::*        LISTEN   temporal-serv
  tcp6  0   0 :::5272          :::*        LISTEN   temporal-serv
  tcp6  0   0 :::5273          :::*        LISTEN   temporal-serv

I have tried opening all TCP and UDP ports in the firewall between the region controller and the rack controller, but the problem persists.
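
To rule out plain TCP reachability independently of the firewall rules, a check along these lines could be run from the rack controller. This is only a sketch under stated assumptions: `probe_tcp` is a hypothetical helper name, it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available, and `REGION_IP` is a placeholder for the region controller address.

```shell
# probe_tcp HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: exit status 0 if a TCP connection to HOST:PORT opens
# within the timeout, 2 on missing arguments, other non-zero on failure
# (124 means the attempt timed out).
probe_tcp() {
  [ "$#" -ge 2 ] || { echo "usage: probe_tcp HOST PORT [TIMEOUT]" >&2; return 2; }
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # timeout(1) bounds the attempt so a silently dropped SYN does not hang.
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (REGION_IP is a placeholder for the region controller address):
#   probe_tcp "$REGION_IP" 5271 && echo "5271 reachable" || echo "5271 NOT reachable"
```

A status of 124 would suggest packets are being dropped somewhere on the path, while an immediate failure would suggest that nothing is accepting connections on that address and port. A status of 0 only shows that the TCP connection opens; it says nothing about TLS.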

1. The Temporal config shows rpcAddress: "localhost:7233" but the server listens on port 5271. Is the rack agent trying to connect to the wrong endpoint?
2. How does the rack controller learn the Temporal endpoint from the region? Is there a configuration file I should check/edit?
3. The Temporal frontend requires TLS client auth (requireClientAuth: true). Could certificate trust be preventing the connection?
4. What’s the correct way to configure the rack’s agent to connect to the region’s Temporal service?
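
For question 3, a TLS handshake against the Temporal frontend could be attempted by hand from the rack controller, roughly as sketched below. `tls_probe` is a hypothetical helper name, and the certificate, key, and CA file arguments are placeholders: I have not confirmed which files the rack's agent actually uses, so those paths are an assumption to be filled in.

```shell
# tls_probe HOST PORT CERT KEY CA
# Hypothetical helper: attempts a TLS handshake presenting a client certificate.
# Returns 2 on wrong argument count, otherwise openssl's exit status.
tls_probe() {
  [ "$#" -eq 5 ] || { echo "usage: tls_probe HOST PORT CERT KEY CA" >&2; return 2; }
  # -brief prints a short handshake summary; -verify_return_error aborts
  # when the server certificate does not verify against the given CA file.
  # stdin is redirected from /dev/null so s_client exits after the handshake.
  openssl s_client -connect "$1:$2" -cert "$3" -key "$4" -CAfile "$5" \
    -brief -verify_return_error </dev/null
}

# Example (all five arguments are placeholders):
#   tls_probe "$REGION_IP" 5271 "$CLIENT_CERT" "$CLIENT_KEY" "$CA_CERT"
```

A verification error here would point at certificate trust rather than at the network path, which is what I am trying to distinguish.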

More info:

  $ maas --version
  MAAS 3.7.2 (3.7.2-17972-g.35e297c4d)

PostgreSQL version: 16

Any suggestions or guidance would be greatly appreciated!