MAAS 2.9.2 (2.9 stable)
We’re experiencing some pain with “long distance” rack-to-region communication during deployments. We have an HA three-node region controller setup in the US, built as per the MAAS docs, with rack controllers in Melbourne, Australia (and many other places). Commissioning works just fine. Deployments, however, keep giving us TFTP timeouts. They eventually work, but the failures cause a lot of pain for the engineers trying to build or rebuild hosts, as they need to retry multiple times before one succeeds. Only about 1 in 2 or 1 in 3 attempts WORK, which is painful.
The fact that commissioning works reasonably reliably but deploys fail a LOT MORE, at the TFTP and metadata fetch steps, tells me this is a latency, packet loss, or routing issue. I’m assuming there’s more rack<->region communication going on during deployment, in the DHCP/TFTP/initial boot phase.
Round trip time is about 268 ms from Australia to the US.
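
To put that RTT in perspective, here is a rough back-of-the-envelope sketch in Python of how many sequential rack<->region round trips fit inside a client-side timeout. The function names and the timeout budgets are made up for illustration. They are not MAAS settings or PXE firmware defaults, and I haven't confirmed how many round trips a boot request actually needs.

```python
import math

RTT_MS = 268.0  # round trip time quoted above (Australia <-> US)


def round_trips_within(timeout_s: float, rtt_ms: float = RTT_MS) -> int:
    """Number of complete sequential round trips that fit inside timeout_s."""
    return math.floor(timeout_s * 1000.0 / rtt_ms)


def time_for_round_trips(n: int, rtt_ms: float = RTT_MS) -> float:
    """Seconds spent on latency alone for n sequential round trips."""
    return n * rtt_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical timeout budgets in seconds (assumptions, not measured values).
    for timeout_s in (1.0, 2.0, 4.0):
        fits = round_trips_within(timeout_s)
        print(f"{timeout_s:.0f} s timeout: {fits} sequential round trips fit")
```

If a single boot request needs more sequential region calls than fit in the client's timeout, or if one lost packet forces a retransmit, the request would time out even though the link is otherwise healthy. That would be consistent with the intermittent failures, but it is only a guess until measured.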