Network troubleshooting: Performance

Sometimes the network is technically “up,” but performance is so poor that commissioning or deployment fails. This document shows how to troubleshoot throughput, latency, and reliability problems in MAAS environments.


Throughput is low, latency looks normal

Symptoms:

  • Machines boot, but image downloads take hours.
  • ping with its default small packets works fine, but curtin stalls on large files.

Possible causes:

  • MTU mismatch dropping large packets.
  • Buggy TCP offload features on the NIC.
  • Proxy or rack controller saturated.

Checks:

iperf3 -c <server> -P 4 -t 30
tracepath <maas-ip>
ethtool -k <iface>

Fixes:

  • Align MTU across bridges, bonds, tunnels, and uplinks.
  • Disable problematic offloads (tso, gso, gro) via ethtool -K.
  • Load-balance across multiple rack controllers.
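
The MTU and offload fixes above can be tried by hand before they are made permanent. The sketch below is a minimal illustration, not a MAAS-provided tool: the function names are hypothetical, it assumes IPv4 and the iputils version of ping, and settings changed with ethtool -K are lost on reboot.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of MAAS.

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send pings with the Don't Fragment bit set (-M do, iputils ping).
# If these fail while ordinary pings succeed, a hop on the path
# probably has a smaller MTU than the one being tested.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Turn off the offloads named above. Needs root; not persistent.
disable_offloads() {
    ethtool -K "$1" tso off gso off gro off
}

# Example usage (placeholders as in the checks above):
#   probe_mtu <maas-ip> 1500
#   disable_offloads <iface>
```

If throughput recovers with the offloads disabled, re-enable them one at a time to find the feature that causes the problem, rather than leaving all three off.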

Latency is high, throughput is fine

Symptoms:

  • PXE works, downloads succeed, but commissioning takes excessively long.

Possible causes:

  • Asymmetric routing.
  • Excessive queueing (bufferbloat).
  • Remote racks reached over a high-latency WAN link.

Checks:

mtr -ezbwrc 100 <host>
ss -ti
tc qdisc show

Fixes:

  • Correct routing asymmetry.
  • Apply smart queue management (FQ_CoDel, CAKE) on congested edges.
  • Deploy local racks for remote sites.
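
As a rough sketch of the queue management fix, the helpers below read the current root qdisc and replace it with fq_codel. The function names are hypothetical, the change needs root, and it does not persist across reboots. CAKE can be applied the same way on kernels that ship it, but it usually also needs a bandwidth setting matched to the link.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of MAAS.

# Read `tc qdisc show dev <iface>` output on stdin and print the
# name of the root queueing discipline (second field of the root line).
root_qdisc_of() {
    awk '$1 == "qdisc" && $4 == "root" { print $2; exit }'
}

# Replace the root qdisc on an interface with fq_codel.
# Needs root; not persistent.
apply_fq_codel() {
    tc qdisc replace dev "$1" root fq_codel
}

# Example usage (placeholder as in the checks above):
#   tc qdisc show dev <iface> | root_qdisc_of
#   apply_fq_codel <iface>
```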

Many small HTTP requests are slow

Symptoms:

  • Commissioning drags during script fetches.
  • Logs show repeated metadata retries.

Possible causes:

  • Proxy not reusing connections.
  • DNS TTL too short, overloading upstream resolvers.

Checks:

curl -v http://<rack-ip>:3128/
dig +trace <hostname>

Fixes:

  • Tune proxy connection reuse.
  • Configure longer DNS TTLs in MAAS.
  • Use caching resolvers close to racks.
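
To see whether DNS lookups or connection setup dominate small requests, curl can report per-phase timings. This is a minimal sketch with hypothetical function names. The URL is whatever endpoint the machines actually fetch from and is an assumption left to the reader.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of MAAS.

# Print DNS lookup, TCP connect, and total time (in seconds)
# for a single request to the given URL.
time_request() {
    curl -s -o /dev/null \
        -w '%{time_namelookup} %{time_connect} %{time_total}\n' "$1"
}

# Read "namelookup connect total" on stdin and print the share of
# the total time spent on the DNS lookup, as a whole percentage.
dns_share() {
    awk '$3 > 0 { printf "%.0f\n", $1 / $3 * 100 }'
}

# Example usage:
#   time_request <url>
#   time_request <url> | dns_share
```

A consistently high DNS share points at the resolver path. A high connect time on every request suggests connections are not being reused.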

Resource bottlenecks on the rack controller

Symptoms:

  • Network appears slow only during mass deployment.
  • Rack CPU or disk usage spikes.

Checks:

top
iostat -x 5

Fixes:

  • Add more rack controllers.
  • Use SSD-backed storage for image caches.
  • Spread deployments across fabrics.
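
To confirm that the rack itself is the bottleneck, it helps to capture a snapshot while a mass deployment is running. The sketch below assumes a Linux rack controller with the sysstat package installed (for iostat). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of MAAS.

# Divide a load average by the CPU count. Values that stay well
# above 1.00 suggest the rack is short on CPU or waiting on I/O.
load_per_cpu() {
    awk -v avg="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", avg / ncpu }'
}

# Print a short CPU and disk snapshot. Run on the rack controller.
snapshot_rack_load() {
    avg=$(cut -d' ' -f1 /proc/loadavg)
    echo "1-minute load per CPU: $(load_per_cpu "$avg" "$(nproc)")"
    top -b -n 1 | head -n 15
    iostat -x 5 3
}

# Example usage:
#   snapshot_rack_load
```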

Checklist for performance issues

  1. Test throughput with iperf3.
  2. Trace path MTU with tracepath.
  3. Inspect NIC offloads with ethtool -k.
  4. Measure latency/packet loss with mtr.
  5. Inspect rack CPU/disk with top/iostat.
  6. Confirm DNS and proxy behavior.
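
Steps 1 to 5 of the checklist can be strung together in a small wrapper. This is a sketch under stated assumptions: the function names are hypothetical, an iperf3 server (iperf3 -s) must already be listening on the target, and step 6 is left as a manual check because it depends on the site's proxy and DNS setup.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of MAAS.

# Succeed if the named tool is available on this machine.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Run one check, or report that its tool is missing.
run_check() {
    tool=$1
    if have "$tool"; then
        echo "== $*"
        "$@"
    else
        echo "== skipped: $tool not installed"
    fi
}

# Run checklist steps 1 to 5 against a server and a local interface.
run_checklist() {
    server=$1
    iface=$2
    run_check iperf3 -c "$server" -P 4 -t 30
    run_check tracepath "$server"
    run_check ethtool -k "$iface"
    run_check mtr -ezbwrc 100 "$server"
    run_check top -b -n 1
    run_check iostat -x 5 2
}

# Example usage:
#   run_checklist <server> <iface>
```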

Next steps

Once performance is understood, the next section covers where security and compliance requirements intersect with MAAS: how enterprise controls interact with PXE, DHCP, and commissioning traffic.