Network troubleshooting: Performance troubleshooting
Sometimes the network is technically “up,” but performance is so poor that commissioning or deployment fails. This document shows how to troubleshoot throughput, latency, and reliability problems in MAAS environments.
Throughput is low, latency looks normal
Symptoms:
- Machines boot, but image downloads take hours.
- ping works fine, but curtin stalls on large files.
Possible causes:
- MTU mismatch dropping large packets.
- Buggy TCP offload features on the NIC.
- Proxy or rack controller saturated.
Checks:
iperf3 -c <server> -P 4 -t 30
tracepath <maas-ip>
ethtool -k <iface>
Fixes:
- Align MTU across bridges, bonds, tunnels, and uplinks.
- Disable problematic offloads (tso, gso, gro) via ethtool -K.
- Load-balance across multiple rack controllers.
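The MTU and offload fixes above can be verified from a shell. The following is a minimal sketch, assuming IPv4 and the Linux iputils version of ping; the function names are hypothetical helpers, not part of MAAS. A don't-fragment ping sized to the expected MTU should fail if any hop on the path has a smaller MTU.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three don't-fragment pings sized to the given MTU (hypothetical helper).
# Usage: probe_mtu <target-ip> <mtu>
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}

# Turn off segmentation and receive offloads on one interface (needs root).
# Usage: disable_offloads <iface>
disable_offloads() {
    ethtool -K "$1" tso off gso off gro off
}
```

For example, probe_mtu with the MAAS IP and an MTU of 1500 sends 1472-byte payloads. Settings changed with ethtool -K do not survive a reboot, so a fix that helps would also need to be made persistent in the host's network configuration.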
Latency is high, throughput is fine
Symptoms:
- PXE works, downloads succeed, but commissioning takes excessively long.
Possible causes:
- Asymmetric routing.
- Excessive queueing (bufferbloat).
- High-latency WAN link to remote racks.
Checks:
mtr -ezbwrc 100 <host>
ss -ti
tc qdisc show
Fixes:
- Correct routing asymmetry.
- Apply smart queue management (FQ_CoDel, CAKE) on congested edges.
- Deploy local racks for remote sites.
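The latency checks and the queue management fix above can be sketched as two small helpers. This is an illustrative sketch, assuming Linux with iproute2 installed; the function names are hypothetical. The first helper reads ss -ti output and reports the highest smoothed round-trip time, and the second replaces the root qdisc on an interface with fq_codel.

```shell
#!/bin/sh
# Print the highest smoothed RTT (in ms) found in `ss -ti` output on stdin.
# ss reports it as a field of the form rtt:<rtt>/<rttvar>; fields such as
# minrtt: and rcv_rtt: are ignored because they do not start with "rtt:".
max_rtt_ms() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^rtt:/) {
                split(substr($i, 5), a, "/")
                if (!found || a[1] + 0 > max) max = a[1] + 0
                found = 1
            }
    } END { if (found) print max }'
}

# Replace the root qdisc on an interface with fq_codel (needs root).
# Usage: apply_fq_codel <iface>
apply_fq_codel() {
    tc qdisc replace dev "$1" root fq_codel
}
```

A possible use is to run ss -ti dst with the rack controller address and pipe the result to max_rtt_ms before and after a change, then compare the two values. A qdisc applied with tc is lost on reboot unless it is also configured persistently.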
Many small HTTP requests are slow
Symptoms:
- Commissioning drags during script fetches.
- Logs show repeated metadata retries.
Possible causes:
- Proxy not reusing connections.
- DNS TTL too short, overloading upstream resolvers.
Checks:
curl -v http://<rack-ip>:3128/
dig +trace <hostname>
Fixes:
- Tune proxy connection reuse.
- Configure longer DNS TTLs in MAAS.
- Use caching resolvers close to racks.
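To see where the time in a small request goes, and how long DNS answers are being cached, something like the following may help. It is a sketch only, assuming curl and dig are installed; the helper names are hypothetical, and the URL and hostname passed in are whatever the environment actually uses.

```shell
#!/bin/sh
# Print DNS lookup, TCP connect, and total time for a single HTTP request.
# Usage: curl_timing <url>
curl_timing() {
    curl -s -o /dev/null \
        -w 'dns=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s\n' \
        "$1"
}

# Print the smallest TTL (seconds) among answer records on stdin.
# Expects `dig +noall +answer` format: name, TTL, class, type, data.
min_ttl() {
    awk 'NF >= 5 && $2 ~ /^[0-9]+$/ {
        if (min == "" || $2 + 0 < min) min = $2 + 0
    } END { if (min != "") print min }'
}
```

For instance, piping dig +noall +answer for a hostname into min_ttl shows the lowest TTL in the answer. When the query goes to a caching resolver, the value shown is the remaining TTL, so it counts down between queries. If dns= dominates the curl_timing output, the resolver path is the likelier bottleneck; if connect= dominates, connection reuse at the proxy is worth a look.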
Resource bottlenecks on rack controller
Symptoms:
- Network appears slow only during mass deployment.
- Rack CPU or disk usage spikes.
Checks:
top
iostat -x 5
Fixes:
- Add more rack controllers.
- Use SSD-backed storage for image caches.
- Spread deployments across fabrics.
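During a mass deployment it can be easier to filter iostat output than to watch it scroll. The helper below is a rough sketch, assuming the sysstat version of iostat, where %util is the last column of the extended device table; the function name and the default threshold of 90 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Read `iostat -x` output on stdin and print devices whose %util
# (last column) is at or above the given threshold (default 90).
# Usage: iostat -x 5 3 | busy_disks 90
busy_disks() {
    awk -v limit="${1:-90}" '
        $1 == "Device" || $1 == "Device:" { in_table = 1; next }  # header row
        NF == 0 { in_table = 0 }                                  # table ends
        in_table && $NF + 0 >= limit { print $1, $NF }
    '
}
```

Running iostat under LC_ALL=C avoids locales that print a decimal comma, which this parser does not handle. A disk that stays near 100% while images are served suggests that storage, not the network, is the limit.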
Checklist for performance issues
- Test throughput with iperf3.
- Trace MTU with tracepath.
- Inspect NIC offloads with ethtool -k.
- Measure latency/packet loss with mtr.
- Inspect rack CPU/disk with top/iostat.
- Confirm DNS and proxy behavior.
Next steps
Once performance is understood, the next section addresses security and compliance intersections: how enterprise controls interact with PXE, DHCP, and commissioning traffic.