Network troubleshooting: Known bads

This appendix lists frequent, high-impact failure patterns (the “known bads”), including quick signatures, fast confirmation steps, and practical fixes. Use it as a first-pass triage aid before deep dives.


KB-01: Rogue DHCP on the VLAN

Signature

  • Two DHCP offers from different MACs or with conflicting options.
  • Nodes pick up an unexpected gateway or DNS server.

Fast Confirm

tcpdump -n -e -vv -i <iface> -c 200 'udp and (port 67 or 68)'

Look for multiple OFFERs from different source MACs or server identifiers (option 54), or with conflicting siaddr (next-server) or option 66/67 values.

Fix

  • Disable libvirt or LXD dnsmasq on hosts attached to the production VLAN.
  • Shut down consumer routers accidentally bridged into the fabric.
  • Keep only one authoritative DHCP server per broadcast domain.
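When a capture shows several OFFERs, tabulating them per server makes it easier to tell which one is rogue. The sketch below assumes you have already transcribed the offers from the tcpdump output into dictionaries. The field names (`server_mac`, `server_id`, `router`, `dns`) are hypothetical and are not produced by any tool.

```python
from collections import defaultdict


def find_conflicting_servers(offers):
    """Group DHCPOFFERs by source MAC.

    Returns {server_mac: {(server_id, router, dns), ...}} when more than one
    server answered, or an empty dict when a single server owns the VLAN.
    """
    by_server = defaultdict(set)
    for offer in offers:
        # Keep the options each server hands out so conflicting gateways
        # or DNS servers are visible side by side.
        by_server[offer["server_mac"]].add(
            (offer["server_id"], offer["router"], offer["dns"])
        )
    return dict(by_server) if len(by_server) > 1 else {}


# Example (made-up addresses): the expected server plus a stray router.
offers = [
    {"server_mac": "52:54:00:12:34:56", "server_id": "10.0.0.2",
     "router": "10.0.0.1", "dns": "10.0.0.2"},
    {"server_mac": "00:11:22:33:44:55", "server_id": "192.168.1.1",
     "router": "192.168.1.1", "dns": "192.168.1.1"},
]
for mac, options in find_conflicting_servers(offers).items():
    print(mac, sorted(options))
```

An empty result means only one server answered during the capture window. It does not prove that no rogue server exists; the rogue may simply not have replied yet.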

KB-02: DHCP Relay Mispointed or Missing

Signature

  • No offers arrive when clients and the rack controller are on different subnets.
  • Offers appear when the client is moved to the rack's subnet.

Fast Confirm

  • Check the ip helper-address targets on the switch or router.
  • Capture on the rack interface. If no relayed DHCP packets arrive, the relay is missing or points elsewhere.

Fix

  • Add or correct the helper-address so it points at the rack controller's IP.
  • Repeat per VRF if VRFs are in use.

KB-03: STP Blocks Port During PXE

Signature

  • The client link comes up, but PXE times out within 5 to 10 seconds.
  • The switch port shows the listening or learning state during boot.

Fast Confirm

show spanning-tree interface <port> detail

Fix

  • Enable PortFast (or the vendor's edge-port equivalent) on the access port.
  • Avoid loop guard or root guard on server access ports unless required.
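The arithmetic behind this known bad is short. Assume classic 802.1D timers: a 15-second forward delay, spent once in listening and once in learning. A freshly linked port then forwards nothing for about 30 seconds. That is longer than the 5 to 10 seconds given in the signature above, so the PXE ROM gives up first. PortFast or edge mode skips both states. The function names below are illustrative only.

```python
def seconds_until_forwarding(forward_delay_s=15, portfast=False):
    """Seconds a newly linked access port withholds traffic.

    Assumes classic 802.1D: listening and learning each last one
    forward-delay interval. PortFast/edge ports forward immediately.
    """
    return 0 if portfast else 2 * forward_delay_s


def pxe_survives(pxe_timeout_s, **stp_settings):
    """True if the port starts forwarding before the PXE ROM gives up."""
    return seconds_until_forwarding(**stp_settings) < pxe_timeout_s


print(seconds_until_forwarding())        # 30
print(pxe_survives(10))                  # False
print(pxe_survives(10, portfast=True))   # True
```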

KB-04: MTU Mismatch Across Overlays

Signature

  • tracepath shows PMTU reductions, or ICMP “packet too big” / “fragmentation needed” messages appear.
  • HTTP image fetches stall partway.

Fast Confirm

tracepath <rack-ip>

Fix

  • Align MTU end to end; add 50–100 bytes headroom for VXLAN/GRE/MACsec.
  • Set the same MTU on NIC, bond, bridge, and uplink.
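To turn the headroom rule into concrete numbers, the sketch below subtracts per-packet encapsulation overhead from the underlay MTU. The overhead values are typical figures assumed for illustration:

- VXLAN over IPv4: 50 bytes
- VXLAN over IPv6: 70 bytes
- Basic GRE over IPv4: 24 bytes
- MACsec: up to 32 bytes, with SCI and a 16-byte ICV

Options such as GRE keys or extra VLAN tags add more, so check these values against your own encapsulation.

```python
# Typical overheads in bytes (assumed values; verify for your encapsulation).
ENCAP_OVERHEAD = {
    "vxlan-ipv4": 50,  # outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
    "vxlan-ipv6": 70,  # outer IPv6 40 + UDP 8 + VXLAN 8 + inner Ethernet 14
    "gre-ipv4": 24,    # outer IPv4 20 + base GRE header 4 (no key/sequence)
    "macsec": 32,      # SecTAG with SCI 16 + ICV 16 (added to the L2 frame)
}


def total_overhead(*encaps):
    return sum(ENCAP_OVERHEAD[name] for name in encaps)


def inner_mtu(underlay_mtu, *encaps):
    """Largest MTU usable inside the stacked encapsulations."""
    return underlay_mtu - total_overhead(*encaps)


def required_underlay_mtu(inner, *encaps):
    """Underlay MTU needed to carry a given inner MTU without fragmentation."""
    return inner + total_overhead(*encaps)


print(inner_mtu(1500, "vxlan-ipv4"))                         # 1450
print(required_underlay_mtu(1500, "vxlan-ipv4", "macsec"))   # 1582
```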

KB-05: SSL Interception Breaks Commissioning

Signature

  • apt or curl fails with X.509 certificate errors in the ephemeral (commissioning) environment.
  • The proxy works from desktops but not during commissioning.

Fast Confirm

curl -v https://archive.ubuntu.com/ 2>&1 | head -n 20

Fix

  • Install the corporate root CA in both ephemeral and deployed images.
  • Bypass SSL bumping for MAAS metadata and Ubuntu mirrors if possible.

KB-06: Libvirt or LXD dnsmasq Conflict

Signature

  • DHCP offers originate from 192.168.122.1 (libvirt's default network) or from a host's lxdbr0 address.
  • Two offers with different lease times/options.

Fast Confirm

ps aux | grep -E 'dnsmasq|libvirtd|lxd'

Fix

  • Disable default libvirt dnsmasq and LXD dnsmasq on hosts bridged to production.
  • Isolate lab bridges from MAAS fabrics.

KB-07: DHCP Snooping or IP Source Guard Drops Replies

Signature

  • DHCP discovers reach the rack, but its offers never reach clients.
  • DHCP works when the rack is moved to a different switch port.

Fast Confirm

  • On the switch, check the DHCP snooping bindings and which ports are trusted (for example, show ip dhcp snooping on Cisco IOS).
  • A packet capture on the rack shows offers leaving, but they never arrive at the client.

Fix

  • Mark the rack-facing switch port as trusted.
  • Verify snooping database and source guard policies allow rack replies.

KB-08: Option 82 Expectations Mismatch

Signature

  • DHCP server ignores relayed requests, or client ignores offers.
  • Failures occur only behind specific relay devices.

Fast Confirm

  • Check relay policy for option 82 insertion or stripping.
  • Inspect captured requests and offers for the relay agent information option (82).

Fix

  • Standardize option 82 handling across all relays.
  • If using external DHCP, ensure next-server and bootfile are set correctly.
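When inspecting captures, it can help to decode the relay agent information option by hand. Per RFC 3046, the option 82 payload is a sequence of sub-options, each made of a one-byte code, a one-byte length, and a value. Code 1 is the Agent Circuit ID and code 2 is the Agent Remote ID.

The sketch below assumes you already have the raw option 82 payload, without the option's own code and length bytes (for example, copied as hex from Wireshark). The example circuit and remote IDs are made up.

```python
SUBOPTION_NAMES = {1: "circuit-id", 2: "remote-id"}  # RFC 3046 sub-options


def parse_option82(payload: bytes) -> dict:
    """Split an option 82 payload into {name: raw bytes} per sub-option."""
    result = {}
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated sub-option header")
        code, length = payload[i], payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError(f"sub-option {code} is truncated")
        # Unknown codes are kept under a generic name rather than dropped.
        result[SUBOPTION_NAMES.get(code, f"suboption-{code}")] = value
        i += 2 + length
    return result


# Example payload: circuit-id "Gi1/0/7", remote-id = a 6-byte switch MAC.
payload = (bytes([1, 7]) + b"Gi1/0/7"
           + bytes([2, 6]) + bytes.fromhex("001122334455"))
for name, value in parse_option82(payload).items():
    print(name, value.hex())
```

Comparing the decoded circuit and remote IDs from a working relay and a failing one quickly shows whether the relays insert option 82 differently.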

KB-09: Wrong VLAN or Native VLAN on Access Port

Signature

  • No DHCP seen, or DHCP from an unexpected subnet.
  • LLDP shows the wrong VLAN name/ID.

Fast Confirm

lldpcli show neighbors
tcpdump -i <iface> '(port 67 or 68)'

Fix

  • Set the intended untagged/native VLAN on the access port.
  • Verify trunk allowed-VLANs for hypervisors or bonded hosts.

KB-10: Proxy Auth or Category Filter Blocks APT

Signature

  • HTTP 407 Proxy Authentication Required, or 403 responses from a category filter.
  • Commissioning reaches the metadata service, but Curtin fails.

Fast Confirm

apt-config dump | grep -i proxy
curl -I http://<rack-ip>:3128/

Fix

  • Inject APT proxy credentials via cloud-init.
  • Allowlist Ubuntu mirrors and Snap endpoints.
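One pitfall when injecting proxy credentials is that characters such as `@`, `:` or `/` in a password break the proxy URL. A minimal fix is to percent-encode the user and password before building the URL and placing it in cloud-init's `apt` proxy keys. This assumes the URL is read by a parser that decodes percent-escapes.

The helper names, host, and credentials below are hypothetical. Confirm the `http_proxy`/`https_proxy` keys against the cloud-init version you run.

```python
from urllib.parse import quote


def proxy_url(user, password, host, port=3128):
    """Build http://user:password@host:port/ with credentials percent-encoded."""
    # safe="" forces '/', ':' and '@' to be escaped as well.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/"


def apt_proxy_cloud_config(url):
    """Render a cloud-config fragment that sets APT's HTTP and HTTPS proxy."""
    return (
        "#cloud-config\n"
        "apt:\n"
        f'  http_proxy: "{url}"\n'
        f'  https_proxy: "{url}"\n'
    )


url = proxy_url("maas", "p@ss:w/rd", "proxy.example.com")
print(url)  # http://maas:p%40ss%3Aw%2Frd@proxy.example.com:3128/
print(apt_proxy_cloud_config(url))
```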

KB-11: DNS Split-Horizon Confusion

Signature

  • A name resolves differently from the ephemeral environment than from the deployed node.
  • Intermittent “Temporary failure in name resolution”.

Fast Confirm

resolvectl status
dig @<maas-dns-ip> <node>.maas A +noall +answer +ttlid

Fix

  • Set forwarders in MAAS for corporate zones.
  • Ensure deployed nodes use the intended stub or recursive resolver.

KB-12: UEFI HTTP Boot vs Legacy PXE Mismatch

Signature

  • Firmware attempts HTTP boot but TFTP is configured (or vice versa).
  • DHCP options 66/67 are correct, but the firmware ignores them.

Fast Confirm

  • Check the firmware boot entries and boot mode (UEFI vs Legacy).
  • In a capture, see whether the client fetches its boot file over HTTP or TFTP.

Fix

  • Align firmware mode with MAAS boot method.
  • Prefer UEFI HTTP boot where supported.
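To tell which boot method a client is actually attempting, read two fields from its DHCPDISCOVER:

- The vendor class (option 60), which starts with `PXEClient` for PXE and `HTTPClient` for UEFI HTTP boot.
- The client system architecture (option 93).

The table below is a small subset of the IANA processor architecture registry referenced by RFC 4578, transcribed for illustration. Check the registry before relying on a specific value. The function is a hypothetical helper, not part of MAAS.

```python
# Subset of IANA "Processor Architecture Types" (DHCP option 93).
ARCH_TYPES = {
    0: ("x86 BIOS", "legacy PXE (TFTP)"),
    6: ("x86 UEFI", "UEFI PXE (TFTP)"),
    7: ("x64 UEFI", "UEFI PXE (TFTP)"),
    9: ("x64 UEFI", "UEFI PXE (TFTP)"),
    11: ("ARM64 UEFI", "UEFI PXE (TFTP)"),
    15: ("x86 UEFI", "UEFI HTTP boot"),
    16: ("x64 UEFI", "UEFI HTTP boot"),
    19: ("ARM64 UEFI", "UEFI HTTP boot"),
}


def classify_boot_request(vendor_class: str, arch: int) -> str:
    """Summarize firmware mode and boot method from options 60 and 93."""
    firmware, method = ARCH_TYPES.get(arch, (f"arch {arch}", "unknown method"))
    # The vendor class prefix states the client's intent directly.
    if vendor_class.startswith("HTTPClient"):
        method = "UEFI HTTP boot"
    return f"{firmware}: {method}"


print(classify_boot_request("PXEClient:Arch:00000:UNDI:002001", 0))
print(classify_boot_request("HTTPClient:Arch:00016:UNDI:003001", 16))
```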

KB-13: Predictable Interface Name Mismatch in Netplan

Signature

  • The deployed node comes up without network connectivity.
  • Netplan references eno1 but hardware exposes enp3s0.

Fast Confirm

ip link
netplan get
journalctl -u systemd-networkd

Fix

  • Update MAAS interface mapping or disable renaming if appropriate.
  • Regenerate the netplan configuration and apply it.
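The name mismatch can be checked mechanically. Compare the interface names the netplan configuration refers to with those the kernel actually exposes. The sketch below parses the one-line-per-interface output of `ip -o link`. It assumes you supply the netplan names yourself, for example read from `netplan get` or the YAML. The function names are illustrative.

```python
def kernel_interfaces(ip_o_link_output: str) -> set:
    """Extract interface names from `ip -o link` output.

    Each line looks like "2: eno1: <BROADCAST,...> mtu 1500 ...";
    VLAN or veth names may carry an "@parent" suffix, which is dropped.
    """
    names = set()
    for line in ip_o_link_output.splitlines():
        parts = line.split(": ", 2)
        if len(parts) >= 2:
            names.add(parts[1].split("@")[0])
    return names


def missing_interfaces(netplan_names, ip_o_link_output):
    """Netplan-referenced names that the kernel does not expose."""
    return sorted(set(netplan_names) - kernel_interfaces(ip_o_link_output))


sample = (
    "1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN\n"
    "2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP\n"
)
print(missing_interfaces(["eno1"], sample))  # ['eno1']
```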

KB-14: Region and Rack Out of Sync or Unreachable

Signature

  • The rack controller shows as offline in the UI.
  • Machines fail to fetch metadata despite DHCP success.

Fast Confirm

ss -ltnup | grep 5240
ip route get <region-ip>
journalctl -u snap.maas.rackd

Fix

  • Open required ports and correct routing between rack and region.
  • Restart services after link recovery; verify time sync.
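Alongside `ss` on the region, a quick reachability probe from the rack side helps separate routing or firewall problems from service failures. Below is a minimal sketch using only the Python standard library. Port 5240 is taken from the check above, and the same probe works for any other MAAS port you need to verify. The function name is illustrative.

```python
import socket


def tcp_port_open(host: str, port: int = 5240, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    False covers refused, filtered (timed out) and unroutable cases alike;
    the route check above helps narrow down which one applies.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and socket.timeout
        return False
```

For example, on the rack controller, calling `tcp_port_open` with your region address should return True when the region API is reachable.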

KB-15: MAC Limits and Port Security

Signature

  • A node fails on first boot after its cable is moved.
  • The switch reports a port-security violation on the port.

Fast Confirm

show port-security interface <port>
show mac address-table interface <port>

Fix

  • Raise the maximum MAC count on access ports used for provisioning.
  • Clear sticky MAC entries or disable sticky on lab ports.

KB-16: NTP Unreachable, Time Skew

Signature

  • Token or TLS failures during commissioning.
  • Logs show clock skew warnings.

Fast Confirm

chronyc tracking

Fix

  • Permit UDP 123 to NTP servers.
  • Configure NTP sources reachable from ephemeral networks.
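To turn `chronyc tracking` output into a pass/fail check, read its `System time` line, which reports how far the system clock is fast or slow of NTP time. The parser below assumes that line format as printed by current chrony releases. The 1-second threshold is an arbitrary example, not a MAAS requirement.

```python
import re

_SYSTEM_TIME = re.compile(
    r"System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time"
)


def clock_offset_seconds(tracking_output: str) -> float:
    """Signed offset from NTP time: positive if the clock is fast."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found")
    value = float(match.group(1))
    return value if match.group(2) == "fast" else -value


def skew_too_large(tracking_output: str, threshold_s: float = 1.0) -> bool:
    """True if the absolute offset exceeds the (example) threshold."""
    return abs(clock_offset_seconds(tracking_output)) > threshold_s


sample = "Stratum         : 3\nSystem time     : 2.500000000 seconds slow of NTP time\n"
print(clock_offset_seconds(sample))  # -2.5
print(skew_too_large(sample))        # True
```

A small offset on a host that has never synchronized is not proof of correct time. Check the `Leap status` line in the same output as well.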

KB-17: HTTP Keep-Alive Disabled on Proxy

Signature

  • Workloads that issue many short HTTP requests are slow.
  • Proxy shows high connection churn.

Fast Confirm

  • Check the proxy configuration and access logs for connection reuse.
  • Measure with curl -v and look for “Connection: close”.

Fix

  • Enable keep-alive and tune connection pooling on the rack proxy.
  • Upgrade or scale out proxies under load.

KB-18: IPv6 RAs or DHCPv6 Interfering

Signature

  • Clients select an IPv6 route and fail to reach endpoints that are only reachable over IPv4.
  • Logs show DHCPv6 attempts in mixed networks.

Fast Confirm

tcpdump -i <iface> 'icmp6 or port 546 or 547'

Fix

  • Ensure dual-stack reachability end to end, or limit to IPv4 during provisioning.
  • Configure router advertisements (managed/other-config flags, default-router lifetime) to produce the intended behavior.

KB-19: Content Filter Blocks Metadata Path

Signature

  • curl requests to /MAAS/ return 403 when they pass through middleboxes.
  • Works on admin subnet but not on commissioning VLAN.

Fast Confirm

curl -I http://<region-ip>:5240/MAAS/

Fix

  • Allowlist region metadata endpoints.
  • Remove path-based filtering for commissioning subnets.

KB-20: TFTP Bound to Wrong Interface

Signature

  • DHCP succeeds, TFTP times out, rack has multiple NICs.

Fast Confirm

ss -uanp | grep ':69 '

Fix

  • Bind TFTP to the VLAN interface serving PXE clients.
  • Prefer HTTP boot if firmware supports it.
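On a rack with several NICs, compare the `ss` output against the address of the interface that faces PXE clients. The sketch makes these assumptions:

- Lines follow `ss -uanp` style, where the first field ending in `:69` is the local address.
- `pxe_ip` is whatever address your PXE VLAN interface holds; the helper names are illustrative.
- `[::]` counts as a wildcard, which holds only while `net.ipv6.bindv6only` is 0 (the Linux default).

```python
# Wildcard local addresses (IPv6 "::" assumes net.ipv6.bindv6only=0).
WILDCARDS = {"0.0.0.0", "*", "::"}


def tftp_bind_addresses(ss_output: str) -> set:
    """Local addresses that have a UDP socket bound to port 69."""
    addrs = set()
    for line in ss_output.splitlines():
        for field in line.split():
            if field.endswith(":69"):
                # Strip IPv6 brackets; a "%iface" suffix (device-bound socket)
                # is kept so such sockets are never mistaken for wildcards.
                addrs.add(field.rsplit(":", 1)[0].strip("[]"))
                break  # the first match is the local address
    return addrs


def tftp_serves(pxe_ip: str, ss_output: str) -> bool:
    """True if TFTP listens on pxe_ip directly or on a wildcard address."""
    bound = tftp_bind_addresses(ss_output)
    return pxe_ip in bound or bool(bound & WILDCARDS)


sample = 'UNCONN 0 0 10.10.0.2:69 0.0.0.0:* users:(("tftpd",pid=812,fd=6))\n'
print(tftp_serves("10.20.0.2", sample))  # False: bound to a different NIC
print(tftp_serves("10.10.0.2", sample))  # True
```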

Quick Triage Loop

  1. Reproduce and capture just long enough to confirm the signature.
  2. Match to a known bad pattern above.
  3. Apply the minimal fix and retest.
  4. If still failing, escalate to the symptom-driven playbooks and tools catalog.

Next Steps

Wrap up with Appendix E: Lab Patterns and Cookbook for Multipass, LXD, and KVM environments that coexist peacefully with MAAS.