Network troubleshooting: MAAS deep dives

Some network problems only show up in the way MAAS itself uses the network. This document explains how MAAS integrates with DHCP, PXE, metadata, DNS, and proxies, and how to troubleshoot each of them when things go wrong.


DHCP modes in MAAS

MAAS supports several DHCP configurations:

  • MAAS-managed DHCP on a VLAN

    • MAAS provides DHCP directly on that broadcast domain.
    • Check with:
    maas $PROFILE vlan read <fabric-id> <vid> | jq '.dhcp_on'
    
  • DHCP relay / IP helper

    • Switch or router relays DHCP requests to the MAAS rack controller.
    • Ensure the relay points to the rack controller’s IP.
  • External DHCP server

    • MAAS does not provide DHCP.
    • You must configure next-server and bootfile-name options to point to MAAS.
  • Proxy DHCP

    • MAAS provides only boot parameters; another DHCP server provides IP addresses.

Common pitfalls:

  • Two DHCP servers active on the same VLAN.
  • Relay pointing to the wrong address.
  • Missing option 66 (TFTP server name) or 67 (bootfile name) on an external DHCP server.
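
To spot the first pitfall, two DHCP servers answering on one VLAN, it can help to capture DHCP traffic and list who replies. The following is a minimal sketch, not a MAAS tool: `list_dhcp_responders` is a hypothetical helper name, and it assumes tcpdump's default one-line output when run with `-n`. On relayed VLANs the reply may come from the relay's address rather than the server's, so treat the result as a hint.

```shell
#!/bin/sh
# list_dhcp_responders (hypothetical helper): read `tcpdump -n` output on
# stdin and print each distinct source address that sent a DHCP reply from
# UDP port 67. Assumes tcpdump's default one-line text format.
list_dhcp_responders() {
    awk '$2 == "IP" && /BOOTP\/DHCP, Reply/ {
             src = $3
             sub(/\.67$/, "", src)   # strip the trailing ".67" port suffix
             print src
         }' | sort -u
}

# Usage (needs root, and a client on the VLAN requesting a lease):
#   sudo tcpdump -l -n -i <iface> -c 20 'udp and (port 67 or port 68)' \
#       | list_dhcp_responders
```

More than one address in the output suggests that more than one server (or relay) is answering on that broadcast domain.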

PXE vs HTTP boot

  • Legacy PXE (TFTP):
    • Relies on UDP port 69. Sensitive to latency and firewalls.
  • UEFI HTTPBoot:
    • Fetches bootloaders directly via HTTP.
    • Faster and more reliable than TFTP.

Troubleshooting:

tcpdump -i <iface> port 69
curl -I http://<rack-ip>/images/pxelinux.0
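
In TFTP, only the initial read request is addressed to UDP port 69, so a capture filtered on that port mainly shows which files clients ask for, and whether requests arrive at all. As a rough sketch, the hypothetical helper below extracts the requested file names from tcpdump's text output; the `RRQ "<file>"` pattern is an assumption about how tcpdump prints TFTP read requests.

```shell
#!/bin/sh
# list_tftp_requests (hypothetical helper): read tcpdump output on stdin and
# print each distinct file name seen in a TFTP read request (RRQ).
list_tftp_requests() {
    sed -n 's/.*RRQ "\([^"]*\)".*/\1/p' | sort -u
}

# Usage (needs root; PXE boot a machine while the capture runs):
#   sudo tcpdump -l -n -i <iface> -c 50 port 69 | list_tftp_requests
```

No output while a machine is PXE booting suggests that its requests never reach this interface, which points at a firewall or at the DHCP boot options rather than at the TFTP service.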

Fabrics, VLANs, and spaces

  • MAAS models the physical network as fabrics, each containing one or more VLANs.
  • Commissioning often occurs on a “PXE VLAN” while deployment switches to a “Production VLAN.”
  • Spaces provide logical grouping for services (e.g. “storage,” “dmz”).

Checks:

maas $PROFILE subnets read | jq '.[].cidr'
maas $PROFILE vlans read <fabric-id>
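
To see at a glance which subnet sits on which fabric and VLAN, the subnet listing can be flattened into a table. This is a sketch under an assumption: that each subnet object embeds a `vlan` object carrying `fabric`, `vid`, and `dhcp_on` fields. Verify the field names against your own `subnets read` output before relying on it. `summarize_subnets` is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_subnets (hypothetical helper): read the JSON produced by
# `maas $PROFILE subnets read` on stdin and print one tab-separated line per
# subnet: CIDR, fabric, VLAN ID, and whether DHCP is enabled on the VLAN.
# Assumes each subnet embeds a "vlan" object with fabric, vid and dhcp_on.
summarize_subnets() {
    jq -r '.[] | [.cidr, .vlan.fabric, .vlan.vid, .vlan.dhcp_on] | @tsv'
}

# Usage:
#   maas $PROFILE subnets read | summarize_subnets
```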

Bonds and bridges

  • Hosts running MAAS often use Linux bridges or bonds to provide PXE networks.
  • Virtual labs (libvirt, LXD) add their own default bridges, which can conflict with MAAS DHCP.

Troubleshooting:

bridge link
ip -d link show type bond
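
Because the default bridges of virtual labs often come with their own DHCP service, it is also worth checking which processes on the host are bound to UDP port 67. The sketch below filters `ss` output for such sockets. `list_dhcp_listeners` is a hypothetical helper name, and the parsing assumes the usual column layout of `ss -H -lunp`.

```shell
#!/bin/sh
# list_dhcp_listeners (hypothetical helper): read `ss -H -lunp` output on
# stdin and print the local address and last column of every socket bound to
# UDP port 67. The last column holds the users:(...) process info when ss is
# run as root; without root it is just the peer address.
list_dhcp_listeners() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /:67$/) {      # local address field ends in ":67"
                print $i, $NF
                next
            }
        }
    }'
}

# Usage:
#   sudo ss -H -lunp | list_dhcp_listeners
```

A listener bound to a lab bridge that shares a segment with a MAAS-managed VLAN is a likely source of conflicting DHCP offers.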

Metadata and image delivery

  • Ephemeral and deployed nodes fetch metadata from the region controller.
  • Curtin pulls images over HTTP (via the rack proxy).

Check:

curl -I http://<region-ip>:5240/MAAS/

Common issues:

  • Firewall between commissioning VLAN and region.
  • Proxy misconfiguration.
  • SSL interception without trusted CA in ephemeral OS.
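
To tell "no response at all" apart from "the service answered with an error", the `curl` check can be wrapped so that it reports the HTTP status explicitly. This is a minimal sketch: `check_endpoint` is a hypothetical helper name, and it relies on curl printing `000` for `%{http_code}` when it receives no HTTP response.

```shell
#!/bin/sh
# check_endpoint URL (hypothetical helper): request URL and classify the
# result by HTTP status code.
check_endpoint() {
    url=$1
    # curl still prints 000 when the request fails, so ignore its exit code.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url") || true
    case $code in
        000)     echo "$url: no HTTP response (firewall, routing, or TLS failure?)"
                 return 1 ;;
        2??|3??) echo "$url: reachable (HTTP $code)" ;;
        *)       echo "$url: responded with HTTP $code"
                 return 1 ;;
    esac
}

# Usage, ideally from a host on the commissioning VLAN:
#   check_endpoint http://<region-ip>:5240/MAAS/
```

Running it from the commissioning VLAN rather than from the MAAS host itself exercises the same path that ephemeral nodes use.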

DNS integration

  • MAAS provides authoritative DNS for its managed domains.
  • Forwarding to upstream resolvers is configurable.
  • Reverse zones are auto-generated.

Checks:

dig @<maas-dns-ip> <node>.maas A
dig @<maas-dns-ip> -x <ip>

Pitfalls:

  • Stale DNS entries after machine reallocation.
  • Conflicts with systemd-resolved on MAAS host.
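
One way to look for stale entries is to resolve a name forward, resolve the resulting address backward, and compare the two. The sketch below chains the two `dig` checks above. `check_dns_roundtrip` is a hypothetical helper name, and it only inspects the first answer of each query.

```shell
#!/bin/sh
# check_dns_roundtrip SERVER NAME (hypothetical helper): resolve NAME to an
# address, resolve that address back, and report whether the PTR record
# matches the original name.
check_dns_roundtrip() {
    server=$1
    name=$2
    ip=$(dig +short @"$server" "$name" A | head -n 1)
    if [ -z "$ip" ]; then
        echo "no A record for $name"
        return 1
    fi
    ptr=$(dig +short @"$server" -x "$ip" | head -n 1)
    # dig prints PTR targets with a trailing dot; strip it before comparing.
    if [ "${ptr%.}" = "${name%.}" ]; then
        echo "ok: $name <-> $ip"
    else
        echo "mismatch: $name -> $ip -> ${ptr:-<none>}"
        return 1
    fi
}

# Usage:
#   check_dns_roundtrip <maas-dns-ip> <node>.maas
```

A mismatch does not prove that a record is stale, but it is a reasonable place to start looking after a machine has been reallocated.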

Proxies and mirrors

  • MAAS rack controllers proxy HTTP(S) traffic.
  • This lets ephemeral machines reach package mirrors even when they are otherwise isolated.

Checks:

curl -I http://<rack-ip>:3128/

Pitfalls:

  • Corporate SSL bumping breaks commissioning.
  • Proxy overload slows image installs.
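
To get a rough feel for proxy load, a request can be timed through the proxy. The following is a sketch only: `time_via_proxy` is a hypothetical helper name, the 2-second default threshold is an arbitrary assumption, and the mirror URL in the usage line is a placeholder for whichever mirror your machines use.

```shell
#!/bin/sh
# time_via_proxy PROXY URL [MAX_SECONDS] (hypothetical helper): fetch the
# headers of URL through PROXY and report the HTTP status and total time,
# flagging responses slower than MAX_SECONDS (default 2, an arbitrary choice).
time_via_proxy() {
    proxy=$1
    url=$2
    max=${3:-2}
    result=$(curl -s -o /dev/null -I --max-time 30 -x "$proxy" \
                  -w '%{http_code} %{time_total}' "$url") || true
    code=${result%% *}
    secs=${result#* }
    if [ "$code" = "000" ]; then
        echo "no response via $proxy"
        return 1
    fi
    # awk does the floating-point comparison that sh cannot do natively.
    if awk -v t="$secs" -v m="$max" 'BEGIN { exit !(t > m) }'; then
        echo "slow: HTTP $code in ${secs}s via $proxy"
    else
        echo "ok: HTTP $code in ${secs}s via $proxy"
    fi
}

# Usage:
#   time_via_proxy http://<rack-ip>:3128 http://<mirror-host>/ubuntu/
```

A single measurement says little; repeating it during a slow deployment and again on an idle system gives a more useful comparison.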

Rack ↔ Region communication

  • Rack and region must maintain continuous connectivity.
  • If contact is lost, the UI shows the rack as “offline.”

Checks:

journalctl -u snap.maas.rackd
ip route get <region-ip>
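
On a multi-homed rack controller, the interesting parts of the `ip route get` output are the outgoing interface and the source address the kernel selects. The sketch below pulls out just those two fields. `route_summary` is a hypothetical helper name, and it assumes the usual `dev <iface>` and `src <addr>` keywords in the output.

```shell
#!/bin/sh
# route_summary (hypothetical helper): read `ip route get <addr>` output on
# stdin and print the outgoing interface and source address it contains.
route_summary() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "src") src = $(i + 1)
        }
    }
    END { print "dev=" dev, "src=" src }'
}

# Usage:
#   ip route get <region-ip> | route_summary
```

If the reported interface or source address is not the one you expect the rack to use toward the region, check the routing table before digging through the rackd logs.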

Next steps

With MAAS-specific behaviors in mind, the next reference tackles environment and topology gotchas, such as relays, VRFs, snooping, and overlays.