Structured checklists and runbooks keep troubleshooting consistent and repeatable. This document provides intake, failure-specific, and periodic health-check procedures tailored for MAAS.
Intake checklist
Use this checklist to establish a clear network baseline before troubleshooting.
Before you begin, capture key environmental details:
- Diagram: fabrics, VLANs, and switchport mappings
- Confirm: DHCP mode (MAAS, relay, external)
- Firmware: PXE/HTTP boot order enabled
- NTP: sources reachable from nodes and racks
- DNS: MAAS authoritative zones and forwarders configured
- Proxy: rack proxy reachable on TCP 3128
- Images: synced and available in MAAS
- Logs: collect the MAAS service logs (quote the pattern so the shell does not expand it):
    journalctl -u 'snap.maas.*'
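The checklist above can be turned into a repeatable pass/fail record. The following is a minimal sketch, not a MAAS tool: `run_check` is a hypothetical helper name, and the example commands in the comments are assumptions about how you might probe each item in your environment.

```shell
# Sketch: run one intake check and print a one-line verdict.
# Usage: run_check "<label>" <command> [args...]
run_check() {
  rc_label=$1
  shift
  # Discard the command's own output; only its exit status matters here.
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$rc_label"
  else
    printf 'FAIL  %s\n' "$rc_label"
    return 1
  fi
}

# Possible uses (placeholders must be filled in for your site):
#   run_check "Proxy: TCP 3128 reachable" nc -z -w 3 <rack-ip> 3128
#   run_check "DNS: MAAS answers" dig +short @<maas-dns-ip> <node>.maas
```

Saving the output of a run before troubleshooting gives you a baseline to compare against after each change.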
PXE failure runbook
Use this runbook to verify PXE readiness before investigating deeper DHCP or boot issues.
- Verify link and VLAN on the client port:
    ip link show
    tcpdump -i <iface> port 67 or 68
- Confirm only MAAS DHCP is active on the VLAN:
    ps aux | grep dnsmasq
- Check relay or IP helper configuration on the upstream switch or router.
- Ensure the rack controller has DHCP enabled for the VLAN:
    maas $PROFILE vlan read <fabric-id> <vid> | jq '.dhcp_on'
- Capture DHCP discover/offer/ack packets to confirm the full handshake.
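To make the handshake check in the last step less error-prone, the capture can be summarized per stage. This is a sketch under one assumption: that verbose tcpdump output prints the DHCP message type on a line containing `DHCP-Message` followed by `: <Type>`, which may vary between tcpdump versions. `dhcp_handshake_summary` is a hypothetical helper name.

```shell
# Sketch: report which DHCP handshake stages appear in verbose tcpdump output.
# Reads text on stdin, for example from:
#   tcpdump -v -n -i <iface> port 67 or 68
# Assumes the message type is printed on a "DHCP-Message ...: <Type>" line.
dhcp_handshake_summary() {
  dhs_capture=$(cat)
  for dhs_stage in Discover Offer Request ACK; do
    if printf '%s\n' "$dhs_capture" | grep -q "DHCP-Message.*: ${dhs_stage}"; then
      printf '%s: seen\n' "$dhs_stage"
    else
      printf '%s: missing\n' "$dhs_stage"
    fi
  done
}
```

A missing Discover usually points at the link or VLAN on the client port, while a Discover with no Offer points at the DHCP server or the relay path.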
Commissioning failure runbook
Use this runbook when nodes fail to commission or stop early in the process.
- Confirm the kernel and initrd were downloaded successfully.
- Test metadata reachability from the ephemeral environment:
    curl -I http://<region-ip>:5240/MAAS/
- Check rack logs for DHCP, proxy, or TFTP errors:
    journalctl -u snap.maas.rackd
- Validate DNS resolution from the ephemeral environment:
    dig @<maas-dns-ip> rackd.maas
- Review /var/log/cloud-init.log and /var/log/curtin/install.log for errors.
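When the metadata check is scripted, it helps to separate "no response at all" from "the region answered with an error". The sketch below assumes you capture the status code with curl's `%{http_code}` write-out variable, which prints `000` when no HTTP response was received. `classify_http_code` is a hypothetical helper name and the verdict wording is illustrative.

```shell
# Sketch: turn an HTTP status code into a short verdict for the metadata check.
# Obtain the code with something like:
#   curl -s -o /dev/null -m 5 -w '%{http_code}' http://<region-ip>:5240/MAAS/
classify_http_code() {
  case "$1" in
    000)     echo "unreachable: no HTTP response (check routing, firewall, DNS)" ;;
    2??|3??) echo "reachable: region answered with $1" ;;
    4??)     echo "reachable but rejected: client error $1" ;;
    5??)     echo "reachable but unhealthy: server error $1" ;;
    *)       echo "unexpected value: $1" ;;
  esac
}
```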
Deploy failure runbook (curtin/apt)
Use this runbook to diagnose deployment failures related to proxy or package retrieval.
- Confirm proxy configuration:
    apt-config dump | grep -i proxy
- Test package mirror reachability:
    curl -I http://archive.ubuntu.com/
- If SSL interception is present, ensure the corporate CA is installed.
- Review installer and curtin logs:
    less /var/log/installer/syslog
    less /var/log/curtin/install.log
- Verify NTP synchronization to prevent clock-skew issues.
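For the proxy step, it can be useful to extract only the proxy URLs so they can be tested directly. This sketch assumes `apt-config dump` prints proxy settings as `Acquire::http::Proxy "<url>";` lines. `extract_apt_proxy` is a hypothetical helper name and the address in the comment is an example, not a MAAS default.

```shell
# Sketch: list the proxy URLs that apt is configured to use.
# Reads `apt-config dump` output on stdin; assumes lines shaped like
#   Acquire::http::Proxy "http://10.0.0.2:8000/";
# Prints "<scheme> <proxy-url>" for each http or https proxy entry.
extract_apt_proxy() {
  sed -n 's/^Acquire::\(https\{0,1\}\)::[Pp]roxy "\(.*\)";$/\1 \2/p'
}

# Possible use:
#   apt-config dump | extract_apt_proxy
```

If nothing is printed, apt has no proxy configured, which is worth comparing against what MAAS was expected to set for the node.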
Post-deploy network down runbook
Use this runbook when a deployed system comes up without network connectivity.
- Log in to the console and inspect netplan configuration:
    netplan get
- Review systemd-networkd logs for errors:
    journalctl -u systemd-networkd
- Compare interface names between MAAS and the node.
- Adjust and reapply netplan if necessary:
    netplan try
    netplan apply
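The interface-name comparison in this runbook can be done mechanically once both lists are saved to files, one name per line. The sketch below is one way to do that. `iface_diff` is a hypothetical helper name, and the commands suggested in the comments for producing the two lists are assumptions that may need adjusting for your MAAS version and node.

```shell
# Sketch: compare two lists of interface names, one name per line.
# Usage: iface_diff <maas-names-file> <node-names-file>
# The lists might come from (assumptions, adjust as needed):
#   maas $PROFILE interfaces read <system-id> | jq -r '.[].name'
#   ip -o link show | awk -F': ' '{print $2}'
iface_diff() {
  id_tmp=$(mktemp) || return 1
  sort -u "$2" > "$id_tmp"
  # comm -23 keeps lines only in the first input; comm -13 only in the second.
  sort -u "$1" | comm -23 - "$id_tmp" | sed 's/^/only in MAAS: /'
  sort -u "$1" | comm -13 - "$id_tmp" | sed 's/^/only on node: /'
  rm -f "$id_tmp"
}
```

Empty output means the names match. Note that `ip` prints VLAN sub-interfaces as `name@parent`, so those entries may need trimming before the comparison.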
Periodic health-check runbook
Use this runbook to perform regular validation of MAAS and network health.
Run these checks weekly or monthly to maintain reliability:
- List recent warnings:
    maas $PROFILE events query level=WARNING limit=50
- Confirm rack controllers are online (UI or CLI).
- Test DHCP offers on each VLAN using tcpdump.
- Test metadata endpoint from an isolated host.
- Check NTP offset with chronyc tracking.
- Inspect rack health with top or iostat.
- Rotate logs and archive old pcaps or evidence.
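For a recurring check, the NTP offset step is easier to act on with an explicit threshold. This sketch assumes `chronyc tracking` prints a `System time : <seconds> seconds slow|fast of NTP time` line. `ntp_offset_check` is a hypothetical helper name, and the threshold is yours to choose.

```shell
# Sketch: flag clock offset above a threshold using `chronyc tracking` output.
# Reads the command output on stdin. Usage: ntp_offset_check <max-seconds>
# Assumes a line shaped like:
#   System time     : 0.000006523 seconds slow of NTP time
ntp_offset_check() {
  awk -v max="$1" '
    /^System time/ {
      split($0, parts, ":")        # parts[2] holds everything after the colon
      split(parts[2], f, " ")      # f[1] is the offset in seconds
      found = 1
      if (f[1] + 0 > max + 0) { print "FAIL offset " f[1] "s exceeds " max "s"; bad = 1 }
      else                    { print "OK   offset " f[1] "s within " max "s" }
    }
    END {
      if (!found) { print "FAIL no System time line found"; exit 1 }
      exit bad + 0
    }'
}

# Possible use:
#   chronyc tracking | ntp_offset_check 0.1
```

The function exits non-zero on failure, so it can be chained into a cron job or a wrapper that alerts on any failed check.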
Escalation bundle
Use this procedure to collect standard artifacts for escalation to higher-level support.
sudo maas dumpstate # if available
journalctl -u snap.maas.regiond -u snap.maas.rackd > /tmp/maas-services.log
ip a > /tmp/ip-a.txt
ip r > /tmp/ip-r.txt
bridge vlan show > /tmp/bridge-vlan.txt
resolvectl status > /tmp/resolvectl.txt
dig @<maas-dns-ip> <node>.maas A > /tmp/dns.txt
tar czf /tmp/maas-evidence-$(date +%F).tar.gz /tmp/*.txt /tmp/*.log
Include an sosreport to capture complete system diagnostics for kernel, storage, and network context:
# collect sosreport for full system diagnostics
sudo apt install sosreport -y
sudo sosreport --batch --tmp-dir /tmp
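The collection commands at the start of this section write straight into /tmp, so the final tar can sweep up unrelated .txt and .log files. One possible variation, sketched below, writes each artifact into a dedicated directory and records failed commands instead of stopping. `collect_artifact` and the directory name are hypothetical, not part of MAAS.

```shell
# Sketch: run one command and store its output in a dedicated evidence directory.
# Usage: collect_artifact <dir> <file-name> <command> [args...]
collect_artifact() {
  ca_dir=$1
  ca_name=$2
  shift 2
  mkdir -p "$ca_dir" || return 1
  # Keep stderr too, and note failures instead of aborting the whole bundle.
  if ! "$@" > "$ca_dir/$ca_name" 2>&1; then
    printf 'command failed: %s\n' "$*" >> "$ca_dir/$ca_name"
  fi
}

# Possible use (hypothetical directory name):
#   dir=/tmp/maas-evidence-$(date +%F)
#   collect_artifact "$dir" ip-a.txt ip a
#   collect_artifact "$dir" ip-r.txt ip r
#   tar czf "$dir.tar.gz" -C /tmp "$(basename "$dir")"
```

Recording the failure inside the artifact file tells whoever receives the bundle that a command was attempted and did not work, which is itself useful evidence.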
Next steps
The next version of this document will include appendices such as a port table, BPF filter cribsheet, switch configuration checklist, known-bads, and lab patterns.