Network troubleshooting: Evidence bundle for escalation
When a network or deployment issue cannot be resolved locally, a complete and consistent evidence bundle accelerates escalation and diagnosis. This document defines what to collect, how to collect it, and how to package it for higher-level support. Record the exact wall-clock time (with timezone) when symptoms start, so that logs and events from different hosts can be aligned during analysis.
Why evidence matters
Use this section to understand why disciplined evidence collection is critical.
- Ensures the problem is reproducible and visible to others
- Prevents back-and-forth about missing details
- Speeds triage by showing both symptoms and baseline state
- Supports audits and incident response requirements
Standard evidence set
Use this section to collect all relevant MAAS and system-level data before escalation.
MAAS logs
Collect controller and event logs to capture MAAS activity at the time of failure.
- Region and rack controller journals (snap installs; on deb installs the units are maas-regiond and maas-rackd):
  journalctl -u snap.maas.regiond -u snap.maas.rackd > /tmp/maas-services.log
- Recent MAAS events (warnings and errors), where $PROFILE is the profile name used with maas login:
  maas $PROFILE events query level=WARNING limit=200 > /tmp/maas-events.json
- Commissioning and deploy logs, collected on the affected node:
  cp /var/log/cloud-init.log /tmp/
  cp /var/log/cloud-init-output.log /tmp/
  cp /var/log/curtin/install.log /tmp/
  cp /var/log/installer/syslog /tmp/installer-syslog.txt
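Collection commands fail for ordinary reasons: a unit name that differs between snap and deb installs, a missing tool, or missing root privileges. The minimal sketch below, assuming bash or dash, defines a hypothetical `collect` helper (not part of MAAS) that keeps each command's stdout and stderr in separate files and appends its exit status to a manifest, so the recipient can see what was attempted and what failed.

```bash
# Hypothetical helper, not part of MAAS: run one collection command, keep its
# stdout and stderr in separate files, and record the exit status in a manifest.
# EVIDENCE_DIR defaults to /tmp to match the paths used in this document.
collect() {
  local name=$1
  shift
  local dir=${EVIDENCE_DIR:-/tmp}
  local rc
  mkdir -p "$dir"
  # "&& ... || ..." captures a failing status without tripping "set -e"
  "$@" > "$dir/$name" 2> "$dir/$name.stderr" && rc=0 || rc=$?
  # drop empty stderr files so the bundle stays readable
  [ -s "$dir/$name.stderr" ] || rm -f "$dir/$name.stderr"
  # manifest columns: output file, exit status, command line
  printf '%s\t%s\t%s\n' "$name" "$rc" "$*" >> "$dir/evidence-manifest.tsv"
  # always succeed so one failure does not stop a collection run;
  # the manifest is where failures are reported
  return 0
}
```

For example, collect maas-services.log journalctl -u snap.maas.regiond -u snap.maas.rackd does the same job as the first command above and also records whether it succeeded. Commands that need a pipe must be wrapped in sh -c. Include /tmp/evidence-manifest.tsv in the final archive.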
Timebox and filter
Use these steps to capture only the time window around the failure and focus on relevant lines.
# mark incident window; date -Is records ISO 8601 time including the UTC offset
date -Is | tee /tmp/incident-start.iso
logger -t incident "marker: repro start"  # optional: marker in the system journal
sleep 120 # reproduce / wait ~2 minutes
date -Is | tee /tmp/incident-end.iso
logger -t incident "marker: repro end"
# journalctl and most log prefixes use local "YYYY-MM-DD HH:MM:SS", so convert
START=$(date -d "$(cat /tmp/incident-start.iso)" '+%F %T')
END=$(date -d "$(cat /tmp/incident-end.iso)" '+%F %T')
# confirm clock and timezone for correlation
timedatectl | tee /tmp/timedatectl.txt
# focus controller journals to the incident window and grep likely culprits
journalctl --since "$START" --until "$END" -u snap.maas.regiond -u snap.maas.rackd | grep -Ei 'dhcp|tftp|pxe|metadata|proxy|curtin|cloud-init|commission|deploy|dns' > /tmp/maas-services-window.log
# slice node logs to the same window by comparing each line's leading "YYYY-MM-DD HH:MM:SS"
# (assumes the log uses the same timezone as START/END; a log without such prefixes yields nothing)
awk -v s="$START" -v e="$END" '{ t = substr($0, 1, 19) } t >= s && t <= e' /var/log/cloud-init.log | grep -Ei 'error|fail|timeout|metadata|network|dns|ntp' > /tmp/cloud-init-window.log
awk -v s="$START" -v e="$END" '{ t = substr($0, 1, 19) } t >= s && t <= e' /var/log/curtin/install.log | grep -Ei 'error|fail|timeout|mirror|proxy|apt|dpkg' > /tmp/curtin-window.log
# filter MAAS events by timestamp window; the query returns {"count": ..., "events": [...], ...}
# assumes "created" looks like "Thu, 29 Oct. 2020 20:05:03" and is UTC; check one record first
maas $PROFILE events query level=WARNING limit=1000 > /tmp/maas-events.json
jq --argjson s "$(date -d "$START" +%s)" --argjson e "$(date -d "$END" +%s)" '.events | map(select((.created | strptime("%a, %d %b. %Y %H:%M:%S") | mktime) as $t | $t >= $s and $t <= $e))' /tmp/maas-events.json > /tmp/maas-events-window.json
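A sed address range such as /START/,/END/ only matches when the exact boundary strings appear in the log, and it prints to the end of the file if the end string never appears. Comparing each line's leading timestamp avoids both problems. A minimal sketch, assuming the log prefixes lines with YYYY-MM-DD HH:MM:SS (as cloud-init.log does by default) in the same timezone as the bounds; slice_window is a hypothetical name.

```bash
# Hypothetical helper: print lines of FILE whose first 19 characters, read as a
# "YYYY-MM-DD HH:MM:SS" timestamp, fall inside [START, END]. Fixed-width
# timestamps in this format sort correctly as plain strings.
# Lines without a leading timestamp (e.g. traceback continuations) usually
# compare outside the window and are dropped.
slice_window() {
  local start=$1 end=$2 file=$3
  awk -v s="$start" -v e="$end" '
    { t = substr($0, 1, 19) }
    t >= s && t <= e
  ' "$file"
}
```

The bounds must use the same format, for example date -d "$(cat /tmp/incident-start.iso)" '+%F %T'. Then slice_window "$START" "$END" /var/log/cloud-init.log | grep -Ei 'error|fail|timeout' narrows the log to the incident.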
System state
Collect baseline system configuration to reveal network and routing context.
- Interfaces and addresses:
  ip a > /tmp/ip-a.txt
  ip -d link > /tmp/ip-dlink.txt
- Routing table:
  ip r > /tmp/ip-r.txt
- Bridges and VLANs:
  bridge vlan show > /tmp/bridge-vlan.txt
- DNS and resolver state:
  resolvectl status > /tmp/resolvectl.txt
  dig @<maas-dns-ip> <node>.maas A +noall +answer > /tmp/dns.txt
- Listening sockets and services:
  ss -ltnup > /tmp/listeners.txt
- Firewall configuration:
  nft list ruleset > /tmp/nftables.txt
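A quick read of the saved routing table can catch an obvious gap before escalation. A minimal sketch, assuming the ip r output format saved above; check_default_route is a hypothetical helper that prints the gateway and device of each default route and returns non-zero unless exactly one exists. Multiple default routes can be legitimate (for example with different metrics), so treat a failure as a prompt to look closer, not a verdict.

```bash
# Hypothetical helper: summarize default routes from saved "ip r" output.
# Prints "<gateway> <device>" per default route ("-" when a field is absent)
# and exits 0 only when exactly one default route is present.
check_default_route() {
  awk '
    $1 == "default" {
      gw = "-"; dev = "-"
      for (i = 2; i < NF; i++) {
        if ($i == "via") gw = $(i + 1)
        if ($i == "dev") dev = $(i + 1)
      }
      print gw, dev
      n++
    }
    END { exit (n == 1 ? 0 : 1) }
  ' "${1:-/tmp/ip-r.txt}"
}
```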
Targeted captures
Use packet captures to verify DHCP, TFTP, and HTTP traffic flow during commissioning and PXE boot.
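- DHCP handshake:
  tcpdump -n -e -vv -i <iface> -c 50 -w /tmp/dhcp.pcap '(port 67 or 68)'
- TFTP and HTTP boot attempts:
  tcpdump -n -i <iface> -c 50 -w /tmp/tftp.pcap port 69
  tcpdump -n -i <iface> -c 200 -w /tmp/httpboot.pcap tcp port 80
  A port 69 filter records only the TFTP read requests: the server sends the file data from an ephemeral port, so the transfer itself is not captured. MAAS rack controllers also commonly serve boot files over HTTP on TCP 5248, so add that port to the HTTP filter if boot traffic uses it.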
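When the node's MAC address is known, filtering on it captures the node's whole boot exchange regardless of port. The DHCP ports are added as well, because broadcast DHCP replies carry the broadcast destination MAC rather than the node's. A minimal sketch; node_capture_filter is a hypothetical helper that validates the MAC and prints a tcpdump filter.

```bash
# Hypothetical helper: print a tcpdump filter for one node's PXE traffic.
# "ether host" matches frames to or from the node's MAC on any port; the DHCP
# ports are added because broadcast DHCP replies go to ff:ff:ff:ff:ff:ff.
node_capture_filter() {
  local mac
  mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  # accept only colon-separated 6-octet MACs, e.g. 52:54:00:12:34:56
  if ! printf '%s\n' "$mac" | grep -Eq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
    echo "invalid MAC address: $1" >&2
    return 1
  fi
  printf 'ether host %s or udp port 67 or udp port 68\n' "$mac"
}
```

Usage: f=$(node_capture_filter <mac>) && tcpdump -n -i <iface> -c 500 -w /tmp/node-boot.pcap "$f". Checking the helper's status first matters, because an empty filter would make tcpdump capture everything.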
Performance snapshots
Use performance data to detect system-level constraints contributing to network symptoms.
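- CPU, memory, and disk usage (iostat is provided by the sysstat package):
  top -b -n1 > /tmp/top.txt
  iostat -x 5 3 > /tmp/iostat.txt
- Network throughput (requires an iperf3 server, started with iperf3 -s, on <server>):
  iperf3 -c <server> -P 4 -t 20 > /tmp/iperf3.txt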
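For the handoff summary it helps to pull the aggregate result out of /tmp/iperf3.txt. A minimal sketch, assuming iperf3's default text output for a TCP test, where each receiver summary line ends with the rate, its unit, and the word receiver; with -P 4 the last such line is the [SUM] total. iperf3_receiver_rate is a hypothetical name.

```bash
# Hypothetical helper: print the last "<rate> <unit>" reported on a receiver
# summary line of iperf3's default TCP text output (the [SUM] line with -P > 1).
# Exits 1 if no receiver line is found.
iperf3_receiver_rate() {
  awk '
    $NF == "receiver" { rate = $(NF - 2) " " $(NF - 1) }
    END {
      if (rate == "") exit 1
      print rate
    }
  ' "${1:-/tmp/iperf3.txt}"
}
```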
Full system diagnostics
Collect an sosreport to include kernel, storage, and network context.
sudo apt install sosreport -y
sudo sosreport --batch --tmp-dir /tmp
Packaging procedure
Use this procedure to assemble and compress the collected data into a single archive for handoff. Run it on one host after copying the node-side files (cloud-init, curtin, and installer logs and their window slices) into its /tmp.
ARCHIVE=/tmp/maas-evidence-$(date +%F).tar.gz
tar czf "$ARCHIVE" /tmp/maas-services.log /tmp/maas-services-window.log /tmp/maas-events.json /tmp/maas-events-window.json /tmp/cloud-init*.log /tmp/curtin-window.log /tmp/install.log /tmp/installer-syslog.txt /tmp/ip-*.txt /tmp/bridge-vlan.txt /tmp/resolvectl.txt /tmp/dns.txt /tmp/listeners.txt /tmp/nftables.txt /tmp/*.pcap /tmp/top.txt /tmp/iostat.txt /tmp/iperf3.txt /tmp/timedatectl.txt /tmp/incident-*.iso /tmp/sosreport*
Verify the archive contents:
tar tzf "$ARCHIVE" | less
Redact sensitive data (passwords, keys, user information) in the collected files before building the archive; if you redact afterwards, rebuild the archive before sharing it.
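GNU tar reports each listed path that does not exist and exits with status 2, which is easy to miss in a long command, and nothing records which evidence was absent. A minimal sketch in bash or dash; package_evidence is a hypothetical helper that skips missing paths with a warning, archives the rest, and writes a SHA-256 checksum next to the archive so the recipient can verify the transfer.

```bash
# Hypothetical helper: package_evidence ARCHIVE PATH...
# Skips paths that do not exist (an unmatched glob is passed through literally
# and is skipped too), archives the rest, and writes ARCHIVE.sha256.
package_evidence() {
  local archive=$1 n i f
  shift
  n=$#
  i=0
  # rotate through the arguments once, re-appending only existing paths
  while [ "$i" -lt "$n" ]; do
    f=$1
    shift
    if [ -e "$f" ]; then
      set -- "$@" "$f"
    else
      echo "missing, skipped: $f" >&2
    fi
    i=$((i + 1))
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to package" >&2
    return 1
  fi
  tar czf "$archive" "$@" && sha256sum "$archive" > "$archive.sha256"
}
```

The recipient can then run sha256sum -c on the .sha256 file, from a directory where the archive sits at the recorded path, to confirm the archive arrived intact.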
Redaction helpers
Use these tools to sanitize or extract data when confidentiality is required.
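- Mask IP addresses in packet captures. editcap cannot do this (its -C option chops bytes from each packet); use an anonymizer such as tcprewrite from the tcpreplay suite, whose --seed option randomizes IPv4 and IPv6 addresses:
  tcprewrite --infile=in.pcap --outfile=out.pcap --seed=42
  Addresses embedded in payloads, such as the client hardware address in DHCP, are not rewritten; Ethernet header MACs can be overwritten with tcprewrite's --enet-smac and --enet-dmac options.
- Extract only DHCP, DNS, or HTTP traffic for analysis (Wireshark 3.0 and later name the former bootp filter dhcp):
  tshark -r in.pcap -Y 'dhcp or dns or http' -w filtered.pcap
  tshark -r in.pcap -Y 'dhcp or dns or http' -T fields -e frame.time -e ip.src -e ip.dst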
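Text logs carry the same addresses as the captures. A coarse sketch, assuming a sed that supports -E; the hypothetical redact_addresses replaces every colon-separated MAC address and dotted IPv4 address with a fixed placeholder. Addresses are often the evidence itself, so apply it only where policy requires and review the output.

```bash
# Hypothetical helper: mask MAC and IPv4 addresses in a text file, writing to stdout.
# Coarse by design: IPv6 addresses, hostnames, and secrets are left untouched,
# and IPv4-like version strings (e.g. 1.2.3.4) are masked too.
redact_addresses() {
  sed -E \
    -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
    -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g' \
    "$1"
}
```

For example, redact_addresses /tmp/cloud-init-window.log > /tmp/cloud-init-window.redacted.log keeps the original file for local use.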
Escalation handoff
Use this checklist to prepare the final escalation package.
Include the following with your tarball:
- Brief description of the issue (symptoms, time of first occurrence with timezone, actions taken)
- Affected node or system IDs
- Relevant fabric and VLAN identifiers
- Whether the issue is reproducible and under what conditions
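The checklist can be seeded with the recorded incident window so the timezone travels with the note. A minimal sketch; write_handoff is a hypothetical helper that reads the incident-start.iso and incident-end.iso files written during timeboxing and leaves the remaining fields for a person to fill in.

```bash
# Hypothetical helper: write DIR/HANDOFF.txt with the incident window filled in
# from DIR/incident-start.iso and DIR/incident-end.iso (both from "date -Is").
write_handoff() {
  local dir=${1:-/tmp}
  {
    printf 'Incident window: %s to %s\n' \
      "$(cat "$dir/incident-start.iso")" "$(cat "$dir/incident-end.iso")"
    printf '%s\n' \
      'Symptoms:' \
      'First occurrence (with timezone):' \
      'Actions taken:' \
      'Affected node/system IDs:' \
      'Fabric and VLAN identifiers:' \
      'Reproducible? Under what conditions:'
  } > "$dir/HANDOFF.txt"
}
```

Fill in the remaining fields and add HANDOFF.txt to the tarball.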
Next steps
After evidence collection, the next appendices provide quick references such as port tables, switch configuration checklists, known bad patterns, and lab network templates.