Change the nature of network testing

Originally filed as support issue 00291485 (Aug 2020). All testing is currently performed “inside” the managed system (pre-deployment). However, as we are providing Edge services, this is not useful: as the final step of deployment we “swing” the system onto a public (or customer-private) network, so any network testing done beforehand is of no significance. As we do maintain connectivity to the IPMI of the deployed nodes, monitoring those connections from the Rack would be helpful. Obviously, this can be done outside of MAAS, but even a simplistic check (e.g. poll power every N minutes) would make MAAS a better tool for Operations to monitor system health.
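To make the “poll power every N minutes” idea concrete, here is a minimal sketch of what such a check could look like when run from the rack. It is not how MAAS does it; the function names, the `maas` BMC user and the use of `ipmitool` with the password taken from the `IPMI_PASSWORD` environment variable are all assumptions for illustration.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable


def ipmi_power_query(host: str, user: str = "maas") -> str:
    """Ask one BMC for its chassis power state using ipmitool.

    Assumption: ipmitool is installed on the rack and the BMC password
    is exported as IPMI_PASSWORD (that is what the -E flag reads).
    """
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
         "chassis", "power", "status"],
        capture_output=True, text=True, timeout=10, check=True,
    ).stdout
    # ipmitool prints "Chassis Power is on" or "Chassis Power is off".
    return "on" if out.strip().lower().endswith("on") else "off"


def poll_once(hosts: Iterable[str],
              query: Callable[[str], str] = ipmi_power_query) -> Dict[str, str]:
    """Query every BMC once; any failure is recorded as 'unreachable'."""
    results = {}
    for host in hosts:
        try:
            results[host] = query(host)
        except Exception:
            # Timeout, non-zero exit, missing binary: all count as a lost BMC link.
            results[host] = "unreachable"
    return results


def poll_forever(hosts, interval_s=300, query=ipmi_power_query, report=print):
    """Hypothetical loop: report the state of every BMC every interval_s seconds."""
    hosts = list(hosts)
    while True:
        report(poll_once(hosts, query))
        time.sleep(interval_s)
```

The query function is injectable so the loop can be exercised without real hardware, e.g. `poll_once(["10.0.0.1"], lambda host: "on")`.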

Aside from that level of simplistic health check, while the systems are directly connected (e.g. in the READY state), more complete network health monitoring (e.g. periodically running iperf from the rack to managed systems) would also be useful. Again, this can be coded outside of MAAS, but making MAAS more generally functional is desirable.

As for how this could be implemented, “allow ssh” could be a default for commissioned systems, with a MAAS rack controller credential, and the rack controller would then periodically kick off one network test or another.
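As a rough sketch of that implementation idea, the rack side could build a non-interactive ssh invocation per machine and run whatever test command is configured. Everything here is hypothetical: the `ubuntu` login, the key path and the helper names are placeholders, and nothing in MAAS is assumed to work this way today.

```python
import shlex
import subprocess
from typing import List


def build_remote_test_cmd(host: str, test_argv: List[str],
                          key_path: str, user: str = "ubuntu") -> List[str]:
    """Build the ssh command the rack would run to start a test on one machine.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    which is what an unattended periodic job needs.
    """
    return [
        "ssh", "-i", key_path,
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        f"{user}@{host}",
        shlex.join(test_argv),  # quote the remote command safely
    ]


def run_remote_test(host: str, test_argv: List[str], key_path: str,
                    timeout_s: int = 60) -> subprocess.CompletedProcess:
    """Run the test and return the completed process (exit code, stdout, stderr)."""
    return subprocess.run(
        build_remote_test_cmd(host, test_argv, key_path),
        capture_output=True, text=True, timeout=timeout_s,
    )
```

For example, `build_remote_test_cmd("10.0.0.5", ["ping", "-c", "3", "10.0.0.1"], "/path/to/key")` yields an ssh command that asks the machine to ping the rack three times.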

Thanks for the suggestion.

I think we should overhaul how MAAS checks on the status of a machine. Instead of issuing the power command and logging the result in the events table, this information should be tracked in its own table. MAAS could then have multiple methods to check on the status of a machine (BMC, ping, HTTP, SSH, etc.). This information could then be displayed as a graph.
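To illustrate what “its own table” might look like, here is a small sketch using SQLite as a stand-in. The table name, columns and helper functions are invented for this example and are not an actual or planned MAAS schema; MAAS itself stores its data in PostgreSQL.

```python
import sqlite3
import time

# Hypothetical schema: one row per check, whatever method produced it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS machine_status_check (
    id         INTEGER PRIMARY KEY,
    system_id  TEXT    NOT NULL,
    method     TEXT    NOT NULL CHECK (method IN ('bmc', 'ping', 'http', 'ssh')),
    ok         INTEGER NOT NULL,
    checked_at REAL    NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the status store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_check(conn, system_id, method, ok, checked_at=None):
    """Append the result of a single status check."""
    conn.execute(
        "INSERT INTO machine_status_check (system_id, method, ok, checked_at) "
        "VALUES (?, ?, ?, ?)",
        (system_id, method, int(ok), time.time() if checked_at is None else checked_at),
    )
    conn.commit()


def availability(conn, system_id, since=0.0):
    """Return the fraction of successful checks per method since a timestamp."""
    rows = conn.execute(
        "SELECT method, AVG(ok) FROM machine_status_check "
        "WHERE system_id = ? AND checked_at >= ? GROUP BY method",
        (system_id, since),
    ).fetchall()
    return {method: ratio for method, ratio in rows}
```

A per-method success ratio over a time window like this is the kind of series that could feed the graph mentioned above.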

We discussed running iperf on the rack controller but decided against it because most people expect iperf output to reflect the throughput the link can actually deliver. If multiple machines are running iperf against the same rack controller while deployments are happening, iperf throughput will be lower. This may lead users into thinking there is a problem with throughput when really the rack controller is just being overloaded. Where iperf is run is also important. If you care about the throughput of an interface, you may test with two servers connected directly to each other. If you care about switch speed, testing should be done with two servers connected to the same switch. You may also want to know throughput between availability zones. There are many other scenarios in which it matters where iperf is run, and MAAS can’t figure out which one applies.

One thing to keep in mind is that network testing can be run on a deployed machine; however, the machine will have to boot into an ephemeral environment.

How would one boot into an ephemeral environment on a deployed machine? At first blush, that seems like a MAAS syntax error (viz. “testing” against a deployed system without release?)

For us, this is only of academic interest … it wouldn’t help us in our Production systems, because once a machine is deployed it is taken off our internal network and only plumbed to the customer’s network (or the public internet), so PXE booting is impossible (until the customer release, which has to do a bunch of tasks before the MAAS release to sanitize the system and bring it back to our network).

So the RFE “stands”: it would be nice for MAAS to manage and kick off a family of scripts (including, as with the current TEST scripts, customer-added scripts) … but from the RACK rather than the managed systems. I take your point about the user needing to be cognizant of the load on the rack (and/or MAAS needing to monitor it) … clearly, adding a high network load (à la iperf) to a network-constrained Rack while it is doing a lot of MAAS operations would be unwise … but that is exactly why it would be nice for MAAS to manage it … MAAS should be clever enough not to kick off such a test in the middle of multiple network-heavy operations (and perhaps to pause such operations for the duration of the test ;>). Coding this up outside of MAAS is more difficult, and pausing MAAS operations from outside is somewhat infeasible.
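To show the kind of gating I mean, here is a toy sketch: heavy operations register themselves while they run, and a network test is only started when the rack is below a threshold. The class and method names are made up for illustration, and the sketch only defers tests; it does not attempt the harder part of pausing operations that are already running.

```python
import threading
from contextlib import contextmanager


class RackLoadGate:
    """Hypothetical gate that defers network tests while the rack is busy."""

    def __init__(self, max_heavy_ops=0):
        self._lock = threading.Lock()
        self._heavy_ops = 0
        self._max_heavy_ops = max_heavy_ops

    @contextmanager
    def heavy_operation(self):
        """Wrap a network-heavy operation (e.g. a deployment) so it is counted."""
        with self._lock:
            self._heavy_ops += 1
        try:
            yield
        finally:
            with self._lock:
                self._heavy_ops -= 1

    def try_run(self, test):
        """Run test() only if the rack is quiet enough.

        Returns (True, result) if the test ran, (False, None) if it was deferred.
        The check is a snapshot: a heavy operation may still start while the
        test is running, so this reduces contention rather than excluding it.
        """
        with self._lock:
            busy = self._heavy_ops > self._max_heavy_ops
        if busy:
            return (False, None)
        return (True, test())
```

Usage would be along the lines of wrapping each deployment in `with gate.heavy_operation():` and having the periodic scheduler call `gate.try_run(run_iperf)`, retrying later whenever the first element of the result is `False`.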