Power Type APC in MAAS 2.7/2.8. Query fails but on/off works!

I have been trying to use MAAS to set up a local cloud (mostly for juju to deploy k8s and spark clusters for some projects). I have three machines, none of which have AMT/IPMI, and I have been looking forward to getting an APC PDU (Switched Rack PDU 7900) for integrated power management in MAAS.

I set up the PDU today and am getting strange results. After I set the power type to APC (IP, outlet number, default delay), the power status remains “Unknown”. The ‘Turn On’ and ‘Turn Off’ actions, however, both work.

I am assuming that /snap/maas/current/lib/python3.6/site-packages/provisioningserver/drivers/power/apc.py is what is used, and it looks reasonable. I installed snmp and ‘snmp-mibs-downloader’ on the MAAS machine to try the commands manually, and all three of them work (for outlet 7, say). So the PDU itself is configured correctly for the SNMP v1 calls.

  • query - snmpget -v1 -c private "."
  • on - snmpset -v1 -c private "." i 1
  • off - snmpset -v1 -c private "." i 2
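For repeated testing, the three manual checks above could be scripted roughly as follows. This is only a sketch: the helper names are made up, the host and OID are left as parameters because they are not shown in the commands above, and the values 1 and 2 are the ones used in the on/off commands.

```python
import subprocess


def snmp_argv(action, host, oid, community="private"):
    """Build the argv for the manual SNMP v1 checks (query, on, off).

    `host` and `oid` are supplied by the caller; nothing here is specific
    to a particular PDU model.
    """
    base = ["-v1", "-c", community, host, oid]
    if action == "query":
        return ["snmpget"] + base
    if action == "on":
        return ["snmpset"] + base + ["i", "1"]
    if action == "off":
        return ["snmpset"] + base + ["i", "2"]
    raise ValueError(f"unknown action: {action!r}")


def run_snmp(action, host, oid):
    """Run the command and return its stdout; raises if the command fails."""
    result = subprocess.run(
        snmp_argv(action, host, oid),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_snmp("query", host, oid)` with the PDU's address and the outlet's OID would then be equivalent to the manual snmpget.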

When I look in the logs with tail -f /var/snap/maas/common/maas.log, I see the following, but no error reports for power_query. I also do not see any mention of the change from manual to apc.

2020-07-29T02:07:32.116163-07:00 maas maas.drivers.power.manual: [info] You need to check power state of byqddw manually.
2020-07-29T02:07:32.149571-07:00 maas maas.drivers.power.manual: [info] You need to check power state of byqddw manually.
2020-07-29T02:08:48.447746-07:00 maas maas.power: [info] Changing power state (on) of node: TR32 (byqddw)
2020-07-29T02:14:13.917076-07:00 maas maas.power: [info] Changing power state (off) of node: TR32 (byqddw)

For a VM node (virsh power type), when I had issues with setup, I noticed errors in both maas.log and rackd.log. For the APC power_query, however, there is not a peep.

I tried upgrading from MAAS 2.7 to 2.8, but the behavior is exactly the same.

Are there any additional places to look for logs?

Help please.


It seems APCPowerDriver has a power_query method, as you’ve seen, but unfortunately it also has queryable = False, which (erroneously) tells MAAS that this power driver can’t query the status of the machine.

Would you like to contribute a small fix to address that? It should be trivial.
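To illustrate why a working power_query can still leave the status at “Unknown”, here is a minimal, self-contained sketch. It is not MAAS code: the class names and the get_power_state helper are hypothetical, and only the queryable attribute and power_query method names come from the discussion above.

```python
class PowerDriverSketch:
    """Hypothetical stand-in for a power driver base class."""

    queryable = True

    def power_query(self):
        raise NotImplementedError


class ApcLikeDriver(PowerDriverSketch):
    # The flag discussed above: the driver can query, but says it cannot.
    queryable = False

    def power_query(self):
        # Pretend the PDU reported that the outlet is on.
        return "on"


def get_power_state(driver):
    """Hypothetical caller: skips power_query when the flag is False."""
    if not driver.queryable:
        return "unknown"
    return driver.power_query()


class FixedApcLikeDriver(ApcLikeDriver):
    # The proposed one-line change: flip the flag.
    queryable = True
```

With this sketch, get_power_state(ApcLikeDriver()) yields "unknown" even though power_query works, while the variant with the flag flipped yields "on".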

Ah, I see how I missed it! Should have looked at the PowerDriver code where it is clearer.

I’d be happy to send a PR out.

Are there any instructions you can point me to? I assume simply sending PRs (2.7, 2.8, 2.9? and master?) with the flag change will be simple enough, but I want to see how I can test those changes locally on my MAAS 2.7 system.

Also, while researching how I can handle powering the nodes, I stumbled on https://askubuntu.com/questions/639474/maas-shutdown-node-issue which talks about etherwake.

That code led me to believe that a typical MAAS “Power Off” issues an ssh command first (either by logging in via ssh ubuntu@node or through some call to a daemon running on the node). With the “APC PowerType”, however, it simply cuts the power at the outlet without any graceful shutdown.

I can see the VM node (virsh) go into a shutdown sequence, and from a quick read of virsh.py (and drivers/hardware/virsh.py), it looks like it connects to virsh via ssh and asks it to shut the node down.

ipmi.py seems to have a “soft” option which, from the docs, will likely give the OS a chance to shut down gracefully.

I am likely missing something obvious here. Once power_query is enabled and MAAS knows that a node is up, I would have thought MAAS handles the graceful shutdown (since the OS is up) and then the power drivers come in for the low-level power handling?

If this is to be handled at the power-driver level, could I also look at adding such code to apc.py (via some generic ssh_shutdown method in a new drivers/hardware/generic_linux.py) and adding a new flag indicating “OS Shutdown before power off” to the APC power params?
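A rough sketch of what such an “OS shutdown before power off” step could look like is below. Everything in it is an assumption for illustration: the function names, the ubuntu login user, the fixed wait, and the cut_power callable standing in for the driver's existing outlet-off call. None of it is existing MAAS code.

```python
import subprocess
import time


def ssh_shutdown_argv(host, user="ubuntu"):
    """Build an ssh command asking the node's OS to halt (assumes key-based login and sudo)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password
        "-o", "ConnectTimeout=10",
        f"{user}@{host}",
        "sudo", "shutdown", "-h", "now",
    ]


def graceful_power_off(host, cut_power, run=subprocess.run,
                       wait_seconds=60, sleep=time.sleep):
    """Ask the OS to halt over ssh, wait a fixed time, then cut the outlet.

    `cut_power` is a caller-supplied callable that performs the hard
    power-off. Returns True if the ssh request appeared to succeed.
    """
    try:
        result = run(ssh_shutdown_argv(host), timeout=30)
        asked = result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        asked = False
    if asked:
        # Simplistic: a real implementation would likely poll the node instead.
        sleep(wait_seconds)
    cut_power()
    return asked
```

If the ssh request fails (node already down, no network), the sketch falls through to the hard power-off immediately, which matches the current behavior of cutting power at the outlet.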

Or is there a different way I am supposed to shut the nodes down gracefully?

Hi Vamsidhar,

Thanks to your question, I was able to quickly bring up a new power type for ESPHome-based devices by copying the DLI driver and marking it queryable, too.

To return the favor I submitted a merge request for the APC driver, too.