Possible Bug: Power Sync not happening

trsoumi88 · 14 February 2026 01:38

Hi,

My reading of the MAAS code base for version 3.5.x is that MAAS checks each machine’s power state about every 5 minutes. However, I cannot see this happening.

My setup is MAAS 3.5.10 installed from deb packages. Running maas <profile> machine query-power-state machine_id shows the details, but they are not updated in the database. As a result, maas <profile> machine read shows inconsistent power details for the machine.

This is easy to reproduce: change the power details of a machine and wait 5 minutes. The power_state for the machine will still be the previous value.

Is anyone else facing this?

thanks!

nehjoshi5 · 15 February 2026 15:47

Hello!

What power driver are you using for that machine? Also, do you observe any errors in the rackd logs when querying the power-state for that machine (via journalctl -u maas-rackd)?

trsoumi88 · 16 February 2026 08:47

We use a mix of IPMI and Redfish across our machines. The behaviour is the same for both. Querying maasserver_node shows an empty power_state_queried for every machine.

maasdb=> SELECT system_id, power_state, power_state_queried, power_state_updated
FROM maasserver_node;
 system_id | power_state | power_state_queried |      power_state_updated
-----------+-------------+---------------------+-------------------------------
 8dmef6    | on          |                     | 2026-02-13 17:33:04.42324-08
 cms6et    | off         |                     | 2026-02-13 16:13:37.665781-08
 ed67b8    | off         |                     | 2026-02-13 16:13:26.305461-08
 smmd46    | on          |                     | 2026-02-13 16:13:43.381593-08
 qybex8    | on          |                     | 2026-02-13 16:13:49.089349-08
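
One way to tell whether periodic polling is touching these rows is to compare power_state_updated against the expected interval. The following is only a rough sketch: it assumes power_state_updated is refreshed by every successful poll (not verified against the MAAS source), and the function names and the 10-minute threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches PostgreSQL timestamptz text output such as "2026-02-13 17:33:04.42324-08".
_PG_TS = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,6}))?([+-])(\d{2})(?::?(\d{2}))?$"
)


def parse_pg_timestamp(text):
    """Parse a psql-style timestamptz string into an aware datetime."""
    m = _PG_TS.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Pad the fractional part to microseconds ("42324" -> 423240).
    micro = int((m.group(7) or "").ljust(6, "0"))
    sign = 1 if m.group(8) == "+" else -1
    offset = sign * timedelta(hours=int(m.group(9)), minutes=int(m.group(10) or 0))
    return datetime(year, month, day, hour, minute, second, micro,
                    tzinfo=timezone(offset))


def stale_machines(rows, now, max_age=timedelta(minutes=10)):
    """Return system_ids whose last update is older than max_age (hypothetical threshold)."""
    return [
        system_id
        for system_id, updated_text in rows
        if now - parse_pg_timestamp(updated_text) > max_age
    ]


# Two rows copied from the query output above.
rows = [
    ("8dmef6", "2026-02-13 17:33:04.42324-08"),
    ("cms6et", "2026-02-13 16:13:37.665781-08"),
]
# A fixed "now" keeps the example reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2026, 2, 14, 1, 36, 0, tzinfo=timezone.utc)
stale = stale_machines(rows, now)
```

With the fixed time above, only cms6et is reported as stale, since its last update is more than an hour old.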

A manual attempt on a machine that has an actual power error produces a stack trace:

Feb 16 00:25:03 xxx.yyy.com regiond[56952]: maasserver.websockets.handlers.machine: [critical] Failed to update power state of machine.
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:         Traceback (most recent call last):
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             current_context.run(_inlineCallbacks, r, gen, status)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1740, in _inlineCallbacks
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             status.deferred.errback()
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 700, in errback
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             self._startRunCallbacks(fail)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             self._runCallbacks()
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:         --- <exception caught here> ---
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             current.result = callback(  # type: ignore[misc]
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/maasserver/websockets/handlers/machine.py", line 1256, in eb_unknown
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             failure.trap(UnknownPowerType, NotImplementedError)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             self.raiseException()
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             raise self.value.with_traceback(self.tb)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             result = current_context.run(
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             return g.throw(self.type, self.value, self.tb)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:           File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 6174, in exec_power_workflow
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:             raise PowerActionFail(cause)
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:         provisioningserver.rpc.exceptions.PowerActionFail: ExitError: exit status 1
Feb 16 00:25:03 xxx.yyy.com regiond[56952]:

No other power-related messages can be seen in the maas-regiond, maas-rackd, or maas-agentd logs.

thanks!

nehjoshi5 · 16 February 2026 10:56

I suspect this may be an issue with the rack controller not being able to reach the node’s BMC. Can you ping <BMC_IP> from the rack controller and confirm you get a response? If not, this may indicate an issue with the rack controller’s interface IPs/subnets. Thanks

trsoumi88 · 16 February 2026 13:11

Thanks all for the thoughts and suggestions. Much appreciated.

I should have mentioned this earlier: ours is an all-in-one MAAS setup. Rack and region share the same instance. We haven’t needed an HA setup yet. The MAAS endpoint has TLS enabled. A manual power query works via the CLI or the UI. The problem is with the automatic power check. Because we deal with bare-metal machines, the BMC fails at times. I am trying to leverage the automatic/periodic power query feature in MAAS and use it as a source of truth to alert when the power_state of a machine is error.

$ maas admin machine read 874may | jq -r '.power_state'
error
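
The alerting idea could be sketched roughly as below. This is an illustration, not a tested integration: it assumes maas admin machines read returns a JSON array of machine objects carrying system_id, hostname, and power_state, and the function name and the hostname node-a are made up for the example. It is only useful once the periodic poll actually keeps power_state current.

```python
import json


def machines_with_power_error(machines_json):
    """Return (system_id, hostname) pairs for machines whose power_state is "error"."""
    machines = json.loads(machines_json)
    return [
        (m["system_id"], m.get("hostname", ""))
        for m in machines
        if m.get("power_state") == "error"
    ]


# Hand-written sample standing in for a trimmed `maas admin machines read` response.
sample = json.dumps([
    {"system_id": "874may", "hostname": "node-a", "power_state": "error"},
    {"system_id": "8dmef6", "hostname": "xxx168", "power_state": "on"},
])
failed = machines_with_power_error(sample)
```

In practice the JSON text would come from the CLI output rather than the inline sample, and the resulting list would be handed to whatever alerting system is in use.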

I am out of ideas on why the manual query works but the automatic/periodic power query does not.

thanks!

trsoumi88 · 17 February 2026 02:15

I did some more debugging and found that the subnet_id associated with the BMC IP is NULL:

maasdb=> SELECT n.system_id, n.hostname, n.status, n.bmc_id, b.power_type,
                sip.ip AS bmc_ip, sip.subnet_id,
                n.power_state_queried, n.power_state_updated
         FROM maasserver_node n
         LEFT JOIN maasserver_bmc b ON b.id = n.bmc_id
         LEFT JOIN maasserver_staticipaddress sip ON sip.id = b.ip_address_id
         WHERE n.system_id = '8dmef6';
 system_id | hostname | status | bmc_id | power_type |    bmc_ip     | subnet_id |      power_state_queried      |      power_state_updated
-----------+----------+--------+--------+------------+---------------+-----------+-------------------------------+-------------------------------
 8dmef6    | xxx168   |      4 |     58 | redfish    | 10.0.0.1      |           |                               | 2026-02-16 17:32:50.112144-08
(1 row)

And I guess this is the reason why the machine isn’t selected for periodic power querying.

I have an explicit route to the BMC in the network configuration file to make it reachable. Do I need to do anything else to make this work?

thanks!

nehjoshi5 · 17 February 2026 11:57

Thank you for sharing the details!
This is quite unusual as MAAS is able to enlist and commission the machine, yet there is no subnet associated with the node’s BMC. Could you confirm how you created this machine? In particular:

Does a subnet covering the BMC IP 10.0.0.1 exist in MAAS?
Is that subnet attached to a VLAN that has at least one rack controller interface associated with it?
If the subnet is not controlled by MAAS (or was modeled outside of MAAS), this might explain the missing subnet_id column.

trsoumi88 · 17 February 2026 13:21

Could you confirm how you created this machine?

We explicitly add machines to MAAS using the API endpoint /MAAS/api/2.0/machines/ ("Create a new machine").

Does a subnet covering the BMC IP 10.0.0.1 exist in MAAS?

No, there is no subnet covering the BMC IP. However, the BMC is reachable from the MAAS instance, and I am guessing this is why enlistment, commissioning, and deployment succeed.

If the subnet is not controlled by MAAS (or was modeled outside of MAAS), this might explain the missing subnet_id column.

Yes, the subnet is not managed by MAAS. But why does the BMC subnet need to be managed by MAAS? The BMC only needs to be reachable from MAAS for power control and other BMC related operations to work. If MAAS were to control the BMC subnet itself, wouldn’t that create a chicken-and-egg problem?

thanks!

trsoumi88 · 17 February 2026 13:32

I went a step further and added the BMC subnet under the default fabric in use, and made it unmanaged by MAAS, because we manage BMC IPs by a different method.

I can now see the BMC subnet under Controllers >> rack controller >> VLAN. I can also see that MAAS automatically populated subnet_id for the BMC IPs with the new subnet. However, automatic polling of power still did not work.

According to the debug logs, it looks like a machine is selected for automatic power polling only if its BMC subnet_id matches one of the subnets associated with a rack controller interface (as seen under Controllers >> rack controller >> Interfaces).

thanks!

nehjoshi5 · 17 February 2026 13:42

Thanks, that’s helpful. To clarify my previous message, MAAS doesn’t need to manage or control the subnet, but it still needs to be defined and attached to a VLAN with a rack controller so responsibility can be assigned.
Since you mentioned adding the subnet, at least one rack controller interface must have an IP in that subnet (unmanaged is fine), as only those nodes get queried in the periodic power sync.

trsoumi88 · 17 February 2026 13:50

Thanks @nehjoshi5 for the quick response.

Does this mean that if the machines managed by MAAS have BMC IPs from n different subnets, the rack controller needs one IP in each of those subnets for automatic power polling to work? This is my scenario. We have a DC environment with machines spread across different physical racks, with BMC IPs from n different subnets. Right now MAAS manages approximately 500 machines, but this will soon increase to around 1k machines. MAAS is the infrastructure provider for our DC Kubernetes cluster. We replace one node at a time.

Due to all these reasons, our requirement only demands one rack controller for now, and this works as expected except for the auto power polling.

thanks!

nehjoshi5 · 17 February 2026 14:06

You’re welcome! And yes, that’s correct for the current implementation. The automatic power monitor only selects nodes whose BMC subnet matches a subnet that the rack controller has an interface IP on. Unlike manual power queries, it does not have a “routable-rack” fallback.
In your situation, you could potentially:

Add interfaces (physical or VLAN) to the existing rack controller, so it has an IP in each BMC subnet.
Deploy additional rack controllers, placing each on the relevant BMC subnets so responsibility is distributed.
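
The selection rule described above can be approximated offline to see which machines would be skipped. This is a simplified sketch, not the MAAS implementation: MAAS matches on the subnet records in its database, whereas this compares raw CIDRs, and the function name, the rack interface address 192.168.1.10/24, and the BMC IP 192.168.1.50 are hypothetical.

```python
import ipaddress


def split_by_poll_eligibility(bmc_ips, rack_interface_cidrs):
    """Split machines into (eligible, skipped) for periodic power polling.

    A machine counts as eligible when its BMC IP falls inside the subnet of
    at least one rack controller interface address.
    """
    rack_subnets = [
        ipaddress.ip_interface(cidr).network for cidr in rack_interface_cidrs
    ]
    eligible, skipped = [], []
    for system_id, ip_text in bmc_ips.items():
        ip = ipaddress.ip_address(ip_text)
        if any(ip in subnet for subnet in rack_subnets):
            eligible.append(system_id)
        else:
            skipped.append(system_id)
    return eligible, skipped


# Hypothetical rack controller interface address in CIDR form.
rack_interfaces = ["192.168.1.10/24"]
# 10.0.0.1 is the BMC IP from the thread; 192.168.1.50 is made up.
bmcs = {"8dmef6": "10.0.0.1", "cms6et": "192.168.1.50"}
eligible, skipped = split_by_poll_eligibility(bmcs, rack_interfaces)
```

Here 8dmef6 lands in the skipped list because no rack interface has an address in a subnet containing 10.0.0.1, which mirrors the behaviour seen in this thread.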

trsoumi88 · 17 February 2026 14:17

Thanks @nehjoshi5 for the suggestions.

Deploy additional rack controllers, placing each on the relevant BMC subnets so responsibility is distributed.

Is there any guidance on when I should consider having multiple rack controllers? Is there a limit on the number of machines managed by a single rack controller?

thanks!

nehjoshi5 · 17 February 2026 14:38

There’s no strict limit on how many machines a rack can manage, but you should consider more rack controllers when you’re managing large fleets of machines, require network segregation (as in your case with segmented BMC subnets), and/or need more efficient load distribution. You can check this article out for more details on enabling high availability (HA).