I left feedback on several of the recent bug reports whose committed fixes touch the Redfish power driver's handling of HPE status, but I have seen no reply or confirmation that it has been read, so I am posting this note here. I hate to say it, but the changes are a regression in my environment and are causing us rework in certain cases. For instance, sometimes a Release actually works and the machine shuts down properly at the end, but MAAS reports a failed release with an error from the new process. In cases of true failure, the retry logic masks the actual error: the real error is only reported on the first try, and every attempt after the initial request only reports a twisted.python.failure. We have to dig through maas.log to find out what the actual error was.
Here’s an example where a Commission fails because MAAS reports that it couldn’t power the machine on, and yet the machine did power on and started requesting files over TFTP (events are listed newest first):
Tue, 03 Dec. 2024 08:22:42 TFTP Request - grubx64.efi
Tue, 03 Dec. 2024 08:20:08 Failed to power on node - Power on for the node failed: Failed talking to node's BMC: [<twisted.python.failure.Failure builtins.ValueError: I/O operation on closed file.>]
Tue, 03 Dec. 2024 08:20:08 Node changed status - From 'Commissioning' to 'Failed commissioning'
Tue, 03 Dec. 2024 08:20:08 Marking node failed - Power on for the node failed: Failed talking to node's BMC: [<twisted.python.failure.Failure builtins.ValueError: I/O operation on closed file.>]
Tue, 03 Dec. 2024 08:19:16 Powering on
Tue, 03 Dec. 2024 08:19:16 Node - Started commissioning on '88TN0R3'.
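To illustrate the point about the retry logic masking the real error, here is a minimal sketch of a retry loop that keeps the first failure and reports it alongside the last one. This is not the MAAS Redfish driver code: `call_with_retries`, `make_flaky_power_on`, and the simulated first error are hypothetical names and values made up for the example. Only the "I/O operation on closed file." message is taken from the log above.

```python
def call_with_retries(operation, attempts=3):
    """Call `operation` up to `attempts` times.

    Hypothetical helper, not part of MAAS. If every attempt fails, the
    raised error includes the FIRST failure as well as the last one, so
    the original cause is not lost behind later, less useful errors.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    errors = []
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Remember every failure instead of overwriting the previous one.
            errors.append(exc)
    raise RuntimeError(
        f"failed after {attempts} attempts; "
        f"first error: {errors[0]!r}; last error: {errors[-1]!r}"
    ) from errors[0]


def make_flaky_power_on():
    """Build a fake power-on call (an assumption for the demo only).

    The first call fails with a simulated, meaningful error; later calls
    fail with the generic error seen in the event log.
    """
    calls = {"count": 0}

    def power_on():
        calls["count"] += 1
        if calls["count"] == 1:
            raise ConnectionError("simulated BMC error on first attempt")
        raise ValueError("I/O operation on closed file.")

    return power_on
```

Calling `call_with_retries(make_flaky_power_on())` raises a `RuntimeError` whose message contains both the simulated first error and the closed-file error, which is the kind of reporting that would save a trip through maas.log.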