MAAS commissioning bails out too quickly

Hello! This issue has cropped up recently in my MAAS infrastructure. I have a server that takes a couple of minutes to boot through its myriad BIOSes before it’s finally ready to PXE boot, but by that time MAAS has given up and marked commissioning as failed. This seems to be a new problem as of 3.3.5, since I was able to commission this machine previously (though exactly which version it was working on escapes me; I believe 3.3.3).

The timeout appears to be about one minute (65 seconds in the log below) before it bails:

Wed, 28 Feb. 2024 15:41:35	Node - Started commissioning on 'dell-t410'.
Wed, 28 Feb. 2024 15:41:35	Powering on
Wed, 28 Feb. 2024 15:42:40	Marking node failed - Power on for the node failed: Could not contact node's BMC: Device busy while performing power action. MAAS performed several retries. Please wait and try again.

The error message is also misleading: it claims the power-on failed when it actually succeeded, and the power indicator next to the hostname in the UI even shows the machine as powered on. Also, about 47 seconds after that failure, the normal PXE boot activity starts coming in, showing that the server was fine and MAAS simply gave up too quickly:

Wed, 28 Feb. 2024 15:43:27	TFTP Request - bootx64.efi
Wed, 28 Feb. 2024 15:43:27	TFTP Request - grubx64.efi
Wed, 28 Feb. 2024 15:43:27	TFTP Request - bootx64.efi
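
For reference, the gaps between those log entries can be worked out with a quick shell sketch. It assumes GNU date (for the -d option); the variable names are only illustrative, and the timestamps are copied from the two log excerpts above.

```shell
# Timestamps copied from the event log (parsed as UTC so DST cannot skew the math).
power_on=$(date -u -d "2024-02-28 15:41:35" +%s)
marked_failed=$(date -u -d "2024-02-28 15:42:40" +%s)
first_tftp=$(date -u -d "2024-02-28 15:43:27" +%s)

# Elapsed seconds between the events.
gave_up_after=$((marked_failed - power_on))
pxe_after_failure=$((first_tftp - marked_failed))
boot_to_pxe=$((first_tftp - power_on))

echo "Power on -> marked failed:   ${gave_up_after}s"
echo "Marked failed -> first TFTP: ${pxe_after_failure}s"
echo "Power on -> first TFTP:      ${boot_to_pxe}s"
```

That works out to 65 s before MAAS marks the node failed, and 112 s from power-on to the first TFTP request, so the machine needed a little under two minutes to reach PXE.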

I was previously able to commission this machine with no problem, and as far as I know nothing has changed on it: it takes about as long as it always has to run through its boot process. Is there a configurable timeout that would make MAAS wait longer for a machine to boot? Thanks!

You could try adjusting the node_timeout configuration setting. This sets the time, in minutes, until the node times out during commissioning, testing, deploying, or entering rescue mode.
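
If you want to try changing it, here is a minimal sketch using the CLI. It assumes a logged-in CLI profile named admin, and the 60-minute value is purely illustrative, not a recommendation; the guard only avoids an error on hosts where the maas CLI is not installed.

```shell
PROFILE=admin      # assumption: name of your logged-in MAAS CLI profile
NEW_TIMEOUT=60     # assumption: illustrative value, in minutes

if command -v maas >/dev/null 2>&1; then
  # Set the new timeout, then read it back to confirm it was applied.
  maas "$PROFILE" maas set-config name=node_timeout value="$NEW_TIMEOUT"
  maas "$PROFILE" maas get-config name=node_timeout
else
  echo "maas CLI not found; run this on a host where the MAAS CLI is installed"
fi
```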

More on this in the docs:

Thanks for the reply, Peter! I forgot to mention that this had already been suggested and that I checked it. It’s currently set to 30 minutes (which, since I’ve never adjusted that value, I presume is the default):

:~# maas admin maas get-config name=node_timeout
Machine-readable output follows:

might be this same issue