MAAS 3.3.4 Commission failing with own power configuration

Hi,
I am running MAAS 3.3.4 (packages), old HP DL360. I have preconfigured iLO account, as Security does not allow to create accounts on the fly.
MAAS discovered server, I inserted iLO account credentials to power configuration. Power config:

Power type: IPMI
Power driver: LAN_2_0 [IPMI 2.0]
Power boot type: Legacy boot
IP address: 10.x.x.x
Power user: xxxx
Power password: ************
K_g BMC key:—
Cipher Suite ID: freeipmi-tools default
Privilege Level: Administrator
Workaround Flags: None
Power MAC: xx:xx:xx:x:xx:xx

When I hit “Check power” MAAS correctly reads current power state of server. But when I want to start Commission, it just fails after some (tens?) seconds. Rackd log contains following error:

Aug 28 05:53:52 h1-esc-deploy01 sh[219062]: 2023-08-28 05:53:52 provisioningserver.rpc.power: [critical] simple-jackal: Power on failed.
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:         Traceback (most recent call last):
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             self._startRunCallbacks(result)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             self._runCallbacks()
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             current.result = callback(  # type: ignore[misc]
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             current_context.run(_inlineCallbacks, r, gen, status)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:         --- <exception caught here> ---
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             current.result = callback(  # type: ignore[misc]
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/rpc/power.py", line 242, in eb_cancelled
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             failure.trap(CancelledError)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 451, in trap
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             self.raiseException()
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             raise self.value.with_traceback(self.tb)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             result = current_context.run(
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             return g.throw(self.type, self.value, self.tb)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/rpc/power.py", line 292, in change_power_state
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             yield perform_power_driver_change(
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             result = current_context.run(gen.send, result)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/__init__.py", line 419, in perform_power
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/__init__.py", line 374, in perform_power
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             yield deferToThread(power_func, system_id, context)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             result = inContext.theWork()  # type: ignore[attr-defined]
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             return self.currentContext().callWithContext(ctx, func, *args, **kw)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             return func(*args, **kw)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             result = func(*args, **kwargs)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/ipmi.py", line 467, in power_on
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             self._issue_ipmi_command("on", **context)
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/ipmi.py", line 461, in _issue_ipmi_command
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             return self._issue_ipmipower_command(
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:           File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/ipmi.py", line 370, in _issue_ipmipower_command
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:             raise error_info.get("exception")(error_info.get("message"))
Aug 28 05:53:52 h1-esc-deploy01 sh[219062]:         provisioningserver.drivers.power.PowerConnError: The IPMI session has timed out. MAAS performed several retries.  Check BMC configuration and connectivity and try again.

It looks to me like Commission would not take power configuration settings to Commission process and MAAS can’t power the server on when Commission starts. There is no problem with credentials itself, as I tested it by ipmitool and I successfully read status from bmc and powered server on.
Could someone point me to how to debug this or did I miss something? Thanks

Hi @stelucz and welcome to MAAS Community.

Since you are using DEB package, I think the easiest way would be to start with checking how Rack Controller invokes ipmitool by adding some more logging.

Can you please try adding a print statement inside _issue_ipmipower_command function located in .../provisioningserver/drivers/power/ipmi.py and simply print command that is being executed? Maybe there is something wrong with it…

Hey @troyanov, thanks for suggestion. I will try that and back to you. As a workaround for now I switched power config to manual.

I added print to return code also.

Aug 28 11:16:48 h1-esc-deploy01 sh[1757117]: 2023-08-28 11:16:48 stdout: [info] ['ipmipower', '-W', 'opensesspriv', '--driver-type', 'LAN_2_0', '-h', '10.xx.xx.247', '-u', 'x', '-p', 'x', '-I', '3', '-l', 'ADMIN', '--cycle', '--on-if-off']
Aug 28 11:17:08 h1-esc-deploy01 sh[1757117]: 2023-08-28 11:17:08 stdout: [info] ProcessResult(stdout='10.xx.xx.247: session timeout', stderr='', returncode=1)

user and password matched, so I tried to run command directly from node where MAAS is installed. I successfully powered server on by the exactly same command as MAAS issues and which times out. I have no clue currently what could be wrong.

I have injected also --debug flag for ipmipower. MAAS call https://pastebin.ubuntu.com/p/DjT9XfW8gm/ , call from cli https://pastebin.ubuntu.com/p/xxBPjWR3Fh/
Based on message order in MAAS’s debug output, could there be issue with some small timeout? I mean, it sends first request, while response is being delivered a new request is generated.

got it “fixed” by adding

ipmipower_command.append("--retransmission-timeout=500")

after

ipmipower_command.append("--on-if-off")

at row 454.

After this change, Commission works just fine.

@troyanov Thanks for pointing me into right direction. I will fill a bug report.

1 Like

@stelucz I am glad you managed to get it working.
Thanks for sharing your solution and creating a bug report.

If feels like we might need to expose some extra configurable parameters as part of power configuration.

With your help it was much easier to troubleshoot! :slight_smile:
Should there be any fix proposed in following weeks, I am happy to test it.
Once again thank you.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.