Error determining BMC task queue for machine using UCS power driver

Hi all,

We’re using MAAS 3.5 with UCS, and during the first boot after provisioning, the BMC scripts fail. As a result, attempting to “check power” raises the following error:

Failed to query node's BMC - (admin) - Aborting COMMISSIONING and reverting to NEW. Unable to power control the node. Please check power credentials.
Failed to query node's BMC - (admin) - No rack controllers can access the BMC of node server1.

The IPMI/Redfish interface is active in the profile, and there is a user created with a password.

This issue did not occur with MAAS 3.4, where the same setup worked fine. Does anyone have any suggestions or insights?

Could you describe your network topology?

I’ll try to describe the topology for better understanding:


+---------------------------------------------------------------------+    +-----------------------------+
|                                                                     |    |                             |
|            VLAN1                          VLAN2                     |    |  +-----------------+        |
|     +-----------------------+      +---------------------+          |    |  |                 |        |
|     |                       |      |                     |          |    |  |                 |        |
|     | +-----------------+   |      |  +--------------+   |          |    |  |  Blade 01       |        |
|     | |                 |   |      |  |              |   |          |    |  |                 |        |
|     | |  RACKD{1,2}     |   |      |  |REGIOND{1,2}  |   |          |    |  |  at least 4     |        |
|     | |                 |   |      |  |              |   |          |    |  |  iface          |        |
|     | |                 |   |      |  |              |   |          |    |  |                 |        |
|     | +-----------------+   |      |  +--------------+   |          |    |  |  iface at vlan2 |        |
|     |                       |      |                     |          |    |  |                 |        |
|     |                       |      |   +------------+    |          |    |  +-----------------+        |
|     |                       |      |   |            |    |          |    |  +-----------------+        |
|     |                       |      |   |Database 01 |    |          |    |  |  Blade 02       |        |
|     |                       |      |   |            |    |          |    |  |                 |        |
|     |                       |      |   +------------+    |          |    |  |  at least 4     |        |
|     |                       |      |                     |          |    |  |  iface          |        |
|     |                       |      |                     |          |    |  |                 |        |
|     |                       |      |  +---------------+  |          |    |  | -iface-at-vlan2 |        |
|     |                       |      |  |               |  |          |    |  +-----------------+        |
|     |                       |      |  |  HA PROXY     |  |          |    |                             |
|     |                       |      |  |               |  |          |    |                             |
|     |                       |      |  |               |  |          |    |                             |
|     |                       |      |  +---------------+  |          |    |                             |
|     |                       |      |                     |          |    |                             |
|     |                       |      |                     |          |    |                             |
|     +-----------------------+      +---------------------+          |    |                             |
|                                                                     |    |                             |
|                                                                     |    |                             |
|                                                                     |    |                             |
|                                                                     |    |                             |
|   VMWARE virtual machines                                           |    |   CISCO UCS                 |
|                                                                     |    |                             |
+---------------------------------------------------------------------+    +-----------------------------+

At the UCS level:

At the MAAS level:

In my understanding, when we use the UCS power driver, the entire power-on/power-off process of the machine should be handled by the UCS. Is this understanding correct?
Is there any specific configuration we need to set up in the UCS? Some time ago, I managed to get everything working with version 3.4, but now, it no longer works.

Let me know what details I should gather

@r00ta Any idea what is happening?

Do you have region+rack controllers in VLAN2?

Apologies for the confusion earlier. Here’s the situation:

  • All physical machines are in VLAN 1.
  • In VLAN 1, I only have rackd.
  • rackd communicates with VLAN 2, where regiond is located.

The machines boot and appear in the interface. However, when we try to “commission” them, an error occurs.

@r00ta I performed a fresh installation of MAAS 3.4, and it is working correctly. There are no BMC errors when checking power, and I am able to successfully “commission” a server.

Could this be a potential bug in version 3.5?

Might be. If you are able to reproduce this on 3.4 and 3.5, could you share the db dumps and the logs?

Hello @r00ta

I set up a simple MAAS 3.5 environment with the following considerations:

Key Changes:

  • All servers were moved to the same VLAN as rackd to eliminate any possibility of firewall or network blocks.
  • HA (High Availability) was removed, leaving only:
    • One regiond server
    • One rackd server
    • HAProxy

Machine IPs:

  • 10.107.72.30 - Database (PostgreSQL 16.6, Ubuntu 16.6-1.pgdg24.04+1)
  • 10.107.72.15 (VIP: 10.107.72.25) - HAProxy
  • 10.107.72.10 - rackd01
  • 10.107.72.20 - regiond

Configuration:

The maas_url is configured to use the HAProxy address, not the regiond address directly.

Additional Test:

I performed another test that worked successfully: I installed MAAS on a single machine using a simple command: apt install maas

Based on the documentation, the HA setup for MAAS appears to be relatively simple. It only requires configuring HAProxy and adding machines to it, as MAAS can handle the rest automatically.

Logs are available here: https://drive.google.com/drive/folders/1VYwsZVPGQdvmfX6QlDppma2RLKH0kUBN?usp=sharing

Please let me know if you are unable to access the logs.

Some screenshots:

The username and password are correct! :slight_smile:

Is this the only problematic machine? Do you have the same issue with other BMCs?

We have only UCSM as BMC here… all of my machines are hosted in UCS…

You mean all the machines have the same issue?

Yes! All machines have the same issue.

Have you tried a setup without HA proxy and MAAS 3.5?

No. But I can try… I’ll do this test…

@r00ta I found the problem!!!

The UCSM has a certificate chain, but I had only added the UCSM certificate itself to ca-certificates, leaving out the CA certificates.

When I executed a curl command to the UCS URL, no errors were displayed, which gave me a false sense that everything was fine.

To make things more challenging, the error message returned by MAAS is not very helpful, as it’s just a generic error message.

It would be very helpful if an “insecure TLS” option were added to the provider. In most cases (I believe), this kind of setup is internal and often uses self-signed certificates.

How did I discover this? During the installation of MAAS 3.4, the BMC error was also displayed, but the message was much clearer. It was only after some time that I decided to try manually adding the CA’s public key.
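For anyone hitting the same thing, here is a minimal sketch of a check that is stricter than eyeballing curl output. It is not part of MAAS; the names `check_chain`, `explain` and `HINTS` are made up for illustration. It assumes the UCSM endpoint listens on port 443 and uses Python's default trust store, which may differ from the store that curl or MAAS itself consults, so treat the result as a hint rather than proof.

```python
import socket
import ssl

# OpenSSL X.509 verification codes that typically point at a trust-chain problem.
HINTS = {
    18: "self-signed certificate; it is not in the trust store",
    19: "self-signed certificate in chain; the root CA is not trusted",
    20: "issuer certificate not found locally; add the intermediate/root CA",
    21: "unable to verify the leaf signature; the chain is incomplete",
    62: "hostname mismatch between the URL and the certificate",
}


def explain(exc):
    """Turn a certificate verification error into a readable hint."""
    code = getattr(exc, "verify_code", None)
    return HINTS.get(code, f"TLS verification failed: {exc}")


def check_chain(host, port=443, cafile=None, timeout=5.0):
    """Try a verified TLS handshake; return (ok, message).

    With cafile=None the system default CA store is used; passing a
    PEM file restricts trust to the certificates in that file.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True, "chain verified"
    except ssl.SSLCertVerificationError as exc:
        # Must be caught before OSError, which is its base class.
        return False, explain(exc)
    except OSError as exc:
        return False, f"connection failed: {exc}"
```

Running `check_chain("ucsm.example.internal")` (hypothetical hostname) on the rack controller should return `(True, "chain verified")` only once the whole chain, not just the UCSM certificate, is trusted.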


Thank you so much for sharing!

@r00ta I opened an issue on Launchpad (Bug #2092062, “Unclear Error Message for BMC Issues with Missing ...”) regarding this unclear error. I believe having a clear error message is very important (I spent 4 days of crazy troubleshooting :stuck_out_tongue:).

Regarding a “new feature” for adding an “insecure TLS” option, where would be the best place to request this?

Thanks! You can open another topic and use the tag “features”
