MAAS commissioning script 30-maas-01-bmc-config fails on Supermicro server

Hi MAAS devs,

We have a new Supermicro server that fails the commissioning script “30-maas-01-bmc-config”. Up until now we had only Dell PowerEdge servers, with no issues.

We run MAAS 3.5 stable.
The server is from Supermicro and has an ARM CPU (GH200):
https://www.supermicro.com/en/products/system/gpu/1u/ars-111gl-nhr

modprobe: ERROR: could not insert 'ipmi_si': No such device
Unable to get Number of Users
ERROR: Unable to add BMC user!
INFO: Loading IPMI kernel modules...
INFO: Checking for HP Moonshot...
INFO: Checking for Redfish...
ERROR: Redfish write() argument must be str, not None
Traceback (most recent call last):
  File "/tmp/user_data.sh.RRV4Px/scripts/commissioning/30-maas-01-bmc-config", line 1132, in detect_and_configure
    if bmc.detected():
  File "/tmp/user_data.sh.RRV4Px/scripts/commissioning/30-maas-01-bmc-config", line 1065, in detected
    self._detect()
  File "/tmp/user_data.sh.RRV4Px/scripts/commissioning/30-maas-01-bmc-config", line 1050, in _detect
    self._configure_network(iface, data)
  File "/tmp/user_data.sh.RRV4Px/scripts/commissioning/30-maas-01-bmc-config", line 1023, in _configure_network
    netplan.write(netplan_config)
TypeError: write() argument must be str, not None

INFO: Checking for IPMI...
INFO: IPMI detected!
INFO: Reading current IPMI BMC values...
INFO: Configuring IPMI Lan_Channel...
INFO: Configuring IPMI Lan_Channel_Auth...
INFO: Lan_Channel_Auth settings unavailable!
WARNING: No K_g BMC key found or configured, communication with BMC will not use a session key!
INFO: Configuring IPMI Serial_Channel...
INFO: Serial_Channel settings unavailable!
INFO: Configuring IPMI SOL_Conf...
INFO: SOL_Conf settings unavailable!
INFO: Configuring IPMI BMC user "maas"...
INFO: IPMI user number - None
INFO: IPMI user privilege level - Administrator

What commissioning image are you using?

Ubuntu 22.04, pulled from MAAS.

I’m starting to see the same problem (the Redfish write() argument error) on random nodes out of a batch. Some blades get this, others don’t. I cannot figure out what is or isn’t happening.

Downloading the logs has been uninformative.

---------------------------- 30-maas-01-bmc-config ----------------------------
ipmi_cmd_set_user_access: invalid parameters
INFO: Loading IPMI kernel modules...
INFO: Checking for HP Moonshot...
INFO: Checking for Redfish...
ERROR: Redfish write() argument must be str, not None
Traceback (most recent call last):
  File "/tmp/user_data.sh.sKiyi2/scripts/commissioning/30-maas-01-bmc-config", line 1134, in detect_and_configure
    if bmc.detected():
  File "/tmp/user_data.sh.sKiyi2/scripts/commissioning/30-maas-01-bmc-config", line 1067, in detected
    self._detect()
  File "/tmp/user_data.sh.sKiyi2/scripts/commissioning/30-maas-01-bmc-config", line 1052, in _detect
    self._configure_network(iface, data)
  File "/tmp/user_data.sh.sKiyi2/scripts/commissioning/30-maas-01-bmc-config", line 1025, in _configure_network
    netplan.write(netplan_config)
TypeError: write() argument must be str, not None

INFO: Checking for IPMI...
INFO: IPMI detected!
INFO: Reading current IPMI BMC values...
INFO: Configuring IPMI Lan_Channel...
INFO: Configuring IPMI Lan_Channel_Auth...
INFO: Lan_Channel_Auth settings unavailable!
WARNING: No K_g BMC key found or configured, communication with BMC will not use a session key!
INFO: Configuring IPMI Serial_Channel...
INFO: Configuring IPMI SOL_Conf...
INFO: Found existing IPMI user "maas"!
INFO: Configuring IPMI BMC user "maas"...
INFO: IPMI user number - User3
INFO: IPMI user privilege level - Administrator
WARNING: Unable to set User3:Serial_Enable_Link_Auth to Yes!
INFO: IPMI Version - LAN_2_0
INFO: IPMI boot type - efi

Please note that the log level is misleading here: this is not really a fatal ERROR. As you can see, your machine still gets configured with IPMI. The message only means that the script failed to configure Redfish.
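To illustrate why the traceback is not fatal, here is a minimal sketch of a detect-and-fall-through loop. The class names and structure are hypothetical stand-ins, not the actual code in 30-maas-01-bmc-config. The only details taken from the logs are that writing `None` to a text stream raises a `TypeError` and that the script then carries on to the IPMI check.

```python
import io


class FakeRedfish:
    """Hypothetical stand-in for a Redfish detector whose netplan config is None."""

    name = "Redfish"

    def detected(self):
        netplan_config = None  # assumption: the config was never generated
        with io.StringIO() as netplan:
            # Text streams reject None, so this raises a TypeError as in the logs.
            netplan.write(netplan_config)
        return True


class FakeIPMI:
    """Hypothetical stand-in for an IPMI detector that succeeds."""

    name = "IPMI"

    def detected(self):
        return True


def detect_and_configure(bmc_types):
    """Try each BMC type in order; log a failure and move on to the next one."""
    for bmc in bmc_types:
        print(f"INFO: Checking for {bmc.name}...")
        try:
            if bmc.detected():
                print(f"INFO: {bmc.name} detected!")
                return bmc.name
        except Exception as exc:
            # Reported as ERROR, but the loop continues, so it is not fatal.
            print(f"ERROR: {bmc.name} {exc}")
    return None


result = detect_and_configure([FakeRedfish(), FakeIPMI()])
```

With these stand-ins the Redfish step logs an error and the loop still ends up selecting IPMI, which matches the shape of the logs above.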

My next questions are:

  1. Has MAAS ever been able to use Redfish on the same machine? Did you upgrade the firmware or make any changes?
  2. Are these Supermicro machines? Which model?

If you have an OS deployed on that machine, would you mind running

dmidecode -t 42

and posting the output? I suspect that the firmware is not following the Redfish standard, and that this is an edge case we need to handle once we have this info.

(As an alternative, you can run it in a custom commissioning script and print the output.)
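A rough sketch of what such a script body could look like in Python. The function names are made up for illustration, and it assumes `dmidecode` is available in the commissioning environment and that the script runs with enough privileges to read the SMBIOS tables.

```python
#!/usr/bin/env python3
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (exit code, combined stdout and stderr)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return completed.returncode, completed.stdout


def main(command=("dmidecode", "-t", "42")):
    """Print the command output so it shows up in the script result."""
    try:
        code, output = run_and_capture(list(command))
    except FileNotFoundError:
        # Assumption: the tool may be missing from a minimal image.
        print(f"{command[0]} not found in the commissioning environment")
        return 1
    print(output, end="")
    return code
```

To use it as a script, end the file with `sys.exit(main())` so that a non-zero exit code from dmidecode marks the script result as failed.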

Has MAAS ever been able to use Redfish on the same machine?

We have more than five dozen of these blades in use via MAAS and had never seen this before today.

Did you upgrade the firmware or make any changes?

These are entirely new machines. The first two I tried worked just fine, then we started seeing individual blades fail with these errors. Firmware is identical on the blades that failed versus the blade that succeeded.

Are these Supermicro machines? Which model?

Yup, Supermicro https://store.supermicro.com/us_en/3u-microcloud-as-3015mr-h8tnr.html

I tried the supermicro20 hack with no change / no improvement.

Picking one of the blades that works, I see this output. It doesn’t look like it would differ between blades, given that the firmware is the same…

$ sudo dmidecode -t 42

# dmidecode 3.6
Getting SMBIOS data from sysfs.
SMBIOS 3.7.0 present.

Handle 0x0039, DMI type 42, 122 bytes
Management Controller Host Interface
	Host Interface Type: Network
	Device Type: USB
	idVendor: 0x0b1f
	idProduct: 0x03ee
	Protocol ID: 04 (Redfish over IP)
		Service UUID: c95cab2a-6de0-45f4-8b3b-726c3794b26b
		Host IP Assignment Type: AutoConf
		Host IP Address Format: IPv4
		IPv4 Address: 169.254.3.1
		IPv4 Mask: 255.255.255.0
		Redfish Service IP Discovery Type: AutoConf
		Redfish Service IP Address Format: IPv4
		IPv4 Redfish Service Address: 169.254.3.254
		IPv4 Redfish Service Mask: 255.255.255.0
		Redfish Service Port: 443
		Redfish Service Vlan: 0
		Redfish Service Hostname: 169.254.3.254
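For comparing blades, it can help to turn this output into key/value pairs and diff them. Below is a minimal sketch of such a parser. The list of "required" fields is my own assumption for illustration, not what MAAS actually checks, and the parser only handles a single type 42 record.

```python
import ipaddress

# Trimmed sample taken from the dmidecode output above.
SAMPLE = """\
Handle 0x0039, DMI type 42, 122 bytes
Management Controller Host Interface
\tHost Interface Type: Network
\tProtocol ID: 04 (Redfish over IP)
\t\tIPv4 Address: 169.254.3.1
\t\tIPv4 Mask: 255.255.255.0
\t\tIPv4 Redfish Service Address: 169.254.3.254
\t\tRedfish Service Port: 443
"""

# Assumption: fields a Redfish host interface setup would plausibly need.
REQUIRED = (
    "IPv4 Address",
    "IPv4 Mask",
    "IPv4 Redfish Service Address",
    "Redfish Service Port",
)


def parse_type42(text):
    """Return a dict of 'Key: value' fields from `dmidecode -t 42` output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and value:
            fields[key] = value
    return fields


def missing_fields(fields):
    """List the assumed-required fields that are absent from the record."""
    return [name for name in REQUIRED if name not in fields]


def service_in_host_subnet(fields):
    """Check that the Redfish service address lies in the host interface subnet."""
    host = ipaddress.ip_interface(
        f"{fields['IPv4 Address']}/{fields['IPv4 Mask']}"
    )
    service = ipaddress.ip_address(fields["IPv4 Redfish Service Address"])
    return service in host.network


fields = parse_type42(SAMPLE)
```

Running the same parser on a failing blade and comparing the two dicts would show quickly whether any field is missing or different there.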

So I finally made a video and slowed it down, and what I see is in the screenshot. The message flashes by way too fast to read normally.

I can confirm from the server logs that it does send a POST and get a 401 response, but then it immediately (same second!) sends another POST that gets a 200 response:

10.2.2.232 - - [20/Feb/2025:16:29:19 -0800] "POST /MAAS/metadata/2012-03-01/ HTTP/1.1" 401 122 "-" "Python-urllib/3.10"
10.2.2.232 - - [20/Feb/2025:16:29:19 -0800] "POST /MAAS/metadata/2012-03-01/ HTTP/1.1" 200 2 "-" "Python-urllib/3.10"

If we wait 30 minutes for provisioning to fail, we get the following event log. And yes, MAAS hasn’t heard from the node, because it powered off!

Thu, 20 Feb. 2025 17:22:12	Failed to query node's BMC - (admin) - No rack controllers can access the BMC of node x36
Thu, 20 Feb. 2025 17:21:52	Failed to query node's BMC - (admin) - No rack controllers can access the BMC of node x36
Thu, 20 Feb. 2025 17:21:52	User powering down node - Node stopped because SSH is disabled
Thu, 20 Feb. 2025 17:21:52	Node changed status - From 'Commissioning' to 'Failed commissioning'
Thu, 20 Feb. 2025 17:21:52	Marking node failed - Node has not been heard from for the last 30 minutes
Thu, 20 Feb. 2025 16:51:49	Script result - 30-maas-01-bmc-config changed status from 'Pending' to 'Passed'

I need the output from a machine that does not work :)

It was the BIOS time. Even though NTP is set up during boot, if the BIOS date and time are too far off, the machines fail to authenticate to MAAS to get the user scripts.
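For anyone who wants to check this on a suspect blade, here is a small sketch that compares the local clock against the `Date` header of an HTTP response from the MAAS server. The helper names and the 300-second tolerance are assumptions for illustration. I don't know the exact skew MAAS tolerates.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fetch_date_header(url, timeout=10):
    """Return the Date header of an HTTP response from the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.headers["Date"]
    except urllib.error.HTTPError as err:
        # Error responses such as 401 still carry a Date header.
        return err.headers["Date"]


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds."""
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


def skew_is_acceptable(skew, tolerance=300):
    """Assumed tolerance of 5 minutes; adjust to taste."""
    return abs(skew) <= tolerance
```

Usage would be something like `clock_skew_seconds(fetch_date_header(url))`, with `url` pointing at your own MAAS server. A result of several hours or days points at a bad BIOS clock.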