Help request with deploying Ubuntu to HP DL380 G9

Dear Sirs,

thanks in advance for your help.

I am new to MaaS. I installed a rel 2.7 Rack+Region controller onto a VMware VM (snap install) to use as a self-training lab. At first i tried to deploy a new VM Ubuntu 18.04 connected to the same network as the MaaS server and it worked fine.

Now I am trying to deploy Ubuntu to a bare-metal HP DL380 G9 connected to a different network. So I first installed a second virtual rack controller to manage the new subnet and made it join the region controller. Also this part succeded.

Booting the ILO from the MaaS DHCP made it appears in the dashboard at first istance, but after a while I found it in the “Devices” tab.

I expected to have the server automatically enlisted as new in the “Machines” tab, but this did not happen. So I manually added it, using IPMI LAN 2.0 as power type, and ran the “Commissioning” phase.

The server booted and ended with the Ubuntu ephemeral image loaded, but none of the commissioning checks started. All of them remained pending until timeout.

I cannot tell where to start investigating why the commissioning does not complete? Is there someone who can give me some suggestions?

P.S.: I logged into the server and checked the /var/log/cloud-init.log and found this:

  `2020-03-13 21:20:30,631 - main.py[DEBUG]: retrieving url 'http://192-168-61-0--24.maas-internal:5248/MAAS/metadata/latest/by-id/sxarfw/?op=get_preseed' failed: HTTPConnectionPool(host='192-168-61-0--24.maas-internal', port=5248): Max retries exceeded with url: /MAAS/metadata/latest/by-id/sxarfw/?op=get_preseed (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff693a79940>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))`

Verifying 192-168-61-0--24.maas-internal from the server shell it resolves with the IP of the proper rack controller.

By server I assume you mean the newly deployed machine. Can you curl that url from the machine?

I do not know if that url is hosted by the region controller or the rack controller. I’m guessing the rack controller or else you would potentially have routing issues.

1 Like

Hi, Jeremy.

Thanks for spending some time with my issue.

> By server I assume you mean the newly deployed machine.
Yes, you’re right

> Can you curl that url from the machine?
Yes, I can. It works.

> I do not know if that url is hosted by the region controller or the rack controller. I’m guessing the rack controller…
So do I. The rack controller and the machine are both connected to the same subnet (192.168.61.0/24) , while the region controller is connected to a different network (192.168.92.0/24). There is a firewall between.

Since the machine and the rack controller are on the same subnet I would exclude any routing issue.

In my experience the provisioned machine needs to be able to curl the URL of the region not the rack

Here is a little update.

I deleted the machine from MaaS Web GUI. Then I opened the ILO web interface and power on the DL 380. It booted from MaaS DHCP and finally I had the machine enlisted as “new”. This time commissioning check completed succesfully, but failed while running “smartctl-validate” testing. Here is the test script combined output:

Unable to run 'smartctl-validate': 'MAAS did not detect any storage devices during commissioning!'Given parameters:{'storage': {'argument_format': '{path}', 'type': 'storage', 'value': 'all'}}Discovered storage devices:[]Discovered interfaces:{'98:f2:b3:26:cc:3c': 'eno1'}

The disk array is a HPE Smart Array P840ar Controller

The commissioning task ended with a “Failed testing” status

Can you confirm that no disks were discovered on the system by looking on in the Storage tab. If you don’t see any try using an HWE kernel and recommissioning? This can be done by going to the Configuration tab, click Edit, and setting the minimum kernel to hwe-18.04-edge.

If that doesn’t work please file a bug at https://bugs.launchpad.net/maas and include the output of the commissioning script 50-maas-01-commissioning.

Hi ltrager,

thanks for your suggestion.

> Can you confirm that no disks were discovered on the system by looking on in the Storage tab.
Yes, I confirm it: Error:There are currently no storage devices. Please add a storage device to be able to deploy this node.

I did login into the machine and ran smartctl with the only device file that looked like a storage device:

ubuntu@amazed-ray:~$ sudo smartctl --all /dev/sg0
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-88-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              P840ar
Revision:             6.60
Serial number:        PVYKH0BRHAL0KU
Device type:          storage array
Local Time is:        Fri Mar 20 08:04:21 2020 UTC
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging

It looks like if the array controller has been recognized correctly. I see from the ILO web interface that there should be 7x2TB physical volumes attached to the array, but the OS image cannot find them.

Now I’m going to rerun the commissioning wth the HWE kernel and then I will post an update.

Hi ltrager,

Commissioning/test issue has been solved. I realized (stupid me!) that the array was not configured. I mean there were no logical volumes configured to be recognized and used by the OS.

Commissioning succeded after a RAID-1 2 TB volume has been created.

Now I am running the deploy. I cross my fingers.

1 Like

Deploy completed successfully.

1 Like