MAAS deployment fails on server with multiple HDDs

Hi All,

I am seeing a strange issue where a server deployed via MAAS simply does not boot.

I have a Supermicro server with 4 HDDs and 1 SSD. Before commissioning the server, everything is listed properly, as shown in the BIOS screenshot below. By default, the OS should be installed on the P1 HDD.


Once commissioning is done, MAAS shows the storage as shown below.

Deploying Ubuntu 22.04 on this machine appears to succeed, but on boot the server gets stuck in GRUB, as shown below.

When I log into the BIOS of the server, I can see that P1 is still at the top of the list, but P0-HDD has vanished from the list of drives.

I suspect that the OS was installed on P0-HDD, which then vanished from the drive list at boot, and as a result I can’t boot into it. Any suggestions would be of great help.
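For what it is worth, this is roughly how I would verify where the installer actually wrote the OS, from a live or rescue environment (a minimal sketch; /dev/sda is an example device name):

```
# Match the disks visible to Linux against the BIOS P0/P1 slots by model and serial
lsblk -o NAME,SIZE,MODEL,SERIAL

# Inspect the partition table and boot flag on the suspected install target
sudo fdisk -l /dev/sda

# Confirm which partition holds which filesystem
sudo blkid
```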

Also, when I set another drive as the boot drive in MAAS and install the OS on it, that drive likewise vanishes from the boot order.
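I have been selecting the boot drive through the MAAS UI; if I read the API documentation correctly, the CLI equivalent would be something like this ($PROFILE, $SYSTEM_ID and $BLOCK_DEVICE_ID are placeholders for my setup):

```
# List the machine's block devices to find the ID of the intended boot disk
maas $PROFILE block-devices read $SYSTEM_ID

# Mark that device as the boot disk before deploying
maas $PROFILE block-device set-boot-disk $SYSTEM_ID $BLOCK_DEVICE_ID
```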

Hi there!

Could you please provide the installation log for this machine?

In addition, is ‘Legacy’ boot mode a hard requirement for your setup? It might be worth trying UEFI boot mode instead.

Thanks.

Let me try the same with UEFI.

I also have another Supermicro server (server-2) with a single logical volume created by the RAID controller. Everything looks fine: I can see the RAID controller’s logical volume in the Ready state, and commissioning of server-2 is also successful.

Once I start the deployment of Ubuntu 22.04 Server, on reboot it ends up at the same error. I was using Dual boot mode on this server.

@andyls

On server-1 and server-2, when I try UEFI mode with the boot options set as:

  1. UEFI Network
  2. UEFI Hard Disk
  3. UEFI EFI Shell

I always end up in the EFI shell; that is the reason I was using Legacy mode.
It may be that UEFI is failing to detect the network and the hard disk properly.
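When it drops to the EFI shell, the shell itself can at least show whether the firmware sees the disk; as far as I know these are standard EDK2 shell commands:

```
Shell> map -r        # rescan and list block-device and filesystem mappings (FS0:, BLK0:, ...)
Shell> ls FS0:\EFI\  # if a filesystem mapping exists, check whether an EFI bootloader is present
```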

So I tried Dual mode, which offers both the Legacy and UEFI options.

In server-1: [screenshot]

In server-2: [screenshot]

Both servers end up in the Commissioned state.
I have now triggered deployment of Ubuntu Server 22.04 on both nodes from MAAS.

Below are the links to the deployment logs for both machines:

server-1
https://drive.google.com/file/d/1vb4IBe1fbf7w-fDEweDuk3vm3nOy0SXp/view?usp=sharing

On boot, server-1 is stuck at the same error.
I logged into the BIOS, disabled network boot, and reset the system; now it boots properly.
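Rather than disabling network boot in the BIOS on every server, the order can presumably also be fixed from the deployed OS once it is up, assuming it booted via UEFI (a sketch; the entry numbers are examples from my machine):

```
# List the current NVRAM boot entries, e.g. Boot0000 (ubuntu), Boot0001 (PXE)
sudo efibootmgr

# Put the ubuntu entry ahead of the network-boot entry
sudo efibootmgr -o 0000,0001
```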

server-2
https://drive.google.com/file/d/1MpHO9L8dNpGpgLrd-695hKwZtUQUYWzi/view?usp=sharing

On boot, server-2 is also stuck.
So I logged into the BIOS to disable network boot, but the virtual drive of the RAID controller has vanished, as shown below.

[screenshot]

Thank you for undertaking this further investigation.

It appears the links to your uploaded logs require permission to access. It might be best to upload them to a service such as Pastebin or similar instead (redacting any potentially sensitive information first, of course) and link to them here. We need to see what efibootmgr is doing such that setting up the boot entries fails and the bootloader falls all the way through to the UEFI shell.
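For reference, the output we are after looks like the result of the commands below; the log path is curtin’s usual default, so worth verifying on your release:

```
# Current NVRAM boot entries, verbose, with the device path each entry points at
sudo efibootmgr -v

# What the installer itself logged while setting up the boot entries
grep -i -A3 efibootmgr /var/log/curtin/install.log
```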

Thanks.

@andyls - I have updated the respective links; you should now be able to download the log files without any issue.

I’m facing the same issue deploying Ubuntu 22.04 with MAAS 3.6.
I have machines with multiple disks:

  • 1x hardware RAID1 for the OS
  • 1x hardware RAID1 for /var
  • 4x JBOD SSDs that are not configured by MAAS (they will be used as Ceph OSDs)

When I configure only one drive in MAAS (using the RAID1 OS volume for everything under “/”), it works perfectly.
As soon as I configure a second drive (for /var), it fails every time.
I tried allocating the whole disk as well as just a partition, and tried different filesystem formats, but nothing worked.

Any ideas?

Okay, I’ll answer myself since I found the origin of my problem.

Multiple drives are not the problem; the delay during formatting is.
The drive I’m using for /var was configured as “unmap capable”, so when mkfs.ext4 runs during installation it discards all of the blocks (even though the drive is blank). That takes ages on this 1.6 TB drive (about 10 minutes when I tested it by hand), and it apparently makes the setup process fail.
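For anyone who wants to reproduce or work around this by hand: the slow part is the discard pass that mkfs.ext4 performs by default, and the `-E nodiscard` extended option skips it (a sketch; /dev/sdb1 is an example partition):

```
# Default behaviour: mkfs.ext4 discards every block first (about 10 minutes on this 1.6 TB drive)
time sudo mkfs.ext4 /dev/sdb1

# Skipping the discard pass makes the format near-instant on an already blank drive
sudo mkfs.ext4 -E nodiscard /dev/sdb1
```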

It would be great to handle this type of situation better and to give a clearer progress report in the UI (for example, a progress percentage for the discard operation when there is one).