After the post, under BMC I enabled the IPMI over LAN option and DHCP for the Ethernet port. Once I rebooted the server with the PXE boot option, MAAS was able to detect the server and boot it with the ephemeral image, successfully assigning an IP address to the first in-band Ethernet port.
I tried commissioning the server and it is successful.
Now when I try to deploy, it fails and ends up shutting down the server. The logs shown in MAAS are not useful, and I need your help troubleshooting this.
Yesterday, when I was in the lab, I tried to read the failure logs on the console, but they scroll so fast that I could not catch the exact failure reason. The moment the system hits an issue, it starts stopping all services and shuts down.
If there is another way to capture these logs, either on screen or within MAAS, please share the steps so that I can try them. Currently I have access only via the MAAS GUI.
It seems that the machine does not even reach the moment where it starts communicating the status back to MAAS. If this is correct, there is nothing to extract from MAAS as the issue is on your specific machine and the information is not sent out.
Only physical access to the machine (its stdout) can explain why the machine crashed.
Is there a way to stop the scrolling on the screen with a keyboard shortcut so that I can see the failure reason?
Currently I am connected to the server through a USB keyboard and a monitor via the serial port.
That’s the million-dollar question. There are some tricks, but they require changing the cmdline of the GRUB entry that MAAS sends to the machine.
The most basic and quickest thing you could try is to take a video with your phone, watch it, and see if it’s enough to spot the failure.
I got a chance to try this. In the MAAS UI, once the machine was in the Ready state, I selected the Deploy option and picked Ubuntu 22.04 as the OS image.
The server rebooted into the ephemeral image and I could see the login prompt. Then I started to see the time sync errors below.
After that the server rebooted again. Once it reached the login prompt a second time, it simply powered off.
I suspected a time synchronization issue between the MAAS server and the Dell server, but that is not the case: both are synced properly at the BIOS level.
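As shown below, hwclock with --localtime treats the hardware clock as if it were kept in local time, so it prints the raw hardware clock value (13:54, which is the UTC time) labelled with the PDT offset in which the system clock is configured. Plain hwclock shows the proper local time, which matches date, so I don't think it is a time issue.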
ipin# hwclock
2023-09-27 06:54:17.232248-07:00
ipin# date
Wed Sep 27 06:54:21 AM PDT 2023
ipin# hwclock --localtime
2023-09-27 13:54:39.681479-07:00
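The arithmetic behind that output can be checked with a short Python sketch. It assumes PDT is a fixed UTC-7 offset and that the --localtime reading is simply the raw hardware clock digits relabelled as local time; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT modelled as a fixed UTC-7 offset (no tz database lookup).
PDT = timezone(timedelta(hours=-7), "PDT")

# Values copied from the console output above.
hwclock_default = datetime.fromisoformat("2023-09-27T06:54:17.232248-07:00")
hwclock_localtime = datetime.fromisoformat("2023-09-27T13:54:39.681479-07:00")

# Drop the offset label to recover the raw digits stored in the hardware clock.
rtc_raw = hwclock_localtime.replace(tzinfo=None)

# Interpret those digits as UTC and convert them to PDT.
rtc_as_pdt = rtc_raw.replace(tzinfo=timezone.utc).astimezone(PDT)

# Difference from the plain hwclock reading: only the seconds that passed
# between typing the two commands, not a 7-hour clock error.
gap = rtc_as_pdt - hwclock_default

print(rtc_as_pdt.isoformat())
print(gap.total_seconds())
```

If the hardware clock really were seven hours off, the gap would be on the order of 25200 seconds rather than a few seconds.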
Meanwhile, I tried changing the NTP settings in MAAS as shown below, and the deployment of Ubuntu Server 22.04 then worked successfully.
I think it is a DNS issue rather than a clock issue.
NOTE: I have only one VM with MAAS installed, acting as both regiond and rackd, instead of two separate instances.
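One way to test the DNS suspicion is to check whether the configured NTP server name resolves from the network the machine deploys on. The following is a minimal sketch using only the Python standard library; the helper name `resolve_ntp_host` is hypothetical, and `ntp.ubuntu.com` is only an assumed example, so substitute whatever is set in the MAAS NTP settings.

```python
import socket


def resolve_ntp_host(hostname, port=123):
    """Return the sorted unique addresses for hostname, or [] if lookup fails.

    Hypothetical helper for illustration; port 123 is the standard NTP port.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        # Name resolution failed: this is the DNS failure case being tested.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Assumed example hostname; replace with the configured NTP server.
    host = "ntp.ubuntu.com"
    addresses = resolve_ntp_host(host)
    if addresses:
        print(f"{host} resolves to: {', '.join(addresses)}")
    else:
        print(f"{host} did not resolve; DNS is a likely culprit")
```

An empty result for the hostname combined with a successful deployment after switching the NTP setting to an IP address would support the DNS explanation.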