Deployment failures due to incorrect clock settings in regiond and rackd

Hi All,

Continuing from my previous post about MAAS and installing on a PowerEdge R710 with iDRAC6 (Installing Ubuntu over Dell poweredge R710 machine with iDRAC6).

After that post, I enabled the IPMI over LAN option in the BMC and DHCP for the Ethernet port. When I reboot the server with the PXE boot option, MAAS detects it and boots it into the ephemeral image, successfully assigning an IP address to the first in-band Ethernet port.

I tried commissioning the server and it was successful.

Now when I try to deploy, it fails and ends up shutting down the server. The logs shown in MAAS are not useful, and I need your help troubleshooting this.

Hi @codingfreak,

Unfortunately without additional information it’s impossible to provide help. Could you please upload the Logs somewhere (like pastebin.ubuntu.com)?

Hi @r00ta

I am not sure how to collect the logs so I can share them. Can you please share the commands to get them?

Under the Logs tab in the GUI, I don't see any failure logs explaining why the deployment failed.

Can you send a screenshot of the Logs page anyways just to double check?

Hi @r00ta

Do you have access to the machine to see the console when the machine boots?

@r00ta

Well, I don't have access right now.

But yesterday, when I was in the lab, I tried to find the failure logs on the console screen, and the output scrolls so fast that I could not catch the exact failure reason. The moment it hits an issue, it starts stopping all services and shuts down the system.

If there is another way to capture these logs on screen or within MAAS, please share the steps so that I can try. Currently I have access only via the MAAS GUI.

It seems that the machine does not even reach the moment where it starts communicating the status back to MAAS. If this is correct, there is nothing to extract from MAAS as the issue is on your specific machine and the information is not sent out.

Only physical access to the machine's console output can explain why the machine crashed.

Hi @r00ta

Is there a way to stop the screen from scrolling, using some keyboard shortcut, so I can see the failure reason?
Currently I am connected to the server through a USB keyboard and a monitor via the serial port.

That’s the million-dollar question :slight_smile: There are some tricks, but they require changing the cmdline of the GRUB entry that MAAS sends to the machine.

The most basic and quickest thing you could try is to take a video with your phone, watch it and see if it’s enough to spot the failure :sweat_smile:

Let me try that with a phone to confirm the cause of the failure.

@r00ta

I got a chance to try it. In the MAAS UI, once the machine was in the Ready state, I selected the Deploy option and picked Ubuntu 22.04 as the OS image.

The server rebooted into the ephemeral image and I could see the login prompt. Then I started to see the time sync errors below.

After that the server rebooted again. Once it had rebooted to the login prompt, it simply powered off.

I suspected a time synchronization issue between the MAAS server and the Dell server, but that does not seem to be the case, as both of them are synced properly at the BIOS level.

What is the output of timedatectl on the region? Can you check that the output is correct based on your real time of day?
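For anyone who wants to check the region's clock against an external reference rather than by eye, here is a minimal SNTP sketch. The server name `ntp.ubuntu.com` and the helper names `parse_transmit_time` and `clock_offset` are assumptions for illustration, not anything MAAS provides, and the estimate ignores network asymmetry.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def parse_transmit_time(packet: bytes) -> float:
    """Return the transmit timestamp of an NTP packet as Unix time."""
    # Bytes 40..47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server: str = "ntp.ubuntu.com", timeout: float = 5.0) -> float:
    """Rough estimate of (server time - local time) in seconds."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(48)
        received = time.time()
    # Compare the server's timestamp with the midpoint of the round trip.
    return parse_transmit_time(packet) - (sent + received) / 2


# Example call (needs outbound UDP port 123):
# print(f"{clock_offset():+.3f} s")
```

An offset of a few milliseconds would suggest the system clock is fine; an offset of whole hours would point at a timezone or RTC interpretation problem instead.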

Well, I verified the time on the server and on regiond, and they are the same.

Only when the cloud-init script is executed does it show the time as 02:51:42 instead of 07:52.

[screenshot: cloud-init output showing the time as 02:51:42]

Did you check the hardware clock or the system time on ubuntu? Output of sudo hwclock --localtime and date?

It looks like the hardware clock and the system date are out of sync:

$ sudo hwclock --localtime
[sudo] password for ipin:
2023-09-26 20:23:19.739097-07:00

$ date
Tue Sep 26 01:23:28 PM PDT 2023

Once you fix it you’ll be able to deploy machines. Note that it seems to be a timezone setting issue.

@r00ta - Thanks for the help.

But I am not sure this is the real reason.

As shown below, hwclock with --localtime effectively prints the time in UTC rather than in PDT, which is what the system clock is configured for. When I run plain hwclock, it shows the correct time, so I don't think it is a time issue.

ipin# hwclock 
2023-09-27 06:54:17.232248-07:00

ipin# date
Wed Sep 27 06:54:21 AM PDT 2023

ipin# hwclock --localtime
2023-09-27 13:54:39.681479-07:00
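To make the arithmetic behind those readings explicit, here is a small sketch that compares the two hwclock outputs above. The timestamps are copied from this post; the helper name `skew_in_hours` is only illustrative.

```python
from datetime import datetime

# Readings copied from the hwclock output above.
plain = datetime.fromisoformat("2023-09-27 06:54:17.232248-07:00")
as_localtime = datetime.fromisoformat("2023-09-27 13:54:39.681479-07:00")


def skew_in_hours(a: datetime, b: datetime) -> int:
    """Difference b - a, rounded to whole hours."""
    return round((b - a).total_seconds() / 3600)


# Gap between the two readings, ignoring the seconds between the commands.
skew = skew_in_hours(plain, as_localtime)

# Distance of the local zone from UTC, in hours (PDT is UTC-7).
utc_offset_hours = -plain.utcoffset().total_seconds() / 3600

print(skew, utc_offset_hours)
```

Both values come out as 7 hours, which is consistent with the RTC being kept in UTC and --localtime simply reading that value as if it were local time, rather than with a genuinely wrong clock.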

Meanwhile, I tried changing the NTP settings in MAAS as shown below, and the deployment of Ubuntu Server 22.04 then worked successfully.

[screenshot: NTP settings in MAAS]

I think it is a DNS issue rather than a clock issue.

NOTE: I have only one VM with MAAS installed, acting as both regiond and rackd, instead of two separate instances.