Attempt to read or write outside of disk hd0

We have a machine that constantly fails to install Ubuntu. The server fails in the last reboot with the following message:

Booting from Hard drive C:
error: attempt to read or write outside of disk `hd0`.
Entering rescue mode...
grub rescue>

I have tried to recover the installation as suggested in different Stackoverflow without success (typing normal doesn’t work either). What is interesting is that sometimes the installation is successful, but after some some days (or weeks) the same problem happens when the server reboots.

maas/focal,now 1:3.1.0-10901-g.f1f8f1505-0ubuntu1~20.04.1 all [installed]

The logs don’t show anything relevant for the server, for example rackd.log:

2022-01-25 14:35:17 provisioningserver.rackdservices.http: [info] pxelinux.cfg/33354a1a-5912-1044-3131-a1c04f434a21 requested by 150.10.234.173
2022-01-25 14:35:17 provisioningserver.rackdservices.http: [info] pxelinux.cfg/01-a1-b1-1d-d2-11-61 requested by 150.10.234.173
2022-01-25 14:35:17 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-20.04/focal/stable/boot-kernel requested by 150.10.234.173
2022-01-25 14:35:17 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-20.04/focal/stable/boot-initrd requested by 150.10.234.173
2022-01-25 14:35:31 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-20.04/focal/stable/squashfs requested by 150.10.234.173

The server is a PowerEdge R7525 with a PERC H745 Controller, configured to boot in legacy. Could this be a MaaS bug?

Try to manually install ubuntu on the server.
MaaS doesn’t keep track of the server after deployment and the crash is more like a server fail.

I have experience with PowerEdge R7525 with PERC H745 controller (boot in UEFI): MaaS works fine on these servers.

1 Like

Installing manually works without a problem.

Did you manage to configure the server to boot in UEFI with MaaS? I tried it and the server boots in an infinite loop during the process.

<Component FQDD="BIOS.Setup.1-1">
         <Attribute Name="BootMode">Uefi</Attribute>
         <Attribute Name="SetBootOrderEn">NIC.PxeDevice.1-1</Attribute>
         <Attribute Name="SetBootOrderFqdd1">NIC.PxeDevice.1-1</Attribute>
         <Attribute Name="PxeDev1EnDis">Enabled</Attribute>
         <Attribute Name="PxeDev1Interface">NIC.Integrated.1-1-1</Attribute>
         <Attribute Name="UefiBootSeq">NIC.PxeDevice.1-1</Attribute>
</Component>

Try these bios profile options. This configuration means you are using PXE on a 10G interface.

The configuration is similar, also using an integrated NIC. MaaS can commission the server but deploying is an infinite loop.

need logs :grinning:
I’m sure there is a error somewhere

The problem vanished when I removed one a custom curtin script and activated UEFI boot. I deployed the machine without the final curtitn scrip and it worked!

The question now is: what is wrong with the custom curtin script?

#cloud-config
debconf_selections:
 maas: |
  {{for line in str(curtin_preseed).splitlines()}}
  {{line}}
  {{endfor}}

late_commands:
  10_upgrade: ["curtin", "in-target", "--", "sh", "-c", "/usr/bin/apt update; /usr/bin/apt -y upgrade"]
  11_packages: ["curtin", "in-target", "--", "sh", "-c", "DEBIAN_FRONTEND=noninteractive /usr/bin/apt -y install git ldap-auth-config nss-updatedb nscd nslcd libnss-ldapd libpam-ldapd ldap-utils autofs"]
  12_env: ["curtin", "in-target", "--", "sh", "-c", "git config --system http.sslverify false"]
  13_clone: ["curtin", "in-target", "--", "sh", "-c", "git clone https://gitlab-ci-token:XXXXX@gitlab.company/fwt/maas-curtin.git /tmp/deployment-scripts/"]
  14_install_localadmin: ["curtin", "in-target", "--", "sh", "-c", "/usr/bin/python3 /tmp/deployment-scripts/deploy/generic/localuser.py"]
  15_nfs_install: ["curtin", "in-target", "--", "sh", "-c", "/usr/bin/python3 /tmp/deployment-scripts/deploy/generic/nfs.py"]
  16_ldap_install: ["curtin", "in-target", "--", "sh", "-c", "/usr/bin/python3 /tmp/deployment-scripts/deploy/generic/ldap.py"]
  17_install_install_sudo-ldap: ["curtin", "in-target", "--", "sh", "-c", "export SUDO_FORCE_REMOVE=yes; /usr/bin/apt -y install sudo-ldap"]
  18_install_configure_sudo-ldap: ["curtin", "in-target", "--", "sh", "-c", "/usr/bin/python3 /tmp/deployment-scripts/deploy/generic/sudo_ldap.py"]
  19_configure_intranet_rootcertificates: ["curtin", "in-target", "--", "sh", "-c", "cd /tmp/deployment-scripts/deploy/generic/; chmod +x intranet_rootcertificates.sh; ./intranet_rootcertificates.sh"]
  20_set_ssl_verification: ["curtin", "in-target", "--", "sh", "-c", "git config --system http.sslverify true"]
  21_install_extra_packages: ["curtin", "in-target", "--", "sh", "-c", "DEBIAN_FRONTEND=noninteractive /usr/bin/apt -y install net-tools build-essential valgrind"]
  22_clean_up_packages: ["curtin", "in-target", "--", "sh", "-c", "DEBIAN_FRONTEND=noninteractive /usr/bin/apt -y autoremove"]

Note that this was working without a problem with machines booting in Legacy mode.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.