PXE boot goes into GRUB shell

Hello,

Our team is planning to roll out MAAS in production, so we have set up a test environment. There is one thing I don’t understand: when deploying an OS to multiple machines at once (or sometimes just one), the deployment gets stuck at the status “Powering on”. The machine has actually already begun to PXE boot and has transferred two files, bootx64.efi and grubx64.efi. It then gets stuck at the GRUB shell. When I manually reboot it and PXE boot starts again, everything works as expected.

The test environment consists of one MAAS controller (region + rack) and seven nodes. The PXE network is isolated and managed by MAAS. I have only tried deploying Ubuntu 20.04 and 22.04, as these are the OSes we will run in the end. The BIOS is configured with only the specific Ethernet interface in the PXE boot priority (I also tried without disabling the rest), UEFI only (I have tried both), and BMC/IPMI network configuration.

I’ve tried configuring the network in different ways, changing node_timeout, and deploying from both the UI and the API (Juju), but the outcome is always the same.

I’ve read other topics with similar problems but I still can’t manage to figure it out.

Name  Version                   Rev        Tracking     Publisher     Notes
maas  3.2.6-12016-g.19812b4da   23947      3.2/stable   canonical✓    -

I have tried different versions (2.9, 3.0, 3.1) but get the best results with 3.2/stable.

/var/snap/maas/common/log/maas.log

2022-12-29T14:40:47.169377+00:00 maas maas.power: [info] Changing power state (on) of node: c4 (ry4bqp)
2022-12-29T14:40:47.176359+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:49.451052+00:00 maas maas.power: [info] Changing power state (on) of node: c6 (tx7f78)
2022-12-29T14:40:49.497842+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:49.853425+00:00 maas maas.drivers.power.ipmi: message repeated 2 times: [ [warn] using a non-secure cipher suite id]
2022-12-29T14:40:50.019360+00:00 maas maas.power: [info] Changing power state (on) of node: c7 (km48rq)
2022-12-29T14:40:50.026023+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:51.009609+00:00 maas maas.drivers.power.ipmi: message repeated 4 times: [ [warn] using a non-secure cipher suite id]
2022-12-29T14:40:51.053504+00:00 maas maas.power: [info] Changed power state (on) of node: c1 (ptspmn)
2022-12-29T14:40:51.268113+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:58.029894+00:00 maas maas.drivers.power.ipmi: message repeated 7 times: [ [warn] using a non-secure cipher suite id]
2022-12-29T14:40:58.074539+00:00 maas maas.power: [info] Changed power state (on) of node: c3 (pn3pnb)
2022-12-29T14:40:58.629518+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:58.659886+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:58.706299+00:00 maas maas.power: [info] Changed power state (on) of node: c2 (r7bhpm)
2022-12-29T14:40:59.395641+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:59.427265+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:59.472264+00:00 maas maas.power: [info] Changed power state (on) of node: c4 (ry4bqp)
2022-12-29T14:41:01.715278+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:41:01.759477+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:41:01.846059+00:00 maas maas.power: [info] Changed power state (on) of node: c6 (tx7f78)
2022-12-29T14:41:02.339585+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:41:02.390394+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:41:02.466791+00:00 maas maas.power: [info] Changed power state (on) of node: c7 (km48rq)
2022-12-29T14:41:45.040236+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f2:cb:2d, 172.20.20.198
2022-12-29T14:41:45.198547+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f4:2d:90, 172.20.20.199
2022-12-29T14:41:47.262713+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f2:cb:cd, 172.20.20.197
2022-12-29T14:41:47.778660+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f2:cb:ca, 172.20.20.200
2022-12-29T14:41:51.108550+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f2:cb:27, 172.20.20.201
2022-12-29T14:41:54.512835+00:00 maas maas.interface: [info] enp3s0 (physical) on maas: New MAC, IP binding observed: 00:24:ec:f2:cb:15, 172.20.20.202
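
To check whether the IPMI power-on step itself is slow or failing, the maas.log lines above can be paired up per node. The following is only a minimal sketch: the regular expression is an assumption based on the line format shown here, and `power_durations` is a hypothetical helper name, not part of MAAS.

```python
import re
from datetime import datetime

# Assumed format, taken from the maas.power lines in the log above.
POWER_RE = re.compile(
    r"^(?P<ts>\S+) \S+ maas\.power: \[info\] "
    r"(?P<verb>Changing|Changed) power state \((?P<state>\w+)\) "
    r"of node: (?P<name>\S+) \((?P<system_id>\w+)\)"
)


def power_durations(lines):
    """Return {node name: seconds between 'Changing' and 'Changed'}."""
    started = {}
    durations = {}
    for line in lines:
        m = POWER_RE.match(line)
        if not m:
            continue  # skip IPMI warnings and other unrelated lines
        ts = datetime.fromisoformat(m["ts"])
        key = (m["name"], m["state"])
        if m["verb"] == "Changing":
            started[key] = ts
        elif key in started:
            durations[m["name"]] = (ts - started.pop(key)).total_seconds()
    return durations


# Three lines copied from the log above, for node c4.
SAMPLE = """\
2022-12-29T14:40:47.169377+00:00 maas maas.power: [info] Changing power state (on) of node: c4 (ry4bqp)
2022-12-29T14:40:47.176359+00:00 maas maas.drivers.power.ipmi: [warn] using a non-secure cipher suite id
2022-12-29T14:40:59.472264+00:00 maas maas.power: [info] Changed power state (on) of node: c4 (ry4bqp)
""".splitlines()

print(power_durations(SAMPLE))
```

For the nodes whose "Changing" and "Changed" lines both appear in the excerpt (c4, c6, c7), the power-on is confirmed after roughly 12 seconds, which suggests the IPMI step works and the hang happens later in the boot. The same function can be fed the lines of /var/snap/maas/common/log/maas.log.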

/var/snap/maas/common/log/regiond.log

2022-12-29 14:40:52 maasserver.region_controller: [info] Reloaded DNS configuration:
	 * ip 172.20.20.202 allocated
	 * ip 172.20.20.201 allocated
2022-12-29 14:41:45 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.198 on 0:24:ec:f2:cb:2d at 2022-12-29 14:41:45 (lease time: 30s)
2022-12-29 14:41:45 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.199 on 0:24:ec:f4:2d:90 at 2022-12-29 14:41:45 (lease time: 30s)
2022-12-29 14:41:47 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.197 on 0:24:ec:f2:cb:cd at 2022-12-29 14:41:47 (lease time: 30s)
2022-12-29 14:41:47 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.200 on 0:24:ec:f2:cb:ca at 2022-12-29 14:41:47 (lease time: 30s)
2022-12-29 14:41:51 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.201 on 0:24:ec:f2:cb:27 at 2022-12-29 14:41:51 (lease time: 30s)
2022-12-29 14:41:52 maasserver.regiondservices.active_discovery: [info] Active network discovery: Active scanning is not enabled on any subnet. Skipping periodic scan.
2022-12-29 14:41:54 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.202 on 0:24:ec:f2:cb:15 at 2022-12-29 14:41:54 (lease time: 30s)
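
To see which machines actually received a DHCP lease, the lease commits above can be collected into a MAC-to-IP map. This is a rough sketch that assumes the line format shown in this log; `committed_leases` and `normalize_mac` are hypothetical helper names. Note that regiond.log prints the MAC without a leading zero (`0:24:ec:...`) while maas.log prints `00:24:ec:...`, so the sketch zero-pads each octet before comparing.

```python
import re

# Assumed format, taken from the maasserver.rpc.leases lines above.
LEASE_RE = re.compile(
    r"Lease update: commit for (?P<ip>\S+) on (?P<mac>[0-9a-fA-F:]+) at"
)


def normalize_mac(mac):
    """Zero-pad each octet so '0:24:ec:f2:cb:2d' becomes '00:24:ec:f2:cb:2d'."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def committed_leases(lines):
    """Return {normalized MAC: IP} for every lease commit found."""
    leases = {}
    for line in lines:
        m = LEASE_RE.search(line)
        if m:
            leases[normalize_mac(m["mac"])] = m["ip"]
    return leases


# Two lines copied from the log above.
SAMPLE = """\
2022-12-29 14:41:45 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.198 on 0:24:ec:f2:cb:2d at 2022-12-29 14:41:45 (lease time: 30s)
2022-12-29 14:41:52 maasserver.regiondservices.active_discovery: [info] Active network discovery: Active scanning is not enabled on any subnet. Skipping periodic scan.
2022-12-29 14:41:54 maasserver.rpc.leases: [info] Lease update: commit for 172.20.20.202 on 0:24:ec:f2:cb:15 at 2022-12-29 14:41:54 (lease time: 30s)
""".splitlines()

for mac, ip in committed_leases(SAMPLE).items():
    print(mac, ip)
```

The resulting IP list can then be compared with the clients that show up in the TFTP log, to tell machines that never got a lease apart from machines that got one and stalled afterwards.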

/var/snap/maas/common/log/rackd.log

2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.200
2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.197
2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.200
2022-12-29 14:41:48 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.200
2022-12-29 14:41:51 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.201
2022-12-29 14:41:51 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.201
2022-12-29 14:41:51 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.201
2022-12-29 14:41:54 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.202
2022-12-29 14:41:54 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.202
2022-12-29 14:41:55 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.202
2022-12-29 14:46:37 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: enp1s0f0, enp3s0.
2022-12-29 14:46:57 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2022-12-29 14:56:37 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: enp1s0f0, enp3s0.
2022-12-29 14:56:57 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
2022-12-29 15:06:37 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: enp1s0f0, enp3s0.
2022-12-29 15:06:57 provisioningserver.rackdservices.dhcp_probe_service: [info] External DHCP probe complete.
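
To make the stall point easier to see, the TFTP requests above can be grouped per client. This is a minimal sketch under two assumptions: the regular expression follows the line format shown in this log, and a client that proceeds past GRUB would request a file whose name contains `grub.cfg`. Both helper names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format, taken from the rackdservices.tftp lines above.
TFTP_RE = re.compile(
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<file>\S+) requested by (?P<ip>\S+)"
)


def files_by_client(lines):
    """Return {client IP: [files requested, in order, without repeats]}."""
    requested = defaultdict(list)
    for line in lines:
        m = TFTP_RE.search(line)
        if m and m["file"] not in requested[m["ip"]]:
            requested[m["ip"]].append(m["file"])
    return dict(requested)


def stalled_after_grub(requested, config_marker="grub.cfg"):
    """List clients that fetched grubx64.efi but no GRUB config afterwards.

    config_marker is an assumption about how config requests are named.
    """
    return sorted(
        ip
        for ip, files in requested.items()
        if "grubx64.efi" in files
        and not any(config_marker in name for name in files)
    )


# Lines copied from the log above.
SAMPLE = """\
2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.200
2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.197
2022-12-29 14:41:47 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 172.20.20.200
2022-12-29 14:41:48 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 172.20.20.200
2022-12-29 14:46:37 provisioningserver.rackdservices.dhcp_probe_service: [info] Probe for external DHCP servers started on interfaces: enp1s0f0, enp3s0.
""".splitlines()

clients = files_by_client(SAMPLE)
print(clients)
print(stalled_after_grub(clients))
```

In the excerpt above, no client requests anything after grubx64.efi, which matches the machines sitting at the GRUB shell.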


tags: @erik-lonroth


I have run into a similar issue before and have seen it in the forums a couple of other times. My guess is that you are using Intel NICs, or that the onboard NICs are Intel.

On Intel NICs with the original firmware, there is a race condition that can cause a deadlock (#1437353). This can be fixed by flashing the firmware with Intel’s flash utility. Unfortunately, the flash utility only works on add-on NICs and can’t update onboard NICs. Our org was banging its head against this bug for a while, but since finding the flash utility we haven’t had a problem.


Hi @nanderson91!
Thanks a lot for the reply. We actually have both onboard and add-on (module) Intel NICs. I believe there is a chance to flash the add-on ones; I don’t think they have been updated since delivery. Thanks again!

@marcus, did @nanderson91’s suggestion resolve your issue?

Hi @billwear,
Before I even got to try it, I found another “hack” to work around the problem: enabling PXE over IPv6 and setting it to boot before IPv4 solved it.


Thanks, @marcus! Well done.
