I am trying to deploy newer machines (OptiPlex Small Form Factor Plus 7010). All of them fail deployment after the “Loading ephemeral” stage with: Marking node failed - Installation failed (refer to the installation log for more information).
OS: Ubuntu 20.04 LTS focal
I have tried all of the available kernel versions.
I have also tried a few BIOS versions.
Here is one of the log entries from regiond:
regiond: [info] 127.0.0.1 POST /MAAS/metadata/status/[systemID] HTTP/1.1 --> 204 NO_CONTENT (referrer: -; agent: python-requests/2.22.0)
I could not find any other relevant information.
Any input is appreciated.
Update: rackd-related logs:
maasserver.ipc: [info] Worker pid:x lost burst connection to ('10.7.x.x', 5252).
RegionServer,x,::ffff:10.6.x.2: [info] RegionServer connection lost (HOST:IPv6Address(type='TCP', host='::ffff:10.7.x.x', port=5252, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:10.6.x.2', port=38132, flowInfo=0, scopeID=0))
maasserver.ipc: [info] Worker pid:195808x lost burst connection to ('10.7.x.x', 5252).
r00ta
31 July 2024 20:03
2
I’d suggest monitoring what happens on the target machine by watching its serial console.
I’m not sure how to look at its serial console. In the meantime, I have added more logs from rackd to see if it helps narrow down anything.
r00ta
1 August 2024 05:40
4
Nope, these messages are harmless.
Update: I now get the following errors when trying to erase the disk.
My MAAS version is 3.4.3 and I am using snap.
HTTP Request - /images/ubuntu/amd64/no-such-kernel/focal/no-such-image/boot-kernel
Wed, 31 Jul. 2024 20:12:18 Marking node failed - Missing boot image ubuntu/amd64/no-such-kernel/focal.
Wed, 31 Jul. 2024 20:12:17 Performing PXE boot
Wed, 31 Jul. 2024 20:12:17 PXE Request - commissioning
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/grub.cfg
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/x86_64-efi/terminal.lst
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/x86_64-efi/fs.lst
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/grub.cfg-cc:96:xx:32:xx:xx
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/x86_64-efi/crypto.lst
Wed, 31 Jul. 2024 20:12:17 TFTP Request - /grub/x86_64-efi/command.lst
Wed, 31 Jul. 2024 20:12:12 TFTP Request - grubx64.efi
Wed, 31 Jul. 2024 20:12:12 TFTP Request - bootx64.efi
Wed, 31 Jul. 2024 20:12:12 TFTP Request - bootx64.efi
Wed, 31 Jul. 2024 20:11:53 Node powered on
Wed, 31 Jul. 2024 20:11:46 Power cycling
Wed, 31 Jul. 2024 20:11:46 Node - Started releasing [device]
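In case it helps anyone triaging a similar event log, here is a minimal Python sketch that parses lines in the format shown above and flags the ones containing `no-such-kernel` or `no-such-image`. Those strings look like placeholder values rather than real kernel or image names, but that reading is an assumption on my part. The helper names (`parse_events`, `find_placeholder_events`) are hypothetical and not part of MAAS; the sample data is copied from the log above.

```python
import re

# A few lines copied from the event log above (newest first).
EVENT_LOG = """\
Wed, 31 Jul. 2024 20:12:18 Marking node failed - Missing boot image ubuntu/amd64/no-such-kernel/focal.
Wed, 31 Jul. 2024 20:12:17 Performing PXE boot
Wed, 31 Jul. 2024 20:12:17 PXE Request - commissioning
Wed, 31 Jul. 2024 20:11:53 Node powered on
"""

# Matches "Wed, 31 Jul. 2024 20:12:18 <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}, \d{1,2} \w{3}\.? \d{4} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$"
)

# Strings assumed to be placeholders (not real kernel/image names).
PLACEHOLDERS = ("no-such-kernel", "no-such-image")


def parse_events(text):
    """Return (timestamp, message) tuples for lines that carry a timestamp."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match.group("ts"), match.group("msg")))
    return events


def find_placeholder_events(events):
    """Return the events whose message contains a placeholder string."""
    return [
        (ts, msg)
        for ts, msg in events
        if any(p in msg for p in PLACEHOLDERS)
    ]


events = parse_events(EVENT_LOG)
flagged = find_placeholder_events(events)
for ts, msg in flagged:
    print(ts, "->", msg)
```

Lines without a leading timestamp (such as the first "HTTP Request" line above) are skipped by the parser, so they would need to be checked separately.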
I believe this may be similar to MAAS Bug #2013529 (“Nodes stuck in Failed Disk Erasing due to wrong ip...”), as the workstations with the issue are new vPro workstations.
The deployment issue also looks similar to https://bugs.launchpad.net/maas/+bug/1908452.
Am I missing something?