2.9 early snags

2.9 release has for me coincided nicely with adopting some new hardware to expand the cluster.

There’s some pain points I’ve seen so far. These may not be 2.9 specific, and could very well be quirks of my infra setup or the new hardware but I can’t stop to dupe check against launchpad right now, and I would like to write them down whilst they’re fresh to me:

  1. The existing machines on the cluster are happily redeployed with Focal+KVM pods with no fuss, and some IPMI sporadic issues I’d seen seem to have vanished :+1:
  2. The new hardware is not happy - PXEBoot works for enlist/comish but during deploy (at the final reboot) it bails when trying to boot local disk with “WARN: No MBR Magic, treating disk as raw”. I’ve seen this before and imagine it’s an EFI bios compatibility tweak, rather than anything related to the PXE changes in 2.9.
  3. In debugging this problem I tried to instead deploy Bionic, and was blocked with "Error:hwe_kernel(hwe-18.04-edge) is older than min_hwe_kernel(hwe-20.04-edge).". I checked if this was from me setting a specific version for comish/deploy but changing those values didn’t seem to help.
  4. This one is definitely a me thing, but I hit the gotcha of MAAS clobbering disk superblocks for block devices I’d preseeded with data for use post-deploy. Is there any way I can disable that behavior, armed with the knowledge that if I see any problems related to unique IDs I know to resolve those manually? It’s not usually a problem in my standard deploy process, but when doing repeated enlist/comish/deploy cycles, at some point I forgot to remove the preseed disks.
  5. I manually added a machine to MAAS, and now can’t delete it. in the UI I see this error.
1 Like

@seffyroff, many thanks for sharing your own experiences. i’m making a note to fold the learnings into doc.

i’m going to forward this post to someone to help source your issues. one of us will let you know what we think.

1 Like

I’ve been looking more into the PXE boot on UEFI issue, I’m able to repro it in an EFI-configured VM. I’ll see if I can solve it or provide more insight here.

The early results from repro are:

Test running on KVM Ubuntu Focal. Host also Focal (Desktop with Network Manager stack, bridge added with nmcli so not ideal) Host itself is able to pxeboot for enlist/comish

Anyhow, on the VM it sits at 'Start PXE for IPv4 for a minute or 2, then seems to hit some sort of silent timeout and gets to this stage:

After another minute like this it eventually arrives at the UEFI shell.

1 Like

@seffyroff, did you ever figure this one out? looks like nobody ever responded to you 'till now, sorry.