Cannot deploy official Ubuntu 22.04 and 18.04; only 20.04 works, across multiple MAAS versions

Hi,

I am using MAAS 3.3/stable and can successfully deploy the official Ubuntu 20.04 MAAS image on a server. I can commission the server (also using Ubuntu 20.04), deploy, and release repeatedly without any issue.
But if I try to deploy an official Ubuntu 18.04 or 22.04 image, it doesn’t work (same server, same settings everywhere). The process status gets stuck on performing PXE boot, as shown below:

18.04 Error Example

22.04 Error Example

I have already tried multiple MAAS versions, such as 3.4/edge and 3.2/stable, but the result is no different: 20.04 always works flawlessly, while 18.04 and 22.04 don’t.
Any tips?

Thanks
BR

Hi @jonaspaulo ,

Could you please take a look at the rackd logs and check whether there is anything relevant there?

The error for 18.04 seems similar to something another user reported recently: MAAS PXE boot Ubuntu 20.04 image and failed with busybox - #8 by lindaor15

Thanks for the reply @r00ta

The logs on a new 18.04 deploy attempt:

2023-12-04 10:18:42 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.10.201
2023-12-04 10:18:42 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.10.201
2023-12-04 10:18:42 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-b4:96:91:c9:f2:6a requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/stable/boot-kernel requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/stable/boot-initrd requested by 10.0.10.201
2023-12-04 10:18:44 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=35386, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=5250, flowInfo=0, scopeID=0))
2023-12-04 10:18:44 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=35398, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=5250, flowInfo=0, scopeID=0))
2023-12-04 10:18:44 Uninitialized: [info] ClusterClient connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=35406, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.122.1', port=5250, flowInfo=0, scopeID=0))
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Event-loop 'maas:pid=7298' authenticated.
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Event-loop 'maas:pid=7298' authenticated.
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Event-loop 'maas:pid=7298' authenticated.
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Rack controller 'ak644g' registered (via maas:pid=7298) with MAAS version 3.4.0-14319-g.3ab76533f.
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Rack controller 'ak644g' registered (via maas:pid=7298) with MAAS version 3.4.0-14319-g.3ab76533f.
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Rack controller 'ak644g' registered (via maas:pid=7298) with MAAS version 3.4.0-14319-g.3ab76533f.
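
For anyone comparing logs like these between a working and a failing release, here is a minimal Python sketch that groups the TFTP/HTTP requests by client and reports the last file each client fetched. It is not part of MAAS: the function name `requests_by_client` and the regular expression are assumptions based only on the line format shown in the excerpt above. In that excerpt the last request is boot-initrd over HTTP, which suggests the rack served the kernel and initrd and that the failure happens later in the boot.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the rackd excerpt in this thread:
# "<date> <time> provisioningserver.rackdservices.<tftp|http>: [info] <path> requested by <client>"
REQUEST_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.(?P<proto>tftp|http): "
    r"\[info\] (?P<path>\S+) requested by (?P<client>\S+)$"
)


def requests_by_client(lines):
    """Return {client: [(timestamp, protocol, path), ...]} in log order."""
    result = defaultdict(list)
    for line in lines:
        match = REQUEST_RE.match(line.strip())
        if match:  # lines that are not file requests are skipped
            result[match["client"]].append(
                (match["ts"], match["proto"], match["path"])
            )
    return dict(result)


# A few lines copied from the log above, plus one non-request line.
SAMPLE = """\
2023-12-04 10:18:42 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/stable/boot-kernel requested by 10.0.10.201
2023-12-04 10:18:43 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/stable/boot-initrd requested by 10.0.10.201
2023-12-04 10:18:44 provisioningserver.rpc.clusterservice: [info] Event-loop 'maas:pid=7298' authenticated.
"""

if __name__ == "__main__":
    for client, reqs in requests_by_client(SAMPLE.splitlines()).items():
        ts, proto, path = reqs[-1]
        print(f"{client}: {len(reqs)} requests, last was {path} over {proto} at {ts}")
```

Running the same function over the log of a successful 20.04 deployment and comparing the last requested path per client would show whether the failing releases stop at the same point.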

Just some more detail.
I thought the issue could be with the image download, since the only version that always worked across multiple MAAS versions was the default one that is included (20.04).
But I tried some more Ubuntu versions downloaded after the MAAS installation. Here are the results, always with the same setup and settings, and repeated over multiple deploy/release cycles with the same outcome each time:

  • Ubuntu 23.04 - Works
  • Ubuntu 23.10 - Works
  • Ubuntu 22.10 - Doesn’t work (same errors as 22.04)

Thanks a lot.

Hey @jonaspaulo,

I wonder if you can capture the same console output for a 20.04 boot. Is there something obvious that happens there that doesn’t happen in the other versions?

I would guess that commissioning with 22.04 or 18.04 would also fail but it might be useful to check.

Another thought I had was you might have a custom curtin_preseed for 20.04 but not other versions. But I don’t think you’re getting far enough for that to matter.

Is this restricted to a specific server or is this behavior with all servers you deploy?

I would want to see more of the failed deployments’ console output. If you could screen-record the boot process, perhaps you’ll see an error earlier in the process that will give some hints.

Not sure what’s going on, these are just my initial thoughts on the problem.

Cheers,
Vern

I’m having a very similar issue with IPv6. Hosts boot with no apparent consistency.
Half of the time they boot and download the squashfs; the rest of the time they fail with “network is unreachable”.
I have captured rack and gateway traffic and screen-recorded the hosts, since I can’t wrap my head around this. It almost feels like a sneaky race condition.

All of these recordings contain sensitive data though, is there any way I can share them with the MAAS team in order to debug this?

Moreover, I’ve set up a debugging host with a bridge (2 interfaces that pass through all traffic) to capture packets just before the host, and regardless of which distro I boot it always succeeds with no error; as soon as I plug the host back directly into the switch it starts failing, hence my suggestion of a race condition.

@rhxto there are a couple of options:

  • if you are a Canonical customer you should open a case on the customer portal
  • if you are a community user, there is no formal agreement, so it’s up to you whether to share your logs/data with Canonical and/or the community.
    For example, you can use whatever tool you prefer (Google Drive, mail, whatever) and give somebody (for example, me) access. I can then upload your logs to our internal storage and give read permissions to the team

I’m a community user. I’ll share the files to your email, as replacing all my data in the logs would take way too long and would break the packet captures, not to mention editing the screen recordings.

If the solution is related to this thread I’ll post it here as well.
Thanks!

Yup, sounds good. If you could also open a bug with all the information you have, that would be great. I will then attach the logs to that bug (they will be readable only by people within Canonical).

Before doing anything - I think this might have something to do with Bug #2016908 “udev fails to make prctl() syscall with apparmor=0...” : Bugs : maas-images.

I’d like to try that new kernel first. How can I do this?

I confirm that with Ubuntu 23.10 it works perfectly. It is most likely due to the bug I linked in the previous comment.
So much network debugging for nothing :’)

Is there anything I can do to force Ubuntu 23 as the commissioning OS? It only lets me choose LTS distros.

See MAAS images used for deployment

I did a very quick test some time ago and 23.10 did not work for commissioning. Since we only support LTS for commissioning, I don’t have much time to investigate further. In the post I linked you can see how to bypass the check and use 23.10 for commissioning (in short, use the Deb install and delete the line https://github.com/maas/maas/blob/0d3fffe8ae05c9fcfd0dff782acb5eb6655944d1/src%2Fprovisioningserver%2Fdrivers%2Fosystem%2Fubuntu.py#L59)

Thanks for the input.
Right now I only have one server to test with.
I am going to test with another server, if possible, to see whether the problem is specific to this one, and then post the videos if I can.
BR

@jonaspaulo you should take a look at the bug I linked.
I think that’s what’s causing your issue as well.

@r00ta fyi: I was able to successfully commission and deploy all of my nodes with 23.10.

Let me know if I can help with the non-LTS commissioning issues.

Thanks, but since we only support LTS for commissioning it’s not my priority at the moment :) Only one person has asked for it. In case you have the same need (though from your last message I guess not), you might coordinate with him to sort it out.

Alright, in case I face any issues I’ll contact you.
In the meantime I’ll keep using 23.10 to commission while I wait for the patched kernel for 22.04.

Thanks

@r00ta - Is there a way to compile your own kernel / image for commissioning?

MAAS only supports LTS releases, but since everything is open source, yes, in theory you can.

I tried setting apparmor=1 in the kernel parameters in MAAS, but it is still not working.