MAAS 3.6 - Deploying Baremetal with Libvirt - Won't Complete the Deployment

Hi there,

When deploying a bare-metal machine with Libvirt enabled, I get this error in the output: “Cloud Init Final Stage, no datasource found”, and the deployment never completes. I also can’t SSH into the box, but I can ping the interfaces.

If I deploy baremetal without Libvirt support, it completes the deployment and I can SSH into the box.

I’m not sure what’s going on when MAAS creates those network bridges for Libvirt. I tried with 20.04 and 24.04 and got the same problem.

What’s the network config for that machine?

I can’t get into it to see the netplan, nor can I use the terminal to log in. The only thing I can show you is the network setup of that bare-metal machine in the MAAS UI. Will that work?

Could you try the workaround I explained in comment #13 of bug #2106671 (MAAS) and see if that fixes your problem?

Hi,

I’m running into a very similar issue, but it is not related to libvirt in any way.
I’m deploying a new cloud based on MaaS 3.6 (snap edition) on top of Ubuntu 24.04 nodes (I had one based on MaaS 3.1/Ubuntu 20.04 before).

When deploying machines with a LACP bond with MaaS, once the machine is installed and reboots to apply the new configuration (hence, the LACP bonded network), it seems the OS doesn’t have access to the network, as if it were unplugged.
This makes downloading the MaaS metadata fail and makes cloud-init crash at the end of the process with the same error that we can see in @mk4umha’s screenshot.
Worse than that, this makes network services such as SSH fail and there is no way to connect to the machine once booted.

What is even weirder is that the L2/L3 network is up, since I can ping this newly deployed machine from the MaaS node!
But, as for @mk4umha, there is no feedback to MaaS and the deployment of this machine remains stuck in “Rebooting” status.

I tried to re-deploy the same machine with the exact same network configuration except that I changed the bond mode to “active-backup” instead of “802.3ad” and it worked, deployment was a success.

I even reconfigured the network afterwards by changing the bond mode to re-enable LACP and then, after a reboot, it worked!
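On a machine that is reachable (for example after the active-backup deployment), the bonding driver's status file `/proc/net/bonding/bond0` shows which mode the kernel is actually running and the link state of each member interface. Below is a minimal sketch of a parser for that text. The sample input is hand-written for illustration using the interface names from this thread; it is not output captured from the affected machine, and the "down" state in it is made up.

```python
def parse_bonding_status(text):
    """Summarise the text of /proc/net/bonding/<bond>: bond mode and per-slave MII status."""
    summary = {"mode": None, "slaves": {}}
    current = None  # name of the slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and section titles without a colon
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Slave Interface":
            current = value
            summary["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe the bond itself; skipped here.
            summary["slaves"][current] = value
    return summary


# Illustrative, abridged sample (NOT real output from the machine in this thread).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno12399np0
MII Status: up

Slave Interface: eno12409np1
MII Status: down
"""

if __name__ == "__main__":
    print(parse_bonding_status(SAMPLE))
```

On a real host the input would come from reading `/proc/net/bonding/bond0` instead of `SAMPLE`.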

This is just pure craziness and definitely a fault; MaaS 3.6 is just unusable like this.
I attach the Netplan configuration that is applied during node installation, taken from the IPMI console output.

Any help would be very welcome.

####### Working Active-Backup bond conf ##############
bonds:
  bond0:
    addresses:
    - 10.38.130.16/24
    gateway4: 10.38.130.1
    interfaces:
    - eno12399np0
    - eno12409np1
    macaddress: 14:23:f2:f4:15:10
    mtu: 9000
    nameservers:
      addresses:
      - 10.38.130.11
      - 10.38.130.19
      - 10.38.130.12
      search:
      - maas
    parameters:
      down-delay: 0
      gratuitious-arp: 1
      mii-monitor-interval: 100
      mode: active-backup
      transmit-hash-policy: layer2
      up-delay: 0
ethernets:
  eno12399np0:
    match:
      macaddress: 14:23:f2:f4:15:10
    mtu: 9000
    set-name: eno12399np0
  eno12409np1:
    match:
      macaddress: 14:23:f2:f4:15:11
    mtu: 9000
    set-name: eno12409np1
version: 2
vlans:
  bond0.21:
    addresses:
    - 10.38.135.32/24
    id: 21
    link: bond0
    mtu: 9000
    nameservers:
      addresses:
      - 10.38.130.11
      - 10.38.130.12
      - 10.38.130.13
      - 10.38.130.18
      search:
      - maas
####### Non Working LACP bond conf ##############
bonds:
  bond0:
    addresses:
    - 10.38.130.16/24
    gateway4: 10.38.130.1
    interfaces:
    - eno12399np0
    - eno12409np1
    macaddress: 14:23:f2:f4:15:10
    mtu: 9000
    nameservers:
      addresses:
      - 10.38.130.11
      - 10.38.130.19
      - 10.38.130.12
      search:
      - maas
    parameters:
      down-delay: 0
      lacp-rate: fast
      mii-monitor-interval: 100
      mode: 802.3ad
      transmit-hash-policy: layer3+4
      up-delay: 0
ethernets:
  eno12399np0:
    match:
      macaddress: 14:23:f2:f4:15:10
    mtu: 9000
    set-name: eno12399np0
  eno12409np1:
    match:
      macaddress: 14:23:f2:f4:15:11
    mtu: 9000
    set-name: eno12409np1
version: 2
vlans:
  bond0.21:
    addresses:
    - 10.38.135.32/24
    id: 21
    link: bond0
    mtu: 9000
    nameservers:
      addresses:
      - 10.38.130.11
      - 10.38.130.12
      - 10.38.130.13
      - 10.38.130.18
      search:
      - maas

As you can see, there is no big difference except the bond mode and the parameters that go with it.
I also can’t explain why MaaS doesn’t use the same DNS servers for the bond and for the VLANs added on top of the bond.
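To make the comparison concrete, here is a small sketch that diffs the two `parameters` blocks. The values are copied by hand from the two configs above; the helper name `diff_params` is only for illustration.

```python
# Bond parameters copied by hand from the two Netplan configs in this post.
active_backup = {
    "down-delay": 0,
    "gratuitious-arp": 1,  # spelled as it appears in the generated config
    "mii-monitor-interval": 100,
    "mode": "active-backup",
    "transmit-hash-policy": "layer2",
    "up-delay": 0,
}
lacp = {
    "down-delay": 0,
    "lacp-rate": "fast",
    "mii-monitor-interval": 100,
    "mode": "802.3ad",
    "transmit-hash-policy": "layer3+4",
    "up-delay": 0,
}


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys that differ; None marks a missing key."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for key, (working, failing) in diff_params(active_backup, lacp).items():
        print(f"{key}: working={working!r} failing={failing!r}")
```

Only four keys differ: `gratuitious-arp`, `lacp-rate`, `mode` and `transmit-hash-policy`. The rest of the bond, ethernets and vlans sections are identical in both configs.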

And remember what I said: re-using the same LACP conf, edited by hand after a successful Active-Backup deployment, works! So basically, it is not the Netplan config that is faulty; there is “something else”.

Please see “Can't deploy Ubuntu newer than Bionic if using bond interfaces”.

Has it been released already?
My MaaS platform syncs images directly from maas.io, so I suppose that if it had already been fixed, it should work, but I still encounter the issue.
This has been known for weeks now and is a blocker for us.
I just retried and got some improvement when I had the MAAS-provided network on both the bond itself (no tagging) and a tagged VLAN with the same network attached to it (the PXE network) … This is a workaround that works, but it means I provide 2 IPs for the same network for each machine, which is far from ideal.

To me this issue is not fixed and is a real blocker for us.

No, the fix is not available yet on images.maas.io.

Any ETA for this release?
Alternatively, it seems the updated cloud-init release is available in proposed, but I don’t see how to enable the proposed repositories in MaaS.
Is there a way to do that as a workaround?

The current workaround is described in the Launchpad bug (see the comments there). There is no ETA; we are waiting for the package to land in the archive.

Let me try this and get back to you, thanks!

I tried the image at /ephemeral-v3/candidate/jammy/amd64/20250606, which is the latest jammy release available in candidate, and it is worse than before.
As can be seen in the manifest, cloud-init has been upgraded to 25.1.2-0ubuntu0~22.04.1, but trying to commission or deploy any machine with this image fails in MaaS.
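For anyone who wants to check which cloud-init version an image ships, here is a minimal sketch that looks up a package in manifest text. It assumes the manifest lists one package per line as a name followed by a version, separated by whitespace; that layout is an assumption, and names carrying an architecture suffix would need extra handling. The `example-package` line is a made-up placeholder; only the cloud-init version comes from this thread.

```python
def package_version(manifest_text, package):
    """Return the version listed for `package` in the manifest text, or None if it is absent."""
    for line in manifest_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == package:
            return fields[1]
    return None


# Hypothetical sample: the first line is a placeholder, the second uses the version quoted above.
SAMPLE_MANIFEST = (
    "example-package\t1.0\n"
    "cloud-init\t25.1.2-0ubuntu0~22.04.1\n"
)

if __name__ == "__main__":
    print(package_version(SAMPLE_MANIFEST, "cloud-init"))
```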

It looks like cloud-init is not even triggered.
My machines are stuck in the “Loading ephemeral” or “Performing PXE boot” state on the MaaS side, while they are fully booted and waiting at the prompt in the IPMI console.

Here is my LP comment where I attached my boot log: comment #21 of bug #2106671 (MAAS).

Latest candidate image is broken.

When can we hope for a fixed release?

No ETA for the upstream community images yet. This week we’ll investigate what’s broken in the candidate images.

It looks like it’s fixed for me when deploying as libvirt on 20.04.

All 3 of my bare-metal machines complete the deployment and show as “deployed” now.