MaaS 3.4 - Addressing Single Point of Failure in MaaS-LXD Connection

Hello,

I’ve recently installed MaaS 3.4 and deployed Ubuntu 22.04 on three separate machines. I chose not to automatically register these machines as VM hosts via the MaaS UI, because MaaS seemed to deploy three isolated LXD hosts, which would later prevent me from forming an LXD Cluster without reinstalling LXD. I therefore deployed these machines in a blank state, with just the OS installed. Additionally, MaaS’s use of the PXE network to communicate with LXD is not ideal for my setup, as the PXE network (effectively OOBM here) does not align with my in-band network, which runs on top of a bonded channel.

Following this, I manually created an LXD Cluster on these freshly deployed machines and connected it with MaaS. However, I’ve encountered a problem: MaaS only recognizes one IP from the LXD Cluster. If the LXD node known to MaaS goes offline, MaaS loses its ability to manage the LXD Cluster.

To address this, I added a second LXD node (from the same LXD Cluster) in MaaS, but this resulted in a duplicate host listing in the MaaS UI.

I’m seeking advice on how to properly integrate an LXD Cluster into MaaS so that it remains functional even if one or two nodes go offline. Would adding a VIP to the LXD Cluster and then integrating the LXD Cluster into MaaS using that VIP be a viable solution? What is the recommended method to achieve this?
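To make the VIP idea concrete, here is a minimal sketch of the health-check logic a VIP would rely on: probe each cluster member and point at the first one that answers. This is only an illustration under assumptions, not something MaaS does itself. The function names (`tcp_reachable`, `pick_vip_holder`) and the example addresses are hypothetical, and port 8443 is assumed to be the LXD API port (the LXD default). In a real deployment a tool such as keepalived would normally handle this.

```python
import socket
from typing import Callable, Iterable, Optional

LXD_API_PORT = 8443  # assumption: LXD's default HTTPS API port; adjust if yours differs


def tcp_reachable(host: str, port: int = LXD_API_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_vip_holder(
    members: Iterable[str],
    is_up: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first cluster member that answers, or None if all are down."""
    for host in members:
        if is_up(host):
            return host
    return None
```

Calling `pick_vip_holder(["10.0.0.1", "10.0.0.2", "10.0.0.3"])` (placeholder addresses) would return the first member whose API port accepts a connection, which is the node a VIP should sit on.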

I also came across a StackOverflow post stating that MaaS doesn’t support LXD Clusters with Ubuntu 22.04:

Link to StackOverflow post

Is this still the case with MaaS 3.4? Could this be the reason behind the issue I’m experiencing?

Here are some screenshots for reference:

  1. LXD Cluster added to MaaS:

  2. LXD node known to MaaS rebooted intentionally, MaaS losing access to LXD Cluster:

NOTE: MaaS regains access after LXD Node comes back online.

  3. Another LXD node was added to MaaS to address the SPOF situation:

  4. Resulting duplicate LXD Cluster listing in MaaS UI:

I’m eager to find a proper setup that eliminates the SPOF. Any guidance would be greatly appreciated.

Cheers!

@tmartins,

Re: “Additionally, MaaS’s use of the PXE network to communicate with LXD is not ideal for my setup, as the PXE network (kinda OOBM here) does not align with my InBand on top of a bond channel.”

That’s not always the case; it depends on the LXD URLs the user configures for LXD communication, and PXE to the LXD instances then depends on how you configure the LXD network. You can set up a bridge on the host whose parent is the interface the host boots from, and tell LXD to use that bridge, so it’s all on a single PXE network.

To create an LXD cluster in MAAS, you first need to cluster LXD by itself, then register a single host of that cluster into MAAS. It sounds like you’re trying to add all hosts into MAAS and then cluster them, which is incorrect and won’t work.

I don’t know that we’ve tested clustering with 22.04.

Hello @billwear,

Thank you for your response. I appreciate your time and effort in explaining the setup process for an LXD cluster and its registration in MaaS. However, I believe there may have been a slight misunderstanding regarding my issue.

I have already successfully created an LXD Cluster and registered it with MaaS. The problem I’m encountering is not with the setup or registration but with how MaaS manages the LXD Cluster when one or more nodes go offline.

MaaS only recognizes one IP from the LXD Cluster in the recommended setup. If the LXD node known to MaaS goes offline, MaaS loses its ability to manage the LXD Cluster. To address this, I tried adding a second LXD node (from the same LXD Cluster) in MaaS, resulting in a duplicate host listing in the MaaS UI.

My main question is: How can I properly integrate an LXD Cluster into MaaS to remain functional even if one or two nodes go offline? Is there a way for MaaS to recognize all the IPs in the LXD Cluster, or is there a recommended method to handle this situation?

Thanks!

With respect to MAAS, a VM is tied to a VMHost, even in a cluster. When the cluster loses a node, the other VMHosts should still be able to compose a VM, so you’d have to move the VM to another host.

Hello @billwear,

Thank you for your continued assistance. I understand that a VM is tied to a VMHost in MaaS, and that other VMHosts should still be able to compose a VM when a node goes offline. However, my concern is not about the management of VMs within the cluster, but about MaaS’s ability to maintain its connection to the LXD Cluster itself.

In the recommended setup, MaaS only recognises one IP from the LXD Cluster. If the LXD node known to MaaS goes offline, MaaS loses its ability to manage the LXD Cluster entirely, not just the VMs on the offline node. This is the issue I’m trying to address.

Is there a way for MaaS to recognize all the IPs in the LXD Cluster, not just one, so that it can still manage the cluster even if the LXD node known to MaaS goes offline? I’m looking for a solution that allows MaaS to maintain its connection to the LXD Cluster as a whole, regardless of the status of individual nodes.

I hope this clarifies my question.

Can you give me more information? I need the IPs of the nodes in your LXD cluster, and logs. You can anonymize the IPs if you must, but I need them to match. What you’re currently describing doesn’t seem to match what we expect, so we may not yet be on common ground.
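One way to anonymize the IPs while keeping them matching is to replace each distinct address with a stable placeholder, so the same node always gets the same label across every log you share. The following is a rough sketch only: the function name `anonymize_ips` and the `IP-n` placeholder format are made up for illustration, and the pattern handles IPv4 only (it will also match anything shaped like four dotted numbers).

```python
import re
from typing import Dict

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize_ips(text: str, mapping: Dict[str, str]) -> str:
    """Replace each distinct IPv4 address with a stable placeholder.

    The same `mapping` dict should be reused across all files so that
    an address gets the same placeholder everywhere.
    """
    def substitute(match: re.Match) -> str:
        ip = match.group(0)
        if ip not in mapping:
            # First time this address is seen: assign the next label.
            mapping[ip] = f"IP-{len(mapping) + 1}"
        return mapping[ip]

    return IPV4_RE.sub(substitute, text)
```

Passing the same `mapping` dictionary to every call keeps the labels consistent between, say, the MAAS logs and the LXD cluster member list.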