MaaS 2.5 beta4 - Fails to deploy as KVM Host


#1

Hey guys,

I’ve been using MaaS 2.5 beta 4 for a few days, and everything seems to be working fine. I can deploy Ubuntu 18.04 bare-metal machines normally: PXE boot is okay, bonded interfaces work, etc.

I can also “Release / Deploy” the Machines over and over again.

However, if I just select the “Register as KVM Host” option, the deployment fails and I don’t know why.

Any idea about how to fix this? I wish to start using this feature ASAP!

Can I just deploy the machine and “add as a Pod” later on? I don’t want to lose any network functionality by adding the KVM host manually after the deployment.

Thanks!
Thiago


#2

Thanks for the feedback on the latest beta; I saw your comment on bug #1800573. That bug is more general, but we are also looking at fixing a more specific issue which likely has the same root cause - bug #1801420.

If you could post any relevant logging on the more specific bug, that could be helpful. For example, the rsyslog output from /var/log/maas/rsyslog/<host> on the controller, and /var/log/cloud-init*.log from the installed system would be helpful. I would also check the event log for the machine; I’ve seen a few cases where MAAS reports in the event log that it cannot log into the KVM host after deployment.
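A minimal sketch, in Python, of gathering the files mentioned above before attaching them to the bug. The two paths come from this post; the directory layout below the per-host rsyslog directory is not assumed, so the helper just walks whatever is there. `collect_log_paths` and the hostname in the example are hypothetical, not part of MAAS.

```python
import glob
import os


def collect_log_paths(hostname, rsyslog_root="/var/log/maas/rsyslog",
                      cloud_init_dir="/var/log"):
    """Return a sorted list of log files worth attaching to the bug report.

    rsyslog_root is on the MAAS controller and cloud_init_dir is on the
    deployed machine, so in practice this runs once on each system.
    """
    paths = []
    host_dir = os.path.join(rsyslog_root, hostname)
    # Walk everything below the per-host directory; a missing directory
    # simply yields no files.
    for dirpath, _dirnames, filenames in os.walk(host_dir):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Same pattern as in the post: /var/log/cloud-init*.log
    paths.extend(glob.glob(os.path.join(cloud_init_dir, "cloud-init*.log")))
    return sorted(paths)


if __name__ == "__main__":
    for path in collect_log_paths("my-kvm-host"):  # hypothetical hostname
        print(path)
```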

That said, if the new deploy-time configuration feature isn’t working properly and you still want to test KVM pods in general, there should be no problem with simply deploying the machine and using basically the same procedure we use to set up the KVM pod. There is only one caveat: when you use the checkbox to deploy KVM, we use a feature that is only exposed via the API when you deploy the machine: bridge_all=true. This causes a bridge to be automatically created for every interface we find on the machine. The advantage of doing this is that you can reach VMs from the host itself. In the absence of the ability to attach to a bridge, a macvtap type connection will be attempted, which should allow the VM to be attached directly to a physical interface.

The new networking features require the MAAS database to be up-to-date, so that when you ask for an interface on a specific network, we know which interface to attach to. So if you want to deploy the machine manually and then configure the pod, you just need to be careful to not change anything about the network configuration, such as interface names, addition of bridges, etc.


#3

Ok, cool! I’ll post the logs on Launchpad then!

I just reinstalled MaaS 2.5 from scratch but, this time, using 2 VMs, one for “apt install maas-region-controller” and another one for “apt install maas-rack-controller”.

Machine deployment is still working!

I’ll try the “Register as KVM Host” option again.

BTW, I’m not making any changes to this “deployed server”, just: Release, select “KVM Host”, Deploy. The physical server has 2 NICs: NIC1 for PXE, NIC2 untouched / disconnected.


#4

So, the KVM host deployment failed again.

However, I can’t ssh into the machine from the rack controller, I can’t even ping it!


2018-11-16 14:41:29 provisioningserver.rackdservices.tftp: [info] pxelinux.cfg/01-0c-c4-7a-c6-5e-82 requested by 192.168.6.255

The 192.168.6.255 is the new machine… I’ll collect the logs and post soon!

So, I guess that something went wrong while MaaS tried to reconfigure the network interfaces, and the machine dropped off the network.
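For reference, a small Python sketch that pulls the MAC and requesting IP out of a log line shaped like the one quoted above. The leading `01-` in the PXELINUX config file name is the hardware type (Ethernet), followed by the MAC address. The regular expression only targets this one line shape, and both function names are hypothetical. The second helper assumes the /22 prefix and the 192.168.4.1 gateway that show up in the Netplan output later in this thread; under that assumption an address ending in .255 is an ordinary host address, not the broadcast address.

```python
import ipaddress
import re

# Matches rackd TFTP log lines of the shape quoted in this post.
PXE_LINE = re.compile(
    r"pxelinux\.cfg/(?P<hwtype>[0-9a-f]{2})-"
    r"(?P<mac>(?:[0-9a-f]{2}-){5}[0-9a-f]{2})"
    r" requested by (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def parse_pxe_request(line):
    """Return {'mac': ..., 'ip': ...} for a matching line, else None."""
    match = PXE_LINE.search(line)
    if match is None:
        return None
    return {
        "mac": match.group("mac").replace("-", ":"),
        "ip": match.group("ip"),
    }


def is_usable_host(ip, gateway_with_prefix):
    """True if ip is inside the gateway's network and is neither the
    network address nor the broadcast address."""
    network = ipaddress.ip_interface(gateway_with_prefix).network
    addr = ipaddress.ip_address(ip)
    return addr in network and addr not in (
        network.network_address,
        network.broadcast_address,
    )


if __name__ == "__main__":
    line = ("2018-11-16 14:41:29 provisioningserver.rackdservices.tftp: "
            "[info] pxelinux.cfg/01-0c-c4-7a-c6-5e-82 requested by 192.168.6.255")
    print(parse_pxe_request(line))
    # Assumed network: the /22 containing the gateway 192.168.4.1.
    print(is_usable_host("192.168.6.255", "192.168.4.1/22"))
```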


#5

I just found the problem!

https://paste.ubuntu.com/p/kWQd6Jwvrz/ - line 22

The generated Netplan yaml file is broken, no IPs!


bridges:
    br-eno1:
        addresses:
        - None/22
        gateway4: 192.168.4.1
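A quick way to catch this kind of breakage is to check every entry under `addresses` with Python's standard `ipaddress` module. This is only a sketch: it works on the dict a YAML parser would produce for the `network:` section, the function name is hypothetical, and it is not how MAAS or cloud-init validate the file.

```python
import ipaddress


def find_bad_addresses(netplan_network):
    """Return (section, interface, address) tuples for addresses that do
    not parse as an IPv4/IPv6 address with a prefix length."""
    problems = []
    for section in ("ethernets", "bonds", "bridges", "vlans"):
        for ifname, cfg in (netplan_network.get(section) or {}).items():
            for addr in (cfg or {}).get("addresses") or []:
                try:
                    ipaddress.ip_interface(str(addr))
                except ValueError:
                    problems.append((section, ifname, addr))
    return problems


# The fragment quoted above, written as the dict a YAML parser would give.
broken = {
    "bridges": {
        "br-eno1": {
            "addresses": ["None/22"],
            "gateway4": "192.168.4.1",
        }
    }
}

if __name__ == "__main__":
    print(find_bad_addresses(broken))
    # → [('bridges', 'br-eno1', 'None/22')]
```

The literal string `None` suggests a missing IP value was formatted straight into the address, which would fit an interface that had no concrete IP assigned when the config was generated.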


#6

Nice find; thanks for the help debugging this! I’m going to try to take a closer look at this issue later today.


#7

By the way, if you have a chance, could you follow the steps here to gather more information, in order to help narrow down the root cause?

To be specific, I’m most interested in the output of get-curtin-config, which you should be able to run without redeploying.


#8

Sure! I’ll do that.

Also, I just uploaded the cloud init logs to the LP#1800573 bug report.


#9

The cloud-init logs actually have enough information, so no need to get the curtin config. Thanks again!


#10

AHA!!! Machine as KVM host, deployed!!!

My intuition was right… I just changed the “PXE Net IP” from “Auto assign” to “Static” and voilà!

:smiley:


#11

Looks like the “Deploy as KVM Host” option is very sensitive to the network settings.

I just tried to deploy on top of a bond that I know works, but the deployment failed.

However, I can SSH into the “failed KVM host”! No need to enter Rescue mode, at least.


#12

This is weird. I just logged into the “failed deployment” and everything is there: the LACP bond is up (br-bond0 created by MaaS), virsh is installed and “virsh list” works.

But, somehow, it has a failed status in the MaaS web UI.

I guess I’ll just try to manually install KVM on a regularly deployed machine and add it as a Pod; maybe that will work for me! :slight_smile:


#13

Can you check the event log for the machine, and /var/log/maas/regiond.log to see if there are any errors related to KVM pod deployment?

Are all the IP addresses MAAS knows about on the machine reachable from MAAS itself?

On the deployed machine, can you try passwd virsh, set it to a known value, then add the KVM pod manually with a URL in the format qemu+ssh://virsh@<ip>/system, and let us know if it works?
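A minimal Python sketch of building that URL and the matching `virsh` check, assuming an IPv4 address and the `virsh` user from the step above. The helper names and the example IP are hypothetical. Running the resulting command from the MAAS controller would be one way to confirm libvirt is reachable before adding the pod.

```python
import ipaddress


def pod_url(host_ip, user="virsh"):
    """Build a pod URL in the format qemu+ssh://<user>@<ip>/system."""
    # Raises ValueError on anything that is not a valid IPv4 address.
    ipaddress.IPv4Address(host_ip)
    return f"qemu+ssh://{user}@{host_ip}/system"


def virsh_check_command(url):
    """Argument list for listing all domains through that connection URL."""
    return ["virsh", "-c", url, "list", "--all"]


if __name__ == "__main__":
    url = pod_url("192.168.4.10")  # hypothetical host IP
    print(url)
    print(" ".join(virsh_check_command(url)))
```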


#14

So, here is what I’m trying to do (worth mentioning that KVM Host deployments work without a bond and with a static IP on the PXE network).

First try with manually configured BOND via MaaS Interfaces page, worked:

Now, with the machine Released, I tried deploying as KVM Host on top of the same bond that just worked (above), but it failed later on:

It failed! But it looks like the machine was deployed correctly:

Weird…

BTW, this one didn’t take ~40 to turn into “failed deployment”; it was quick!


#15

YAY!!! I can now Compose VMs from MaaS and deploy them on MaaS’ KVM Host, on top of a br-bond0 channel! Exactly what I want!

I just manually deployed Ubuntu, installed “libvirt-daemon-system qemu-kvm”, configured “br-bond0” in libvirt, and it’s working!

Why OpenStack?! LOL


#16

Hi Thiago,

Just to confirm, are you saying you are manually creating the bridge on the machine before attempting to deploy a KVM host?

If so, MAAS will automatically deploy a KVM host and will automatically bridge the interfaces, so you shouldn’t really need to bridge them yourself before deployment.


#17

I think we’ve narrowed down the cause of this issue; it’s captured in bug #1804557. The fix didn’t make it in time for RC1, but it is being tested for inclusion in a future release as we speak! If you’d like, you can take a look at the merge proposal and make a corresponding change to node.py (you’ll find it under /usr/lib/python3/dist-packages/maasserver/models in a .deb-installed version of MAAS).

Thank you for testing MAAS and reporting this.