Commissioning VM in LXD host fails

These look good.

Then I guess it gets stuck somewhere and fails to boot.

That launches a container. You are missing the --vm flag to start a VM.

Did you check the bridge part from the doc?

The behaviour you observe looks similar to the one mentioned in the doc:

The bridge LXD creates is isolated and not managed by MAAS. If this bridge is used, you would be able to add the LXD VM host and compose virtual machines, but commissioning, deploying, and any other MAAS action which uses the network will fail – so yes is the correct answer here.

I am confused.

So should I disable creation of lxdbr0?

It looks like lxdbr0 is doing NAT, which will not help MAAS perform certain tasks. If I need VMs that are directly connected to my MAAS-controlled DHCP server, which networking option should I use?

If I try to create the VM manually in maas-project, it fails, saying it can't download the image because of a name resolution issue. But in the default project it just works.

root@op2:/home/ubuntu# lxc launch ubuntu:22.04 vmtest2 --vm --project maas-project 
Creating vmtest2
Error: Failed instance creation: Get "https://cloud-images.ubuntu.com/releases/server/releases/jammy/release-20231010/ubuntu-22.04-server-cloudimg-amd64-lxd.tar.xz": lookup cloud-images.ubuntu.com: Temporary failure in name resolution
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# lxc launch ubuntu:22.04 vmtest2 --vm 
Creating vmtest2
Starting vmtest2

Can you share lxc network show lxdbr0?

# lxc network show lxdbr0
config:
  ipv4.address: 10.156.54.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:7a06:141d:df22::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/vmtest
- /1.0/instances/vmtest1
- /1.0/instances/vmtest2
- /1.0/profiles/default
managed: true
status: Created
locations:
- none
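To make the point from the quoted doc concrete, here is a minimal Python sketch (standard library only) that reads the text printed by `lxc network show` and reports whether it describes an LXD-managed bridge with NAT enabled. The function names `parse_network_show` and `looks_isolated` are hypothetical, and the parser only handles the flat `key: value` layout shown above, not general YAML. A positive result is a hint that VMs on this bridge sit behind NAT rather than directly on a MAAS-managed network, not proof.

```python
SAMPLE = """\
config:
  ipv4.address: 10.156.54.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:7a06:141d:df22::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
managed: true
status: Created
"""


def parse_network_show(text):
    """Split `lxc network show` output into top-level fields and the config map.

    Only the simple layout seen in this thread is supported.
    """
    top, config = {}, {}
    in_config = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("-", "#")):
            continue  # skip blank lines, list items, and prompt/comment lines
        key, _, value = stripped.partition(":")
        value = value.strip().strip('"')
        if line.startswith(" "):
            if in_config:
                config[key] = value
        else:
            in_config = key == "config"
            top[key] = value
    return top, config


def looks_isolated(text):
    """True for a managed bridge that has IPv4 or IPv6 NAT switched on."""
    top, config = parse_network_show(text)
    nat = config.get("ipv4.nat") == "true" or config.get("ipv6.nat") == "true"
    return top.get("type") == "bridge" and top.get("managed") == "true" and nat


if __name__ == "__main__":
    print(looks_isolated(SAMPLE))
```

In practice the text could come from running `lxc network show lxdbr0` and piping the output into the function.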

@codingfreak bridge config looks good.

  1. Do you have DHCP enabled in MAAS? Can you take a screenshot of the subnets tab?
  2. You can check the VM that MAAS failed to boot by trying to start it manually with --console. I suspect it just fails to PXE boot.

Hi @troyanov

Please find the requested information below

 Do you have DHCP enabled in MAAS? Can you take a screenshot of the subnets tab?

[Three screenshots were attached here in the original post.]

You can check the VM that MAAS failed to boot by trying to start it manually with `--console`. I suspect it just fails to PXE boot.

Weirdly, that VM is still in the RUNNING state, as shown below. I tried stopping the VM manually from the CLI, but it was of no use because the command hangs.

# lxc ls --project maas-project 
+------+---------+------+------+-----------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+------+------+-----------------+-----------+
| vm01 | RUNNING |      |      | VIRTUAL-MACHINE | 0         |
+------+---------+------+------+-----------------+-----------+
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# lxc stop vm01 --project maas-project 

LXD debug logs for the same stop request:

DEBUG  [2023-10-13T21:00:27Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2023-10-13T21:00:27Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/events?project=maas-project" username=root
DEBUG  [2023-10-13T21:00:27Z] Event listener server handler started         id=8f587730-7d6b-4c4f-9a3a-b92255501c20 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG  [2023-10-13T21:00:27Z] Handling API request                          ip=@ method=PUT protocol=unix url="/1.0/instances/vm01/state?project=maas-project" username=root
DEBUG  [2023-10-13T21:00:27Z] New operation                                 class=task description="Stopping instance" operation=654b3e8b-d6ca-4c54-b86f-8cead6316d60 project=maas-project
DEBUG  [2023-10-13T21:00:27Z] Instance operation lock reused                action=stop instance=vm01 project=maas-project reusable=true
DEBUG  [2023-10-13T21:00:27Z] Shutdown started                              instance=vm01 instanceType=virtual-machine project=maas-project timeout=-1s
DEBUG  [2023-10-13T21:00:27Z] Started operation                             class=task description="Stopping instance" operation=654b3e8b-d6ca-4c54-b86f-8cead6316d60 project=maas-project
DEBUG  [2023-10-13T21:00:27Z] Shutdown request sent to instance             instance=vm01 instanceType=virtual-machine project=maas-project
DEBUG  [2023-10-13T21:00:27Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/operations/654b3e8b-d6ca-4c54-b86f-8cead6316d60?project=maas-project" username=root
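One detail in these lines is `timeout=-1s` on the "Shutdown started" entry. As far as I know, `lxc stop` defaults to a timeout of -1, meaning it waits for a clean shutdown with no limit, so a VM whose guest never reacts to the shutdown request would leave the command hanging (`lxc stop --force` is the usual way out). The sketch below, with hypothetical helper names, pulls the `key=value` fields out of one log line and flags that case.

```python
import re

# key=value pairs; values are either "quoted strings" or runs of non-spaces
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

SAMPLE_LINE = (
    "DEBUG  [2023-10-13T21:00:27Z] Shutdown started"
    "                              instance=vm01 instanceType=virtual-machine"
    " project=maas-project timeout=-1s"
)


def parse_log_line(line):
    """Return (message, fields) for one LXD debug log line like those above."""
    body = line.split("]", 1)[-1]  # drop the "DEBUG  [timestamp]" prefix
    first = FIELD_RE.search(body)
    message = body[: first.start()].strip() if first else body.strip()
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(body)}
    return message, fields


def waits_forever(line):
    """True for a 'Shutdown started' line that carries a negative timeout."""
    message, fields = parse_log_line(line)
    return message == "Shutdown started" and fields.get("timeout", "").startswith("-")


if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
    print(waits_forever(SAMPLE_LINE))
```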

# lxc info --show-log vm01 --project maas-project 
Name: vm01
Status: RUNNING
Type: virtual-machine
Architecture: x86_64
PID: 5508
Created: 2023/10/13 19:52 UTC
Last Used: 2023/10/13 19:52 UTC

Resources:
  Processes: -1
  Disk usage:
    root: 12.00KiB

Log:

# lxc exec vm01 --project maas-project -- bash 
Error: LXD VM agent isn't currently running

Some good news …

On server2 (ubuntu-server-22.04), I removed the previous VM that had failed during creation and tried creating a new one from MAAS. This time it just worked. I am not sure what is different from the previous attempt.

When I try to launch a VM from the CLI, it still fails in maas-project.

root@op2:/home/ubuntu# lxc ls --project maas-project 
+------+---------+----------------------+------+-----------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+----------------------+------+-----------------+-----------+
| vm01 | RUNNING | 10.10.10.18 (enp5s0) |      | VIRTUAL-MACHINE | 0         |
+------+---------+----------------------+------+-----------------+-----------+
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# lxc profile show default --project maas-project 
config: {}
description: Default LXD profile for project maas-project
devices: {}
name: default
used_by: []
root@op2:/home/ubuntu# lxc launch --vm ubuntu:22.04 vm02 --project maas-project 
Creating vm02
Error: Failed instance creation: Failed creating instance record: Failed initialising instance: Failed getting root disk: No root device could be found
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# lxc ls --project maas-project 
+------+---------+----------------------+------+-----------------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+----------------------+------+-----------------+-----------+
| vm01 | RUNNING | 10.10.10.18 (enp5s0) |      | VIRTUAL-MACHINE | 0         |
+------+---------+----------------------+------+-----------------+-----------+

Good to know that there is some progress! Wondering what was the difference…

I think that's fine. When MAAS calls the LXD API, it also specifies the network device, storage profile, and a few other settings. So I was wrong to assume that you can launch a new VM in the MAAS project manually. Instead, we should start the existing VM that failed (for troubleshooting).
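The "No root device could be found" error is consistent with the profile output earlier in the thread: the default profile of maas-project has `devices: {}`, so an instance launched with only that profile has no root disk. A minimal Python sketch of that check, assuming the usual LXD convention that a root disk is a device of type `disk` with path `/` (the function name and the second example map are hypothetical and for illustration only):

```python
def has_root_disk(devices):
    """True if a devices map contains a disk device mounted at "/"."""
    return any(
        dev.get("type") == "disk" and dev.get("path") == "/"
        for dev in devices.values()
    )


# `devices: {}` as printed by `lxc profile show default --project maas-project`
maas_project_default = {}

# Illustrative values only: the pool and network names are assumptions.
typical_default = {
    "root": {"type": "disk", "path": "/", "pool": "default"},
    "eth0": {"type": "nic", "network": "lxdbr0", "name": "eth0"},
}

if __name__ == "__main__":
    print(has_root_disk(maas_project_default))
    print(has_root_disk(typical_default))
```

This also fits the explanation that MAAS supplies the devices itself in the API request instead of relying on the project's default profile.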

So everything works now? Or is there still something failing?

@troyanov

Well, it still does not work on server1 (running ubuntu-core-22), which I need to try again from scratch.

I do have one question, though. As shown below, lxdbr0 appears to have DHCP disabled while NAT is enabled. Yet when I create a new VM in the default project, it is assigned an IP address from the lxdbr0 subnet. Is this expected, or is DHCP enabled in the background?

root@op2:/home/ubuntu# lxc network show lxdbr0
config:
  ipv4.address: 10.156.54.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:7a06:141d:df22::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/tesme
- /1.0/profiles/default
managed: true
status: Created
locations:
- none
root@op2:/home/ubuntu# lxc ls 
+-------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
| NAME  |  STATE  |         IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
+-------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
| tesme | RUNNING | 10.156.54.59 (enp5s0) | fd42:7a06:141d:df22:216:3eff:feeb:d276 (enp5s0) | VIRTUAL-MACHINE | 0         |
+-------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
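On the DHCP question: `lxc network show` prints only the keys that were set explicitly, so a missing `ipv4.dhcp` entry does not by itself mean DHCP is off. To my knowledge, LXD enables DHCP by default on a managed bridge that has an IPv4 address. The sketch below encodes that assumption in a hypothetical helper and also checks that the VM's address falls inside the bridge subnet, using the values from the output above.

```python
import ipaddress


def effective_ipv4_dhcp(config):
    """Effective ipv4.dhcp for a managed bridge.

    Assumption: when ipv4.dhcp is unset and the bridge has an IPv4 address,
    LXD treats DHCP as enabled.
    """
    if "ipv4.dhcp" in config:
        return config["ipv4.dhcp"] == "true"
    return config.get("ipv4.address", "none") not in ("", "none")


def in_bridge_subnet(vm_ip, bridge_cidr):
    """True if vm_ip belongs to the network that bridge_cidr is part of."""
    network = ipaddress.ip_interface(bridge_cidr).network
    return ipaddress.ip_address(vm_ip) in network


lxdbr0_config = {"ipv4.address": "10.156.54.1/24", "ipv4.nat": "true"}

if __name__ == "__main__":
    print(effective_ipv4_dhcp(lxdbr0_config))
    print(in_bridge_subnet("10.156.54.59", lxdbr0_config["ipv4.address"]))
```

Under that assumption, the address 10.156.54.59 on `tesme` is what a lease from the bridge's own DHCP service would look like.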

@troyanov

I can see that this issue is still present in ubuntu-core-22.

I deployed a new machine with Ubuntu-core-22 (let's call it SERVER3) using MAAS. Then I installed LXD manually and added the LXD host in MAAS.

codingfreak@mlin01:~$ snap list
Name       Version         Rev    Tracking       Publisher   Notes
core22     20230801        864    latest/stable  canonical✓  base
pc         22-0.3          146    22/stable      canonical✓  gadget
pc-kernel  5.15.0-86.96.1  1433   22/stable      canonical✓  kernel
snapd      2.60.4          20290  latest/stable  canonical✓  snapd
codingfreak@mlin01:~$ 
codingfreak@mlin01:~$ snap install lxd
lxd 5.18-da72b8b from Canonical✓ installed
codingfreak@mlin01:~$ 
codingfreak@mlin01:~$ 
codingfreak@mlin01:~$ snap list
Name       Version         Rev    Tracking       Publisher   Notes
core22     20230801        864    latest/stable  canonical✓  base
lxd        5.18-da72b8b    25945  latest/stable  canonical✓  -
pc         22-0.3          146    22/stable      canonical✓  gadget
pc-kernel  5.15.0-86.96.1  1433   22/stable      canonical✓  kernel
snapd      2.60.4          20290  latest/stable  canonical✓  snapd
codingfreak@mlin01:~$ 
codingfreak@mlin01:~$ sudo -s
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: lxd-storage
Name of the storage backend to use (dir, lvm, zfs, btrfs, ceph) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=30GiB]: 500GiB
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 
Port to bind LXD to [default=8443]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
root@mlin01:/home/codingfreak# 

Now if I try to create a new VM from the MAAS UI, it fails as shown below.

This issue is present in ubuntu-core-22

I don’t remember by heart (turned off my lab), but I have a feeling that this is expected.

As we already discussed, I’d recommend checking the following:

  1. Can you see the machine in the LXD CLI under the MAAS project? If yes, simply try lxc start $machine --console

  2. If the machine fails to start, you can also check for errors in the LXD logs. IIRC they are under /var/snap/lxd/common/lxd/logs/

  3. Check if there are any errors in the MAAS logs (especially the region controller)
    https://maas.io/docs/delving-into-maas-logging-practices#heading--Using-the-logs-directly

  4. The reason why MAAS fails to create/start the VM might be hidden in the LXD driver.
    You can try adding some debug print statements in the LXD driver responsible for VM creation.
    https://git.launchpad.net/maas/tree/src/provisioningserver/drivers/pod/lxd.py?h=3.3#n422

@troyanov

Please find the replies to your first 2 questions

root@mlin01:/home/codingfreak# lxc ls --project maas-project 
+-----------+---------+------+------+-----------------+-----------+
|   NAME    |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+-----------+---------+------+------+-----------------+-----------+
| mlin01vm1 | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+------+------+-----------------+-----------+
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# lxc start mlin01vm1 --console --project maas-project 
Error: Failed to start device "eth0": All virtual functions on parent device "enp113s0f0" are already in use
Try `lxc info --show-log mlin01vm1` for more info
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# lxc info --show-log mlin01vm1 --project maas-project 
Name: mlin01vm1
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Created: 2023/10/13 22:44 UTC
Error: open /var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1/qemu.log: no such file or directory
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# 
root@mlin01:/home/codingfreak# cat /var/snap/lxd/common/lxd/logs/
dnsmasq.lxdbr0.log      lxd.log                 maas-project_mlin01vm1/ 
root@mlin01:/home/codingfreak# cd /var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1/
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# 
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# 
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# ls
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# 
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# 
root@mlin01:/var/snap/lxd/common/lxd/logs/maas-project_mlin01vm1# cd ..
root@mlin01:/var/snap/lxd/common/lxd/logs# ls
dnsmasq.lxdbr0.log  lxd.log  maas-project_mlin01vm1
root@mlin01:/var/snap/lxd/common/lxd/logs# 
root@mlin01:/var/snap/lxd/common/lxd/logs# cat lxd.log | more
time="2023-10-13T22:42:30Z" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
root@mlin01:/var/snap/lxd/common/lxd/logs# 
root@mlin01:/var/snap/lxd/common/lxd/logs# 
root@mlin01:/var/snap/lxd/common/lxd/logs# cat lxd.log 
time="2023-10-13T22:42:30Z" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"

It seems that LXD fails to create the machine with the parameters that MAAS is using for the VM configuration.

What is the difference between the machines (the one created by MAAS and the one you launch manually)? Compare the output of lxc config show $machine for both.
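A small Python sketch of that comparison, assuming the two device maps have already been loaded into dictionaries (for example from `lxc config show $machine --expanded`). The helper name is hypothetical, and the example values are illustrative guesses based on the error message above, not real output from this host.

```python
def diff_maps(maas_vm, manual_vm):
    """Return {key: (maas_value, manual_value)} for every key that differs.

    A value of None means the key is absent on that side.
    """
    missing = object()
    diff = {}
    for key in sorted(set(maas_vm) | set(manual_vm)):
        a = maas_vm.get(key, missing)
        b = manual_vm.get(key, missing)
        if a != b:
            diff[key] = (None if a is missing else a, None if b is missing else b)
    return diff


# Illustrative only: inferred from the "virtual functions" error, not copied
# from actual `lxc config show` output.
maas_devices = {
    "eth0": {"type": "nic", "nictype": "sriov", "parent": "enp113s0f0"},
    "root": {"type": "disk", "path": "/", "pool": "lxd-storage"},
}
manual_devices = {
    "eth0": {"type": "nic", "network": "lxdbr0"},
    "root": {"type": "disk", "path": "/", "pool": "lxd-storage"},
}

if __name__ == "__main__":
    for name, (maas_side, manual_side) in diff_maps(maas_devices, manual_devices).items():
        print(name, maas_side, manual_side)
```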

Maybe this post will help?
https://discuss.linuxcontainers.org/t/trying-to-use-sr-iov-with-mellanox-infiniband-all-virtual-functions-on-device-sriov-are-already-in-use/2338/8
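The "All virtual functions on parent device ... are already in use" error suggests the VM's eth0 is configured as an SR-IOV NIC on enp113s0f0. On Linux, the kernel exposes the configured and maximum number of virtual functions in `sriov_numvfs` and `sriov_totalvfs` under `/sys/class/net/<iface>/device/`. The sketch below (hypothetical helper name, standard library only) reads both values. If the configured count is 0, there are no VFs to hand out, which I would expect to produce this kind of error, though I have not verified that on this setup.

```python
from pathlib import Path


def sriov_vf_counts(iface, sysfs_root="/sys"):
    """Return (configured_vfs, max_vfs) for a NIC.

    Returns None if the interface has no readable SR-IOV files.
    sysfs_root is a parameter only so the function can be tried on a fake tree.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    try:
        numvfs = int((dev / "sriov_numvfs").read_text().strip())
        totalvfs = int((dev / "sriov_totalvfs").read_text().strip())
    except (OSError, ValueError):
        return None
    return numvfs, totalvfs


if __name__ == "__main__":
    # Interface name taken from the error message in this thread.
    print(sriov_vf_counts("enp113s0f0"))
```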

The last resort would be to check what is being applied here: https://git.launchpad.net/maas/tree/src/provisioningserver/drivers/pod/lxd.py?h=3.3#n422

Hi @troyanov

On an ubuntu-core-22 machine, I cannot start a VM in maas-project, either manually or via the MAAS UI.

# lxc launch ubuntu:22.04 --vm mm02 --project maas-project 
Creating mm02
Error: Failed instance creation: Failed creating instance record: Failed initialising instance: Failed getting root disk: No root device could be found

NOTE: maas-project is the lxd project created by MAAS UI

But I can start a VM successfully in the default project.

I am still working through the other links you shared for debugging.