Commissioning VM in LXD host fails

Hi All,

I have commissioned a physical server running ubuntu-core using MAAS.

Then I configured LXD on the physical server and mapped it to MAAS, as shown below:

image

Then I created a VM from MAAS on the physical server:

image

Now, if I try to commission the VM, it fails, saying that some of the test cases failed:

image

The reason for the failure is that the VM cannot be started, as shown in the logs below:

image

The power type configuration in MAAS is shown below. MAAS sets it automatically when the VM is created on the LXD host:

image

How can I troubleshoot why MAAS is not able to power up the VM on the LXD host?

UPDATE:

On the physical server, I checked the project created by MAAS, shown below. I can see that NETWORKS for maas-project is NO. Is this the culprit?

# lxc project ls
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
|       NAME        | IMAGES | PROFILES | STORAGE VOLUMES | STORAGE BUCKETS | NETWORKS | NETWORK ZONES |       DESCRIPTION       | USED BY |
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
| default (current) | YES    | YES      | YES             | YES             | YES      | YES           | Default LXD project     | 3       |
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
| maas-project      | YES    | YES      | YES             | YES             | NO       | NO            | Project managed by MAAS | 2       |
+-------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+

Hi @codingfreak

I just gave it a try on my dev environment and everything works.
What version of MAAS are you using?

IIUC, the machine got created and you can see it under lxc ls --project maas-project?
MAAS should talk to the LXD API the same way as it does for machine creation.

If the machine is there, then I don’t have any good ideas, except maybe to try starting it manually?
lxc start {machine name}

Also is there anything interesting in the rackd or regiond logs?

It’s okay to have “NO” there.

LXD features.networks: Whether to use a separate set of networks for the project

Hi @troyanov

What version of MAAS are you using?

I am using MAAS 3.3.4

Yes I see the machine under maas-project as shown below

# lxc ls --project maas-project
+--------+---------+------+------+-----------------+-----------+
|  NAME  |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+--------+---------+------+------+-----------------+-----------+
| test01 | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+--------+---------+------+------+-----------------+-----------+
| testVM | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+--------+---------+------+------+-----------------+-----------+

Well, if I try to start the machine manually, I see the following errors:

# lxc start testVM --project maas-project 
Error: Failed to start device "eth0": Failed getting IOMMU group for VF device "0000:71:02.0": lstat /sys/bus/pci/devices/0000:71:02.0/iommu_group: no such file or directory
Try `lxc info --show-log testVM` for more info
#
# lxc info --show-log testVM
Error: Instance not found

Also, the default profile for maas-project, shown below, has no network configuration:

# lxc profile show default  --project maas-project
config: {}
description: Default LXD profile for project maas-project
devices: {}
name: default
used_by: []

Oh, it seems that there might be something wrong with LXD itself.
I guess if you create a new machine manually, it will fail with the same issue?

@codingfreak

Sorry I forgot to mention which command to use: lxc launch ubuntu:jammy dummy --vm --storage default --console

Do you mean under the project “maas-project”, which was created by MAAS? Yes, it failed, as shown below:

# sudo lxc launch ubuntu:jammy dummy --vm --project maas-project 
Creating dummy
Error: Failed instance creation: Failed creating instance record: Failed initialising instance: Failed getting root disk: No root device could be found

But it worked fine under the project “default”, as shown below:

# sudo lxc launch ubuntu:jammy dummy --vm --project default 
Creating dummy
Starting dummy
# lxc ls --project default 
+-------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| NAME  |  STATE  |          IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
+-------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| dummy | RUNNING | 10.231.47.151 (enp5s0) | fd42:a037:1fd4:913b:216:3eff:fee0:5b8f (enp5s0) | VIRTUAL-MACHINE | 0         |
+-------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| mm01  | RUNNING | 10.231.47.29 (enp5s0)  | fd42:a037:1fd4:913b:216:3eff:fec2:c06f (enp5s0) | VIRTUAL-MACHINE | 0         |
+-------+---------+------------------------+-------------------------------------------------+-----------------+-----------+

You should also add --project maas-project

@troyanov

# lxc info --show-log testVM --project maas-project 
Name: testVM
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Created: 2023/10/10 12:19 PDT
Error: open /var/snap/lxd/common/lxd/logs/maas-project_testVM/qemu.log: no such file or directory

@codingfreak from what I see, MAAS is able to talk to the LXD API; however, there might be something wrong with the LXD configuration itself.

Can you please collect some logs and output of what happens when you try to create a VM via lxc?

I mean, a command like lxc launch ubuntu:jammy dummy --vm --project maas-project should work, and the error you posted earlier makes me wonder whether LXD was configured correctly.

No root device could be found

It might be that the storage pool was not initialized properly.
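
A quick way to test this theory is to check whether the project's default profile defines a root disk at all, since "No root device could be found" is what LXD reports when neither the instance nor its profiles supply one. The sketch below is only a text heuristic: `profile_has_root_disk` is a hypothetical helper name, and it simply scans `lxc profile show` YAML for a device with `path: /`.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or MAAS): reads `lxc profile show`
# YAML on stdin and succeeds if some device under "devices:" has "path: /".
# This is a text heuristic, not a real YAML parser.
profile_has_root_disk() {
    awk '
        /^devices:/           { in_devices = 1; next }   # entering the devices section
        in_devices && /^[^ ]/ { in_devices = 0 }         # next top-level key ends it
        in_devices && /^    path: \/[ \t]*$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example:
#   lxc profile show default --project maas-project | profile_has_root_disk \
#       || echo "default profile has no root disk device"
```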

@troyanov

As I mentioned in my previous replies, I suspect that the default profile of maas-project is incomplete and that this might be causing the error. As the logs below show, other projects such as default and client1-iso-project work fine.

If maas-project is the LXD project that the MAAS UI creates automatically on the server, does the user need to modify it explicitly to make it work?

# lxc project ls 
+-------------------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
|             NAME              | IMAGES | PROFILES | STORAGE VOLUMES | STORAGE BUCKETS | NETWORKS | NETWORK ZONES |       DESCRIPTION       | USED BY |
+-------------------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
| client1-iso-project (current) | YES    | YES      | YES             | YES             | NO       | NO            |                         | 9       |
+-------------------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
| default                       | YES    | YES      | YES             | YES             | YES      | YES           | Default LXD project     | 3       |
+-------------------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
| maas-project                  | YES    | YES      | YES             | YES             | NO       | NO            | Project managed by MAAS | 4       |
+-------------------------------+--------+----------+-----------------+-----------------+----------+---------------+-------------------------+---------+
# 
# lxc ls --project client1-iso-project
+---------------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
|     NAME      |  STATE  |         IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
+---------------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
| debian12      | RUNNING | 10.231.47.64 (eth0)   | fd42:a037:1fd4:913b:216:3eff:fe11:8897 (eth0)   | CONTAINER       | 0         |
+---------------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
| rocky9        | RUNNING | 10.231.47.23 (enp5s0) | fd42:a037:1fd4:913b:d51d:894:e79b:5a21 (enp5s0) | VIRTUAL-MACHINE | 0         |
+---------------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
| ubuntulobster | RUNNING | 10.231.47.81 (enp5s0) | fd42:a037:1fd4:913b:216:3eff:fea7:86bb (enp5s0) | VIRTUAL-MACHINE | 0         |
+---------------+---------+-----------------------+-------------------------------------------------+-----------------+-----------+
# 
# lxc ls --project default
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
root@mltr01:/home/codingfreak# 
root@mltr01:/home/codingfreak# lxc ls --project maas-project
+--------+---------+------+------+-----------------+-----------+
|  NAME  |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+--------+---------+------+------+-----------------+-----------+
| test01 | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+--------+---------+------+------+-----------------+-----------+
| testVM | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+--------+---------+------+------+-----------------+-----------+
# 
# lxc profile show default --project maas-project
config: {}
description: Default LXD profile for project maas-project
devices: {}
name: default
used_by: []
# 
# lxc profile show default --project client1-iso-project
config: {}
description: Default LXD profile for project client1-iso-project
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd-storage
    type: disk
name: default
used_by:
- /1.0/instances/ubuntulobster?project=client1-iso-project
- /1.0/instances/rocky9?project=client1-iso-project
- /1.0/instances/debian12?project=client1-iso-project
# 
# lxc profile show default --project default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd-storage
    type: disk
name: default
used_by: []

So I modified the default profile under maas-project accordingly, as shown below, and now I am able to launch new instances manually:

# lxc launch ubuntu:22.04 webserver --project maas-project 
Creating webserver
Starting webserver
#
# lxc ls --project maas-project 
+-----------+---------+---------------------+-----------------------------------------------+-----------+-----------+
|   NAME    |  STATE  |        IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+-----------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| webserver | RUNNING | 10.231.47.89 (eth0) | fd42:a037:1fd4:913b:216:3eff:fe8d:a93a (eth0) | CONTAINER | 0         |
+-----------+---------+---------------------+-----------------------------------------------+-----------+-----------+
#
# lxc profile show default --project maas-project 
config: {}
description: Default LXD profile for project maas-project
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lxd-storage
    type: disk
name: default
used_by:
- /1.0/instances/webserver?project=maas-project
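
The exact commands used to modify the profile are not shown in the post. One way to arrive at a profile like the one above is `lxc profile device add`, sketched here under the assumption that the pool is `lxd-storage` and the bridge is `lxdbr0`, as in the output. `add_default_devices` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper: adds a root disk and a bridged NIC to the default
# profile of an LXD project. Pool and network names are taken from the
# output above and are assumptions for any other host.
add_default_devices() {
    project="$1"
    pool="${2:-lxd-storage}"
    network="${3:-lxdbr0}"

    lxc profile device add default root disk path=/ pool="$pool" \
        --project "$project" &&
    lxc profile device add default eth0 nic name=eth0 network="$network" \
        --project "$project"
}

# Example: add_default_devices maas-project
```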

Now, if I try to create a new VM from MAAS, it fails: the VM is created with an empty MAC address and ends up in a broken state.

So did you use the “Install KVM” option during machine deployment, or did you install LXD manually and then add it as an existing KVM?

I tried both options on 3.3.4 earlier today and it worked well. However I was not using Ubuntu Core.

@troyanov

Well, I installed LXD manually on the physical server, which was deployed by MAAS with ubuntu-core-22. Then, in MAAS, I used the “Add LXD host” option to add it.

Did you take all the steps as mentioned in the docs?

I am out of ideas about what else could be wrong.
To me it feels like LXD init was executed after the project was created, so it is not picking up the correct values.

That might be because the default profile used for VM creation is missing network devices, and it could be related to the IOMMU group issue you mentioned earlier.

Hi @troyanov

As explained in my previous reply, I modified the default profile of maas-project to match the default profile in the default project. Now I am able to launch a VM manually in maas-project, but not from MAAS.

I tried deleting the LXD host from MAAS and adding it back, mapped to the same project, i.e. maas-project. If I now create a VM using MAAS, it still fails with an empty MAC address.

Is there a show command in LXD that can dump the configuration set during lxd init? This might help in figuring out what caused the issue.

Hi @codingfreak

To be fair, I don’t know if that’s possible. That’s why I am trying to understand exactly how you configured LXD, and whether you followed the documentation or did something else.
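
For what it's worth, LXD releases that support `lxd init --dump` can print the current server configuration (such as storage pools, networks and profiles) as preseed-style YAML, which gives something concrete to diff between a working host and a failing one. A minimal sketch, assuming that flag is available in the installed version; `dump_lxd_config` and the default file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: saves the output of `lxd init --dump` to a file.
# Assumes the installed LXD supports --dump and that the caller may talk to
# the LXD daemon (typically root or a member of the lxd group).
dump_lxd_config() {
    out="${1:-lxd-config.yaml}"
    lxd init --dump > "$out"
}

# Example, comparing two hosts:
#   dump_lxd_config host-a.yaml    # on the working host
#   dump_lxd_config host-b.yaml    # on the failing host
#   diff -u host-a.yaml host-b.yaml
```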

Some more ideas:

  1. Maybe something interesting pops up while running lxc monitor --pretty?
  2. What is being returned by lxc config show $broken_vm --project maas-project and lxc network show $your_bridge_network?
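
The two checks above can be gathered into files for posting back to the thread. A rough sketch: `collect_vm_diagnostics` and the output file names are hypothetical, and `lxc monitor --pretty` is left out because it streams until interrupted and is easier to run in a second terminal.

```shell
#!/bin/sh
# Hypothetical helper: collects instance and network configuration into a
# directory. Arguments: project, instance name, bridge network, output dir.
collect_vm_diagnostics() {
    project="$1"; vm="$2"; network="$3"; outdir="${4:-lxd-diag}"
    mkdir -p "$outdir" || return 1

    # Instance config, including volatile.* keys such as the MAC address.
    lxc config show "$vm" --project "$project" > "$outdir/instance.yaml" 2>&1
    # Bridge definition. Networks live in the default project here because
    # features.networks is off for the MAAS project.
    lxc network show "$network" > "$outdir/network.yaml" 2>&1
    # Instance log; this can fail if the VM never started, so keep going.
    lxc info --show-log "$vm" --project "$project" > "$outdir/info.txt" 2>&1 || true
}

# Example: collect_vm_diagnostics maas-project testVM lxdbr0
```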

That’s what I have in my env, just for comparison:

❯ lxc profile show default --project maas-kvm
config: {}
description: Default LXD profile for project maas-kvm
devices: {}
name: default
used_by: []

❯ lxc config show great-boar --project maas-kvm
architecture: x86_64
config:
  limits.cpu: "1"
  limits.memory: "2147483648"
  limits.memory.hugepages: "false"
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 0ef585c0-32d5-425f-8baf-ec9a21567f1a
  volatile.eth0.hwaddr: 00:16:3e:6f:f9:1f
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 4eedc51f-bc27-4b72-a0cb-c4e8e2ba57b1
  volatile.uuid.generation: 4eedc51f-bc27-4b72-a0cb-c4e8e2ba57b1
  volatile.vsock_id: "2067760905"
devices:
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: maas-net
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "8000000000"
    type: disk
ephemeral: false
profiles: []
stateful: false
description: ""

The reason why MAAS fails to create/start the VM might be hidden in the LXD driver.

  1. You can try adding some debug print statements in the LXD driver responsible for VM creation.
    https://git.launchpad.net/maas/tree/src/provisioningserver/drivers/pod/lxd.py?h=3.3#n422
  2. Follow the logic of how interfaces are picked (there is a certain order of preference).
    It might be that you don’t have a bridge (one should be created), and the IOMMU error looks like it comes from SR-IOV, which is the next preferred type.
        attach_preference = [
            InterfaceAttachType.BRIDGE,
            InterfaceAttachType.SRIOV,
            InterfaceAttachType.NETWORK,
            InterfaceAttachType.MACVLAN,
        ]
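
If the SR-IOV path is indeed what gets picked, the IOMMU error from earlier in the thread can be checked directly against sysfs, since that error is an lstat failure on the device's iommu_group entry. A minimal sketch: `has_iommu_group` is a hypothetical name, and the second argument exists only so the check can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PCI device has an IOMMU group entry.
# $1 is a PCI address such as 0000:71:02.0, $2 an optional sysfs root.
has_iommu_group() {
    pci_addr="$1"
    sysfs_root="${2:-/sys}"
    [ -e "$sysfs_root/bus/pci/devices/$pci_addr/iommu_group" ]
}

# Example:
#   has_iommu_group 0000:71:02.0 || echo "no IOMMU group; is the IOMMU enabled?"
#   lxc network list    # shows whether a bridge is available instead
```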

Hi @troyanov

I tried this on a newly deployed server running ubuntu-server 22.04, deployed using MAAS.
I ran lxd init with the inputs below:

# lxd init 
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: lxd-storage
Name of the storage backend to use (cephobject, dir, lvm, zfs, btrfs, ceph) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=30GiB]: 500GiB
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 
Port to bind LXD to [default=8443]: 
Trust password for new clients: 
Again: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

Then I added it as an LXD host in MAAS, creating maas-project from the MAAS UI.
As shown below, the default profile in maas-project is empty:

# lxc project ls
+-------------------+--------+----------+-----------------+----------+-------------------------+---------+
|       NAME        | IMAGES | PROFILES | STORAGE VOLUMES | NETWORKS |       DESCRIPTION       | USED BY |
+-------------------+--------+----------+-----------------+----------+-------------------------+---------+
| default (current) | YES    | YES      | YES             | YES      | Default LXD project     | 2       |
+-------------------+--------+----------+-----------------+----------+-------------------------+---------+
| maas-project      | YES    | YES      | YES             | NO       | Project managed by MAAS | 2       |
+-------------------+--------+----------+-----------------+----------+-------------------------+---------+
root@op2:/home/ubuntu# 
root@op2:/home/ubuntu# lxc profile show default --project maas-project 
config: {}
description: Default LXD profile for project maas-project
devices: {}
name: default
used_by: []

Now I tried creating a VM from MAAS; it is created properly but gets stuck in the commissioning stage, as shown below:

On the server, I can see that the VM is in the running state:

# lxc ls --project maas-project 
+------+---------+------+------+-----------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+------+------+-----------------+-----------+
| vm01 | RUNNING |      |      | VIRTUAL-MACHINE | 0         |
+------+---------+------+------+-----------------+-----------+

Well, that is better than the previous scenario with ubuntu-core-22, but it is still failing.

It takes some time to commission. How long has it been stuck?
Is there anything under the logs tab?

@troyanov

Well, it’s been 10 minutes, I guess, and there is nothing much under logs other than “powered on”.

@troyanov

Well, on my server2 (ubuntu-server 22.04), if I try to launch a container manually in maas-project, it fails, as shown below:

# lxc launch ubuntu:22.04 vmtest2 --project maas-project 
Creating vmtest2
Error: Failed instance creation: Failed creating instance record: Failed initialising instance: Failed getting root disk: No root device could be found

Logs from lxc monitor

DEBUG  [2023-10-13T20:11:15Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2023-10-13T20:11:15Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/events?project=maas-project" username=root
DEBUG  [2023-10-13T20:11:15Z] Event listener server handler started         id=3690f3c9-d41c-4515-9db5-333a83c476a7 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG  [2023-10-13T20:11:15Z] Handling API request                          ip=@ method=POST protocol=unix url="/1.0/instances?project=maas-project" username=root
DEBUG  [2023-10-13T20:11:15Z] Responding to instance create                
DEBUG  [2023-10-13T20:11:15Z] Started operation                             class=task description="Creating instance" operation=8cbca1cd-b0d6-4774-999c-039f11f7e205 project=maas-project
DEBUG  [2023-10-13T20:11:15Z] New operation                                 class=task description="Creating instance" operation=8cbca1cd-b0d6-4774-999c-039f11f7e205 project=maas-project
DEBUG  [2023-10-13T20:11:15Z] Connecting to a remote simplestreams server   URL="https://cloud-images.ubuntu.com/releases"
DEBUG  [2023-10-13T20:11:15Z] Handling API request                          ip=@ method=GET protocol=unix url="/1.0/operations/8cbca1cd-b0d6-4774-999c-039f11f7e205?project=maas-project" username=root
DEBUG  [2023-10-13T20:11:16Z] Acquiring lock for image                      fingerprint=b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
DEBUG  [2023-10-13T20:11:16Z] Lock acquired for image                       fingerprint=b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
DEBUG  [2023-10-13T20:11:16Z] Lock acquired for image                       fingerprint=b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
DEBUG  [2023-10-13T20:11:16Z] Image already exists in the DB                fingerprint=b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
DEBUG  [2023-10-13T20:11:16Z] Acquiring lock for image                      fingerprint=b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
DEBUG  [2023-10-13T20:11:16Z] Instance operation lock created               action=create instance=vmtest2 project=maas-project reusable=false
ERROR  [2023-10-13T20:11:16Z] Failed initialising instance                  err="Failed getting root disk: No root device could be found" instance=vmtest2 project=maas-project type=container
INFO   [2023-10-13T20:11:16Z] Creating instance                             ephemeral=false instance=vmtest2 instanceType=container project=maas-project
DEBUG  [2023-10-13T20:11:16Z] Failure for operation                         class=task description="Creating instance" err="Failed creating instance record: Failed initialising instance: Failed getting root disk: No root device could be found" operation=8cbca1cd-b0d6-4774-999c-039f11f7e205 project=maas-project
DEBUG  [2023-10-13T20:11:16Z] Instance operation lock finished              action=create err="Failed getting root disk: No root device could be found" instance=vmtest2 project=maas-project reusable=false
DEBUG  [2023-10-13T20:11:16Z] Event listener server handler stopped         listener=3690f3c9-d41c-4515-9db5-333a83c476a7 local=/var/snap/lxd/common/lxd/unix.socket remote=@

Default profile in maas-project

# lxc profile show default --project maas-project 
config: {}
description: Default LXD profile for project maas-project
devices: {}
name: default
used_by: []

# lxc ls --project maas-project 
+------+---------+------+------+-----------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+------+------+-----------------+-----------+
| vm01 | RUNNING |      |      | VIRTUAL-MACHINE | 0         |
+------+---------+------+------+-----------------+-----------+

Commissioning failed