VM host networking (snap/2.9/UI)

To deploy a VM host in your MAAS network, you first need to set up a bridge that connects your VM host to MAAS. This section explains several ways of doing so.

Five questions you may have:

  1. How do I set up a VM host bridge with the web UI?
  2. How do I set up a VM host bridge with netplan?
  3. How do I set up a VM host bridge with libvirt?
  4. How do I set up SSH for use by libvirt?
  5. How do I make LXD available for hosting?

LXD sets up a bridge as part of its initialisation process. You will need a few additional steps to prevent LXD from offering DHCP on that bridge, which would interfere with the normal operation of MAAS. These steps are described under “Initialise LXD prior to use” below.

To enable VM host networking features, MAAS must match the IP address of a potential VM host with a known device (a machine or controller). For example, if a machine not known to MAAS is set up as a VM host, enhanced interface selection features will not be available.

Always refer to VM hosts by IP address rather than by domain name: different controllers may resolve the same domain name to different IP addresses, which leads to conflicts. You should also avoid using 127.0.0.1 when running multiple controllers, because each controller would interpret that address as itself.
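As a rough pre-flight check of the advice above, the sketch below rejects loopback addresses and host names before you give an address to MAAS. The function name is_usable_vm_host_address is hypothetical (it is not part of MAAS), and the check only recognises dotted-decimal IPv4 literals, so IPv6 addresses are rejected as well.

```shell
# Hypothetical helper (not part of MAAS): succeed only for a non-loopback,
# dotted-decimal IPv4 literal. This is a rough syntactic check; it does not
# validate octet ranges and it rejects IPv6 addresses.
is_usable_vm_host_address() {
    case "$1" in
        '')          return 1 ;;  # empty
        127.*)       return 1 ;;  # IPv4 loopback, e.g. 127.0.0.1
        *[!0-9.]*)   return 1 ;;  # other characters: a host name, not an IPv4 literal
        *.*.*.*.*)   return 1 ;;  # too many dots
        *.*.*.*)     return 0 ;;  # looks like a.b.c.d
        *)           return 1 ;;
    esac
}

# Example addresses; replace them with your own.
for addr in 10.0.0.101 127.0.0.1 vmhost.maas; do
    if is_usable_vm_host_address "$addr"; then
        echo "$addr: ok"
    else
        echo "$addr: use a routable IP address instead"
    fi
done
```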

Set up VM host bridge with MAAS UI

You can use the MAAS UI to configure a bridge to connect a VM host to MAAS:

Select the machine you want to use as a VM host and switch to the “Network” tab. Select the network where you want to create the bridge and click “Create bridge”:

Configure the bridge on a subnet MAAS controls. You may use any IP mode for the bridge:

When you’re done, it should look something like this:

Then you can deploy Ubuntu.

Use netplan to configure a bridge

You can also use netplan to configure a VM host bridge:

Open your netplan configuration file. This should be in /etc/netplan. It could be called 50-cloud-init.yaml, netplan.yaml, or something else. Modify the file to add a bridge, using the example below to guide you:

network:
    bridges:
        br0:
            addresses:
            - 10.0.0.101/24
            gateway4: 10.0.0.1
            interfaces:
            - enp1s0
            macaddress: 52:54:00:39:9d:f9
            mtu: 1500
            nameservers:
                addresses:
                - 10.0.0.2
                search:
                - maas
            parameters:
                forward-delay: 15
                stp: false
    ethernets:
        enp1s0:
            match:
                macaddress: 52:54:00:39:9d:f9
            mtu: 1500
            set-name: enp1s0
        enp2s0:
            match:
                macaddress: 52:54:00:df:87:ac
            mtu: 1500
            set-name: enp2s0
        enp3s0:
            match:
                macaddress: 52:54:00:a7:ac:46
            mtu: 1500
            set-name: enp3s0
    version: 2

Apply the new configuration with netplan apply.
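A YAML mistake in this file can leave the host unreachable, so it can help to validate the file before applying it. The sketch below is one way to do that, assuming the bridge is named br0 as in the example above; apply_bridge_config is a hypothetical helper name, not a netplan command.

```shell
# Hypothetical helper: validate the netplan YAML, apply it, then show the
# bridge. "netplan generate" parses the configuration and fails on syntax
# errors, so a broken file is caught before the network is touched.
apply_bridge_config() {
    sudo netplan generate &&
    sudo netplan apply &&
    ip -br addr show "$1"   # brief view of the bridge state and addresses
}
```

Call it as apply_bridge_config br0. If netplan generate reports an error, nothing is applied.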

Use libvirt to configure a bridge

It is also possible to use libvirt to configure a virtual bridge. This method will also work for LXD VM hosts running on Ubuntu. Be aware that other methods may be required if you are configuring LXD on an OS other than Ubuntu.

By default, libvirt creates a virtual bridge, virbr0, through which VMs communicate with each other and the Internet. DHCP, supplied by libvirt, automatically assigns an IP address to each VM. However, to enable network booting in MAAS, you’ll need to provide DHCP in MAAS and either:

  1. Disable DHCP on libvirt’s default network, or
  2. Create a new libvirt network maas with DHCP disabled.

You can set up such a maas network like this:

cat << EOF > maas.xml
<network>
 <name>maas</name>
 <forward mode='nat'>
   <nat>
     <port start='1024' end='65535'/>
   </nat>
 </forward>
 <dns enable="no" />
 <bridge name='virbr1' stp='off' delay='0'/>
 <domain name='testnet'/>
 <ip address='172.16.99.1' netmask='255.255.255.0'>
 </ip>
</network>
EOF
virsh net-define maas.xml

Note that this network also has NAT port forwarding enabled to allow VMs to communicate with the Internet at large. Port forwarding is very useful in test environments.
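Note that virsh net-define only registers the network; it is not active until it has been started. A minimal sketch of the follow-up steps, assuming the network name maas from maas.xml above (the function name activate_libvirt_network is hypothetical):

```shell
# Hypothetical helper: activate a libvirt network that was already defined
# with "virsh net-define", and make it start on every host boot.
activate_libvirt_network() {
    virsh net-start "$1" &&
    virsh net-autostart "$1" &&
    virsh net-list --all   # the network should now be listed as active
}
```

Call it as activate_libvirt_network maas. Depending on how libvirt is set up on your host, you may need to run virsh with sudo, or against qemu:///system, for the network to be visible to the system libvirt daemon.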

Set up SSH

For MAAS to successfully communicate with libvirt on your VM host machine – whether you’re running from snap or package, or running rack controllers in LXD containers or on localhost – this example command must succeed from every rack controller:

virsh -c qemu+ssh://$USER@$VM_HOST_IP/system list --all

Here, $USER is a user on your VM host who is a member of the libvirtd Unix group on the VM host (the group is named libvirt on recent Ubuntu releases), and $VM_HOST_IP is the IP address of your VM host. Insufficient permissions for $USER may cause the virsh command to fail with an error such as “failed to connect to the hypervisor”. If that happens, check that $USER is a member of the libvirtd group.
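The group check can be scripted on the VM host. This is a minimal sketch: in_libvirt_group is a hypothetical helper name, and it accepts either libvirtd (the name used in the text) or libvirt (an assumption covering newer Ubuntu releases).

```shell
# Hypothetical helper: succeed if the named user belongs to a libvirt group
# on this host. Assumption: the group is called "libvirtd" or "libvirt".
in_libvirt_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        case "$g" in
            libvirt|libvirtd) return 0 ;;
        esac
    done
    return 1
}
```

For example, in_libvirt_group "$USER" && echo ok. If the check fails, add the user to the group and log in again so the new membership takes effect.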

Set up SSH (libvirt only)

If you installed MAAS via snap, then create the needed SSH keys this way:

sudo mkdir -p /var/snap/maas/current/root/.ssh
cd /var/snap/maas/current/root/.ssh
sudo ssh-keygen -f id_rsa

Finally, you’ll need to add id_rsa.pub to the authorized_keys file in /home/<vm-host-user-homedir-name>/.ssh/, where <vm-host-user-homedir-name> is the name of your VM host user.
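One way to do this by hand is sketched below. It assumes you have already copied id_rsa.pub to the VM host and are logged in there as the VM host user; add_authorized_key is a hypothetical helper name.

```shell
# Hypothetical helper, run on the VM host as the VM host user: append a
# public key to an authorized_keys file unless that exact line is present.
add_authorized_key() {
    pubkey_file="$1"
    auth_file="$2"
    key="$(cat "$pubkey_file")" || return 1
    # With its default StrictModes setting, sshd rejects keys if ~/.ssh or
    # authorized_keys is writable by other users, so tighten permissions.
    mkdir -p "$(dirname "$auth_file")" &&
    chmod 700 "$(dirname "$auth_file")" &&
    touch "$auth_file" &&
    chmod 600 "$auth_file" || return 1
    # -x matches whole lines, -F treats the key as a fixed string.
    grep -qxF -e "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}
```

For example, add_authorized_key id_rsa.pub "$HOME/.ssh/authorized_keys". Running it twice leaves a single copy of the key in the file.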

Make LXD available for VM hosting

Assuming that you want to use LXD VM hosts, you need to install the correct version of LXD. Prior to the release of Ubuntu 20.04, LXD was installed using Debian packages, and the Debian-packaged version of LXD is too old to use with MAAS. If you have the Debian packages installed, you’ll need to remove them and install the snap version. Note that you cannot install both the Debian and snap versions, as this creates a conflict.

Removing older versions of LXD

If you’re on a version of Ubuntu older than 20.04, or you have the Debian version of LXD, start the uninstall process with the following command:

# Quote the globs so the shell passes them to apt-get unexpanded.
sudo apt-get purge -y '*lxd*' '*lxc*'

This command should result in output that looks something like this:

Reading package lists... Done
Building dependency tree      
Reading state information... Done
Note, selecting 'lxde-core' for glob '*lxd*'
Note, selecting 'python-pylxd-doc' for glob '*lxd*'
Note, selecting 'python3-pylxd' for glob '*lxd*'
Note, selecting 'python-nova-lxd' for glob '*lxd*'
Note, selecting 'lxde-common' for glob '*lxd*'
Note, selecting 'lxde-icon-theme' for glob '*lxd*'
Note, selecting 'lxde-settings-daemon' for glob '*lxd*'
Note, selecting 'lxde' for glob '*lxd*'
Note, selecting 'lxdm' for glob '*lxd*'
Note, selecting 'lxd' for glob '*lxd*'
Note, selecting 'lxd-tools' for glob '*lxd*'
Note, selecting 'python-pylxd' for glob '*lxd*'
Note, selecting 'lxdm-dbg' for glob '*lxd*'
Note, selecting 'lxde-session' for glob '*lxd*'
Note, selecting 'nova-compute-lxd' for glob '*lxd*'
Note, selecting 'openbox-lxde-session' for glob '*lxd*'
Note, selecting 'python-nova.lxd' for glob '*lxd*'
Note, selecting 'lxd-client' for glob '*lxd*'
Note, selecting 'openbox-lxde-session' instead of 'lxde-session'
Note, selecting 'lxctl' for glob '*lxc*'
Note, selecting 'lxc-common' for glob '*lxc*'
Note, selecting 'python3-lxc' for glob '*lxc*'
Note, selecting 'libclxclient-dev' for glob '*lxc*'
Note, selecting 'lxc-templates' for glob '*lxc*'
Note, selecting 'lxc1' for glob '*lxc*'
Note, selecting 'lxc-dev' for glob '*lxc*'
Note, selecting 'lxc' for glob '*lxc*'
Note, selecting 'liblxc1' for glob '*lxc*'
Note, selecting 'lxc-utils' for glob '*lxc*'
Note, selecting 'vagrant-lxc' for glob '*lxc*'
Note, selecting 'libclxclient3' for glob '*lxc*'
Note, selecting 'liblxc-dev' for glob '*lxc*'
Note, selecting 'nova-compute-lxc' for glob '*lxc*'
Note, selecting 'python-lxc' for glob '*lxc*'
Note, selecting 'liblxc-common' for glob '*lxc*'
Note, selecting 'golang-gopkg-lxc-go-lxc.v2-dev' for glob '*lxc*'
Note, selecting 'lxcfs' for glob '*lxc*'
Note, selecting 'liblxc-common' instead of 'lxc-common'
Package 'golang-gopkg-lxc-go-lxc.v2-dev' is not installed, so not removed
Package 'libclxclient-dev' is not installed, so not removed
Package 'libclxclient3' is not installed, so not removed
Package 'lxc-templates' is not installed, so not removed
Package 'lxctl' is not installed, so not removed
Package 'lxde' is not installed, so not removed
Package 'lxde-common' is not installed, so not removed
Package 'lxde-core' is not installed, so not removed
Package 'lxde-icon-theme' is not installed, so not removed
Package 'lxde-settings-daemon' is not installed, so not removed
Package 'lxdm' is not installed, so not removed
Package 'lxdm-dbg' is not installed, so not removed
Package 'openbox-lxde-session' is not installed, so not removed
Package 'python-lxc' is not installed, so not removed
Package 'python3-lxc' is not installed, so not removed
Package 'vagrant-lxc' is not installed, so not removed
Package 'liblxc-dev' is not installed, so not removed
Package 'lxc-dev' is not installed, so not removed
Package 'nova-compute-lxc' is not installed, so not removed
Package 'nova-compute-lxd' is not installed, so not removed
Package 'python-nova-lxd' is not installed, so not removed
Package 'python-pylxd' is not installed, so not removed
Package 'python-pylxd-doc' is not installed, so not removed
Package 'lxc' is not installed, so not removed
Package 'lxc-utils' is not installed, so not removed
Package 'lxc1' is not installed, so not removed
Package 'lxd-tools' is not installed, so not removed
Package 'python-nova.lxd' is not installed, so not removed
Package 'python3-pylxd' is not installed, so not removed
The following packages were automatically installed and are no longer required:
  dns-root-data dnsmasq-base ebtables libuv1 uidmap xdelta3
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  liblxc-common* liblxc1* lxcfs* lxd* lxd-client*
0 upgraded, 0 newly installed, 5 to remove and 21 not upgraded.
After this operation, 34.1 MB disk space will be freed.
(Reading database ... 67032 files and directories currently installed.)
Removing lxd (3.0.3-0ubuntu1~18.04.1) ...
Removing lxd dnsmasq configuration
Removing lxcfs (3.0.3-0ubuntu1~18.04.2) ...
Removing lxd-client (3.0.3-0ubuntu1~18.04.1) ...
Removing liblxc-common (3.0.3-0ubuntu1~18.04.1) ...
Removing liblxc1 (3.0.3-0ubuntu1~18.04.1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
(Reading database ... 66786 files and directories currently installed.)
Purging configuration files for liblxc-common (3.0.3-0ubuntu1~18.04.1) ...
Purging configuration files for lxd (3.0.3-0ubuntu1~18.04.1) ...
Purging configuration files for lxcfs (3.0.3-0ubuntu1~18.04.2) ...
Processing triggers for systemd (237-3ubuntu10.40) ...
Processing triggers for ureadahead (0.100.0-21) ...

You should also autoremove packages no longer needed by LXD:

$ sudo apt-get autoremove -y

Output from this command should be similar to:

Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages will be REMOVED:
  dns-root-data dnsmasq-base ebtables libuv1 uidmap xdelta3
0 upgraded, 0 newly installed, 6 to remove and 21 not upgraded.
After this operation, 1860 kB disk space will be freed.
(Reading database ... 66769 files and directories currently installed.)
Removing dns-root-data (2018013001) ...
Removing dnsmasq-base (2.79-1) ...
Removing ebtables (2.0.10.4-3.5ubuntu2.18.04.3) ...
Removing libuv1:amd64 (1.18.0-3) ...
Removing uidmap (1:4.5-1ubuntu2) ...
Removing xdelta3 (3.0.11-dfsg-1ubuntu1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...

Now install LXD from the snap:

$ sudo snap install lxd
2020-05-20T22:02:57Z INFO Waiting for restart...
lxd 4.1 from Canonical✓ installed

Refreshing LXD on 20.04

If you are on Ubuntu 20.04 or above, LXD should be installed by default, but it’s a good idea to make sure it’s up to date:

$ sudo snap refresh
All snaps up to date.

Initialise LXD prior to use

Once LXD is installed it needs to be configured with lxd init before first use:

$ sudo lxd init

Your interactive output should look something like the following. Note a few important points about these questions:

  1. Would you like to use LXD clustering? (yes/no) [default=no]: no - MAAS does not currently support LXD clusters.

  2. Name of the storage back-end to use (btrfs, dir, lvm, zfs, ceph) [default=zfs]: dir - testing has primarily been with dir; other options should work, but less testing has been done, so use at your own risk.

  3. Would you like to connect to a MAAS server? (yes/no) [default=no]: no - When LXD is connected to MAAS, containers or virtual machines created by LXD will be automatically added to MAAS as devices. This feature should work, but has had limited testing thus far.

  4. Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes - The bridge LXD creates is isolated and not managed by MAAS. If this bridge is used, you would be able to add the LXD VM host and compose virtual machines, but commissioning, deploying, and any other MAAS action which uses the network will fail – so yes is the correct answer here.

  5. Name of the existing bridge or host interface: br0 - br0 is the name of the bridge the user configured (see sections above) which is connected to a MAAS-managed network.

  6. Trust password for new clients: - This is the password the user will enter when connecting with MAAS.

Would you like to use LXD clustering? (yes/no) [default=no]: no
Do you want to configure a new storage pool? (yes/no) [default=yes]: yes
Name of the new storage pool [default=default]:  
Name of the storage back-end to use (btrfs, dir, lvm, zfs, ceph) [default=zfs]: dir
Would you like to connect to a MAAS server? (yes/no) [default=no]: no
Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: br0
Would you like LXD to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]:
Port to bind LXD to [default=8443]:
Trust password for new clients:
Again:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

After initialising LXD, you will also want to make sure that LXD is not trying to provide DHCP for the new local network bridge. You can check this with the following command:

lxc network show lxdbr0

If you didn’t accept the default bridge name (lxdbr0), substitute the name of your bridge in the command above. The output should look something like this:

config:
  dns.mode: managed
  ipv4.address: 10.146.214.1/24
  ipv4.dhcp: "true"
  ipv4.nat: "true"
  ipv6.address: fd42:c560:ee59:bb2::1/64
  ipv6.dhcp: "true"
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

There is a quick tutorial on the possible settings here. For simplicity, to turn off LXD-provided DHCP, you need to change three settings, as follows:

lxc network set lxdbr0 dns.mode=none
lxc network set lxdbr0 ipv4.dhcp=false
lxc network set lxdbr0 ipv6.dhcp=false

You can check your work by repeating the show command:

$ lxc network show lxdbr0
config:
  dns.mode: none
  ipv4.address: 10.146.214.1/24
  ipv4.dhcp: "false"
  ipv4.nat: "true"
  ipv6.address: fd42:c560:ee59:bb2::1/64
  ipv6.dhcp: "false"
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/profiles/default
managed: true
status: Created
locations:
- none
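If you would rather script this check than read the output by eye, a small filter can do it. This is a minimal sketch: lxd_dhcp_disabled is a hypothetical helper name, and it only inspects the text that lxc network show prints.

```shell
# Hypothetical helper: read "lxc network show <bridge>" output on stdin and
# succeed only if both ipv4.dhcp and ipv6.dhcp are explicitly set to false.
# A missing key counts as a failure, which is deliberately conservative.
lxd_dhcp_disabled() {
    cfg="$(cat)"
    for key in ipv4.dhcp ipv6.dhcp; do
        printf '%s\n' "$cfg" |
            grep -Eq "^[[:space:]]*${key}:[[:space:]]*\"?false\"?[[:space:]]*\$" || return 1
    done
}
```

For example, lxc network show lxdbr0 | lxd_dhcp_disabled && echo "LXD DHCP is off".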

Once that’s done, the LXD host is ready to be added to MAAS as an LXD VM host. When you add the VM host, its own commissioning information will be refreshed.

When composing a virtual machine with LXD, MAAS uses either the ‘maas’ LXD profile, or (if that doesn’t exist) the ‘default’ LXD profile. The profile is used to determine which bridge to use. Users may also add additional LXD options to the profile which are not yet supported in MAAS.

Hi @billwear. I’ve managed to get through this entire process with some easy success. The only issue i face is not being able to turn on the LXD VM when they are powered off. Even running from the host machine and trying to lxc start [vm name] does not work. The entire reason the containers are stopped or powered off (both attempted) is because im running the lxc command to config device add [vm name] gpu gpu id=gpu_num. Any idea for a work around? Any help is appreciated :slight_smile:

dan

@dandevslack, i’m not seeing that. for example, here is the sequence i just tried on my own lxd maas vm farm:

stormrider@wintermute:~$ lxc list
+-----------+---------+-----------------------+------+-----------------+-----------+
|   NAME    |  STATE  |         IPV4          | IPV6 |      TYPE       | SNAPSHOTS |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-1  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-2  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-3  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-4  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-5  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-6  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-7  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-8  | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-10 | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-11 | STOPPED |                       |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| lxd-vm-12 | RUNNING | 10.124.141.4 (enp5s0) |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| u1        | RUNNING | 10.124.141.191 (eth0) |      | CONTAINER       | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
| u2        | RUNNING | 10.124.141.190 (eth0) |      | CONTAINER       | 0         |
+-----------+---------+-----------------------+------+-----------------+-----------+
stormrider@wintermute:~$ lxc start lxd-vm-1
stormrider@wintermute:~$ lxc list
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
|   NAME    |  STATE  |         IPV4          |                     IPV6                     |      TYPE       | SNAPSHOTS |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-1  | RUNNING |                       | fd42:ed9:1e81:bd10:216:3eff:fee2:5225 (eth0) | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-2  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-3  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-4  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-5  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-6  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-7  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-8  | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-10 | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-11 | STOPPED |                       |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| lxd-vm-12 | RUNNING | 10.124.141.4 (enp5s0) |                                              | VIRTUAL-MACHINE | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| u1        | RUNNING | 10.124.141.191 (eth0) |                                              | CONTAINER       | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
| u2        | RUNNING | 10.124.141.190 (eth0) |                                              | CONTAINER       | 0         |
+-----------+---------+-----------------------+----------------------------------------------+-----------------+-----------+
stormrider@wintermute:~$ lxc exec lxd-vm-1 bash
root@lxd-vm-1:~# ls
curtin-install-cfg.yaml  curtin-install.log  snap
root@lxd-vm-1:~# 

Can you expand on “not working?” What’s it do when you try to power it on, e.g…, what error are you getting? Anything in the LXD log? Can you tell me what version of lxd you’re running? Are you trying to modify deployed machines?

Realize, of course, since the VMs are set to receive IP from DHCP, and MAAS isn’t aware that it’s booted on, it isn’t necessarily going to (immediately) have an IP address, nor should you try to manipulate it from MAAS while you’re making the changes:

Thanks much for the response @billwear I can definitely provide more context. I’ve “powered off” the vm “s2” (power on and off several times with no issue). Then i run the command lxc config device add s2 gpu gpu. After that, the vm is entirely unresponsive.

When attempting to launch the vm from the host machine, i get the following reported error:

lxc start s2
Error: virtiofsd failed to bind socket within 10s Try `lxc info --show-log s2` for more info  

lxc info --show-log s2 returns:

Name: s2  
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/01/13 03:46 UTC
Status: Stopped
Type: virtual-machine
Profiles: default
Error: open /var/snap/lxd/common/lxd/logs/s2/qemu.log: no such file or directory

I’ve reproduced the unresponsive LXD VM on two other machines with NVIDIA cards as well. Again, all is working fine right up to the point where the GPU is added. LXD will not allow me to add a GPU if the VM is running, thus the reason for stopping and powering down. Thanks again for this great system and the responses!

i try to execute the command when powered on but get the following:

lxc config device add s2-gpu gpu gpu
Error: Failed to add device "gpu": Device cannot be added when instance is running 

what error message do you get when you try adding a GPU when a VM is running?

i get this error message:

Error: Failed to add device "gpu": Device cannot be added when instance is running

have you seen this page? it may not help, but you never know…

I did try that, but ill give it another whirl. Who knows, maybe I just missed a step.

Oddly enough, lxc containers work just fine. It is the VMs that refuse to work.

i’ll fiddle with it today. maybe i can figure it out. don’t let that stop you from experimenting. :slight_smile:

i tried it. adding a gpu to a vm and starting the vm crashes my laptop catastrophically, whether i’m starting it with lxc start <name> or by turning it on with MAAS. i’ll file a bug and post the link to the bug here.

Thanks much! Always lovely to know that the issue isn’t isolated to just me :slight_smile:

so it turns out it’s not a bug, actually; we were asking LXD to do something the wrong way. still trying to get the correct process from the designer, but advice so far goes like this:

"For containers, GPUs are shared by just exposing the GPU device nodes in a container. For VMs, GPU need to be passed through, that is, detached from the host system and attached to the VM over PCIe.

the config you used tells LXD to detach all GPUs from your system and attach them all to the VM, which it did.

To use a GPU device you effectively need:

  • A dedicated GPU which is NOT used by anything (especially not a desktop environment)
  • A motherboard which supports IOMMU grouping with that feature enabled in the firmware
  • The system be booted with the relevant kernel boot options for IOMMUs (I believe MAAS does that automatically when deploying a KVM host though)
  • Non-polluted IOMMU groups (the GPU should be alone in its group)

This usually disqualifies just about every single piece of consumer grade hardware

as most consumer grade CPUs lack sufficient PCIe lanes to run every device on a dedicated bus giving it its own IOMMU group

if they’re just on a standard desktop system with a single GPU, the answer is to not use MAAS and just do straight LXD with a container instead

if they are on server grade hardware, then they should make sure their firmware has IOMMU groups and VT-d enabled, the MAAS commissioning data should then include the IOMMU configuration too to validate it’s all good at which point they can add the GPU to the VM, usually by selecting it based on PCI address

That’s an example of a MAAS managed VM with two GPUs attached that I have here:

root@athos:~# lxc config show maas-vm12 --expanded
architecture: x86_64
config:
  limits.cpu: "4"
  limits.memory: 8GiB
  security.secureboot: "false"
  volatile.eth0.hwaddr: 00:16:3e:73:1f:58
  volatile.eth1.hwaddr: 00:16:3e:73:e3:7a
  volatile.last_state.power: STOPPED
  volatile.uuid: 63f91205-2ee8-47ac-9406-d821210c6f7c
devices:
  eth0:
    boot.priority: "10"
    name: eth0
    nictype: bridged
    parent: br1016
    type: nic
  eth1:
    name: eth1
    nictype: bridged
    parent: br1017
    type: nic
  gpu1:
    pci: "0000:04:00.0"
    type: gpu
  gpu2:
    pci: "0000:82:00.0"
    type: gpu
  root:
    path: /
    pool: hdd
    size: 50GiB
    type: disk
ephemeral: false
profiles:
- maas
stateful: false
description: ""

also worth noting that if it’s a NVIDIA GPU, non-datacenter cards cannot be passed through without using a bunch of workarounds.

That’s not really a technical limitation so much as a product limitation, NVIDIA wants you to use datacenter cards (Quadro/Tesla) for virtualized environments.

passthrough will appear to function with consumer grade cards but the driver will then refuse to function properly in the VM

fwiw, MAAS doesn’t have direct handling for attaching physical/mdev/SR-IOV devices to VMs other than for networking cards"
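To check the “non-polluted IOMMU groups” point from the list above, you can list which PCI devices share a group. A minimal sketch, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups; the function name list_iommu_groups is hypothetical.

```shell
# Hypothetical helper: print one "group <n>: <pci-address>" line per device.
# The optional argument overrides the sysfs root (handy for testing).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # nothing listed: IOMMU is probably off
        group="${dev%/devices/*}"          # strip the "/devices/<address>" suffix
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

list_iommu_groups
```

The advice above asks for the GPU to be alone in its group, so look for the GPU's PCI address (0000:04:00.0 in the example config) and see what else is listed with the same group number. The gpu1 entry in that config corresponds to a command of the form lxc config device add maas-vm12 gpu1 gpu pci=0000:04:00.0.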

@dandevslack, any of this help you figure it out?

wow. Yes. Thats fantastically in depth and helpful. I definitely have datacenter cards hanging around that can be used for these purposes, but will have to find the appropriate hardware that can be configured this way. Thanks again bill, i’ve got some work cut out for me to run this one down.

my title is developer advocate, and that’s what i try to do. i’m a UNIX developer since 1974; i don’t develop MAAS, but i sit in the engineering team, write the docs, blog, and try to help developers as much as i can. feel free to call on me as needed – i may not know the answer, but i can connect with people who probably will. good luck, and let me know how it’s going.

Just an update. I need to test this with a proper server, but it turns out the desktop test environment is vt-x enabled, supports IOMMU and has 3 cards. I adjusted the config file to mimic yours (while pulling the pci id for the unallocated nvidia card using sudo lshw -C display). It still did not work. I will push forward with server infra and quadro cards next week and report back results. Thanks again for running this down.

any news about this? @dandevslack @billwear? I’ve tried this today on datacenter GPUs (Tesla T4), but the error is the same (“Error: virtiofsd failed to bind socket within 10s”). So containers work, but VMs do not… Anything that you need to be careful about? Some extra configs?

Hey @matjaz i actually have not figured this out with either the GeForce or Quadro cards. I’ve also tried both versions of vm including beta LXD on maas. But i can confirm that containers do work :slight_smile: