Hi Dan,
Which model of GPU(s) are you using? Are you able to share information about your setup/test bed?
Unfortunately, there is currently no way to “carve up” or share a single GPU across multiple KVM instances, at least not with NVIDIA cards on Ubuntu. The functionality you are describing is known as vGPU support, and we hope to support it in Ubuntu at some point in the future.
However, certain models of NVIDIA card, such as the A100, allow a GPU to be “split” into several “segments”, similar to vGPU, without the need for additional software such as a licence server. These are presented to the host as several individual GPUs, which makes resource sharing easier.
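As a rough sketch of what this looks like in practice (I have not run this myself; it assumes an A100 with a recent NVIDIA driver, and the profile ID used is only illustrative), MIG mode is managed through nvidia-smi:

```shell
# Sketch only: enabling MIG on an A100 and carving it into instances.
# A GPU reset may be required after enabling MIG mode.
sudo nvidia-smi -i 0 -mig 1          # enable MIG mode on GPU 0
sudo nvidia-smi mig -lgip            # list the available GPU instance profiles
sudo nvidia-smi mig -cgi 19,19 -C    # create two instances (profile ID 19 is illustrative)
nvidia-smi -L                        # the MIG devices now appear with their own UUIDs
```

Each created instance then shows up with its own UUID and can be assigned to a workload independently.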
Here is a direct quote from NVIDIA on the subject:
MIG Capability of NVIDIA Ampere GPU Architecture
The new MIG feature can partition each A100 into as many as seven GPU Instances for optimal utilization, effectively expanding access to every user and application.

The A100 GPU new MIG capability can divide a single GPU into multiple GPU partitions called GPU Instances. Each instance’s SMs have separate and isolated paths through the entire memory system — the on-chip crossbar ports, L2 cache banks, memory controllers and DRAM address busses are all assigned uniquely to an individual instance. This ensures that an individual user’s workload can run with predictable throughput and latency, with the same L2 cache allocation and DRAM bandwidth even if other tasks are thrashing their own caches or saturating their DRAM interface.

Using this capability, MIG can partition available GPU compute resources to provide a defined quality of service (QoS) with fault isolation for different clients (such as VMs, containers, processes, and so on). It enables multiple GPU Instances to run in parallel on a single, physical A100 GPU. MIG also keeps the CUDA programming model unchanged to minimize programming effort.

CSPs can use MIG to raise utilization rates on their GPU servers, delivering up to 7x more GPU Instances at no additional cost. MIG supports the necessary QoS and isolation guarantees needed by CSPs to ensure that one client (VM, container, process) cannot impact the work or scheduling from another client.

CSPs often partition their hardware based on customer usage patterns. Effective partitioning only works if hardware resources are providing consistent bandwidth, proper isolation, and good performance during runtime.

With NVIDIA Ampere architecture-based GPU, users will be able to see and schedule jobs on their new virtual GPU Instances as if they were physical GPUs. MIG works with Linux operating systems and their hypervisors. Users can run containers with MIG using runtimes such as Docker Engine, with support for container orchestration using Kubernetes coming soon.
Source: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
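Building on the Docker Engine support mentioned in the quote: once MIG instances exist, a single instance can be passed to a container via the NVIDIA container toolkit. Again, a sketch I have not tested myself; the UUID below is a placeholder you would replace with a real one:

```shell
# Substitute a real MIG device UUID, obtained from `nvidia-smi -L`, for the placeholder.
docker run --rm --gpus '"device=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"' \
    nvidia/cuda:11.0-base nvidia-smi
```

Inside the container, `nvidia-smi` should then report only that one MIG instance rather than the whole GPU.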
I have not personally tested this, but it should work. If you’re using an AMD card, I would expect similar functionality to be available.