Stress Testing for GPU

For AI devices, it might be worth adding a module that can stress-test GPUs,
much like ‘stress-ng’ does for CPU and memory.
To test a GPU on the Linux system that is loaded for testing, the ‘nouveau’ driver needs to be disabled
and the necessary drivers and tools installed, such as CUDA (for Nvidia GPUs), g++, gcc, make, etc.
And since the CUDA Toolkit package is very large, maybe a local file server module could be added to download it from.
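
To make the ‘nouveau’ step concrete, here is a minimal Python sketch of how a script might check whether the module is still loaded. It assumes the usual /proc/modules layout, where the module name is the first field on each line. The helper names and the blacklist snippet are illustrative only and not part of MAAS.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    # The module name is the first whitespace-separated field of each line.
    # Later fields can list dependants (e.g. "nouveau,"), so a plain
    # substring search over the whole file would give false positives.
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nouveau_loaded(path="/proc/modules"):
    """Return True if the nouveau kernel module is currently loaded."""
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())


# Commonly used modprobe configuration to keep nouveau from loading.
# Assumption: it would be written to a file under /etc/modprobe.d/ and the
# initramfs regenerated before the proprietary driver is installed.
BLACKLIST_CONF = "blacklist nouveau\noptions nouveau modeset=0\n"
```

A test script could call nouveau_loaded() first and stop early with a clear message instead of failing later inside the driver installer.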

Possible test methods include gpu-burn, the bandwidth test, the p2p bandwidth latency test, fieldiag, etc.
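
As a rough sketch of how such a module could drive several of these tools, the Python below runs each command with a timeout and records one status per test. The command lines in GPU_TESTS are assumptions (binary locations and arguments depend on how the tools were built), and the status names are made up for this example.

```python
import subprocess

# Hypothetical command lines: adjust paths and arguments to your build.
GPU_TESTS = {
    "gpu-burn": ["./gpu_burn", "60"],
    "bandwidth": ["./bandwidthTest"],
    "p2p-bandwidth-latency": ["./p2pBandwidthLatencyTest"],
}


def run_test(cmd, timeout_s):
    """Run one command and map its outcome to a short status string."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except (FileNotFoundError, PermissionError):
        # The tool is not installed or is not executable.
        return "missing"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return "timedout"
    return "passed" if proc.returncode == 0 else "failed"


def run_suite(tests, timeout_s=600):
    """Run every test in the mapping and return {name: status}."""
    return {name: run_test(cmd, timeout_s) for name, cmd in tests.items()}
```

Calling run_suite(GPU_TESTS) would return something like a name-to-status dictionary that a wrapper can print or turn into an exit code. This only looks at exit codes; parsing each tool's output for error counts is left out on purpose.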

Hi there!

Thank you for this feature request. We will definitely consider it for the future!

As for achieving this at present, you can try writing a hardware test script. An example test script already featuring stress-ng can be found in our ‘Hardware test scripts’ reference.
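
For illustration, a custom test script along those lines might look roughly like the Python sketch below. It assumes the commented metadata header format described in the ‘Hardware test scripts’ reference and the RESULT_PATH environment variable for YAML results. The script name, the gpu_burn command, its duration and the hardware_type value are placeholders, so please check them against the reference before uploading anything.

```python
#!/usr/bin/env python3
# --- Start MAAS 1.0 script metadata ---
# name: gpu-stress-sketch
# title: GPU stress test (sketch)
# description: Run a GPU stress command and record pass or fail.
# script_type: test
# hardware_type: node
# timeout: 1800
# --- End MAAS 1.0 script metadata ---
import os
import subprocess

# Assumption: gpu_burn is already built and on PATH; 60 is the run time in seconds.
STRESS_CMD = ["gpu_burn", "60"]


def write_result(path, exit_code):
    """Write a small YAML result file with a status and the exit code."""
    status = "passed" if exit_code == 0 else "failed"
    with open(path, "w") as f:
        f.write(f"status: {status}\n")
        f.write("results:\n")
        f.write(f"  exit_code: {exit_code}\n")


def main(cmd=STRESS_CMD):
    """Run the stress command and return its exit code (127 if not found)."""
    try:
        exit_code = subprocess.run(cmd).returncode
    except FileNotFoundError:
        exit_code = 127
    # RESULT_PATH is assumed to be set by MAAS when it runs the script.
    result_path = os.environ.get("RESULT_PATH")
    if result_path:
        write_result(result_path, exit_code)
    return exit_code
```

To use it as an actual script, the file would need to end with a call such as sys.exit(main()) (plus import sys) so that a non-zero exit code marks the test as failed. Installing the driver and building gpu-burn are not covered here.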

Afterwards, definitely consider contributing it to MAAS’s existing suite of testing scripts [1], as I’m certain this functionality would be valuable for others, too.

Hope that helps,
Andrew

[1]: There is also this repo of example commissioning and testing scripts: canonical/maas-commissioning-scripts on GitHub (a repository of example MAAS commissioning scripts).