Support for Mellanox 100G in MAAS

I have scoured the web for this information, but I’m still wondering: does MAAS support Mellanox 100G cards/interfaces out of the box?

I can’t seem to get the Mellanox cards configured manually using netplan. If netplan doesn’t support Mellanox, then I assume MAAS won’t be able to support it either?

Hi @r9labs, not sure if it helps or how things work in the latest MAAS releases, but do you install any Mellanox drivers during enlist/commission/install steps (see Custom node setup docs)? I got a script that installs Mellanox drivers on 16.04 during commissioning from @jamesbeedy so he might know some more.

I don’t currently, but that’s something I can look into. It looks like I’ll most likely have to configure a separate network after provisioning has occurred.

This is interesting. I’d really like to know what you find.
Question though: If you go ahead and deploy Ubuntu 18.04 to the machine with the ConnectX adapter, does it natively find it, or do you need to compile and install and register the ConnectX modules before it turns up in lspci?
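
For anyone wanting to check this on a deployed machine, here is a minimal sketch that counts Mellanox devices in lspci output. It relies on 15b3 being the PCI vendor ID assigned to Mellanox; the function name count_mellanox_devices is just a hypothetical helper for illustration, and it only tells you whether the card is visible on the PCI bus, not whether a driver is bound to it.

```shell
#!/bin/bash
# Count Mellanox PCI devices in `lspci -nn` style output read from stdin.
# 15b3 is the PCI vendor ID assigned to Mellanox; `lspci -nn` prints the
# numeric [vendor:device] pair at the end of each line.
# count_mellanox_devices is a hypothetical helper name, not part of any tool.
count_mellanox_devices() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -ci '\[15b3:' || true
}

# Usage on the deployed machine (requires pciutils):
#   lspci -nn | count_mellanox_devices
```

A result of 0 would suggest the adapter is not being enumerated at all, which is a different problem from a missing driver or the wrong port protocol.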

Here is the script for installing ConnectX drivers for Xenial from @jamesbeedy in case that helps: https://paste.ubuntu.com/p/nN662VbY8F/

Thanks for that example. I had forgotten about script metadata and didn’t realise that tar.gz files could be downloaded as packages.

I use the following script for my deployments. It’s written so that I can use the same script for any version of Ubuntu supported by Mellanox. The installation itself is a bit custom because I have InfiniBand adapters, but I have to force them into Ethernet mode because MAAS doesn’t seem to understand InfiniBand. I have to admit that I’m also running a slightly ghetto network configuration.

https://gist.github.com/lparkes/3b9e4e6e772dd1635eec50cc284b7b3a

After much trial and tribulation, we found out that the cards ship with the InfiniBand (IB) protocol enabled by default. MAAS needs the card to be configured for the Ethernet (ETH) protocol. To do that automatically during commissioning, you should download the latest Mellanox MFT tool: http://www.mellanox.com/page/management_tools

and then use this as a commissioning script, named 00-maas-00-mftinstall.sh:
https://paste.ubuntu.com/p/NqMvkHRgNT/

You’ll have to host the .tgz file somewhere and update the script to download it from whatever URL you use.

This script will install the Mellanox management tools and then set the firmware on your card to ETH. After the machine commissions, you’ll be able to configure your Mellanox 100G cards in MAAS.
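
In case it helps with the "host the .tgz somewhere" step, here is a rough sketch of the download-and-install part. Everything specific in it is an assumption: the MFT_URL value is a placeholder host, the helper names are hypothetical, wget is assumed to be available in the commissioning environment, and the archive is assumed to unpack into a directory named after the file (minus .tgz). Check that last point against the archive you actually host.

```shell
#!/bin/bash
# MFT_URL is a placeholder; point it at wherever you host the archive.
MFT_URL="${MFT_URL:-http://files.example.com/mft-4.14.4-6-x86_64-deb.tgz}"

# Derive the directory the archive is assumed to unpack into from its
# file name, e.g. ".../mft-X-deb.tgz" -> "mft-X-deb".
mft_dir_from_url() {
    basename "$1" .tgz
}

# Download the archive to a temporary directory, unpack it, and run the
# bundled installer. Hypothetical helper, not part of MAAS or MFT.
fetch_and_install_mft() {
    local url="$1" workdir
    workdir="$(mktemp -d)"
    wget -q -O "$workdir/mft.tgz" "$url"
    tar -xzf "$workdir/mft.tgz" -C "$workdir"
    "$workdir/$(mft_dir_from_url "$url")/install.sh"
}

# Usage inside a commissioning script:
#   fetch_and_install_mft "$MFT_URL"
```

Keeping the URL in one variable means only that line changes when you move the file or bump the MFT version.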

@r9labs thanks for your input here. I have adapted what you are doing to fit our setup. My commissioning script:

#!/bin/bash -ex
# --- Start MAAS 1.0 script metadata ---
# name: 00-maas-000-mft-4.14.4-6-x86_64
# title: 00-maas-000-mft-4.14.4-6-x86_64
# description: mft-4.14.4-6-x86_64
# script_type: commissioning
# tags: configure_infiniband
# packages:
#  url: https://www.mellanox.com/downloads/MFT/mft-4.14.4-6-x86_64-deb.tgz
# recommission: False
# may_reboot: True
# --- End MAAS 1.0 script metadata ---
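# Install the compiler, DKMS, and kernel headers used to build MFT's kernel modules.
apt-get update -y
apt-get install gcc make dkms linux-headers-5.4.0-26-generic linux-headers-generic -y

# Run the MFT installer from the archive listed under "packages" above.
"$DOWNLOAD_PATH"/mft-4.14.4-6-x86_64-deb/install.sh
# Start the Mellanox Software Tools service so the /dev/mst devices appear.
mst start
# Switch port 1 from InfiniBand to Ethernet (LINK_TYPE_P1=2).
mlxconfig --yes -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2
# Reboot so the new firmware setting takes effect.
reboot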

commissioning script log
The above commissioning script succeeds, but the node never comes back up after the reboot is called at the end of the script :cry:

Which leads to the next commissioning script just hanging and eventually failing due to time out.

This feels to me like the interfaces are getting mixed up somewhere once the InfiniBand card is configured for ETH, or possibly the machine is grabbing a new IP when it comes back up from the reboot.

Trying to get at the jvm console to get some further info. I’ll post back when I have something.
Thanks!

Update

After playing around with aborting the commissioning process and recommissioning, I’ve found a path forward.

The InfiniBand card gets configured the first time the commissioning script is run, with this command:

mlxconfig --yes -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2

Following this, the commissioning script runs the reboot command and the commissioning process hangs due to the node not coming back up.

At this point, the Mellanox card is configured with LINK_TYPE_P1=2.

Following this, I abort commissioning, and recommission the node, this time without 00-maas-000-mft-4.14.4-6-x86_64.

This second commissioning succeeds and is able to find the InfiniBand card in 40-maas-01-network-interfaces because a) the Mellanox card was already configured by the mlxconfig command in the first commissioning, and b) 00-maas-000-mft-4.14.4-6-x86_64 doesn’t run in the second commissioning.

All this said, I guess I’m left wondering if there is something I’m missing here that would allow me to get this all in one pass through?

Thanks!
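
On the single-pass question, one thing that might be worth trying is making the configure step idempotent, so that the set and reboot only happen while the port is not yet ETH. The sketch below is untested against MAAS and is not confirmed to fix the hang. It assumes that mlxconfig query prints a line of the form "LINK_TYPE_P1 ETH(2)" once the setting has been applied, and the function names are hypothetical.

```shell
#!/bin/bash
# Return 0 when `mlxconfig query` output on stdin does NOT already show
# ETH(2) for port 1, i.e. when the link type still needs to be changed.
# The "LINK_TYPE_P1   ETH(2)" line format is an assumption about the output.
needs_eth_switch() {
    ! grep -Eq '^[[:space:]]*LINK_TYPE_P1[[:space:]]+ETH\(2\)'
}

# Set port 1 to Ethernet and reboot only if it is not already configured.
# $1 is the MST device, e.g. /dev/mst/mt4123_pciconf0.
ensure_eth_mode() {
    local dev="$1"
    if mlxconfig -d "$dev" query | needs_eth_switch; then
        mlxconfig --yes -d "$dev" set LINK_TYPE_P1=2
        reboot
    fi
}

# Usage at the end of the commissioning script, after `mst start`:
#   ensure_eth_mode /dev/mst/mt4123_pciconf0
```

If the script ends up running again after the reboot (whether MAAS reruns it or you recommission by hand), the guard turns that second run into a no-op instead of another reboot, which would at least remove the need to deselect the script for the second commissioning.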
