@r9labs thanks for your input here. I have adapted what you are doing to fit our setup. My commissioning script:
#!/bin/bash -ex
# --- Start MAAS 1.0 script metadata ---
# name: 00-maas-000-mft-4.14.4-6-x86_64
# title: 00-maas-000-mft-4.14.4-6-x86_64
# description: mft-4.14.4-6-x86_64
# script_type: commissioning
# tags: configure_infiniband
# packages:
#   url: https://www.mellanox.com/downloads/MFT/mft-4.14.4-6-x86_64-deb.tgz
# recommission: False
# may_reboot: True
# --- End MAAS 1.0 script metadata ---
apt-get update -y
# install headers matching the running commissioning kernel instead of a hard-coded version
apt-get install gcc make dkms linux-headers-"$(uname -r)" linux-headers-generic -y
$DOWNLOAD_PATH/mft-4.14.4-6-x86_64-deb/install.sh
mst start
mlxconfig --yes -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2
reboot
The above commissioning script succeeds, but the node never comes back up after the reboot at the end of the script. This leaves the next commissioning script hanging until it eventually fails with a timeout.
My suspicion is that the interfaces get mixed up somewhere once the InfiniBand card is configured for Ethernet (ETH), or possibly that the machine picks up a new IP address when it comes back up from the reboot.
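One way to test the interface hypothesis is to compare what the kernel reports for each interface before the mlxconfig change and after the reboot (for example, from the console). The helper below is a hypothetical sketch, not part of MAAS or MFT. It assumes the standard Linux sysfs layout, where /sys/class/net/<iface>/type holds the ARP hardware type (1 for Ethernet, 32 for InfiniBand) and /sys/class/net/<iface>/address holds the link-layer address.

```shell
# Hypothetical helper: print "<iface> <kind> <address>" for every interface.
# The optional first argument is the sysfs net directory (default
# /sys/class/net), so the function can also be run against a fake tree.
list_link_types() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Skip anything that is not an interface directory (or an unmatched glob).
        [ -e "$dev/type" ] || continue
        name="${dev##*/}"
        type="$(cat "$dev/type")"
        # ARPHRD_ETHER = 1, ARPHRD_INFINIBAND = 32 (values from <linux/if_arp.h>).
        case "$type" in
            1)  kind="ethernet" ;;
            32) kind="infiniband" ;;
            *)  kind="other($type)" ;;
        esac
        mac="$(cat "$dev/address" 2>/dev/null || echo unknown)"
        printf '%s %s %s\n' "$name" "$kind" "$mac"
    done
}
```

Running list_link_types once before the change and once after the reboot would show whether the ConnectX port changed from infiniband to ethernet and, with that, got a different link-layer address than the one MAAS saw originally.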
Trying to get at the jvm console to get some further info. I’ll post back when I have something.
Thanks!
Update
After playing around with aborting the commissioning process and recommissioning, I’ve found a path forward.
The InfiniBand card gets configured the first time the commissioning script is run, by this command:
mlxconfig --yes -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2
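For reference, the value given to LINK_TYPE_P1 selects the protocol of port 1. The helper below is a hypothetical sketch that assumes the mapping 1 = InfiniBand and 2 = Ethernet, and accepts either the bare number or the NAME(n) form (for example ETH(2)) that mlxconfig's query output is assumed to use. Any other value is reported as unknown rather than guessed.

```shell
# Hypothetical helper: translate a LINK_TYPE_P<n> value into a protocol name.
# Assumed mapping: 1 = InfiniBand, 2 = Ethernet.
link_type_name() {
    value="$1"
    # Strip an optional "NAME(" prefix and ")" suffix, leaving just the number.
    value="${value##*"("}"
    value="${value%")"}"
    case "$value" in
        1) echo "InfiniBand" ;;
        2) echo "Ethernet" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```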
Following this, the commissioning script runs the reboot command, and the commissioning process hangs because the node never comes back up.
At this point, the Mellanox card is configured with LINK_TYPE_P1=2.
I then abort commissioning and recommission the node, this time without 00-maas-000-mft-4.14.4-6-x86_64.
This second commissioning succeeds, and 40-maas-01-network-interfaces finds the InfiniBand card, because a) the Mellanox card was already configured by the mlxconfig command in the first commissioning, and b) 00-maas-000-mft-4.14.4-6-x86_64 doesn't run in the second commissioning, so there is no reboot.
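The second pass only works because the MFT script is left out. One possible refinement, sketched here under stated assumptions, is to have the script check LINK_TYPE_P1 first and skip both the mlxconfig set and the reboot when the card is already in Ethernet mode, so the script could stay selected on later commissioning runs. The function names are hypothetical, and the parsing assumes that mlxconfig -d <dev> query LINK_TYPE_P1 prints a line whose first column is LINK_TYPE_P1 and whose last column is the value (e.g. ETH(2)). This does not explain why the node fails to come back after the first reboot; it only avoids triggering a second one.

```shell
# Hypothetical helpers, not part of MFT or MAAS.

# Read `mlxconfig query` output on stdin and print the LINK_TYPE_P1 value
# (assumed to be the last column of the LINK_TYPE_P1 line, e.g. "ETH(2)").
current_link_type() {
    awk '$1 == "LINK_TYPE_P1" { print $NF; exit }'
}

# Succeed if the query output on stdin shows the port already set to ETH(2).
already_eth() {
    [ "$(current_link_type)" = "ETH(2)" ]
}

# Commissioning-script body: only sets ETH and reboots when needed.
# Defined here but not executed; call it explicitly after install.sh.
configure_eth_once() {
    dev="${1:-/dev/mst/mt4123_pciconf0}"
    mst start
    if mlxconfig -d "$dev" query LINK_TYPE_P1 | already_eth; then
        echo "LINK_TYPE_P1 already ETH(2) on $dev; skipping set and reboot"
        return 0
    fi
    mlxconfig --yes -d "$dev" set LINK_TYPE_P1=2
    reboot
}
```

In the commissioning script, the mst start, mlxconfig and reboot lines would be replaced by a single configure_eth_once call after install.sh.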
All that said, I'm left wondering whether I'm missing something that would let me do all of this in a single pass.
Thanks!