I ran a custom test script on a bare-metal node. The script finished with exit code 0 and was marked as passed, but the node itself is still reported as Failed Testing. Only one test was executed.
Expected Result:
The node should be marked as passed if all executed tests return exit code 0.
Actual Result:
Despite the test passing, the node is reported as Failed Testing.
Hi @kasia1
Sorry for the late reply. I’ve been tied up most of the month. I’ve tried several different scripts that I wrote; they all return 0 but are still marked as failed. This is one of them:
#!/bin/bash
#
# --- Start MAAS 1.0 script metadata ---
# name: ibstat
# title: Infiniband ports Validation
# description: InfiniBand ports status.
# tags: infiniband
# script_type: test
# hardware_type: network
# parallel: instance
# apply_configured_networking: true
# timeout: 00:05:00
# --- End MAAS 1.0 script metadata ---
set -euo pipefail
export DEBIAN_FRONTEND=noninteractive
function setUp() {
    # Default DOCA repo; DOCA_PREPUBLISH=true selects the doca-repo-prod.nvidia.com host instead.
    DOCA_URL="https://linux.mellanox.com/public/repo/doca/3.2.0/ubuntu22.04/x86_64/"
    BASE_URL=$([ "${DOCA_PREPUBLISH:-false}" = "true" ] && echo https://doca-repo-prod.nvidia.com/public/repo/doca || echo https://linux.mellanox.com/public/repo/doca)
    # Keep only the version/distro/arch suffix and re-attach it to the chosen base URL.
    DOCA_SUFFIX=${DOCA_URL#*public/repo/doca/}
    DOCA_URL="$BASE_URL/$DOCA_SUFFIX"
    sudo apt-get install -y gnupg
    # -f makes curl fail on HTTP errors, so pipefail aborts here instead of storing a bad key.
    curl -fsSL "$BASE_URL/GPG-KEY-Mellanox.pub" | gpg --dearmor > /etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub
    echo "deb [signed-by=/etc/apt/trusted.gpg.d/GPG-KEY-Mellanox.pub] $DOCA_URL ./" > /etc/apt/sources.list.d/doca.list
    sudo apt-get update -y
    sudo apt-get install -y doca-ofed
}
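function get_all_devs() {
    # "|| true" keeps a no-match grep (exit 1 under pipefail) from aborting the
    # caller via set -e before the device-count check in main can report it.
    ls -1 "/sys/class/infiniband" | grep -E '^mlx5_[0-9]+$' || true
}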
function failed_devices() {
    local failed=()
    local state phys_state
    local ibports dev output
    ibports=$(get_all_devs)
    for dev in $ibports; do
        output=$(ibstat -d "$dev")
        state=$(echo "$output" | grep -Po "State:\s*\K\w+")
        phys_state=$(echo "$output" | grep -Po "Physical state:\s*\K\w+")
        # A device counts as failed unless it is both Active and LinkUp.
        if [[ "$state" != "Active" || "$phys_state" != "LinkUp" ]]; then
            failed+=("$dev")
        fi
    done
    echo "${failed[@]}"
}
function main() {
    setUp
    ibports=$(get_all_devs)
    count=$(echo "$ibports" | wc -w)
    if [[ "$count" -lt 8 ]]; then
        echo "Error: Expected at least 8 InfiniBand devices, found $count"
        exit 1
    fi
    failed_devices=$(failed_devices)
    if [[ -z "$failed_devices" ]]; then
        # All devices healthy: dump link details for every port, then pass.
        for d in $ibports; do
            echo "================ Port $d ================"
            mlxlink -d "$d" -c -e -m
        done
        echo -e "All mlx devices are ACTIVE and LinkUp.\n$count devices:\n$ibports"
        exit 0
    else
        # Some devices unhealthy: dump link details for those ports only, then fail.
        for d in $failed_devices; do
            echo "================ Port $d ================"
            mlxlink -d "$d" -c -e -m
        done
        exit 1
    fi
}
main
The script runs for only about 2 minutes, so it does not hit the 5-minute timeout, yet it is still marked as failed in the end.
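To double-check which exit status a script like this actually hands back, one option is an EXIT trap that logs the status just before the shell exits. The following is only a minimal sketch: `report_exit` and `run_checks` are hypothetical names used for illustration, `run_checks` stands in for the real test logic, and nothing here is specific to MAAS.

```shell
#!/bin/bash
# Sketch only: report_exit and run_checks are hypothetical names, not MAAS APIs.
set -euo pipefail

# Runs when the shell exits; $? still holds the status the script is exiting with.
function report_exit() {
    local status=$?
    echo "script exiting with status ${status}" >&2
}
trap report_exit EXIT

# Stand-in for the real test logic (setUp, device checks, mlxlink output).
function run_checks() {
    echo "checks would run here"
}

run_checks
```

Because the trap also fires when `set -e` aborts the script early, the logged status shows whether the run really reached `exit 0` or stopped at an earlier failing command.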