Feature request: Health check endpoint

szeestraten · 12 February 2021 09:57

Hi, as MAAS does not seem to have any official support for health monitoring, I would like to suggest adding a health check endpoint to the API, similar to Grafana’s /api/health or Prometheus’s /-/healthy.

This is different from the existing Prometheus metrics endpoint /MAAS/metrics, which reports only metrics and not health status.

jgoetz417 · 4 August 2021 20:04

Hello!

Just adding a +1 to this feature request. I’m looking for a way to monitor the following health statuses for both the rack and region controllers through some kind of metrics output:

regiond status
bind9 status
ntp status
proxy status
syslog status
rackd status
tftp status
dhcpd status
image sync status

All of these are currently available in the web UI under Controllers > Controller summary. It’d be great to be able to access this data via the CLI or some other method (our use case would use Zabbix for health monitoring).

sagor999 · 5 October 2021 18:55

hello,
+1 and bump. would love to see this feature added as well. thank you!

jgoetz417 · 5 October 2021 19:14

Hello!

Just to add to this post for others looking: I was able to get the health status check working via the MAAS CLI. It’s not a direct health check endpoint, but it’s a good workaround for those looking for solutions.

To get this working, we have a system account on the MAAS server created specifically for monitoring through the API. Once the maas CLI is set up on your server, you can pull JSON output containing all of the MAAS service statuses. I used jq to parse out the service-specific statuses with the following command:

maas YOUR_CLI_USERNAME rack-controllers read | jq '.[].service_set'

The above command produces the following output:

[
  {
    "name": "regiond",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "syslog_region",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "bind9",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "proxy",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "ntp_region",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "tftp",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "dns_rack",
    "status": "unknown",
    "status_info": "managed by the region"
  },
  {
    "name": "ntp_rack",
    "status": "unknown",
    "status_info": "managed by the region"
  },
  {
    "name": "http",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "rackd",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "dhcpd",
    "status": "running",
    "status_info": ""
  },
  {
    "name": "dhcpd6",
    "status": "off",
    "status_info": ""
  },
  {
    "name": "syslog_rack",
    "status": "unknown",
    "status_info": "managed by the region"
  },
  {
    "name": "proxy_rack",
    "status": "unknown",
    "status_info": "managed by the region"
  }
]

From the above, we were able to parse out the health data we needed and feed it to our monitoring system.
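For anyone who wants to script that parsing step, here is a minimal Python sketch of one way it might be done. It is not what we actually run. It assumes the full (un-filtered) JSON from `rack-controllers read` is a list of controller objects, each with a `service_set` list like the one above and a `hostname` field (the `hostname` field is an assumption, not shown in the output above). The function name `find_unhealthy`, the set of statuses treated as acceptable, and the sample data (including the "dead" status) are all hypothetical choices for illustration.

```python
import json

# Assumption: which statuses count as healthy is a local policy choice.
# "off" and "unknown" are accepted here because the sample output above
# shows them for services that are disabled or managed by the region.
ACCEPTABLE_STATUSES = {"running", "off", "unknown"}


def find_unhealthy(controllers):
    """Return (hostname, service, status, status_info) tuples for every
    service whose status is not in ACCEPTABLE_STATUSES."""
    problems = []
    for controller in controllers:
        # Assumption: controller objects carry a "hostname" field.
        host = controller.get("hostname", "unknown-host")
        for service in controller.get("service_set", []):
            if service["status"] not in ACCEPTABLE_STATUSES:
                problems.append(
                    (host, service["name"], service["status"],
                     service.get("status_info", ""))
                )
    return problems


# Hypothetical sample data shaped like the output above.
sample_json = """
[
  {
    "hostname": "rack-1",
    "service_set": [
      {"name": "rackd", "status": "running", "status_info": ""},
      {"name": "dhcpd", "status": "dead", "status_info": "example failure"},
      {"name": "dhcpd6", "status": "off", "status_info": ""}
    ]
  }
]
"""

for host, name, status, info in find_unhealthy(json.loads(sample_json)):
    print(f"{host}: {name} is {status} ({info})")
```

In practice the sample string would be replaced by the output of the maas command, for example by reading it with `json.load(sys.stdin)` and piping the CLI output into the script. The resulting list can then be turned into whatever format the monitoring system expects.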

Hope this helps someone else looking for a similar solution.