Monitor the connection status of every rack to every region

Hi guys, I’m using the /metrics endpoint that MAAS provides on regions and racks, and I’ve got all the metrics into my Grafana.

I’m looking for a metric to make sure the RPC communication between racks and regions is fine. For example, when a rack fails, how can I tell which side the failure is on: the rack or the region? I want to see the region-rack communication to troubleshoot this case.

I assume you are interested in knowing, in general, whether the rack is “connected” to the region, meaning that it is receiving/sending commands regardless of their execution status (failure or success).
With a change that will ship in 3.5, you’ll get service availability in the metrics (including rack “status”).

Otherwise, on every rack you can monitor maas_rack_region_rpc_call_latency_count{call="Ping"}, which counts the periodic ping the rack sends to the region every 30 seconds. If this counter stops increasing, treat it as a warning.
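For anyone who wants to check this outside Grafana, here is a minimal sketch of the "is the Ping counter still increasing?" test, written against two scrapes of a rack's /metrics text. Only the metric name and the call="Ping" label come from the reply above; the function names, the simple line parser, and the counter-reset handling are my own assumptions, and the other labels on the series may differ between MAAS versions (the sketch ignores them). It assumes the two scrapes are taken more than 30 seconds apart, so at least one ping should land in between. In Prometheus itself you would normally express the same idea as an alert on increase() of that series over a window comfortably longer than 30 seconds.

```python
import re

# Matches a sample line of the rack's RPC call counter, e.g.
#   maas_rack_region_rpc_call_latency_count{call="Ping",...} 42.0
# Only the call="Ping" label is relied on; any other labels are accepted.
_COUNTER_LINE = re.compile(
    r'^maas_rack_region_rpc_call_latency_count\{([^}]*)\}\s+(\S+)'
)
_PING_LABEL = re.compile(r'(^|,)\s*call="Ping"')


def ping_count(metrics_text: str) -> float:
    """Sum the Ping counter across all matching series in one /metrics scrape."""
    total = 0.0
    for line in metrics_text.splitlines():
        match = _COUNTER_LINE.match(line.strip())
        if match and _PING_LABEL.search(match.group(1)):
            total += float(match.group(2))
    return total


def ping_stalled(previous: str, current: str) -> bool:
    """Return True if no Ping was recorded between two scrapes (hypothetical helper).

    Assumes the scrapes are more than 30 s apart (the ping interval).
    """
    before, after = ping_count(previous), ping_count(current)
    if after < before:
        # Counter went backwards: treat it as a reset (e.g. the rack process
        # restarted) and count the new value as the increase.
        return after == 0
    return after == before
```

For example, ping_stalled on a scrape showing the counter at 10 followed by one showing 12 returns False, while two identical scrapes return True. This only tells you that a rack stopped pinging; as discussed below, it does not say which region the rack was talking to.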

With this metric I cannot identify which region my rack is talking to, so I can’t tell at which point the communication fails. For example, I have 3 regions and 3 racks: if one of my racks fails to communicate with a region, which region is it?

AFAIK this is not available at the moment. I’m marking this as a community feature request.
