MAAS Agent and Temporal User Documentation

Introduction

What is MAAS Agent?

MAAS Agent, short for Metal-as-a-Service Agent, is a new component designed to enhance the functionality of the Rack Controller. Unlike a mere rewrite, the MAAS Agent is a complete redesign that meets the modern requirements of the system. It is intended to run on Top of Rack (ToR) switches, where resources like CPU and RAM are limited. This redesigned component is implemented as a single binary in the Go programming language. As of now, MAAS Agent is managed by the Rack Controller and serves to extend MAAS’s capabilities.

What is Temporal?

Temporal is a microservice orchestration platform that lets developers build scalable applications without compromising productivity or reliability. In MAAS, Temporal provides the underlying infrastructure for executing tasks and workflows. It is embedded within MAAS in the same way as other external services, and it executes the MAAS tasks that require communication between the Rack and Region controllers.

MAAS Agent

Overview

The MAAS Agent is a significant step forward for the MAAS ecosystem. It is a modernized version of the Rack Controller and a vital component for meeting the system's evolving requirements.

Purpose and Benefits

The primary objective of the MAAS Agent is to improve performance in resource-restricted environments, particularly ToR switches. It is implemented as a single binary in the Go programming language, which lets it run efficiently with limited CPU and RAM.

Architecture

The MAAS Agent runs as a daemon managed by the Rack Controller. This arrangement extends the functionality and capabilities of the Rack Controller without adding complexity to it.

Integration with Rack Controller

The MAAS Agent is integrated into the Rack Controller’s ecosystem. It runs as a daemon, which lets it interact with the rest of the MAAS infrastructure so that the Agent’s capabilities can be used effectively.

Running MAAS Agent as a Daemon

MAAS Agent runs as a daemon, providing continuous background services. To inspect and debug MAAS Agent, view its logs with the command that matches your installation type:

For deb:

journalctl -u maas-agent

For snap:

journalctl -u snap.maas.pebble -t maas-agent

Inspection and Debugging

To inspect MAAS Agent and diagnose issues, read its logs with journalctl, using the unit (-u) and identifier (-t) flags shown above for deb or snap installations. The output contains the log entries MAAS Agent produced, which you can use for troubleshooting.
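If you switch between deb and snap machines often, a small wrapper can pick the right journalctl invocation for you. The sketch below is a hypothetical helper, not something shipped with MAAS; `maas_agent_logs` is an illustrative name. It only wraps the two commands above and passes any extra arguments (for example the standard journalctl options `-f`, `--since "1 hour ago"` or `-p warning`) straight through.

```bash
#!/usr/bin/env bash
# Hypothetical convenience wrapper (not part of MAAS) around the journalctl
# commands for MAAS Agent. Usage: maas_agent_logs deb|snap [journalctl options...]
maas_agent_logs() {
  local install_type="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi
  case "$install_type" in
    deb)
      # Filter by the maas-agent systemd unit
      journalctl -u maas-agent "$@"
      ;;
    snap)
      # Filter by the snap's pebble unit, then by the maas-agent syslog identifier
      journalctl -u snap.maas.pebble -t maas-agent "$@"
      ;;
    *)
      echo "usage: maas_agent_logs deb|snap [journalctl options...]" >&2
      return 2
      ;;
  esac
}
```

For example, `maas_agent_logs snap -f` follows the snap's MAAS Agent log live, and `maas_agent_logs deb --since "1 hour ago"` shows only the last hour on a deb install.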

Temporal

Temporal overview

Temporal is a microservice orchestration platform that empowers developers to build scalable applications while maintaining productivity and reliability. In the context of MAAS, Temporal serves as the backbone for executing tasks and workflows.

Role in MAAS

Temporal is embedded within MAAS, where it handles communication and orchestration between components. It executes the MAAS tasks that require communication between the Rack and Region controllers.

Why Temporal?

Temporal replaces the existing RPC communication mechanism with reliable transport and durable workflows, which improves how MAAS handles tasks that involve communication between controllers. Because Temporal persists workflow progress, it can retry failed tasks and replay workflows, even if an executor crashes.

Key Features

Key features of Temporal include:

  • Reliable transport for communication between controllers.
  • Durable workflows whose progress is persisted, so tasks complete reliably and consistently.
  • Built-in mechanisms for task retries and replaying workflows.
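Temporal's retries are driven by a retry policy with exponential backoff: the wait before retry n is min(initial × coefficient^(n−1), maximum interval). The sketch below prints that schedule. It assumes Temporal's documented SDK defaults (initial interval 1 second, backoff coefficient 2, maximum interval 100 × the initial interval); MAAS may configure its own values, and `temporal_retry_delays` is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print the wait (seconds) before each of the first N retries under an
# exponential-backoff retry policy. Defaults assume Temporal's documented SDK
# defaults; MAAS's actual retry settings are not covered here.
# Usage: temporal_retry_delays [initial] [coefficient] [maximum] [retries]
# Integer arithmetic only, so the coefficient must be a whole number.
temporal_retry_delays() {
  local initial="${1:-1}" coefficient="${2:-2}" maximum="${3:-100}" retries="${4:-10}"
  local delay="$initial" i
  for ((i = 1; i <= retries; i++)); do
    # Cap the wait at the maximum interval
    if (( delay > maximum )); then delay="$maximum"; fi
    echo "$delay"
    delay=$(( delay * coefficient ))
  done
}
```

With the defaults, the waits grow 1, 2, 4, ... 64 seconds and then stay at 100 seconds, which is why a transient controller outage is retried quickly at first without flooding the system later.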

Metrics and Monitoring

Temporal exposes its own metrics, which can be collected with Prometheus. These metrics give insight into Temporal’s performance and help confirm that it is operating smoothly.

To access Temporal metrics, use the following URL structure:

  • For Temporal Server Metrics: MAAS_IP:MAAS_PORT/metrics/temporal
  • For Temporal Client Metrics: MAAS_RACK_IP:RACK_PORT
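To check these endpoints by hand before wiring them into Prometheus, a sketch like the following can help. MAAS_IP, MAAS_PORT, MAAS_RACK_IP and RACK_PORT are the placeholders from the list above and must be set to your own values; the `http://` scheme and the helper function names are assumptions for illustration. `metric_names` reads the Prometheus text format and lists each sample name once.

```bash
#!/usr/bin/env bash
# Sketch: build the Temporal metrics URLs and list exposed metric names.
# MAAS_IP, MAAS_PORT, MAAS_RACK_IP, RACK_PORT are placeholders you must set;
# the http:// scheme and these function names are assumptions.
temporal_server_metrics_url() {
  printf 'http://%s:%s/metrics/temporal\n' \
    "${MAAS_IP:?MAAS_IP is not set}" "${MAAS_PORT:?MAAS_PORT is not set}"
}

temporal_client_metrics_url() {
  printf 'http://%s:%s\n' \
    "${MAAS_RACK_IP:?MAAS_RACK_IP is not set}" "${RACK_PORT:?RACK_PORT is not set}"
}

# Read Prometheus text exposition format on stdin and print each sample name
# once: drop comment (# HELP / # TYPE) and blank lines, then strip labels/values.
metric_names() {
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -E 's/[{[:space:]].*//' | sort -u
}
```

For example, `curl -fsS "$(temporal_server_metrics_url)" | metric_names` prints the names of the metrics the Temporal server currently exposes.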

Troubleshooting

Although details about troubleshooting are limited at this time, MAAS aims to expose relevant metrics and logs to aid in diagnosing issues. These metrics and logs will be valuable for monitoring and ensuring Temporal operates as expected.

Temporal Codec Service

What is a Codec Server?

A Codec Server is a service that decodes Workflow Payloads, aiding in troubleshooting and monitoring. While users typically do not interact with Temporal directly, Codec Servers provide a way to inspect workflow arguments and responses.

Why is it Used?

Codec Servers are used when users want to read workflow arguments or responses for troubleshooting purposes. They facilitate the decryption of encrypted binary data using specified algorithms and keys.

Setting Up a Codec Server

Users can set up their Codec Server using provided examples. By running the Codec Server with the necessary parameters, users can decode Workflow Payloads and analyze workflow details.

Using the Codec Server

To use the Codec Server, set the environment variable TEMPORAL_CLI_CODEC_ENDPOINT to the Codec Server’s endpoint and run Temporal’s tctl command-line tool. This lets users decode and inspect encrypted data for troubleshooting purposes.
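A minimal sketch of that workflow, assuming tctl v1 and a Codec Server you have already started: `decode_workflow` and the CODEC_ENDPOINT variable are hypothetical names for this example, and the workflow ID is whatever workflow you are investigating. tctl is pointed at its default Temporal address here; add its own connection options if your setup needs them.

```bash
#!/usr/bin/env bash
# Sketch: show a workflow's history with tctl, which sends payloads to the
# Codec Server named by TEMPORAL_CLI_CODEC_ENDPOINT for decoding.
# CODEC_ENDPOINT and decode_workflow are hypothetical names for this example.
# Usage: decode_workflow <workflow-id> [extra "tctl workflow show" options...]
decode_workflow() {
  local workflow_id="${1:?usage: decode_workflow <workflow-id> [tctl options...]}"
  shift
  # Set the codec endpoint only for this tctl invocation
  TEMPORAL_CLI_CODEC_ENDPOINT="${CODEC_ENDPOINT:?CODEC_ENDPOINT is not set}" \
    tctl workflow show --workflow_id "$workflow_id" "$@"
}
```

Scoping the variable to the single tctl call keeps the codec endpoint from leaking into the rest of your shell session.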

Do we want to mention the Temporal Grafana dashboards (https://github.com/temporalio/dashboards), which can be used for monitoring?

Yeah, not sure I meant to skip those.

Thanks for starting this. It is really needed.

I found some duplicated information. Let me list what I found:

  • MAAS Agent runs on ToR/limited resources is in first section, “purpose and benefits” (maybe they can be merged together, they seem to have a very similar purpose?),
  • MAAS Agent runs as a daemon is in “architecture”, “integration with Rack”, and “running MAAS Agent as a Daemon” (I’d remove the last one - users do not really need to care because we run it - and put the commands for deb and snap under “Inspection and Debugging”)

Also, for “purpose and benefits” and “What is MAAS Agent”: I think the primary purpose ain’t being small and lightweight. However, we do not really talk about what MAAS Agent actually does and why it exists; we only tease that it is “a vital component”. I’d be interested to know more about why it’s vital and its responsibilities.

In addition, what is the intention of “Setting up a Codec Server”? Currently, it does not provide links to the “necessary parameters” or “provided examples”. If we do not want to teach people about this in more detail, the whole “Temporal Codec Server” section could be removed IMHO (also, iiuc, codec servers can have more responsibilities than aiding in debugging). My suggestion would be to make this a subsection of “Troubleshooting”, as in “how to set up/use a codec server to debug Temporal payloads?” (best I can come up with given my shallow understanding 🙂)

Finally, @billwear @troyanov Do you think we should also make clearer that the ToR thing is more of a mid term vision for rackd in general but nothing we do today?

Sorry, that’s a lot of feedback to digest. Hope it helps, though!