Automated OS Image Testing

Adding new tests

Testing a feature

The tests for each image live in systemtests.tests_per_machine:test_full_circle, inside the image_to_test conditional block that wraps the optional test_images step. Image tests have long execution times and are only meaningful within the image testing pipeline.

There are two context managers specific to image tests that will help in your journey here:

  • report_feature_tests: creates a logger for this feature and reports a pass/fail status depending on whether an exception is thrown inside its with block.
  • release_and_redeploy_machine: releases a machine and executes the code in the with block, ensuring the machine is redeployed afterwards, even if an exception is thrown during the with block's execution.
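
To make the behaviour of these two context managers concrete, here is a minimal sketch of how they could be written with contextlib. It is not the project's implementation: the function bodies, the FakeClient class, and its release and deploy methods are assumptions made for illustration, as is the choice to swallow the exception after logging it. Only the two context manager names and the log messages come from this guide.

```python
import logging
from contextlib import contextmanager


@contextmanager
def report_feature_tests(testlog, feature_name):
    # Child logger, so messages are named "<test name>.<feature name>".
    logger = testlog.getChild(feature_name)
    logger.info("Starting test")
    try:
        yield logger
    except Exception as exc:
        # Assumption: log and swallow, so the remaining features still run.
        logger.error(exc)
        logger.info("FAILED")
    else:
        logger.info("PASSED")


@contextmanager
def release_and_redeploy_machine(client, machine, **deploy_kwargs):
    client.release(machine)  # hypothetical client method
    try:
        yield
    finally:
        # Runs even when the with block raises.
        client.deploy(machine, **deploy_kwargs)  # hypothetical client method


class FakeClient:
    """Stand-in for the MAAS API client; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def release(self, machine):
        self.calls.append("release")

    def deploy(self, machine, **kwargs):
        self.calls.append("deploy")


def run_demo(fail):
    client = FakeClient()
    testlog = logging.getLogger("demo")
    with report_feature_tests(
        testlog, "storage layout flat"
    ), release_and_redeploy_machine(client, {"system_id": "abc123"}):
        client.calls.append("configure")
        if fail:
            raise RuntimeError("layout rejected")
    return client.calls
```

In this sketch, run_demo(fail=True) and run_demo(fail=False) record the same release, configure, deploy sequence: the inner context manager redeploys in its finally clause before the outer one logs the failure.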

We’ll use the storage configuration as an example of how to set up a new feature test and have it report in the Temporal workflow.

As described above, the steps for adding a new storage configuration to a machine in MAAS are:

  • release the machine if it is already deployed
  • set the storage layout, passing in the required configuration parameters
  • (re)deploy the machine
  • check the machine deployed correctly

And of course, we want to repeat this for each storage layout we wish to test, i.e. flat, lvm, and bcache:

testable_layouts = ["flat", "lvm", "bcache"]
for storage_layout in testable_layouts:
    with report_feature_tests(
        testlog, f"storage layout {storage_layout}"
    ), release_and_redeploy_machine(
        maas_api_client,
        machine,
        osystem=deploy_osystem,
        oseries=deploy_oseries,
        timeout=TIMEOUT,
    ):
        maas_api_client.create_storage_layout(machine, layout_type=storage_layout, options={})

deploy_osystem, deploy_oseries, and TIMEOUT are variables set when initially deploying the machine used in the test. create_storage_layout executes the following command for the machine under test:

machine set-storage-layout machine["system_id"] storage_layout={layout_type}

This would give log outputs similar to the following, depending on execution state:

  • [<test name>].storage layout flat: Starting test
  • [<test name>].storage layout flat: <error message>
  • [<test name>].storage layout flat: FAILED
  • [<test name>].storage layout flat: PASSED

Reporting the feature

Okay, great, we have a feature we can test. We still need to let Temporal know this is a feature we should report.
This requires extending the image_reporting_workflow, specifically the parse_test_results activity, as that is where we parse the test results into a format understood by the subsequent activities.

We’ll make use of a few functions and classes here:

  • get_step_from_results: returns only the log segment corresponding to the requested test step. test_full_circle has multiple test steps (enlist, metadata, commission, deploy, test_image, rescue), many of which are not necessary for the feature being parsed.
  • determine_feature_state: searches the supplied log for the feature name, followed by the regex :?\s(\w+):?\s(?:\-\s)?([A-Z]{4,}).
    This matches <feature_name> <feature_type>: <feature_state> (e.g. storage layout flat: PASSED), returning the feature_type and feature_state, as well as any errors if they are present.
  • FeatureStatus: a dataclass that neatly wraps the reported state of a tested feature.

All of the above also scan every machine architecture used in the test and compile a combined result set.

if image_tests := get_step_from_results(this_image_result, "test_image"):
    if storage_state := determine_feature_state("storage layout", image_tests):
        info, summary = storage_state
        storage_conf = FeatureStatus(
            "Storage Configuration",
            info=info,
            summary=summary,
        )
        image_results.storage_conf = storage_conf

The image_tests conditional block is shared between all features that need to parse the test_image test step.
Additionally, image_results is a variable holding the ImageTestResults dataclass, which wraps the entire testing results of a specific image, including all of its tested features. This dataclass contains the operations necessary to convert itself to a dictionary so it can be committed to the results repo.
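
To see what the pattern quoted above captures, here is a standalone sketch that applies it to a few sample log lines. The helper find_feature_states and the sample log are hypothetical and exist only for this illustration; the real determine_feature_state returns more than this helper does and is not reproduced here.

```python
import re

# Suffix pattern quoted in this guide; the feature name is prepended to it.
STATE_SUFFIX = r":?\s(\w+):?\s(?:\-\s)?([A-Z]{4,})"


def find_feature_states(feature_name, log_text):
    """Return {feature_type: feature_state} for every matching log line."""
    pattern = re.compile(re.escape(feature_name) + STATE_SUFFIX)
    return {ftype: state for ftype, state in pattern.findall(log_text)}


sample_log = "\n".join(
    [
        "[test].storage layout flat: Starting test",
        "[test].storage layout flat: PASSED",
        "[test].storage layout lvm: Starting test",
        "[test].storage layout lvm: FAILED",
    ]
)
states = find_feature_states("storage layout", sample_log)
```

The "Starting test" lines are skipped because the final group requires at least four consecutive upper-case letters, so only the PASSED and FAILED lines contribute a state.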

Executing new code

If workers contain a build_id, then congratulations, your workflow now utilizes worker versioning. If they don’t, ignore this section until they do.

There are two locations that need to be updated:

  • worker_versions.yaml: this YAML file tells the workers which build versions are current and compatible with each other.
  • workers: each worker should contain a build_id with the most up-to-date version required for its workflow.

The versioning in this YAML uses a trimmed-down equivalent of semver, that is:

  • 1.x is compatible with any other 1.x. If a worker has version 1.2 but the workflow is 1.3, the worker will continue to work unless a 1.3 worker comes along.
  • 2.x is not compatible with 1.x. If a worker has version 1.2 but the workflow is 2.3, the worker will terminate itself after completing its next workflow, even if no other workers exist.

tl;dr: incompatible workers terminate after completing their next workflow regardless of what else is running; compatible but older workers terminate only once a newer compatible worker exists.
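
The compatibility rules above can be written down as a small decision function. This is a sketch of the rules as stated in this section, not the workers' actual code: should_terminate, parse_version, and the newer_worker_available flag are hypothetical names, and versions are passed as strings so that something like "1.10" is not misread as a float.

```python
def parse_version(version):
    """Split a 'major.minor' string into a tuple of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def should_terminate(worker_version, workflow_version, newer_worker_available):
    """Decide whether a worker should shut down after its next workflow."""
    worker_major, worker_minor = parse_version(worker_version)
    flow_major, flow_minor = parse_version(workflow_version)

    if worker_major != flow_major:
        # Incompatible major version: terminate even if no other worker exists.
        return True
    if worker_minor < flow_minor and newer_worker_available:
        # Compatible but outdated: step aside once a newer worker can take over.
        return True
    return False
```

For example, should_terminate("1.2", "1.3", False) is False because the 1.2 worker is still compatible and nothing newer exists, while should_terminate("1.2", "2.3", False) is True.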

We added extra features to the image_reporting workflow that we really want all workers to pick up. In other words, it’s effectively a breaking change. Only the image_reporting workflow is affected, however, so we only need to update the image_reporting section of the versions dictionary.

---
queues:
  - e2e_tests
  - image_building
  - image_testing
  - image_reporting
  - mono_queue

versions:
+ image_reporting:
+   - 2.0
  default:
    - 1.0
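
As a rough illustration of how this file might be consumed, the sketch below mirrors the versions mapping after the change as a Python dictionary and looks up the versions for a queue. It assumes that queues without their own entry fall back to default; that fallback, and the helper name current_versions, are assumptions rather than documented behaviour.

```python
# The versions mapping from the YAML above, after adding image_reporting.
versions = {
    "image_reporting": [2.0],
    "default": [1.0],
}


def current_versions(queue, versions):
    """Return the versions for a queue, falling back to 'default' (assumed)."""
    return versions.get(queue, versions["default"])


reporting = current_versions("image_reporting", versions)
testing = current_versions("image_testing", versions)
```

Under that assumption, only image_reporting workers see the 2.0 requirement, and every other queue keeps resolving to 1.0.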

We also need to modify the worker to accept this new versioning:

from common_tasks import start_worker
from image_reporting_workflow import activities as image_reporting_activities
from image_reporting_workflow import workflows as image_reporting_workflows

if __name__ == "__main__":
    start_worker(
        task_queue="image_reporting",
        workflows=image_reporting_workflows,
        activities=image_reporting_activities,
-       build_id=1.0,
+       build_id=2.0,
    )

TL;DR

Extend test_machine:test_full_circle to include something like the following in the image_tests condition block:

for $feature_type in $list_of_supported_feature_types:
    with report_feature_tests(
        testlog, f"$feature_name {$feature_type}"
    ):
        $Test_the_feature_here

And extend image_reporting_workflow:parse_test_results to include something like the following:

if image_tests := get_step_from_results(this_image_result, "test_image"):
    if $feature_state := determine_feature_state("$feature_name", image_tests):
        state, readable, info = $feature_state
        $feature_results = FeatureStatus(
            "$feature_name",
            state=state,
            readable_state=readable,
            info=info,
        )
        image_results.$feature_name = $feature_results

(Where everything preceded by a $ should be named appropriately, of course)

Modify worker_versions.yaml and the affected worker to reflect the changes in the workflow and force running workers to switch to the new version.