Debugging node storage without commissioning

Hi there.

TL;DR

How can I debug my node storage json without having to re-commission a node?

Background

I’ve been writing a custom commissioning script to output extra-storage, as documented on this thread. This is working well, EXCEPT that every change I make to my commissioning script requires re-commissioning the node before I can test it. Since each commissioning run boots the node and runs scripts, each test iteration takes a long time.

Thus the question: can I somehow interact with MaaS directly to try to apply the json to the MaaS node?

What I’ve found so far: this thread, which invokes the MaaS Python code directly. Here’s where I’m at now:

# snap run --shell maas -c 'maas-region shell'
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from maasserver.models import Machine
>>> machine = Machine.objects.get(hostname="mynode")
>>> node = machine.as_node()
>>> node
<Node: bw7exd (mynode)>
>>> from maasserver.storage_layouts import CustomStorageLayout
>>> CustomStorageLayout
<class 'maasserver.storage_layouts.CustomStorageLayout'>
>>> l = CustomStorageLayout(node)
>>> l.boot_disk
<PhysicalBlockDevice: TOSHIBA MG03ACA1 S/N 76P7KLUDF 1.0 TB attached to bw7exd (mynode)>

I feel like I’m really close. How do I provide the json from the 50-maas-01-commissioning step to this code?
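One way to avoid pasting the JSON by hand is to save the 50-maas-01-commissioning output to a file and pull the layout out of it. This is a minimal sketch: the helper names and file path are hypothetical, and it assumes the output is a single JSON object with the custom layout under a top-level "storage-extra" key (the key name used later in this thread).

```python
import json
from pathlib import Path


def extract_storage_extra(output_text):
    """Return the "storage-extra" section of 50-maas-01-commissioning output.

    Assumption: the output is one JSON object and the custom layout is
    stored under a top-level "storage-extra" key.
    """
    data = json.loads(output_text)
    if "storage-extra" not in data:
        raise ValueError("no 'storage-extra' key in commissioning output")
    return data["storage-extra"]


def load_storage_extra(path):
    """Read a saved copy of the commissioning output and extract the layout."""
    return extract_storage_extra(Path(path).read_text())
```

In the region shell this would look like `json_layout = load_storage_extra("/path/to/commissioning-output.json")` (hypothetical path), with `json_layout` then passed to `maasserver.storage_custom.get_storage_layout()`.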

Even more background

Past runs with json configuration errors resulted in a stack trace within /var/snap/maas/common/log/regiond.log. Here’s an example:

2023-07-06 19:38:15 metadataserver.api: [critical] mynode.mynet(bw7exd): commissioning script '50-maas-01-commissioning' failed during post-processing.
        Traceback (most recent call last):
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 860, in signal
            target_status = process(node, request, status)
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 682, in _process_commissioning
            self._store_results(
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 565, in _store_results
            script_result.store_result(
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/models/scriptresult.py", line 372, in store_result
            signal_status = try_or_log_event(
        --- <exception caught here> ---
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 483, in try_or_log_event
            func(*args, **kwargs)
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 1123, in process_lxd_results
            _process_lxd_resources(node, data)
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 637, in _process_lxd_resources
            storage_devices = _update_node_physical_block_devices(
          File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 881, in _update_node_physical_block_devices
            custom_layout = get_storage_layout(custom_storage_config)
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 138, in get_storage_layout
            entries = _get_storage_entries(config["layout"])
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 493, in _get_storage_entries
            entries = _flatten(config)
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 315, in _flatten
            items.extend(flattener(name, data))
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 196, in _flatten_disk
            items.extend(_disk_partitions(name, data.get("partitions", [])))
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 284, in _disk_partitions
            size=_get_size(part["size"]),
          File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 554, in _get_size
            raise ConfigError(f"Invalid size '{size}'")
        maasserver.storage_custom.ConfigError: Invalid size '50GB'

This was really helpful for finding out why my json was causing errors.

Once I had fixed all those errors, my storage json sometimes still would not apply: after commissioning, the node would have a blank storage layout. I had to re-commission the node repeatedly, blindly guessing at the problem. In my case, I had allocated too much space to my partitions, so they could not fit on the node’s hardware. Unfortunately, there is no stack trace or logged error when this happens; it just fails silently (note to devs: can this error be made more obvious somehow?).

Thus I’m in the situation that prompted this post: I am using trial and error to figure out my maximum partition sizes. Because I have to re-commission every time I make a change, this is very time-consuming.
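A rough local pre-check can narrow the guessing before each commissioning run: add up the requested partition sizes for a disk and compare them with the disk's size. This is a sketch under stated assumptions, not MAAS's own parser: the size grammar (a number plus an optional K/M/G/T/P suffix), the decimal unit base and the helper names are all hypothetical, and it ignores partition-table and alignment overhead, so the real headroom is somewhat smaller.

```python
import re

# Assumptions (not taken from MAAS): sizes are a number with an optional
# one-letter K/M/G/T/P suffix, and units are decimal (1G = 10**9 bytes).
UNIT_BASE = 1000
_EXPONENTS = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5}
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)([KMGTP]?)$")


def parse_size(size):
    """Convert a size such as '50G' to bytes; reject anything else."""
    match = _SIZE_RE.match(str(size).strip())
    if match is None:
        raise ValueError(f"Invalid size {size!r}")
    number, unit = match.groups()
    return int(float(number) * UNIT_BASE ** _EXPONENTS[unit])


def remaining_space(disk_bytes, disk_entry):
    """Bytes left on a disk after its requested partitions.

    `disk_entry` is one disk from the "layout" section, e.g.
    {"type": "disk", "partitions": [{"size": "500G"}, ...]}.
    A negative result means the partitions cannot fit. Partition-table
    and alignment overhead is ignored, so real headroom is smaller.
    """
    partitions = disk_entry.get("partitions", [])
    used = sum(parse_size(part["size"]) for part in partitions)
    return disk_bytes - used
```

For a disk of exactly 10**12 bytes with partitions of `500G` and `600G`, `remaining_space` returns -100000000000, i.e. over-allocated by 100G. In the region shell session above, the boot disk's size in bytes should be available as `l.boot_disk.size`.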

Hi @greenmoss!

As you have noticed, we do a static validation of the json you provide using jsonschema. The equivalent check can be run by hand like this:

from maasserver.storage_custom import _validate_schema

layout = {
  "layout": {
    "storage": {
      "type": "raid",
      "level": 5,
      "members": [
        "sda",
        "sdb",
        "sdc"
      ],
      "spares": [
        "sdd",
        "sde"
      ],
      "fs": "btrfs"
    }
  },
  "mounts": {
    "/data": {
      "device": "storage"
    }
  }
}
_validate_schema(layout)

This is only useful as a preflight check to catch syntax errors. I don’t think there is a quicker way for you to test applying that configuration.

The other resource we have is https://maas.io/docs/storage-layouts-reference which lists the supported options.

Hope this helps

Thanks for your reply!

Something in the code is silently rejecting my extra-storage json. It’s definitely not a schema validation failure: I have hit those violations before, fixed them all, and no longer get any validation failures.

The current failure is silent, but it is definitely caused by exceeding the available storage blocks. I can prove this by increasing the requested partition sizes: once I do that and recommission, the MaaS web UI shows that the node has reverted to “No storage (blank) layout”. I can also see from the output of 50-maas-01-commissioning that my extra-storage was included.

So the question then becomes: what in the code is rejecting the extra-storage, and reverting to “No storage (blank) layout”?

SOLVED

I pulled the storage-extra JSON section from the commissioning output into the shell, and it applied successfully to the machine:

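import json
import maasserver.storage_custom
# `machine` is the Machine fetched earlier; paste the storage-extra JSON below.
json_layout = json.loads('{"layout":{<trimmed for brevity>}')
storage_layout = maasserver.storage_custom.get_storage_layout(json_layout)
maasserver.storage_custom.apply_layout_to_machine(storage_layout, machine)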

I then edited one of the partitions in the storage-extra JSON to deliberately exceed the available disk space. When I try to apply the new JSON using the steps above, I get this error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 164, in apply_layout_to_machine
    apply_layout(machine, entry, block_devices)
  File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 346, in _apply_layout_partition
    block_devices[entry.name] = models.Partition.objects.create(
  File "/snap/maas/27405/usr/lib/python3/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/snap/maas/27405/usr/lib/python3/dist-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/models/partition.py", line 226, in save
    return super().save(*args, **kwargs)
  File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/models/cleansave.py", line 46, in save
    self.full_clean(exclude=exclude_clean_fields, validate_unique=False)
  File "/snap/maas/27405/usr/lib/python3/dist-packages/django/db/models/base.py", line 1251, in full_clean
    raise ValidationError(errors)
django.core.exceptions.ValidationError: {'size': ['Partition cannot be saved; not enough free space on the block device.']}

For my purposes, this is exactly what I want. I want to fiddle with the disk sizes to remove wasted/extra blocks, without a costly recommissioning step.
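Since each attempt now takes seconds instead of a full commissioning run, the search for the largest partition size that still fits could be automated with a binary search. This sketch assumes a yes/no `fits(size)` check that is monotonic (if a size fits, every smaller size also fits). The MAAS-side predicate in the comments is hypothetical and untested: `make_config()` stands for your own code, it assumes sizes like `900G` are accepted, and a successful apply does change the machine's stored layout.

```python
def largest_fitting_size(fits, low, high):
    """Return the largest integer size in [low, high] for which fits(size) is True.

    Assumes `fits` is monotonic: if a size fits, every smaller size fits too.
    Returns None if even `low` does not fit. Calls `fits` O(log(high - low)) times.
    """
    if not fits(low):
        return None
    if fits(high):
        return high
    # Invariant: fits(low) is True and fits(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low


# Hypothetical MAAS-side predicate for the region shell (untested sketch);
# make_config() is your own code that builds the storage-extra dict with
# the candidate partition size in gigabytes:
#
#   from django.core.exceptions import ValidationError
#   import maasserver.storage_custom as storage_custom
#
#   def fits(size_gb):
#       config = make_config(f"{size_gb}G")
#       try:
#           layout = storage_custom.get_storage_layout(config)
#           storage_custom.apply_layout_to_machine(layout, machine)
#       except ValidationError:
#           return False
#       return True
```

For example, `largest_fitting_size(fits, 1, 1000)` would probe about ten sizes instead of stepping through them one at a time.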

If there’s a way to do this from the API, or from the web GUI, please let me know. Otherwise, I consider this solved.
