Hi there.
TL;DR
How can I debug my node's storage JSON without having to re-commission the node?
Background
I’ve been writing a custom commissioning script to output extra-storage JSON, as documented on this thread. This is working well, EXCEPT that every change I make to the commissioning script requires a full re-commission of the node before I can test it. Since each commissioning run boots the node and runs all the scripts, each test iteration takes a long time.
Hence the question: can I interact with MAAS directly to apply the JSON to the node, without re-commissioning?
What I’ve found so far: this thread, which shows how to invoke the MAAS Python shell directly. Here’s where I’m at now:
# snap run --shell maas -c 'maas-region shell'
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from maasserver.models import Machine
>>> machine = Machine.objects.get(hostname="mynode")
>>> node = machine.as_node()
>>> node
<Node: bw7exd (mynode)>
>>> from maasserver.storage_layouts import CustomStorageLayout
>>> CustomStorageLayout
<class 'maasserver.storage_layouts.CustomStorageLayout'>
>>> l = CustomStorageLayout(node)
>>> l.boot_disk
<PhysicalBlockDevice: TOSHIBA MG03ACA1 S/N 76P7KLUDF 1.0 TB attached to bw7exd (mynode)>
I feel like I’m really close. How do I feed the JSON from the 50-maas-01-commissioning step into this code?
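One hedged sketch of a next step, continuing the shell session above: the traceback further down shows that post-processing calls maasserver.storage_custom.get_storage_layout() with the parsed config and raises ConfigError on bad input, so you may be able to validate your JSON in the same shell without re-commissioning. The file path here is hypothetical, and I haven't confirmed whether an "apply" helper exists in this module for your MAAS version (worth checking with dir() on maasserver.storage_custom):

```python
>>> import json
>>> from maasserver.storage_custom import ConfigError, get_storage_layout
>>> with open("/tmp/extra-storage.json") as f:   # hypothetical path to your script's output
...     config = json.load(f)
...
>>> try:
...     layout = get_storage_layout(config)      # same call the traceback shows
...     print("layout parsed OK")
... except ConfigError as e:
...     print(f"config error: {e}")
...
```

Note this only exercises the parsing/validation step; per the traceback, actually writing the layout to the node's block devices happens inside _update_node_physical_block_devices in hooks.py, which also needs the LXD resources data, so fully applying the layout may still require a commissioning run.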
Even more background
Past runs with JSON configuration errors produced a stack trace in /var/snap/maas/common/log/regiond.log. Here’s an example:
2023-07-06 19:38:15 metadataserver.api: [critical] mynode.mynet(bw7exd): commissioning script '50-maas-01-commissioning' failed during post-processing.
Traceback (most recent call last):
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 860, in signal
target_status = process(node, request, status)
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 682, in _process_commissioning
self._store_results(
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 565, in _store_results
script_result.store_result(
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/models/scriptresult.py", line 372, in store_result
signal_status = try_or_log_event(
--- <exception caught here> ---
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/api.py", line 483, in try_or_log_event
func(*args, **kwargs)
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 1123, in process_lxd_results
_process_lxd_resources(node, data)
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 637, in _process_lxd_resources
storage_devices = _update_node_physical_block_devices(
File "/snap/maas/27405/lib/python3.10/site-packages/metadataserver/builtin_scripts/hooks.py", line 881, in _update_node_physical_block_devices
custom_layout = get_storage_layout(custom_storage_config)
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 138, in get_storage_layout
entries = _get_storage_entries(config["layout"])
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 493, in _get_storage_entries
entries = _flatten(config)
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 315, in _flatten
items.extend(flattener(name, data))
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 196, in _flatten_disk
items.extend(_disk_partitions(name, data.get("partitions", [])))
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 284, in _disk_partitions
size=_get_size(part["size"]),
File "/snap/maas/27405/lib/python3.10/site-packages/maasserver/storage_custom.py", line 554, in _get_size
raise ConfigError(f"Invalid size '{size}'")
maasserver.storage_custom.ConfigError: Invalid size '50GB'
This was really helpful for figuring out why my JSON was causing errors.
Even after fixing all of those errors, sometimes my storage JSON still would not apply: after commissioning, the node would end up with a blank storage layout. I had to re-commission the node repeatedly and guess blindly at the cause. In my case, I had allocated more space to my partitions than the node’s hardware could hold. Unfortunately, there is no stack trace or logged error when this happens; it just fails silently. (Note to devs: could this error be made more visible somehow?)
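As a stopgap for that silent "partitions don't fit" failure, here is a minimal standalone pre-flight check I could imagine running before commissioning. It is a sketch built entirely on assumptions inferred from the traceback above: that the layout maps disk names to entries carrying a "partitions" list of {"size": ...} dicts, that the size grammar is a number plus a K/M/G/T suffix (since MAAS rejected '50GB'), and that suffixes are binary multiples; your MAAS version may differ on any of these.

```python
import re

# Binary multiples are an assumption; MAAS may interpret suffixes decimally.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(size):
    """Parse a size string like '50G' into bytes.

    The grammar (number + K/M/G/T, no trailing 'B') is inferred from
    MAAS rejecting '50GB' in the traceback above.
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT])", size.strip())
    if m is None:
        raise ValueError(f"Invalid size '{size}'")
    return int(float(m.group(1)) * SUFFIXES[m.group(2)])

def check_fit(layout, disk_sizes):
    """Warn about disks whose declared partitions exceed the real capacity.

    layout: the 'layout' mapping from the custom storage JSON, where a
            disk entry may carry a 'partitions' list of {'size': ...} dicts.
    disk_sizes: {disk_name: capacity_in_bytes} taken from the actual node.
    """
    warnings = []
    for name, entry in layout.items():
        parts = entry.get("partitions")
        if not parts or name not in disk_sizes:
            continue
        total = sum(parse_size(p["size"]) for p in parts)
        if total > disk_sizes[name]:
            warnings.append(
                f"{name}: partitions need {total} bytes, "
                f"disk only has {disk_sizes[name]}"
            )
    return warnings
```

Feeding it the node’s real disk sizes (from the MAAS UI or lsblk on the node) would flag oversize layouts before burning a commissioning cycle on them.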
That brings me to the situation that prompted this post: I’m using trial and error to find my maximum partition sizes, and because every change requires a re-commission, this is very time-consuming.