Unable to update storage size if node is in deployed state

aluria · 24 August 2020 07:49

Hi,

Last week, I filed a bug (LP#1892384) about a known issue that affects day 2 operations. I see this topic has been discussed before, and I wanted to raise the question of why this type of change is not allowed in MAAS.

The use case I mentioned in the bug is one where a KVM deployed via Pods needed to be vertically scaled (not enough storage for the current amount of Prometheus metrics). The changes can be made without downtime, using virsh on the host and a Linux terminal inside the KVM:

# get the target to resize; this lists each target and its source on the host
virsh domblklist $domain_name

# to check current block info
virsh domblkinfo $domain_name $drive_target

# resize drive ($size is read as KiB when no unit suffix is given; append B for bytes)
virsh blockresize $domain_name $drive_target $size

# this should give you a confirmation of the block device resize
virsh domblkinfo $domain_name $drive_target

# then, from the Linux console
growpart $device_path $partition_num

# Check if the change was applied to the block device
lsblk

# Then resize the filesystem to the new size (e.g. $device_path_and_num is /dev/vda1; resize2fs applies to ext2/3/4)
resize2fs $device_path_and_num

# Confirm new capacity is usable
df -h
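
As a side note on the blockresize step: since the size is read as KiB when no suffix is given, a small helper can make the conversion explicit. This is only a sketch. The function gib_to_kib and the example values (prometheus-vm, vda, 100 GiB) are hypothetical and not taken from the bug report. It prints the command for review instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to KiB, the unit
# virsh blockresize assumes when no suffix is given.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example values (assumptions; adjust to your environment).
domain_name="prometheus-vm"
drive_target="vda"
new_size_gib=100

# Dry run: print the command so it can be checked before executing it.
echo "virsh blockresize $domain_name $drive_target $(gib_to_kib "$new_size_gib")"
```

With the example values above, the size passed to blockresize is 104857600 KiB.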

It seems wrong that the KVM runs with a new “flavor” while MAAS still records the old one (and a future redeploy would use the old “flavor” again). Furthermore, if libvirt/qemu allows “hot” changes (without downtime), MAAS should allow them too. There are cases where redeploying into a new KVM with an upgraded “flavor” is not an option:

- Hosting machine constraints: the machine can accommodate a resize but cannot host a new KVM.
- Stored data would need to be moved to the new KVM.
- The running service does not support HA, so downtime would be expected.

On the other hand, and as mentioned in the reported bug, I think this type of change should not be restricted to KVMs. Common day 2 operations related to storage involve replacing failed disks without downtime. I understand this scenario would be less common, as the disk may run on top of LVM or RAID (so the disk UUID does not change; although there are storage layouts where disks are used as JBOD without any abstraction, such as Ceph OSDs or Swift nodes), and replaced disks are usually of the same size.

Kind regards,
-Alvaro.