Using MAAS 2.9.2 (deb), deploying a machine - physical or virtual - no longer works. It pops back to Allocated with "Failed to allocate the required AUTO IP addresses". There are about 100 IPs available. The logs show this:
```
Next IP address to allocate from '146.118.52.0/23' has been observed previously: 146.118.53.212 was last claimed by e4:1f:13:81:6e:7c via ens3 (physical) on maas-nimbus-rackd at 2019-06-29 06:23:02.807641.
  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 4515, in _claim_auto_ips
    raise StaticIPAddressExhaustion(
maasserver.exceptions.StaticIPAddressExhaustion: Failed to allocate the required AUTO IP addresses
```
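For context, the selection behaviour suggested by that log line can be sketched roughly like this (a hypothetical simplification, not MAAS's actual code): the allocator walks the subnet's host addresses and skips anything already allocated, recently observed on the wire, or inside a dynamic range - so a subnet can "exhaust" even while unused addresses remain.

```python
import ipaddress

def next_auto_ip(subnet, allocated, observed, dynamic_ranges):
    """Rough sketch of AUTO IP selection: skip allocated, recently
    observed, and dynamic-range addresses; raise when none remain.
    (Illustrative only; MAAS's real allocator lives in maasserver.)"""
    net = ipaddress.ip_network(subnet)
    for ip in net.hosts():
        if ip in allocated or ip in observed:
            continue
        if any(lo <= ip <= hi for lo, hi in dynamic_ranges):
            continue
        return ip
    raise RuntimeError("Failed to allocate the required AUTO IP addresses")

# Tiny /29 where most addresses are blocked one way or another:
allocated = {ipaddress.ip_address("192.0.2.1")}
observed = {ipaddress.ip_address("192.0.2.2")}   # neighbour observation
dynamic = [(ipaddress.ip_address("192.0.2.3"),
            ipaddress.ip_address("192.0.2.5"))]  # DHCP pool
print(next_auto_ip("192.0.2.0/29", allocated, observed, dynamic))  # → 192.0.2.6
```

If the observation table holds stale entries (like the 2019-dated one above), those addresses stay blocked long after the machine that held them is gone, which matches the behaviour described here.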
I deleted that entry, and got the same error but without the “has been observed previously” mention.
Parts of the subnet are allocated to Dynamic pools to support old nodes deployed with DHCP, which we are phasing out as we redeploy everything to Auto assign. The last time I got this error, I deleted a Dynamic range and it started working again.
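To make the arithmetic concrete (the dynamic range boundaries below are made-up, not the actual configuration): each dynamic range removes its addresses from the pool that AUTO assignment can draw on, which is why deleting a range frees things up.

```python
import ipaddress

net = ipaddress.ip_network("146.118.52.0/23")  # subnet from the logs

# Hypothetical dynamic range kept for the old DHCP-deployed nodes:
dyn_lo = ipaddress.ip_address("146.118.53.0")
dyn_hi = ipaddress.ip_address("146.118.53.127")

hosts = list(net.hosts())
auto_pool = [ip for ip in hosts if not (dyn_lo <= ip <= dyn_hi)]
print(len(hosts), len(auto_pool))  # 510 hosts total, 382 outside the dynamic range
```

So with a /23, a 128-address dynamic pool already takes a quarter of the usable space off the AUTO table before any machines are counted.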
Sorry, we don’t have a schedule at the moment. We’re focused on getting 3.0 out, which should be done in a couple of weeks. After that we’ll work on getting a new version of 2.9 out.
Actually I should also ask - is the bug fixed in 3.0? We may look at upgrading to that as a matter of course, but also motivated by the bug fix if it’s there!
@gregoryo2017, as the MAAS tech author, I'm trying to gauge how network size affects various parts and functions of MAAS. This will help me speak more directly to specific, machine-count-related issues in the docs. Can you tell me how big your MAAS is (machines, racks, whatever)?
Since 2017 we’ve had a single regiond+rackd KVM VM for deploying our Ceph + OpenStack cluster. That cluster is now four racks with 6 KVM Pod hypervisors running over 100 virtual machines (including dev and test) and about 100 baremetal nodes. Current count: 228 machines.
Recently we have installed a separate regiond outside these racks, in preparation for a second rackd to be installed in a new set of 12 racks being deployed this year. That will have some 250 baremetal nodes running Ceph, and 24 KVM Pod nodes running haproxy and Ceph monitors.
It seems the bug is still there in 3.0 (snap 3.0.0-10029-g.986ea3e45), deploying a VM with 2 interfaces (one in DHCP, and one or more in AUTO IP) fails.
Post upgrade, are there any actions needed to fix this?
I cannot explain how this inconsistency happened, but it seems MAAS thought a couple of IPs were still allocated to some machines, which did not match reality.
I managed to solve the issue by manually removing those IPs from the DB.
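For anyone doing the same reconciliation, here is a minimal sketch of the check I did by hand, assuming you have already exported the static-IP rows from the DB and the set of IPs machines actually hold (both inputs are assumptions; the function name and row shape are hypothetical, not part of MAAS):

```python
def stale_ip_rows(db_rows, machine_ips):
    """Return DB rows whose IP is not actually held by any machine.

    db_rows: iterable of (row_id, ip_string) exported from the
             static IP table.
    machine_ips: set of IP strings machines really hold, gathered
             e.g. from the MAAS API or from the hosts themselves.
    """
    return [(rid, ip) for rid, ip in db_rows if ip not in machine_ips]

# Hypothetical export: row 2 is the stale 2019-era claim from the logs.
rows = [(1, "146.118.53.10"), (2, "146.118.53.212"), (3, "146.118.53.40")]
live = {"146.118.53.10", "146.118.53.40"}
print(stale_ip_rows(rows, live))  # → [(2, '146.118.53.212')]
```

Only the rows this returns are candidates for deletion; anything still held by a live machine should obviously be left alone.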