Auto assign IP not working

Using MAAS 2.9.2 (deb), deploying a machine, physical or virtual, no longer works. The machine pops back to Allocated with “Failed to allocate the required AUTO IP addresses”. There are about 100 IPs available. The logs show this:

Next IP address to allocate from '146.118.52.0/23' has been observed previously: 146.118.53.212 was last claimed by e4:1f:13:81:6e:7c via ens3 (physical) on maas-nimbus-rackd at 2019-06-29 06:23:02.807641.
	  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 4515, in _claim_auto_ips
	    raise StaticIPAddressExhaustion(
	maasserver.exceptions.StaticIPAddressExhaustion: Failed to allocate the required AUTO IP addresses

This looks a bit like https://bugs.launchpad.net/maas/+bug/1904810

I found 146.118.53.212 in the database, maasserver_neighbour table:

2227    2019-02-05 01:53:23.261001+00   2019-06-29 06:23:02.807641+00   146.118.53.212  1561789382      \N      137     e4:1f:13:81:6e:7c       6370
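
As a quick cross-check, the bare integer in that row (1561789382) looks like a Unix epoch timestamp. That is an assumption based on its value, not on the table schema. A minimal Python sketch converting it to UTC, to compare against the "last claimed" time in the log message:

```python
from datetime import datetime, timezone

# Assumption: the bare integer in the row is seconds since the Unix epoch.
epoch_field = 1561789382

# Convert to an aware UTC datetime for comparison with the log message.
observed = datetime.fromtimestamp(epoch_field, tz=timezone.utc)
print(observed.isoformat())
```

If the assumption holds, the printed time should match the 2019-06-29 06:23:02 timestamp in the log, which suggests the log message was built from this row.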

I deleted that entry, and got the same error but without the “has been observed previously” mention.

Some parts of the subnet are allocated to Dynamic pools to support old nodes deployed with DHCP, which we are phasing out as we redeploy everything to Auto assign. The last time I got this error, I deleted a Dynamic range, and it started working again.
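
To sanity-check how many addresses are really left for Auto assign once the Dynamic ranges are taken out, a rough count can be sketched with Python's standard `ipaddress` module. This is only an illustration, not how MAAS itself picks addresses: the function name `count_free_auto_ips`, the two Dynamic ranges, and the list of used addresses below are all hypothetical placeholders to be replaced with the real values for the subnet.

```python
import ipaddress

def count_free_auto_ips(cidr, excluded_ranges, used_ips):
    """Count host addresses in cidr that are neither inside an
    excluded (e.g. Dynamic) range nor already in use."""
    network = ipaddress.ip_network(cidr)
    excluded = [
        (ipaddress.ip_address(low), ipaddress.ip_address(high))
        for low, high in excluded_ranges
    ]
    used = {ipaddress.ip_address(ip) for ip in used_ips}

    free = 0
    for host in network.hosts():  # skips network and broadcast addresses
        if host in used:
            continue
        if any(low <= host <= high for low, high in excluded):
            continue
        free += 1
    return free

# Hypothetical Dynamic ranges and used addresses, for illustration only.
dynamic_ranges = [
    ("146.118.52.10", "146.118.52.200"),
    ("146.118.53.0", "146.118.53.100"),
]
used_ips = ["146.118.53.212"]

print(count_free_auto_ips("146.118.52.0/23", dynamic_ranges, used_ips))
```

If the count comes out well above zero while MAAS still reports exhaustion, that points towards a bug in the allocation logic rather than a genuinely full subnet.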

Can anyone suggest what to do next?

Thanks,
Greg.

You’re likely hitting https://bugs.launchpad.net/maas/2.9/+bug/1902425, which will be fixed in the next 2.9 point release.

Thank you. Is there a schedule for that release? Is there a general way to look at point release schedules?

Thanks,
Greg.

Sorry, we don’t have a schedule at the moment. We’re focused on getting 3.0 out, which should be done in a couple of weeks. After that we’ll work on getting a new version of 2.9 out.

Okay, thank you for the update. Can you tell me where to watch for version announcements? A feed (email, discourse…) to check or subscribe to?

Actually I should also ask - is the bug fixed in 3.0? We may look at upgrading to that as a matter of course, but also motivated by the bug fix if it’s there!

Yes, the bug is fixed in 3.0. So if possible, upgrading to 3.0 when it’s out would be the quickest fix.

We announce our releases here on discourse in the News section: https://discourse.maas.io/c/news/7

Okay thanks. I’ve found the alert button so I’ll get a notification.

@gregoryo2017, as the MAAS tech author, I’m trying to gauge how network size affects various parts and functions of MAAS. This will help me speak more directly to specific, machine-count-related issues in the doc. Can you tell me how big your MAAS is (machines, racks, whatever)?

Since 2017 we’ve had a single regiond+rackd KVM VM for deploying our Ceph + OpenStack cluster. That cluster is now four racks with 6 KVM Pod hypervisors running over 100 virtual machines (including dev and test) and about 100 baremetal nodes. Current count: 228 machines.

Recently we have installed a separate regiond outside these racks, in preparation for a second rackd to be installed in a new set of 12 racks being deployed this year. That will have some 250 baremetal nodes running Ceph, and 24 KVM Pod nodes running haproxy and Ceph monitors.

HTH,
Greg.

Thanks, @gregoryo2017, that’s a fantastic description, and a nice layout. It helps me very much!