Auto assign IP not working

gregoryo2017 · 20 May 2021 09:23

Using MAAS 2.9.2 (deb), deploying a machine, physical or virtual, no longer works. The machine drops back to Allocated with “Failed to allocate the required AUTO IP addresses”, even though about 100 IPs are available. The logs show this:

Next IP address to allocate from '146.118.52.0/23' has been observed previously: 146.118.53.212 was last claimed by e4:1f:13:81:6e:7c via ens3 (physical) on maas-nimbus-rackd at 2019-06-29 06:23:02.807641.

	  File "/usr/lib/python3/dist-packages/maasserver/models/node.py", line 4515, in _claim_auto_ips
	    raise StaticIPAddressExhaustion(
	maasserver.exceptions.StaticIPAddressExhaustion: Failed to allocate the required AUTO IP addresses
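
For anyone chasing the same message, here is a minimal Python sketch that pulls the subnet, IP, MAC and timestamp out of the “has been observed previously” line, so the address can be looked up afterwards. The regular expression is an assumption based only on the single log line quoted above; other MAAS versions may word the message differently.

```python
import re
from datetime import datetime

# The log line exactly as quoted in this post.
LOG_LINE = (
    "Next IP address to allocate from '146.118.52.0/23' has been observed "
    "previously: 146.118.53.212 was last claimed by e4:1f:13:81:6e:7c via "
    "ens3 (physical) on maas-nimbus-rackd at 2019-06-29 06:23:02.807641."
)

# Assumed pattern, derived from the one example above.
PATTERN = re.compile(
    r"allocate from '(?P<subnet>[^']+)' has been observed previously: "
    r"(?P<ip>\S+) was last claimed by (?P<mac>\S+) via (?P<iface>\S+) "
    r"\((?P<iftype>[^)]+)\) on (?P<rack>\S+) at "
    r"(?P<seen>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
)


def parse_observed(line):
    """Return the fields of an 'observed previously' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp text into a datetime for easier comparison.
    fields["seen"] = datetime.strptime(fields["seen"], "%Y-%m-%d %H:%M:%S.%f")
    return fields


if __name__ == "__main__":
    print(parse_observed(LOG_LINE))
```

The parsed `seen` value makes it easy to spot how old the observation is; in this case it dates from 2019.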

This looks a bit like https://bugs.launchpad.net/maas/+bug/1904810

I found 146.118.53.212 in the database, maasserver_neighbour table:

2227    2019-02-05 01:53:23.261001+00   2019-06-29 06:23:02.807641+00   146.118.53.212  1561789382      \N      137     e4:1f:13:81:6e:7c       6370

I deleted that entry and got the same error, but without the “has been observed previously” message.

Some parts of the subnet are allocated to Dynamic pools to support old nodes deployed with DHCP, which we are phasing out as we redeploy everything to Auto assign. The last time I got this error, I deleted a Dynamic range, and it started working again.
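
As a rough illustration of why removing a Dynamic range can free things up, the sketch below counts the addresses left in a subnet once reserved ranges and known in-use addresses are excluded. This is not MAAS's own allocator. The function name `free_for_auto` and the example range are hypothetical; only the subnet and the single observed address come from the post above.

```python
import ipaddress


def free_for_auto(subnet, reserved_ranges, in_use):
    """List host addresses not covered by a reserved range or already in use.

    reserved_ranges is a list of (start, end) pairs, inclusive.
    """
    net = ipaddress.ip_network(subnet)
    blocked = set()
    for start, end in reserved_ranges:
        low = int(ipaddress.ip_address(start))
        high = int(ipaddress.ip_address(end))
        blocked.update(range(low, high + 1))
    blocked.update(int(ipaddress.ip_address(ip)) for ip in in_use)
    # hosts() skips the network and broadcast addresses.
    return [host for host in net.hosts() if int(host) not in blocked]


if __name__ == "__main__":
    # Hypothetical Dynamic range; the real ranges are not given in the post.
    dynamic = [("146.118.52.10", "146.118.52.109")]
    free = free_for_auto("146.118.52.0/23", dynamic, ["146.118.53.212"])
    print(len(free))
```

Feeding in the real ranges and the addresses MAAS believes are taken gives a quick check of whether the subnet is really exhausted or only appears to be.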

Can anyone suggest what to do next?

Thanks,
Greg.

ack · 20 May 2021 14:15

You’re likely hitting https://bugs.launchpad.net/maas/2.9/+bug/1902425, which will be fixed in the next 2.9 point release.

gregoryo2017 · 24 May 2021 01:56

Thank you. Is there a schedule for that release? Is there a general way to look at point release schedules?

Thanks,
Greg.

bjornt · 25 May 2021 14:48

Sorry, we don’t have a schedule at the moment. We’re focused on getting 3.0 out, which should be done in a couple of weeks. After that we’ll work on getting a new version of 2.9 out.

gregoryo2017 · 26 May 2021 00:39

Okay, thank you for the update. Can you tell me where to watch for version announcements? A feed (email, discourse…) to check or subscribe to?

gregoryo2017 · 27 May 2021 03:16

Actually I should also ask - is the bug fixed in 3.0? We may look at upgrading to that as a matter of course, but also motivated by the bug fix if it’s there!

bjornt · 27 May 2021 08:25

Yes, the bug is fixed in 3.0. So if possible, upgrading to 3.0 when it’s out would be the quickest fix.

We announce our releases here on discourse in the News section: https://discourse.maas.io/c/news/7

gregoryo2017 · 27 May 2021 14:56

Okay thanks. I’ve found the alert button so I’ll get a notification.

billwear · 1 June 2021 16:56

@gregoryo2017, as the MAAS tech author, i’m trying to gauge how network size affects various parts and functions of MAAS. this will help me speak more directly to specific, machine-count-related issues with the doc. can you tell me, how big is your MAAS (machines, racks, whatever)?

gregoryo2017 · 2 June 2021 00:56

Since 2017 we’ve had a single regiond+rackd KVM VM for deploying our Ceph + OpenStack cluster. That cluster is now four racks with 6 KVM Pod hypervisors running over 100 virtual machines (including dev and test) and about 100 baremetal nodes. Current count: 228 machines.

Recently we have installed a separate regiond outside these racks, in preparation for a second rackd to be installed in a new set of 12 racks being deployed this year. That will have some 250 baremetal nodes running Ceph, and 24 KVM Pod nodes running haproxy and Ceph monitors.

HTH,
Greg.

billwear · 2 June 2021 15:03

thanks, @gregoryo2017, that’s a fantastic description, and a nice layout. helps me very much!

cedric-lemarchand · 22 July 2021 12:02

It seems the bug is still there in 3.0 (snap 3.0.0-10029-g.986ea3e45): deploying a VM with 2 interfaces (one in DHCP, and one or more in AUTO IP) fails.

After the upgrade, are there any actions needed to fix this?

billwear · 10 August 2021 17:48

@cedric-lemarchand, could you please add a comment to the bug? if this didn’t get fixed, the engineering team needs to get bugged again!

cedric-lemarchand · 11 August 2021 09:23

I cannot explain how this inconsistency happened, but it seems MAAS thought that a couple of IPs were still allocated to some machines, which did not match reality.

I managed to solve the issue by manually removing these IPs from the DB.

Cheers
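
For readers in the same situation, the comparison described above can be sketched in Python as a simple set difference between the addresses the database lists as allocated and the addresses the machines actually use. The function name and the sample addresses (from the 192.0.2.0/24 documentation range) are hypothetical, and how the two lists are obtained is left out because the thread does not say which tables or queries were involved.

```python
import ipaddress


def stale_allocations(db_allocated, actually_in_use):
    """Return IPs the database lists as allocated that nothing is using."""
    stale = set(db_allocated) - set(actually_in_use)
    # Sort numerically rather than as text, so .2 comes before .10.
    return sorted(stale, key=ipaddress.ip_address)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    from_db = ["192.0.2.10", "192.0.2.2", "192.0.2.5"]
    in_use = ["192.0.2.5"]
    print(stale_allocations(from_db, in_use))
```

Reviewing such a list before touching the database, and taking a backup first, seems prudent, since rows removed by hand bypass whatever checks MAAS itself would apply.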

billwear · 11 August 2021 14:16

cool. networking is still a dark art, even for the most experienced practitioners.
