How best to use MAAS with overlapping subnets?

I have a subnet, let's call it 172.1.2.0/24, and my company has a hard requirement that multiple clusters of machines will use that same subnet.

For clarity, imagine the following …

cluster 1 - 172.1.2.0/24

... machine-1 => 172.1.2.5
... machine-2 => 172.1.2.6
... more ... etc ....

cluster 2 - 172.1.2.0/24 (same network as cluster 1)

... machine-3 => 172.1.2.5  (same ip as machine-1 above)
... machine-4 => 172.1.2.6  (same ip as machine-2 above)
... more ... etc ....

cluster 3 - …same…

... same same ...

and this is all normal and desirable; says us :).
Some important points (restated in some cases for clarity) …

  1. We are NOT using DHCP from MAAS. We have DHCP provided by in-place network hardware.
  2. We expect overlapping (or “duplicated”, if you prefer) networks: 172.1.2.0/24.
  3. We expect our machines to have static IPs assigned by our “just-in-time” cloud-init script provided at MAAS “deploy” time.

Now, my question/concern is motivated by my observation that the table maasserver_staticipaddress has two uniqueness “constraints”, expressed as
1.

constraint maasserver_staticipaddress_alloc_type_ip_8274db4c_uniq
        unique (alloc_type, ip)

and 2.

create unique index maasserver_staticipaddress_discovered_uniq
    on maasserver_staticipaddress (ip)
    where (NOT (alloc_type = 6));
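As a side note, the combined effect of these two constraints can be sketched in SQLite (MAAS itself uses PostgreSQL; the table and columns are simplified here, and treating alloc_type = 6 as the “discovered” type is an assumption read off the partial index above). Together they permit at most one discovered row plus one non-discovered row per IP, and nothing else:

```python
import sqlite3

# Simplified stand-in for maasserver_staticipaddress (MAAS uses PostgreSQL;
# SQLite also supports partial unique indexes, which is all we need here).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staticipaddress (
        id INTEGER PRIMARY KEY,
        alloc_type INTEGER NOT NULL,  -- assumption: 6 = "discovered"
        ip TEXT
    )
""")
# Constraint 1: unique (alloc_type, ip)
conn.execute(
    "CREATE UNIQUE INDEX alloc_type_ip_uniq ON staticipaddress (alloc_type, ip)")
# Constraint 2: unique ip, but only among NON-discovered rows
conn.execute("""
    CREATE UNIQUE INDEX discovered_uniq ON staticipaddress (ip)
        WHERE NOT (alloc_type = 6)
""")

conn.execute("INSERT INTO staticipaddress (alloc_type, ip) VALUES (6, '172.1.2.5')")  # discovered
conn.execute("INSERT INTO staticipaddress (alloc_type, ip) VALUES (1, '172.1.2.5')")  # non-discovered: OK

# A second discovered row, then a second non-discovered row -- both rejected:
# the first by (alloc_type, ip), the second by the partial unique index on ip.
for alloc_type in (6, 4):
    try:
        conn.execute(
            "INSERT INTO staticipaddress (alloc_type, ip) VALUES (?, '172.1.2.5')",
            (alloc_type,))
        print(f"alloc_type={alloc_type}: inserted")
    except sqlite3.IntegrityError as exc:
        print(f"alloc_type={alloc_type}: rejected ({exc})")
```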

When I brought this up in another discussion with @r00ta, the suggestion was to set the IP mode of the boot interface of the machine to DHCP.
Now, I did a test with two machines, one in each network (same network, 172.1.2.0/24), and I’m concerned that this doesn’t resolve the potential issue.

  1. I deleted the machines (clear out the region to start fresh).
  2. I discovered both machines (interfaces end up in auto by default, populating maasserver_staticipaddress with two addresses in 172.1.2.0/24)
    1. one machine came up with ip ending in 250, the other 233
    2. maasserver_staticipaddress table has a row for both of those two addresses
  3. I changed both interfaces from auto to dhcp as indicated in Custom cloud-init network configuration examples please? - #7 by alfred-stokespace
  4. I performed a commission of both (did not check “preserve network”)
    1. no change in machine ips, one ip ending in 250, the other 233
    2. maasserver_staticipaddress table has a row for both of those two addresses (no change)
  5. Again, changed both interfaces from auto to dhcp
  6. Deployed to both machines with our custom cloud-init (i.e. with static-IP netplan assignments)
  7. I examine maasserver_staticipaddress table and the same two ip addresses 250 and 233 are still present.

And finally, my question…

If I keep discovering/commissioning/deploying new machines, will I eventually reach a failure point due to conflicts in maasserver_staticipaddress?

How should I be using MAAS when I have these multiple overlapping subnets?

As a test of my concern stated above

If I keep discovering/commissioning/deploying new machines, will I eventually reach a failure point due to conflicts in maasserver_staticipaddress...

I constructed the following test…

Both clusters have existing network hardware providing DHCP service independent of MAAS, so I configured each cluster’s DHCP pool to have a single available address: 172.1.2.5.

cluster 1 - 172.1.2.0/24

... machine-1 => 172.1.2.5

cluster 2 - 172.1.2.0/24

... machine-2 => 172.1.2.5

and made sure to delete all existing references in MAAS (as well as to remove all existing leases in the networking appliances).

I turned on these machines in a one-at-a-time fashion and allowed them to complete their discovery phase.

What I observed is that the IP address moved between the two machines.
The maasserver_staticipaddress table just kept the same entry; no change that I could detect.

So this test seems to suggest that there won’t be a hard failure as we grow.

Our process for deploying will include a custom netplan YAML artifact provided via cloud-init that forces machines to static IPs. That will leave machine entries in MAAS with outdated IP entries, but those entries will “hop” around to whichever machine is newly discovered.
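For context, the kind of netplan artifact described above might look like the sketch below. Everything here is illustrative: the interface name (eno1), addresses, gateway, and nameservers are assumptions for the example, not our actual config.

```yaml
# Hypothetical netplan v2 fragment delivered via cloud-init at deploy time,
# forcing a static IP instead of the external DHCP lease.
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
      addresses: [172.1.2.5/24]
      routes:
        - to: default
          via: 172.1.2.1
      nameservers:
        addresses: [172.1.2.1]
```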

If I’m correct, then perhaps there isn’t an issue here. I think this works.

It would be great to get some official confirmation from engineers/support.

another update…

I tried “commissioning” both machine-1 and machine-2 from the same setup above at exactly the same time.

The result was that one machine completed commissioning and one went back into “New” status. I then subsequently put that failed machine back through commissioning and the ip address swapped back to it and commissioning succeeded.

So,… it would seem that there is a negative consequence, but it might be acceptable. We can leave a wide enough free range in the duplicated networks that collisions across networks are unlikely, and additionally (or alternatively) set the expectation within the company that simultaneous MAAS workflow activities crossing duplicated networks are not allowed.

Also, I confirmed that the row in maasserver_staticipaddress actually changes: its id and created/updated dates change. So the logic seems to be …

  1. query maasserver_staticipaddress for the ip
  2. if a row already exists for a different machine, delete that row
  3. insert a new row for the ip, related to the current machine
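That presumed delete-then-insert behaviour can be sketched as follows. This is a guess at the logic from the outside, not MAAS source code: the table, columns, and the record_discovered helper are all made up for the sketch.

```python
from datetime import datetime, timezone
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staticipaddress (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        alloc_type INTEGER NOT NULL,
        ip TEXT,
        machine TEXT,  -- stand-in for the interface/node relation
        updated TEXT
    )
""")

def record_discovered(ip, machine):
    """Hypothetical helper: move a discovered IP (alloc_type 6) to `machine`
    by deleting any other machine's row and inserting a fresh one."""
    conn.execute(
        "DELETE FROM staticipaddress WHERE ip = ? AND alloc_type = 6 AND machine != ?",
        (ip, machine))
    conn.execute(
        "INSERT INTO staticipaddress (alloc_type, ip, machine, updated) VALUES (6, ?, ?, ?)",
        (ip, machine, datetime.now(timezone.utc).isoformat()))

record_discovered("172.1.2.5", "machine-1")
record_discovered("172.1.2.5", "machine-2")  # the IP "hops" to machine-2

# Only one row remains, holding machine-2, with a new id and updated date.
rows = conn.execute("SELECT id, machine FROM staticipaddress").fetchall()
print(rows)
```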

As long as it is not MAAS that is assigning those IPs to the machines, there will be no duplicate-key errors. In that scenario MAAS is just observing that the IP has moved from one machine to another (or even to another device that is not managed by MAAS; in that case you would see it under the network discovery page).

AFAIU you are not using MAAS DHCP and IP management at all, so in theory you should not get any errors regarding the IPs.

The result was that one machine completed commissioning and one went back into “New” status. I then subsequently put that failed machine back through commissioning and the ip address swapped back to it and commissioning succeeded.

I’d say this is expected, as you have just one IP address and it was already assigned to another machine. The result is that the second machine fails to get an IP from your external DHCP server.