I found an annoying issue with MAAS acting as the DNS server for the local .maas domain.
Here is an example that is quite easy to reproduce:
I have 2 machines, each with 1 interface
this interface has the default untagged network used for PXE (subnet 10.0.0.0/24)
this interface also supports a VLAN network (let’s call it vlan1000, subnet 10.10.10.0/24)
the PXE network is left unconfigured after deployment: I don’t want it up and configured once the machine is deployed, I only want an IP on the vlan1000 network.
MAAS sets an IP (auto) for the vlan1000 network.
I deploy these 2 machines (machine1/machine2) and after a while they are up and running with Ubuntu Focal 20.04.
I ssh into machine1 and run “host machine2”, and the IP I get is the one MAAS gave during PXE boot, not the IP the machine has once deployed.
This is very annoying. For example, here is the flow for one machine:
machine1 starts
machine1 gets an IP on the PXE network (10.0.0.150)
machine1 is registered in MAAS DNS with IP 10.0.0.150
machine1 is configured with only 1 IP on vlan1000 (10.10.10.10), the PXE subnet is unconfigured
machine1 is still registered with IP 10.0.0.150 in MAAS DNS
after a while (maybe 10 minutes), the DNS cache is refreshed and the machine is registered with the correct IP
This is annoying because during that window the machine cannot be reached by its hostname, which can lead to strange behavior during deployments with tools like Juju.
Just FYI, I reduced the MAAS DNS TTL to the minimum (which is 1), but that doesn’t change anything. My impression is that this 10-minute delay is due to DNS caching done on the client side by systemd-resolved.
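One way to check that impression would be to compare what the local resolver returns with what the MAAS DNS server answers when queried directly. The following is only a minimal Python sketch. It assumes `dig` is installed on machine1, and the server address 10.10.10.1 used in the note below is a hypothetical placeholder for the MAAS controller, not a value from this setup. The helper names are made up for illustration.

```python
import socket
import subprocess


def dig_command(name, server):
    """Build a dig command that asks `server` directly, bypassing the local cache."""
    return ["dig", f"@{server}", name, "A", "+short"]


def parse_short_output(output):
    """Return the set of non-empty lines printed by `dig +short`."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def authoritative_ips(name, server):
    """Query the given DNS server directly for the A records of `name`."""
    result = subprocess.run(
        dig_command(name, server), capture_output=True, text=True, check=True
    )
    return parse_short_output(result.stdout)


def local_ips(name):
    """Resolve `name` through the system resolver (systemd-resolved by default on Focal)."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def describe(local, authoritative):
    """Summarise whether the two answers agree."""
    if local == authoritative:
        return "local resolver and MAAS DNS agree"
    return f"mismatch: local={sorted(local)} maas={sorted(authoritative)}"
```

Running `describe(local_ips("machine2.maas"), authoritative_ips("machine2.maas", "10.10.10.1"))` on machine1 during the delay would show whether the stale 10.0.0.150 answer comes from the client-side cache (a mismatch) or from MAAS DNS itself (both agree on the PXE address).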
I’m also missing something in your DNS, well, in ALL of your DNS system. The US passed a bill under which you can’t run your own closed-off nameservers and resolvers anymore without permission from a government agency. We do, and I need 4 nameservers and our own resolvers, PowerDNS to be specific. The only product you have that supports it is OpenStack, but that isn’t exactly meant for bare-metal servers, and it comes with a bunch of things we do not need. MAAS or Landscape should have the same support: PowerDNS or Knot DNS, with a web guide for it. Then all would be PERFECT!
I think you answered your own question, @Hybrid512. The issue is not the DNS service but the client’s cache, which cannot be controlled by MAAS DNS, DHCP, or any other service; it is only configurable in your own servers’ resolver.
It would be helpful to understand what workflow is interrupted by this experience to help identify ways to resolve or work around it.
If you’re working on a pipeline that uses the MAAS API and resolves a host both before and after it gets its final IP, your pipeline will need to flush your resolver’s cache.
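A minimal sketch of such a flush-and-retry step in Python, assuming the resolver is systemd-resolved (so `resolvectl flush-caches` is available and the pipeline user is allowed to run it) and that the deployed subnet is known in advance. The function names, retry count, and delay are illustrative choices, not part of MAAS.

```python
import ipaddress
import socket
import subprocess
import time


def flush_resolved_cache():
    """Flush the systemd-resolved cache (may require elevated privileges)."""
    subprocess.run(["resolvectl", "flush-caches"], check=True)


def resolve_ipv4(name):
    """Return the IPv4 addresses the system resolver gives for `name`."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()  # name does not resolve (yet)
    return {info[4][0] for info in infos}


def in_subnet(ips, subnet):
    """True if at least one address in `ips` falls inside `subnet`."""
    net = ipaddress.ip_network(subnet)
    return any(ipaddress.ip_address(ip) in net for ip in ips)


def wait_for_final_ip(name, subnet, resolve=resolve_ipv4,
                      flush=flush_resolved_cache,
                      attempts=30, delay=10.0, sleep=time.sleep):
    """Flush the cache and re-resolve until `name` has an address in `subnet`."""
    for _ in range(attempts):
        flush()
        ips = resolve(name)
        if in_subnet(ips, subnet):
            return ips
        sleep(delay)
    raise TimeoutError(f"{name} never resolved inside {subnet}")
```

With the addresses from the original post, a pipeline step could call `wait_for_final_ip("machine1.maas", "10.10.10.0/24")` before handing the hostname to later stages. The `resolve`, `flush`, and `sleep` parameters are injectable only so the loop can be exercised without touching real DNS.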
Am I correct to assume that the issue does not happen on the built machine, but only on your controlling/operations terminal?
If this is happening on your newly deployed workload, please open a bug with a reproducer, including any additional scripts and post-build cloud-init scripts you use during deployment.