MAAS 2.8/stable named stops responding to .maas lookups.

Hi Maas’ers,

We’ve had 2.8 up and running on our live infrastructure for a week now, and today we found an issue with name resolution.

Oddly, root resolution works fine (“dig google.com”), but .maas and .maas-internal lookups fail to return any records.

Perplexing, it is … as Yoda might say…

I checked the serial in each zone file, and it’s being incremented when changes are made.

Adding a new domain also sees the files created, but once again no resolution.

Any suggestions gratefully received.

–J

We’ve still not quite managed to get to the root of this issue. I’m not a BIND expert, and the generated files ‘look fine to me’, but we are still experiencing issues with DNS lookups.

From a default deployment, all we’ve done so far is add a DHCP server to the 10.0.10.0/24 subnet.

Then we’ve added a new top level domain .acms, and added a test ‘A’ record.

So, I would expect to be able to ‘dig’ this address.

But no, this does not seem to work.
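For anyone wanting to reproduce the check, here is a rough sketch of asking a specific name server directly instead of going through the system resolver. The function name and the 127.0.0.1 server address in the usage line are assumptions for illustration; test-addr.acms is the test record described above. Lines starting with ";" are filtered out because dig +short prints its timeout message on stdout.

```shell
#!/bin/sh
# Succeed if SERVER returns at least one answer line for NAME.
#   $1 = name to look up, $2 = DNS server to ask
has_a_record() {
    # Drop dig's ";; ..." diagnostics so a timeout is not mistaken for an answer.
    dig +short @"$2" "$1" A | grep -v '^;' | grep -q .
}

# Usage (the server address is a placeholder, adjust for your setup):
#   has_a_record test-addr.acms 127.0.0.1 && echo "resolves" || echo "no answer"
```

If this succeeds against named but a plain lookup fails, the problem is more likely in what the client is using as its resolver than in the zone itself.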

So, next step is to look at the generated zone files…

; Zone file modified: 2020-07-27 19:02:08.563459.
$TTL 30
@ IN SOA acms. nobody.example.com. (
0000000026 ; serial
600 ; Refresh
1800 ; Retry
604800 ; Expire
30 ; NXTTL
)

@ 30 IN NS maas.
test-addr 30 IN A 10.0.10.2
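One way to check whether named has actually loaded a file like this is to compare the serial on disk with the serial named answers with. This is only a sketch: the function names are made up, and the zone file path and server address in the usage lines are placeholders, not the real MAAS locations.

```shell
#!/bin/sh
# Print the serial from a zone file: the first field of the line that
# carries the "; serial" comment, as in the generated file above.
zone_serial() {
    awk '/; *serial/ { print $1; exit }' "$1"
}

# Print the serial a name server currently serves for a zone.
# In "dig +short SOA" output the serial is the third field.
#   $1 = zone name, $2 = DNS server to ask
served_serial() {
    dig +short @"$2" "$1" SOA | awk '{ print $3; exit }'
}

# Usage (path and server address are placeholders):
#   on_disk=$(zone_serial /path/to/zone.acms)
#   live=$(served_serial acms 127.0.0.1)
#   [ "$on_disk" -eq "$live" ] && echo "in sync" || echo "named is behind"
```

The comparison uses -eq rather than = so that the zero-padded serial in the file (0000000026) still matches the unpadded number in the SOA answer.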

Currently none of our 2.8 MAAS deployments seem to be working.

Any thoughts on how to debug this issue would be appreciated.

–J

Are you using systemd-resolved? Perhaps DHCP has kicked in and reset resolved to use the built-in server instead of MAAS’ named? An easy way to tell is to see what is listening on port 53.
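Something along these lines should show it. It is a small sketch that filters the output of ss -lntup down to sockets bound to port 53; the function name is made up, and it assumes both TCP and UDP are listed so that the local address is the fifth column.

```shell
#!/bin/sh
# Read `ss -lntup` output on stdin and keep only the sockets whose
# local address ends in port 53 (field 5 when TCP and UDP are both listed).
listeners_on_53() {
    awk '$5 ~ /:53$/'
}

# Usage (root is needed to see the owning process names):
#   sudo ss -lntup | listeners_on_53
```

If the only listener on 127.0.0.53 is systemd-resolve and named is bound elsewhere, it is worth checking which of the two the machine is actually querying.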

I have two rack controllers on my local network, and occasionally I see that one of them has stopped resolving what appear to be ‘internal’ domains in MAAS. I guess that confirms an issue, though I have never managed to debug it beyond restarting the affected controller.

Hi Jeremy,

I’ve confirmed there’s only one named running and it’s the one from the maas snap.

Cheers,

–J

Thanks for posting, I was starting to think I was going nuts.

Oddly, if the system is left alone for a few days, it starts to resolve again.

I’m starting to think it’s something to do with the serials in the SOA records, but that’s just a hunch at this time.

The problem is that if the maas-internal address does not resolve, new machines can’t be deployed, as they use the maas-internal domain to get packages etc.

I’ll keep working and see if I can get a complete picture of what’s going on…

–J

Hi @joolski,

I face the same problem: I can’t resolve any internal address. Do you have a working workaround for this issue?

BR,
Stefan

Hi Stefan,

We’ve not completely got to the bottom of this as yet; however, we’ve found that restarting the maas supervisor seems to at least get resolution working.

systemctl restart snap.maas.supervisor

Out of interest, are you running MAAS on 18.04 or 20.04?

–Jools

Hi @joolski,

We are running MAAS on 18.04 from the apt package, not from the snap. Huh, we don’t have this service; these are the ones we have:
maas-dhcpd6.service maas-http.service maas-rackd.service maas-syslog.service
maas-dhcpd.service maas-proxy.service maas-regiond.service

BR,
Stefan

Hi @joolski,

I found my problem. We use external DHCP servers, and they were still handing out old, deprecated DNS IPs. I replaced them with the correct ones and everything started working as usual.
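In case it helps someone else spot the same thing, here is a small sketch for listing the DNS servers a client has actually been given. The function name is made up. On hosts using systemd-resolved, /etc/resolv.conf usually only shows the local stub, so the servers received from DHCP are in /run/systemd/resolve/resolv.conf instead.

```shell
#!/bin/sh
# Print the nameserver addresses from a resolv.conf-style file.
#   $1 = file to read (defaults to /etc/resolv.conf)
configured_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage:
#   configured_nameservers
#   configured_nameservers /run/systemd/resolve/resolv.conf
```

If the addresses printed are not the MAAS region or rack controllers, lookups for MAAS-managed domains will never reach MAAS’ named.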

BR,
Stefan

Nice work, @tension183. Those are hard to spot, well done.

I don’t think the issue is external DNS or DHCP; I have this occasionally and I don’t have either of those. MAAS handles all DNS and DHCP on my networks, so it has full responsibility for the issue.

@joolski, @sabdfl is right. Could you file a bug (if you haven’t already)?