BIND failing on MAAS 2.7, causing high CPU usage

Shortly after service start, BIND enters STARTED state, fails, then continually attempts to restart, causing high CPU usage (over 70% average on a four-core VM with E5-2630v3).

Supervisor-run log shows this continually:

2020-05-03 17:52:35,137 INFO spawned: ‘bind9’ with pid 373887
2020-05-03 17:52:36,552 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (start
secs)
2020-05-03 17:52:36,552 INFO exited: bind9 (terminated by SIGKILL; not expected)
2020-05-03 17:52:36,556 INFO spawned: ‘bind9’ with pid 373909
2020-05-03 17:52:38,038 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (start
secs)
2020-05-03 17:52:38,038 INFO exited: bind9 (terminated by SIGKILL; not expected)
2020-05-03 17:52:38,042 INFO spawned: ‘bind9’ with pid 373928
2020-05-03 17:52:39,437 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (start
secs)
2020-05-03 17:52:39,437 INFO exited: bind9 (terminated by SIGKILL; not expected)

MAAS status shows:
sudo maas status
bind9 STARTING
dhcpd RUNNING pid 370435, uptime 0:07:29
dhcpd6 STOPPED Not started
http RUNNING pid 370482, uptime 0:07:25
ntp RUNNING pid 370301, uptime 0:07:36
postgresql RUNNING pid 370127, uptime 0:07:45
proxy RUNNING pid 370310, uptime 0:07:35
rackd RUNNING pid 370129, uptime 0:07:45
regiond RUNNING pid 370131, uptime 0:07:45
syslog RUNNING pid 370289, uptime 0:07:36

I’ve re-installed on clean systems multiple times with same issue. Only way to prevent issue is to disable “Authoritative” for the maas domain. This isn’t much of an issue for me as I’m using a separate DNS server anyway.

@zrsolis, did you install from snap or the debian package?

Installed it from Snap

Hi,

I noticed the same behaviour on my region server

  • Ubuntu 18.04 (VM)
  • MaaS 2.7.0 (snap install)

This is an extract from my /var/snap/maas/common/log/supervisor-run.log:

2020-05-05 15:15:15,352 INFO spawned: 'bind9' with pid 8827
2020-05-05 15:15:17,253 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-05-05 15:15:18,616 INFO exited: bind9 (terminated by SIGKILL; not expected)
2020-05-05 15:15:18,633 INFO spawned: 'bind9' with pid 8848
2020-05-05 15:15:19,996 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-05-05 15:15:22,072 INFO exited: bind9 (terminated by SIGKILL; not expected)
2020-05-05 15:15:22,110 INFO spawned: 'bind9' with pid 8877
2020-05-05 15:15:23,116 INFO success: bind9 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-05-05 15:15:26,691 INFO exited: bind9 (terminated by SIGKILL; not expected)

This is an extract from /var/snap/maas/common/log/regiond.log:

2020-05-05 13:27:06 maasserver.region_controller: [critical] Failed configuring DNS; killing and restarting
        Traceback (most recent call last):
          File "/snap/maas/5229/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 459, in callback
            self._startRunCallbacks(result)
          File "/snap/maas/5229/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
            self._runCallbacks()
          File "/snap/maas/5229/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/5229/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1532, in unwindGenerator
            return _inlineCallbacks(None, gen, Deferred())
        --- <exception caught here> ---
          File "/snap/maas/5229/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
            result = g.send(result)
          File "/snap/maas/5229/lib/python3.6/site-packages/maasserver/region_controller.py", line 224, in _checkSerial
            "Failed to reload DNS; timeout or rdnc command failed."
        maasserver.region_controller.DNSReloadError: Failed to reload DNS; timeout or rdnc command failed.

This is an extract from /var/snap/maas/common/log/named.log:

05-May-2020 14:59:32.724 starting BIND 9.11.3-1ubuntu1.11-Ubuntu (Extended Support Version) <id:a375815>
05-May-2020 14:59:32.724 running on Linux x86_64 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020
05-May-2020 14:59:32.724 built with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libjson=/usr' '--without-lmdb' '--with-gnu-ld' '--with-geoip=/usr' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib/softhsm/libsofthsm2.so' '--with-randomdev=/dev/urandom' '--with-eddsa=no' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-uW3Pyl/bind9-9.11.3+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
05-May-2020 14:59:32.724 running as: named -c /var/snap/maas/5229/bind/named.conf -g
05-May-2020 14:59:32.724 ----------------------------------------------------
05-May-2020 14:59:32.724 BIND 9 is maintained by Internet Systems Consortium,
05-May-2020 14:59:32.724 Inc. (ISC), a non-profit 501(c)(3) public-benefit
05-May-2020 14:59:32.724 corporation.  Support and training for BIND 9 are
05-May-2020 14:59:32.724 available at https://www.isc.org/support
05-May-2020 14:59:32.724 ----------------------------------------------------
05-May-2020 14:59:32.724 found 4 CPUs, using 4 worker threads
05-May-2020 14:59:32.724 using 3 UDP listeners per interface
05-May-2020 14:59:32.769 using up to 4096 sockets
05-May-2020 14:59:32.818 loading configuration from '/var/snap/maas/5229/bind/named.conf'
05-May-2020 14:59:32.819 reading built-in trust anchors from file '/snap/maas/current/etc/bind/bind.keys'
05-May-2020 14:59:32.820 GeoIP Country (IPv4) (type 1) DB not available
05-May-2020 14:59:32.820 GeoIP Country (IPv6) (type 12) DB not available
05-May-2020 14:59:32.820 GeoIP City (IPv4) (type 2) DB not available
05-May-2020 14:59:32.820 GeoIP City (IPv4) (type 6) DB not available
05-May-2020 14:59:32.820 GeoIP City (IPv6) (type 30) DB not available
05-May-2020 14:59:32.820 GeoIP City (IPv6) (type 31) DB not available
05-May-2020 14:59:32.820 GeoIP Region (type 3) DB not available
05-May-2020 14:59:32.820 GeoIP Region (type 7) DB not available
05-May-2020 14:59:32.820 GeoIP ISP (type 4) DB not available
05-May-2020 14:59:32.820 GeoIP Org (type 5) DB not available
05-May-2020 14:59:32.820 GeoIP AS (type 9) DB not available
05-May-2020 14:59:32.820 GeoIP Domain (type 11) DB not available
05-May-2020 14:59:32.820 GeoIP NetSpeed (type 10) DB not available
05-May-2020 14:59:32.820 using default UDP/IPv4 port range: [32768, 60999]
05-May-2020 14:59:32.820 using default UDP/IPv6 port range: [32768, 60999]
05-May-2020 14:59:32.822 listening on IPv6 interfaces, port 53
05-May-2020 14:59:32.848 binding TCP socket: address in use
05-May-2020 14:59:32.848 listening on IPv4 interface lo, 127.0.0.1#53
05-May-2020 14:59:32.851 binding TCP socket: address in use
05-May-2020 14:59:32.851 listening on IPv4 interface ens160, 192.168.92.14#53
05-May-2020 14:59:32.866 binding TCP socket: address in use
05-May-2020 14:59:32.867 listening on IPv4 interface virbr0, 192.168.122.1#53
05-May-2020 14:59:32.873 binding TCP socket: address in use
05-May-2020 14:59:32.877 unable to listen on any configured interfaces
05-May-2020 14:59:32.878 loading configuration: failure
05-May-2020 14:59:32.878 exiting (due to fatal error)

These messages are printed continuosly, every time named is restarted. Cannot tell what’s happening. Apparently they first appeared yesterday. I’m still investigating.

Regards,

)+(auloop

Hi, zrsolis.

I solved my issue, which was caused by some mistakes I made. Don’t know if it could apply to your case too, but my suggestion is to take a look at named.log (which, for snap-based environment is located at /var/snap/maas/common/log/named.log).

For me, it was the place where I found a clue about what was happening:

05-May-2020 14:59:32.848 binding TCP socket: address in use
05-May-2020 14:59:32.848 listening on IPv4 interface lo, 127.0.0.1#53
05-May-2020 14:59:32.851 binding TCP socket: address in use
05-May-2020 14:59:32.851 listening on IPv4 interface ens160, 192.168.92.14#53
05-May-2020 14:59:32.866 binding TCP socket: address in use
05-May-2020 14:59:32.867 listening on IPv4 interface virbr0, 192.168.122.1#53
05-May-2020 14:59:32.873 binding TCP socket: address in use
05-May-2020 14:59:32.877 unable to listen on any configured interfaces

This was because, while troubleshooting another issue, I tried to install the regular apt package for maas-region. Therefore a second bind9 instance has been installed on standard paths and the first time I rebooted my region server, this instance started up first and took control of tcp port 53.

In synthesis, I did a very stupid thing and probably is not the same for you. Anyway to take a look at named.log could be a good starting point for your troubleshooting too.

Good luck,

)+(

For anyone who may have been banging their head against the wall with this same issue, the root cause ended up being that the link-layer scan on one interface detected interfaces with a subnet range that overlapped with the primary interface used on the MAAS rack+region controller. Changing the IP ranges between the interfaces so they would no longer overlap was the only way to get it to stop, otherwise the DNS service would flap so much it would eventually fill the file system and stop everything from running (I did eventually add a logrotate rule).

1 Like

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.