I’ve recently switched to MAAS 2.9 due to MAAS changing boot order. However, I’ve noticed that after some time of correct operation, the MAAS-managed bind crashes with the following log messages:
28-Oct-2020 13:24:54.653 uv_export failed: permission denied
28-Oct-2020 13:24:54.653 uv_export failed: permission denied
28-Oct-2020 13:24:54.653 listening on IPv6 interface vnet8, fe80::fc54:ff:fe03:7f67%28#53
28-Oct-2020 13:24:54.673 udp.c:83: INSIST(csock->fd >= 0) failed, back trace
28-Oct-2020 13:24:54.673 #0 0x555e9f8c2e43 in ??
28-Oct-2020 13:24:54.673 #1 0x7fc90796eac0 in ??
28-Oct-2020 13:24:54.673 #2 0x7fc90798bf4d in ??
28-Oct-2020 13:24:54.673 #3 0x7fc907c5f82b in ??
28-Oct-2020 13:24:54.673 #4 0x7fc907c605d0 in ??
28-Oct-2020 13:24:54.673 #5 0x7fc907c60c1e in ??
28-Oct-2020 13:24:54.673 #6 0x555e9f8e0a6b in ??
28-Oct-2020 13:24:54.673 #7 0x555e9f8e406e in ??
28-Oct-2020 13:24:54.673 #8 0x7fc907995fe1 in ??
28-Oct-2020 13:24:54.673 #9 0x7fc90745e609 in ??
28-Oct-2020 13:24:54.673 #10 0x7fc90737f293 in ??
28-Oct-2020 13:24:54.673 exiting (due to assertion failure)
It happens on all 3 machines I have my MAAS cluster on.
MAAS version is: 2.9.0~beta5 (9002-g.2a342196f) (snap)
Additionally, I’ve noticed that named complains about the max open files limit (it is set to 65535 at the OS level, but the snap seems to have its own limit of 4096):
28-Oct-2020 12:34:37.872 max open files (4096) is smaller than max sockets (21000)
It seems that bind has changed its max sockets setting. From https://kb.isc.org/docs/aa-01314:
2. ISC_SOCKET_MAXSOCKETS changed from 4096 to 21000
The snap's process limits, however, still cap open files at 4096:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1543416              1543416              processes
Max open files            4096                 4096                 files
Max locked memory         16777216             16777216             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1543416              1543416              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
That poses a problem when named tries to start up.
In this case, when the upstream DNS was temporarily unreachable, stalled forwarded queries used up all 4096 available sockets. named, however, expected 21000 sockets to be available and crashed while trying to allocate more.
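The exhaustion itself can be reproduced outside of named. The following is only a minimal sketch of the kernel behaviour, not named's code: it temporarily lowers the soft open-files limit of the current process, opens UDP sockets until the kernel refuses, and reports the error. The function name and the limit of 64 are made up for illustration.

```python
import errno
import resource
import socket


def sockets_until_refused(soft_limit):
    """Open UDP sockets under a lowered soft RLIMIT_NOFILE until socket() fails.

    Returns (number_of_sockets_opened, errno_of_the_failure).
    The original limit is restored before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))

    opened = []
    failure = None
    try:
        while True:
            opened.append(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))
    except OSError as exc:
        # The loop can only end here, once no descriptor is left.
        failure = exc.errno
    finally:
        for sock in opened:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure


count, failure = sockets_until_refused(64)
print(count, errno.errorcode.get(failure))
```

On Linux the failure is EMFILE ("too many open files"), and the count is lower than the limit because stdin, stdout, stderr and any other open files share the same descriptor budget. A process that assumes it can hold 21000 sockets hits this wall at 4096 instead.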
Workaround: run systemctl edit snap.maas.supervisor.service and put the following in the override:
[Service]
LimitNOFILE=65535
LimitNOFILESoft=65535
Once the service has been restarted, named is able to allocate all of the 21000 sockets it expects.
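To confirm that the raised limit actually reached named, the "Max open files" line of /proc/<pid>/limits can be compared against the 21000 sockets from the log message. This is a rough sketch; the helper names are my own and the PID of named has to be looked up separately.

```python
NAMED_MAX_SOCKETS = 21000  # "max sockets" value reported by named in the log


def _to_number(value):
    # /proc prints "unlimited" when no limit applies.
    return float("inf") if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' line of /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _to_number(soft), _to_number(hard)
    raise ValueError("no 'Max open files' line found")


def limit_is_sufficient(limits_text, needed=NAMED_MAX_SOCKETS):
    """True if the soft open-files limit covers the sockets named expects."""
    soft, _hard = parse_open_files_limit(limits_text)
    return soft >= needed


def check_pid(pid):
    """Read the limits of a running process, e.g. the named PID."""
    with open(f"/proc/{pid}/limits") as handle:
        return limit_is_sufficient(handle.read())
```

Calling check_pid() with the PID of named should return False with the default 4096 limit and True once the override is in effect.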