[MAAS 3.0] DHCPD Configuration file errors encountered

Hello folks!
(Edit: MAAS Version is snap-3.0.0-10029-g.986ea3e45)

On my MAAS staging environment I’m having some DHCPD issues with my rack controller. The DHCPD service keeps restarting, with the following log spam:

2021-06-30T22:27:23Z dhcpd[50891]: Internet Systems Consortium DHCP Server 4.4.1
2021-06-30T22:27:23Z dhcpd[50891]: Copyright 2004-2018 Internet Systems Consortium.
2021-06-30T22:27:23Z dhcpd[50891]: All rights reserved.
2021-06-30T22:27:23Z dhcpd[50891]: For info, please visit https://www.isc.org/software/dhcp/
2021-06-30T22:27:23Z dhcpd[50891]: Config file: /var/snap/maas/common/maas/dhcpd6.conf
2021-06-30T22:27:23Z dhcpd[50891]: Database file: /var/snap/maas/common/maas/dhcp/dhcpd6.leases
2021-06-30T22:27:23Z dhcpd[50891]: PID file: /var/snap/maas/common/maas/dhcp/dhcpd6.pid
2021-06-30T22:27:23Z dhcpd[50891]: Wrote 0 deleted host decls to leases file.
2021-06-30T22:27:23Z dhcpd[50891]: Wrote 0 new dynamic host decls to leases file.
2021-06-30T22:27:23Z dhcpd[50891]: Wrote 0 NA, 0 TA, 0 PD leases to lease file.
2021-06-30T22:27:23Z dhcpd[50891]: Bound to *:547
2021-06-30T22:27:23Z dhcpd[50891]: Listening on Socket/7/eth0.11/vlan-5002
2021-06-30T22:27:23Z dhcpd[50891]: Sending on   Socket/7/eth0.11/vlan-5002
2021-06-30T22:27:23Z dhcpd[50891]: Server starting service.
2021-06-30T22:27:25Z dhcpd[50914]: Internet Systems Consortium DHCP Server 4.4.1
2021-06-30T22:27:25Z dhcpd[50914]: Copyright 2004-2018 Internet Systems Consortium.
2021-06-30T22:27:25Z dhcpd[50914]: All rights reserved.
2021-06-30T22:27:25Z dhcpd[50914]: For info, please visit https://www.isc.org/software/dhcp/
2021-06-30T22:27:25Z dhcpd[50914]: /var/snap/maas/common/maas/dhcpd.conf line 34: semicolon expected.
2021-06-30T22:27:25Z dhcpd[50914]:     address 2001:
2021-06-30T22:27:25Z dhcpd[50914]:                  ^
2021-06-30T22:27:25Z dhcpd[50914]: /var/snap/maas/common/maas/dhcpd.conf line 119: failover peer failover-vlan-5002: not found
2021-06-30T22:27:25Z dhcpd[50914]:               failover peer "failover-vlan-5002"
2021-06-30T22:27:25Z dhcpd[50914]:                              ^
2021-06-30T22:27:25Z dhcpd[50914]: Configuration file errors encountered -- exiting
2021-06-30T22:27:25Z dhcpd[50914]:
2021-06-30T22:27:25Z dhcpd[50914]: If you think you have received this message due to a bug rather
2021-06-30T22:27:25Z dhcpd[50914]: than a configuration issue please read the section on submitting
2021-06-30T22:27:25Z dhcpd[50914]: bugs on either our web page at www.isc.org or in the README file
2021-06-30T22:27:25Z dhcpd[50914]: before submitting a bug.  These pages explain the proper
2021-06-30T22:27:25Z dhcpd[50914]: process and the information we find helpful for debugging.
2021-06-30T22:27:25Z dhcpd[50914]:
2021-06-30T22:27:25Z dhcpd[50914]: exiting.

I had a go at disabling DHCP for that vlan, but got an error:

~$ maas $PROFILE vlan update fabric-0 11 dhcp_on=False
RackController matching query does not exist.

It looks like the rack thinks there’s some sort of HA on one of the VLANs, but that’s not the case. Could someone please advise on the correct way to rectify this? I’m not certain what caused it in the first place, but I have been testing different DHCP scenarios, including IPv6, so it seems likely that something got stuck in an invalid state along the way…

Hi @seffyroff, could you please paste the current contents of /var/snap/maas/common/maas/dhcpd.conf?

Also please paste the output of maas $PROFILE vlan read fabric-0 11

https://termbin.com/w4pk1

~$ maas $PROFILE vlan read fabric-0 11
Success.
Machine-readable output follows:
{
    "vid": 11,
    "mtu": 1500,
    "dhcp_on": true,
    "external_dhcp": null,
    "relay_vlan": null,
    "fabric_id": 0,
    "id": 5002,
    "space": "default",
    "fabric": "fabric-0",
    "secondary_rack": "agrktm",
    "primary_rack": "k7shwa",
    "name": "homelab",
    "resource_uri": "/MAAS/api/2.0/vlans/5002/"
}

Note that I did just add a second rack controller to see if I could alleviate the issue. It did not help.

Could you also please attach the output of maas $PROFILE subnets read?

The error in the DHCP config seems to be related to the IPv6 address in the failover peer declaration. I’m trying to understand why it’s configured like this.
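For anyone hitting the same parse error, here is a minimal sketch of a script that flags IPv6 addresses inside failover peer declarations in a dhcpd.conf. As far as I know, ISC dhcpd failover is a DHCPv4 feature and the address statements in a failover peer block take a hostname or an IPv4 address, which would be consistent with the parser stopping at the first colon in the log above. The function name and the sample config are hypothetical, and 2001:db8::1 is a documentation-prefix placeholder, not the address from the real file.

```python
import ipaddress
import re

# A failover peer *declaration* has a braced body. A plain reference such as
# `failover peer "name";` inside a pool has no brace and is not matched.
FAILOVER_RE = re.compile(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', re.DOTALL)
# `address <value>;` and `peer address <value>;` statements inside that body.
ADDRESS_RE = re.compile(r"^\s*(peer\s+address|address)\s+([^;\s]+)\s*;", re.MULTILINE)


def ipv6_failover_addresses(conf_text):
    """Return (peer_name, statement, value) for each IPv6 address found."""
    findings = []
    for block in FAILOVER_RE.finditer(conf_text):
        peer_name, body = block.group(1), block.group(2)
        for stmt in ADDRESS_RE.finditer(body):
            keyword = " ".join(stmt.group(1).split())
            value = stmt.group(2)
            try:
                parsed = ipaddress.ip_address(value)
            except ValueError:
                continue  # a hostname or something else; not flagged here
            if parsed.version == 6:
                findings.append((peer_name, keyword, value))
    return findings


if __name__ == "__main__":
    # Hypothetical sample; the real file would be read from
    # /var/snap/maas/common/maas/dhcpd.conf instead.
    sample = '''
failover peer "failover-vlan-5002" {
    primary;
    address 2001:db8::1;
    peer address 10.0.11.3;
}
'''
    for name, keyword, value in ipv6_failover_addresses(sample):
        print(f'{name}: "{keyword}" is an IPv6 address ({value})')
```

The script only reports what it finds. MAAS generates dhcpd.conf itself, so any fix belongs in the MAAS configuration rather than in a hand edit of the file.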

@seffyroff, i’m taking over for @ack this week. did you find this yet?

Hi @billwear! thanks for your persistence. I’ve grabbed that output here: maas $PROFILE subnets read.

@seffyroff, so when i look at your subnets output, i see one IPv4 CIDR with DHCP on ("name": "10.0.11.0/24") and 196 IPv6 CIDRs with DHCP on. this is just a quick perusal at the end of the day, so don’t rule out 6 o’clock stupid, but does that make sense in your network?
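A count like this can be reproduced with a short script rather than by eye. The sketch below assumes each subnet object in the subnets read output carries a "cidr" field and a nested "vlan" object with a "dhcp_on" flag (the same flag that appears in the vlan read output above). The function name and sample data are hypothetical.

```python
import ipaddress
from collections import Counter


def dhcp_subnets_by_version(subnets):
    """Count subnets whose VLAN has DHCP enabled, grouped by IP version."""
    counts = Counter()
    for subnet in subnets:
        vlan = subnet.get("vlan") or {}
        if vlan.get("dhcp_on"):
            network = ipaddress.ip_network(subnet["cidr"], strict=False)
            counts[f"IPv{network.version}"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample shaped like the assumed API output.
    sample = [
        {"cidr": "10.0.11.0/24", "vlan": {"dhcp_on": True}},
        {"cidr": "2001:db8:1::/64", "vlan": {"dhcp_on": True}},
        {"cidr": "192.168.1.0/24", "vlan": {"dhcp_on": False}},
    ]
    print(dhcp_subnets_by_version(sample))
```

To run it against real data, the sample list could be replaced with the parsed JSON from maas $PROFILE subnets read, assuming the CLI emits only the JSON when its output is piped or redirected.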

Hey @billwear, it may well be that my approach of ‘turn it on and see what happens’ requires a little more thought than I had been led to believe. I shall take myself off to the IPv6 docs to see how to do it properly. If I have any grasp of this at all, it’s that DHCPv6 isn’t strictly necessary anyway, so I should probably just turn it off. I’m definitely in the early stages of finding the edges of this. I appreciate your counsel. Having also recently been reminded that Juju doesn’t like IPv6, I might just put this on ice for a while.

My approach when I get into a pickle is to repair rather than repave, so that I gain some knowledge along the way, so I’ve removed the IPv6 interfaces from the controllers. Those IPv6 subnets are still there, though. Can I just delete the fabric and let it repopulate, or is there a straightforward way of manually cleaning up the subnets? I would guess perhaps a script that iterates over the subnets and deletes each offending one?
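A script along those lines might look like the sketch below. It is a dry run: it only builds the maas $PROFILE subnet delete commands for IPv6 subnets so they can be reviewed before anything is removed. It assumes each subnet object has "id" and "cidr" fields; the function name and sample data are hypothetical.

```python
import ipaddress
import shlex


def ipv6_subnet_delete_commands(subnets, profile):
    """Build one `maas <profile> subnet delete <id>` command per IPv6 subnet."""
    commands = []
    for subnet in subnets:
        network = ipaddress.ip_network(subnet["cidr"], strict=False)
        if network.version == 6:
            commands.append(
                f"maas {shlex.quote(profile)} subnet delete {subnet['id']}"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical sample; real input would be the parsed JSON from
    # `maas $PROFILE subnets read`.
    sample = [
        {"id": 1, "cidr": "10.0.11.0/24"},
        {"id": 7, "cidr": "2001:db8:1::/64"},
    ]
    for command in ipv6_subnet_delete_commands(sample, "admin"):
        print(command)  # printed for review only; nothing is deleted
```

Printing the commands instead of running them keeps the IPv4 subnet safe from a filtering mistake, and the reviewed list can then be pasted into a shell.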

EDIT:
I wonder if using some logical grouping might help - I can add all the fabrics/vlans/subnets into one space, then delete anything that isn’t in that space - or perhaps add the ones I don’t want into another space and delete the whole space.

Or not… :)

@seffyroff, i see i missed your questions. bleh, bad me. before diving in, did you figure out the answers?