HA DHCP configuration broke MaaS

Hey there,

I enabled High Availability DHCP across my two controllers, a region+rack controller (192.168.20.2) and a rack controller (192.168.5.2).
Since then, for reasons I have not been able to figure out, my DHCP services no longer work and fail to restart.

I have tried sudo snap restart maas.supervisor, sudo maas init region+rack... and sudo maas init rack..., but none of these worked.
I have attached the log files below; kindly let me know if I am missing any crucial information.

CloudOperator.maas - 192.168.20.2 - region+rack controller
/var/snap/maas/common/log/maas.log
/var/snap/maas/common/log/dhcpd.log
/var/snap/maas/common/log/rackd.log
/var/snap/maas/common/log/regiond.log
/var/log/syslog


Ubuntu18S3.maas - 192.168.5.2 - rack controller
/var/snap/maas/common/log/maas.log
/var/snap/maas/common/log/dhcpd.log
/var/snap/maas/common/log/rackd.log
/var/log/syslog

From my understanding, the rack controller (Ubuntu18S3.maas) should be on VLAN5, as it has the IP address 192.168.5.2, but the MaaS UI is showing VLAN20. I think the HA DHCP configuration must have messed it up, though I may be wrong about that.

I have pasted the subnets and VLANS here as well:

I'm not sure why so many IPv6 subnets show up; I only marked the VLANs as tagged to make sure MAAS-provided DHCP would run on them.

I am unable to commission any of my devices, as none of them are currently receiving a DHCP offer. I would appreciate any insight from members of this community. Thank you for your time and attention.

I attempted to manually modify the dhcpd.conf file since that is what the error logs point to:

Internet Systems Consortium DHCP Server 4.4.1
Copyright 2004-2018 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
/var/snap/maas/common/maas/dhcpd.conf line 35: semicolon expected.
peer address fd8a:
                 ^
/var/snap/maas/common/maas/dhcpd.conf line 120: failover peer failover-vlan-5006: not found
failover peer "failover-vlan-5006"
                                 ^
Configuration file errors encountered -- exiting

However, the file seems to be in active use: after every change I make, it is overwritten and reverts to its unmodified state.

I’m not entirely sure if I should attempt to pause the snap maas.supervisor or something in order to reset the configuration file. It would be nice if I didn’t have to reconfigure my MaaS cluster from scratch just to overcome this error. Any tips on how this could be investigated further would be greatly appreciated.

Hey there, @ryzengrind!

Based on the information you provided, it seems that you’re experiencing issues with the DHCP services on your MAAS controllers after enabling High Availability (HA) DHCP. Additionally, you mentioned that modifying the dhcpd.conf file doesn’t persist, and you’re unsure about how to investigate the issue further without reconfiguring your entire MAAS cluster.

You said, “any tips”, so here are a few suggestions to (hopefully?) help you troubleshoot the issue:

  1. Snap services: You mentioned that you attempted to restart the maas.supervisor service, but it didn’t resolve the problem. You can try stopping the service and then modifying the dhcpd.conf file. Once the modifications are made, start the service again. Use the following commands:
sudo snap stop maas.supervisor
sudo nano /var/snap/maas/common/maas/dhcpd.conf
sudo snap start maas.supervisor

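This will stop the MAAS supervisor, allow you to modify the file, and then start the supervisor again. Make sure to back up the original dhcpd.conf file before making any changes. Keep in mind that, as you noticed, the file gets rewritten, so your changes may be overwritten again once the supervisor is running.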
  2. File ownership and permissions: Ensure that the dhcpd.conf file is owned by the correct user and group. You can use the following command to check ownership if necessary:

ls -lsa /var/snap/maas/common/maas/dhcpd.conf

On my system, ownership is root:root, and everything works fine. My memory may be faulty here, but there may have been a period when the expected ownership was maas:maas.

  3. Check MAAS logs: Review the MAAS logs for any additional error messages or warnings that could help pinpoint the issue. You mentioned the following log files: maas.log, dhcpd.log, rackd.log, regiond.log, and /var/log/syslog. Examine these logs for any relevant error messages or warnings related to DHCP.
  4. Verify configuration changes: Double-check any recent configuration changes you made related to HA DHCP. Ensure that the syntax and formatting of the dhcpd.conf file are correct, including any failover peer definitions. The error you shared shows the parser stopping at “peer address fd8a:” on line 35 with “semicolon expected”. The second error on line 120, failover peer “failover-vlan-5006” not found, is probably a consequence of the first, since the peer definition never parsed.
  5. Consider MAAS reinstall: If the above steps don’t resolve the issue, you may need to consider reinstalling MAAS. However, before doing so, make sure to back up your MAAS configuration and database to avoid losing any important data.
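To see what the parser is actually complaining about, something like the following might help. It is only a sketch: the path and the line numbers 35 and 120 come from the error output you posted, while the helper names show_context and list_failover_lines are made up for illustration and are not part of MAAS or dhcpd. It only reads the file and changes nothing (you may need sudo to read it).

```shell
#!/bin/sh
# Path and line numbers are taken from the dhcpd error output in this thread.
CONF=/var/snap/maas/common/maas/dhcpd.conf

# show_context FILE LINE [RADIUS]
# Print the lines around LINE, each prefixed with its line number.
# (Hypothetical helper name, for illustration only.)
show_context() {
    file=$1
    line=$2
    radius=${3:-3}
    start=$((line - radius))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + radius))
    awk -v s="$start" -v e="$end" \
        'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$file"
}

# list_failover_lines FILE
# List failover peer declarations plus their "address" and "peer address"
# lines, so the peer addresses can be checked by eye.
# (Hypothetical helper name, for illustration only.)
list_failover_lines() {
    grep -nE 'failover peer|(^|[[:space:]])(peer[[:space:]]+)?address[[:space:]]' "$1"
}

# Only run against the real file when it exists and is readable.
if [ -r "$CONF" ]; then
    show_context "$CONF" 35
    show_context "$CONF" 120
    list_failover_lines "$CONF" || echo "no failover lines found"
fi
```

If the peer address on line 35 turns out to be an IPv6 address (the “fd8a:” fragment suggests it might be), that would fit the “semicolon expected” message. As far as I know, dhcpd failover is a DHCPv4 feature, so the parser may simply not accept an IPv6 address there. I have not confirmed that against your setup, so treat it as a lead rather than a diagnosis.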

If none of these steps resolve the issue, it might be helpful to share your log files and other information with us, so we can get a deeper perspective on the problem. Best of luck in resolving the DHCP issue in your MAAS cluster!
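If you do share more log output, a small filter can keep it manageable. The sketch below assumes the log paths listed earlier in this thread (regiond.log only exists on the region controller). The helper name dhcp_lines and the search pattern are my own choices, so adjust them as needed.

```shell
#!/bin/sh
# Log paths as listed earlier in this thread.
LOGS="/var/snap/maas/common/log/maas.log
/var/snap/maas/common/log/dhcpd.log
/var/snap/maas/common/log/rackd.log
/var/snap/maas/common/log/regiond.log
/var/log/syslog"

# dhcp_lines FILE [COUNT]
# Print the last COUNT (default 50) lines of FILE that mention DHCP or
# failover, matched case-insensitively.
# (Hypothetical helper name, for illustration only.)
dhcp_lines() {
    grep -iE 'dhcp|failover' "$1" | tail -n "${2:-50}"
}

# Skip logs that are missing or unreadable on this controller.
for f in $LOGS; do
    if [ -r "$f" ]; then
        echo "== $f =="
        dhcp_lines "$f"
    fi
done
```

Run it on each controller (with sudo if the logs are not readable otherwise) and paste the output here.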