On a split region/rack snap deployment (3.7.1), we've recently noticed that the rack controller occasionally reaches nearly 100% RAM utilization (of 16GB), almost entirely consumed by the twenty /snap/maas/41216/usr/sbin/maas-agent processes. When this happens, the rack controller is effectively dead and stops servicing DHCP requests. Restarting the maas snap on the rack controller resolves the issue for multiple days, after which the symptoms inevitably return. This does not appear to be a gradual creep; rather, memory seemingly fills up within minutes to hours. We are working on gathering further data and logs at this point.
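For anyone else gathering data on this, here is a minimal sketch of how the resident memory of the maas-agent processes could be sampled from `/proc` on the rack controller. It is only an illustration: the function name `agent_rss_kib` and the idea of matching on the substring `maas-agent` are my own assumptions, not anything MAAS ships. Note also that if the twenty entries are actually threads of a single process (some process viewers list threads individually), this will report just one PID.

```python
import os


def agent_rss_kib(proc_root="/proc", needle="maas-agent"):
    """Return {pid: rss_kib} for processes whose command line contains `needle`.

    `agent_rss_kib` is a hypothetical helper name; it only reads the standard
    Linux /proc/<pid>/cmdline and /proc/<pid>/status files.
    """
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if needle not in cmdline:
                continue
            with open(os.path.join(base, "status")) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        # Line looks like "VmRSS:    123456 kB".
                        result[int(entry)] = int(line.split()[1])
                        break
        except OSError:
            continue  # process exited or is not readable; ignore it
    return result
```

Calling `agent_rss_kib()` periodically (for example from a cron job) and logging `sum(result.values())` with a timestamp should show whether the growth really happens within minutes rather than gradually.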
Hi @stevenk
Which version did you upgrade from? I'd be curious to know whether you find anything in the logs.
FTR, we have had two similar bugs reported recently; maybe they are related to your case as well?
Good to know. I had forgotten to check the bug reports specifically, thanks! I upgraded from the 3.7.0 snap to 3.7.1. The descriptions of those bugs sound like the behavior I am experiencing as well.
Just a small update:
We were able to observe a similar behaviour in our environment. There is no stable reproducer yet, but we are looking into the issue.
I hit this issue as well. It looks like a fix is pending for 3.7.2: Bug #2142793 “maasagent memory leak on DHCP expiry failure due t...”