MAAS 3.2 memory leak

I’m using deb package version 3.2.7, with the region controller, rack controller and postgres each on its own 20.04 virtual machine.
After upgrading from 2.8 to 3.2.6 (upgrading from 3.2.6 to 3.2.7 made no difference) I noticed that the region and rack controllers started increasing their memory usage uncontrollably, by ~1GB per day (slower on my playground, which has fewer bare metal and virtual servers), until the OOM-killer kicks in or a system administrator restarts them.
I tried to debug it myself with tracemalloc, pympler, etc., but without success.
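
For anyone who wants to try the same kind of debugging, here is a minimal sketch of the usual tracemalloc approach: take two snapshots and look at which source lines grew in between. This is not MAAS code. The `top_growth` helper and the `leaky` list are made up for illustration, and the list only stands in for whatever the real workload allocates. Also keep in mind that tracemalloc only sees memory allocated through Python's allocators, so a leak inside a C library would not show up here.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` largest per-line differences between two snapshots."""
    # compare_to() returns StatisticDiff objects, biggest absolute change first.
    return after.compare_to(before, "lineno")[:limit]


def demo():
    """Allocate roughly 1 MB between two snapshots and return the diff."""
    tracemalloc.start(10)  # keep up to 10 frames per allocation traceback
    before = tracemalloc.take_snapshot()
    leaky = [bytes(1000) for _ in range(1000)]  # hypothetical stand-in for a leak
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return leaky, top_growth(before, after)


if __name__ == "__main__":
    _, stats = demo()
    for stat in stats:
        print(stat)
```

In a long-running daemon the two snapshots would be taken hours apart rather than back to back, and the lines with the largest positive `size_diff` are the first places to look.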


I don’t know if this is related, but I am experiencing similar problems. My MAAS machine has two AMD Epyc CPUs with 16C/32T, 128GiB of RAM and 8 TiB of SSD space, so poor system performance cannot be the reason. MAAS keeps getting slower, up to the point where the UI is totally unresponsive and rescue mode fails to boot. I just rebooted the machine, which made the UI responsive again, and I am now checking whether rescue mode comes up.

You’ve noted a memory leak in MAAS 3.2 Deb install, just to clarify?

Have you seen this behaviour in an earlier or later install (e.g. MAAS 3.3 or MAAS 3.1), or with a different package format (snap)?
That might help to narrow down the source of the problem.

I previously used MAAS 2.9 and took the upgrade route through 3.0 and 3.1 to 3.2.6. With 2.9 I experienced no problems.

Installed via DEB.

You’ve noted a memory leak in MAAS 3.2 Deb install, just to clarify?

I used the 2.8 deb and then upgraded straight to the 3.2.6 deb on the same VMs (along with a release upgrade from 18.04 to 20.04), and that is when the problem started.

By the way, I noticed some interesting behavior:
after I integrated tracemalloc directly into /usr/sbin/regiond, running it in a separate thread, the memory leak was gone. But that was an act of desperation, and I can’t use it to resolve the problem.
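
For reference, running tracemalloc from a thread inside a long-lived process could look roughly like the sketch below. This is only an assumption about the general shape of such a hook, not the code that was actually added to regiond. The function name `start_tracemalloc_reporter` and its parameters are invented for the example.

```python
import threading
import tracemalloc


def start_tracemalloc_reporter(interval=600.0, limit=10, report=print):
    """Start a daemon thread that periodically reports allocation growth.

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    tracemalloc.start()
    stop = threading.Event()

    def run():
        previous = tracemalloc.take_snapshot()
        # Event.wait() returns False on timeout, so the loop body runs once
        # per interval until stop.set() is called.
        while not stop.wait(interval):
            current = tracemalloc.take_snapshot()
            for stat in current.compare_to(previous, "lineno")[:limit]:
                report(str(stat))
            previous = current

    thread = threading.Thread(target=run, name="tracemalloc-reporter", daemon=True)
    thread.start()
    return stop, thread
```

Calling `start_tracemalloc_reporter()` once at startup would be enough for a quick experiment. Note that tracing adds overhead to every allocation, so this is a debugging aid rather than something to leave enabled.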

I would recommend you file a bug report at https://bugs.launchpad.net/maas, as we’re trying to collect bugs in one place. It seems odd that adding memory tracing to regiond removes the memory leak, but that’s another avenue to investigate, of course.

As a note, both of you used the deb. It may be that this problem doesn’t exist on the snap (could either of you test that easily?).

For reference: the bug has been created.
Thank you @german-m

@rabjen-iwes it would be great if you could join the bug discussion on Launchpad and give us a bit more information about your setup.

On a side note, do you have MAAS monitoring enabled?