Updated from 2.5 to 2.6, now the UI is slow and unusable

Any idea why the UI is unusable after the upgrade from 2.5 to 2.6? There are no errors or log entries whatsoever. Every time I open the Machines tab, Postgres goes to 100% CPU and it takes 2-3 minutes to load ~200 nodes.

Is there any DB configuration I should do after the upgrade? The upgrade documentation is nonexistent. If not, can anyone point me to what could be causing this issue?

MAAS is on a VM with 16 cores and 16 GB of RAM.


Same here, it’s at least 10x slower than before.

Sorry to hear you are having issues with UI in 2.6. We will work on improvements for the point release.

Could you please switch grouping to none in the machine listing view and report back if the performance returns to previous level?

That doesn’t help. Loading machines is still very slow, and even after the machines are fully loaded, viewing a single machine is still very slow.

Also, switching between views reloads the full machine list every time…

Are the machines being loaded in batches, or are you seeing nothing at all until all 200 nodes have been loaded?

It’s useful for us to know how many ~seconds until the table first appears, and then how many seconds until all 200 machines are loaded. Many thanks.


While testing locally I’m seeing machines load in batches of 25. It took ~30 seconds to load 240 machines, and once loaded the UI is responsive.

Unfortunately, that does not solve the problem for us.

  • The machines are loaded in batches of 25. The first 25 are very, very slow to load (around 2-3 minutes). The remaining 175 load in 30-45 seconds.
  • Removing the grouping does not help. Also, after leaving the machine page, the grouping resets to the default (group by status).
  • Stupidly, I did not back up the DB before the upgrade. Is there a way to SAFELY revert to 2.5? We absolutely cannot wait for a point release patch.

I hate to post anywhere just to say “me too” but this is one of those times. Loading 400-some machines takes several minutes now. 2.5 was much faster, and I’m interested in rolling back, but the stable PPA repo only has 2.6 packages on it now.

We’re also having trouble PXE booting on 2.6 due to the changes required for UEFI and our switch firewall ACLs, so this is turning into a mess if I can’t undo it or fix it soon.

Thanks for reporting this, and sorry you hit this. It’s something we’re actively looking into.

I have filed this as https://bugs.launchpad.net/maas/+bug/1835199, please keep ‘me too’ comments to the appropriate part of Launchpad (i.e. https://bugs.launchpad.net/maas/+bug/1835199/+affectsmetoo )

We are looking into a 2.6.1 point release for this.

Thanks for your reply, @sparkiegeek. That said, in my experience a point release can take weeks. We need a way to revert from 2.6 to 2.5. Is that even possible?

If a downgrade is not doable, could you let us know when you have a patch that could at least mitigate the issue?

Simply restore from the backup that you took before upgrading. Since you admit you didn’t make such a backup (eek!): there’s no documented way of downgrading, and I am not aware of a safe path (for example, there have been database migrations that are one-way transformations).

Your best bet is to subscribe to the bug and follow activity there.

Any updates on these issues? I really hope the fix release comes out soon.

It seems this bug only targets the frontend issue: https://bugs.launchpad.net/maas/+bug/1835199
However, we have also seen 100% CPU from a Postgres query, as reported in https://bugs.launchpad.net/maas/+bug/1835316

I’ve caught an insane query like this:

SELECT "maasserver_node"."id", "maasserver_node"."created", "maasserver_node"."updated",
  "maasserver_node"."system_id", "maasserver_node"."hardware_uuid", "maasserver_node"."hostname",
  "maasserver_node"."description", "maasserver_node"."pool_id", "maasserver_node"."domain_id",
  "maasserver_node"."address_ttl", "maasserver_node"."status", "maasserver_node"."previous_status",
  "maasserver_node"."status_expires", "maasserver_node"."owner_id", "maasserver_node"."bios_boot_method",
  "maasserver_node"."osystem", "maasserver_node"."distro_series", "maasserver_node"."architecture",
  "maasserver_node"."min_hwe_kernel", "maasserver_node"."hwe_kernel", "maasserver_node"."node_type",
  "maasserver_node"."parent_id", "maasserver_node"."agent_name", "maasserver_node"."error_description",
  "maasserver_node"."zone_id", "maasserver_node"."cpu_count", "maasserver_node"."cpu_speed",
  "maasserver_node"."memory", "maasserver_node"."swap_size", "maasserver_node"."bmc_id",
  "maasserver_node"."instance_power_parameters", "maasserver_node"."power_state",
  "maasserver_node"."power_state_queried", "maasserver_node"."power_state_updated",
  "maasserver_node"."last_image_sync", "maasserver_node"."token_id", "maasserver_node"."error",
  "maasserver_node"."netboot", "maasserver_node"."ephemeral_deploy", "maasserver_node"."license_key",
  "maasserver_node"."creation_type", "maasserver_node"."boot_interface_id", "maasserver_node"."boot_cluster_ip",
  "maasserver_node"."boot_disk_id", "maasserver_node"."gateway_link_ipv4_id", "maasserver_node"."gateway_link_ipv6_id",
  "maasserver_node"."default_user", "maasserver_node"."install_rackd", "maasserver_node"."install_kvm",
  "maasserver_node"."enable_ssh", "maasserver_node"."skip_bmc_config", "maasserver_node"."skip_networking",
  "maasserver_node"."skip_storage", "maasserver_node"."url", "maasserver_node"."dns_process_id",
  "maasserver_node"."managing_process_id", "maasserver_node"."current_commissioning_script_set_id",
  "maasserver_node"."current_installation_script_set_id", "maasserver_node"."current_testing_script_set_id",
  "maasserver_node"."locked",
  (SELECT U2."description" FROM "maasserver_event" U0
     INNER JOIN "maasserver_eventtype" U2 ON (U0."type_id" = U2."id")
     WHERE (U0."node_id" = ("maasserver_node"."id") AND U2."level" >= $1)
     ORDER BY U0."created" DESC, U0."id" DESC LIMIT $2) AS "status_event_type_description",
  (SELECT U0."description" FROM "maasserver_event" U0
     INNER JOIN "maasserver_eventtype" U2 ON (U0."type_id" = U2."id")
     WHERE (U0."node_id" = ("maasserver_node"."id") AND U2."level" >= $3)
     ORDER BY U0."created" DESC, U0."id" DESC LIMIT $4) AS "status_event_description",
  "maasserver_domain"."id", "maasserver_domain"."created", "maasserver_domain"."updated",
  "maasserver_domain"."name", "maasserver_domain"."authoritative", "maasserver_domain"."ttl",
  "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser",
  "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
  "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined",
  "maasserver_zone"."id", "maasserver_zone"."created", "maasserver_zone"."updated",
  "maasserver_zone"."name", "maasserver_zone"."description",
  "maasserver_bmc"."id", "maasserver_bmc"."created", "maasserver_bmc"."updated",
  "maasserver_bmc"."bmc_type", "maasserver_bmc"."ip_address_id", "maasserver_bmc"."power_type",
  "maasserver_bmc"."power_parameters", "maasserver_bmc"."name", "maasserver_bmc"."architectures",
  "maasserver_bmc"."capabilities", "maasserver_bmc"."cores", "maasserver_bmc"."cpu_speed",
  "maasserver_bmc"."memory", "maasserver_bmc"."local_storage", "maasserver_bmc"."local_disks",
  "maasserver_bmc"."iscsi_storage", "maasserver_bmc"."pool_id", "maasserver_bmc"."zone_id",
  "maasserver_bmc"."tags", "maasserver_bmc"."cpu_over_commit_ratio", "maasserver_bmc"."memory_over_commit_ratio",
  "maasserver_bmc"."default_storage_pool_id", "maasserver_bmc"."default_macvlan_mode"
FROM "maasserver_node"
  LEFT OUTER JOIN "maasserver_domain" ON ("maasserver_node"."domain_id" = "maasserver_domain"."id")
  LEFT OUTER JOIN "auth_user" ON ("maasserver_node"."owner_id" = "auth_user"."id")
  INNER JOIN "maasserver_zone" ON ("maasserver_node"."zone_id" = "maasserver_zone"."id")
  LEFT OUTER JOIN "maasserver_bmc" ON ("maasserver_node"."bmc_id" = "maasserver_bmc"."id")
WHERE ("maasserver_node"."node_type" = $5 AND "maasserver_node"."node_type" = $6 AND "maasserver_node"."id" > $7)
ORDER BY "maasserver_node"."id" ASC LIMIT $8

I confirmed that the query above is the problem: after truncating the maasserver_event table, the CPU issue is gone.
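For readers wondering why the event table matters here: the query above contains two correlated subqueries that, for every node row, look up that node's most recent event at or above a given level. A minimal sketch of that shape follows. It uses Python's sqlite3 with an in-memory database as a stand-in for Postgres, and the rows, level values, and event type descriptions are invented for illustration; only the table and column names are taken from the query.

```python
import sqlite3

# In-memory SQLite stand-in for the MAAS Postgres database (assumption:
# only the columns needed for the illustration are created).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maasserver_node (id INTEGER PRIMARY KEY, hostname TEXT);
CREATE TABLE maasserver_eventtype (
    id INTEGER PRIMARY KEY, description TEXT, level INTEGER);
CREATE TABLE maasserver_event (
    id INTEGER PRIMARY KEY, node_id INTEGER, type_id INTEGER,
    created TEXT, description TEXT);
""")
conn.executemany("INSERT INTO maasserver_node VALUES (?, ?)",
                 [(1, "node-a"), (2, "node-b")])
# Hypothetical event types: a low-level "power query" and a status event.
conn.executemany("INSERT INTO maasserver_eventtype VALUES (?, ?, ?)",
                 [(1, "example power query", 10), (2, "example deployed", 20)])

# node-a has one status event buried under 50 newer low-level events.
events = [(1, 2, "2019-07-01 00:00:00", "deployed ok")]
events += [(1, 1, f"2019-07-02 00:00:{i:02d}", "") for i in range(50)]
events += [(2, 2, "2019-07-03 00:00:00", "deployed too")]
conn.executemany(
    "INSERT INTO maasserver_event (node_id, type_id, created, description) "
    "VALUES (?, ?, ?, ?)", events)

# Same shape as the "status_event_description" subquery: it is evaluated
# once per node and has to order that node's events to find the latest one.
LATEST_STATUS = """
SELECT n.hostname,
       (SELECT e.description
          FROM maasserver_event e
          JOIN maasserver_eventtype t ON e.type_id = t.id
         WHERE e.node_id = n.id AND t.level >= ?
         ORDER BY e.created DESC, e.id DESC
         LIMIT 1) AS status_event_description
  FROM maasserver_node n
 ORDER BY n.id
"""
rows = conn.execute(LATEST_STATUS, (20,)).fetchall()
print(rows)
```

Because the subquery runs per node, its cost plausibly grows with the number of events stored per node, which would be consistent with the truncate making the CPU issue disappear.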

Thanks for spotting that. I really hope a point release or a patch could be available soon.

@hyuwang, which file did you modify for your fix, and how?

Many thanks

The Postgres issue can be solved by truncating the maasserver_event table (see my other post: Db flood by maas event log), but there is also a frontend performance issue (fix committed, not released yet); you could try building from source.

I can confirm that the event table size is making a huge difference for me. Here’s an example. I have an environment with 446 servers in it. Under MAAS 2.6.2 it was taking 1 minute 13 seconds to refresh the Machines page. I checked the table sizes and truncated the event table:

sudo bash
su - postgres
psql
\c maasdb

SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 5;

              relation              | total_size
------------------------------------+------------
 public.maasserver_event            | 4227 MB
 public.metadataserver_scriptresult | 229 MB
 public.metadataserver_nodeuserdata | 78 MB
 public.maasserver_neighbour        | 3752 kB
 public.maasserver_node             | 2000 kB
(5 rows)

truncate table public.maasserver_event;

After doing this, the Machines page loads all 446 servers in only 9 seconds now.

Upon inspection, the vast majority of the events are power query events. You can delete just these events with this Postgres command:

DELETE FROM maasserver_event WHERE type_id = 8;
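The numeric type_id may not be the same on every installation, so it can be safer to count events per type first and then delete by event type name. The sketch below shows that idea with Python's sqlite3 and an in-memory database standing in for Postgres. The event type names (EXAMPLE_POWER_QUERY, EXAMPLE_DEPLOYED) and the rows are hypothetical; look up the real names in your own maasserver_eventtype table.

```python
import sqlite3

# In-memory SQLite stand-in; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maasserver_eventtype (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE maasserver_event (id INTEGER PRIMARY KEY, type_id INTEGER);
""")
# Hypothetical event type names, invented for this illustration.
conn.executemany("INSERT INTO maasserver_eventtype VALUES (?, ?)",
                 [(8, "EXAMPLE_POWER_QUERY"), (3, "EXAMPLE_DEPLOYED")])
conn.executemany("INSERT INTO maasserver_event (type_id) VALUES (?)",
                 [(8,)] * 5 + [(3,)] * 2)

# Step 1: see which event types dominate the table.
COUNT_BY_TYPE = """
SELECT t.id, t.name, COUNT(e.id) AS events
  FROM maasserver_eventtype t
  LEFT JOIN maasserver_event e ON e.type_id = t.id
 GROUP BY t.id, t.name
 ORDER BY events DESC
"""
counts = conn.execute(COUNT_BY_TYPE).fetchall()
print(counts)

# Step 2: delete by name, resolving the id from maasserver_eventtype
# instead of hard-coding it.
DELETE_BY_NAME = """
DELETE FROM maasserver_event
 WHERE type_id IN (SELECT id FROM maasserver_eventtype WHERE name = ?)
"""
deleted = conn.execute(DELETE_BY_NAME, ("EXAMPLE_POWER_QUERY",)).rowcount
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM maasserver_event").fetchone()[0]
print(deleted, remaining)
```

The two SQL statements use only standard syntax, so they should carry over to psql with the `?` placeholder replaced by a quoted name, but try them on a backup first.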

Hi, this is a very old thread, but you will stumble upon it if you search for maasserver_event after finding out that the table is multiple gigabytes in size.

I found that the hardware sync feature produces a lot of events: over the past 6 months I have accumulated more than 5 million with fewer than 30 machines.

So to reduce the size I ran:

DELETE FROM maasserver_event WHERE type_id > 62 and type_id < 69;
DELETE 5262169

I verified the type IDs beforehand:

SELECT id,name,description FROM public.maasserver_eventtype;
 60 | ABORTED_DEPLOYMENT                   | Aborted deployment
 61 | FAILED_COMMISSIONING                 | Failed commissioning
 62 | REQUEST_NODE_START                   | User powering up node
 63 | NODE_HARDWARE_SYNC_MEMORY            | Node Memory hardware sync state change
 64 | NODE_HARDWARE_SYNC_INTERFACE         | Node Interface hardware sync state change
 65 | NODE_HARDWARE_SYNC_BLOCK_DEVICE      | Node Block Device hardware sync state change
 66 | NODE_HARDWARE_SYNC_PCI_DEVICE        | Node PCI Device hardware sync state change
 67 | NODE_HARDWARE_SYNC_USB_DEVICE        | Node USB Device hardware sync state chage
 68 | NODE_HARDWARE_SYNC_CPU               | Node CPU hardware sync state change
 69 | SCRIPT_RESULT_ERROR                  | Script result lookup or storage error
 72 | NODE_POWER_CYCLE_FAILED              | Failed to power cycle node
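A single DELETE of several million rows runs as one long transaction. If that is a concern, the same range delete can be done in smaller batches. This is only a sketch of the idea, written against Python's sqlite3 with an in-memory database as a stand-in for Postgres; the helper name delete_in_batches and the batch size are made up for the example, and the type_id range is the one from the post above.

```python
import sqlite3

def delete_in_batches(conn, low, high, batch_size):
    """Delete events with low < type_id < high, committing after each batch.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM maasserver_event WHERE id IN ("
            " SELECT id FROM maasserver_event"
            " WHERE type_id > ? AND type_id < ? LIMIT ?)",
            (low, high, batch_size))
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

# In-memory stand-in with invented rows: 18 hardware sync events
# (type_id 63..68) and 3 events of other types that must survive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE maasserver_event (id INTEGER PRIMARY KEY, type_id INTEGER)")
type_ids = [62, 69, 72] + [63, 64, 65, 66, 67, 68] * 3
conn.executemany("INSERT INTO maasserver_event (type_id) VALUES (?)",
                 [(t,) for t in type_ids])

deleted = delete_in_batches(conn, 62, 69, batch_size=5)
remaining = conn.execute("SELECT COUNT(*) FROM maasserver_event").fetchone()[0]
print(deleted, remaining)
```

The `DELETE ... WHERE id IN (SELECT ... LIMIT n)` form is used because it works in both SQLite and Postgres; check the type IDs against your own maasserver_eventtype table before running anything like this.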

Hope this helps the search function.