Hey there,
We’ve been seeing intermittent slow page loads on our MAAS server.
Sometimes on the machines list, sometimes on the KVM list, sometimes on the Images list.
When we hit a very slow patch it can take 20-40 seconds to load one of the given pages.
Here’s our setup:
3x SuperMicro X9DRT (blades), running Ubuntu 18.04 Server, 1x Samsung 850 EVO 500GB (boot drive), 2x Samsung 860 EVOs (mirrored zpool for LXDs), 64GB RAM, 10G LAN connection
3x LXDs each with a region+rack server (snap version)
3x LXDs each running PostgreSQL (deployed with juju using the postgresql charm version 208 [postgresql version 10.12]) as the database for MAAS.
176x Hardware Machines are checked into MAAS
~35x KVMs deployed across 77 KVM hosts (in MAAS)
Watching usual system statistics has been no help (CPU/load/disk bandwith+IO/network bandwidth+IO), as far as these go they seem to be pretty quiet.
I’ve been struggling to find anything else that could be the cause, so I’ve been watching Postgres for slow queries (>5 seconds), but I have no idea if what I’m seeing is out of the ordinary or not:
2020-08-12 20:46:53 UTC [22532]: [3-1] db=maasdb,user=maas LOG: duration: 31784.074 ms fastpath function call: "lo_unlink" (OID 964)
2020-08-12 20:47:32 UTC [40006]: [3-1] db=maasdb,user=maas LOG: duration: 6950.174 ms statement: COMMIT
2020-08-11 22:30:34 UTC [27895]: [3-1] db=maasdb,user=maas LOG: duration: 19999.379 ms fastpath function call: "loread" (OID 954)
2020-08-11 22:30:34 UTC [9323]: [5-1] db=maasdb,user=maas LOG: duration: 15769.444 ms statement: SELECT "maasserver_node"."id", "maasserver_node"."created", "maasserver_node"."updated", "maasserver_node"."system_id", "maasserver_node"."hardware_uuid", "maasserver_node"."hostname", "maasserver_node"."description", "maasserver_node"."pool_id", "maasserver_node"."domain_id", "maasserver_node"."address_ttl", "maasserver_node"."status", "maasserver_node"."previous_status", "maasserver_node"."status_expires", "maasserver_node"."owner_id", "maasserver_node"."bios_boot_method", "maasserver_node"."osystem", "maasserver_node"."distro_series", "maasserver_node"."architecture", "maasserver_node"."min_hwe_kernel", "maasserver_node"."hwe_kernel", "maasserver_node"."node_type", "maasserver_node"."parent_id", "maasserver_node"."agent_name", "maasserver_node"."error_description", "maasserver_node"."zone_id", "maasserver_node"."cpu_count", "maasserver_node"."cpu_speed", "maasserver_node"."memory", "maasserver_node"."swap_size", "maasserver_node"."bmc_id", "maasserver_node"."instance_power_parameters", "maasserver_node"."power_state", "maasserver_node"."power_state_queried", "maasserver_node"."power_state_updated", "maasserver_node"."last_image_sync", "maasserver_node"."error", "maasserver_node"."netboot", "maasserver_node"."ephemeral_deploy", "maasserver_node"."license_key", "maasserver_node"."creation_type", "maasserver_node"."boot_interface_id", "maasserver_node"."boot_cluster_ip", "maasserver_node"."boot_disk_id", "maasserver_node"."gateway_link_ipv4_id", "maasserver_node"."gateway_link_ipv6_id", "maasserver_node"."default_user", "maasserver_node"."install_rackd", "maasserver_node"."install_kvm", "maasserver_node"."enable_ssh", "maasserver_node"."skip_bmc_config", "maasserver_node"."skip_networking", "maasserver_node"."skip_storage", "maasserver_node"."url", "maasserver_node"."dns_process_id", "maasserver_node"."managing_process_id", "maasserver_node"."current_commissioning_script_set_id", "maasserver_node"."current_installation_script_set_id", "maasserver_node"."current_testing_script_set_id", "maasserver_node"."locked" FROM "maasserver_node" WHERE ("maasserver_node"."node_type" IN (2, 4) AND "maasserver_node"."system_id" = 'ae7a7x')
2020-08-14 04:13:11 UTC [40654]: [3-1] db=maasdb,user=postgres LOG: duration: 8866.240 ms statement: COPY public.metadataserver_scriptresult (id, created, updated, status, exit_status, script_name, stdout, stderr, result, script_id, script_set_id, script_version_id, output, ended, started, parameters, physical_blockdevice_id, suppressed, interface_id) TO stdout;
2020-08-11 22:30:34 UTC [27881]: [3-1] db=maasdb,user=maas LOG: duration: 19160.838 ms statement: UPDATE "maasserver_node" SET "power_state_queried" = '2020-08-11T22:30:15.608069'::timestamp WHERE "maasserver_node"."system_id" IN ('4a6mff', '44f6e6', '6qk8bd', 'ah3b6c', 'akeeaq', 'c8rdnn', '4ff66m', 'b6cpwt', 'fkrbqk', '73rhne')
2020-08-11 23:36:23 UTC [13011]: [3-1] db=maasdb,user=maas LOG: duration: 26002.864 ms statement: SELECT pg_advisory_xact_lock(20120116, 7)
Current counts for the above queries:
grep lo_unlink postgresql-10-main.log | wc -l
24
grep COMMIT postgresql-10-main.log | wc -l
390
grep loread postgresql-10-main.log | wc -l
201
grep 'SELECT "maasserver_node"' postgresql-10-main.log | wc -l
201
grep COPY postgresql-10-main.log | wc -l
4
grep UPDATE postgresql-10-main.log | wc -l
943
grep pg_advisory_xact_lock postgresql-10-main.log | wc -l
94
For the queries above, the Durations listed are pretty representative of those queries (at least when they’re >5 seconds).
Is the above strange or out of order?
Is there something else we should be looking into?
Or are the slow UI times we’re seeing expected performance?
Thank you!
-Derek