Unreliable MAAS login

Hello,
I’m doing generic login as described here, https://gist.github.com/leshikus/837534a5d5c6fde1275b459fc4a3e50a

Sometimes it works, sometimes it does not.

I see the following errors in the rackd.log

2023-08-24 11:03:00 provisioningserver.utils.services: [critical] Failed to update and/or record network interface configuration: Expecting value: line 1 column 1 (char 0); interfaces: {'docker0': {'type': 'bridge', 'mac_address': '02:42:52:f6:3c:a5', 'links': [{'mode': 'static', 'address': '172.17.0.1/16'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'ens5': {'type': 'physical', 'mac_address': '52:54:00:bf:3e:c3', 'links': [{'mode': 'static', 'address': '192.168.121.19/24', 'gateway': '192.168.121.1'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'ens6': {'type': 'physical', 'mac_address': '52:54:00:06:72:0e', 'links': [{'mode': 'static', 'address': '172.16.1.254/24'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}}
	Traceback (most recent call last):
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
	    current.result = callback(  # type: ignore[misc]
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
	    current_context.run(_inlineCallbacks, r, gen, status)
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	--- <exception caught here> ---
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1074, in do_action
	    yield self._updateInterfaces(interfaces)
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1171, in _updateInterfaces
	    yield self._run_refresh(
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1203, in _run_refresh
	    yield deferToThread(
	  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
	    result = inContext.theWork()  # type: ignore[attr-defined]
	  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
	    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
	  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
	    return func(*args, **kw)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
	    result = func(*args, **kwargs)
	  File "/usr/lib/python3/dist-packages/provisioningserver/refresh/__init__.py", line 56, in refresh
	    failed_scripts = runscripts(
	  File "/usr/lib/python3/dist-packages/provisioningserver/refresh/__init__.py", line 170, in runscripts
	    post_process_hook(
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1228, in _annotate_commissioning
	    lxd_data = json.load(fp)
	  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
	    return loads(fp.read(),
	  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
	    return _default_decoder.decode(s)
	  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
	    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
	  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
	    raise JSONDecodeError("Expecting value", s, err.value) from None
	json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

what may cause this? I wonder where 172.17.0.1 may come from (some docker?) - I use 172.16.0.1 as a cluster network

This error is preceded with this one:

2023-08-24 11:03:00 provisioningserver.utils.services: [critical] Failed to update and/or record network interface configuration: Expecting value: line 1 column 1 (char 0); interfaces: {'docker0': {'type': 'bridge', 'mac_address': '02:42:52:f6:3c:a5', 'links': [{'mode': 'static', 'address': '172.17.0.1/16'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'ens5': {'type': 'physical', 'mac_address': '52:54:00:bf:3e:c3', 'links': [{'mode': 'static', 'address': '192.168.121.19/24', 'gateway': '192.168.121.1'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}, 'ens6': {'type': 'physical', 'mac_address': '52:54:00:06:72:0e', 'links': [{'mode': 'static', 'address': '172.16.1.254/24'}], 'enabled': True, 'parents': [], 'source': 'machine-resources', 'monitored': True}}
	Traceback (most recent call last):
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
	    current.result = callback(  # type: ignore[misc]
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
	    current_context.run(_inlineCallbacks, r, gen, status)
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	--- <exception caught here> ---
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1074, in do_action
	    yield self._updateInterfaces(interfaces)
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1171, in _updateInterfaces
	    yield self._run_refresh(
	  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
	    result = current_context.run(
	  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1203, in _run_refresh
	    yield deferToThread(
	  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
	    result = inContext.theWork()  # type: ignore[attr-defined]
	  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
	    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
	  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
	    return func(*args, **kw)
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/twisted.py", line 203, in wrapper
	    result = func(*args, **kwargs)
	  File "/usr/lib/python3/dist-packages/provisioningserver/refresh/__init__.py", line 56, in refresh
	    failed_scripts = runscripts(
	  File "/usr/lib/python3/dist-packages/provisioningserver/refresh/__init__.py", line 170, in runscripts
	    post_process_hook(
	  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1228, in _annotate_commissioning
	    lxd_data = json.load(fp)
	  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
	    return loads(fp.read(),
	  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
	    return _default_decoder.decode(s)
	  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
	    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
	  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
	    raise JSONDecodeError("Expecting value", s, err.value) from None
	json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

hi @dataved,

The login command is calling just the region and not the racks, so I think these messages from the rack are unrelated. What do you get when you run the login command?

Thanks @r00ta . I’m looking into this further

  1. this happens when several maas initializations in several clusters happen in parallel; it seems that exactly one initialization continues, all other fail due to login problem;
  2. the actual output is “Remote end closed connection without response”

what could be a reason for this? which log file should I inspect?

I look into regiond.log and see

2023-09-14 23:56:32 maasserver.ipc: [info] Worker pid:17203 registered RPC connection to ('ec84xm', '172.17.0.1', 5252).

All my clusters have 172.16.<clusterid>.* networks. I wonder if they all use 172.17.0.1 for some RPC and have a conflict.

Do you see the issue only immediately after the initialization (like maas init ...) of MAAS or also later?

it seems there are no issues before auth, I even get a key. Here what I do:

Here what I do:

Installing MAAS
passed

Check if postgres user exists
passed

Create Postgres User
passed

Check if maas Database exists
passed

Create MaaS Database
passed

Check if MaaS is initialized
maas status
passed

Initialize MaaS
maas init region+rack --database-uri "postgres://{{ db_user }}:{{ db_pass }}@localhost/{{ db_name }}"
passed

Create MaaS API User
passed

Stop maas regiond service to scan subnets
passed

Stop maas rackd service to scan subnets
passed

Start maas regiond service
passed

Start maas rackd service
passed

Get MaaS login API key
passed

Login as MaaS user admin
multiple retries (900 sec), failed with Remote end closed connection without response

Here is how the end of regiond.log looks:

2023-09-14 23:55:58 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/hwe-20.04-lowlatency-edge/focal/20230327/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2023-09-14 23:55:58 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/hwe-20.04-lowlatency-edge/focal/20230327/boot-kernel HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2023-09-14 23:56:00 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/hwe-20.04-lowlatency/focal/20230327/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2023-09-14 23:56:00 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59120, flowInfo=0, scopeID=0))
2023-09-14 23:56:00 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59128, flowInfo=0, scopeID=0))
2023-09-14 23:56:00 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=5253, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59130, flowInfo=0, scopeID=0))
2023-09-14 23:56:00 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59120'.
2023-09-14 23:56:00 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59128'.
2023-09-14 23:56:00 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59130'.
2023-09-14 23:56:00 regiond: [info] 127.0.0.1 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2023-09-14 23:56:02 maasserver.ipc: [info] Worker pid:17201 registered RPC connection to ('ec84xm', '192.168.121.237', 5253).
2023-09-14 23:56:02 maasserver.ipc: [info] Worker pid:17201 registered RPC connection to ('ec84xm', '192.168.121.237', 5253).
2023-09-14 23:56:02 regiond: [info] 127.0.0.1 GET /MAAS/images-stream/ubuntu/amd64/hwe-20.04/focal/20230327/boot-initrd HTTP/1.1 --> 200 OK (referrer: -; agent: python-simplestreams/0.1)
2023-09-14 23:56:02 maasserver.ipc: [info] Worker pid:17201 registered RPC connection to ('ec84xm', '192.168.121.237', 5253).
2023-09-14 23:56:30 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:172.17.0.1', port=5252, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59200, flowInfo=0, scopeID=0))
2023-09-14 23:56:30 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:172.17.0.1', port=5252, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59212, flowInfo=0, scopeID=0))
2023-09-14 23:56:30 twisted.internet.protocol.Factory: [info] RegionServer connection established (HOST:IPv6Address(type='TCP', host='::ffff:172.17.0.1', port=5252, flowInfo=0, scopeID=0) PEER:IPv6Address(type='TCP', host='::ffff:192.168.121.237', port=59220, flowInfo=0, scopeID=0))
2023-09-14 23:56:30 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59200'.
2023-09-14 23:56:30 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59212'.
2023-09-14 23:56:30 maasserver.rpc.regionservice: [info] Rack controller authenticated from '::ffff:192.168.121.237:59220'.
2023-09-14 23:56:32 maasserver.ipc: [info] Worker pid:17203 registered RPC connection to ('ec84xm', '172.17.0.1', 5252).
2023-09-14 23:56:32 maasserver.ipc: [info] Worker pid:17203 registered RPC connection to ('ec84xm', '172.17.0.1', 5252).
2023-09-14 23:56:32 maasserver.ipc: [info] Worker pid:17203 registered RPC connection to ('ec84xm', '172.17.0.1', 5252).

Here is how the end of maas.log looks:

2023-09-14T23:55:44.113380+00:00 jumphost maas.import-images: [info] Downloading image descriptions from http://localhost:5240/MAAS/images-stream/streams/v1/index.json
2023-09-14T23:55:44.114355+00:00 jumphost maas.import-images: [info] Rack downloading image descriptions from 'http://localhost:5240/MAAS/images-stream/streams/v1/index.json'.
2023-09-14T23:55:44.431274+00:00 jumphost maas.import-images: [info] Downloading boot resources from http://localhost:5240/MAAS/images-stream/streams/v1/index.json
2023-09-14T23:56:01.927727+00:00 jumphost maas.rpc.rackcontrollers: [info] Existing rack controller 'jumphost' running version 3.4.0~rc1-14302-g.bb0dc28c1 has connected to region 'jumphost'.
2023-09-14T23:56:02.031356+00:00 jumphost maas.rpc.rackcontrollers: message repeated 2 times: [ [info] Existing rack controller 'jumphost' running version 3.4.0~rc1-14302-g.bb0dc28c1 has connected to region 'jumphost'.]
2023-09-14T23:56:02.126607+00:00 jumphost maas.import-images: [info] Writing boot image metadata.
2023-09-14T23:56:02.128889+00:00 jumphost maas.import-images: [info] Linking boot images snapshot /var/lib/maas/boot-resources/snapshot-20230914-235544
2023-09-14T23:56:02.143218+00:00 jumphost maas.uefi_amd64: [error] Unable to find a copy of bootx64.efi, grubx64.efi in the SimpleStream and the packages shim-signed, and grub-efi-amd64-signed are not installed. The uefi_ebc_tftp bootloader type may not work.
2023-09-14T23:56:02.144511+00:00 jumphost maas.import-images: [info] Cleaning up old snapshots and cache.
2023-09-14T23:56:02.145341+00:00 jumphost maas.import-images: [info] Finished importing boot images.
2023-09-14T23:56:07.941549+00:00 jumphost maas.refresh: [info] Refreshing rack controller hardware information.
2023-09-14T23:56:32.000208+00:00 jumphost maas.rpc.rackcontrollers: [info] Existing rack controller 'jumphost' running version 3.4.0~rc1-14302-g.bb0dc28c1 has connected to region 'jumphost'.
2023-09-14T23:56:32.114817+00:00 jumphost maas.rpc.rackcontrollers: message repeated 2 times: [ [info] Existing rack controller 'jumphost' running version 3.4.0~rc1-14302-g.bb0dc28c1 has connected to region 'jumphost'.]
2023-09-14T23:56:38.200377+00:00 jumphost maas.refresh: [info] Refreshing rack controller hardware information.

I suspect that when you call maas user login the maas processes are not fully operational yet. maas apikey works because it does not interact with the maas server.

Similarly, this bug has been opened recently. I’ll take a look next week.

Btw, what maas version are you using? deb or snap?

I use deb from ‘ppa:maas/3.4-next’. Thanks for the hint about the server

From what I’ve seen, the nginx server starts before maas actually binds the unix socket where nginx sends the requests. This is why you might get 502 when MAAS is (re)started.

1 Like

My problem was caused by intermittent failures in the name resolution; no MAAS specifics. Thanks for your help.

1 Like