Rack controller init fails with private certs

I’ve got a region controller running with the snap on Rocky 9. Everything was working with our internally-generated certs for TLS until I tried to add a separate rack controller.

The init command returns without any apparent error:

$ sudo maas init rack --maas-url https://$MAAS_IP:5240/MAAS --secret $MAAS_SECRET

…but the rack controller doesn’t appear on the region controller, and /var/log/messages on the rack controller shows a “certificate verify failed” traceback:

Traceback (most recent call last):
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1750, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
--- <exception caught here> ---
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1225, in _doUpdate
    eventloops, maas_url = yield self._get_rpc_info(urls)
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1484, in _get_rpc_info
    raise config_exc
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1455, in _get_rpc_info
    eventloops, maas_url = yield self._parallel_fetch_rpc_info(urls)
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1429, in handle_responses
    errors[0].raiseException()
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1390, in _serial_fetch_rpc_info
    raise last_exc
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1382, in _serial_fetch_rpc_info
    response = yield self._fetch_rpc_info(url, orig_url)
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1484, in _get_rpc_info
    raise config_exc
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1455, in _get_rpc_info
    eventloops, maas_url = yield self._parallel_fetch_rpc_info(urls)
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1429, in handle_responses
    errors[0].raiseException()
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/python/failure.py", line 475, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1390, in _serial_fetch_rpc_info
    raise last_exc
  File "/snap/maas/36363/lib/python3.10/site-packages/provisioningserver/rpc/clusterservice.py", line 1382, in _serial_fetch_rpc_info
    response = yield self._fetch_rpc_info(url, orig_url)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'certificate verify failed')]>]

provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at https://jft-maas.jamfilled.com:5443/MAAS).
Traceback (most recent call last):
  File "/snap/maas/36363/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)

I’m not finding any command options (e.g. --cacerts) which seem like they’re meant for dealing with this. How do I get this working?

Thanks.
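One way to tell a trust problem from a network problem is to attempt the same handshake by hand. This is a minimal sketch using only Python's standard library. The function name `probe_tls` and the CA file path in the usage note are hypothetical. It also uses the host's Python trust store, which is not necessarily the one the MAAS snap consults, so treat the result as a hint rather than proof.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int, cafile: Optional[str] = None,
              timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome."""
    # With cafile=None the default system trust store is used; pass the
    # path of a private CA bundle to test against that bundle instead.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        # Must be caught before OSError, which is one of its base classes.
        return f"verify failed: {exc.verify_message}"
    except OSError as exc:
        return f"connection failed: {exc}"
```

For example, `probe_tls("jft-maas.jamfilled.com", 5443)` checks against the system store, while `probe_tls("jft-maas.jamfilled.com", 5443, cafile="/path/to/internal-ca.pem")` checks against a specific bundle (that path is a placeholder). If the first call reports "verify failed" and the second reports "ok", the certificate itself is fine and only the trust store is missing the CA.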

Does this help? Question about native TLS and rack controllers - #6 by wyattrees

I read through that and tried making /etc/ssl/certs a directory (not a symlink) and putting the cacert .pem files directly into it on the rack controller, but it didn’t help. From what I’ve read elsewhere, that’s something that works for snaps on Ubuntu, but so far I haven’t seen anything indicating that it helps on RHEL-based distributions. But I may still be doing something incorrectly with that, not sure.

I also tried doing a bind mount of our cacert .pem file over top of /var/lib/snapd/snap/maas/current/lib/python3.10/site-packages/certifi/cacert.pem and /var/lib/snapd/snap/maas/current/lib/python3.10/site-packages/pip/_vendor/certifi/cacert.pem, but that also didn’t help.

FYI, running MAAS controllers/racks on RHEL is not supported, and you might run into issues.

I thought the point of releasing something as a snap was so that it could run on multiple distros without having to worry about underlying differences…?

Sure, but there are some limitations, and once a snap is published there is no way to prevent people from installing it on other operating systems.

MAAS is certified to run only on Ubuntu hosts. Other operating systems are not tested at all, since they are not supported.

I’m now trying to do it with a Let’s Encrypt cert. I’m getting further: the certificate is accepted, and the rack controller shows up on the region server after the init. However, the logs on the rack controller now show this over and over:

maas.pebble[430660]: 2024-07-31T18:07:01.889Z [pebble] Service "agent" starting: sh -c "exec systemd-cat -t maas-agent $SNAP/bin/run-maas-agent"
maas-agent[446618]: INF Logger is configured with log level "info"
maas.pebble[430660]: 2024-07-31T18:07:09.589Z [pebble] GET /v1/services?names=http 57.551µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.589Z [pebble] GET /v1/services?names=dhcpd6 40.025µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=dhcpd 30.338µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=syslog 20.927µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=agent 27.969µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=proxy 36.89µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=bind9 43.923µs 200
maas.pebble[430660]: 2024-07-31T18:07:09.590Z [pebble] GET /v1/services?names=ntp 47.989µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=http 48.144µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=proxy 14.73µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=ntp 34.744µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=dhcpd6 52.544µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=syslog 23.425µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=dhcpd 26.104µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=agent 30.023µs 200
maas.pebble[430660]: 2024-07-31T18:07:39.590Z [pebble] GET /v1/services?names=bind9 84.531µs 200
maas-agent[446618]: ERR Temporal client error error="failed reaching server: last connection error: connection error: desc = \"transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \\\"crypto/rsa: verification error\\\" while trying to verify candidate authority certificate \\\"maas-ca\\\")\""
maas.pebble[430660]: 2024-07-31T18:07:52.290Z [pebble] Service "agent" stopped unexpectedly with code 1
maas.pebble[430660]: 2024-07-31T18:07:52.290Z [pebble] Service "agent" on-failure action is "restart", waiting ~500ms before restart (backoff 1)

Does this ring any bells? Any reason it’d be trying to use a “maas-ca” certificate on the rack controller? Google isn’t giving me any clues.

Thanks.

Hi @andrew-boatrocker

The maas-ca certificate is generated by MAAS on startup and saved in the database, or in Vault if that integration is enabled. We introduced this CA to create certificates for mTLS communication between Temporal services. Temporal itself was introduced in the 3.5 release.

MAAS later copies the CA certificate to a specific location inside the snap, where Temporal loads it: /var/snap/maas/current/certificates/cacerts.pem. The cluster certificate and key are stored in the same directory: /var/snap/maas/current/certificates/cluster.{pem,key}

Could you check that all three files are present at this path?
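That check can be scripted. The sketch below assumes the three file names implied by the paths above (`cacerts.pem`, `cluster.pem`, `cluster.key`). The helper name `missing_cert_files` is made up for illustration, and reading this directory will most likely require root.

```python
from pathlib import Path
from typing import List

# Directory and file names taken from the paths mentioned above.
CERT_DIR = Path("/var/snap/maas/current/certificates")
EXPECTED = ("cacerts.pem", "cluster.pem", "cluster.key")


def missing_cert_files(cert_dir: Path = CERT_DIR) -> List[str]:
    """Return the expected file names that are absent or empty."""
    missing = []
    for name in EXPECTED:
        path = cert_dir / name
        # An empty file is as useless as a missing one, so report both.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_cert_files()` on each controller returns an empty list when all three files are in place. It only checks presence, not whether the certificates actually trust each other.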

Hi @skatsaounis

Thanks for the pointer. I see that all three files are present on both the region controller and the rack controller, but the contents of the certs are different between the region and the rack and they appear to have no trust or signing relationship.
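For anyone wanting to make the same comparison, here is a rough sketch of how the CA bundles on the two hosts could be compared by fingerprint. It assumes the region and the rack are meant to hold the same maas-ca certificate in cacerts.pem, which is my reading of the explanation above rather than something confirmed. The function names are hypothetical, and the code compares certificates byte for byte; it does not verify any signatures.

```python
import base64
import hashlib
import re
from typing import List

# Matches the base64 body of each certificate in a PEM bundle.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(pem_text: str) -> List[str]:
    """Return the SHA-256 fingerprint of each certificate in a PEM bundle."""
    prints = []
    for body in PEM_BLOCK.findall(pem_text):
        # Strip line breaks and other whitespace before decoding to DER.
        der = base64.b64decode("".join(body.split()))
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def same_ca(region_pem: str, rack_pem: str) -> bool:
    """True when both bundles hold the same non-empty set of certificates."""
    region = set(cert_fingerprints(region_pem))
    rack = set(cert_fingerprints(rack_pem))
    return bool(region) and region == rack
```

Copy the contents of cacerts.pem from each host into `same_ca(...)`. A result of `False` matches what I saw: the two controllers were not sharing a CA.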

Is there a particular way that I should be initializing the rack controller in order to establish trust?

I’d suggest the following:

  1. stop the region and the rack
  2. delete the certificates on both the region and the rack
  3. restart the region and the rack
  4. check the certificates (they should appear after a few seconds)
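Steps 2 and 4 can be sketched in Python as below. Two deliberate differences from the steps above: the files are moved into a backup directory instead of being deleted, so they can be restored, and both function names are invented for this example. Stopping and starting MAAS (steps 1 and 3) is left to you. `sudo snap stop maas` and `sudo snap start maas` are one possible way, but that is an assumption on my part, not something stated in this thread.

```python
import shutil
import time
from pathlib import Path
from typing import List

CERT_DIR = Path("/var/snap/maas/current/certificates")
CERT_FILES = ("cacerts.pem", "cluster.pem", "cluster.key")


def set_aside_certs(cert_dir: Path, backup_dir: Path) -> List[str]:
    """Move the certificate files into backup_dir; return the names moved."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in CERT_FILES:
        src = cert_dir / name
        if src.exists():
            shutil.move(str(src), str(backup_dir / name))
            moved.append(name)
    return moved


def wait_for_certs(cert_dir: Path, timeout: float = 60.0,
                   interval: float = 1.0) -> bool:
    """Poll until all certificate files exist, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if all((cert_dir / name).is_file() for name in CERT_FILES):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With MAAS stopped on both hosts, run `set_aside_certs(CERT_DIR, Path("/root/maas-cert-backup"))` on each (the backup path is a placeholder), start the region and then the rack, and call `wait_for_certs(CERT_DIR)` to confirm that the files have been regenerated.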

That seemed to work - thanks!


For reference, we found the issue, and the bugfix will be included in the next minor release, 3.5.2 (ETA: the next couple of weeks). See Bug #2076910, “[3.5] "crypto/rsa: verification error" while...”.
