MaaS URI confusion - rackd not finding regiond

I’m trying to get the simplest region+rack controller set up on 1 bare-metal Ubuntu 22.04.3 LTS machine, and have become baffled by MaaS URI/URLs and how they pertain to region/rack controllers.

I’ve installed MaaS 3.4/beta using snap, followed the instructions I found on here for creating a production postgres db (not the test one), and initialized a region+rack controller using the instructions I found here:

sudo maas init region+rack --database-uri postgres://tim:timspassword@localhost/maasdb

…but the rackd logs indicate that it cannot find the region controller:

2023-08-18 20:20:44 provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at http://<MY_MAAS_HOST>:5240/maas).

…where I answered its question about MaaS URI using the suggested one (which is the one the rackd is trying to contact in the err msg above.)

  • Any ideas on what’s wrong here?
  • Do I need to set MY_MAAS_HOST to localhost for the region+rack use case?
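One way to narrow this down is to request the region's RPC info by hand from the rack host. The sketch below is only an illustration: it assumes rackd fetches that info from an `rpc/` path under the MAAS URL, and that the path is case-sensitive (`/MAAS` versus `/maas`). `rpc_info_url` and `probe_region` are hypothetical helper names, not MAAS commands.

```shell
#!/bin/sh
# Build the URL that rackd is assumed to poll for RPC info.
# Assumption: the endpoint is "<MAAS URL>/rpc/"; verify for your MAAS version.
rpc_info_url() {
    # Strip one trailing slash so we never emit "//rpc/".
    base=${1%/}
    printf '%s/rpc/\n' "$base"
}

# Probe the endpoint and print the HTTP status code.
# curl prints 000 when it cannot connect at all.
probe_region() {
    curl --silent --output /dev/null --max-time 5 \
         --write-out '%{http_code}\n' "$(rpc_info_url "$1")"
}
```

Usage, run on the machine hosting rackd: `probe_region "http://<MY_MAAS_HOST>:5240/MAAS"`. A 200 would suggest the region is answering at that URL; anything else points at the URL itself, name resolution, or a firewall rather than at rackd.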

For context,

  • I can successfully access via http the maas UI at http://<MY_MAAS_HOST>:5240, which redirects to http://<MY_MAAS_HOST>:5240/maas.
  • I only see a single controller in the UI, a region controller (see screenshot below). When I Add rack controller, using the commands given yields a warning/error indicating that the controller has already been initialized. So, I’m convinced that the controller exists but is just not able to communicate with its regional cousin…
  • I AM using an external DHCP server, and I know this isn’t supported, but this issue appears to be unrelated to that, and I have set up the DHCP server correctly to point to MaaS for TFTP. PXE clients do see the MaaS server but the tftp fails, and I assume this is because the rackd is encountering problems.

Thanks for your help. 🙂

Hi @timblaktu! It’s fine to get such errors in the rackd; as a matter of fact, the rack might sometimes fail to communicate with the region. But you should actually see Region and rack controller in the Controllers page: could you retry from scratch and ensure you call sudo maas init region+rack --database-uri postgres://tim:timspassword@localhost/maasdb after you install maas and postgres?

Thanks, @r00ta. In my case these rackd errors are persistent and frequent, such that it appears that my rackd has never successfully reached its regiond. Nevertheless, I am working on re-configuring the controller again. I’d actually already done this twice, but only going back as far as the maas init command, and telling it to re-initialize. I assume you mean either to find a way to remove/delete the controller and re-initialize it, or to sudo snap remove maas and start over from there. Which do you think is most appropriate here?

Yup start from scratch with every component, clean up the database as well!

So like:

  1. remove the snap: sudo snap remove --purge maas
  2. drop the database and create it again
  3. install the snap
  4. init!
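The four steps above can be sketched as a small script that prints the commands for review instead of running them. The database user, database name, and snap channel are taken from elsewhere in this thread. `dropdb --if-exists` is one way to do the "drop the database" step and is an assumption about how you would do it. `<password>` is a placeholder to fill in.

```shell
#!/bin/sh
# Print the reset sequence rather than executing it, so it can be reviewed.
# Defaults mirror values used in this thread; override via the environment.
DB_USER=${DB_USER:-tim}
DB_NAME=${DB_NAME:-maasdb}
CHANNEL=${CHANNEL:-3.4/beta}

reset_plan() {
    # One command per step: remove snap, drop DB, recreate DB, install, init.
    cat <<EOF
sudo snap remove --purge maas
sudo -u postgres dropdb --if-exists $DB_NAME
sudo -u postgres createdb -O $DB_USER $DB_NAME
sudo snap install --channel=$CHANNEL maas
sudo maas init region+rack --database-uri postgres://$DB_USER:<password>@localhost/$DB_NAME
EOF
}

reset_plan
```

Dropping the database is destructive, so printing the plan first and running the lines by hand is the safer default.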

Here’s a log of what I did next:

sudo snap remove maas
sudo snap install --channel=3.4/beta maas

# I already had created the db with:
#  `sudo apt install -y postgresql`
#  `sudo -u postgres createuser -e tim`
#  `sudo -u postgres createdb -O tim maasdb`

tim@metalord:~$ sudo maas status
MAAS is not configured

tim@metalord:~$ sudo maas init region+rack --database-uri postgres://tim:metal0rd@localhost/maasdb
MAAS URL [default=http://192.168.1.9:5240/MAAS]:
MAAS has been set up.
.
.
.
tim@metalord:~$ sudo maas status
bind9                            RUNNING   pid 471922, uptime 0:00:30
dhcpd                            STOPPED   Not started
dhcpd6                           STOPPED   Not started
http                             RUNNING   pid 472651, uptime 0:00:11
ntp                              RUNNING   pid 472070, uptime 0:00:27
proxy                            RUNNING   pid 472132, uptime 0:00:26
rackd                            RUNNING   pid 471925, uptime 0:00:30
regiond                          RUNNING   pid 471926, uptime 0:00:30
syslog                           RUNNING   pid 472078, uptime 0:00:27

tim@metalord:~$ # creating admin user fails bc it's already in the db.. benign
tim@metalord:~$ sudo maas createadmin
Username: admin
Password:
Again:
Email: timblaktu@gmail.com
Import SSH keys [] (lp:user-id or gh:user-id): 
AlreadyExistingUser: A user with the email xxxxx already exists.
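The `maas status` output above can also be checked from a script. This is a minimal sketch, assuming the two-column "name STATE" layout shown in the log; `services_not_running` is a hypothetical helper name. It skips dhcpd and dhcpd6, which are expected to be STOPPED when an external DHCP server is in use.

```shell
#!/bin/sh
# Read `maas status` output on stdin and print each service that is not
# RUNNING. dhcpd/dhcpd6 are skipped (STOPPED is expected with external DHCP).
# Lines with fewer than two fields (e.g. blank lines) are ignored.
services_not_running() {
    awk 'NF >= 2 && $1 != "dhcpd" && $1 != "dhcpd6" && $2 != "RUNNING" { print $1 }'
}
```

Usage: `sudo maas status | services_not_running`. Empty output means every service other than the DHCP daemons reports RUNNING.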

Now, connecting to the MaaS web UI, I see the controller listed as region+rack!!

@r00ta, just got your reply, thanks. Looks like it works without re-doing the db, so I’ll proceed from here. Thanks for the support!

@r00ta, I’m getting further now… my machine that is attempting pxe booting is able to “download NBP file”, but MaaS UI isn’t seeing any Machines. I’m getting an error about there being no DHCP on any VLANs. I haven’t yet created the subnet corresponding to the IP range used by my external DHCP. Is this supposed to be a “fabric” or a “space”?

Also, do I have to use VLANs for MaaS to work? I can do this, but thought I’d first get it working without.

Studying the networking docs on discourse, I think I’ve answered my previous questions, however, it’s not completely clear in the section on reserving IP address ranges when using external DHCP how to actually do that, i.e. do I create a “space” or “subnet” within an existing “fabric”, or a new “fabric” with new “space” or “subnet”? Also, how do I denote that this THING is to be reserved?? Tagging @billwear for visibility into areas newbies get hung up on for how to improve documentation.

This is what I see in the controller UI:

I found this pop-up over the Space field in one of my subnets, which implies that I need to associate all of my DHCP-Reserved-IP-CIDR Subnets with a space, and then (somehow) denote that space as having the purpose of “Don’t use this MaaS, Don’t do it! It’s reserved for external DHCP!”

Sorry, I was way off there. I found the Reserved Ranges field in the Subnet and VLAN pages, so I figure all I have to do is set my ext DHCP range here for a containing subnet.

In my case, a subnet for 192.168.1.0/24 was created for me out of the box (probably bc on installation, MaaS gleaned these details from the local NIC config), and my DHCP is doling out 192.168.1.100-199, so I set this range in the Reserved Ranges field of the 192.168.1.0/24 subnet.
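The same reservation can, as far as I know, be made from the MAAS CLI with `ipranges create type=reserved`. The sketch below first checks that the range sits inside the /24 and then prints the command. The helper names are hypothetical, the check only handles /24 subnets, and the `admin` profile name assumes you have already logged in to the CLI under that profile.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if START..END lies among the usable hosts of the /24 at BASE.
# Usage: range_fits_24 BASE START END   (assumes a /24 subnet only)
range_fits_24() {
    base=$(ip_to_int "$1"); start=$(ip_to_int "$2"); end=$(ip_to_int "$3")
    [ "$start" -le "$end" ] && [ "$start" -gt "$base" ] \
        && [ "$end" -lt $((base + 255)) ]
}

# Print (not run) the CLI call that reserves the range.
reserve_cmd() {
    printf 'maas %s ipranges create type=reserved start_ip=%s end_ip=%s\n' \
        "$1" "$2" "$3"
}
```

Usage: `range_fits_24 192.168.1.0 192.168.1.100 192.168.1.199 && reserve_cmd admin 192.168.1.100 192.168.1.199`.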

Circling back to document the other critical elements to getting MaaS successfully PXE booting my hardware…

Basically, this: specifying bootx64.efi as the filename in the DHCP network booting config. I had read in many other places that this needed to be pxelinux.0, but this never worked, perhaps bc my metal machines are all configured to boot UEFI.
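The BIOS versus UEFI split can be made explicit. PXE clients announce their firmware type in DHCP option 93 (client system architecture, RFC 4578), and an external DHCP server can choose the filename from it. The mapping below is a sketch, not MAAS's own logic: `boot_filename_for_arch` is a hypothetical helper, and the legacy BIOS filename is an assumption that may differ between MAAS versions.

```shell
#!/bin/sh
# Map a DHCP option 93 client-architecture code to a boot filename.
# Codes per RFC 4578: 0 = x86 BIOS, 7 = EFI byte code, 9 = EFI x86-64.
boot_filename_for_arch() {
    case "$1" in
        0)   echo "pxelinux.0" ;;   # legacy BIOS (assumed filename)
        7|9) echo "bootx64.efi" ;;  # x86-64 UEFI, the case in this thread
        *)   return 1 ;;            # unknown code: let the caller decide
    esac
}
```

Usage: `boot_filename_for_arch 7` prints `bootx64.efi`. Most DHCP servers can express the same decision natively as a conditional on option 93.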

Hi @timblaktu
Thanks for the running posts of your experiences. It’s always great to see how our users interact with MAAS so we can continuously improve.
Based on your posts, would I be correct in saying some of our features were not telegraphed in such a way that was easy to see?
Also, has your initial issue been solved, or do you still need help with that?

Thanks @lloydwaltersj, yes I think my confusions and initial issue have been resolved. Yes, it is hard to find the docs and to know you’re using the right version of them, and then hard to find topical info without searching discourse here. The following features took me a long time to grok:

  1. Using external DHCP
  2. How to create a prod postgres db (not the test one in your docs) and initialize a single-node region+rack MaaS system. In hindsight, this is only about 5 commands, but it took me a few days to get it right bc no formula for this MVP system appears to be written down in one place, and it requires a…
  3. High-level understanding of how the big pieces fit together. A single good diagram would work wonders here.
  4. Finding daemon logs and troubleshooting
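On the last point, a small helper can show how often rackd failed to reach the region. This is a sketch under one stated assumption: that the snap writes the rack log to `/var/snap/maas/common/log/rackd.log`, which may not hold for every MAAS version. `count_region_failures` is a hypothetical name, and the search string is the message quoted at the top of this thread.

```shell
#!/bin/sh
# Assumed log location for the snap install; override via RACKD_LOG.
RACKD_LOG=${RACKD_LOG:-/var/snap/maas/common/log/rackd.log}

# Count log lines reporting that rackd could not reach the region.
# Takes an optional log path; defaults to $RACKD_LOG.
count_region_failures() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'Failed to contact region' "${1:-$RACKD_LOG}" || true
}
```

Usage: `sudo sh -c '. ./maas-logs.sh; count_region_failures'`, where `maas-logs.sh` is whatever file you saved the snippet in. A count that keeps growing after `maas init` has finished suggests the failure is persistent rather than a startup race.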

Aside from better diagrams, I would suggest creating an automation solution, even just a shell script, that installs and configures MaaS in an MVP way for first time users.

Hey @timblaktu, are you familiar with either Ansible or Terraform? We have the MAAS Ansible Playbooks for setup/teardown and the MAAS Terraform Provider for modifying a running MAAS.
We’re always exploring more avenues for installation and configuration of course, so we’re likely to have more than the above in future.

As to the other points, our docs are an ever evolving space, and we welcome community contributions! If you can provide any example docs for your problems, we’d be happy to include them!