I’m trying to get the simplest region+rack controller set up on 1 bare-metal Ubuntu 22.04.3 LTS machine, and have become baffled by MaaS URI/URLs and how they pertain to region/rack controllers.
I’ve installed MaaS 3.4/beta using snap, followed the instructions I found on here for creating a production postgres db (not the test one), and initialized a region+rack controller using the instructions I found here:
…but the rackd logs indicate that it cannot find the region controller:
2023-08-18 20:20:44 provisioningserver.rpc.clusterservice: [critical] Failed to contact region. (While requesting RPC info at http://<MY_MAAS_HOST>:5240/maas).
…where I answered its question about MaaS URI using the suggested one (which is the one the rackd is trying to contact in the err msg above.)
Any ideas on what’s wrong here?
Do I need to set MY_MAAS_HOST to localhost for the region+rack use case?
For context,
I can successfully access via http the maas UI at http://<MY_MAAS_HOST>:5240, which redirects to http://<MY_MAAS_HOST>:5240/maas.
I only see a single controller in the UI, a region controller (see screenshot below). When I use Add rack controller and run the commands it gives, I get a warning/error indicating that the controller has already been initialized. So, I’m convinced that the controller exists but is just not able to communicate with its regional cousin…
I AM using an external DHCP server, and I know this isn’t supported, but this issue appears to be unrelated to that, and I have set up the DHCP server correctly to point to MaaS for TFTP. PXE clients do see the MaaS server but the tftp fails, and I assume this is because the rackd is encountering problems.
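One way to narrow down a "Failed to contact region" error is to check, from the rack host, whether the host and port in the MAAS URL accept TCP connections at all. The following is only a minimal sketch of that check, not something MAAS provides; the helper names host_and_port and can_connect are made up for illustration.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(maas_url: str) -> tuple[str, int]:
    """Split a MAAS URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(maas_url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {maas_url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage on the rack host (substitute the URL given to maas init):
#   host, port = host_and_port("http://<MY_MAAS_HOST>:5240/maas")
#   print(can_connect(host, port))
```

If this returns False for the URL the rackd is logging, the problem is likely name resolution or networking rather than MAAS itself; if it returns True, the URL is at least reachable at the TCP level and the cause is somewhere higher up.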
Hi @timblaktu! It’s fine to see such errors in the rackd log; as a matter of fact, the rack sometimes fails to communicate with the region. But you should see a Region and rack controller in the Controllers page. Could you retry from scratch and make sure you call sudo maas init region+rack --database-uri postgres://tim:timspassword@localhost/maasdb after you install maas and postgres?
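Since the --database-uri value packs several settings into one string, it can help to split it apart and confirm each piece before running the init. This is just an illustrative sketch using Python's standard URL parser; describe_database_uri is a made-up helper name, and the example URI is the one from the post above.

```python
from urllib.parse import urlsplit


def describe_database_uri(uri: str) -> dict:
    """Break a postgres:// URI into its parts, without echoing the password."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password_set": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,  # None means the URI did not specify a port
        "database": parts.path.lstrip("/"),
    }


info = describe_database_uri("postgres://tim:timspassword@localhost/maasdb")
print(info)
```

Here the user, host, and database name should match what was created in postgres; a mismatch in any of them is worth ruling out before looking elsewhere.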
Thanks, @r00ta. In my case these rackd errors are persistent and frequent, such that it appears that my rackd has never successfully reached its regiond. Nevertheless, I am working on re-configuring the controller again. I’d actually already done this twice, but only going back as far as the maas init command, and telling it to re-initialize. I assume you mean either to find a way to remove/delete the controller and re-initialize it, or to sudo snap remove maas and start over from there. Which do you think is most appropriate here?
@r00ta, I’m getting further now… my machine that is attempting PXE boot is able to “download NBP file”, but the MaaS UI isn’t seeing any Machines. I’m getting an error about there being no DHCP on any VLANs. I haven’t yet created the subnet corresponding to the IP range used by my external DHCP. Is this supposed to be a “fabric” or a “space”?
Also, do I have to use VLANs for MaaS to work? I can do this, but thought I’d first get it working without.
Studying the networking docs on discourse, I think I’ve answered my previous questions, however, it’s not completely clear in the section on reserving IP address ranges when using external DHCP how to actually do that, i.e. do I create a “space” or “subnet” within an existing “fabric”, or a new “fabric” with new “space” or “subnet”? Also, how do I denote that this THING is to be reserved?? Tagging @billwear for visibility into areas newbies get hung up on for how to improve documentation.
I found this pop-up over the Space field in one of my subnets, which implies that I need to associate all of my DHCP-Reserved-IP-CIDR Subnets with a space, and then (somehow) denote that space as having the purpose of “Don’t use this MaaS, Don’t do it! It’s reserved for external DHCP!”
Sorry, I was way off there. I found the Reserved Ranges field in the Subnet and VLAN pages, so I figure all I have to do is set my ext DHCP range here for a containing subnet.
In my case, a subnet for 192.168.1.0/24 was created for me out of the box (probably bc on installation, MaaS gleaned these details from the local NIC config), and my DHCP is doling out 192.168.1.100-199, so I set this range in the Reserved Ranges field of the 192.168.1.0/24 subnet.
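As a quick sanity check before entering a reserved range, one can confirm that the range really sits inside the containing subnet. This is a small sketch with Python's ipaddress module using the numbers from this post; the function names are made up for illustration.

```python
import ipaddress


def range_within_subnet(start: str, end: str, cidr: str) -> bool:
    """Return True if the inclusive range start..end lies inside the subnet."""
    network = ipaddress.ip_network(cidr)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return first <= last and first in network and last in network


def range_size(start: str, end: str) -> int:
    """Number of addresses in the inclusive range start..end."""
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1


# The external DHCP pool and the auto-created subnet from this post.
ok = range_within_subnet("192.168.1.100", "192.168.1.199", "192.168.1.0/24")
size = range_size("192.168.1.100", "192.168.1.199")
print(ok, size)
```

A range that spills outside the subnet (say, ending in 192.168.2.x) would make the first function return False.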
Circling back to document the other critical elements to getting MaaS successfully PXE booting my hardware…
Basically, this: specifying bootx64.efi as the filename in the DHCP network booting config. I had read in many other places that this needed to be pxelinux.0, but that never worked, perhaps bc my metal machines are all configured to boot UEFI.
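The BIOS-vs-UEFI distinction is the likely reason one filename works and the other doesn't: PXE clients announce their architecture in DHCP option 93, and the DHCP server can hand out a boot file to match. The sketch below only models that selection logic in Python, assuming the architecture codes from RFC 4578; a real external DHCP server would express this in its own config syntax, and the function name is hypothetical.

```python
# Client architecture codes carried in DHCP option 93 (RFC 4578).
BIOS_X86 = 0     # legacy BIOS PXE client
EFI_BC = 7       # commonly sent by x86-64 UEFI firmware
EFI_X86_64 = 9   # also used for x86-64 UEFI


def boot_filename(arch_code: int) -> str:
    """Pick the network boot file for a client, based on its option 93 code."""
    if arch_code in (EFI_BC, EFI_X86_64):
        return "bootx64.efi"   # the filename that worked in this thread
    if arch_code == BIOS_X86:
        return "pxelinux.0"    # the legacy-BIOS name mentioned in this thread
    raise ValueError(f"unhandled client architecture code: {arch_code}")


print(boot_filename(EFI_BC))
```

Other architecture codes are deliberately left unhandled here, since nothing in this thread says which boot file they would need.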
Hi @timblaktu
Thanks for the running posts of your experiences. It’s always great to see how our users interact with MAAS so we can continuously improve.
Based on your posts, would I be correct in saying some of our features were not telegraphed in such a way that was easy to see?
Also, has your initial issue been solved, or do you still need help with that?
Thanks @lloydwaltersj, yes I think my confusions and initial issue have been resolved. Yes, the docs are hard to find, it’s hard to know you’re using the right version of them, and it’s hard to find topical info without searching discourse here. The following features took me a long time to grok:
Using external DHCP
How to create a prod postgres db (not the test one in your docs) and initialize a single-node region+rack MaaS system. In hindsight, this is only about 5 commands, but it took me a few days to get it right bc no formula for this MVP system appears to be written down in one place, and it requires a…
High-level understanding of how the big pieces fit together. A single good diagram would work wonders here.
Finding daemon logs and troubleshooting
Aside from better diagrams, I would suggest creating an automation solution, even just a shell script, that installs and configures MaaS in an MVP way for first time users.
Hey @timblaktu, are you familiar with either Ansible or Terraform? We have the MAAS Ansible Playbooks for setup/teardown and the MAAS Terraform Provider for modifying a running MAAS.
We’re always exploring more avenues for installation and configuration of course, so we’re likely to have more than the above in future.
As to the other points, our docs are an ever-evolving space, and we welcome community contributions! If you can provide any example docs for your problems, we’d be happy to include them!