Isolated MaaS 2.5 Rack Controller, also confused with "dpkg-reconfigure" options


#1

Hey guys,

I’m trying to split my MaaS deployment across two servers: the first will run the MaaS Region Controller and the second, isolated one will run the MaaS Rack Controller.

However, since my network topology is a bit complex (yes, just a bit), Machine deployments fail when I run the Rack Controller on a separate server. When both run on the same server, it’s all good.

Here is my network topology:

Number of VLANs: 3

VLAN 2: Called “Metal”, Subnet 192.168.0.0/22, ONLY to ssh into MaaS’ boxes (default gw for both Region and Rack)

VLAN 3: Called “Maas-PXE”, Subnet 192.168.4.0/22, ONLY for MaaS’ PXE Boot (not the default gw of the nodes)

VLAN 4: Called “Public”, Subnet 172.29.232.0/22, ONLY for deployed Machines (this is the default gw of them and how users will reach them)

1- MaaS Region Controller has only 1 IP, 192.168.0.5
2- MaaS Rack Controller has 2 IPs, 192.168.0.10 AND 192.168.4.10 (PXE Net)
3- Deployed Machines are supposed to have 2 IPs, 192.168.4.50~100 (PXE) AND 172.29.232.10
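
A minimal sketch of the addressing above, using Python’s standard ipaddress module. It is not MaaS code: the host labels and helper names are made up for illustration, 192.168.4.50 stands in for whatever address a Machine gets from the PXE range, and the check only looks at which subnets two hosts share directly, ignoring anything routed through the gateways.

```python
import ipaddress

# Subnets taken from the VLAN list above.
SUBNETS = {
    "Metal": ipaddress.ip_network("192.168.0.0/22"),
    "Maas-PXE": ipaddress.ip_network("192.168.4.0/22"),
    "Public": ipaddress.ip_network("172.29.232.0/22"),
}

# Host labels are illustrative; 192.168.4.50 is one example from the PXE range.
HOSTS = {
    "region": ["192.168.0.5"],
    "rack": ["192.168.0.10", "192.168.4.10"],
    "machine": ["192.168.4.50", "172.29.232.10"],
}


def attached_subnets(host):
    """Return the names of the subnets on which `host` has an address."""
    addrs = [ipaddress.ip_address(a) for a in HOSTS[host]]
    return {name for name, net in SUBNETS.items() if any(a in net for a in addrs)}


def shared_subnets(a, b):
    """Return the subnets two hosts have in common (direct adjacency only)."""
    return attached_subnets(a) & attached_subnets(b)


if __name__ == "__main__":
    for pair in [("machine", "rack"), ("machine", "region"), ("rack", "region")]:
        print(pair, sorted(shared_subnets(*pair)))
```

In this model a Machine shares only Maas-PXE with the Rack Controller and nothing with the Region Controller, so any metadata URL pointing at 192.168.0.5 depends entirely on routing.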

Thing is, when I try to deploy a Machine WITHOUT VLAN 4, it works! However, as soon as I try to deploy the very same Machine with an interface on VLAN 4, the deployment fails.

If I’m not mistaken, I can see that cloud-init can’t communicate with MaaS…

I thought that with MaaS 2.5 it would be possible to use the Rack Controller as a proxy to reach the Region Controller. Is this already working in beta 4?

Also, I’m confused about the “dpkg-reconfigure maas-region-controller” option! It says:


Configuring maas-region-controller

The Ubuntu MAAS Server automatically detects the IP address that is used for PXE and provisioning. However, it needs to be reacheable by the clients (e.g L2 or L3 network). If the
automatically detected address is not reacheable by the clients, it needs to be changed.

Ubuntu MAAS PXE/Provisioning network address:

192.168.4.10


I’m confused! Why should the MaaS Region Controller “care” about the PXE address if it isn’t supposed to have a connection there? Remember, the PXE network sits behind the Rack Controller only and is not reachable from the Region Controller.

Then, on MaaS Rack Controller, the “dpkg-reconfigure maas-rack-controller” says:


Configuring maas-rack-controller

The MAAS cluster controller and nodes need to contact the MAAS region controller API. Set the URL at which they can reach the MAAS API remotely, e.g. “http://192.168.1.1/MAAS”. Since nodes must be able to access this URL, localhost or 127.0.0.1 are not useful values here.

Ubuntu MAAS API address:

http://192.168.4.10:5240/MAAS


Here it makes more sense: since the Machines can ONLY reach the Rack Controller, I’m configuring its IP, so in theory the Machines will contact the Rack Controller for metadata.

BUT, it isn’t working!

I even tried connecting the Region Controller directly to the PXE Network (new IP 192.168.4.5) and pointing both “dpkg-reconfigure maas-*-controller” prompts at it, but that doesn’t work either!

If I run both Region and Rack controllers within the same server, it’s all good.

Any ideas?

Cheers!
Thiago


#2

Haven’t tried this feature yet but a few links below might help you.

The options presented by dpkg-reconfigure relate to the old behavior and, it seems, haven’t been updated so far.

With the new feature, booting machines should use a DNS record, pre-created in the MAAS-managed BIND DNS servers, that presents a single endpoint for accessing the rack controllers on a given subnet. Judging by the code, the Maas-PXE subnet must not have any DNS servers configured; otherwise the old behavior is used as a fallback. The default domain is called “maas-internal”, and the actual URL passed to an ephemeral image in a preseed will look something like this: 192-168-0-0--22.maas-internal:9000. The DHCP server configuration will also list all rackd addresses as DNS servers ahead of the regiond ones.
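
To show the naming pattern, here is a rough Python sketch that builds a host name shaped like the example above from a subnet CIDR. The function name is hypothetical and the rule (dots become dashes, the prefix length follows a double dash) is only inferred from that one example, so treat it as an assumption rather than the actual MAAS implementation. It handles IPv4 only.

```python
import ipaddress


def internal_subnet_host(cidr, domain="maas-internal"):
    """Build a name shaped like 192-168-0-0--22.maas-internal.

    Assumption: the label is the network address with dots replaced by
    dashes, followed by "--" and the prefix length. Inferred from the
    example in this post, not taken from the MAAS source.
    """
    net = ipaddress.ip_network(cidr)
    label = "%s--%d" % (str(net.network_address).replace(".", "-"), net.prefixlen)
    return "%s.%s" % (label, domain)


if __name__ == "__main__":
    print(internal_subnet_host("192.168.0.0/22"))  # the example subnet above
    print(internal_subnet_host("192.168.4.0/22"))  # the Maas-PXE subnet
```

If the pattern holds, the name to look for in the curtin config and in the rack controller’s DNS for the Maas-PXE subnet would be 192-168-4-0--22.maas-internal.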

maas-rack-controller now depends on bind9 which is consistent with 2.5 release notes. (commit 21587a5f8157117b8e5440e1fd55ae918d9858b2)

relevant commits:

309d6343a6109f84a4ddb89c445f632709d4711f
4720d302c1d63a4b4468cfd27fffc5fb943ef8c0
26c5b41124459a21ca26ba8cfcd35fe2cd6f28b5 (Use internal MAAS domain as the preseed URL of a deploying machine)
9ea25240aeeed1b933e7cfb967f02f0b8861367d (Point DHCP nameservers and networking preseed for a deployed machine at the routable rack controllers when use_rack_proxy is true (default))

A couple of questions:

  • Do rack controllers have interfaces on the public VLAN?
  • Is there a native VLAN configuration on the switchports, or is the traffic tagged?

Debugging steps:

  1. Enable verbose curtin mode and get the curtin config used during the deployment attempt:
    maas {session} maas set-config name=curtin_verbose value=true
    maas {session} machine deploy {system_id}
    maas {session} machine get-curtin-config {system_id}

  2. Generate a backdoor-able image and log in to the machine that fails to connect to the metadata server, so you can fetch the cloud-init logs

  3. Enable a BMC console by passing the relevant kernel parameters in MAAS (this gets you the current foreground virtual console and kernel messages when you request a remote console via IPMI, which should include cloud-init messages as well):

    dmesg | grep serial8250
    [ 2.189745] serial8250: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

    console=ttyS1,115200 console=tty0
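
As a small illustration of step 3, the sketch below turns a serial8250 line from dmesg into the console parameters shown above. The function name and regular expression are hypothetical helpers, not part of MAAS. It assumes the base_baud value is the speed you want on the console, as in the example; check that it matches the serial-over-LAN setting on your BMC.

```python
import re

# Matches lines like:
# serial8250: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
SERIAL_RE = re.compile(r"serial8250: (ttyS\d+) at .*base_baud = (\d+)")


def console_params(dmesg_line):
    """Return kernel console parameters for the port in `dmesg_line`, or None."""
    match = SERIAL_RE.search(dmesg_line)
    if match is None:
        return None
    port, baud = match.groups()
    # Keep tty0 as well so the local virtual console still gets output.
    return "console=%s,%s console=tty0" % (port, baud)


if __name__ == "__main__":
    line = ("[ 2.189745] serial8250: ttyS1 at I/O 0x2f8 "
            "(irq = 3, base_baud = 115200) is a 16550A")
    print(console_params(line))
```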

Let me know if that helps in debugging.


#3

On backdoorable images: https://docs.maas.io/2.5/en/troubleshoot-faq#backdoor-image-login