Cloud-init error when PXE booting?

MAAS was suggested on Reddit for a bare-metal service I was put in charge of getting running.

I have been testing it in my home lab but am hitting a wall when it comes to actually PXE booting a system.

Everything seems to go fine at first, but then I get this error: https://imgur.com/Tzv649z

There was also some kind of error that flashed by real fast about openbsd shell as well.

I cannot find any login for the PXE environment online, so I am kind of stuck. Any ideas?

I also noticed that the system never shows up in MAAS as commissioning like I see in other people’s videos, but I am not sure if that is supposed to happen after this part of the process.

This is a fresh install on Ubuntu 22.04, done yesterday following this guide: https://maas.io/docs/bootstrap-maas

Any help would be great, or suggestions for another option besides MAAS?

Hello @bean74695234 and welcome to the MAAS community!

It looks like this error might be related to network/firewall configuration.

MAAS generates a grub.cfg that contains a cloud-config-url parameter, which is then used by cloud-init.

I’d suggest checking what is being placed into the config and whether the machine is able to reach the requested address.

You can do it either by adding some debug print statements here or in the grub boot menu, or by listening to TFTP traffic between the machine and the MAAS rack controller.

Also, a similar issue was discussed here

Thanks for the tips, I will look into that. Seems like packet sniffing would be the easiest option. Is there a particular string I am looking for?

I am running both the host and the client on VMs for the time being. I have tried them on the same host and on separate hosts, and even tried booting a bare metal server, but all have the same results.

Everything is on the same network as well, of course, so that other thread doesn’t seem to be the same issue.

That said, I am using pfSense to handle DHCP, but the PXE booting starts just fine?

So I tried a bunch of stuff. cloud-init still times out during boot of the PXE environment, but I no longer get the error once it is booted. The system still never shows up in the MAAS web GUI, though.

Any ideas?

Is it possible to remove cloud-init from the PXE environment? Although that doesn’t appear to be the issue at this point.

I would start by looking for the /grub/grub.cfg file request and then examine the contents of that file.

The grub.cfg generated by MAAS should have datasource_list and cloud-config-url pointing to the MAAS installation.
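
To make that check concrete, here is a minimal Python sketch that pulls the cloud-config-url value out of a captured grub.cfg and tests whether the host and port are reachable. The sample grub.cfg text, the address 192.0.2.10, the port, and the path are all placeholders I made up for illustration; use whatever you actually see in your capture.

```python
import re
import socket
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical grub.cfg excerpt. The address, port and path are placeholders,
# not necessarily what MAAS generates on your system.
SAMPLE_GRUB_CFG = """
menuentry 'Ephemeral' {
    linux boot-kernel ro ip=dhcp cloud-config-url=http://192.0.2.10:5248/example-path
    initrd boot-initrd
}
"""


def find_cloud_config_urls(grub_cfg: str) -> List[str]:
    """Return every cloud-config-url=... value found in the text."""
    return re.findall(r"cloud-config-url=(\S+)", grub_cfg)


def host_and_port(url: str) -> Tuple[str, int]:
    """Split a URL into (host, port), falling back to the scheme default."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connection; True means something answered on that port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for url in find_cloud_config_urls(SAMPLE_GRUB_CFG):
    print(url, host_and_port(url))
```

Calling `can_connect(*host_and_port(url))` from a machine on the same network as the PXE client would show whether that address is reachable at all. A hostname in the URL that does not resolve also ends up as `False`, because the resolver error is an `OSError`.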

If you are not using MAAS provided DHCP for PXE booting, that might be the problem…

cloud-init is required because it runs all the scripts provided by MAAS and then reports back.

A simplified view of how it works:

  1. During PXE, the machine sends a DHCP Discover
  2. The MAAS DHCP server replies with a DHCP Offer (with option 150 and option 67 set)
  3. The machine downloads the bootloader and the ephemeral Ubuntu image from MAAS
  4. The machine boots the ephemeral Ubuntu
  5. cloud-init starts and reads the kernel options. It knows it was asked to talk to a specific datasource (set in grub.cfg)
  6. cloud-init fetches the required metadata from the datasource, runs the scripts, and reports back the status and the information about the machine.
  7. MAAS then processes the data from cloud-init and creates a machine.
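
When comparing a capture from the MAAS DHCP server with one from an external server, step 2 is the part to look at. DHCP options are encoded as code/length/value triples, so a small Python sketch like the one below can pull options 67 and 150 out of the options field of an Offer. The sample bytes, the bootfile name, and the address are invented for illustration, not taken from a real MAAS capture.

```python
import ipaddress


def parse_dhcp_options(raw: bytes) -> dict:
    """Parse a DHCP options field (code, length, value triples) into a dict."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:  # End option: nothing after this is meaningful
            break
        if code == 0:  # Pad option: a single byte with no length or value
            i += 1
            continue
        if i + 1 >= len(raw):  # truncated data, stop instead of raising
            break
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


# Hypothetical options field from a DHCP Offer (all values are placeholders).
BOOTFILE = b"bootx64.efi"
SAMPLE_OPTIONS = (
    bytes([53, 1, 2])                   # option 53: message type 2 = Offer
    + bytes([67, len(BOOTFILE)]) + BOOTFILE  # option 67: bootfile name
    + bytes([150, 4, 192, 0, 2, 10])    # option 150: TFTP server address
    + bytes([255])                      # End
)

opts = parse_dhcp_options(SAMPLE_OPTIONS)
print("bootfile:", opts[67].decode())
print("tftp server:", ipaddress.IPv4Address(opts[150]))
```

If one of these options is missing or points somewhere else in the Offer from the external DHCP server, that is a likely place for the two setups to diverge.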

Let’s try to sniff the traffic and see if there is any difference.
I have personally never tried using MAAS with an external DHCP server for PXE booting, but it seems that, depending on the DHCP server type, certain configuration is required. Here is just one example: https://portegi.es/blog/maas-1

Thanks for the walk through!

I think you are right, I need to rule out pfSense causing issues, so I am in the process of setting up a separate network that I can use for testing. I just have to wait for people to not be online so I can rearrange some things. I will try running MAAS with the internal DHCP server and see if that improves anything before chasing that rabbit.

In production it would be easier to use our pfSense DHCP server, but we would only be using one of the two NICs on each system, so I could just set up a separate network on the second NIC for MAAS.

Thinking about it, that might be the best option anyway. It is a bit more cost, but I think we have some old 1gb switches left over from when we upgraded to 10gb.

Should get time to mess with this a bit more in the next few days and will report what I learn!

MAAS has so much more information, and the community is better than Foreman’s so far (the other option I am considering).

Just wanted to report that moving everything to a separate network with MAAS handling the DHCP did indeed get things working. I will see if I can sort out the pfSense side later, but at least I can start playing with it now to see if it will work for what we need!

Hello @bean74695234

I’m glad you got it working.
If you have any further questions or need more assistance, feel free to ask.

FYI we are planning to migrate to Kea DHCP and improve integration with external DHCP servers. Would be great to know your use case with pfsense.

Cool, basically we use pfSense as the router for various reasons.

There are a few reasons for using its DHCP rather than an external one.

First, we use static DHCP leases for most things because of how many changes happen on this network, and having them in pfSense allows us to easily keep track of those changes in the same place we deal with the firewall rules etc.

Second, it is a single place to deal with all the routing/networking side of things. We are not yet at the scale where we need layer 3 routers etc., so we have a pretty basic networking setup and like to keep it that way. pfSense + managed switches are all we need right now.

Third, it is what we know. Like most things in IT, it is just easier to work with what you know. Everyone knows it, and that makes it easier to manage. Plus, when dealing with 25gb+ internet, there are not a ton of options for routers without spending the BIG bucks.

Technically we could run everything on the MAAS DHCP, but frankly that is a lot of work that I just don’t see a reason to mess with lol.

Now that I know it works on a separate network I am rearranging my homelab to run some tests as we would have it in production. Just got to borrow another server from the office.

I assume that MAAS would not have any issues running in Proxmox?

I see, and yes, it makes perfect sense to use the existing DHCP in that case.

It seems that pfSense is using ISC DHCP and is also moving to Kea DHCP according to https://redmine.pfsense.org/issues/6960, and in theory with Kea it is possible to build interesting integrations using hook libraries.

But it should also be possible just to configure pfSense in a way similar to the one in this blog post. I didn’t try it myself, but it seems to be legit.

Regarding Proxmox, I don’t see any issues here. I have personally been running MAAS in LXD VMs and containers for quite a while.

Yes, I followed a similar guide when I set up pfSense before, and it seemed to work fine in that the PXE booting would start without an issue, but cloud-init would fail for some reason.

I think it is possibly an issue with local name resolution or something, but when I manually look up the local name, it resolves to the correct IP address.
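
A manual lookup like that can also be scripted, which makes it easy to repeat from different hosts or VLANs. This is only a sketch using the local resolver of whichever machine runs it, so it says nothing about which DNS server the PXE environment itself ends up using. The hostname and address in the commented-out call are placeholders.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses from the local resolver."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def matches_expected(hostname: str, expected_ip: str) -> bool:
    """True if the name resolves and expected_ip is among the answers."""
    try:
        return expected_ip in resolve_ipv4(hostname)
    except socket.gaierror:
        return False


print(matches_expected("localhost", "127.0.0.1"))
# Placeholder name and address, replace with your MAAS host:
# print(matches_expected("maas.example.lan", "192.0.2.10"))
```

If this passes on a normal host but the ephemeral environment still times out, the difference is more likely in what DNS server or domain the DHCP lease hands to the PXE client than in the record itself.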

Is there a way to override MAAS to use direct IP instead of name resolution for the provisioning environment?

I have been slowly working on this in my spare time and have a test setup working now. One issue I am having: how do you set up the Proxmox power control?

I have tried every version of the syntax I can figure out, but it never seems to work. I also cannot find any details online on how the syntax works.

Good question, I don’t think there is an exposed configuration option for this…

I don’t know much about Proxmox, but let’s try to solve it!
The Proxmox power driver logic can be found at https://github.com/maas/maas/blob/master/src/provisioningserver/drivers/power/proxmox.py

  1. Can you please check rackd.log for any errors? More about MAAS logs here
  2. Can you please provide your power config and describe the issue you observe?

Thanks!

The only error I see in that log is this one, which seems to complain about the lack of a cluster.

2023-06-27 23:01:54 ClusterClient,client: [info] ClusterClient connection lost (HOST:IPv6Address(type=‘TCP’, host=’

Near as I can make out this should be the correct syntax:

But the only error I can find is: Failed to query node’s BMC - Request failed with response status code: 401.
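
A 401 generally means the API rejected the credentials rather than the request itself. As a reference point for the syntax, the Proxmox VE API expects API tokens in an Authorization header of the form `PVEAPIToken=USER@REALM!TOKENID=SECRET`. The Python sketch below only builds that header from its parts. The helper name and all values are made up for illustration, and whether MAAS assembles the header from its power parameters in exactly this way is something to confirm in the driver source linked above.

```python
def build_pve_token_header(user: str, token_name: str, token_secret: str) -> dict:
    """Build a Proxmox VE API token header (hypothetical helper, not MAAS code)."""
    if "@" not in user:
        # Proxmox users are always qualified with a realm, e.g. name@pam or name@pve.
        raise ValueError("user should include a realm, e.g. 'name@pve'")
    return {"Authorization": f"PVEAPIToken={user}!{token_name}={token_secret}"}


# Placeholder values only.
header = build_pve_token_header(
    user="maas@pve",
    token_name="power",
    token_secret="00000000-0000-0000-0000-000000000000",
)
print(header["Authorization"])
```

Things worth double-checking against this shape are a missing realm on the user, the token name being entered with or without the user prefix, and the token’s privileges on the Proxmox side.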

This is not a deal breaker. We will be using IPMI for power control in production; I just wanted to simulate the workflow, so having power control working here would be nice.

Now one issue I am having is getting MAAS to work on a VLAN. It does not seem to want to enable the DHCP server on a VLAN, only on the untagged network. Once again, I can work around this in production, but it would be a lot easier if it supported VLANs.

Any time I try to enable DHCP for a VLAN I get this error: This VLAN is not currently being utilized on any rack controller.

I looked it up online, and others said they had the same issue and that only untagged would work with DHCP.

Hello @bean74695234
Just FYI, it seems that the issue you faced happened because of this bug: https://bugs.launchpad.net/maas/+bug/2022926

That’s interesting. Maybe there is something wrong with the Proxmox power driver. Unfortunately I don’t have Proxmox for testing.

Thanks, hopefully it will be fixed soon. Not being a dev, I have no idea how big of a fix it is lol. Any idea on a timeline for when it might get fixed?

The fix is already there, but not merged yet and it will take some time for us to make a new release (3-4 weeks).

If you want to try it out yourself and check if that solves the issue, here is how you can do it

I am still in the testing phase, so I think I might do just that, actually!

Just wanted to say that this did indeed work! I can ditch MAAS DHCP!!!

Thanks, my life just got so much easier lol

Thank you for your feedback! I am glad it helped.