PXE boot forwarding with UEFI support

I followed the excellent instructions on Chris Sanders’ blog, in the article “MAAS for the home”. Unfortunately, the line it adds to dnsmasq.conf only does PXE forwarding for legacy boot.

How do I add forwarding that also supports non-legacy (UEFI) boot?


I had a ton of fun with this just last night. See this old bug that’s still not resolved for metal UEFI.

I added the following snippet on my subnet:

if option arch = 00:07 and exists user-class and option user-class = "iPXE" {
    # iPXE uefi_amd64
    filename "grubx64.efi";
} elsif option arch = 00:09 and exists user-class and option user-class = "iPXE" {
    # iPXE uefi_amd64
    filename "grubx64.efi";
}

That should work if you’re booting via iPXE on those nodes.


Like so?

Can you please link to the specific blog you’re talking about?

You mentioned dnsmasq which MAAS doesn’t use. I assume that means you are using an external DHCP server? In which case setting it as a DHCP snippet in MAAS would have no effect.

Here is the blog post I referred to. It makes a nice home setup. I use it with my nuc homelab.

http://chris-sanders.github.io/2018-02-02-maas-for-the-home/

Indeed it uses external DHCP. Both the blog post writer and I had an Asus router providing dhcp.

This is a tail of the dnsmasq.conf that ended up making UEFI PXE boot work on my setup…

dhcp-host=00:50:B6:BA:24:88,192.168.0.5
dhcp-boot=pxelinux.0,,192.168.0.5
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,,192.168.0.5

In this setup MAAS is given a static IP of 192.168.0.5.

The important thing is not to forget the double commas in the dhcp-boot lines (dhcp-boot=tag:efi-x86_64,bootx64.efi->,,<-192.168.0.5).
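For anyone wondering what the double comma actually does: here is a minimal Python sketch of how the fields of a dhcp-boot line are laid out, based on my reading of the dnsmasq man page (filename, then server name, then server address). The function name parse_dhcp_boot is made up for illustration, and this is not dnsmasq’s real parser.

```python
def parse_dhcp_boot(line):
    """Split a dnsmasq dhcp-boot line into its comma-separated fields.

    Simplified model (assumption): optional leading tag:<name> fields,
    then filename, server name, server address, in that order.
    """
    key, _, value = line.strip().partition("=")
    if key != "dhcp-boot":
        raise ValueError("not a dhcp-boot line")
    fields = value.split(",")
    tags = []
    # Leading "tag:<name>" fields restrict the rule to clients carrying that tag.
    while fields and fields[0].startswith("tag:"):
        tags.append(fields.pop(0)[len("tag:"):])
    # Pad to three positions so missing trailing fields read as empty strings.
    fields += [""] * (3 - len(fields))
    filename, servername, address = fields[:3]
    return {"tags": tags, "filename": filename,
            "servername": servername, "address": address}
```

With the double comma, the server name field is empty and 192.168.0.5 lands in the address field. With a single comma, the same value is read as the server name and the address field stays empty.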

With my particular setup, all the custom lines in dnsmasq.conf get wiped if I go into the router web config and change things.

I think I can get persistence with JFFS, but that’s for later…

For now, whenever I need to change it, I open it with a text editor, add the vital lines for PXE forwarding, and then run cd /etc && killall dnsmasq && dnsmasq -C ./dnsmasq.conf to restart dnsmasq with the changes.
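If re-adding the lines by hand gets tedious, the step could be scripted. This is only a sketch in Python, assuming Python is available wherever the file is edited (often not the case on a stock router); ensure_lines and PXE_LINES are names I made up, and the lines use the double-comma form with the 192.168.0.5 address from this setup.

```python
from pathlib import Path

# Lines to keep in dnsmasq.conf (values from this thread; adjust the IP to your MAAS host).
PXE_LINES = [
    "dhcp-boot=pxelinux.0,,192.168.0.5",
    "dhcp-match=set:efi-x86_64,option:client-arch,9",
    "dhcp-match=set:efi-x86_64,option:client-arch,7",
    "dhcp-boot=tag:efi-x86_64,bootx64.efi,,192.168.0.5",
]

def ensure_lines(conf_path, wanted):
    """Append every line from `wanted` that is missing; return the lines added."""
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in wanted if line.strip() not in present]
    if missing:
        # Start on a fresh line in case the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        path.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```

Calling ensure_lines("/etc/dnsmasq.conf", PXE_LINES) twice only writes on the first call, so it is safe to run before every dnsmasq restart.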

Thanks!


I’ve recently been sprinting on getting external legacy PXE boot to work, where MAAS is the external PXE/TFTP server.

I am able to get external dhcp-relay (where maas is the primary dhcp server and relays dhcp to the bridge device on a switch) working using the following in my dnsmasq.conf on my switch:

interface=br66 
port=0
dhcp-relay=10.3.1.1,10.3.2.10

Where 10.3.1.1 (br66) is the network device in my switch that I am running dhcp-relay on, and 10.3.2.10 is my maas server (reachable, but on a different subnet/broadcast domain).

I would be really interested in knowing what configuration can be supplied to dnsmasq to enable using maas to provide external pxe/tftp/dhcp.

I was not able to get remote pxe/tftp working by adding the dnsmasq config

dhcp-boot=pxelinux.0,,10.3.2.10

from the blog.

Played around with a few other dnsmasq configs as well to no avail.

Thoughts? Ideas?

How about

dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,,10.3.2.10

?

I was having trouble with pfSense running DHCP and getting MAAS to PXE boot from it. The answer for me seemed to be adding the following settings:
TFTP Server: xxx.xxx.xx.100
Network Booting: Enable
Next Server: xxx.xxx.xx.100
Default BIOS Name: bootx64.efi
UEFI 32 bit File Name: bootx64.efi
UEFI 64 bit File Name: bootx64.efi
(left Root Path empty).

Hope this helps. I’m going to post it as a solution on both forums in case other people are having google problems like I did.


@db0west , @landers-robert et al (i.e. to anyone else landing here from search)

I just want to follow up here with some clarity and a message of success:

Why am I here?

  • It’s 2020 and you tried following along with this guide about PXE-booting nodes via MAAS without MAAS DHCP (perhaps because, like me, you found your need-to-be-commissioned nodes picking up DHCP addresses from your router at home before MAAS DHCP could snag them)
  • AND…
  • You’re now seeing this error/notice on your PXE-booted nodes that are running hardware from a non-legacy era (because hey - UEFI, consumer-grade hardware, 2020, etc. etc.):

PXELINUX 6.xx lwIP copyright H. Peter Anvin et al
Unable to locate configuration file

I want to fix this. What do I need?

  • MAAS-host’s IP address (e.g. 10.0.1.42)
  • Router stuff:
    – SSH access to your router enabled
    – JFFS “permanent storage” enabled (see an example for Merlin WRT)
    – JFFS “user scripts” enabled/allowed
  • Motherboard / hardware-node stuff:
    – Ensure your motherboard’s PXE-boot policy is set to “UEFI”
    – In my case (ASRock B450M), this was found in the “CSM” section on the “Boot” menu (end-user example)

What do I need to do?

  • Via SSH, log in to your router (same un/pw you use via UI; e.g. ssh user@10.0.0.1)
  • For ASUS Merlin WRT, you’ll likely land in /tmp/home/root#…
  • Execute the following commands:
cd /
nano jffs/configs/dnsmasq.conf.add
  • Add and save the following content to dnsmasq.conf.add (replace 10.0.1.42 with your MAAS-host’s IP):
dhcp-boot=pxelinux.0,,10.0.1.42
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,,10.0.1.42
  • Restart the dnsmasq service:
service restart_dnsmasq
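
If you would rather generate the file content than hand-edit the IP in two places, something like the following Python sketch could be run on your workstation and the output pasted into dnsmasq.conf.add. The function name render_pxe_forwarding is hypothetical; the four lines are exactly the ones listed above.

```python
import ipaddress

def render_pxe_forwarding(maas_ip):
    """Return the dnsmasq.conf.add content that points PXE clients at `maas_ip`."""
    # Validate early so a typo in the address does not end up in the router config.
    ip = str(ipaddress.ip_address(maas_ip))
    lines = [
        f"dhcp-boot=pxelinux.0,,{ip}",                     # legacy BIOS clients
        "dhcp-match=set:efi-x86_64,option:client-arch,9",  # tag clients reporting arch 9
        "dhcp-match=set:efi-x86_64,option:client-arch,7",  # tag clients reporting arch 7
        f"dhcp-boot=tag:efi-x86_64,bootx64.efi,,{ip}",     # tagged (UEFI) clients
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_pxe_forwarding("10.0.1.42"), end="")
```

An invalid address such as "10.0.1" raises ValueError instead of producing a broken config.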

What the heck did I just do?

  • By default, most routers that facilitate SSH access to the router will only stow what you do in that session, well… in that session
  • To get your changes to persist, it is also common for routers that facilitate this type of interaction to also provide a way to “add” what you want atop the existing implementations
  • By creating the file dnsmasq.conf.add in root’s jffs/configs directory, you’re making it possible for your additions to persist (and apply) after events like a setting change on your router, a reboot of the router, etc.
  • You may have noticed the second , in some of those lines (e.g. pxelinux.0,,10.0.1.42). The commas are a delimiter. But what is that second comma… delimiting? There’s nothing there. That’s exactly right; by adding that second comma, you’re leaving the hostname field for the TFTP server empty (so it is skipped) and supplying the IP address of the TFTP server instead (in this case, “TFTP Server” = MAAS)
  • In order to get your changes to apply immediately, in context of the steps above, you restarted the dnsmasq service

Now, you can leave the DHCP and DNS to your actual router and simply define a reserved range in MAAS (without MAAS DHCP enabled). Nice.

NOTE: At least in our case, there sometimes is a bit of a snag (“Unable to locate configuration file”)


@knaledge, fantastic post. this is why we have forums! thanks.


EDIT: The following issue outlined in this specific reply was ultimately solved/addressed here

Thanks @billwear! We’ve been finding the MAAS Discourse to be a great resource.

One thing that may benefit from your expertise here, if you have a moment: we notice that while the content of this thread does seem to (mostly) remedy the particular situation we (and others) have experienced, we still occasionally get the following:

Unable to locate configuration file.

What works

To be clear: when MAAS has full DHCP enabled on the fabric/vlan, things seem to work just fine. That, in combination with the (unofficial, but hopefully official soon!) patch for duplicate UUID really has made this, generally speaking, a breeze. Only thing we could benefit from would be the re-introduction of Wake-on-LAN support.

What may benefit from your help

It seems that when MAAS does not have DHCP configured (i.e. we followed OP-linked guide/tweaks here), we still encounter the OP issue - but, weirdly, only on the first 2-3 boots of the node. The node will boot, it spins on DHCP (PXE seek), finds MAAS - nice! - then after a few seconds we get “Unable to locate configuration file”

Now here’s where it gets interesting:

  • The oft-unmentioned “other prompt” in console, at that moment, is “Press any key to retry or reboot to try again” (rough quote from memory; you get the idea)
  • When we leave it alone (letting it trail through those “I’m gonna reboot…” ellipses), or reboot manually - on that second (or third+) reboot, the node picks up PXE from MAAS (just like before), and the config file is found, thus commissioning proceeds

Our network topology is perhaps straightforward (though open to critique/questions):

  • MAAS 2.8.1 (snap) running in a NAS-hosted VM
  • NAS (and underlying VM) makes use of its own “Virtual Switch” pointed to the NAS’ physical LAN adapter
  • NAS itself is physically connected to router
  • router running DHCP/DNS (hence OP/our reply)
  • nodes physically connected to unmanaged physical switch
  • All the above is configured to make use of the subnet mask 255.255.254.0 (thus allowing the NAS-hosted VM, where MAAS is running, to have the statically-assigned IP 10.0.1.42)
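
As a quick sanity check of that last point, here is a small Python sketch showing that 10.0.1.42 sits in the same network as the router once the mask is 255.255.254.0. The router address 10.0.0.1 is taken from the SSH example earlier in the thread; the helper name on_lan is made up.

```python
import ipaddress

# Router LAN address from the earlier SSH example, netmask from the topology list.
lan = ipaddress.ip_network("10.0.0.1/255.255.254.0", strict=False)

def on_lan(address):
    """Return True if `address` falls inside the router's LAN network."""
    return ipaddress.ip_address(address) in lan

if __name__ == "__main__":
    print(lan)                  # network in prefix notation
    print(on_lan("10.0.1.42"))  # the MAAS VM
```

The mask widens the LAN to cover both 10.0.0.x and 10.0.1.x, which is why the MAAS VM and the router’s DHCP range can share one broadcast domain.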

@billwear - any thoughts on this?

  • This really does feel like we may have a simple misconfig in dnsmasq.conf.add …
  • or perhaps the “location” of pxelinux.0 is somewhat different these days (i.e. MAAS 2.8.1 makes use of…? and thus must be accounted for by adjusting definition in the dnsmasq.conf.add file)…
  • or maybe we need to define something else, too in dnsmasq.conf.add…
  • or perhaps some other simple thing

Anything come to mind? We’re so close to having MAAS up and humming. So close!

@knaledge, can you post your dnsmasq config file? may need more information about how it fails, too, but start with the file, if you feel comfortable sharing it.


and while I appreciate the compliments, rest assured there’s a whole team back here helping me, i just happen to be the one who patrols discourse lately. want to be fair here!


@billwear - Definitely a team effort! Thanks for engaging here :slight_smile:

As mentioned above, dnsmasq.conf is being modified by dnsmasq.conf.add - so here’s the content of the dnsmasq.conf.add:

dhcp-boot=pxelinux.0,,10.0.1.42
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,,10.0.1.42

And as a result of service restart_dnsmasq, the physical router (ASUS AC86U) now has dnsmasq.conf as follows:

pid-file=/var/run/dnsmasq.pid
user=nobody
bind-dynamic
interface=br0
interface=pptp*
no-dhcp-interface=pptp*
no-resolv
servers-file=/tmp/resolv.dnsmasq
no-poll
no-negcache
cache-size=1500
min-port=4096
bogus-priv
domain-needed
dhcp-range=lan,10.0.0.42,10.0.0.84,255.255.254.0,86400s
dhcp-option=lan,3,10.0.0.1
dhcp-option=lan,6,10.0.1.42,0.0.0.0
dhcp-option=lan,252,"\n"
dhcp-option=lan,42,0.0.0.0
dhcp-authoritative
addn-hosts=/etc/hosts.dnsmasq
dhcp-host=24:5E:BE:99:88:77,set:24:5E:BE:99:88:77,10.0.0.42
dhcp-host=AC:BC:32:GG:QQ:XD,set:AC:BC:32:GG:QQ:XD,10.0.0.64
dhcp-host=44:2C:05:LO:LN:O1,set:44:2C:05:LO:LN:O1,10.0.0.44
dhcp-host=52:54:00:11:AA:BB,set:52:54:00:11:AA:BB,10.0.1.42
dhcp-name-match=set:wpad-ignore,wpad
dhcp-ignore-names=tag:wpad-ignore
dhcp-script=/sbin/dhcpc_lease
script-arp
dhcp-boot=pxelinux.0,,10.0.1.42
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,,10.0.1.42

@billwear - I have an update here. Success! I’ll post more in the AM (or well, later this morning at a more “normal” hour).

The influencers:

  • “CSM” (Compatibility Support Module) on our motherboard (ASRock B450M mini-ITX) has an underlying option for “PXE OpROM Policy” - and it defaults to “Legacy Only”. What’s more, this section is tucked away under the “Boot” menu, away from any other PXE-related settings. Changed it to “UEFI Only”
  • In terms of dnsmasq.conf and PXE traffic: dhcp-match=set:efi-x86_64,option:client-arch,9 does not work with MAAS, at least not with our hardware. Only dhcp-match=set:efi-x86_64,option:client-arch,7 works (“BC”)

Need more info on why arch,9 does not seem to work. Looking at RFC 4578, it seems our hardware would fall under “x86_64”. Could this be a bug within MAAS? Or perhaps some odd handling upstream? Another thread out in the wild indicates that arch,7 working for us seems to be the expected behavior, since entities that handle PXE bootloading tend, more often than not, to treat “7” as “the real x86_64”.
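
To make the 7-versus-9 question a bit more concrete, here is a rough Python model of what the dhcp-match and dhcp-boot lines do with the client architecture option (DHCP option 93, a list of 16-bit codes per RFC 4578). The names decode_client_arch and boot_file are made up, the table is only a subset of the RFC’s list, and the selection logic is a simplification, not dnsmasq’s or MAAS’s actual code.

```python
import struct

# Subset of the architecture names listed in RFC 4578.
RFC4578_ARCH = {0: "Intel x86PC", 6: "EFI IA32", 7: "EFI BC", 9: "EFI x86-64"}

def decode_client_arch(payload):
    """Decode an option 93 payload: a list of 16-bit big-endian architecture codes."""
    count = len(payload) // 2
    return list(struct.unpack(f"!{count}H", payload[: count * 2]))

def boot_file(arch_codes, efi_codes=(7, 9)):
    """Simplified model of the tagged rules: EFI file if any reported code matches."""
    if any(code in efi_codes for code in arch_codes):
        return "bootx64.efi"
    return "pxelinux.0"  # untagged default, i.e. the legacy boot file
```

In this model a client that reports 7 gets bootx64.efi and a legacy client reporting 0 falls through to pxelinux.0. Note that the RFC’s own table names 7 as “EFI BC” and 9 as “EFI x86-64”, which is exactly why picking “arch,7” is hard to intuit.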

The takeaway for us, and perhaps offering up some end-user insight: it’s unclear how we would have been able to intuit “arch,7” being the right choice. We don’t seem to be alone in that, either. I wonder - would this concept fit, at all, within MAAS documentation somewhere?

Perhaps with the above in mind, I see there is a thread linked from @landers-robert that does seem to focus in on “7 and 9” - shim vs. grub - and the thread ends on a somewhat dreary outlook for “bare metal” users :wink: haha. That said, I’m not at all convinced I quite understand what is being discussed in that thread, so I’ll defer to the judgement of others.

Anyhow, with those two modifications, and following along with the original contributions here (including my own - “team work”, as you mentioned :slight_smile: ) - MAAS now discovers and enlists our hardware nodes 100% reliably, every time (provided that we have @georcon’s hopefully-soon-to-be-official “patch” applied in order to avoid duplicate UUIDs).

I’ll update all my contributions here tomorrow. What a journey!


hey, @knaledge, well done! and please hold me accountable for explaining this better by filing a doc bug on “arch,7” – those get tracked and i get nagged, which is useful. fantastic work, though.

FTR on contributions, all food comes to me… :wink:
