Enlistment port restriction

PS :

uname -a
Linux maas2 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
lsb_release --all
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

@mkl1, sorry this one fell thru the cracks. did you get it resolved yet?

Hello Bill,

I was off :wink:

We have identified some restrictions linked to our layer 2 networks, which run many different variants of spanning tree protocol. These cause link loss multiple times, during the different phases of network boot (BIOS/EFI/PXE/GRUB/kernel…), on the interface(s) used to boot from the network.
We are still struggling to find a solution, maybe in GRUB, to repeat, retry, wait, whatever it takes for the process to always work.

The admin port was directly connected to our MaaS prototype, but plugging it into the network shows the same issues.

It would be great if you could point me to the path, inside MaaS, of the grub.cfg file that MaaS serves, so we could study it and increase its reliability during network boot in case of network failures.

Even then, I am not sure we won't run into other boot vs. network availability issues.

Thanks in advance,
Have a nice day,
Best Regards,
Mickaël.

PS: And then what about 802.1x… I have some work to do ;-(

@mkl1, I’m not quite sure what you’re asking here. can you try asking again?

Hello Bill,

I hope you’re doing fine !

The issue is that we lose the network during the different phases of PXE booting.
(As explained, maybe due to spanning tree vs. network driver load, etc.)
As a result, we sometimes end up at the grub prompt forever.
We would like to read the GRUB configuration file served during PXE boot to try to increase its reliability.
Where on a MaaS server is (are) the grub.cfg file(s) served during PXE boot located?

Thanks in advance,
Have a nice day,
Best Regards,
Mickaël.

ah, i get it. i’m asking around. let you know.

-best.

Hello Bill,

I am not sure why (I have to check whether someone changed my network environment), but machines are not booting anymore and I get stuck at grub>.

So I took the time to look around :

env
prefix=(tftp,rack-ip)/grub

ls
(memdisk) …
ls (memdisk)/
grub.cfg

cat (memdisk)/grub.cfg
if [ -e $prefix/x86_64-efi/grub.cfg ]; then
source $prefix/x86_64-efi/grub.cfg
elif [ -e $prefix/grub.cfg-amd64 ] ; then
source $prefix/grub.cfg-default-amd64
else
source $prefix/grub.cfg
fi

I really don’t like this default GRUB config much; it is so prone to network failures.
I also can’t find it in MaaS, because I guess it is built on the fly.
I would rather see a kind of while, not if…
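To make the "while, not if" idea concrete, here is a minimal sketch in GRUB's own scripting language of what a retrying variant could look like. It is an untested illustration, not the configuration MaaS ships: it only handles the final `$prefix/grub.cfg` fallback from the listing above, the 5-second delay is an arbitrary value, and it assumes the `-e` test keeps re-probing the TFTP prefix once the link comes back.

```
# Sketch only (assumption): wait for the config instead of giving up.
# Loops forever while the file cannot be fetched from $prefix.
until [ -e $prefix/grub.cfg ]; do
    # The port may still be blocked by spanning tree; pause, then probe again.
    sleep 5
done
source $prefix/grub.cfg
```

Since the listing above comes from `(memdisk)/grub.cfg`, trying this would presumably mean rebuilding the boot image rather than editing a file on the server.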

Could you please run some trials on networks with spanning tree enabled, so you can reproduce the issues? And maybe open a case?

Thanks in advance,
Have a nice day,
Best Regards,
Mickaël.

Hi @mkl1,

You mentioned that when trying to commission your machines are “getting stuck on grub”. If you watch the boot process, are they getting stuck on “Fetching netboot image”?

I noticed in the spec sheet of your server that the Ethernet ports are 10G Intel (X722 controllers). There is a race condition bug on some Intel NICs (#1437353). I have personally seen it on i350s and X540s and have seen it in the forums recently. I don’t know if it’s applicable to the X722 controllers, but the solution has been to flash the firmware with Intel’s flash utility.

Hello,

We are stuck at "grub> " because, my guess is, GRUB is not able to fetch $prefix/whatever…
And it never retries, which is stupid…

I will look into the Intel Ethernet chip race condition.
But I guess the issue is spanning tree…
If you had read the whole thread…

Have a nice day,
Best Regards,
Mickaël.

well, @mkl1, for some odd reason, i decided to draw a very crude picture:

[picture not preserved]

on a snap, the grub file (if that’s what it’s using) would be located in:

/var/snap/maas/common/maas/boot-resources/current/bootloader/

i don’t have a package install of MAAS handy atm, but you can find the bootloader directory easily enough with:

find / -name bootloader -print 2>/dev/null

you’ll need to figure out which bootloader you’re using, as i think there are 3 directories under that one, each with some number of bootloaders, possibly.
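As a rough aid for that step, a small shell helper can list the GRUB-related files under the bootloader directory in one go. This is only a sketch: `list_grub_files` is a hypothetical name, the default directory is the snap path quoted above, and the `grub*` / `*.efi` name patterns are assumptions about what the files are called.

```shell
# list_grub_files [DIR]: print GRUB-related files found under DIR, sorted.
# DIR defaults to the snap path mentioned above; on a package install, pass
# the directory reported by the find command instead.
list_grub_files() {
    dir="${1:-/var/snap/maas/common/maas/boot-resources/current/bootloader}"
    # -L follows symlinks; the name patterns are guesses, adjust as needed.
    find -L "$dir" -type f \( -name 'grub*' -o -name '*.efi' \) 2>/dev/null | sort
}
```

Calling `list_grub_files` with no argument checks the snap location; `list_grub_files /some/other/bootloader` checks a different one.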

Hello Bill,

Thanks for your answer.

That still does not help, because I guess grub.cfg is embedded in grubx64.efi.
So I could not run any trials with a modified grub.cfg.

Another lead: judging from the logs, maybe the lease times for PXE are too short.

Are there any example CLI commands to set lease times for PXE, and then for production leases?

Thanks in advance,
Have a nice day,
Best Regards,
Mickaël.

@mkl1, i’m not sure i know of any, but i’ll ask around real quick.

i think you’ll have to construct a dhcp snippet to handle that for you – it’s a standard thing, not something MAAS provides a handle for, but you should be able to pull it off.

Thanks for your answer Bill.

But :

sudo grep lease /var/lib/maas/dhcpd.conf
# Shorter lease time for PXE booting
   default-lease-time 30;
   max-lease-time 30;
# Define lease time globally (can be overriden globally or per subnet
default-lease-time 600;
max-lease-time 600;

How do I build a snippet that applies only to PXE?
I tried to use the CLI to get the info (using -h --help), but it does not give me good information on how to build the snippets…
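Before writing a snippet, it can help to see which lease values sit in which part of the generated file. The sketch below prints every lease-time setting with its line number, plus the comment directly above it when there is one. `show_lease_settings` is a hypothetical helper name, and the file path is the one from the grep above.

```shell
# show_lease_settings FILE: print each lease-time setting with its line
# number, preceded by the comment line directly above it (if any).
show_lease_settings() {
    awk '
        /lease-time/ && !/^[ \t]*#/ {
            if (prev ~ /^[ \t]*#/) print prev_nr ": " prev
            print NR ": " $0
        }
        { prev = $0; prev_nr = NR }
    ' "$1"
}
```

The file is only readable as root here, so run it from a root shell or on a copy, e.g. `show_lease_settings /var/lib/maas/dhcpd.conf`.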

I just noticed something while inspecting the traffic with Wireshark.

It looks like this:
https://osqa-ask.wireshark.org/questions/22519/tftp-transfer-option-negotiation-failed-error-8-packet-trace/

Why does the MaaS TFTP server build answers that contain fields not requested by the clients?

@mkl1, hmm…

@mkl1, wdym in this particular statement?

Hello Bill,

TFTP transfers on our systems always retry, and we don’t understand why.
So I took a Wireshark trace during the TFTP phases, which looks like this:
https://osqa-ask.wireshark.org/questions/22519/tftp-transfer-option-negotiation-failed-error-8-packet-trace/
(Maybe not specifically about the field mentioned there)

Meaning the field negotiation may not fully respect the RFCs, which in turn causes some strictly coded clients (or badly coded ones? I may have been misled) to drop the transfer and retry.

Have a nice day,
Best Regards,
Mickaël.

Hello Everyone,

Just to close this: yes, there may be a small bug somewhere in the TFTP client (Intel) or server (MaaS) code, but my issue was mainly because I was trying to use the second BMC interface.

Solved there :

Have a nice day,
Best regards,
Mickaël.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.