The Setup:
Host: MAAS 2.5beta4 (all) installed from snap with --edge --devmode on a bionic VirtualBox VM, with bridged networking to a single interface on the Windows 10 1809 host.
Client: VirtualBox VM on the same host, with bridged networking to the same interface. Promiscuous mode is enabled on the VM.
Network:
Single subnet, MAAS providing DHCP. Rack Controller reports DHCPd is running.
The issue: Client begins to pxeboot, then fails to get the kernel:
Intel UNDI, PXE-2.1
PXE Software Copyright (C) 1997-2000 Intel Corporation
Copyright (C) 2010-2017 Oracle Corporation
CLIENT MAC ADDR: 08 00 27 E4 3A E5 GUID: CC069EC0-6523-469A-876C-60D706868F23
CLIENT IP: 10.0.10.101 MASK: 255.255.255.0 DHCP IP: 10.0.10.9
GATEWAY IP: 10.0.10.1
PXELINUX 6.03 lwIP 20171017 Copyright (C) 1994-2014 H. Peter Anvin et al
Booting under MAAS direction...
nomodeset ro root=squash:http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs ip=::::maas-enlist:BOOTIF ip6=off overlayroot=tmpfs overlayroot.cfgdisk=disabled cc:{'datasource_list': ['MAAS']}end_cc cloud-config-url=http://10-0-10-0--24.maas-internal:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed apparmor=0 log_host=10.0.10.9 log_port=5247
Loading http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel... netconn_connect error -11
failed: No such file or directory
Coming at this fresh this morning, I have an inkling it might be firewall related, apologies for the (probable) false alarm. I’ll update when confirmed.
I’m no closer to figuring this out, but assume it’s something to do with the Windows VM host.
I tried various networking configurations with VMware, VirtualBox, and Hyper-V to install MAAS and then PXE boot. In each case the client gets connection refused when trying to pull from the MAAS server on port 5248.
UFW is inactive on the Ubuntu installs, and Windows firewall is disabled. I do see a bunch of logspam on the MAAS server related to AppArmor, but it all appears to say ‘ALLOWED’, so I guess that’s just Snap doing its verbose thing. Googling around, there seems to be some known issue with PXE booting VMs, but I can’t get a clear answer, and it’s beyond my elementary understanding of this scenario. I do know this usually all works great on my Linux dev cluster.
From the client output posted above, it looks like the VM is getting DHCP and TFTP is working. It's failing to get the kernel over HTTP.
Are you using MAAS DHCP and bootloaders from images.maas.io? Starting with MAAS 2.5 we switched from using pxelinux to lpxelinux. lpxelinux allows us to grab the kernel and initrd over HTTP which improves performance.
Can you check that /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel exists on the rack controller? If not, make sure your images are in sync by going to the controllers tab, selecting all controllers, and taking the import images action.
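For anyone who wants to script that check, here is a minimal Python sketch. It assumes two candidate locations: the packaged path quoted above and the same path under /var/snap/maas/current for a snap install. The helper name missing_boot_files and the list of expected files (boot-kernel, boot-initrd, squashfs) are illustrative choices, not something MAAS provides.

```python
from pathlib import Path

# Candidate boot-resource roots. The first is the path quoted in this thread;
# the second assumes a snap install keeps the same tree under /var/snap/maas/current.
CANDIDATE_ROOTS = [
    Path("/var/lib/maas/boot-resources/current"),
    Path("/var/snap/maas/current/var/lib/maas/boot-resources/current"),
]

# Image directory from the failing "Loading http://..." line.
IMAGE_SUBDIR = Path("ubuntu/amd64/ga-18.04/bionic/daily")

# Files assumed to be needed for a network boot of this image.
EXPECTED_FILES = ("boot-kernel", "boot-initrd", "squashfs")


def missing_boot_files(root, subdir=IMAGE_SUBDIR, names=EXPECTED_FILES):
    """Return the expected file names that are not regular files under root/subdir."""
    base = Path(root) / subdir
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    for root in CANDIDATE_ROOTS:
        if not root.is_dir():
            print(f"{root}: not present on this machine")
            continue
        missing = missing_boot_files(root)
        status = "all files present" if not missing else f"missing {missing}"
        print(f"{root}: {status}")
```

Run it on the rack controller. If a root is present but files are missing, that points at image sync rather than networking.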
If you believe this is a networking issue, I would try booting the VM with the Ubuntu live CD. Once booted try running
seffyroff@ubujuju:~$ ls -la /var/snap/maas/current/var/lib/maas/boot-resources/current/ubuntu/amd64/ga-18.04/bionic/daily/
total 243324
drwxr-xr-x 2 root root 4096 Nov 1 21:40 .
drwxr-xr-x 3 root root 4096 Nov 1 21:40 ..
-rw-r--r-- 3 root root 57965798 Nov 1 21:20 boot-initrd
-rw-r--r-- 3 root root 8277752 Nov 1 21:20 boot-kernel
-rw-r--r-- 4 root root 182906880 Nov 1 21:20 squashfs
seffyroff@ubujuju:~$
I’m keeping all settings as close to defaults as possible so that config creep doesn’t get in the way of reproducing this. So yes, using MAAS DHCP and the default images source.
And for the livecd test, I see the same connection refused error.
Just to be sure, I would try running the wget command in the VM running MAAS. If you can download it there, something is blocking HTTP connections over port 5248.
Tested this again today, and made a single change - installed from packages instead of Snap, and it all works fine! So I guess those AppArmor messages in the logspam were relevant…
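To compare the two vantage points without relying on wget, a small TCP probe can be run both on the MAAS VM and on the client. This is only a sketch: host_port and probe_tcp are hypothetical helper names, and the URL is the one from the failing "Loading" line earlier in the thread. It only checks whether the port accepts a connection, not whether the file is actually served.

```python
import socket
from urllib.parse import urlsplit


def host_port(url):
    """Extract (hostname, port) from a URL, defaulting to 80 or 443 by scheme."""
    parts = urlsplit(url)
    default = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default


def probe_tcp(url, timeout=5.0):
    """Return 'open', 'refused', or 'unreachable: <reason>' for the URL's host and port."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on that port.
        return "refused"
    except OSError as exc:
        # Timeouts, no route to host, name resolution failures, and similar.
        return f"unreachable: {exc}"


if __name__ == "__main__":
    # URL taken from the PXE client output in this thread.
    url = "http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel"
    print(url, "->", probe_tcp(url))
```

If this prints "open" on the MAAS VM but "refused" from the client, something between the two is rejecting connections. If it prints "refused" on the MAAS VM as well, the service on port 5248 is likely not listening at all.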
Hi again, I’m wondering about this once again! Are the snap changes to the nginx image downloader specific to rack or region in the snap, or is it more of a global change as the snap contains all operating modes?