[2.5 beta4] Issue PXE Booting VirtualBox VMs

seffyroff · 31 October 2018 20:36

The Setup:
Host: MAAS 2.5 beta4 (all) from snap (--edge --devmode) on a bionic VirtualBox VM, bridged networking to a single interface on the Windows 10 (1809) host
Client: VirtualBox VM on the same host, bridged networking to the same interface. Promiscuous mode enabled on the VM.
Network:
Single subnet, with MAAS providing DHCP. The rack controller reports that dhcpd is running.

The issue: the client begins to PXE boot, then fails to get the kernel:

Intel UNDI, PXE-2.1
PXE Software Copyright (C) 1997-2000 Intel Corporation
Copyright (C) 2010-2017 Oracle Corporation
CLIENT MAC ADDR: 08 00 27 E4 3A E5  GUID: CC069EC0-6523-469A-876C-60D706868F23
CLIENT IP: 10.0.10.101  MASK: 255.255.255.0  DHCP IP: 10.0.10.9
GATEWAY IP: 10.0.10.1
PXELINUX 6.03 lwIP 20171017 Copyright (C) 1994-2014 H. Peter Anvin et al
Booting under MAAS direction...
nomodeset ro root=squash:http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs ip=::::maas-enlist:BOOTIF ip6=off overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:{'datasource_list': ['MAAS']}end_cc cloud-config-url=http://10-0-10-0--24.maas-internal:5248/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed apparmor=0 log_host=10.0.10.9 log_port=5247
Loading http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel... netconn_connect error -11
failed: No such file or directory

seffyroff · 1 November 2018 16:43

Coming at this fresh this morning, I have an inkling it might be firewall-related; apologies for the (probable) false alarm. I’ll update when confirmed.

seffyroff · 1 November 2018 21:53

I’m no closer to figuring this out, but I assume it’s something to do with the Windows VM host.

I tried various networking configurations with VMware, VirtualBox, and Hyper-V to install MAAS and then PXE boot; in each case the client gets ‘connection refused’ when trying to pull from the MAAS server on port 5248.

UFW is inactive on the Ubuntu installs, and Windows Firewall is disabled. I do see a bunch of logspam on the MAAS server related to AppArmor, but it all appears to say ‘ALLOWED’, so I guess that’s just the snap doing its verbose thing. Googling around, there seems to be something about PXE booting VMs, but I can’t find a clear answer, and it’s beyond my elementary understanding of this scenario. I know this usually all works great on my Linux dev cluster.

ltrager · 2 November 2018 00:32

From the client output posted above, it looks like the VM is getting DHCP and TFTP is working. It’s failing to get the kernel over HTTP.

Are you using MAAS DHCP and bootloaders from images.maas.io? Starting with MAAS 2.5 we switched from using pxelinux to lpxelinux. lpxelinux allows us to grab the kernel and initrd over HTTP which improves performance.

Can you check that /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel exists on the rack controller? If not, make sure your images are in sync by going to the Controllers tab, selecting all controllers, and taking the Import images action.
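For reference, a small shell sketch of that check. The function name `find_boot_kernel` is hypothetical. It assumes the ga-18.04/bionic/daily image path quoted above and looks both under /var/lib/maas (package install) and under /var/snap/maas/current/var/lib/maas, which is where the snap install in this thread keeps its boot resources (see the `ls` output further down).

```shell
#!/bin/sh
# Hypothetical helper: print the first boot-kernel path that exists on this
# rack controller, or complain on stderr and return 1 if none is found.
# The optional first argument is a root prefix (useful for testing).
find_boot_kernel() {
    root="${1:-}"
    rel="boot-resources/current/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel"
    # Package installs use /var/lib/maas; the snap nests it under /var/snap/maas/current.
    for base in "$root/var/lib/maas" "$root/var/snap/maas/current/var/lib/maas"; do
        if [ -f "$base/$rel" ]; then
            echo "$base/$rel"
            return 0
        fi
    done
    echo "boot-kernel not found; re-import images from the Controllers tab" >&2
    return 1
}
```

Run `find_boot_kernel` on the rack controller; a non-zero exit status means the images are probably not synced yet.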

If you believe this is a networking issue, I would try booting the VM with the Ubuntu live CD. Once booted, try running:

$ wget http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel
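A variant of the same test with curl can tell apart the failure modes seen in this thread, because curl reports them through distinct exit statuses: 7 when it cannot connect (refused or unreachable), 22 for an HTTP error when `-f` is given (e.g. a 404 for a missing image), and 28 when `--max-time` expires. The helper names below are hypothetical; only the curl options and exit codes are curl’s own.

```shell
#!/bin/sh
# Hypothetical helper: translate a curl exit status into a short diagnosis.
describe_curl_status() {
    case "$1" in
        0)  echo "ok: file downloaded, HTTP on this port works" ;;
        7)  echo "no-connect: connection refused or host unreachable" ;;
        22) echo "http-error: server answered with an error such as 404" ;;
        28) echo "timeout: packets are probably being dropped (firewall?)" ;;
        *)  echo "other: curl exit status $1" ;;
    esac
}

# Hypothetical helper: fetch the URL, discard the body, and describe the result.
# -f turns HTTP errors into exit 22, -sS hides progress but keeps error text.
probe_boot_kernel() {
    url="$1"
    curl -fsS --max-time 10 -o /dev/null "$url"
    describe_curl_status "$?"
}
```

For example, `probe_boot_kernel http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel` from the live CD. A `no-connect` result matches the ‘connection refused’ reported here, whereas a `timeout` would point more towards a firewall silently dropping traffic.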

seffyroff · 2 November 2018 05:16

seffyroff@ubujuju:~$ ls -la /var/snap/maas/current/var/lib/maas/boot-resources/current/ubuntu/amd64/ga-18.04/bionic/daily/
total 243324
drwxr-xr-x 2 root root      4096 Nov  1 21:40 .
drwxr-xr-x 3 root root      4096 Nov  1 21:40 ..
-rw-r--r-- 3 root root  57965798 Nov  1 21:20 boot-initrd
-rw-r--r-- 3 root root   8277752 Nov  1 21:20 boot-kernel
-rw-r--r-- 4 root root 182906880 Nov  1 21:20 squashfs
seffyroff@ubujuju:~$

I’m keeping all settings as close to the defaults as possible to avoid config creep while reproducing this issue. So yes, I’m using MAAS DHCP and the default image source.
And for the live CD test, I see the same connection refused error.

ltrager · 2 November 2018 07:48

Just to be sure, I would try running the wget command in the VM running MAAS. If you can download it there but not from the client, something is blocking HTTP connections to port 5248.

seffyroff · 2 November 2018 17:26

Thanks for the responses, Lee. I really appreciate you taking the time.
Here’s the wget command on the MAAS VM:

seffyroff@ubujuju:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:0a:40:04 brd ff:ff:ff:ff:ff:ff
    inet 10.0.10.9/24 brd 10.0.10.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe0a:4004/64 scope link
       valid_lft forever preferred_lft forever
seffyroff@ubujuju:~$ wget http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel
--2018-11-02 17:24:45--  http://10.0.10.9:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel
Connecting to 10.0.10.9:5248... failed: Connection refused.
seffyroff@ubujuju:~$ wget http://127.0.0.1:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel
--2018-11-02 17:24:47--  http://127.0.0.1:5248/images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel
Connecting to 127.0.0.1:5248... failed: Connection refused.
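Since the download is refused even on 127.0.0.1, a firewall between the VMs is unlikely to be the cause; more likely nothing inside the MAAS VM is listening on port 5248 at all. A hedged sketch of that check: the hypothetical helper `listening_on` reads the output of `ss -ltn` (listening TCP sockets, numeric ports) and succeeds only if some socket’s local address ends in the given port.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` output on stdin and exit 0 only if
# some socket is listening on the given TCP port.
listening_on() {
    port="$1"
    # Skip the header line; column 4 is "Local Address:Port", so compare
    # its trailing characters against ":PORT" (avoids matching e.g. :15248).
    awk -v suffix=":$port" '
        NR > 1 && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }'
}
```

Usage on the MAAS VM: `ss -ltn | listening_on 5248 && echo listening || echo "nothing on 5248"`. On a snap install, `snap services maas` shows whether the snap’s services are active.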

seffyroff · 5 November 2018 22:03

Tested this again today and made a single change: installed from packages instead of the snap, and it all works fine! So I guess those AppArmor messages in the logspam were relevant…

mpontillo · 7 November 2018 17:40

I think you’re actually hitting a known bug. Sorry about that! Looks like we’re targeting a fix for the 2.5.0 final release.

seffyroff · 19 November 2018 05:44

I’m hitting this bug again, and was wondering if there’s a known workaround until this gets some love?

seffyroff · 14 December 2018 05:31

Happy holidays all, I hope this issue gets some love in the new year.

seffyroff · 19 January 2019 19:59

Hi again, I’m wondering about this once again! Are the snap changes to the nginx image downloader specific to rack or region in the snap, or is it more of a global change as the snap contains all operating modes?

system · 8 September 2020 19:27

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.