Add pre- and post-step scripts to all actions that require a PXE boot of a managed server

petermakowski · 30 January 2024 07:16

Issue

Network provisioning with PXE boot could be a vulnerability in production. An (internal) attacker could introduce a second DHCP server and a destructive image on the network, causing all or part of the servers to reboot, thus partially destroying an information system.

Expectations

So, to reduce this risk, the aim would be to execute a POST script after any action requiring a PXE boot. This script could turn off the network port, change its VLAN ID and/or disable PXE boot on the managed server. A PRE script would reapply the configuration/conditions needed to enable PXE boot. Both scripts must of course be written and implemented by MAAS administrators (the end users).

Since the maas user is already a BMC administrator, the ID and password of this account could be supplied to these scripts as parameters (or environment variables), taking advantage of the existing secret storage.
The inventory of the managed server also has to be accessible to the scripts, so that they can manage network devices with the discovered parameters (device/port).

Last point: both scripts have to be executed on the rackd component to be able to reach the BMC network interface.
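To make the request concrete, here is a minimal Python sketch of how a rack controller could hand an admin-provided script the BMC credentials and the machine inventory through environment variables. Nothing in this thread says MAAS offers such a hook; the `run_hook` function, the `MachineInventory` fields and every `HOOK_*` variable name are hypothetical.

```python
import os
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class MachineInventory:
    """Hypothetical per-machine data a hook script would need."""
    system_id: str
    bmc_address: str
    switch_name: str   # e.g. learned via LLDP during commissioning
    switch_port: str
    target_vlan: int   # production VLAN from the inventory file


def run_hook(script: list[str], inv: MachineInventory,
             bmc_user: str, bmc_password: str,
             timeout: float = 300.0) -> int:
    """Run an admin-provided hook on the rack controller, return its exit code.

    Inventory and BMC credentials are passed as environment variables
    (all HOOK_* names are hypothetical). Raises subprocess.TimeoutExpired
    if the hook runs longer than `timeout` seconds.
    """
    env = dict(os.environ)
    env.update({
        "HOOK_SYSTEM_ID": inv.system_id,
        "HOOK_BMC_ADDRESS": inv.bmc_address,
        "HOOK_BMC_USER": bmc_user,
        "HOOK_BMC_PASSWORD": bmc_password,
        "HOOK_SWITCH_NAME": inv.switch_name,
        "HOOK_SWITCH_PORT": inv.switch_port,
        "HOOK_TARGET_VLAN": str(inv.target_vlan),
    })
    completed = subprocess.run(script, env=env, timeout=timeout, check=False)
    return completed.returncode


if __name__ == "__main__":
    demo = MachineInventory("abc123", "10.0.0.10", "sw1", "Eth1/12", 200)
    # Stand-in hook: prints the switch port it received from the environment.
    hook = [sys.executable, "-c", "import os; print(os.environ['HOOK_SWITCH_PORT'])"]
    print("hook exit code:", run_hook(hook, demo, "maas", "example-password"))
```

Passing the password through the environment rather than as a command-line argument keeps it out of the command line, which other local users can read with `ps`. A non-zero exit code would tell rackd not to continue.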

This was discussed with Aymen FRIKHA from Canonical (aymen.frikha@canonical.com).

Originally reported by https://github.com/pduveau on GitHub: https://github.com/canonical/maas.io/issues/816

troyanov · 30 January 2024 07:55

An (internal) attacker could introduce a second DHCP server

MAAS can already scan for rogue DHCP servers. We could extend this, for example: if any rogue DHCP server is discovered, MAAS would refuse any power management action until an administrator marks it as a known DHCP server.
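A sketch of the proposed gate: power actions are allowed only when every DHCP server observed on the network has been marked as known by an administrator. The function names and the idea of passing plain address sets are assumptions for illustration, not MAAS's actual rogue-DHCP detection API.

```python
from ipaddress import ip_address


def unknown_dhcp_servers(observed: set[str], known: set[str]) -> list[str]:
    """Return observed DHCP servers that no administrator has marked as known.

    Addresses are normalised with ipaddress so that different textual
    forms of the same address (common with IPv6) compare equal.
    """
    known_ips = {ip_address(a) for a in known}
    rogue = {ip_address(a) for a in observed} - known_ips
    return sorted(str(ip) for ip in rogue)


def power_action_allowed(observed: set[str], known: set[str]) -> bool:
    """Allow power management only if no unknown DHCP server was seen."""
    return not unknown_dhcp_servers(observed, known)


if __name__ == "__main__":
    known = {"10.0.0.2"}
    print(power_action_allowed({"10.0.0.2"}, known))
    print(unknown_dhcp_servers({"10.0.0.2", "10.0.0.66"}, known))
```

Returning the list of unknown servers, not just a boolean, lets the UI tell the administrator exactly which server to review.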

causing all or part of the servers to reboot, thus partially destroying an information system.

I don’t understand this part. How would a rogue DHCP server cause servers to reboot?

r00ta · 30 January 2024 13:05

I think this scenario is covered by the already ongoing work to secure the boot process of the machines.

To share with the community the workflow we are evaluating, you can take a look at this chart.

In short, the server would trust only images coming from the server with the certificate that was set in UEFI.
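To illustrate the pinning idea in isolation, this Python sketch accepts a boot server only if the SHA-256 fingerprint of its DER-encoded certificate matches one enrolled in advance. It is a conceptual analogy only, not how UEFI firmware implements HTTPS Boot (firmware typically validates the server's TLS certificate against certificates enrolled in the firmware, in a vendor-specific way); `is_pinned` and `sha256_fingerprint` are hypothetical helpers.

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def is_pinned(cert_der: bytes, pinned: set[str]) -> bool:
    """True if the certificate matches one of the pinned fingerprints.

    Pinned values may be written with or without ':' separators, in any case.
    """
    actual = sha256_fingerprint(cert_der)
    return any(
        hmac.compare_digest(actual, p.replace(":", "").lower())
        for p in pinned
    )
```

In Python, the DER bytes of a TLS peer certificate can be obtained with `ssl.SSLSocket.getpeercert(binary_form=True)`. A rogue DHCP server can redirect a machine elsewhere, but it cannot present a certificate whose fingerprint matches the pinned one.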

Side note: if a server does not support certificate pinning in UEFI, it’s actually impossible to prevent the machine from loading a malicious image from a malicious DHCP server.

phduveau · 1 February 2024 15:17

Hello,
In fact, our security team strongly recommends disabling DHCP boot on the managed server. They have not shared the detailed threat scenario with me. I think the goal is to minimize the attack surface available to an internal attacker.
The UEFI HTTPS Boot mechanism is a good approach, but it involves an initial manual action by the administrator that we’d like to avoid. Of course, it can be scripted, but that is not fully in line with our objectives.
With data centers spread across 3 countries and 3 different local teams, we’re looking for an almost 100% automated provisioning process. The servers will be delivered already configured by the manufacturer (known initial BMC password, PXE configuration…).
Introducing home-made pre/post scripts would let us achieve this level of automation while complying with the security recommendations.

r00ta · 1 February 2024 16:13

Hi @phduveau , thanks for sharing your setup!

Could you please elaborate on what should trigger the execution of the POST script(s)? Under what conditions should these scripts run?

phduveau · 1 February 2024 16:54

The pre-script should:

change the VLAN ID of the switch ports (we get the port information via LLDP during hardware discovery by MAAS);
activate PXE through the BMC (IPMI, or Redfish as the target) on the network card connected to the modified port.

The post-script should reverse what the pre-script did. The target VLAN ID has to be provided to MAAS beforehand, from the inventory file and the CLI. The main action is disabling PXE.

We just need to be able to provide these scripts to MAAS. We do not expect MAAS to do all the work by itself.

The trigger condition is that the action performed by MAAS requires a PXE boot.
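The pre- and post-script actions described here are mirror images of each other, which a minimal Python sketch makes explicit. The switch and BMC operations are left as abstract interfaces because they are vendor-specific (IPMI, Redfish, switch CLI or API); `SwitchClient`, `BmcClient`, `PortTarget`, the VLAN numbers and the in-memory fakes are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SwitchClient(Protocol):
    def set_access_vlan(self, switch: str, port: str, vlan: int) -> None: ...


class BmcClient(Protocol):
    def set_pxe_enabled(self, bmc: str, nic: str, enabled: bool) -> None: ...


@dataclass
class PortTarget:
    switch: str             # from LLDP data gathered by MAAS
    port: str
    bmc: str
    nic: str                # NIC cabled to `port`
    provisioning_vlan: int  # VLAN where MAAS serves DHCP/PXE (assumed)
    target_vlan: int        # production VLAN from the inventory file


def pre_script(t: PortTarget, sw: SwitchClient, bmc: BmcClient) -> None:
    # Move the port onto the provisioning VLAN, then allow PXE on that NIC.
    sw.set_access_vlan(t.switch, t.port, t.provisioning_vlan)
    bmc.set_pxe_enabled(t.bmc, t.nic, True)


def post_script(t: PortTarget, sw: SwitchClient, bmc: BmcClient) -> None:
    # Undo pre_script in reverse order: disable PXE first, then restore the VLAN.
    bmc.set_pxe_enabled(t.bmc, t.nic, False)
    sw.set_access_vlan(t.switch, t.port, t.target_vlan)


@dataclass
class FakeSwitch:
    """In-memory stand-in that records the access VLAN of each port."""
    vlans: dict[tuple[str, str], int] = field(default_factory=dict)

    def set_access_vlan(self, switch: str, port: str, vlan: int) -> None:
        self.vlans[(switch, port)] = vlan


@dataclass
class FakeBmc:
    """In-memory stand-in that records the PXE state of each NIC."""
    pxe: dict[tuple[str, str], bool] = field(default_factory=dict)

    def set_pxe_enabled(self, bmc: str, nic: str, enabled: bool) -> None:
        self.pxe[(bmc, nic)] = enabled
```

Disabling PXE comes first in the post-script because it is described as the main action; restoring the VLAN afterwards is a design choice of this sketch, not a stated requirement.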

r00ta · 1 February 2024 17:31

To confirm my understanding, is this the sequence you envision?

The user requests a deployment on the machine.
MAAS initiates the deployment, powering on the machine via BMC.
The machine boots up, acquires an IP from a DHCP server, and begins downloading images from the MAAS rack.
MAAS detects that the machine is PXE-booting and executes scripts based on your customized logic.

Right?

phduveau · 1 February 2024 18:00

Hello,
Not exactly: the script cannot be executed by the managed server; by then it’s too late. The script must be executed by the MAAS rack machine.
From my point of view, the sequence is as follows:

The user requests deployment on the machine.
MAAS determines which rack to use.
MAAS “transfers” the pre- and post-scripts to the rackd instance.
Rackd executes the pre-script.
If successful, Rackd continues with the standard steps as requested by the user.
After the last standard step (successful or not), Rackd finishes by executing the post-script.
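The rackd-side sequence described here amounts to a try/finally around the standard steps. A minimal Python sketch, assuming each step is a callable that raises on failure; `run_with_hooks` is hypothetical and not part of rackd. One point the sequence leaves open is what happens when the pre-script itself fails: here the standard steps are skipped, but the post-script still runs so that any partial change is reverted.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("hook-sketch")

Step = Callable[[], object]


def run_with_hooks(pre: Step, steps: Sequence[Step], post: Step) -> bool:
    """Run pre, then the standard steps, then always post.

    Returns True only if the pre-script and every standard step succeeded.
    An exception raised by `post` itself is not caught and propagates.
    """
    succeeded = False
    try:
        pre()                # a failure here skips all standard steps
        for step in steps:
            step()           # stops at the first failing step
        succeeded = True
    except Exception as exc:
        log.warning("action failed: %r", exc)
    finally:
        post()               # runs whether the steps succeeded or not
    return succeeded
```

Letting a post-script failure propagate is deliberate in this sketch: a machine left with PXE enabled is exactly the state the administrator wants to hear about.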

The managed server is not involved in these scripts.

r00ta · 1 February 2024 18:22

Unfortunately, this is not going to solve your security problem at all. An attacker could easily bypass any kind of security check you would run in the rack to try to detect a malicious DHCP server on the network. PXE is not secure by design, and IMO this is not the right way to tackle your scenario (which is 100% legit).

As I said, the right solution for this scenario is to use HTTPS boot with trusted certificates.

However, I’m wondering if this “pre-flight” deployment check could be useful for other scenarios. Do you have anything else in mind that you would like to run there? Is there any other security threat you are trying to mitigate in addition to the malicious DHCP server?

phduveau · 1 February 2024 18:45

We know that DHCP is insecure. We just do not want to leave servers in a configuration where every boot sequence could load an unwanted boot image.

r00ta · 2 February 2024 16:50

I understand that in your threat model you might identify that keeping PXE as first boot entry is a risk. I’d even say that using PXE in general is a threat that you must analyze.

Now, let’s assume MAAS supports HTTPS boot, you keep HTTPS boot as the first boot entry and you remove PXE boot from the picture entirely: would your security team still identify that as a threat?