My Ultimate Guide to Launching an ESXi 7.0 Machine with vCenter 7.0 Integration and MaaS 3.4 (LAB ENVIRONMENT)

Hi,


This post explains how I managed to deploy an ESXi 7.0 machine integrated with vCenter using MaaS 3.4, the latest version at the time of writing.
Disclaimer: this is a LAB SETUP, NOT PRODUCTION, and it is specific to my own requirements. Your results/setup may vary.


1- Setup Description:

Machines:
HP DL380 Gen 10 Plus Server with UEFI only enabled
NICs: - 2 for Provisioning Untagged
- 2 for External Access/VMKernel with VLAN
- 1 ILO Untagged
Specifics: Vt-d Enabled


MaaS VM:
OS: Ubuntu 22.04
vNICs: - 1 Network for Provisioning
- 1 Network for ILO Access
- 1 Network for External Access/VMKernel
Specifics:
- Nested Virtualization enabled
- NAT configured on the MaaS VM between the External and Provisioning interfaces. Background (see the "Esxi vmkernel interface selection" topic): I wanted to use both the Provisioning and External interfaces on ESXi, but there is no way to tell MaaS which interface ESXi should use for the VMKernel, so I left only the External interface configured in MaaS. The problem then was that, through the External interface, the machine couldn't reach the Provisioning network where MaaS is configured (to fetch the metadata, for instance). Since I didn't want to connect the External and Provisioning networks directly, I used NAT on the MaaS VM.


2- MaaS ESXi Packer Image Files Preparation:
On the MaaS VM, clone the repo https://github.com/canonical/packer-maas.git (for future reference, this is the commit I used: git checkout cb16c6a) and follow the steps at https://github.com/canonical/packer-maas/tree/main/vmware-esxi, with the changes below in the Customizing Image part:

  • To avoid the error below, which appears when the deployed image tries to connect to vCenter:
    EXAMPLE of the error if this change isn't made, taken from the logs over SSH on the ESXi host:
tail -100 /var/log/maas.log
INFO: Writing file /altbootbank/maas/vcenter.yaml
Traceback (most recent call last):
  File "/altbootbank/maas/vcenter", line 27, in <module>
    import requests
  File "/vmfs/volumes/6718b4f4-0647aa2c-f383-a0286e3511c9/maas/requests/__init__.py", line 43, in <module>
    import urllib3
  File "/vmfs/volumes/6718b4f4-0647aa2c-f383-a0286e3511c9/maas/urllib3/__init__.py", line 41, in <module>
    raise ImportError(
ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2zh-fips  30 May 2023'. See: https://github.com/urllib3/urllib3/issues/2168


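We must pin the requests Python package back to an older version (the change was introduced in this PR: https://github.com/canonical/packer-maas/pull/114), because the newer one pulls in urllib3 v2, which needs OpenSSL 1.1.1+ while ESXi 7.0 ships OpenSSL 1.0.2.
In packer-maas/vmware-esxi/requirements.txt: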
requests==2.25.1
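
The traceback above boils down to a version comparison against the OpenSSL build that Python's ssl module is linked to. A minimal sketch of that check, which could be run with the Python interpreter on the ESXi host before deciding whether the pin is needed (the function name supports_urllib3_v2 is hypothetical, not part of packer-maas):

```python
import ssl

# urllib3 v2 refuses to import when the ssl module is linked against an
# OpenSSL older than 1.1.1 (the situation reported in the traceback).
MINIMUM_OPENSSL = (1, 1, 1)


def supports_urllib3_v2(version_info=ssl.OPENSSL_VERSION_INFO):
    """Return True if the given OpenSSL version tuple is 1.1.1 or newer."""
    return tuple(version_info[:3]) >= MINIMUM_OPENSSL


if __name__ == "__main__":
    # Prints the linked OpenSSL version and whether urllib3 v2 would import.
    print(ssl.OPENSSL_VERSION, "->", supports_urllib3_v2())
```

If this reports False, pinning requests as shown above keeps pip on urllib3 1.x, since requests 2.25.1 declares urllib3<1.27.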



  • Next, in addition to whatever changes you need (changing the serial number, setting the root password, etc.), I also had to add the configuration below to KS.CFG. It covers the VMKernel interface issue (see "Esxi vmkernel interface selection"), vCenter reachability (DNS server) and enabling SSH:

On packer-maas/vmware-esxi/KS.CFG after the %firstboot section:

# Wait for the connectivity to be up
sleep 60

# Add a route to the MaaS VM so that traffic from the VMKernel network is NATed to the Provisioning network and reaches MaaS
esxcfg-route -a MAAS_IP_PROVISIONING_NETWORK/32 MAAS_IP_VMKERNEL_NETWORK

# Set the DNS
esxcli network ip dns server add --server=DNSIP

# Enable SSH
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh

# Enable ESXi shell
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell
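
Since the snippet above depends on three placeholders, it can help to fill them in programmatically and catch a mistyped address before the image is built. A minimal sketch, assuming you generate the lines on the MaaS VM and paste them into KS.CFG yourself (render_firstboot and its parameters are hypothetical names, and the addresses in the usage example are documentation-range placeholders):

```python
import ipaddress


def render_firstboot(maas_provisioning_ip, maas_vmkernel_ip, dns_ip, wait_seconds=60):
    """Return the %firstboot lines with the placeholders filled in."""
    # ip_address() raises ValueError on a malformed address, so a typo is
    # caught here rather than on the first boot of the deployed host.
    prov = ipaddress.ip_address(maas_provisioning_ip)
    vmk = ipaddress.ip_address(maas_vmkernel_ip)
    dns = ipaddress.ip_address(dns_ip)
    return [
        "sleep %d" % wait_seconds,
        "esxcfg-route -a %s/32 %s" % (prov, vmk),
        "esxcli network ip dns server add --server=%s" % dns,
        "vim-cmd hostsvc/enable_ssh",
        "vim-cmd hostsvc/start_ssh",
        "vim-cmd hostsvc/enable_esx_shell",
        "vim-cmd hostsvc/start_esx_shell",
    ]


if __name__ == "__main__":
    # Placeholder addresses only; replace with your own networks.
    print("\n".join(render_firstboot("192.0.2.10", "198.51.100.10", "203.0.113.53")))
```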



  • Another already-identified issue is that some changes in the vCenter 7.0 API broke the packer-maas/vmware-esxi/maas/vcenter script (https://github.com/canonical/packer-maas/pull/31). I applied the changes suggested in the pull request (for now I don't need backward compatibility).


  • Finally, I changed the storage script so that the additional datastores created through MaaS are extended automatically (WARNING: rough code with no production quality):

On packer-maas/vmware-esxi/maas/storage-esxi:

Changed the extend_default function to:

def extend_default(disks, vmfs_datastores):
    """Extend each VMFS datastore defined in MaaS to the end of its disk."""
    datastores_to_extend = [
        {"name": vmfs_datastore["name"]}
        for vmfs_datastore in vmfs_datastores.values()
    ]

    # Look up the backing device and partition number of each datastore.
    # Rows are: volume name, VMFS UUID, extent number, device name, partition.
    volumes = check_output(["esxcli", "storage", "vmfs", "extent", "list"])
    for volume in volumes.decode().splitlines():
        volume = volume.split()
        if len(volume) < 5:
            continue
        for datastore in datastores_to_extend:
            if volume[0] == datastore["name"]:
                datastore["dev_path"] = "/vmfs/devices/disks/%s" % volume[3]
                datastore["part_num"] = volume[4]

    for datastore in datastores_to_extend:
        # Skip datastores that were not found in the extent list above.
        if "dev_path" not in datastore:
            continue

        # Repair the GPT, answering the interactive prompts.
        p = Popen(["partedUtil", "fixGpt", datastore["dev_path"]], stdin=PIPE)
        p.communicate(input=b"Y\nFix\n")

        # Get the sector the partition currently starts on.
        part_info = check_output(["partedUtil", "get", datastore["dev_path"]])
        for part in part_info.decode().splitlines():
            if part.startswith("%s " % datastore["part_num"]):
                datastore["part_start"] = part.split()[1]
                break
        if "part_start" not in datastore:
            continue

        # Get the last usable sector of the disk to extend the datastore to.
        part_info = check_output(
            ["partedUtil", "getUsableSectors", datastore["dev_path"]]
        )
        datastore["part_end"] = part_info.decode().split()[1]

        datastore["vmfs_part"] = "%s:%s" % (
            datastore["dev_path"],
            datastore["part_num"],
        )

        # Resize the partition, then grow the VMFS filesystem into it.
        check_call(
            [
                "partedUtil",
                "resize",
                datastore["dev_path"],
                datastore["part_num"],
                datastore["part_start"],
                datastore["part_end"],
            ]
        )
        check_call(
            ["vmkfstools", "--growfs", datastore["vmfs_part"], datastore["vmfs_part"]]
        )

And added a call to it, extend_default(disks, vmfs_datastores), at the end of the else branch in the main function:

    else:
        process_disk_wipes(disks)
        partition_disks(disks, partitions)
        mkvmfs(disks, partitions, vmfs_datastores)
        extend_default(disks, vmfs_datastores)

    info("Done applying storage configuration!")


if __name__ == "__main__":
    main()
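
The lookup at the start of extend_default relies on the column order of "esxcli storage vmfs extent list". A minimal sketch of that parsing step on its own, so it can be tried off-host (parse_extent_list is a hypothetical helper, and SAMPLE_EXTENT_LIST is made-up illustrative output, not captured from a real machine):

```python
# Illustrative output only; the UUIDs and device names are invented.
SAMPLE_EXTENT_LIST = """\
Volume Name  VMFS UUID                            Extent Number  Device Name       Partition
-----------  -----------------------------------  -------------  ----------------  ---------
datastore1   00000000-00000000-0000-000000000001              0  naa.0000000000a1          3
datastore2   00000000-00000000-0000-000000000002              0  naa.0000000000b2          1
"""


def parse_extent_list(output, wanted_names):
    """Map each wanted datastore name to its device path and partition number."""
    found = {}
    for line in output.splitlines():
        fields = line.split()
        # Header, separator and short lines never match a wanted name.
        if len(fields) < 5 or fields[0] not in wanted_names:
            continue
        found[fields[0]] = {
            "dev_path": "/vmfs/devices/disks/%s" % fields[3],
            "part_num": fields[4],
        }
    return found


if __name__ == "__main__":
    print(parse_extent_list(SAMPLE_EXTENT_LIST, {"datastore1", "datastore2"}))
```

Like the original code, this splits on whitespace, so it assumes datastore names without spaces.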



3- MaaS ESXi Image Creation and Upload:

  • To create the image:
    Since I installed packer manually (unzipping the newest release), the make command doesn't work, because the Makefile checks its dependencies with dpkg and dpkg doesn't find the package:
dpkg-query: package 'packer' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files.
make: *** [Makefile:15: check-deps] Error 1

As such, I used packer directly to create the image:

dos2unix KS.CFG  #had some issues in this file with tabs/spaces and encoding which this command should solve
make scripts.tar.xz  #had to create this since I didn't use the Makefile
sudo packer init .
sudo PACKER_LOG=1 PACKER_LOG_PATH="/home/USER/packerlog.txt" packer build -var 'vmware_esxi_iso_path=VMware-ESXi-7.0.3-22348816-HPE-703.0.0.11.5.0.6-Oct2023.iso' .
sudo chown USER:USER vmware-esxi.dd.gz  #to solve some issues with upload permissions to MaaS I set the owner to the current VM non-root user
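
Because the Makefile's dpkg-based check is bypassed here, it may be worth confirming that the tools used in the commands above are actually on the PATH before starting a long build. A minimal sketch (missing_tools is a hypothetical helper; the tool list is simply taken from the commands above):

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["packer", "dos2unix", "make"])
    if missing:
        print("Missing from PATH: %s" % ", ".join(missing))
    else:
        print("All build tools found.")
```

Unlike dpkg-query, this only looks at the PATH, so a manually unzipped packer binary is detected as long as its directory is listed there.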



  • To upload the image:

maas login admin https://MAASFQDN/MAAS/api/2.0/ $(head -1 api-key-file) -k
maas admin boot-resources create name='esxi/7.0' title='VMware ESXi 7.0' architecture='amd64/generic' filetype='ddgz' content@=vmware-esxi.dd.gz -k

NOTE: if you upload the ESXi image multiple times (for tests, or while trying to make it work), an upload that reuses an existing image name sometimes doesn't seem to replace the old image, so I slightly change the image name on each upload.
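
One way to vary the name on each upload is to append a timestamp. A minimal sketch (unique_image_name is a hypothetical helper, and whether MaaS accepts this particular suffix format is an assumption I have not verified):

```python
from datetime import datetime


def unique_image_name(base="esxi/7.0", now=None):
    """Return the base image name with a timestamp suffix appended."""
    # A per-second timestamp keeps names distinct across repeated uploads.
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return "%s-%s" % (base, stamp)


if __name__ == "__main__":
    # Use the printed value as name='...' in the boot-resources create command.
    print(unique_image_name())
```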



4- MaaS ESXi Deployment:
Prior to the deployment I still had to perform two changes on the MaaS machine:

  • Due to the VMKernel interface issue (see "Esxi vmkernel interface selection"), I left only the External/VMKernel interface configured on the MaaS machine, so I had to set the default provisioning interface to unconfigured. Because of MAAS bug #2036405 ("cannot unlink subnet after commission") I couldn't, so I did what I suggested in the latest reply to the bug:
    What I did meanwhile (not ideal, but it solves the problem for me) is change the VLAN of the PXE interface (ens1f0 in my case) to a dummy VLAN (which also sets the PXE interface to unconfigured) and eventually change it back to the provisioning VLAN (or just leave it on the dummy one):
ens1f0_interface_id=$(sudo maas admin interfaces read $system_id -k | jq -r '.[] | select (.name=="ens1f0") | .id')
dummy_vlan_id=$(sudo maas admin vlans read 3 -k | jq -r '.[] | select (.name=="dummy") | .id')
provisioning_vlan_id=$(sudo maas admin vlans read 0 -k | jq -r '.[] | select (.name=="untagged") | .id')
sudo maas admin interface update $system_id $ens1f0_interface_id vlan=$dummy_vlan_id -k &>/dev/null
# Optionally change it back:
sudo maas admin interface update $system_id $ens1f0_interface_id vlan=$provisioning_vlan_id -k &>/dev/null
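
The jq filters above all do the same thing: pick the id of the entry whose name matches. A minimal Python sketch of that selection, in case jq isn't installed (id_by_name is a hypothetical helper, and SAMPLE_INTERFACES is a made-up, heavily trimmed stand-in for the JSON that the maas CLI returns):

```python
import json

# Invented sample; real output contains many more fields per interface.
SAMPLE_INTERFACES = '[{"id": 11, "name": "ens1f0"}, {"id": 12, "name": "ens1f1"}]'


def id_by_name(json_text, name):
    """Return the id of the first entry whose "name" matches, or None."""
    for item in json.loads(json_text):
        if item.get("name") == name:
            return item["id"]
    return None


if __name__ == "__main__":
    print(id_by_name(SAMPLE_INTERFACES, "ens1f0"))
```

Note that jq prints every match, while this sketch returns only the first one.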



  • Due to the bug discussed in "Vcenter_registration variable question" (reply #5 by gperry) and in MAAS bug #1982484 ("vcenter.yaml not being created due to possible vce..."):
Thanks to @gperry for the insights on this. Since I didn't want to create the machines by hand but have them autodiscovered, I opted for the DB change approach, which is less than ideal but solves the issue in a LAB environment:

chassis_serial=$(sudo maas admin machines read hostname="MACHINENAME" -k | jq -r '.[] | .hardware_info.chassis_serial')
node_id=$(sudo -i -u postgres psql -d maas -c "SELECT node_id FROM maasserver_nodemetadata WHERE key = 'chassis_serial' AND value = '$chassis_serial'" -t -A)
sudo -i -u postgres psql -d maas -c "INSERT INTO maasserver_nodemetadata (created, updated, key, value, node_id) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'vcenter_registration', 'false', $node_id) ON CONFLICT (node_id, key) DO UPDATE SET updated=excluded.updated, value=excluded.value;" &>/dev/null
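
The INSERT ... ON CONFLICT statement above is an upsert: it adds the key if it is missing and overwrites the value if it already exists, so it is safe to run more than once. A minimal sketch of that behaviour, using an in-memory SQLite database as a stand-in for the MaaS PostgreSQL database (the table definition is a simplified assumption, not the real MaaS schema, and node id 42 is invented):

```python
import sqlite3

UPSERT = """
INSERT INTO maasserver_nodemetadata (created, updated, key, value, node_id)
VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?)
ON CONFLICT (node_id, key)
DO UPDATE SET updated = excluded.updated, value = excluded.value
"""


def demo():
    """Run the upsert twice for one node and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in table; only the unique (node_id, key) pair matters.
    conn.execute(
        "CREATE TABLE maasserver_nodemetadata ("
        "created TEXT, updated TEXT, key TEXT, value TEXT, node_id INTEGER, "
        "UNIQUE (node_id, key))"
    )
    conn.execute(UPSERT, ("vcenter_registration", "true", 42))
    # Same node and key: the existing row is updated instead of duplicated.
    conn.execute(UPSERT, ("vcenter_registration", "false", 42))
    rows = conn.execute(
        "SELECT key, value, node_id FROM maasserver_nodemetadata"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```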



Finally, for the deployment itself, I deployed the ESXi image with a single External interface and added more datastores besides the default one.


5- Pending Issues:
Ability to have a BOND interface with a VLAN on ESXi: from a quick look at the maas/netplan-esxi code, it doesn't seem to handle that situation. It creates the BOND on ESXi boot but doesn't set the VLAN.
For now I reverted to using a single interface with a VLAN.



Thanks and hope this post helps someone.
BR


@jonaspaulo, this tutorial presents a thorough overview of various aspects. I just noticed it doesn't touch on VMware's no-go on cloning boot devices (https://kb.vmware.com/s/article/84280) because of VMFS corruption. Any thoughts on tackling that specific challenge?

Hi @trsoumi88,

I haven't tackled that one and haven't come across any corruption issues until now. But I will check two of my machines when possible and see if they have the same UUID.
Thanks
