Our nodes were set to quick erase.
I realized a week ago that our nodes deployed but wonky things were happening when it came to software deployments via cloud-init; things I’d never seen before. I chalked the issue up to our software, which is in alpha and put the project down.
Yesterday I picked the project back up and started digging into it. I learned that cloud-init was failing to run, which led me to the problem! The machines don’t do anything during quick erase or in other words, the last image burned in is still on the machine after a release/deploy, so the last cloud-init, which has already run does not do anything to deploy our software.
So, I have tried a couple of things. I tried switching from quick erase to full erase. No dice. I wondered if it might be the fact that we’re using LVM storage. So I changed it to flat and that didn’t work (using full erase).
Basically, nothing is working. So I watched the console of an erase session. DHCP is failing to serve up whatever image is used for erasing, I think, so the machine is falling back to the installed OS, which is why we are still seeing the old OS every reboot.
Any idea how to approach this?