@georcon - You have our most heartfelt gratitude, by the way. We too were like, “Oookaaay… so ‘commissioning failed…’ but only on ‘lshw’. Why?” Like the experience you had, ours was a bit arduous as well. Despite having found your (initial) thread, we went through the following before “figuring it out”:
- Re-commission node (hey, why not?)
- Delete then re-commission the node
- More searching via Google, then calling it a day and eventually resigning ourselves to sleep
- Coming at it fresh, we then removed MAAS (2.7), re-built the MAAS-hosting VM, then deployed 2.8.1
- Commission - no go
- Then we thought, “Maybe it’s the physical hardware composition? Let’s move the troubleshooting-GPU to another node”
- Same result (no-go on commissioning)
We then finally examined the “UUID” when parsing the logs - and there it was. On MAAS 2.7, we captured the ‘lshw’ commissioning log for “perceived-as-bad Node1” and then compared it line by line to “perceived-as-bad, somehow, Node2” in MAAS 2.8.1
Duplicate UUID. Your initial thread made sense, and your follow-up thread doubly so - and here we are.
In our case, we’re using MAAS 2.8.1 on a VM (Ubuntu Server 20.04), and we have 6x physical nodes, each comprising:
- ASRock B450M (mini-ITX; FW: P3.30)
- AMD Ryzen 5 1600 (AM4; 6c/12t)
- G.Skill DDR4 32GB (2x16GB)
- ADATA SX8200 Pro (NVMe; 512GB)
- Mean Well 200 PSU
- Custom power controller (160W DC-DC)
Some (if not all) of that hardware, in combination, results in an identical UUID of “03000200-0400-0500-0006-000700080009” for each node.
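For anyone wanting to check their own nodes before digging as far as we did, here is a minimal sketch of the comparison we ended up doing by hand. It assumes you have saved each node's ‘lshw’ commissioning output as text; the function name `find_duplicate_uuids` and the sample log lines are our own illustration, not anything MAAS provides, and the pattern matches any UUID-shaped string (so disk or partition UUIDs in a log would be picked up too).

```python
import re
from collections import defaultdict

# Matches any UUID-shaped string (8-4-4-4-12 hex digits), not only the system UUID.
UUID_RE = re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")


def find_duplicate_uuids(logs):
    """Return {uuid: [node names]} for every UUID seen in more than one node's log.

    `logs` maps a node name to the text of its saved commissioning log.
    """
    nodes_by_uuid = defaultdict(set)
    for node, text in logs.items():
        for uuid in UUID_RE.findall(text):
            nodes_by_uuid[uuid.lower()].add(node)
    # Keep only UUIDs shared by two or more nodes.
    return {u: sorted(n) for u, n in nodes_by_uuid.items() if len(n) > 1}


# Illustrative log snippets (hypothetical format, not verbatim lshw output).
logs = {
    "node1": "uuid=03000200-0400-0500-0006-000700080009",
    "node2": "UUID: 03000200-0400-0500-0006-000700080009",
    "node3": "uuid=4c4c4544-0000-1010-8000-80c04f000001",
}
duplicates = find_duplicate_uuids(logs)
for uuid, nodes in duplicates.items():
    print(f"{uuid} is shared by: {', '.join(nodes)}")
```

In practice you would fill `logs` by reading the saved log files from disk instead of the inline strings used here.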
We realize our cluster is certainly not the Enterprise-class hardware that MAAS is capable of handling, though it’s a testament to the endeavors of the MAAS-related folks that it has gotten us this far.
And that’s what makes a first-class solution within MAAS itself all the more desirable: from consumer-grade to production-grade, MAAS is more than capable. If MAAS itself could help remedy these duplicate UUIDs, we’d be golden (and likely many others too, including those prototyping with Enterprise-class hardware).
@georcon - is it possible to employ your patch without needing to start from scratch on MAAS (init, etc.)? Snap is a bit new to us, so we’re unsure how to “rebuild” the Snap (as suggested in that follow-up thread of yours). Your suggestion here (wrt Snap) seems viable in lieu of an official patch/etc., though we’d ideally avoid re-init’ing.
If not, so be it! Your contributions are appreciated, and it’s helped us a ton. So thank you, either way.