MAAS Site Manager
As a company grows, the number of MAAS regions and data centres it runs grows with it. In the last release cycle, we did extensive research to understand the various ways different companies use MAAS, and this cycle we explored how our in-house team uses MAAS. Our goal is to create a solution for MAAS users who need to scale and manage their regions efficiently. From our research, we were able to group the usage into 3 categories.
Using AWS as a Region controller
The first way our users set up MAAS aims to remove the need to manage Postgres themselves and to create a semi-centralised MAAS setup by installing region controllers on AWS. In this example scenario, there are 4 MAAS instances distributed across 4 geographical locations around the world. Each MAAS instance acts as a PoP (point of presence) for all the data centres in its location, and there are about 20-60 machines per data centre. In total, they are managing around 200-500 machines per PoP.
One of the biggest pain points here is image management, especially for custom images.
The current way to manage images requires you to upload the images to the region, after which the images are stored in RDS. From here, there is a synchronisation process between MAAS and RDS, so all the rack controllers in the PoP have to download all the images back from that database. Then, when you move on to the next PoP, you need to go through exactly the same steps.
There are 3 problems here:
Dealing with images is manual and redundant
There is a lot of traffic, with the same data being stored multiple times in the database
Database storage in the cloud is expensive
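To get a feel for the redundancy described above, here is a rough sketch in Python. The function name, the 2 GB image size and the figure of 10 rack controllers per PoP are assumptions made up for illustration; only the 4 PoPs come from the example scenario.

```python
def image_overhead(image_gb, pops, racks_per_pop):
    """Estimate the cost of rolling out one custom image the current way.

    Hypothetical simplification: one manual upload and one database copy
    per PoP, plus one download per rack controller in each PoP.
    """
    return {
        "manual_uploads": pops,
        "db_storage_gb": image_gb * pops,
        "rack_download_gb": image_gb * pops * racks_per_pop,
    }


# 4 PoPs is from the scenario; image size and rack count are assumed.
overhead = image_overhead(image_gb=2.0, pops=4, racks_per_pop=10)
print(overhead)
```

Under these assumed numbers, a single image is uploaded by hand 4 times and stored 4 times, and the download traffic grows with every rack controller added.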
The Telco use case
The Telco cloud use case is interesting because we have identified different purposes for bare metal. The first is the mini sites, called MEC (multi-access edge computing) and RAN (radio access network), at the cell towers and base stations. There are between 5-15 machines per site, and one site represents 1 MAAS instance. In this setup, we can expect these mini sites to grow to around 60K sites all over the country. The second is their core network, which they need to run their services. In the example setup, there are between 100-300 machines per core site, and one core site could be governed by 1 or 2 MAAS instances.
Given this scale, image management is still a problem, but instead of dealing with 4 MAAS instances, we're talking about 25 core cloud sites + 60K mini sites. Working with images the current way is quite difficult.
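As a back-of-the-envelope check on this scale, the figures quoted above can be multiplied out. This is only a sketch: the function and variable names are made up for illustration, and it assumes every mini site runs exactly one MAAS instance and every core site runs 1 or 2.

```python
def machine_range(sites, per_site_min, per_site_max):
    """Return the (low, high) total machine count for a group of sites."""
    return sites * per_site_min, sites * per_site_max


MINI_SITES = 60_000  # MEC and RAN mini sites, 5-15 machines each
CORE_SITES = 25      # core cloud sites, 100-300 machines each

mini_machines = machine_range(MINI_SITES, 5, 15)
core_machines = machine_range(CORE_SITES, 100, 300)

# One MAAS instance per mini site, 1 or 2 per core site.
instances_low = MINI_SITES + CORE_SITES * 1
instances_high = MINI_SITES + CORE_SITES * 2

print("mini site machines:", mini_machines)
print("core site machines:", core_machines)
print("MAAS instances:", (instances_low, instances_high))
```

If each MAAS instance needs its own manual image upload, one new custom image means roughly 60,000 repetitions of the same steps, which is why the current workflow does not carry over to this use case.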
Aside from image management, Telco users also care a great deal about operations, monitoring and troubleshooting guidelines. They have very strict protocols for how to fix things and for keeping everything well maintained, because it would be catastrophic if our phones stopped working during a power outage.
So the key value here is the ability to plug any kind of manager that integrates with systems like MAAS into their NOC (network operations centre).
Using MAAS as a Region and Rack controller
The last use case is our normal use case, where users run MAAS as a rack+region controller. One MAAS region represents 1 data centre. In some environments, a company may set up multiple MAAS instances in one data centre for different purposes. For instance, one data centre can have 3 MAAS region setups: one for user testing, one for development testing, and one for production. In addition, to make this a high-availability setup, we can also set up 3 rack+region controllers for 1 MAAS.
Given what we have learned from these scenarios, our current plan is to focus on three big features for this project, delivered as 3 phases of the MVP. The first phase, and our current focus, is region management and configuration (as shown in the number badges below).
Region management and configuration
Our earliest and roughest concept design focuses on how to display the right information for the region overview. Assuming a user has multiple regions per country, the card below represents different locations in that country and the number of MAAS sites in each location. A user may manually configure the type of MAAS site that exists in that location. For instance, in the card below, there are 2 types of sites: micro sites (3-10 machines per site) and core sites (25-100 machines for core computation).
A user can then drill down into a specific location to find out information about a specific MAAS region. The image below shows mock information for inactive micro sites located in East England.
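To make the overview card more concrete, here is a minimal sketch of the grouping it implies: sites counted per location and per manually configured site type. The Site class, the field names and the sample data are all hypothetical and do not reflect any actual MAAS Site Manager data model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    location: str
    site_type: str  # set manually by the user, e.g. "micro" or "core"


def summarise_by_location(sites):
    """Count sites of each type per location, as an overview card might."""
    summary = defaultdict(Counter)
    for site in sites:
        summary[site.location][site.site_type] += 1
    return {location: dict(counts) for location, counts in summary.items()}


# Made-up sample data for illustration only.
sites = [
    Site("site-a", "East England", "micro"),
    Site("site-b", "East England", "micro"),
    Site("site-c", "East England", "core"),
    Site("site-d", "London", "core"),
]
overview = summarise_by_location(sites)
print(overview)
```

Keeping the site type as a plain user-supplied label, rather than deriving it from the machine count, mirrors the idea that the user configures the type per location by hand.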
How do you set up your MAAS differently? What information might you find useful to see the overall picture of all your MAAS regions?
Drop your comments below to help us improve MAAS.