Here at DIS, we are big fans of GitLab, and we use it for nearly all parts of our DevSecOps/MLOps lifecycle. This includes Continuous Integration and Continuous Delivery (CI/CD): the practice of constantly merging developers' updates to software, and testing them in an automated fashion.
We are a small company with limited resources, and so to be able to deliver complex and capable systems at pace, we have to maximise the impact and value we get from the tools we have.
One way we have done this is through using autoscaling compute resources to run our CI/CD; this article gives a little detail on how this can be done to allow surges in capacity when required, whilst minimising cost and maintenance overheads.
Test everything, all the time
We have a mantra of "test everything, all the time", and through good CI/CD practices we can achieve that, speeding up our time to implement new features and reducing the likelihood that the new stuff breaks the old stuff.
This involves setting up a number of "jobs" (e.g. build, test, security scanning, deployment) to run every time a changeset is pushed up to our GitLab server by a developer. These jobs are grouped into "pipelines", and in the GitLab ecosystem are run by a piece of software called GitLab Runner.
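The jobs and stages described above are defined in a `.gitlab-ci.yml` file at the root of the repository. A minimal sketch might look like the following; the job names and `make` targets are illustrative, not our actual configuration:

```yaml
stages:
  - build
  - test
  - scan

build-job:
  stage: build
  script:
    - make build        # compile/package the project (hypothetical target)

unit-tests:
  stage: test
  script:
    - make test         # run the unit test suite

security-scan:
  stage: scan
  script:
    - make scan         # e.g. a dependency or container scan
```

Every push triggers a pipeline running these jobs in stage order, and GitLab Runner picks each job up for execution.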
The results of these tests are used to help colleagues review proposed changes, and ensure that they achieve what they are supposed to without introducing unwanted effects into the system.
We're going to need a bigger computer
As the maturity of our software has increased, so has the demand for compute resource to carry out these build, test and deployment tasks.
In the early days, we ran lots and lots of smaller jobs - builds, unit tests and security scans. These were easily managed using a "traditional" GitLab Runner setup, in which a server runs the jobs in Docker containers. These jobs typically need relatively little compute resource, and a few minutes to complete.
As we grew, we found ourselves running more jobs, and more demanding jobs at that. These included complex builds for custom Docker images, and machine learning jobs using most of our Runner server's resources for hours - or even days! We quickly found ourselves in need of a lot more compute power for our GitLab Runner capability, and a new solution.
The obvious way to add more compute power to a GitLab Runner capability is simple: just add more Runner machines! GitLab makes it easy to share work among multiple Runners.
However, if you don't have lots of spare compute power lying around, you have to get it from somewhere, and dedicate it to your Runner fleet. Most CI/CD usage is uneven, with spikes of huge demand followed by periods of relative quiet - not to mention nights and weekends! - so runners sit idle for a lot of the time.
If, like us, most of your compute power is in the cloud, this can mean you are spending money on resources that aren't being used to their maximum potential.
Big, but only when you need it
Our requirement was therefore for a GitLab Runner capability that is relatively small most of the time - to run those short build, test and security scan jobs - but is capable of becoming very large and powerful when we need it to. Easy to say, harder to do!
Luckily, GitLab have a solution - the new Autoscaler GitLab Runner executor. Currently in beta, this allows you to have a GitLab Runner which can spin up new cloud virtual machines on demand to run jobs, and shut them down when they aren't needed.
In our case, we are using this in conjunction with an Azure virtual machine scale set; this is an Azure service which allows you (or in our case, GitLab Runner) to easily spin up any number of VMs with a preset configuration.
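To give a flavour of how this fits together, here is a sketch of the relevant part of a Runner `config.toml` for the Docker Autoscaler executor with the Azure fleeting plugin. All names, IDs and limits below are placeholders of our own invention, and the exact options may shift while the feature is in beta:

```toml
[[runners]]
  name = "autoscaling-runner"
  url = "https://gitlab.example.com"
  token = "<runner token>"
  executor = "docker-autoscaler"

  [runners.docker]
    image = "alpine:latest"             # default job image

  [runners.autoscaler]
    plugin = "fleeting-plugin-azure"    # the fleeting plugin discussed below
    capacity_per_instance = 1           # one job per VM
    max_instances = 10                  # upper bound on the scale set

    [runners.autoscaler.plugin_config]
      name = "my-scale-set"                   # Azure VM scale set name (placeholder)
      subscription_id = "<subscription id>"
      resource_group_name = "my-resource-group"

    [[runners.autoscaler.policy]]
      idle_count = 0        # scale to zero when there is nothing to do
      idle_time = "20m0s"   # how long a VM may sit idle before removal
```

The `idle_count = 0` policy is what delivers the cost saving: VMs only exist while jobs need them.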
This feature has allowed us to maintain a relatively small GitLab Runner server to carry out those small jobs, but that has the capability to call in reinforcements when the big jobs come knocking. This means that our developers can conduct build and test activities to their hearts' content, whilst allowing the organisation to optimise their cloud spend; compute resource is only allocated when it is needed and being used, so we are getting the maximum cloud for our pound.
The technical bit
As with any beta capability, GitLab Runner's Docker Autoscaler executor is subject to change, and has one or two little niggles (though once set up, it works very well indeed). The following is a list of the things that we at DIS had to do beyond, or differently from, what's explicitly noted in the documentation: read on if you are looking to set up your own!
1. Bring your own VM image
There are a couple of types of executor available with the Autoscaler - Shell and Docker. If, like us, you'd prefer to use the Docker executor (recommended), you'll need to ensure that the VM image you use for the scale set (or other cloud provider's equivalent) has Docker installed, as well as a user account that the Runner can easily access over SSH. This probably means creating your own image.
2. Get the right version of the fleeting plugin
The trickiest bit of the whole process (for us at least) was installing the "fleeting plugin". This is a GitLab-developed plugin for their Runner software which enables it to connect to cloud services and spin up new VMs.
As well as getting this Go software built and installed alongside GitLab Runner on the same server, we found that the most obvious release of the Azure plugin was not the version which worked for us.
The documentation will point you to the 0.1.0 release of the plugin. This version will not work if you don't plan to access the VMs created in your scale set using public IP addresses!
If, like us, you would prefer to keep your VMs private and use private IP addresses, you should instead download and build the latest 'main' branch of the source code repository. This adds the capability to connect to the generated VMs using a private IP (with 'use_external_addr' set to false in the GitLab Runner config) that is missing from the latest release.
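In the Runner configuration, this private-IP behaviour is controlled in the connector section. A sketch, assuming a VM image with an SSH user named `gitlab` (the username is our choice, not a default):

```toml
[runners.autoscaler.connector_config]
  username = "gitlab"          # the SSH account baked into your VM image
  use_external_addr = false    # connect to VMs via their private IP addresses
```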
Building the plugin from source also requires Go to be installed on the GitLab Runner server. Ensuring that $PATH is reliably set for the Runner's system user (so that it can find the plugin) can be tricky; we found it easier to use an absolute path to the plugin executable in the GitLab Runner configuration file.
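Concretely, rather than relying on $PATH, the `plugin` setting can point straight at the built binary. The install location below is an assumption - use wherever you placed your build:

```toml
[runners.autoscaler]
  # absolute path sidesteps any $PATH issues for the Runner's system user
  plugin = "/usr/local/bin/fleeting-plugin-azure"
```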