Create an Azure DevOps Services Self-Hosted Agent in Azure Using Terraform, Cloud-init—and Azure DevOps Pipelines!

Posted by Graham Smith on November 14, 2018

I recently started a new job with the awesome DevOpsGroup. We're hiring, and there are options for remote workers (like me), so if anything on our vacancies page looks interesting feel free to make contact on LinkedIn.

One of the reasons for mentioning this is that the new job brings a new Azure subscription and the need to recreate all of the infrastructure for my blog series, which I will now have to rename to Deploy a Dockerized ASP.NET Core Application to Azure Kubernetes Service Using an Azure DevOps CI/CD Pipeline. For this new Azure subscription I've promised myself that anything that is reasonably long-lived will be created through infrastructure as code. I've done a bit of this in the past but I haven't been consistent and I certainly haven't developed any muscle memory. As someone who is passionate about automation, it's something I need to rectify.

I've used ARM templates in the past but found the JSON a bit fiddly; however, in the DevOpsGroup office and Slack channels there are a lot of smart people talking about Terraform. Add to that Brendan Burns' post on Expanding the HashiCorp partnership and HashiCorp's post about being at Microsoft Ignite, and there's clearly something going on between Microsoft and HashiCorp, so learning Terraform feels like a safe bet.

For my first outing with Terraform I decided to create an Azure DevOps Services (née VSTS, hereafter ADOS) self-hosted agent capable of running Docker multi-stage builds. Why a self-hosted agent rather than a Microsoft-hosted agent? The main reason is speed: a self-hosted agent is there when you need it and doesn't need the spin-up time of a Microsoft-hosted agent, and a self-hosted agent can cache Docker images for re-use, whereas a Microsoft-hosted agent needs to download images every time they are used. It's not a perfect solution for anyone using Azure credits, as you don't want the VM hosting the agent running (and burning credit) all the time, but there are ways to help with that, albeit with a twist at the moment.

List of Requirements

In the spirit of learning Terraform as well as creating a self-hosted agent I came up with the following requirements:

  • Use Terraform to create a Linux VM running Docker, which in turn would run a VSTS [sic] agent as a Docker container, rather than having to install the agent and its associated dependencies.
  • Create as many self-hosted agent VMs as required and tear them down again afterwards. (I'm never going to need this but it's a great infrastructure as code learning exercise.)
  • Consistent names for all the Azure infrastructure items that get created.
  • Terraform code refactored to avoid DRY problems as far as possible and practicable.
  • All bespoke settings supplied as variables to make the code easily reusable by anyone else.
  • Terraform code built and deployed from source control (GitHub) using an ADOS Pipeline.
  • Mechanism to deploy 'development' self-hosted agents when working locally and only allow deploying 'production' self-hosted agents from the ADOS Pipeline by checking in code.

This blog post aims to explain how I achieved all that, but first I want to describe my Terraform journey, since going from zero to my final solution in one step probably won't make much sense.

My Terraform Journey

I used a handful of tutorials and the official documentation to get going with Terraform, and I recommend you do the same with whatever resources are available to you.

It turns out that there were quite a few decisions to be made in implementing this solution and lots of parts that didn't work as expected, so I've saved the discussion of all that to the end to keep the configuration steps cleaner.

Configuring Pre-requisites

There are a few things to configure before you can start working with my Terraform solution:

  • Install Azure CLI and Terraform on your local machine.
  • If you are using Visual Studio Code as your editor ensure you have the Terraform extension by Mikael Olenfalk installed.
  • If required, create an SSH key pair. The goal is to have a public key file at ~\.ssh\id_rsa.pub.
  • To allow Terraform to access Azure, create a password-based Azure AD service principal and make a note of the appId and password that are returned (a sketch of one way to do this appears after this list).
  • Create a new Agent Pool in your ADOS subscription—I called mine Self-Hosted Ubuntu.
  • Create a PAT in your ADOS subscription limiting it to Agent Pools (read, manage) scope.
  • Fork my repository on GitHub and clone to your local machine.
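The post doesn't prescribe a particular way to create the service principal, but one way is with the Azure CLI. A minimal sketch (the terraform-sp name is just an example):

```bash
# Log in and note your subscription id.
az login
az account show --query id -o tsv

# Create a password-based service principal; the JSON that comes back
# contains the appId, password and tenant values you'll need later.
az ad sp create-for-rbac --name "terraform-sp" --role Contributor
```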

Examining the Terraform Solution

Let's take a look at the key contents of the GitHub repo, which have separate folders for the backend storage on Azure and the actual self-hosted agent.

backend-storage folder
  • main.tf—This is a simple configuration which creates an Azure storage account and container to hold the Terraform state. The names of the storage account and container are used in the configuration of the self-hosted agent.
  • variables.tf—I've pulled the key variables out into a separate file to make it easier for anyone to know what to change for their own implementation.
ados-self-hosted-agent-ubuntu folder
  • main.tf—I've chosen to keep the configuration in one file for simplicity of writing this blog post; however, you'll frequently see the different resources split out into separate files. If you've followed one of the tutorials mentioned above then most of what's in the configuration should be straightforward, however you will see that I've made extensive use of variables. I like to see all of my Azure infrastructure with consistent names, so I pass in a base_name value on which the names of all resources are based. What is different from the tutorials is that some resources have a count = "${var.node_count}" property, which tells Terraform to create as many copies of that particular resource as the value of the node_count variable. For those resources the name must of course be unique, and you'll see a slightly altered syntax for the name property, for example "${var.base_name}-vm-${count.index}" for the VM name. This creates those resources with an incrementing numerical suffix (a fragment illustrating the pattern appears after this list).
  • variables.tf—I've made heavy use of variables, both to make it easier for others to use the repo and to separate out what I think should and shouldn't be tracked via version control. You'll see that variables are grouped into those that I think should be tracked by version control for traceability purposes and those which are very specific to an Azure subscription and which shouldn't (or, in the case of secrets, mustn't) be tracked by version control. In the tracked group the variables are set in variables.tf; the non-tracked group have their values set in a separate (ignored by git) file locally, and via Pipeline Variables in ADOS.
  • .gitignore—note that I've added cloud-init.txt to the standard Terraform .gitignore file, as cloud-init.txt contains secrets.
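As a quick illustration of the count pattern described above, here's a hedged fragment (the variable and resource names are illustrative rather than copied from the repo):

```hcl
variable "base_name" {
  description = "Base name from which all resource names are derived"
}

variable "node_count" {
  description = "Number of agent VMs to create"
  default     = 2
}

# count creates node_count copies of the resource, and count.index
# gives each copy a unique incrementing suffix, e.g. agents-dev-vm-0,
# agents-dev-vm-1.
resource "azurerm_virtual_machine" "vm" {
  count = "${var.node_count}"
  name  = "${var.base_name}-vm-${count.index}"

  # ...remaining properties omitted for brevity...
}
```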

Configuring Backend Storage

In order to use Azure as a central location to save Terraform state you will need to complete the following:

  1. In your favourite text editor edit \backend-storage\variables.tf supplying suitable values for the variables (a hedged sketch of these files appears after this list). Note that storage_account needs to be globally unique.
  2. Open a command prompt, log in to the Azure CLI and navigate to the backend-storage folder.
  3. Run terraform init to initialise the directory.
  4. Run terraform apply to have Terraform generate an execution plan. Type yes to have the plan executed.
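For reference, a hedged sketch of what the two backend-storage files might contain (all names are placeholders; the repo is the source of truth):

```hcl
# variables.tf
variable "resource_group"  { default = "terraform-backend-rg" }
variable "location"        { default = "westeurope" }
variable "storage_account" { default = "mytfstate12345" }  # globally unique
variable "container"       { default = "tfstate" }

# main.tf
provider "azurerm" {}

resource "azurerm_resource_group" "backend" {
  name     = "${var.resource_group}"
  location = "${var.location}"
}

resource "azurerm_storage_account" "backend" {
  name                     = "${var.storage_account}"
  resource_group_name      = "${azurerm_resource_group.backend.name}"
  location                 = "${azurerm_resource_group.backend.location}"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "backend" {
  name                  = "${var.container}"
  resource_group_name   = "${azurerm_resource_group.backend.name}"
  storage_account_name  = "${azurerm_storage_account.backend.name}"
  container_access_type = "private"
}
```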

Configuring and Running the Terraform Solution Locally

To get the solution working from your local machine to create development self-hosted agents running in Azure you will need to complete the following:

  1. In your favourite text editor edit \ados-self-hosted-agent-ubuntu\variables.tf supplying suitable values for the variables that are designated to be tracked and changed via version control.
  2. Add a terraform.tfvars file to the root of \ados-self-hosted-agent-ubuntu and copy in content along the lines of the first sketch after this list, replacing the placeholder text with your actual values where required (double quotes are required).
  3. Add a cloud-init.txt file to the root of \ados-self-hosted-agent-ubuntu and copy in content along the lines of the second sketch after this list, replacing the placeholder text with your actual values where required (angle brackets must be omitted, and the docker run command must end up on a single line with no readability backslashes).
  4. In a secure text file somewhere, construct a terraform init command along the lines of the third sketch after this list, replacing the placeholder text with your actual values where required (angle brackets must be omitted).
  5. Open a PowerShell console window and navigate to the root of \ados-self-hosted-agent-ubuntu. Copy, paste and then run the terraform init command above to initialise the directory.
  6. Run terraform apply to have Terraform generate an execution plan. Type yes to have the plan executed.
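First, a hedged sketch of terraform.tfvars. The variable names are assumptions; check variables.tf in the repo for the real ones:

```hcl
# terraform.tfvars -- ignored by git; contains secrets.
subscription_id = "<your Azure subscription id>"
client_id       = "<appId returned when creating the service principal>"
client_secret   = "<password returned when creating the service principal>"
tenant_id       = "<your Azure AD tenant id>"
```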
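Next, a hedged sketch of cloud-init.txt. It installs Docker and runs Microsoft's vsts-agent container; mounting the Docker socket is what lets the containerised agent run Docker builds. The image name, pool name and placeholders are illustrative:

```yaml
#cloud-config
package_update: true
runcmd:
  # Install Docker using the convenience script.
  - curl -fsSL https://get.docker.com | sh
  # Run the agent container; note the single line (no readability
  # backslashes) and the Docker socket mount for Docker builds.
  - docker run -d --restart unless-stopped -e VSTS_ACCOUNT=<your ADOS organisation name> -e VSTS_TOKEN=<your PAT> -e VSTS_POOL='Self-Hosted Ubuntu' -v /var/run/docker.sock:/var/run/docker.sock microsoft/vsts-agent
```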
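Finally, a hedged sketch of the terraform init command. This assumes main.tf declares an empty azurerm backend block, with the actual backend settings supplied as -backend-config parameters:

```bash
terraform init \
  -backend-config="storage_account_name=<your storage account>" \
  -backend-config="container_name=<your container>" \
  -backend-config="key=dev.terraform.tfstate" \
  -backend-config="access_key=<your storage account access key>"
```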

It takes some time for everything to run but after a while you should see ADOS self-hosted agents appear in your Agent Pool. Yay!

Configuring and Running the Terraform Solution in an Azure DevOps Pipeline

To get the solution working in an Azure DevOps Pipeline to create production self-hosted agents running in Azure you will need to complete the following:

  1. Optionally create a new team project. I'm planning on doing more of this so I created a project called terraform-azure.
  2. Create a new Build Pipeline called ados-self-hosted-agent-ubuntu-build and in the Where is your code? window click on Use the visual designer.
  3. Connect up to your forked GitHub repository and then elect to start with an empty job.
  4. For the Agent pool select Hosted Ubuntu 1604.
  5. Create a Bash task called terraform upgrade and paste in an inline script that installs your chosen Terraform version (hedged sketches of this and the other inline scripts appear after this list).
  6. Create a Copy Files task called copy files to staging directory, setting Source Folder to ados-self-hosted-agent-ubuntu, Contents to *.tf and Target Folder to $(Build.ArtifactStagingDirectory).
  7. Create a PowerShell task called create cloud-init.txt, set the Working Directory to $(Build.ArtifactStagingDirectory) and paste in an inline script that writes out cloud-init.txt from pipeline variables (sketched after this list).
  8. Create a Bash task called terraform init, set the Working Directory to $(Build.ArtifactStagingDirectory) and paste in an inline script that initialises the backend (sketched after this list).
  9. Create a Bash task called terraform plan, set the Working Directory to $(Build.ArtifactStagingDirectory) and paste in an inline script that generates and saves the execution plan (sketched after this list).
  10. Create a Bash task called terraform apply, set the Working Directory to $(Build.ArtifactStagingDirectory) and paste in an inline script that applies the saved plan (sketched after this list).
  11. Create a Publish Build Artifacts task called publish artifact and publish the $(Build.ArtifactStagingDirectory) to an artifact called terraform.
  12. Create the variables that the scripts above rely on, supplying your own values as required (values containing spaces should be surrounded by single quotes) and ensuring secrets are added as such; a hedged list of likely variables appears after this list.
  13. Queue a new build and after a while you should see ADOS self-hosted agents appear in your Agent Pool.
  14. Finally, from the Triggers tab select Enable continuous integration.
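Since the original inline scripts aren't reproduced verbatim here, what follows are hedged sketches; all $(...) names are pipeline variables of my own invention (see the variables note at the end). First, terraform upgrade, which downloads the version pinned by a TerraformVersion variable:

```bash
# Download and install the pinned Terraform version, e.g. 0.11.10.
# ADOS replaces the $(TerraformVersion) macro before the script runs.
wget -q "https://releases.hashicorp.com/terraform/$(TerraformVersion)/terraform_$(TerraformVersion)_linux_amd64.zip"
unzip -o "terraform_$(TerraformVersion)_linux_amd64.zip"
sudo mv terraform /usr/local/bin/terraform
terraform version
```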
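The create cloud-init.txt task writes the same file used locally, but with the secrets injected from pipeline variables so nothing sensitive lives in version control. A sketch (ADOS replaces the $(...) macros in the script text before PowerShell runs it):

```powershell
# Build cloud-init.txt in the staging directory from pipeline variables.
$cloudInit = @'
#cloud-config
package_update: true
runcmd:
  - curl -fsSL https://get.docker.com | sh
  - docker run -d --restart unless-stopped -e VSTS_ACCOUNT=$(VstsAccountName) -e VSTS_TOKEN=$(VstsAgentPat) -e VSTS_POOL=$(VstsAgentPool) -v /var/run/docker.sock:/var/run/docker.sock microsoft/vsts-agent
'@
Set-Content -Path cloud-init.txt -Value $cloudInit
```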
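terraform init mirrors the local command but points at the prd state blob:

```bash
terraform init \
  -backend-config="storage_account_name=$(StorageAccountName)" \
  -backend-config="container_name=$(StorageContainerName)" \
  -backend-config="key=prd.terraform.tfstate" \
  -backend-config="access_key=$(StorageAccountKey)"
```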
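terraform plan uses the environment-variables trick: Terraform reads any TF_VAR_&lt;name&gt; environment variable as the value of the matching input variable (again, the variable names are assumptions):

```bash
# Supply the sensitive input variables via TF_VAR_ environment variables.
export TF_VAR_subscription_id="$(SubscriptionId)"
export TF_VAR_client_id="$(ClientId)"
export TF_VAR_client_secret="$(ClientSecret)"
export TF_VAR_tenant_id="$(TenantId)"

# Write the execution plan to a file so it can be published and inspected.
terraform plan -out=tfplan
```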
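terraform apply then executes the saved plan; because tfplan already embeds the planned changes and variable values there is nothing to confirm interactively:

```bash
terraform apply tfplan
```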
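As for the variables themselves, the sketches above imply roughly this set (the names are mine, not necessarily the repo's): TerraformVersion; StorageAccountName, StorageContainerName and StorageAccountKey (secret); SubscriptionId, ClientId, TenantId and ClientSecret (secret); VstsAccountName, VstsAgentPool (single-quoted, as the value contains a space) and VstsAgentPat (secret).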

Discussion

One of the great things I found about Terraform is that it's reasonably easy to get a basic VM running in Azure by following tutorials. However the tutorials are, understandably, lacking when it comes to issues such as repeated code, and I spent a long time working through my configurations removing duplication and extracting variables. One fun aspect was amending the Terraform resources so that a node_count variable can be set with the desired number of VMs to create. Pleasingly, Terraform seems really smart about this: you can start with two VMs (labelled 0 and 1), scale to four VMs (0, 1, 2 and 3) and then scale back to one VM, which will remove 1, 2 and 3 to leave the VM labelled 0.

Do be aware that some of the tutorials don't cover all of the properties of resources, and it's worth looking at the documentation for those fine details you want to implement. For example, I wanted to be able to SSH into VMs via an FQDN rather than an IP address, which meant setting the domain_name_label property of the azurerm_public_ip resource. In another example, the tutorial didn't specify that the azurerm_virtual_machine should set delete_os_disk_on_termination, the absence of which was a problem when implementing the ability to scale VMs up and down. It was all in the documentation though; both properties are shown in the fragment below.
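A hedged fragment showing both properties in context (the names are illustrative):

```hcl
resource "azurerm_public_ip" "pip" {
  # ...other properties...

  # Gives each VM an FQDN such as myagents-0.westeurope.cloudapp.azure.com.
  domain_name_label = "${var.base_name}-${count.index}"
}

resource "azurerm_virtual_machine" "vm" {
  # ...other properties...

  # Without this the OS disk is left behind when a VM is destroyed,
  # which breaks scaling node_count back down.
  delete_os_disk_on_termination = true
}
```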

An issue that I had to tackle early on was the mechanism for configuring the internals of VMs once they were up and running. Terraform has a remote-exec provisioner which looks like it should do the job, but I spent a long time trying to make it work only to have the various commands consistently terminate with errors. An alternative is cloud-init. This worked flawlessly; however, the cloud-init file has to contain an ADOS secret (the PAT) and I couldn't find a way to pass the secret in as a parameter. My solution was to exclude a local cloud-init.txt file from version control so the secret didn't get committed, and then to dynamically build the file as part of the build pipeline.

Another decision I had to make was around workflow, i.e. how the solution is used in practice. I've kept it simple but have tried to put mechanisms in place for those that need something more sophisticated. For experimenting and getting things working I'm just working locally, although for consistency I'm storing Terraform state in Azure. Where required, all resources are labelled with a 'dev' element somewhere in their name. For 'production' self-hosted agents I'm using an ADOS build pipeline, with all required resources having a 'prd' element in their name. Note that there is a Terraform workspaces feature which might work better in a more sophisticated scenario. I've chosen to keep things simple by deploying straight from a build, however I do output a Terraform plan as an artifact, so if you wanted you could pick this up and apply it in a release pipeline. I've also not implemented any linting or testing of the Terraform configuration as part of the build, but that's all possible of course. You'll also notice that I haven't implemented the build pipeline with the YAML code feature. It was part of my plans but I ran into some issues which I couldn't fix in the time available.

It's worth noting a few features of the build pipeline. I did look at Terraform tasks in the ADOS marketplace, and while there are a few, I'm increasingly dissatisfied with how visual tasks abstract away what's really happening; I'm more in favour of constructing commands by hand.

  • terraform upgrade—In order to run Terraform commands in an ADOS pipeline you need access to an agent that has Terraform installed. As luck would have it, the Microsoft-hosted agent Hosted Ubuntu 1604 fits the bill. Well, almost! Early on, before I started using separate blobs in Azure to hold state for dev and prd, I ran into an issue where Terraform wasn't happy that I was developing on a newer version of Terraform than was running on the Hosted Ubuntu 1604 agent. It was a case of either downgrading to an older version of Terraform locally or upgrading the agent. I chose the latter, which is perfectly possible and very quick. There's obviously a maintenance issue with keeping versions in sync, but I created a variable in the ADOS pipeline to help. As it turns out, with the final solution using separate state blobs in Azure this is less of a problem, however I've chosen to keep the upgrade task in, if for no other reason than to illustrate what's possible.
  • terraform init—Although I've seen the init command have its variables passed to it as environment variables in a demo, I couldn't get the command to work this way and had to pass the backend settings through as -backend-config parameters, as in the sketches above.
  • terraform plan—The environment variables trick does work with the plan command. Note that the command outputs a file called tfplan which gets saved as part of the artifact. I use the file immediately in the next task (terraform apply), but a release pipeline could equally use the file later, after (for example) someone had inspected what the plan was actually going to do using terraform show.

Finally, there's one killer feature that seems not to have been implemented in Terraform yet, and that's Auto-shutdown of VMs. I find this essential to ensure that Azure credits don't get burned, so if you are like me and need to preserve credits, do take appropriate precautions until the feature is implemented. Happy Terraforming!

Cheers -- Graham