Terraform

What is it?

Terraform is a cloud-agnostic IaC (Infrastructure as Code) solution.

Terraform is split into two parts:

  • One part is the Terraform engine, which understands...
    • how to read HCL code
    • how to read state from a provider
    • how to get from the current state of your infrastructure to the state you want it to be in.
  • The other part is the provider, which talks to the infrastructure to find out the current state and make changes using the infrastructure’s API.

Provisioning Workflow

There are 3 main CLI commands that involve creating, modifying and destroying infrastructure: plan, apply and destroy.

Terraform CLI

Terminology

Provider

A provider is a connection that allows Terraform to manage infrastructure through an interface (e.g. the AWS API).
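As a sketch, a provider is declared and configured in HCL like this (the AWS provider is used as an example; the version constraint is illustrative):

```hcl
terraform {
    required_providers {
        aws = {
            source  = "hashicorp/aws"
            version = "~> 5.0"
        }
    }
}

# configure the provider itself, e.g. which region its API calls target
provider "aws" {
    region = "us-west-1"
}
```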

Resource

A resource represents a piece of real world infrastructure

Examples:

  • an S3 bucket
  • an EKS (Elastic Kubernetes Service) cluster
  • a Postgres role

Module

A Terraform module is a set of Terraform configuration files in a single directory that can be considered its own standalone Terraform project.

  • it can therefore...
    • contain its own resources, data sources, locals, etc.
    • take variables (i.e. inputs on a per-module basis)
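
A module in a child directory can be instantiated and given its inputs like this (the ./modules/bucket path and bucket_name variable are hypothetical):

```hcl
module "assets_bucket" {
    # path to the directory containing the module's .tf files
    source = "./modules/bucket"

    # values for the variables the module declares
    bucket_name = "assets"
}
```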

Data source

A data source is used to fetch data from a resource that is not managed by the current Terraform project.

  • think of it as a read-only resource that already exists
  • ex. get information on an image used elsewhere in the project, get the details of a database user, or get the CA certificate for a database.
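
A minimal sketch, assuming an S3 bucket named my-existing-bucket already exists outside this project:

```hcl
# read-only lookup of a bucket this project does not manage
data "aws_s3_bucket" "existing" {
    bucket = "my-existing-bucket"
}

# its attributes can then be referenced elsewhere,
# e.g. data.aws_s3_bucket.existing.arn
```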

Local

A local is a named value defined within the configuration, like a local variable in a general-purpose programming language.

  • note: not to be confused with Terraform variables.
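
A sketch of declaring and referencing a local (the values are illustrative):

```hcl
locals {
    # a reusable value, like a variable in a general-purpose language
    common_tags = {
        project = "my-project"
    }
}

# referenced elsewhere as local.common_tags (note: singular "local")
```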

Variable

A variable is set at runtime, allowing us to vary Terraform's behaviour.

  • Therefore, if Terraform were a function, a variable would be an input to the function.
  • note: not to be confused with locals, which themselves are actually more like variables as used in general programming

*.tfvars files provide the actual variable values for a given execution, allowing you to customize that specific run. A good example is variable values that are specific to an environment, such as the number of VMs or the type/tier/SKU of a database.

  • variables defined in this file override the defaults specified in variables.tf

Since you can supply files on the command line, you can have scenarios where you supply a different *.tfvars file per environment:

terraform plan -var-file="dev.tfvars"

terraform plan -var-file="prod.tfvars"

variable "bucket_name" {
    type = string
    # describe what this variable is used for
    description = "the name of the bucket we are creating"
    default = "default_bucket_name"
}

resource "aws_s3_bucket" "bucket" {
    bucket = var.bucket_name
}

Variables can be more complex too:

instance_map = {
    dev = "t3.small"
    test = "t3.medium"
    prod = "t3.large"
}

environment_type = "dev"

And referenced like:

variable "instance_map" {
    type = map(string)
}
variable "environment_type" {}

output "selected_instance" {
    value = var.instance_map[var.environment_type]
}

Types

  • string
  • bool
  • number
  • list(<TYPE>)
  • set(<TYPE>)
    • each value is unique
  • map(<TYPE>)
  • object()
    • like a map, but values can be different types
  • tuple([<TYPE>, …])
    • number of values and order is preserved
  • any
    • unlike TypeScript's any (which opts out of type checking), Terraform's any tells Terraform to infer the type from the actual value.
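
A sketch of the two less obvious types (the variable names are illustrative):

```hcl
# object(): like a map, but each value can have a different type
variable "db_config" {
    type = object({
        name     = string
        port     = number
        replicas = number
    })
}

# tuple(): a fixed number of values, in a fixed order
variable "mixed" {
    type = tuple([string, number, bool])
}
```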

Providing variables (4 ways)

  1. When we run terraform plan or terraform apply, we will be prompted to provide a value for any variable that has no default.

  2. pass the value with:

terraform apply -var bucket_name=my_bucket

  3. export environment variables in the terminal prefixed with TF_VAR_:

export TF_VAR_bucket_name=my_bucket

  4. create a terraform.tfvars file (or <ANYNAME>.auto.tfvars):

bucket_name = "my_bucket"

State

State is the place where Terraform stores a record of all of the resources (and their metadata) it has created.

  • run terraform state list to see all resources existing in state.

This state is used by Terraform to work out how changes need to be made.

  • ex. imagine we did not use state. The first time we run terraform apply, our S3 bucket is created. Each subsequent time we ran that command, we would get a "resource already exists" error: with no state file recording what was already created, Terraform assumes it is starting from scratch and tries to create the resource again. With a state file, Terraform knows whether it needs to create a new resource, update an existing one, or do nothing (because nothing has changed).

The output of the terraform plan command is a diff between the code on your computer and the infrastructure deployed in the real world, as discovered via the IDs in the state file.

State is stored in terraform.tfstate

  • Every time you run Terraform, it records information about what infrastructure it created in a Terraform state file.
  • this file should not be committed to Git, since it may contain sensitive information. The recommended solution is to keep it in blob storage (such as S3)

note: The State File Is a Private API. The state file format is a private API that is meant only for internal use within Terraform. You should never edit the Terraform state files by hand or write code that reads them directly. If for some reason you need to manipulate the state file — which should be a relatively rare occurrence — use the terraform import or terraform state commands.

If we want to move resource creation from one project to another, state needs to be manipulated directly

  • this can be handled by (example uses a AWS VPC resource)

    1. running terraform state rm aws_vpc.my_vpc command, which will remove the resource from state (so Terraform is no longer managing it), but will not delete the resource in the cloud.
    2. in the new project, copy+paste over the resource definition and run terraform import aws_vpc.my_vpc <VPC_ID>
    3. run terraform apply
  • some resources do not support import. In this case, use terraform state mv

Remote state

If you’re using Terraform for a personal project, storing state in a single terraform.tfstate file that lives locally on your computer works just fine. But if you want to use Terraform as a team on a real product, you run into several problems:

  • Shared storage for state files. To be able to use Terraform to update your infrastructure, each of your team members needs access to the same Terraform state files. That means you need to store those files in a shared location.
  • Locking state files. As soon as data is shared, you run into a new problem: locking. Without locking, if two team members are running Terraform at the same time, you can run into race conditions as multiple Terraform processes make concurrent updates to the state files, leading to conflicts, data loss, and state file corruption.
  • Isolating state files. When making changes to your infrastructure, it’s a best practice to isolate different environments. For example, when making a change in a testing or staging environment, you want to be sure that there is no way you can accidentally break production. But how can you isolate your changes if all of your infrastructure is defined in the same Terraform state file?

Multiple people working on the same Terraform project introduces a lot of complexity, since a local state file stores the record of what has been created. If we run Terraform commands on a second machine that has no copy of that state, Terraform will try to create all of the resources again, duplicating them.

  • to get around this issue, we can store state in a remote location (e.g. in an S3 bucket)

We specify the remote state location using the backend keyword. Here we are using an S3 bucket:

# state.tf
terraform {
    backend "s3" {
        bucket = "<bucket-name>"
        key    = "my-project.state"
        region = "us-west-1"
    }
}

The remote state backend needs to support "locking", which prevents changes to the state while Terraform commands are running (for the S3 backend, locking is commonly handled via a DynamoDB table).

A good idea is to use S3 bucket versioning so we can time travel through different Terraform states.
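
With recent versions of the AWS provider (v4+), versioning on the state bucket can be enabled like this (the bucket name is illustrative; the state bucket itself is usually created in a separate bootstrap project):

```hcl
resource "aws_s3_bucket" "tf_state" {
    bucket = "my-terraform-state"
}

# keep every historical version of the state file, enabling "time travel"
resource "aws_s3_bucket_versioning" "tf_state" {
    bucket = aws_s3_bucket.tf_state.id
    versioning_configuration {
        status = "Enabled"
    }
}
```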

Workspaces

Workspaces solve the problem "how do we create multiple environments using the same code?"

terraform.workspace is a special variable that resolves to the current workspace we are running in.

Unless we explicitly specify, we are running in the default workspace.

Local workspaces are stored in terraform.tfstate.d/

  • each workspace has its own state
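
The instance_map variable from earlier could be keyed off the current workspace instead of a separate environment_type variable, as a sketch (the AMI ID is illustrative):

```hcl
resource "aws_instance" "server" {
    ami           = "ami-0abcdef1234567890" # illustrative AMI ID
    # pick the instance size for the environment we are running in
    instance_type = var.instance_map[terraform.workspace]

    tags = {
        Environment = terraform.workspace
    }
}
```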

CLI

  • List workspaces - terraform workspace list
  • Create new workspace - terraform workspace new development
  • Switch workspaces - terraform workspace select development

Terraform Cloud

Terraform Cloud provides a way to set input variables at the workspace level, meaning each set of infra (for each environment) can have its own set of variables.

With it, we:

  1. create a workspace
  2. point it at a source control repo containing your Terraform code
  3. set the variables for that workspace

Lifecycle

Each resource has a special attribute block called lifecycle that gives us extra control.

It allows us to:

  • create_before_destroy, to ensure a new resource is created prior to deleting the old one
  • prevent_destroy, to prevent Terraform from ever deleting the resource, so long as the property exists
lifecycle {
  prevent_destroy = true
}
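
In context, the lifecycle block is nested inside the resource it controls, e.g. (sketch):

```hcl
resource "aws_instance" "server" {
    # ... instance arguments ...

    lifecycle {
        # stand up the replacement before tearing down the old instance
        create_before_destroy = true
    }
}
```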

Provisioner

Provisioners allow us to run a script (remotely or locally) after a resource has been created.

  • provisioners allow us to step in and solve problems ourselves when they are not solved out of the box by the provider we are using.
  • because provisioners are imperative, they are seen as a last resort approach to solving our problem.
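
A minimal sketch using the built-in local-exec provisioner (the command is illustrative):

```hcl
resource "aws_instance" "server" {
    # ... instance arguments ...

    # runs on the machine running Terraform, after the instance is created;
    # self refers to the resource the provisioner is attached to
    provisioner "local-exec" {
        command = "echo ${self.private_ip} >> private_ips.txt"
    }
}
```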

Misc

Multi-line string

Multi-line strings (heredocs) are declared between <<ANYWORD and ANYWORD. The <<-ANYWORD variant strips leading indentation, which is useful when the heredoc is nested inside a block:

resource "aws_iam_policy" "my_bucket_policy" {
    name = "my-bucket-policy"

    policy = <<-POLICY
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket"
                ],
                "Effect": "Allow",
                "Resource": [
                    "${data.aws_s3_bucket.bucket.arn}"
                ]
            }
        ]
    }
    POLICY
}

String interpolation (${interpolated_value}) can be used inside a multi-line string.

  • only needed when inside quotes ("")

Outputting to console (stdout)

output "message" {
    value = aws_s3_bucket.my_bucket.id
}

or we can print all attributes exported by a resource:

output "all" {
    value = aws_s3_bucket.my_bucket
}

Folder structure

All Terraform files should be in a single directory (the Terraform project) at the top level. Any files within subdirectories will be ignored. Conceptually, when we run Terraform commands, all *.tf files in the directory are merged into a single configuration anyway.

  • child directories are used to set up Modules

By convention,

  • set up providers in main.tf.
  • resource files named after the resource type they contain (e.g. sqs.tf, api-gateway.tf)
  • variables in variables.tf

Tools

  • Atlantis - Pull Request automation for Terraform
    • purpose is to have improved code review for infra changes.
  • Terratest - a unit testing framework for Terraform

Alternatives

  • Chef/Puppet - these are configuration management tools. They are designed to configure and manage already-existing infrastructure, while Terraform is designed to set up the infrastructure itself.
    • In other words, Puppet and Chef would be used to configure servers, while Terraform would be used to create the server itself.
  • Pulumi - This IaC tool uses a programming language (like TypeScript) instead of a configuration language.
