Terraform pt 3: Create An EKS Cluster and IAM Role/Policy (DevOps the Hard Way series)

In this post, the Terraform CLI will be used to create an Elastic Kubernetes Service (EKS) Cluster on AWS.

Terraform: creating an EKS cluster and IAM role/policy

The Terraform module provided in this scenario for EKS consists of two files, main.tf and variables.tf.

main.tf

This main.tf file is fairly long, so we will look at it block-by-block:

The terraform block

terraform {
  backend "s3" {
    bucket = "terraform-state-devopsthehardway"
    key    = "eks-terraform-workernodes.tfstate"
    region = "us-east-1"
  }
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

This terraform block is nearly identical to the one presented in the previous blog post. The only difference is the value of key, which tells Terraform what object name to use when saving this module’s state configuration to the S3 backend.

Once this Terraform module is successfully applied, the existing S3 backend bucket will contain a new object named "eks-terraform-workernodes.tfstate", which stores this module's state.

Just like last time, the terraform block must be edited before use, for two reasons:

  1. The value for bucket needs to be changed to the actual (globally unique) bucket name that was created back in the first Terraform post.

  2. Terraform will again require credentials in order to access its backend S3 bucket.
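One common way to supply those credentials (a sketch, assuming an IAM user with programmatic access; the key values below are AWS's own documentation examples, not real credentials) is via environment variables, which both the AWS provider and the S3 backend read automatically:

```shell
# Export AWS credentials as environment variables before running terraform;
# both the AWS provider and the S3 backend pick these up automatically.
# The values below are AWS's documentation examples -- substitute your own.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFE/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_DEFAULT_REGION="us-east-1"
```

Environment variables keep credentials out of the .tf files themselves, which matters because those files are typically committed to source control.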

A resource block (for an IAM role)

# IAM Role for EKS to have access to the appropriate resources
resource "aws_iam_role" "eks-iam-role" {
  name = "devopsthehardway-eks-iam-role"

  path = "/"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

}

After the terraform block comes an aws_iam_role resource block, which, predictably, declares an IAM role. The only required argument in an aws_iam_role resource is assume_role_policy. In this particular assume_role_policy, the EKS service itself (eks.amazonaws.com) is granted permission to assume the role.

Also declared in this block are values for the IAM role’s friendly name and path, which are more intuitive for humans to work with than the role’s ARN. With these additional arguments, the IAM role declared above will have the friendly name devopsthehardway-eks-iam-role under the root path / (i.e., /devopsthehardway-eks-iam-role).
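Because the heredoc policy is handed to AWS as raw JSON, a typo inside it only surfaces at apply time. A cheap local sanity check (a sketch, assuming python3 is available; the /tmp path is arbitrary) is to validate the JSON before applying:

```shell
# Write the role's trust policy to a scratch file and confirm it parses
# as valid JSON before handing it to Terraform/AWS.
cat > /tmp/eks-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
python3 -m json.tool /tmp/eks-trust-policy.json > /dev/null && echo "trust policy is valid JSON"
```

Alternatively, writing the policy with Terraform’s jsonencode() function, as the worker-node role later in this file does, turns malformed JSON into a plan-time error instead.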

Two resource blocks (for the IAM role’s policy attachments)

## Attach the IAM policy to the IAM role
resource "aws_iam_role_policy_attachment" "AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks-iam-role.name
}
resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerRegistryReadOnly-EKS" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks-iam-role.name
}

Next comes a pair of related resource blocks. Each of these blocks declares an aws_iam_role_policy_attachment. These policy attachments bind managed IAM policies to a role, making them useful for granting logically grouped sets of AWS permissions to a specific IAM role.

The AWS-managed policies being attached to /devopsthehardway-eks-iam-role are:

  • AmazonEKSClusterPolicy

    This policy provides Kubernetes the permissions it requires to manage resources on your behalf. Kubernetes requires ec2:CreateTags permissions to place identifying information on EC2 resources including but not limited to Instances, Security Groups, and Elastic Network Interfaces.

  • AmazonEC2ContainerRegistryReadOnly

    Provides read-only access to Amazon EC2 Container Registry repositories.
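Both attachments reference AWS-managed policies, which are recognizable by their ARNs: instead of a 12-digit account ID, managed policies sit under the literal pseudo-account `aws`. A small sketch of how those ARNs are composed:

```shell
# AWS-managed policies live under the pseudo-account "aws" rather than a
# 12-digit account ID, so their ARNs all share the same prefix.
PREFIX="arn:aws:iam::aws:policy"
for POLICY in AmazonEKSClusterPolicy AmazonEC2ContainerRegistryReadOnly; do
  echo "${PREFIX}/${POLICY}"
done
```

The two lines printed match the policy_arn values used in the resource blocks above; a customer-managed policy would instead carry your own account ID in that position.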


To summarize: so far in main.tf, a new IAM role has been declared and the EKS service granted permission to assume it. That role has in turn been granted a standard set of permissions allowing appropriate access from the AWS EKS service to the EC2 and ECR services.


By assuming the new IAM role ( /devopsthehardway-eks-iam-role ):

  • EKS will use the role’s first attached policy for its internal management of EC2
  • EKS will use the role’s second attached policy to access ECR. This ECR access is for storing the image for the containerized Uber API that EKS will be orchestrating.

A resource block (to create an EKS cluster on AWS)

## Create the EKS cluster
resource "aws_eks_cluster" "devopsthehardway-eks" {
  name = "devopsthehardway-cluster"
  role_arn = aws_iam_role.eks-iam-role.arn

  vpc_config {
    subnet_ids = [var.subnet_id_1, var.subnet_id_2]
  }

  depends_on = [
    aws_iam_role.eks-iam-role,
  ]
}

While the previous blocks declared the necessary IAM role and policy attachments for an EKS cluster, this block is where the actual cluster is declared, via an aws_eks_cluster resource. The block is fairly minimal, containing only the required arguments along with the depends_on “meta-argument”.

First, notice how the values of role_arn and depends_on both refer back to the eks-iam-role resource declared earlier in main.tf.

Also note how the vpc_config block references two Terraform input variables: subnet_id_1 and subnet_id_2. Those variables are declared separately in this module’s variables.tf file, which will come up later in this post.

Some resource blocks (declaring an IAM role & policy attachments, for EKS Worker Nodes)

## Worker Nodes
resource "aws_iam_role" "workernodes" {
  name = "eks-node-group-example"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "EC2InstanceProfileForImageBuilderECRContainerBuilds" {
  policy_arn = "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.workernodes.name
}

These aws_iam_role and aws_iam_role_policy_attachment blocks are similar to the ones seen earlier in this post. However, instead of granting permissions to the EKS service as was done earlier, these blocks are granting (different) permissions to the AWS EC2 service.

In this case, a new IAM role, eks-node-group-example, is declared, and the EC2 service (ec2.amazonaws.com) is allowed to assume it. The following AWS-managed policies are then attached to grant the new role the appropriate permissions:

  • AmazonEKSWorkerNodePolicy
  • AmazonEKS_CNI_Policy
  • EC2InstanceProfileForImageBuilderECRContainerBuilds
  • AmazonEC2ContainerRegistryReadOnly

A full breakdown of the specific AWS policies attached above is beyond the scope of this blog post; in short, attaching them to the new IAM role eks-node-group-example grants EC2-based worker nodes the permissions they require in order to function within an EKS environment.

As of this writing, all four policies are documented at https://docs.aws.amazon.com/aws-managed-policy/latest/reference/policy-list.html.


The reason so many IAM declarations are required throughout this main.tf is that AWS permissions follow a deny-by-default model. This supports security best practices: because nothing is permitted until explicitly granted, the principle of least privilege is easier to uphold.

The flip side is that a lot of explicit permissions must sometimes be granted in order for things to work properly, as seen throughout this post.


A resource block (for creating an EKS node group on AWS)

resource "aws_eks_node_group" "worker-node-group" {
  cluster_name    = aws_eks_cluster.devopsthehardway-eks.name
  node_group_name = "devopsthehardway-workernodes"
  node_role_arn   = aws_iam_role.workernodes.arn
  subnet_ids      = [var.subnet_id_1, var.subnet_id_2]
  instance_types  = ["t3.xlarge"]

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.AmazonEKS_CNI_Policy,
    #aws_iam_role_policy_attachment.AmazonEC2ContainerRegistryReadOnly,
  ]
}

At the end of main.tf, an EKS Managed Node Group is declared via an aws_eks_node_group resource. Per AWS:

Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.

Similarly to the aws_eks_cluster resource block seen earlier in main.tf, many values refer either to other blocks or to Terraform input variables.

Note that the instance type declared here is "t3.xlarge", which is not covered by the AWS Free Tier.

The block above marks the end of the main.tf file for this module. Next, we will look at variables.tf.

variables.tf

variable "subnet_id_1" {
  type = string
  default = "subnet-07d002bb7e32b67fc"
}

variable "subnet_id_2" {
  type = string
  default = "subnet-03c49bbe3a085ef44"
}

The two variables declared in this file (subnet_id_1 and subnet_id_2) are referenced every time var.subnet_id_1 and var.subnet_id_2 appear in main.tf. Since their values are never explicitly defined in this example, their default values will take effect.

Will it build?

It’s time to run this Terraform module, using the standard Terraform CLI commands:

$ terraform init
$ terraform plan
$ terraform apply
  1. terraform init runs without issues.
  2. terraform plan throws an error:
    Provider "registry.terraform.io/hashicorp/aws" requires explicit configuration. Add a provider block to the root module and configure the provider's required arguments as
    described in the provider documentation.
    

    Looking back at the previous two Terraform modules (the ones for S3 and ECR) as written by the author of the DevOps the Hard Way repo, a provider block was always present in main.tf. Could the missing provider block in this particular module be an oversight?

    Whatever the case, manually adding a provider block (and adding credentials to that block, as has been required in every module so far) solves the problem. The command completes and suggests continuing on and running terraform apply.

  3. terraform apply takes some time to run. After a couple of minutes, the CLI throws another error:
    Error: creating EKS Cluster (devopsthehardway-cluster): operation error EKS: CreateCluster, https response error StatusCode: 400, RequestID: ****, InvalidParameterException: The subnet ID 'subnet-07d002bb7e32b67fc' does not exist
    

    Maybe this cluster wasn’t meant to be?

    It turns out that the default values originally provided for the Terraform input variables subnet_id_1 and subnet_id_2 in variables.tf aren’t very meaningful. Since the values for these two input variables were never explicitly defined, the declared default values of “subnet-07d002bb7e32b67fc” and “subnet-03c49bbe3a085ef44” took effect.

    The problem is that those values are seemingly arbitrary, and subnets with those IDs are highly unlikely to exist in the real-world VPC we created at the outset of this series.

In order to complete the EKS setup via Terraform, valid subnet IDs will need to be assigned to both input variables. As seen in previous posts/modules, Terraform input variables can be assigned via a .tfvars file, for example:

terraform.tfvars

subnet_id_1 = "subnet-****"
subnet_id_2 = "subnet-****"

where the two actual private subnet IDs created earlier by CloudFormation go between the quotes.
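As a sketch, the .tfvars file can also be generated from the shell (the subnet IDs below are placeholders, not real subnets; substitute the IDs from your own CloudFormation stack):

```shell
# Generate terraform.tfvars with the two private subnet IDs.
# These IDs are placeholders -- use the ones your VPC stack actually created.
SUBNET_1="subnet-0example1111111111"
SUBNET_2="subnet-0example2222222222"
cat > terraform.tfvars <<EOF
subnet_id_1 = "${SUBNET_1}"
subnet_id_2 = "${SUBNET_2}"
EOF
cat terraform.tfvars
```

A file named exactly terraform.tfvars (or ending in .auto.tfvars) in the module directory is loaded by Terraform automatically; any other filename must be passed explicitly with -var-file.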

The working build

After:

  1. creating this new .tfvars file in the same directory as the .tf files for this module,
  2. setting the two correct, real-world AWS private subnet IDs as the values of subnet_id_1 and subnet_id_2, and
  3. re-running terraform apply,

the command runs for much longer and the cluster eventually deploys successfully.

This can be verified via console or CLI:

Verifying the new EKS cluster

$ aws eks list-clusters
{
    "clusters": [
        "devopsthehardway-cluster"
    ]
}

With the three Terraform CLI commands successfully completed, an EKS cluster has been created on AWS. This means we now have a place to run containerized software.

However, the Uber API we intend to run has not yet been containerized. That is where Docker will come in.

In the next post, we use Docker to containerize the Uber API, so that the API can be run on our new EKS cluster.

This post is licensed under CC BY-NC-SA 4.0 by the author.