Codefresh (https://codefresh.io/)

Pipeline Performance Profiling: Making CI/CD Performance, Cost, and Bottlenecks Visible
https://codefresh.io/blog/pipeline-performance-profiling/
Mon, 26 Jan 2026 15:56:34 +0000

Modern CI/CD pipelines are no longer just about whether builds succeed; they’re about how fast, how efficiently, and at what cost they run.

One theme has come up consistently in customer conversations:  

 “My builds are slow, expensive, and I don’t know where to start fixing that.”

Pipeline Performance Profiling is designed to close this gap by making pipeline behavior observable, measurable, and explainable. Instead of treating build time as a single opaque number, it breaks pipeline execution down into clear phases, steps, and resource signals. Built on OpenTelemetry and compatible with Prometheus, it exposes these insights as open, industry-standard metrics, giving teams the flexibility to analyze pipeline cost and performance with the same monitoring tools they already use. This allows teams to understand where time and resources are spent, why bottlenecks occur, and how to make informed trade-offs between speed, cost, and reliability.

Why We Built Pipeline Performance Profiling

Our customers asked us clear, practical questions:

  • How do I choose the right machine sizes for my builds without wasting money?
  • Where exactly are my pipeline bottlenecks?
  • Why do some steps feel slow even when nothing has changed?
  • How much time are we losing pulling images instead of executing code?
  • Can I analyze Codefresh builds with the same monitoring tools I already use?

Until now, answering these questions required guesswork, manual timing, or support escalations with limited data.

Pipeline Performance Profiling changes that by instrumenting the pipeline runtime itself and exposing step-level time and resource metrics in a way that’s easy to analyze, trend, and correlate.

What Is Pipeline Performance Profiling?

Pipeline Performance Profiling is the first phase of Codefresh’s broader Pipeline Observability initiative.

It provides:

  • Step-level execution timing (active vs. idle)
  • Initialization vs. execution breakdown
  • CPU and memory metrics
  • Cache usage visibility
  • Prometheus-compatible metrics, visualized through Grafana dashboards
  • OpenTelemetry-native instrumentation, so your data fits into existing observability pipelines

This foundation supports developers, DevOps engineers, and platform teams who need evidence-based answers, not assumptions.
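Because the metrics are exposed in an open format, wiring them into an existing monitoring setup is a standard Prometheus exercise. The sketch below shows what a scrape job could look like; the job name, target address, and port are hypothetical placeholders rather than actual Codefresh endpoints, so check the Codefresh documentation for the real values.

```yaml
# Hypothetical Prometheus scrape job for pipeline runtime metrics.
# The job name, target address, and port are illustrative placeholders.
scrape_configs:
  - job_name: codefresh-pipeline-metrics      # placeholder job name
    scrape_interval: 30s
    static_configs:
      - targets:
          - codefresh-runtime.example.internal:9090   # placeholder target
```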

Performance: Find Time Sinks and Bottlenecks

One of the hardest performance problems to solve is unexplained slowdown. Builds still succeed, configurations haven’t changed, yet pipelines take longer to complete, leaving teams guessing where the time is going.

“Our builds feel slower, but nothing obvious changed.”

With Pipeline Performance Profiling, teams no longer have to rely on intuition or one-off comparisons. By breaking pipeline execution into measurable phases and steps, the dashboards make it clear where time is actually spent and how that changes over time.

With this visibility, teams can answer questions such as:

Build and Step Duration Trends

Understanding whether performance is improving or regressing requires looking beyond individual builds.

  • How does build duration change over time?
  • Are pipelines getting faster, or are there gradual regressions?
  • Which steps consistently dominate execution time across builds?

These trends help teams spot slowdowns early and focus optimization efforts where they will have the biggest impact.
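Since the data is Prometheus-compatible, trend questions like these map to ordinary PromQL. As a hedged sketch, a recording rule could precompute the P95 build duration per pipeline; the metric name build_duration_seconds_bucket is a hypothetical placeholder, not necessarily the name Codefresh exposes.

```yaml
# Hypothetical Prometheus recording rule for P95 build duration per pipeline.
# build_duration_seconds_bucket is a placeholder metric name.
groups:
  - name: pipeline-duration-trends
    rules:
      - record: pipeline:build_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (pipeline, le) (
              rate(build_duration_seconds_bucket[1d])))
```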

Initialization Time: Understanding Build Preparation Delays

Many pipeline slowdowns happen before any build step runs. The initialization phase includes several setup stages that can often be optimized to speed things up. Pipeline Performance Profiling makes these stages visible through build preparation duration metrics (such as P95 trends), helping teams quickly see when and why startup time increases.

Common contributors include:

  • Request account clusters: Attaching many clusters can slow build startup as each one is contacted. Limiting pipelines to only the required clusters can significantly reduce initialization time.
  • Validate Docker daemon: Slow validation often indicates delays in provisioning Docker-in-Docker pods, which depend on cluster configuration and capacity.
  • Start composition services: Pipelines with many Docker Compose services may start more slowly due to image pulls and container startup.

By identifying which part of initialization is responsible for delays, teams can make targeted configuration changes and reduce overall pipeline startup time.

Cost: Optimize Resource Usage Without Guesswork

Slow pipelines are frustrating, but expensive pipelines are even worse. Too often, CI/CD resource decisions are made conservatively: teams over-provision CPU and memory to avoid failures, without clear visibility into whether those resources are truly needed. Pipeline Performance Profiling addresses this by exposing how resources are actually consumed over time, enabling teams to use historical data instead of assumptions to make deliberate, data-driven decisions about sizing and efficiency.

Pipeline Performance Profiling helps teams answer questions such as:

Resource Utilization per Pipeline and Step

  • What is the average and peak CPU usage per pipeline and per step?
  • How much memory is actually consumed during builds?
  • Are there steps that briefly spike resource usage while the rest of the build remains under-utilized?

Right-Sizing and Cost Optimization

  • Which pipelines are consistently under-utilizing allocated resources?
  • Where are CPU or memory requests clearly higher than necessary?
  • Can pod sizes or machine types be safely reduced without impacting build stability?

With this visibility, right-sizing becomes a controlled, low-risk process rather than a guessing game. Teams can adjust resource allocations confidently, validate changes over time, and reduce CI infrastructure costs without sacrificing performance or developer productivity.
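As an illustration of how such a check could be automated on top of Prometheus-style metrics, the rule below flags pipelines whose average CPU usage over a week stays well below the requested amount. Both metric names are hypothetical placeholders, and the 0.3 threshold is an arbitrary example.

```yaml
# Hypothetical alerting rule flagging over-provisioned pipelines.
# cpu_usage_cores and cpu_request_cores are placeholder metric names.
groups:
  - name: pipeline-right-sizing
    rules:
      - alert: PipelineCPUOverProvisioned
        expr: |
          avg_over_time(cpu_usage_cores[7d]) / on (pipeline) cpu_request_cores < 0.3
        for: 1d
        labels:
          severity: info
        annotations:
          summary: "Average CPU usage is under 30% of the request; consider right-sizing."
```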

Cache Utilization: Stop Paying for Repeated Work

Repeated image pulls and dependency downloads are a quiet but persistent drain on pipeline performance. When caching is ineffective or inconsistently used, teams end up paying the cost on every build: longer startup times, wasted network bandwidth, and slower feedback loops for developers.

Pipeline Performance Profiling makes cache behavior visible, so teams can understand whether caching is actually helping and where it falls short. Instead of guessing, they can see the real impact cache usage has on build duration and step execution.

With this visibility, teams can answer questions such as:

  • How often are cache volumes reused across builds?
  • How does overall build time differ when a cache volume is reused versus not reused?
  • How does step duration change when cache hits occur?

This makes it easier to identify pipelines that rarely benefit from caching, steps that could be restructured to improve reuse, and opportunities to reconfigure cache volume usage to improve hit rates, all while reducing build startup time. Even modest improvements in cache utilization can translate into noticeable gains in speed and efficiency at scale.

Bring Your Own Observability Stack

A core design goal for Pipeline Performance Profiling was avoiding lock-in to proprietary tooling such as Datadog. The metrics it exposes are fully compatible with OpenTelemetry and Prometheus and are visualized through Grafana, making pipeline performance a first-class part of your existing observability ecosystem rather than a separate, siloed view.

This approach allows teams to:

  • Analyze Codefresh pipeline metrics alongside application and infrastructure data
  • Correlate slow pipeline steps with cluster-level CPU or memory pressure
  • Feed build performance data into existing dashboards, alerts, and cost-analysis workflows

For hybrid and on-prem environments in particular, this brings first-class pipeline observability using tools teams already trust, without requiring new platforms or specialized integrations.

Grafana Dashboards

Metrics are only useful if teams can easily explore, understand, and act on them. To make Pipeline Performance Profiling immediately practical, Codefresh provides two ready-to-use Grafana dashboards that turn raw pipeline metrics into clear, actionable insights, helping teams move quickly from “something feels slow” to “here’s where the problem is.”

1. Pipeline Overview

The Pipeline Overview dashboard is designed to help teams understand how their pipelines behave over time, rather than focusing on a single build in isolation. It provides a high-level view of performance trends, making it easier to spot gradual slowdowns, sudden regressions, or improvements introduced by recent changes.

Using this dashboard, teams can answer questions such as:

  • How is build duration changing over time?
  • What is causing delays during build initialization?
  • Are resources allocated effectively for this pipeline?

2. Build Details

While the Pipeline Overview focuses on trends, the Build Details dashboard zooms in on individual builds to support faster troubleshooting and optimization. It is designed for moments when something looks off and teams need to understand exactly what happened during a specific execution.

With this level of detail, teams can answer questions such as:

  • Which step consumes the most time or resources?
  • Can the step structure be redesigned for faster execution?
  • What caused delays during build bootstrap or startup?

Together, these dashboards provide both the big picture and the fine-grained detail needed to move from detection to diagnosis. Teams can track long-term performance trends, investigate anomalies when they occur, and make informed changes with confidence.

We also encourage teams to extend these dashboards with additional graphs and views tailored to their specific workflows and business goals. As you customize and optimize your dashboards, we’d love to hear about your experience; your feedback helps us continue improving Pipeline Performance Profiling for you and for other customers.

What’s Next

Pipeline Performance Profiling is an important first step, and we’re actively iterating on it based on real-world usage and customer feedback. We’re already working closely with early customers, using real builds and environments to validate metrics, dashboards, and workflows — and the feedback so far has been invaluable.

Over the coming months, we’ll continue evolving this capability with a focus on deeper insights and improved usability. Key areas of investment include:

  • Expanded support for SaaS customers: enabling environments where Codefresh hosts both the control plane and runtime to benefit from the same metrics and to analyze them with existing observability tools.
  • Richer Grafana dashboards: making it easier to spot regressions, identify anomalous builds, and understand performance patterns.
  • Foundations for performance management: laying the groundwork for future capabilities such as actionable insights, smarter analysis, and performance regression detection.

As always, this work is driven by real customer needs. We encourage you to start exploring Pipeline Performance Profiling with your own pipelines, extend the dashboards to match your workflows, and let the data guide your optimization efforts. Your usage and feedback help shape what comes next and allow us to continue improving the experience for all Codefresh customers.

The post Pipeline Performance Profiling: Making CI/CD Performance, Cost, and Bottlenecks Visible appeared first on Codefresh.

Anatomy of a Pull Request Generator
https://codefresh.io/blog/anatomy-of-a-pull-request-generator/
Thu, 02 Oct 2025 14:48:57 +0000

Argo CD provides a number of generators to support various scenarios that developers need when using Argo CD and Kubernetes. In this post, I’ll be discussing the Pull Request Generator. A Pull Request Generator is an Argo CD ApplicationSet generator that is configured to “watch” a Git repository for Pull Requests (PRs). Whenever a new PR that matches the specified filter is submitted, Argo CD applies the manifests from the referenced repository and path. This allows you to test the PR changes in an ephemeral environment. In addition, the Pull Request Generator cleans up after itself when the PR is closed, removing the resources that it deployed.

The manifest for a Pull Request Generator can look quite daunting, as there is some additional configuration required to make it work. In this post, I’ll break down the manifest so it is a little more consumable.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: octopub-pullrequestgenerator
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul
        repo: pullrequestgenerator
        # URL of the Bitbucket Server. Required.
        api: https://bitbucket.octopusdemos.app
        # Either basicAuth or bearerToken authentication is required
        # to access private repositories.
        # Credentials for Bearer Token (App Token) authentication.
        bearerToken:
          # Reference to a Secret containing the bearer token.
          tokenRef:
            secretName: bitbucket-token
            key: token
      # Labels are not supported by Bitbucket Server, so filtering by label is not possible.
      # Filter PRs using the source branch name. (optional)
      filters:
      - branchMatch: ".*-argocd"
  template:
    metadata:
      name: 'octopub-{{.branch}}-{{.number}}'
    spec:
      source:
        repoURL: 'https://bitbucket.octopusdemos.app/scm/pul/pullrequestgenerator.git'
        targetRevision: '{{.head_sha}}'
        path: manifests
      project: "default"
      destination:
        server: https://kubernetes.default.svc
        namespace: 'octopub-{{.branch}}-{{.number}}'
      syncPolicy:
        automated: 
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true

Kind

To support multiple PRs existing at the same time, the Pull Request Generator must use an ApplicationSet.  An ApplicationSet acts as an “application factory” to automatically generate applications from a single manifest file.

kind: ApplicationSet

Spec: Generators

Within the manifest specification (spec), you can define one or more generators.  This post focuses on the Pull Request Generator specifically, which is denoted by pullRequest.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - pullRequest:

By default, Argo CD checks for pull requests every 30 minutes. The manifest provides a field to override this value called requeueAfterSeconds. In my example, I’ve configured Argo CD to check for PRs every two minutes (120 seconds).

Note: Exercise caution when configuring requeueAfterSeconds, as it could lead to API rate limiting for cloud-based source control managers.

  - pullRequest:
      requeueAfterSeconds: 120

Git Provider

Git providers implement core Git commands such as pull, push, and fetch in the same way, but pull requests are not part of Git itself; each provider exposes them through its own API. The Pull Request Generator therefore needs to know which provider it is talking to so that it can make the appropriate API calls. In my example, I configured the Pull Request Generator to work with my Bitbucket Server instance (see here for a list of supported Git providers and the specifics for configuring them).

  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:

For the Bitbucket Server configuration, you’ll need to define the following:

  • Project
  • Repo
  • Api
  • Authentication

Project

The Argo CD example in their documentation is misleading when it comes to this value. Their example makes it look like this is the Bitbucket Project name; however, the field actually expects the Project key. (Bitbucket Server displays Project keys in upper-case, but manifests require the value in lower-case.)

  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul

Repo

Projects within Bitbucket Server may have multiple repositories configured. This is the name of the repository you would like Argo CD to monitor.

  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul
        repo: pullrequestgenerator

API

This value is simply the URL to the Bitbucket Server instance.  In my case, it is https://bitbucket.octopusdemos.app

  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul
        repo: pullrequestgenerator
        # URL of the Bitbucket Server. Required.
        api: https://bitbucket.octopusdemos.app

Authentication

Argo CD needs to be able to authenticate to the Bitbucket Server so it can monitor the requested repositories.  The Bitbucket Server implementation offers two authentication mechanisms:

  • BasicAuth
  • BearerToken

My example uses the BearerToken method. This value is a Personal Access Token (PAT) for Bitbucket Server.

Argo CD uses Kubernetes resources for this authentication, so the PAT is stored as a Secret within your cluster.  This can be created using something similar to this:

apiVersion: v1
kind: Secret
metadata:
  name: bitbucket-token
  labels:
    argocd.argoproj.io/secret-type: repository
  namespace: argocd
type: Opaque
stringData:
  token: <Personal Access Token value>

This secret is then referenced in the bearerToken section:

  - pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul
        repo: pullrequestgenerator
        # URL of the Bitbucket Server. Required.
        api: https://bitbucket.octopusdemos.app
        # Credentials for Bearer Token (App Token) authentication. Either basicAuth or bearerToken
        bearerToken:
          # Reference to a Secret containing the bearer token.
          tokenRef:
            secretName: bitbucket-token
            key: token

Filters

Filters are how you tell the Pull Request Generator what to match on. In my example, I’m telling Argo CD to create resources only when a PR is created from branches that end in “-argocd”. The match is done with a regular expression, which is why the pattern contains a period (".*-argocd") even though no period appears in the branch name itself.

- pullRequest:
      requeueAfterSeconds: 120
      bitbucketServer:
        project: pul
        repo: pullrequestgenerator
        # URL of the Bitbucket Server. Required.
        api: https://bitbucket.octopusdemos.app
        # Credentials for Bearer Token (App Token) authentication. Either basicAuth or bearerToken
        bearerToken:
          # Reference to a Secret containing the bearer token.
          tokenRef:
            secretName: bitbucket-token
            key: token
        # authentication is required to access private repositories
      # Labels are not supported by Bitbucket Server, so filtering by label is not possible.
      # Filter PRs using the source branch name. (optional)
      filters:
      - branchMatch: ".*-argocd"

Template

This section defines the template to use when creating the Kubernetes resources. For the Pull Request Generator, we can use variables such as the branch name ({{.branch}}) and the numerical value of the PR ({{.number}}).

  template:
    metadata:
      name: 'octopub-{{.branch}}-{{.number}}'

Spec

The spec section of a template follows the same pattern as the standard ApplicationSet specification. The biggest differences are in the targetRevision and namespace fields. You are able to make use of the variables previously mentioned to create unique namespaces so that each PR has its own ephemeral environment. The targetRevision needs to match the PR’s commit; this is one of the few cases where you can deviate from the recommended GitOps practice of pointing at HEAD.

spec:
      source:
        repoURL: 'https://bitbucket.octopusdemos.app/scm/pul/pullrequestgenerator.git'
        targetRevision: '{{.head_sha}}'
        path: manifests
      project: "default"
      destination:
        server: https://kubernetes.default.svc
        namespace: 'octopub-{{.branch}}-{{.number}}'
      syncPolicy:
        automated: 
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true

Seeing it in action

If everything is configured correctly, whenever a new PR with the specified filter is created, you will see a new application created within Argo CD!

(Screenshots: the pull request in the Bitbucket UI and the resulting application in the Argo CD UI.)

Conclusion

The Pull Request Generator is a powerful tool that can help reduce bugs by providing a mechanism for testing PRs before they are merged. In this post, I broke down the Pull Request Generator to help you understand what it does and how to construct one.

The post Anatomy of a Pull Request Generator appeared first on Codefresh.

Top 30 Argo CD Anti-Patterns to Avoid When Adopting Gitops
https://codefresh.io/blog/argo-cd-anti-patterns-for-gitops/
Tue, 19 Aug 2025 16:36:21 +0000

The time has finally come! After the massive success of our Docker and Kubernetes guides, we are now ready to see several anti-patterns for Argo CD. Anti-patterns are questionable practices that people adopt because they seem like a good idea at first glance, but in the long run, they make processes more complicated than necessary. 

Several times, we have spoken with enthusiastic teams that recognize the benefits of GitOps and want to adopt Argo CD as quickly as possible. The initial adoption phase seems to go very smoothly, and more and more teams get onboarded to Argo CD. However, after a certain point, things start slowing down and developers start complaining about the new process.

Like several open source projects, Argo CD has several capabilities that can be abused if you don’t have the full picture in your mind. The end result almost always makes life really difficult for developers. 

So keep your developers happy and don’t fall into the same traps. Here is the full list of the anti-patterns we will see:

  1. Not understanding the declarative setup of Argo CD (Adopting Gitops)
  2. Creating Argo CD applications in a dynamic way (Adopting Gitops)
  3. Using Argo CD parameter overrides (Adopting Gitops)
  4. Adopting Argo CD without understanding Helm (Prerequisite knowledge)
  5. Adopting Argo CD without understanding Kustomize (Prerequisite knowledge)
  6. Assuming that developers need to know about Argo CD (Developer Experience)
  7. Grouping applications at the wrong abstraction level (Application Organization)
  8. Abusing the multi-source feature of Argo CD (Application Organization)
  9. Not splitting the different Git repositories (Application Organization)
  10. Disabling auto-sync and self-heal (Adopting Gitops)
  11. Abusing the targetRevision field (Adopting Gitops)
  12. Misunderstanding immutability for container/git tags and Helm charts (Adopting Gitops)
  13. Giving too much power (or no power at all) to developers (Developer Experience)
  14. Referencing dynamic information from Argo CD/Kubernetes manifests (Application Organization)
  15. Writing applications instead of Application Sets (Application Organization)
  16. Using Helm to package Application CRDs (Application Organization)
  17. Hardcoding Helm data inside Argo CD applications (Developer Experience)
  18. Hardcoding Kustomize data inside Argo CD applications (Developer Experience)
  19. Attempting to version Applications and Application Sets (Application Organization)
  20. Not understanding what changes are applied to a cluster (Developer Experience)
  21. Using ad-hoc clusters instead of cluster labels (Cluster management)
  22. Attempting to use a single application set for everything (Cluster management)
  23. Using pre-sync hooks for db migrations (Developer Experience)
  24. Mixing infrastructure apps with developer workloads (Developer Experience)
  25. Misusing Argo CD finalizers (Cluster management)
  26. Not understanding resource tracking (Cluster management)
  27. Creating “active-active” installations of Argo CD (Cluster management)
  28. Recreating Argo Rollouts with Argo CD and duct tape (Adopting Gitops)
  29. Recreating Argo Workflows with Argo CD, sync-waves and duct tape (Adopting Gitops)
  30. Abusing Argo CD as a full SDLC platform (Adopting Gitops)


The order of anti-patterns follows the timeline of an organization that starts with minimal Argo CD knowledge and slowly migrates several applications to the GitOps paradigm.

Anti-pattern 1 –  Not understanding the declarative setup of Argo CD

Following the GitOps principles means that Argo CD can take your Kubernetes manifests (or Helm charts or Kustomize overlays) from Git and sync them on your Kubernetes cluster. This process is well understood by teams and most people are familiar with saving Kubernetes manifests in Git.

It is important to note however that even this link between a cluster and a Git repository is a Kubernetes resource itself.

Argo CD introduces its own Custom Resource Definitions (CRDs) for several of its central concepts, such as applications and projects, and also reuses existing Kubernetes resources for clusters, secrets, etc.

These files should also be stored in Git. That is the whole point of following GitOps. It doesn’t make sense to use Git only for some files and not store the Applications themselves similarly.

We see several teams that use the Argo CD UI or CLI to create applications that are not stored anywhere and then have difficulties understanding what is deployed where or how to recreate their Argo CD configuration from scratch.

It should be noted that the “create new app” button in the Argo CD UI is only for experiments and quick tests. You should not use it at all in a production environment as the created application is not saved anywhere.

Everything that Argo CD needs should be stored in Git. Recreating an Argo CD instance should be a simple process with minimal steps:

  1. Create a new cluster with Terraform/Pulumi/Crossplane etc.
  2. Install Argo CD itself using Terraform/Autopilot/Codefresh etc.
  3. Point Argo CD to your ApplicationSets or root app-of-apps file
  4. Finished.

Recreating your Argo CD instance is a repeatable process that can be performed in less than 5 minutes (explained in anti-pattern 27).

If you want a comprehensive guide on how to organize your Kubernetes manifests in Git see our Application Set guide.
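To make the declarative approach concrete, here is a minimal sketch of an Application manifest that would live in Git instead of being created through the UI. All names, URLs, and paths are hypothetical placeholders:

```yaml
# Example of a declarative Argo CD Application stored in Git.
# All names, URLs, and paths are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service                 # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/git/my-configs.git   # placeholder repo
    targetRevision: HEAD
    path: manifests/my-service
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```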

Anti-pattern 2 – Creating Argo CD applications in a dynamic way

A related anti-pattern occurs when organizations have an existing process for creating applications and when they adopt Argo CD they simply call the Argo CD CLI or API to pass the information they already have.

Essentially there is an existing database or application configuration somewhere else and a custom CLI or other imperative tool does the following:

  1. Extracts application configuration from the existing database
  2. Creates an Argo CD application or Kubernetes manifest on the fly
  3. Applies this file to an Argo CD instance without storing anything in Git.

You fall into this trap if the “official” way of creating Argo CD applications in your organization is something like this:

my-app-cli new-app-name | argocd app create -f - 

Or several times, envsubst is used like this

envsubst < my-app-template.yaml | kubectl apply -n argocd -f -

The end result is always the same. You have custom Argo CD applications that are not saved anywhere in Git. You lose all the main benefits of GitOps:

  • You don’t have a declarative file for what is deployed right now 
  • You don’t have a history of what was deployed in the past
  • Recreating the Argo CD instance is not a single-step process any more (see anti-pattern 27)

To overcome this anti-pattern you need to make sure that you use Argo CD the way GitOps works.

You either need to convert your existing database settings and make them read/write to Git or discard them completely and start using Git as the single source of truth for everything.

Then, all day 2 operations should be handled by Git. 

The same is true for updating existing applications. If you want to update the configuration of an application, the process should always be the same:

  1. You (or an external system) change a file in Git
  2. Argo CD notices the change in Git
  3. Argo CD syncs the changes to the cluster.

If you use the Argo CD API or the Kubernetes API to manually patch resources, then you are not following GitOps.

Updating applications in production with any of the following commands goes against the GitOps principles.

kubectl set image deployment/my-deployment my-container=nginx:1.27.0
kubectl patch deployment <deployment-name> [.....patch here…]

Use the Kubernetes API only for experiments and local tests, never for production upgrades (see also anti-pattern 10 – disabling auto-sync).
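As an example of the Git-driven flow described above, bumping an image version becomes a one-line change committed to Git rather than a kubectl command against the cluster. With Kustomize, that commit could touch nothing but the images field (the resource and image names are placeholders):

```yaml
# kustomization.yaml: the image tag is changed with a Git commit,
# which Argo CD then notices and syncs to the cluster.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml       # placeholder resource
images:
  - name: nginx
    newTag: 1.27.0        # edited in Git instead of running kubectl set image
```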

Anti-pattern 3 – Using Argo CD parameter overrides

Yet another way of updating an Argo CD application in a manner we do NOT recommend is the following:

argocd app set guestbook -p image=example/guestbook:v2.3

The guestbook application was just updated to version v2.3. Argo CD syncs the version and everything looks good. But where was this action saved? Nowhere.

This command uses the parameters feature of Argo CD, which allows you to override any Argo CD application with your own custom properties. Even the official documentation has a huge warning about not using this feature, as it goes against the GitOps principles.

Even if you save the parameter information in the Application manifest, you have now completely destroyed local testing for developers (see anti-patterns 6 and 17).

Anti-pattern 4 – Adopting Argo CD without understanding Helm

Helm is the package manager for Kubernetes. In its original form it offers several essential features in a single platform:

  • A package format for Kubernetes manifests
  • A repository specification for storing Helm packages in artifact managers
  • A templating system
  • A comprehensive CLI
  • A lifecycle definition (upgrade, install, rollback, test)

Argo CD renders all Helm charts using the helm template command and discards most other lifecycle features. Even though in theory this is a good thing, as Argo CD can replace the default Helm lifecycle (e.g. Argo CD comes with its own rollback command), Argo CD assumes that you already know how Helm templates work.

If you are adopting Argo CD and want to use Helm, then make sure that you know how Helm works on its own and how all your applications can be deployed in all your different environments WITHOUT Argo CD. Trying to learn Argo CD and Helm together at the same time is a recipe for failure.

At the very least you should know how to create Helm hierarchies of values

common-values.yaml
+-----all-prod-envs.yaml
   +----specific-prod-cluster.yaml

And how Helm merges the hierarchy with the correct overrides, with later -f files overriding earlier ones:

helm install ./my-chart/ --generate-name -f common.yaml -f more.yaml -f some-more.yaml

If you use Helm umbrella charts, understand how to override child values from the top chart.

In particular, pay attention to the following:

restaurant:
  menu: vegetarian

This can be a simple value setting for a Helm chart that sets the restaurant.menu value to “vegetarian”. It can also be an umbrella chart which has a dependency subchart called “restaurant” which itself has a property called “menu”. Understand why those two approaches are different and the advantages and disadvantages of each.
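In the umbrella-chart interpretation, the snippet above would correspond to a chart definition roughly like the following (names and versions are hypothetical):

```yaml
# Chart.yaml of a hypothetical umbrella chart with a "restaurant" subchart.
# In this setup, values under the "restaurant:" key in values.yaml are
# forwarded to the subchart, instead of being plain values of the top chart.
apiVersion: v2
name: food-court
version: 0.1.0
dependencies:
  - name: restaurant
    version: 1.2.3
    repository: https://charts.example.com
```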
We have already written a complete guide on how to use Argo CD with Helm value hierarchies.

Anti-pattern 5 – Adopting Argo CD without understanding Kustomize

This is the same anti-pattern as the previous one, but for Kustomize users. Kustomize is a powerful tool with several features of its own (bases, overlays, patches, and generators).

Argo CD can reuse all Kustomize features, provided that you have structured your Kustomize files correctly first.

Again make sure that your Kustomize files work on their own BEFORE bringing Argo CD into the picture. Well structured Kustomize applications are self-contained. Any developer should be able to use the kustomize (or kubectl) command to extract the configuration for an existing environment without the need for Argo CD.
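A minimal sketch of such a self-contained overlay (folder names are hypothetical), renderable with plain `kustomize build overlays/prod` and no Argo CD at all:

```yaml
# overlays/prod/kustomization.yaml - a hypothetical self-contained overlay.
# Any developer can render it without Argo CD.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # the shared base manifests
namePrefix: prod-
images:
  - name: docker.io/example/my-app
    newTag: "0.2"
```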
We presented a full example for Kustomize configurations in our Argo CD promotion guide.

Anti-pattern 6 – Assuming that developers need to know about Argo CD

This is the corollary to the previous two anti-patterns. Several teams mix Argo CD configuration data with Kubernetes configuration, making life extremely difficult for developers.

This is a big problem for developers as one of their most important tasks is to run an application locally both during development and when trying to pinpoint difficult bugs in isolation. Creating Argo CD configurations and coupling them with Kubernetes manifests prevents them from understanding how an application runs independently.

We will see more specific anti-patterns later that also contribute to this problem but it is best to know about this trap in advance when you start working with Argo CD.

Remember also that developers don’t care about Kubernetes manifests. They only care about source code features. It is one thing to ask them to learn the basics (i.e. Helm values) and a completely different thing to require Argo CD knowledge just to be able to recreate the configuration of an existing environment.

When you design your Argo CD repository, you should always consider a developer persona who is an expert on Kubernetes but knows nothing about Argo CD. Can they recreate any configuration of any application on their laptop without using Argo CD? If the answer is no, it means that you are the victim of this anti-pattern. 

Find out where you have hardcoded Argo CD configurations with Kubernetes configurations and remove the tight coupling. For specific advice see anti-patterns 17 for Helm and 18 for Kustomize.

We have seen how to split Kubernetes configuration from Argo CD manifests in our Application Set guide

Anti-pattern 7 – Grouping applications at the wrong abstraction level

As we explained in the first anti-pattern (store everything in Git), an Argo CD application is just a link between a Git repository and a Kubernetes cluster. It is not a deployment artifact or a packaging format.

We have seen teams that abuse an Argo CD application as a generic grouping mechanism, using it for microservices or even completely unrelated applications.

You need to spend some time understanding what your applications do and how tightly coupled they are. If you have a set of “micro-services” that are always deployed together and upgraded together, you might want to use an umbrella chart for them. 

Argo CD applications should generally model something that requires individual deployments and updates. If you have several Argo CD applications that you always want to be managed together (but still want to deploy and update individually), then a better choice might be the app-of-apps pattern.

If several applications need to be deployed to different or similar configurations, then Application Sets are the proper recommendation.

So any time you want to group several applications, ask the following questions:

  1. Are those applications always deployed and upgraded as a single unit?
  2. Are those applications related in a business or technical manner?
  3. Do you want to use different configurations for different clusters for these applications?
  4. Is this combination of applications always the same? Do you sometimes wish to deploy a subset of them or a superset?
  5. Are these applications managed by a single team or multiple teams?

If you are unsure where to start, looking at Application Sets is always the best choice. Check also anti-pattern 19 (attempting to version application CRDs).

Anti-pattern 8 – Abusing the multi-source feature of Argo CD

This is a close relative of the previous anti-pattern. The multi-source feature of Argo CD was one of the most requested features in the project’s history. The feature was created to solve a single scenario:

  1. You wish to use an external Helm chart that is not hosted by your organization
  2. You want to use your own Helm values and still store them in your own Git repository
  3. You need a way to instruct Argo CD to combine the external Helm chart with your own values.

The feature is now implemented in Argo CD, and you can finally do the following:

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  sources:
  - repoURL: 'https://prometheus-community.github.io/helm-charts'
    chart: prometheus
    targetRevision: 15.7.1
    helm:
      valueFiles:
      - $values/charts/prometheus/values.yaml
  - repoURL: 'https://git.example.com/org/value-files.git'
    targetRevision: dev
    ref: values

Unfortunately, Argo CD does not limit the number of items you can place in the “sources” array. Several people have misunderstood this, abusing the feature to group multiple (often unrelated) applications.

Don’t fall into this trap. The multi-source feature was never designed to work this way and several standard Argo CD capabilities will either be broken or not work at all if you use multi-sources as a generic application grouping mechanism.
The correct way to group applications is with Application Sets. If you want to use multi-source applications with Helm hierarchies, we have also written an extensive guide.

Anti-pattern 9 – Not splitting the different Git repositories

If you look at any Kubernetes application from a high level, it consists of 3 distinct types of files:

  1. The source code
  2. Kubernetes manifests (deployment, service, ingress etc)
  3. Argo CD application manifests (or application sets)

We have already explained that as a best practice the source code should be separate from the manifests. If you are in a big organization it might also make sense to split Kubernetes manifests from Argo CD manifests.

If you keep all manifests in a single Git repository you will have issues with both the CI (Continuous Integration) and CD (Continuous Deployment) phases. Your CI system will try to auto-build application code when a manifest changes, and Argo CD will try to sync applications when a developer changes the source code.

There are several workarounds for these scenarios, but why try to solve a problem that shouldn’t exist in the first place?

We have also seen several variations of the same pattern that make the deployment process even more complicated. A classic scenario to avoid is the following:

  1. The source code of the application is in Git repository A
  2. The Helm manifest is in Git repository B
  3. Only the Helm values for the different environments are stored in Git repository A

The assumption here is that Helm values are close to the source code that developers need to change. In reality, it never makes sense to have access to Helm values without also having access to the Helm chart that uses them. Either assume that developers know about Helm and show them everything, or assume that they don’t care about Kubernetes at all and offer them a different abstraction that hides Argo CD from them completely.

Yet another problem is mixing Kubernetes manifests and Argo CD manifests in the same repository but instead of using different folders, you hardcode Kubernetes information into Argo CD information. This is described in detail in anti-patterns 17 and 18.

Anti-pattern 10 – Disabling auto-sync and self-heal

Migrating to Argo CD is often a big undertaking, especially for organizations that have invested a significant amount of effort in traditional pipelines. After all, it can be argued that Argo CD simply replicates what is already possible with an existing Continuous Integration system:

  1. A change happens in Git in a manifest
  2. A separate process picks up the Git event
  3. The process uses kubectl (or a custom script) to apply the changes in the cluster.

This could not be further from the truth as Argo CD also works the other way around. It monitors changes in the cluster and compares them against what is in Git. Then you can make a choice and either review those changes or discard them altogether.

This means that Argo CD solves the configuration drift problem once and for all, something that is not possible with traditional CI solutions.

But this advantage only exists if you let Argo CD do its job. Some organizations disable the auto-sync/self-heal behavior in Argo CD in an effort to “lock down” or fully control production systems.

This is a bad choice because production systems are exactly the kind of systems where you want to avoid configuration drift. Manual changes that happen in production (during hotfixes or other debugging sessions) are one of the biggest factors affecting failed deployments.

We recommend that you have auto-sync/self-heal for all your systems both production and non-production. 

Locking down a system should not be done in Argo CD itself, but enforced on the cluster and Git level. The most obvious solution is to reject any direct commits in a Git repository that controls your production system and only allow developers to create Pull Requests which must pass several manual and automated checks before landing in the mainline branch that Argo CD monitors.

If you disable auto-sync/self-heal you are missing the number one advantage of moving to Argo CD from traditional pipelines (eliminating configuration drift).

Anti-pattern 11 – Abusing the targetRevision field

Promoting applications between environments is one of the biggest challenges for teams adopting Argo CD. People see the targetRevision field in the Argo CD application and assume it is a promotion mechanism.
The first issue is when teams use semantic version ranges to force Argo CD to update an application automatically to a newer version. The second is when they continuously update the targetRevision field to different branch names, or attempt to implement “preview environments” by pointing an Argo CD application at a temporary developer/feature branch.

We have written a complete guide about the issues of abusing targetRevision.

In general, we recommend you always use HEAD in the targetRevision field which is also the default value.

Anti-pattern 12 – Misunderstanding immutability for container/git tags and Helm charts

This is not an anti-pattern with Argo CD per se, but it is closely related to the targetRevision choices, as explained in the previous anti-pattern.

We have seen several cases of people adopting Argo CD without first understanding the foundations (Helm, container registries, git tags). Several times, people use a specific git tag or Helm version in Argo CD without realizing that:

  1. Git tags seem to be immutable, but can be deleted and recreated with the same name
  2. Helm chart versions are mutable. This is how Helm was designed
  3. Container tags are mutable by default.

Let’s take these points one by one.

Container tags are mutable by default. You can push an image called my-app:v1.2, change something, and push a different image under the exact same tag. So just because you see the same container tag doesn’t mean it is actually the same application. Some registries can be configured to reject tag overwrites, but this is not always the default setting.

Helm chart versions are also mutable. You can change the contents of a Chart version and use the exact same version. Again, this is how Helm charts are created. 

In fact, Helm offers an additional property, appVersion, which can store the “application” version. So a Helm chart effectively has 3 “version” fields:

  • The container image (mutable by default)
  • The appVersion field (mutable)
  • The Chart version (mutable)

So unless you control how developers work with code and manifests and also configure your Helm chart repository correctly, you don’t really know if a Helm chart version contains the same thing as another chart with the same version.
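The chart-level fields above map onto a chart definition roughly like this (a hypothetical example):

```yaml
# Chart.yaml of a hypothetical chart. Both fields below are mutable in most
# Helm repositories; the container tag lives separately in values.yaml.
apiVersion: v2
name: my-app
version: 1.4.0      # the chart version
appVersion: "2.3.1" # the "application" version (informational only)
```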

Git tags can also be overwritten. You can see this very easily with any GitHub repo with default settings.

git tag -a v1.2 -m "first tag"
git push --tags
echo "A change" >> README.md
git commit -am "A change"
git tag -d v1.2
git push origin :refs/tags/v1.2
git tag -a v1.2 -m "first tag"
git push --tags

Here we just pushed the same tag (v1.2) twice but with different contents. So if you were using this tag in the targetRevision field of Argo CD, you now have the “same” application without actually having the same contents.

The end result is that using tags and Helm chart versions in Argo CD doesn’t really restrict your developers unless you actively set up the rest of the ecosystem (Git repositories, Helm repos and binary artifact managers) to also work with immutable data.

Never assume you have a “locked-down” Argo CD system when the rest of the ecosystem allows developers and operators to create stuff with the same container/Helm/Git version.

Anti-pattern 13 – Giving too much power (or no power at all) to developers

When adopting Argo CD you need to make a decision about how much power and exposure you want to give to developers. On one end of the spectrum we see installations where developers have full access to the Argo CD UI and can sync/deploy their applications at will. On the other end we see locked down installations where developers are given very little power or Argo CD is completely hidden from them.

It is important to understand that despite these two extremes, there are several choices in the middle. First, you can configure the Argo CD access (and UI) to show only applications specific to each team. You can even do advanced scenarios where you show applications from other teams but in a read-only mode.

In our Argo CD RBAC guide, we have explained how the RBAC for Argo CD works and how you can show content specific only to a developer team. 

There is no right or wrong answer here, but you need to balance the flexibility versus the security that you want to offer to developers. 

On a related note, we have already explained that developers don’t really care about Argo CD manifests, and they shouldn’t be forced to install Argo CD for local testing (see anti-pattern 6).

So a recommended workflow would be the following:

  1. Developers can test and deploy their applications locally without using Argo CD at all
  2. When they create a feature branch, it should be converted to a preview/temporary environment using the Pull Request generator, without any human intervention
  3. Once the feature is ready, it will be deployed to production simply by merging the Pull request
  4. A system designed for promotion should propagate the changes to the next environment (check anti-pattern 30).

Ideally developers should only come in contact with Argo CD in the last phase and only in the case of a failure. In the happy path scenario where everything works as planned, developers shouldn’t have to debug anything with Argo CD.
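Step 2 of the workflow above can be sketched with the Pull Request generator. Repository names, paths, and namespaces here are hypothetical:

```yaml
# Sketch of a Pull Request generator: every open PR in the (hypothetical)
# repository gets a temporary preview environment, with no human involved.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: preview-environments
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: example-org
          repo: example-repo
        requeueAfterSeconds: 300   # how often to poll for new/closed PRs
  template:
    metadata:
      name: 'preview-{{branch}}-{{number}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/example-repo.git
        targetRevision: '{{head_sha}}'   # deploy the PR's latest commit
        path: manifests
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-{{number}}'
```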

Anti-pattern 14 – Referencing dynamic information from Argo CD/ Kubernetes manifests

This is a more specialized anti-pattern related to number 2 (creating applications in a dynamic way). The second GitOps principle explains that the desired state of your system should be immutable, versionable and auditable.

This is only true if you store EVERYTHING that the application needs in Git (or your chosen storage method). In the case of Kubernetes/Argo CD manifests, this means that all values used should be static and known in advance.

It is ok if you want to post-process your manifests in some way as long as this happens in a repeatable manner OR you also save the result itself in Git.

The problem starts when your configuration is not known in advance but requires real-time access to something else. 

The best example to illustrate this is the Helm lookup function, which queries the live cluster at render time and injects a value that is not known in advance. This is problematic with Argo CD because having access to just the application manifests is no longer enough to run the application.

Like anti-patterns 17 and 20 this also makes lives difficult for developers as they cannot run the application locally anymore (anti-pattern 6).

Note that the only exception to this rule is secrets. Even though you can store encrypted secrets in Git, it is also ok to reference them from an external source.
But make sure to understand the difference between referencing secrets from manifests versus injecting secrets into manifests.
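As an illustration of referencing (rather than injecting) a secret, a manifest can point at a Secret object that is created outside of Git; the names below are hypothetical:

```yaml
# Referencing a secret from a manifest: the manifest stays static and
# auditable, while the secret value itself never enters Git.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: docker.io/example/my-app:1.0
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials  # Secret created out-of-band (e.g. by a secrets operator)
              key: password
```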

Anti-pattern 15 – Writing applications instead of Application Sets

The Application CRD is the main entity in Argo CD that links a cluster and a Git repository. If you have a small number of applications or work in a homelab environment, it is OK to write these files by hand.

But for any production installation of Argo CD (used in a company) you generally wouldn’t need to write Application files by hand. In fact you should not even deal with Application files at all.

The recommendation is to use Application Sets directly.

In a big organization you will rarely have to deal with a single application on its own. Almost always you want to work with a group of applications. Some examples are:

  • A set of applications that go to the same cluster
  • A set of applications that are managed by the same team
  • A set of applications that share a configuration
  • A set of applications that should be deployed/updated as a unit.

Application Sets implement this grouping and also automate the tedious YAML needed. For example if you have 4 applications and 10 clusters you don’t really want to create 40 YAML files by hand.

  1. You create a single cluster generator that iterates over your clusters
  2. You create 4 folders for the 4 applications
  3. The Application set runs and creates automatically all 40 combinations for you.
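The three steps above can be sketched with a matrix generator that combines a cluster generator with a Git directory generator (repository and paths are hypothetical):

```yaml
# Sketch of the cluster x application fan-out: a matrix generator pairs every
# registered cluster with every application folder, producing all 40
# Applications automatically.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-apps
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters: {}        # one element per cluster registered in Argo CD
          - git:
              repoURL: https://github.com/example-org/apps.git
              revision: HEAD
              directories:
                - path: apps/*  # one element per application folder
  template:
    metadata:
      name: '{{path.basename}}-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/apps.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: '{{server}}'
        namespace: '{{path.basename}}'
```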

Some people resist using Application Sets because they think it is yet another abstraction that hides their real Application manifests.

This used to be true, but in the latest Argo CD releases the CLI allows you to render an application set and see exactly what applications will be created. You can run this command either manually or in a CI pipeline (when a pull request is created) so that you have instant visibility on what will change.

In general though, as we will see later, all Application Set files should be created only once. They are not a deployment format (anti-pattern 19), and you shouldn’t have to edit Application Sets for simple operations.
We have written a dedicated guide on how to use Application Sets with Argo CD.

Anti-pattern 16 – Using Helm to package Applications instead of Application Sets

This anti-pattern is the so-called “Helm sandwich”. This is the case where:

  1. A set of Kubernetes manifests is packaged as a Helm chart
  2. The Helm chart is referenced from an Argo CD application Manifest
  3. The Argo CD application manifest itself is packaged in another Helm chart

Essentially, there are two instances of Helm templating on two different levels.

Teams that adopt this anti-pattern are familiar with Helm and assume that if it works great for plain Kubernetes manifests, it would also work great for Argo CD application manifests. 

So why is this approach an anti-pattern? Using Helm for applications is not an issue on its own, but it is the start for several other anti-patterns:

  1. Because Helm charts have a version people try to version Applications -> Anti-pattern 19
  2. Chart version numbers often result in abusing targetRevision for promotions -> Anti-pattern 11
  3. If you use Helm everywhere it is super easy to hard-code Helm values in Application manifests -> Anti-pattern 17
  4. People miss all the Argo CD specific features of application sets -> Anti-pattern 15
  5. Distributing applications to different clusters happens with snowflake servers instead of the cluster generator -> Anti-pattern 21

The biggest problem, however, is that it again completely ruins the developer experience: you now have two levels of Helm values, which is a recipe for disaster.

Even speaking just for operators/admins, packaging Applications in a Helm chart creates an extra level of indirection that is not only unnecessary but actively makes your life harder, as Argo CD was never designed for this “Helm sandwich”.

Our recommendation is obviously to use Application Sets for configuring Applications, as already explained in the previous anti-pattern.


Application Sets are the preferred way to generate Applications, and they offer powerful templating of their own (including Go templates). Basically, if you can template it with Helm, you should be able to template it with Application Sets.

Using the “Helm sandwich” pattern makes the process more complex for everybody involved in the software lifecycle.
For more information, see our Application Set guide.

Anti-pattern 17 – Hardcoding Helm data inside Argo CD applications

If you follow the advice we outlined in Antipattern 4 you should have a clear separation between your Helm charts and your Argo CD manifests.

The Helm charts can be used independently (even by developers) and contain all application settings. The Argo CD application manifest simply defines where this application runs, and only operators need to change this file.

Those two kinds of files also have a different lifecycle

  • Helm values are expected to change all the time (even by developers)
  • Argo CD application manifests are created once and never changed again.

With Argo CD it is possible to hardcode Helm information inside an Application CRD. But just because you can do this, doesn’t mean it is a good idea to use this feature.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-helm-override
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: HEAD
    path: my-chart
    helm:
      # DONT DO THIS
      parameters:
      - name: "my-example-setting-1"
        value: my-value1
      - name: "my-example-setting-2"
        value: "my-value2"
        forceString: true # ensures that value is treated as a string

      # DONT DO THIS
      values: |
        ingress:
          enabled: true
          path: /
          hosts:
            - mydomain.example.com

      # DONT DO THIS
      valuesObject:
        image:
          repository: docker.io/example/my-app
          tag: 0.1
          pullPolicy: IfNotPresent
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app

The big problem with this file is that you are mixing 2 different kinds of information with 2 different lifecycles. The application CRD is something that is interesting mostly to operators/administrators while Helm information is interesting to developers and is also expected to change a lot. 

By mixing this information you make manifests harder to understand for everybody.

Now you have Helm information in two places (the Helm values files and the helm property inside the Argo CD Application manifest). It is very hard to understand which settings exist where and how to audit deployment history. Argo CD application manifests must now also change all the time, especially if they define container images.

This manifest mixing is also the root of several other anti-patterns such as

  • Using Applications as a unit of work (anti-pattern 19)
  • Abusing the targetRevision field for promotions (anti-pattern 11)
  • Not understanding how Helm hierarchy works (anti-pattern 4)
  • Assuming that developers need Argo CD (anti-pattern 6)

The last point is especially important for developers. If you follow this practice, you have completely destroyed local testing for developers as they cannot run the application on its own anymore.

Even though we speak only about developers and operators here, this approach causes difficulties in several other scenarios:

  1. Security teams will have a hard time understanding the settings for each application
  2. Your CI system cannot upgrade images just on Kubernetes manifests anymore. It also needs to look at Argo CD manifests and check if image definitions exist there as well
  3. It couples your Applications to specific Argo CD features

The correct solution is, of course, to store all Helm information in Helm values:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-helm-override
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: HEAD
    path: my-chart
    helm:
      ## DO THIS (values in Git on their own)
      valueFiles:
      - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app

Now there is a clear separation of concern between developers and operators.

People who want to know an application’s settings can look at the Helm values, while people who want to know where applications are deployed can look at the Argo CD manifests. It is also possible to run a Helm application without Argo CD at all.

For more information about the different types of manifests and how to split them, see our Application Set guide.

This anti-pattern is even worse if coupled with the previous anti-pattern (the Helm sandwich).

Now you have 3 places where configuration settings can be stored:

  1. The Helm values of the chart that gets deployed
  2. The helm property inside the Application Manifest that references the Helm chart
  3. The Helm template that renders the Application manifest before it is passed to Argo CD

The result is a nightmare for anybody who wants to understand how an application gets deployed.

Anti-pattern 18 – Hardcoding Kustomize data inside Argo CD applications

This is the same anti-pattern as the previous one but for Kustomize. Again for convenience Argo CD allows you to hardcode Kustomize information inside an Application YAML:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-kustomize-override
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: HEAD
    path: my-app

    # DONT DO THIS
    kustomize:
      namePrefix: prod-
      images:
      - docker.io/example/my-app:0.2
      namespace: custom-namespace

  destination:
    server: https://kubernetes.default.svc
    namespace: my-app

It is the same problem as before where you are mixing different concerns in a single file. Kustomize information should only exist in Kustomize overlays:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-proper-kustomize-app
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: HEAD
    ## DO THIS. Save all values in the Kustomize overlay itself
    path: my-app/overlays/prod

  destination:
    server: https://kubernetes.default.svc
    namespace: my-app

As explained before this helps developers during local testing. A developer can simply run “kustomize build my-app/overlays/prod” and get the full configuration of how my-app runs in production. No knowledge of Argo CD is required and no local installation of Argo CD is needed.

Developers can define how an application runs (its settings), while operators can decide where (which cluster) the application is deployed.

At the same time, several supporting functions are very easy:

  • Git history of the overlays is the same as the deployment history
  • There is only a single source of truth for configuration (the overlays)
  • Developers don’t need to know how to use Argo CD at all
  • It is very easy to diff settings between environments.

A detailed example with Kustomize overlays is available in our GitOps promotion guide.

Anti-pattern 19 – Attempting to version and promote Applications/Application Sets

An Argo CD application is just a link between a cluster and a Git repository. There is NO version field in the Application CRD. An application manifest is neither a packaging format nor a deployment artifact. The same is also true for Application Sets. You never deploy application sets. You just use them to auto-generate application manifests. Application Sets have no version on their own.

The lack of a version field is not a big problem because the expectation for both Applications and Application Sets is that you create them once and then never update them again. So, a version field is unnecessary as all Argo CD Applications are considered static.

However, we see several teams that try to use Applications as the unit of work (see anti-pattern 7) or continuously update those files in the targetRevision field (see anti-pattern 11). 

At this point teams try to create their own versioning on top of the Argo CD manifests and of course they fail because Argo CD was never designed this way.

The same is true for promotions. You cannot really “promote” an Argo CD application from one cluster to the next. It doesn’t work this way. You can only promote values that are referenced from one application to the next (essentially copying them).

The way promotions work in Argo CD is the following:

  1. There is an Argo CD application manifest in QA that points to a Helm chart or Kustomize overlay
  2. There is an Argo CD application manifest in Staging that points to different Helm values or Kustomize overlays
  3. When it is time to promote you copy the Helm values or Kustomize overlay from the QA files to the Staging files. 
  4. The Argo CD application manifests are not affected in any way. They are exactly the same as they were before.
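A minimal sketch of step 3, assuming a hypothetical folder-per-environment layout; note that only the values are copied, never the Application manifests:

```shell
# Promotion = copying environment configuration, not Argo CD manifests.
cd "$(mktemp -d)"
mkdir -p envs/qa envs/staging
echo 'image: {tag: "2.0"}' > envs/qa/values.yaml       # QA already runs 2.0
echo 'image: {tag: "1.9"}' > envs/staging/values.yaml  # staging is behind

cp envs/qa/values.yaml envs/staging/values.yaml        # the "promotion"
# A git commit/push would follow; Argo CD syncs staging while its
# Application manifest stays untouched.
```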

See also the related anti-patterns of hardcoding Helm (anti-pattern 17) or Kustomize data (anti-pattern 18)  inside applications.

If you find yourself constantly updating Argo CD application manifests (or Application Sets) you have fallen into this trap. Your Argo CD manifests should be created only once and never touched again. No process is simpler than not needing a process at all (to update Application manifests).

On a related note, we have created GitOps Cloud to solve this problem with promotions and allow you to promote applications from one cluster to another.

Anti-pattern 20 – Not understanding what changes are applied to a cluster

One of the main benefits of using Argo CD is that all your Git tools work out of the box and you can reuse your code review process for your Kubernetes manifests.

The most basic capability of storing anything in GitHub is creating a Pull Request before merging any changes. This allows humans to review what will change and also run any automated tools to verify and validate the changes.

Unfortunately, this review process will not really work on files such as Helm charts, Application Sets, or Kustomize overlays. Let’s assume you need to review the following change:

You need to run Helm manually in your head to understand what is happening here. No human wants to do that. One approach would be to pre-render all your manifests so that reviews happen only on the final content. There are, however, several alternatives you can consider, such as having your CI system render the manifests on the fly and show you what will actually happen.

Here is the exact same change as before, but this time on the final rendered chart.

We have written a full guide on how to preview and diff your Kubernetes manifests with several other approaches.
Specifically for Argo CD you should also check https://github.com/dag-andersen/argocd-diff-preview and also understand that you can use the Argo CD CLI to render your Application Sets to Application CRDs.

Anti-pattern 21 – Using ad-hoc clusters instead of cluster labels

If you have a large number of clusters that you wish to manage with Argo CD, your first question is always whether to use a single Argo CD instance or multiple ones.

Once you answer this question the next step is to understand how you can distribute different applications to different clusters. The answer for this question is Application Sets.

Unfortunately, we see a lot of teams that don’t understand how the cluster generator works and instead try to create ad-hoc cluster configurations (the pets vs. cattle philosophy).

Examples to avoid often appear like this:

## DO NOT DO THIS
- merge:
    mergeKeys:
      - app
    generators:
      - list:
          elements:
            - app: external-dns
              appPath: infra/helm-charts/external-dns
              namespace: dns
            - app: argocd
              appPath: infra/helm-charts/argocd
              namespace: argocd
            - app: external-secrets
              appPath: infra/helm-charts/external-secrets
              namespace: external-secrets
            - app: kyverno
              appPath: infra/helm-charts/kyverno
              namespace: kyverno
      - list:
          elements:
            - app: external-dns
              enabled: "true"
            - app: argocd
              enabled: "true"
            - app: external-secrets
              enabled: "false"
            - app: kyverno
              enabled: "true"
  selector:
    matchLabels:
      enabled: "true"

Or this

## DO NOT DO THIS
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-staging-cluster
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - git:
              repoURL: '<url>'
              revision: HEAD
              files:
                - path: customConfig/base.yaml
                - path: customConfig/{{ .Values.domainId }}/*.yaml
          - list:
              elements:
                - appName: 'auth'
                - appName: 'search'
                - appName: 'billing'
                - appName: 'payments'

These kinds of configurations create snowflake servers that need constant maintenance. If you have fallen into this trap, try to understand how much time you will need to spend in the following scenarios:

  • Creating a brand new server
  • Migrating an application from one server to another
  • Applying a global setting to all your servers
  • Making a different configuration change for a subset of your servers.

This kind of cluster management is even more problematic for developers, as simply understanding what applications are deployed where is not easy (already explained in anti-pattern 6).

Our recommendation is to create a production-ready setup using cluster labels. We have described all the details in a dedicated guide for the cluster generator and Argo CD application sets.
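As a minimal sketch of the label-based approach (names, labels, and repository URL are hypothetical), a cluster generator picks up every cluster registered in Argo CD with a matching label, so adding a new cluster requires no changes to the application set:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: external-dns
  namespace: argocd
spec:
  generators:
    ## Matches every cluster registered in Argo CD with this label.
    - clusters:
        selector:
          matchLabels:
            environment: staging
  template:
    metadata:
      name: 'external-dns-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/example-repo.git
        targetRevision: HEAD
        path: infra/helm-charts/external-dns
      destination:
        server: '{{server}}'
        namespace: dns
```

Registering a new cluster with the `environment: staging` label is then enough for it to receive the application automatically.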

Anti-pattern 22 – Attempting to use a single application set for everything

A related anti-pattern is when teams discover application sets and, for some unknown reason, try to cram all their applications into a single application set. This often results in a complex mix of different generators that is hard to understand and hard to debug.

The recommendation is to have many different application sets in your Argo CD setup. Ideally, you should have different application sets per “type” of application. This type can be anything that makes sense to you. For example:

  • An application set for all staging apps
  • An application set for all AWS clusters
  • An application set for all infra apps
  • An application set for the billing team
  • An application set for the payments team

You can slice and dice your applications in different dimensions. But in the end, you will have many application sets, and any time you make a change, you should instantly know where it should happen.

  • A new requirement is that all AWS clusters need to get sealed secrets -> Change the AWS application set.
  • A new requirement is that the billing team add a new microservice to their setup -> Change the “billing” application set.
  • All new clusters must upgrade Prometheus -> Change the “common” application set.

There is no technical limitation on the number of application sets you can have on a single Argo CD installation. So, spend some time organizing your application sets accordingly.

As always, our Argo CD application guide is the best starting point. 

Anti-pattern 23 – Using Pre-sync hooks for db migrations

Argo CD phases/waves allow you to define the order of resources synced within a single Argo CD application. The pre-sync phase, in particular, can be used to check deployment requirements or perform other checks that need to happen before the main sync phase.

We often see organizations attempting to use pre-sync hooks for database migrations. The assumption here is that the database schema must be updated just before the new application version. Unfortunately, most organizations use legacy DB migration tools, and almost always, they package a DB migration CLI tool in the pre-sync phase, which is not the proper approach for Kubernetes applications.
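The pattern usually looks like the following sketch (the migrator image is hypothetical): a Job annotated as a PreSync hook that wraps a CLI tool Argo CD knows nothing about:

```yaml
## DO NOT DO THIS
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    ## Runs before the main sync phase on every sync event.
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.com/my-db-migrator:1.0.0  # hypothetical CLI image
          args: ["up"]
```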

Using pre-sync hooks for database migrations has several issues. The most important one is that the DB CLI tool is just a black box for Argo CD. The CLI runs and never reports back to Argo CD what really happened with the database. This can leave the DB in an inconsistent state where the main sync phase will always fail.

The second problem is that Argo CD is based on continuous reconciliation, where an application might be synced for several reasons and in different time frames. Unfortunately, traditional DB CLI tools are rarely created with this scenario in mind. Most of the time, they assume they run only once, inside a typical CI pipeline.

At this point, Argo CD users are looking for several hacks to force the pre-sync hook to run ONLY in the initial application deployment and not in any subsequent sync events, as this either slows down the deployment or breaks the database completely.

The correct approach is to use a database migration operator built specifically for Kubernetes. We have written a full guide using the AtlasGo DB operator.

Anti-pattern 24 – Mixing Infrastructure apps with developer workloads

We have seen in anti-pattern 7 several ways to group applications with GitOps (applicationsets, apps-of-apps, Helm umbrella charts).

These grouping methods should always be used for the same types of applications. They should group either infra applications (core-dns, nginx, prometheus) OR the applications that developers create. 

You should never mix applications of different types as you force developers to deal with infrastructure errors.

Infra before apps

When you create a new cluster and hand it over to developers, it should already have everything they need. The respective application sets should have already installed any infrastructure applications.

Mixing infrastructure and developer applications might be easier for you (to directly control the deployment order), but it always results in a bad user experience for developers.

If developers have access to the Argo CD UI and see a deployment error, they should know immediately that it is something they can fix themselves.

Anti-pattern 25 – Misusing Argo CD finalizers

Argo CD finalizers allow you to define what happens when an Argo CD application (or application set) is removed. You must understand how finalizers work and the impact of adding/removing a finalizer from a resource.

Several teams have accidentally deleted one or more Argo CD applications because they never understood how finalizers work. Other times several resources are “stuck” and never recreated because of a misconfiguration with finalizers.
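For reference, the most common finalizer is the cascading-deletion one. With it present, deleting the Application also deletes every Kubernetes resource it manages; without it, only the Application record is removed and the workloads keep running:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
  finalizers:
    ## Deleting this Application will also delete (cascade) all
    ## the Kubernetes resources it manages. Remove the finalizer
    ## if you want the workloads to survive the deletion.
    - resources-finalizer.argocd.argoproj.io
```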

Finalizers are also very useful when you want to migrate applications from one Argo CD instance to another.
We have written a comprehensive guide about Argo CD finalizers and how to use them.

Anti-pattern 26 – Not understanding resource tracking

This anti-pattern is related to the previous one. First of all, it is important to understand how Argo CD tracks and “adopts” Kubernetes resources. You can have Kubernetes resources that are not managed by Argo CD, or Argo CD resources that are “owned” by other Kubernetes controllers.

It is vital to know that the relationship between a Kubernetes resource and the Argo CD application that owns it is not always 1-1. You can have

  1. Argo CD applications that no longer contain any Kubernetes resources
  2. Kubernetes resources that are no longer owned by an Argo CD application

The second scenario is achieved with finalizers. This pattern is very useful when you want to migrate applications from one Argo CD instance to another without downtime. The full process is the following:

  1. Argo CD instance A owns all Kubernetes resources
  2. You remove all finalizers for all applications (and application Sets)
  3. You delete all Argo CD applications
  4. The Kubernetes resources are still running just fine. There is no downtime
  5. You apply the same Argo CD applications to Argo CD instance B
  6. Argo CD instance B will adopt the same Kubernetes resources as before (with no downtime)

You can try this exact scenario of moving Argo CD applications between instances without downtime in our Gitops Certification (level 3) course.

The exact same process can be used to upgrade an existing Argo CD instance to a new version in the safest way possible.

Anti-pattern 27 – Creating “active-active” installations of Argo CD

This is the corollary to the previous two anti-patterns. We see several teams that try to create “active-active” installations of Argo CD with the following requirement:

  1. The main Argo CD instance is controlling all applications and deployments
  2. There is a secondary Argo CD instance that is also pointed to the same cluster
  3. If the main Argo CD instance “fails”, the secondary instance “jumps in”
  4. When the main Argo CD instance is restored it “adopts” again all applications.

These teams are disappointed to learn that Argo CD doesn’t support this and even the centralized mode is for controlling other Kubernetes clusters and not other Argo CD instances.

This requirement doesn’t really make sense for Argo CD and teams that look for this “active-active” configuration haven’t really understood resource tracking.

First of all it is important to understand that Argo CD only deploys applications. It doesn’t really control them. If Argo CD fails, new deployments will stop but existing applications will continue to work just fine.  And even if those fail for some reason, their pods will be rescheduled/restarted by the Kubernetes cluster (even if Argo CD is no longer operational).

The disaster recovery scenario for Argo CD is straightforward if your team has everything in Git (see anti-pattern 1). You can launch a second Argo CD instance and point it to the same application manifests. Argo CD will then adopt the existing Kubernetes resources.

This is even possible if the central Argo CD instance has issues but still runs, as you can use finalizers (as explained in the previous section) to migrate applications to the second Argo CD instance without downtime.

Keeping a second Argo CD instance in “active-active” mode only wastes resources. 

Anti-pattern 28 – Recreating Argo Rollouts with Argo CD and duct tape

Bad deployments always happen regardless of whether you are using Argo CD or not. That is a fact for any software team. So how do failed deployments work if you have adopted GitOps?

The simplest way to fix a failed deployment is to roll “forward”. Make a new release or fix the Kubernetes manifests and once you commit, Argo CD will deploy the new changes and hopefully bring back the application to a good state.

Argo CD also includes a “rollback” command which simply points the application back to a previous Git hash. This sounds great in theory but it has 2 major problems:

  1. It works only if auto-sync is disabled (anti-pattern 10)
  2. It breaks GitOps as your cluster doesn’t represent what is in Git anymore

At this point, teams start creating custom solutions to overcome these limitations. The most common approaches we see are:

  1. Trying to detect a failed deployment, and then disable auto-sync on the fly while rolling back
  2. Using notifications with external metric providers who will automatically try to revert a commit on their own, so Argo CD will sync as usual to the previous version

These custom solutions are always clunky and create more problems than they solve. You know that your team is falling into this trap if you hear people always asking the question, “How can I disable auto-sync temporarily?”

The recommended solution is to use Argo Rollouts.

Argo Rollouts is a progressive delivery controller designed for this exact scenario: automated rollbacks when things go wrong. It also comes with its own resource (Analysis) that allows you to look at your metrics during a deployment and roll back without any human intervention.
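As a rough sketch (assuming an AnalysisTemplate named success-rate that queries your metrics provider), a canary strategy with an analysis step looks like this; if the analysis fails, Argo Rollouts aborts the rollout and reverts to the stable version automatically:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20            # shift 20% of traffic to the new version
        - analysis:
            templates:
              ## Hypothetical AnalysisTemplate that checks metrics
              ## (e.g. HTTP success rate) during the rollout.
              - templateName: success-rate
        - setWeight: 100
```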


Argo Rollouts will handle production deployments while non-production environments can still use plain Argo CD.

Anti-pattern 29 – Recreating Argo Workflows with Argo CD, sync-waves and duct tape

The sync wave feature of Argo CD allows you to execute tasks before and after the main sync phase. These tasks should ideally be idempotent and quick to finish. Some examples are:

  • Sending a notification to another system
  • Performing a quick smoke test
  • Verifying that a dependency exists

We see teams that misuse the sync waves in Argo CD with long-running tasks that are part of a bigger process with strict requirements such as:

  • Automatic retries 
  • If/else control flows
  • Dependency graphs and fan-in/fan-out configuration
  • Artifact storage and retrieval

Sync waves were NEVER designed for this kind of requirement. If you try to do this, you will soon resort to custom scripts that nobody wants to maintain. Adopting Argo CD for deployments and then trying to incorporate custom scripts in the sync process is a huge step backwards.

If you have this kind of process you should use Argo Workflows which handle exactly these kinds of requirements.

Argo Workflows are Kubernetes native workflows that offer you all these features out of the box.

Therefore, the whole sync process should be:

  1. Run an Argo Workflow before the sync process
  2. Perform the main Sync phase
  3. Run another Argo Workflow after the sync process.

Argo Workflows will then handle all the heavy lifting using declarative Kubernetes resources instead of custom scripts.
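Since Argo CD accepts Argo Workflows as hook resources, a sketch of step 1 could look like this (the check itself is a placeholder):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pre-sync-checks-
  annotations:
    ## Argo CD runs this Workflow before the main sync phase
    ## and waits for it to complete.
    argocd.argoproj.io/hook: PreSync
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo 'running pre-sync checks'"]  # placeholder step
```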

Anti-pattern 30 – Abusing Argo CD as a full SDLC platform

Despite all the features and the developer-friendly UI, Argo CD is very simple at its core. It is a powerful sync engine that continuously watches what you have in Git and applies the change to your cluster. All the features are centered around this use case.

However, deploying an application is only part of the software development life cycle (SDLC). Several other requirements must be met before (the CI process) and after (observability)  the main deployment.

We have seen several teams trying to expand Argo CD’s scope and make it something it never was. Most importantly, Argo CD has no visibility in your Continuous Integration (CI)  process. Argo CD doesn’t know:

  1. What new features are in the container being deployed
  2. Who built the container
  3. Whether the application has passed your unit and integration tests
  4. Whether your new container has passed your security scans
  5. Who approved the pull request of the source code change

In fact, Argo CD doesn’t even know that it deploys a new container that includes commits from a source code repo. All it knows is diffing and applying Kubernetes manifests, without any insight into the business features behind them.

Attempting to integrate this information into Argo CD either through custom plugins or custom YAML segments is always a clunky process. We understand the need for a unified interface. Developer teams love the Argo CD UI and think they can use it as a central dashboard for everything. 

That is not the role of Argo CD. You can create a developer portal that uses Argo CD behind the scenes, but Argo CD is not a developer portal itself.

If you want to use a central platform for all your Argo CD instances that also combines deployment information with the CI world, check out Codefresh GitOps Cloud.

Conclusion

We hope this comprehensive guide is useful and has provided several good and bad practices when adopting GitOps. Argo CD is a great tool, but it offers several knobs and switches that can be used with undesirable results.

Some features can be abused in several ways, simply because no good documentation exists about the history of the feature, what the intended use is, and what to avoid. 

Using this guide you can start your Argo CD journey in the best way possible as you now have the knowledge of what to avoid before investing a significant amount of effort into your Application manifests.

Here is a summary of all the anti-patterns and our recommendation:

  1. Not understanding the declarative setup of Argo CD -> Store Application CRDs in Git.
  2. Creating dynamic Argo CD applications -> Use Git as the single source of truth for application configuration.
  3. Using Argo CD parameters -> Avoid using the parameters feature as it goes against GitOps principles.
  4. Adopting Argo CD without understanding Helm -> Understand how Helm works independently before adopting Argo CD.
  5. Adopting Argo CD without understanding Kustomize -> Ensure your Kustomize files work on their own before integrating with Argo CD.
  6. Assuming that developers need to know about Argo CD -> Design your Argo CD applications so developers can recreate configurations without Argo CD knowledge.
  7. Grouping applications at the wrong abstraction level -> Use Application Sets or app-of-apps pattern for proper application grouping.
  8. Abusing the multi-source feature of Argo CD -> Use multi-source as a last resort and only for edge case scenarios.
  9. Not splitting the different Git repositories -> Separate source code, Kubernetes manifests, and Argo CD application manifests into different Git repositories.
  10. Disabling auto-sync and self-heal -> Keep auto-sync/self-heal enabled for all systems, including production.
  11. Abusing the targetRevision field -> Always use HEAD in the targetRevision field.
  12. Misunderstanding immutability for container/git tags and Helm charts -> Actively set up the ecosystem (Git, Helm repos, artifact managers) to work with immutable data.
  13. Giving too much power (or no power at all) to developers -> Balance flexibility and security with Argo CD RBAC, and enable local testing without Argo CD.
  14. Referencing dynamic information from Argo CD/ Kubernetes manifests -> Store all values used in manifests statically in Git.
  15. Writing applications instead of Application Sets -> Use Application Sets to automate the creation of Application files.
  16. Using Helm to package Applications instead of Application Sets -> Learn how Application Sets work and their features.
  17. Hardcoding Helm data inside Argo CD applications -> Store all Helm information in Helm values, separate from Argo CD manifests.
  18. Hardcoding Kustomize data inside Argo CD applications -> Store Kustomize information only in Kustomize overlays separate from Argo CD manifests
  19. Attempting to version and promote Applications/Application Sets -> Promote values or overlays, not Application manifests themselves.
  20. Not understanding what changes are applied to a cluster -> Use tools or CI systems to preview and diff rendered Kubernetes manifests.
  21. Using ad-hoc clusters instead of cluster labels -> Use cluster labels and Application Sets to distribute applications to different clusters.
  22. Attempting to use a single application set for everything -> Have many different Application Sets, each with a different purpose/scope.
  23. Using Pre-sync hooks for db migrations -> Use a Database migration operator explicitly built for Kubernetes.
  24. Mixing Infrastructure apps with developer workloads -> Separate infrastructure applications from developer workloads.
  25. Misusing Argo CD finalizers -> Understand how finalizers work and use them correctly for application deletion and migration.
  26. Not understanding resource tracking -> Understand how Argo CD tracks and adopts Kubernetes resources.
  27. Creating “active-active” installations of Argo CD -> Avoid active-active setups, rely on Git and resource tracking for disaster recovery.
  28. Recreating Argo Rollouts with Argo CD and duct tape -> Use Argo Rollouts for progressive delivery and automated rollbacks.
  29. Recreating Argo Workflows with Argo CD, sync-waves and duct tape -> Use Argo Workflows for long-running tasks and complex process orchestration.
  30. Abusing Argo CD as a full SDLC platform -> Use a different system as a developer portal or promotion orchestrator.

Let us know in the comment section if we have missed any other questionable practices!

Happy deployments!

The post Top 30 Argo CD Anti-Patterns to Avoid When Adopting Gitops appeared first on Codefresh.

Abusing the Target Revision Field for Argo CD Promotions https://codefresh.io/blog/argocd-application-target-revision-field/ Fri, 01 Aug 2025 13:57:26 +0000

In our big guide on how to use ApplicationSets for Argo CD applications, we explained the best practice of having a 3-level structure for all manifests with a clear distinction between Argo CD Application files and Kubernetes resource files.

In that article, we also outlined several anti-patterns that we have seen in the wild, meaning questionable practices that might seem ok at first glance but are problematic in the long run both for developers and for Argo CD operators.
In this guide, we want to expand on “Antipattern 2 – Working at the wrong abstraction level” and focus on the targetRevision field of the Argo CD application manifest.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  ## DONT DO THIS
  name: my-ever-changing-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git  
    targetRevision: dev
    ## earlier it was "targetRevision: staging" and before that it was "targetRevision: 1.0.0",
    ## and even earlier it was "targetRevision: 1.0.0-rc"
    path: my-staging-app
    ## Previously it was "path: my-qa-app"
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app

Using the targetRevision field as a poor-man’s promotion mechanism is a big trap that impacts both usability and auditability for your Argo CD applications.

How the targetRevision field works

The targetRevision field is part of the Application specification, which is the central Argo CD construct for deploying your GitOps applications. 

An Argo CD application describes a link between a Git repository and a Kubernetes cluster. At its most basic form you point Argo CD to the HEAD of a Git repository that contains all your Kubernetes manifests. For convenience, the targetRevision field allows you to define other values apart from HEAD and even select specific Helm versions if you point Argo CD to a Helm chart instead of a Git repository.

You can see all the possible options for tracking strategies at the official documentation page.

Our recommendation is to have “targetRevision: HEAD” in all your application sets and very sparingly use “targetRevision: v1.2.3” for Helm charts that define infrastructure applications that remain mostly static (unlike applications created by developers).
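The recommended counterpart to the manifest shown earlier is therefore a static Application that simply tracks HEAD (repository URL and path are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: HEAD   # always track the latest commit
    path: my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```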

Here is a list of all the possible options and our recommendation:

Application Target            | Value                  | Recommended
Folder in Git                 | targetRevision: HEAD   | Yes
Helm chart stored in Git      | targetRevision: HEAD   | Yes
Branch/environment name       | targetRevision: dev    | No
Semantic version (Git tag)    | targetRevision: 3.4.*  | No
Semantic version (Helm chart) | targetRevision: 3.4.*  | No
Git hash of a commit          | targetRevision: 8aefce | No
Git tag                       | targetRevision: v2.4   | Only in special cases
Helm chart in Helm repository | targetRevision: v2.4   | Only for infra charts

As always, our recommendation is for production usage of Argo CD in large organizations. On a small scale (e.g. homelabs) or with a small team you can obviously get away with any approach you choose.

The main problem we see with several teams is abusing the target revision field as a promotion mechanism for developer applications. We will explain the shortcomings of this approach and the advantages of our recommendation.

Recommendation: Setting TargetRevision to Git HEAD

Before exploring all the alternative options let’s set the baseline and see why our recommendation of using the default HEAD value is the proper one. This scenario is where environments are based on Helm values and Kustomize overlays (or Git folders).

We will compare the following aspects of each approach:

  • Direct changes to code and simplifying day-to-day operations
  • Helping developers understand how and when an application is deployed
  • Enable easy auditing (one of the main benefits of GitOps)
  • Addressing break-glass scenarios and urgent production hotfixes

Setting the value to HEAD and instructing Argo CD to look at the latest version of the manifests/charts/overlays contained in that folder is the most direct and most straightforward approach for developers. It makes any deployment a single step. A developer can update the Kubernetes files and immediately see the change in any affected environment.

The same is true for hotfixes or rollbacks. Developers can change the status of an environment with the familiar git revert and git reset commands.

In fact this is the only setup where developers don’t need to know what Argo CD does at all. We have already explained that developers don’t really care about Argo CD manifests, and keeping them happy by allowing deployments without tampering with Argo CD manifests is the optimal process for them.

Auditing is also as simple as possible. Since Argo CD continuously tracks what is committed in the Git repository of the Kubernetes resources, the deployment history is the SAME as Git history.

The advantages of using Git history as deployment history cannot be overstated. In a large organization and with large numbers of environments, when something breaks the first questions everybody asks are always the same:

  1. What did we change in this environment?
  2. When did the change happen and who did it?
  3. What was the previous version that worked correctly?

Answering these questions quickly is a considerable advantage, especially during an active incident when timing is critical.

In summary, using HEAD for targetRevision is the solution that is fully GitOps compliant (as far as auditing is concerned), easy for developers to use, and flexible enough to cover any possible edge case scenarios and urgent hotfixes. 

Let’s compare it with the additional tracking options that Argo CD offers.

Avoid using environment names for targetRevision

The first approach that we can dismiss right away is using branch names for environments (dev, qa, staging, prod etc).

This is the practice where teams either point specific Argo CD applications to long-running Git branches or they use the targetRevision field to “mimic” promotion in the following way:

  1. An application is first pointed to the “dev” branch of the manifests using targetRevision
  2. Then the Application manifest is updated to point to the “qa” branch of manifests
  3. Then, finally, the targetRevision field is set to “staging” or whichever branch is the one before production.
  4. The cycle starts again

First, if you use the targetRevision field for long-running static branches, you have bigger problems than environment promotions. We have written a complete guide with all the details about the problems with the branch-per-environment approach.

The biggest problem with this approach, however, is that it completely misses all the benefits of auditing that come from GitOps.

If you constantly update the targetRevision field to different branch names it is tough (if not impossible) to reason about the history of your deployments.

In our baseline scenario of using HEAD, if you want to find what was running in a specific application last Thursday, you can go to your Git history and see which commit was active on the respective Git repository. This is a single-step process that anyone can complete in less than a minute.

If you use branch names in the targetRevision field, looking at history now becomes a multi-step process:

  1. First, you need to go to the Git repository that holds the application manifest and find the git commit that was done last Thursday
  2. Read the application file and see which branch was inserted in the targetRevision (let’s say it was “dev”).
  3. Then you need to go to the Git repository of the Kubernetes manifests and the dev branch, and see what was committed last Thursday

This process is more complex and prone to errors, as you manually need to correlate different git repos and commits for different functionalities (application manifests vs. Kubernetes manifests). Humans should not have to do this under pressure at 3 a.m. (when an incident often happens). 

Auditing deployments becomes even more chaotic if you use Application Sets. First, you would have to check out the Git repos (at the correct revisions) and use the argocd CLI to recreate how your application set looked last Thursday, before you even reach the appropriate Kubernetes resources that existed in the cluster.

What completely breaks down this process is all the “temporary fixes” that developers will make if you allow them. A widespread pattern we see is the temporary change of the targetRevision field to another branch “just for testing purposes.”

For example, a team that typically deploys its QA environment from the “qa” branch will often point the Argo CD application at the “staging” branch to debug an issue that occurred in staging, using the QA environment’s resources.

Or, several times, teams create ad-hoc preview environments by pointing different environments to feature branches from individual developers. This is even worse as developer branches can be removed at any time.

In summary, using branch names for the targetRevision field is a very complex process that presents many issues regarding auditing and deployment history. If you work in an organization that has specific financial and legal requirements, you will spend a lot of time trying to keep the auditors happy and manually reconstructing what and where was deployed in the past.

Avoid using semantic ranges in targetRevision

The targetRevision field can also work with version ranges for both Git tags and Helm versions. This sounds great in theory. You enter a value like 3.4.x and then Argo CD will automatically deploy 3.4.0, 3.4.1, 3.4.2 and so on.
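In the Application manifest this looks like the following (values are illustrative):

```yaml
spec:
  source:
    repoURL: https://example.com/my-org/kubernetes-manifests.git
    path: my-app
    # semantic version range over Git tags: any 3.4.x tag gets deployed
    targetRevision: 3.4.x
```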

You think you have solved the promotion problem with Argo CD but in reality you just introduced two major issues that are especially important to developers.

The first problem is that you have now lost all of GitOps’s auditing capabilities. If you use this pattern, you don’t really have a deployment history in the Git repository with the Argo CD manifests.

If somebody asks what you had deployed in a past moment, you cannot see this information in the Git repository anymore. You need to correlate information between versions from your artifact manager that holds Helm charts or locate the dates where each Git tag was created. This is a cumbersome process and again it is not something that you want to do during an incident.

If your organization has strict legal requirements, using version ranges in targetRevision will complicate auditing by an order of magnitude.

But the biggest problem is that you can no longer roll back to a previous version. As Argo CD honors semantic version rules, it will only deploy newer versions of an application. This becomes a big problem when critical issues are found in production.

  1. You have 3.4.x as a value in targetRevision
  2. Version 3.4.2 is deployed right now in production
  3. You create a new Git tag with version 3.4.3
  4. Argo CD deploys it and it has a critical issue
  5. You cannot really go back to 3.4.2 anymore in an automated way

You would have to manually edit the application file and change the targetRevision to 3.4.2 yourself. Then, remember to switch it back to 3.4.x when the issue with 3.4.3 is fixed and 3.4.4 is released.

With this requirement you just forced 2 extra commits that humans have to do during incidents (where you usually want to avoid manual steps).

This process also leaves room for human error. People might forget to switch the targetRevision field back to semantic versioning and wonder why new releases are not getting deployed anymore.

In summary, while version ranges look like an easy way to gain “free promotions”, the problems they create are more impactful than their benefits. The same issues apply if you use semantic versioning for Helm charts.

Avoid using Git hashes in targetRevision

This is the case when an organization wants the “safest” process possible and forces all environments to point to a specific Git hash.

Git hashes

This is a truly locked system, as Argo CD will not deploy anything new anymore. We see this pattern often in financial companies and other companies that want to restrict developers to the greatest extent.
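A locked-down Application of this kind looks like the following (the commit SHA is an illustrative value):

```yaml
spec:
  source:
    repoURL: https://example.com/my-org/kubernetes-manifests.git
    path: my-app
    # a specific, immutable commit SHA (illustrative)
    targetRevision: 7f3a9c1e2b4d5f6a7b8c9d0e1f2a3b4c5d6e7f8a
```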

First of all, let’s clarify the perceived “safety” of this approach. As the Argo CD documentation clearly states, even if you set up an Argo CD application with a specific Git Hash, parameter overrides will still take effect. We don’t recommend using parameter overrides, but this means that a person can still affect this application configuration (either by mistake or on purpose). 

So don’t assume that using a specific hash in the targetRevision field is a bulletproof method for “securing” your Argo CD applications.

On the other hand, you have ruined the developer experience for all your teams. Nothing gets deployed unless somebody also changes the targetRevision. Developer self-service is not possible at all. Every time developers create a new release, another human or system must update the targetRevision field.

The experience is even worse during incidents. We already know that developers don’t care about Git hashes, so understanding what is deployed where becomes a lengthy and error-prone process. Rolling back is also super difficult as the only way to do it is by switching the targetRevision.

Git hashes are immutable and only known after a commit is created. This means an external system (e.g. CI) cannot update your manifests and point targetRevision at the result in a single step. Instead, the external system (or human) must always work in 2 steps:

  1. First, somebody needs to commit the new version of the manifests with application updates (i.e. bumping the container image)
  2. Then you also need to obtain the new Git Hash to put into targetRevision and do a separate commit

It is impossible to do both tasks in a single step as the Git hash from the first action is needed in the second one.

If you assign this responsibility to humans, you just introduced manual steps into your deployment process. If you use an external system, you just added complexity for the sake of complexity.

In summary, using Git hashes in the targetRevision field goes against all DevOps principles and significantly slows down deployments. Using this approach in non-production environments is always a sign that the organization doesn’t really trust its own developer teams.

Use (if needed) Git tags in targetRevision

The last choice for the targetRevision field is to use numbered Git Tags.

This is an acceptable practice, but we recommend it only for locking down production environments. It is better than using plain hashes, as with named Git tags developers can understand where each version is deployed. However, it still suffers from all the issues mentioned in the previous section. Developers cannot deploy or roll back on their own, and extra effort is required during incidents. Like Git hashes, Git tags are treated as fixed references, meaning every new release requires an explicit update of the targetRevision field.

Using Git versions as the target revision in production environments is a good practice if you want to fully control what goes into production. But don’t use this technique for non-production environments. Even if your organization is under legal restrictions, there is no point in making the lives of developers (especially for their QA/staging/dev environments) difficult. 

Therefore, our recommendation is:

  1. Use a specific Git Tag in the targetRevision field ONLY in production environments
  2. Use the simple HEAD tracking method in every other environment.

This gives you the best of both worlds. Developers can deploy fast in non-production environments and can change versions at will. But when it comes to production, they need a human (or external system) to actually update the targetRevision field to a new version.
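Sketching the recommendation with two source stanzas (repository URL, paths, and the tag name are illustrative):

```yaml
# Non-production (e.g. QA/staging): track the tip of the default branch
spec:
  source:
    repoURL: https://example.com/my-org/kubernetes-manifests.git
    path: my-app/envs/staging
    targetRevision: HEAD
---
# Production: pin to a specific, explicitly promoted Git tag
spec:
  source:
    repoURL: https://example.com/my-org/kubernetes-manifests.git
    path: my-app/envs/production
    targetRevision: v1.4.2   # illustrative tag name
```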

Note, however, that Git tags can be deleted and recreated with different contents. So, unless you set proper permissions in your Git repository, using a specific Git tag does not guarantee that your environment is locked down. If the same tag gets associated with a different Git hash, Argo CD will happily redeploy the application.

Use specific chart versions for Infrastructure charts

The targetRevision field can also accept specific versions for Helm charts. This is a good pattern to follow, but only for infrastructure Helm charts (CoreDNS, sealed-secrets, Prometheus, etc.). Basically, you should pin your Helm charts only if all the following apply:

  • The chart is stored in a Helm repository (and not in Git)
  • The chart represents off-the-shelf software and not something your developers create
  • You never “promote” these charts from one environment to another. They just represent infrastructure applications.
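For such infrastructure charts, pinning looks like this (the chart version shown is illustrative; pick and upgrade versions deliberately):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sealed-secrets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://bitnami-labs.github.io/sealed-secrets
    chart: sealed-secrets
    # a pinned chart version, upgraded only on purpose
    targetRevision: 2.17.3
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-system
```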

If you use Helm charts for your own applications (the ones your developers create) then follow the advice of the first section of this guide. Put them in Git and use HEAD in the targetRevision field. This will give you all the benefits of easy updating, history as auditing, and following the GitOps principles.

It is worth mentioning that Helm chart versions are mutable by default. Unless your artifact manager specifically prevents it, a developer can push a different Helm chart with the same version and override the contents of the previous one. So even if you set targetRevision for a Helm chart to version 2.3.4, it doesn’t mean the chart has the same contents that version 2.3.4 had last week, unless you configure Helm chart versions as immutable. Note also that some developers bump only appVersion, without bumping the chart version, when nothing has changed in the chart itself.

Summary

We have now seen all the choices for the targetRevision field and examined the advantages and disadvantages.

The HEAD tracking method is the simplest, most direct, and flexible for developers. It makes incident response as painless as possible, as rolling back can be performed by anyone with simple Git commands. It also allows developers to self-serve their needs. It makes auditing straightforward. We recommend using HEAD as the value in the targetRevision field whenever possible.

For production environments, we understand if you are using a specific Git tag/version. However, employ this approach sparingly and only for systems where you want to “restrict” developers. Make sure you understand all the limitations of this approach. Tags are mutable by default, and you have introduced two extra steps in all your deployment processes.

We strongly recommend against using branch/environment names in the targetRevision field. It completely breaks GitOps history and makes auditing a nightmare.

We recommend against using version ranges. Again, you lose all the benefits of GitOps auditing.

We also recommend against using Git hashes. Especially in non-production environments, it slows down your developers.

Happy deployments!

The post Abusing the Target Revision Field for Argo CD Promotions appeared first on Codefresh.

]]>
https://codefresh.io/blog/argocd-application-target-revision-field/feed/ 6
GoodRx Releases Lifecycle Solution for Ephemeral Developer Environments with Built-in Support for Codefresh Pipelines https://codefresh.io/blog/goodrx-releases-lifecycle-solution-ephemeral-environments/ https://codefresh.io/blog/goodrx-releases-lifecycle-solution-ephemeral-environments/#respond Tue, 08 Jul 2025 11:48:28 +0000 https://codefresh.io/?p=17115 GoodRx, a digital healthcare platform, has released the Lifecycle project as open-source code. Lifecycle is a complete solution for temporary/ephemeral environments. The project’s build process includes built-in support for Codefresh pipelines. Creating preview environments from a Pull Request  with Lifecycle Lifecycle was conceived as an internal project back in 2019, and today it is released […]

The post GoodRx Releases Lifecycle Solution for Ephemeral Developer Environments with Built-in Support for Codefresh Pipelines appeared first on Codefresh.

]]>
GoodRx, a digital healthcare platform, has released the Lifecycle project as open-source code. Lifecycle is a complete solution for temporary/ephemeral environments. The project’s build process includes built-in support for Codefresh pipelines.

Creating preview environments from a Pull Request with Lifecycle

Lifecycle was conceived as an internal project back in 2019, and today it is released to the world as a fully open-source project available at https://github.com/GoodRxOSS/lifecycle 

The project covers two very popular scenarios for medium-sized developer teams.

  1. Creating a complete temporary environment with the contents of a pull request
  2. Creating some services with the contents of a pull request while still using several dependencies from a shared/staging environment

Lifecycle comes with its own abstraction for service definitions. You can see the full syntax in the documentation page.

This file (lifecycle.yaml) allows developers to define several microservices that take part in the application and their dependencies.

When a developer creates a Pull Request, Lifecycle understands all the dependencies and their changes and launches preview/ephemeral environments either for all services or only those selected by the developer.

Even though Lifecycle includes a simple Graphical User Interface, developers can use the Pull Request itself to see what is happening.

When a GitHub project is augmented with Lifecycle, a smart comment is added on each Pull request that shows the state of the preview environment. From the same comment, developers can enable or disable specific microservices and even redeploy the entire environment by clicking on checkboxes.

Once the Pull request is merged (or closed), the temporary environment shuts down on its own. You can see a full demo of the developer experience in a YouTube recording.

How Lifecycle works

Lifecycle itself is a self-hosted application available as a Helm chart. Currently, it supports Google Cloud and Amazon Web Services. You need to install Lifecycle in a Kubernetes cluster. If you are using Terraform/OpenTofu you can easily bootstrap everything required with an example Git repository.

Once you install the Lifecycle GitHub app your developers are ready!

There are many ways to define how environments are created. You can choose to auto-create an environment for each Pull request or require a specific label before a deployment happens.

Developers follow their usual workflow.

  1. First they create a feature locally
  2. Once ready, they commit and push to a Pull Request
  3. They can view their feature in isolation in an environment specific to a Pull request/branch
  4. They can choose to accept or discard a Pull request at any point in time.

Lifecycle essentially supercharges your Pull requests because, in addition to the usual checks (unit tests, code coverage, security scans), they now show a URL with the application running for live verification. Apart from developers, other teams (testers or database administrators) will find this functionality very useful for quick manual tests or other checks that require deploying the end result.

Why Lifecycle is different

Preview environments are a well-accepted practice in the software industry and several approaches exist for solving this problem. We have actually offered our own advice both for plain Helm applications as well as GitOps workflows with Argo CD.

What distinguishes Lifecycle from the competition is the “fallback” mechanism it offers. Sometimes creating a full replica of the whole application is either too costly or too complex. Especially for teams that have adopted microservices, the usual scenario is that changes exist only in a subset of the services while the rest are still on the latest version.

Lifecycle allows you to define a fallback/static environment that will take effect for services that the developer does not select. This means that the developer now has the full power to select only a subset of services to participate in the Pull request, while still using the latest stable version for everything else.

This static/shared environment is also handled from a specific Pull request again by Lifecycle. This means that developers can choose to examine new features in individual branches and when they feel confident they can move them to the shared static environment with a simple merge. 

We have seen several tools that excel in one scenario (launching everything) or the other (keeping a shared testing environment), but Lifecycle is a unique tool that focuses equally on both use cases.

If you want to try Lifecycle get started at the official documentation. 

The post GoodRx Releases Lifecycle Solution for Ephemeral Developer Environments with Built-in Support for Codefresh Pipelines appeared first on Codefresh.

]]>
https://codefresh.io/blog/goodrx-releases-lifecycle-solution-ephemeral-environments/feed/ 0
How we replaced the default K8s scheduler to optimize our Continuous Integration builds https://codefresh.io/blog/custom-k8s-scheduler-continuous-integration/ https://codefresh.io/blog/custom-k8s-scheduler-continuous-integration/#comments Mon, 07 Jul 2025 11:57:20 +0000 https://codefresh.io/?p=17099 The default Kubernetes scheduler works great when your cluster is destined for long running applications. At Codefresh we use our Kubernetes clusters for running Continuous Integration pipelines which means our workloads are ephemeral (they are discarded when a pipeline has finished). This allowed us to look at the Kubernetes scheduler from a different perspective and […]

The post How we replaced the default K8s scheduler to optimize our Continuous Integration builds appeared first on Codefresh.

]]>
The default Kubernetes scheduler works great when your cluster is destined for long-running applications. At Codefresh we use our Kubernetes clusters for running Continuous Integration pipelines, which means our workloads are ephemeral (they are discarded when a pipeline has finished).

This allowed us to look at the Kubernetes scheduler from a different perspective and forced us to think about how Kubernetes can work for short-running workloads. After trying to fine-tune the default scheduler for running CI pipelines, we decided that it was best to write our own scheduler designed specifically for our needs.

In this post, we will describe why the default scheduler is not a good choice for ephemeral workloads and how we replaced it with a custom scheduler that meets our needs.

Codefresh pipelines – build your code on Kubernetes

Codefresh pipelines are powerful tools with a simple yet powerful syntax and many capabilities that can optimize your container or application builds. Beyond pipeline syntax, there are less discussed operational aspects of running Codefresh pipelines in hybrid or on-prem scenarios.

Every day we run lots of builds, both for ourselves and for SaaS customers. Build start time is one of the most immediately noticeable aspects of the user experience. Nothing spoils the first impression of our platform quite like builds that are stuck in the initialization stage for a prolonged period of time. There is a psychological threshold where “it’s normal” turns into “it’s kinda slow” and then into “is it broken?”. Since we want to provide a pleasant experience to our customers, we want to avoid slowness as much as possible.

There are naive ways to brute force this issue, but we want to stay cost-efficient in our solutions, so we had to be a little bit more inventive than throwing resources at the problem. A big part of our solution is realizing that the default behavior of the Kubernetes scheduler is sensible for regular web applications, but is completely suboptimal for CI/CD workloads.

At Codefresh, we implement measures to address both latency and cost considerations. Some of those are useful if you run builds on your infrastructure. In this article, we will cover those topics by explaining the problem, designing the solution, and implementing it all.

We focused on two main areas

  1. The time it takes to start a build
  2. Reducing the cost of the infrastructure that runs our builds

Let’s see these in order.

Build start latency

As a starting point we want to minimize the time it takes to start a pipeline run in order to offer a better user experience.

Behind the scenes Codefresh builds are mapped to Kubernetes Pods, so naturally the question of scheduling comes up. Both in terms of “how to start builds as fast as possible?” and “how to minimize the costs associated with running builds?”.

The build cannot start until the pipeline Pod(s) are up and running. Typically, there is spare capacity on the Nodes dedicated to running the build, but it’s not always the case: if we keep creating new builds faster than they are completed, we are bound to reach cluster capacity limits.

Sooner or later builds would be delayed by build Pods being stuck in the Pending status. It might take several minutes for the autoscaler to react and provision new Node(s) for a surge of new builds. Then new Nodes must pull Codefresh images and start all required containers inside the Pod. During all that time the user sees their build in the “Init” phase, probably getting increasingly frustrated.

The solution – avoid waiting for empty nodes completely

To combat this bad user experience, we’ve implemented a concept of ballast: Pods that mimic build Pods in terms of scheduling preferences but with greatly reduced priority. Pod priority affects the K8s scheduler’s behavior. Usually, if there is not enough capacity in a cluster to run a Pod, it will enter a Pending state, and then it’s up to autoscaler or human operators to provision additional capacity for it.

In this illustration a Pod waits for a second node to be created to finally land on it.

But that is not the complete picture. In fact, before transitioning a Pod into the Pending state, the scheduler compares the new Pod’s priority to existing ones, and if there are any with lower priority, the scheduler will evict those to make space for the new Pod.

Typically this mechanism is used to ensure critical components like various DaemonSets (marked red in the illustration above) are always up and running. There are 2 built-in PriorityClasses: system-cluster-critical and system-node-critical to facilitate that. They have a very large positive priority value that should be enough to kick out any regular pod.

It turns out that we can employ this basic scheduling mechanism to our advantage. 

We can use the same mechanism in reverse by creating a PriorityClass with a very large negative value to ensure that Pods with this class will be “total pushovers” and concede their spot to any other Pod if needed. 

We call those Pods a “ballast”. These are placeholder pods with the sole purpose of getting discarded by the scheduler when a real pod appears.
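A minimal sketch of such a ballast, assuming illustrative names and resource values (the actual cf-runtime chart generates equivalent objects for you, as shown later):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ballast-priority
# a very large negative value: any regular Pod can preempt these Pods
value: -1000000000
globalDefault: false
# ballast Pods must never preempt anything themselves
preemptionPolicy: Never
description: "Placeholder Pods that concede their spot to real build Pods"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: build-ballast
spec:
  replicas: 3
  selector:
    matchLabels:
      app: build-ballast
  template:
    metadata:
      labels:
        app: build-ballast
    spec:
      priorityClassName: ballast-priority
      containers:
        - name: pause
          # the pause container does nothing but reserve the resources
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 3500m
              memory: 7800Mi
```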

We can create ballast Pods (marked blue in the illustration above) in a Node pool dedicated to running builds. When the node eventually fills up, the next build Pod will evict a ballast Pod and land in its place. The evicted ballast Pod will become Pending and trigger new Node creation. 

From the cluster’s view the picture is the same: A Pod is created, the Pod enters the Pending state, and a new Node is provisioned. 

But from the end user perspective there is a key difference. The “real” build pod starts working immediately and the placeholder/ballast pod is the one that has to wait and enters the Pending state.

Users see their pipeline start right away!

Implementation

In release 7.6.0 of our cf-runtime Helm chart, we’ve added a ballast section that allows users to enable ballast for both dind and engine Pods. Under the hood those are Deployments that copy the nodeSelector, affinity, tolerations and schedulerName of the respective build Pods to perfectly mimic their scheduling behavior.

All you need to do is enable them, and set the amount of replicas and the resources of an individual replica:

ballast:
 dind:
   enabled: true
   replicaCount: 3
   resources:
     requests:
       cpu: 3500m
       memory: 7800Mi
     limits:
       cpu: 3500m
       memory: 7800Mi
 engine:
   enabled: true
   replicaCount: 3
   resources:
     requests:
       cpu: 100m
       memory: 128Mi
     limits:
       cpu: 100m
       memory: 128Mi

In this example we create a ballast setup that can handle a build spike of up to 3 builds. Ballast resources are set to the same values as runtime.dind.resources (targeting the xlarge EC2 instance size) and runtime.engine.resources respectively.

There are some considerations and rules of thumb for picking those values:

  1. Pick ballast Pod size equal to the default build Pod size
  2. Pick replica count not greater than the typical build spike size

If the ballast Pod size is equal to the default build Pod size, then most of the time there will be one eviction per new build, which makes it easier to reason about the number of replicas.

At the same time the ballast count should not be higher than the expected spikes we want to accommodate: if we create at most 10 builds at a time, but have 30 ballast pods then those 20 remaining replicas will never be of use to us and just idly burn through our infra/cloud budget.

The maximum value from this setup would be achieved by setting the replica count to the most common spike size: all spikes up to the common size will be fully accommodated while bigger spikes will still benefit from the setup.

Another factor to consider is cluster autoscaler reactivity. If it can provision a node in 10 minutes, then in general you want to have a bit bigger ballast compared to a more reactive autoscaler that provisions a new node in under a minute.

Ultimately your ballast setup is a tradeoff between convenience and cost: we host idle Pods so that our builds start faster. If ballast causes cost concerns, you might want to scale the ballast setup dynamically. It’s possible to set the number of replicas to zero in the chart values and add an external HPA; for example, use KEDA with the Cron scaler to effectively remove ballast outside of office hours, when build start times matter less (since those builds are most likely created by some form of automation and not humans).
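As a sketch, scaling ballast down outside office hours with KEDA’s Cron scaler could look like this (the target Deployment name, timezone, and schedule are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ballast-office-hours
spec:
  scaleTargetRef:
    name: cf-runtime-ballast-dind   # hypothetical ballast Deployment name
  minReplicaCount: 0
  triggers:
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: 0 8 * * 1-5          # scale up at 08:00 on weekdays
        end: 0 18 * * 1-5           # scale back down at 18:00
        desiredReplicas: "3"
```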

Cost-efficient scheduling

After we optimized the start time of our builds, we wanted to look at cost efficiency.

Running pipelines requires considerable computing resources, so being as efficient as possible when scheduling build Pods is a major concern for operators.

There is a significant difference between running something like a web application expressed in Kubernetes terms as Deployment and a CI/CD pipeline, which is more like a Job object. 

Deployments are scalable and can afford disruption, which actually happens during every release that updates image tags in a rolling fashion. That allows operators to run those workloads on Spot instances or aggressively scale down underutilized nodes, pushing replicas to other nodes. 

On the other hand, CI/CD pipelines do not work in this manner. They are not stateless and disrupting them while they are running is a scenario we want to avoid.

With Job-like workloads that implement pipelines we need to patiently wait for their completion, and only then can we scale down the node that hosted this Job.

This trait creates a harmful dynamic that we’ve observed in our clusters that run customer builds. Below you can see a snapshot of a node pool from one of our clusters (made with eks-node-viewer):

Those clusters tend to run half-empty, costing us extra money.

This problem is especially exaggerated right after a spike in the amount of submitted builds. During a build spike, the cluster autoscaler will create new nodes to accommodate workloads.

In the illustration below, you can see a cluster right after the build spike, where the majority of those builds have finished and the cluster is half-empty. After those builds finish, there will always be a small stream of builds that trickle in and are evenly spread across the nodes.

As you can see in the illustration below, overall resource utilization is very small, but no single node can be scaled down.

Solution – fine tune the scheduler for job-like workloads

The root cause of this problem is twofold:

  1. Job-like workloads are non-disruptible
  2. The default Kubernetes scheduler strives for even load spread

We cannot do anything about the first aspect, but the second one is entirely in our control.

The default scheduler’s behavior is reasonable for most applications but suboptimal for CI/CD pipelines. To prevent builds from taking nodes hostage, we want to pack them tightly, filling nodes one by one instead of spreading builds evenly across the nodes. Any solution that fills small nodes first gets bonus points.

We want  our scheduling algorithm to look like this:

This way, we give big and mostly empty nodes a chance to complete a few remaining jobs and retire from the cluster. In this particular example, nodes 2 and 3 will most likely be scaled down very soon.

Implementation

To change the scheduler’s behavior we need to implement a custom scheduler. Kubernetes allows multiple schedulers to exist in a cluster, and Pods can specify the desired scheduler via the “schedulerName” field, which defaults to the boringly named “default-scheduler”.

WARNING: It’s important to make sure that all Pods on a given node pool are managed by a single scheduler to avoid conflicts and evict/schedule loops. 

For the purpose of running Codefresh builds the recommendation is to have a dedicated tainted node pool and add matching tolerations to Codefresh pods.
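A sketch of that setup, with illustrative taint key and label names: taint the dedicated nodes, then add a matching toleration (plus a nodeSelector) to the build Pods.

```yaml
# Applied to every node in the dedicated pool, e.g.:
#   kubectl taint nodes <node-name> codefresh-builds=true:NoSchedule
# Pod-side scheduling settings for the build Pods:
nodeSelector:
  node-pool: codefresh-builds      # illustrative node label
tolerations:
  - key: codefresh-builds          # must match the taint key above
    operator: Equal
    value: "true"
    effect: NoSchedule
```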

We need to create an umbrella chart over the scheduler-plugins Helm chart, since the upstream chart doesn’t provide the required level of flexibility. You can find the Helm chart for our scheduler plugin at https://github.com/codefresh-contrib/dind-scheduler/tree/main/dind-scheduler

Here is the important part of values.yaml

schedulerConfig:
  score:
    enabled:
      # pick smallest node fitting pod
      - name: NodeResourcesAllocatable
        weight: 100
      # in case of multiple nodes of the same size,
      # resolve a tie in favor of most allocated one
      - name: NodeResourcesFit
        weight: 1
    disabled:
      - name: "*"

  pluginConfig:
    - name: NodeResourcesAllocatable
      args:
        mode: Least
        resources:
          - name: cpu
            weight: 1
    - name: NodeResourcesFit
      args:
        scoringStrategy:
          type: MostAllocated

If we focus on the core of scheduler logic in plain English it sounds like:

  1. Pick the smallest eligible node possible
  2. If there are multiple nodes of the same size, use the fullest node in terms of allocated CPUs

This way we minimize the time big half-empty nodes are held hostage by trickling builds.

The only thing left is to set the scheduler name in runtime values:

runtime:
 dind:
   schedulerName: dind-scheduler

Now you can describe any DinD pod to validate that the correct scheduler is used:

Events:                                                
  Type    Reason                  Age    From          
  ----    ------                  ----   ----          
  Normal  Scheduled               39s    dind-scheduler

After rolling out this custom scheduler this is how the same node pool looks after we’ve changed the scheduler’s behavior: nodes became as tightly packed as possible.

This means that now we pay for nodes we actually use to their fullest capacity keeping our cloud costs down.

Conclusion

In this article we’ve learned how to start Codefresh builds fast and run them cheaply, using nothing but built-in Kubernetes concepts revolving around the scheduler: Pod priorities and preemption (for the ballast) and custom scheduler plugins with tuned scoring (for tight bin-packing).

Feel free to use those resources to further tailor this solution to your needs. If you are a Codefresh customer, you now also know why your builds start much faster!

The post How we replaced the default K8s scheduler to optimize our Continuous Integration builds appeared first on Codefresh.

]]>
https://codefresh.io/blog/custom-k8s-scheduler-continuous-integration/feed/ 2
Configuring Slack notifications with Argo Workflow – a learning experience https://codefresh.io/blog/configuring-slack-notifications-argo-workflow/ https://codefresh.io/blog/configuring-slack-notifications-argo-workflow/#respond Tue, 24 Jun 2025 18:06:52 +0000 https://codefresh.io/?p=17061 The acquisition of Codefresh gave me an exciting opportunity to learn new tech. Initially, I thought Argo was just Argo CD. I didn’t realize that Argo consists of 4 distinct projects: A key feature of the Codefresh product is Promotion Flows, which makes heavy use of Argo Workflows.  Promotion Flows add the ability to assign […]

The post Configuring Slack notifications with Argo Workflow – a learning experience appeared first on Codefresh.

]]>
The acquisition of Codefresh gave me an exciting opportunity to learn new tech. Initially, I thought Argo was just Argo CD. I didn’t realize that Argo consists of 4 distinct projects: Argo CD, Argo Workflows, Argo Rollouts, and Argo Events.

A key feature of the Codefresh product is Promotion Flows, which make heavy use of Argo Workflows. Promotion Flows add the ability to assign Pre and/or Post Actions to the process via Promotion Workflows, which are Argo Workflows with some annotations added. To better understand Promotion Flow capabilities, I decided to create a workflow so I could see how it works and put it into action. In this post, I go over the project I undertook.

The problem to solve

To get this project underway, I needed a problem to solve. I had read that Kubernetes works with external secret providers, but I had never used one myself. Since I had a local instance of HashiCorp Vault running, I decided to include a Slack notification in a Codefresh Promotion Flow that pulls the secrets for Slack from HashiCorp Vault in a just-in-time (JIT) fashion. It’s worth noting that while I created this as a Promotion Workflow, it also works as a standard Argo Workflow.

Prep work

To prepare, I first configured:

  • A Slack channel to work with
  • Vault to work with Kubernetes authentication/authorization

Slack channel

To post a message to Slack, I created an App. I used the Slack API bot token guide to create an App quickly.

Installing the App generated a Bot User OAuth Token. This token is what the workflow later retrieves from Vault to authenticate with Slack.

Vault

For the Vault instance, I needed 2 things for this project:

  • Some secrets to retrieve
  • To allow a Kubernetes Service Account to access my Vault instance

Vault Secrets

You can create secrets in Vault via the UI, CLI, or through an API call.  My Vault instance runs in a container without any data persistence, so I created a quick PowerShell script to automate it.

# Get variable values
$slackChannel = "<Slack channel name>"
$slackToken = "<Slack OAuth token>"
$vaultToken = "<HashiCorp Vault token>"
$header = @{ "X-Vault-Token" = $vaultToken } 

# Create Hashtable
$jsonPayload = @{
	data = @{  
		SLACK_CHANNEL = $slackChannel
        SLACK_TOKEN = $slackToken
    }
}

$jsonPayload | ConvertTo-Json -Depth 10

Invoke-RestMethod -Method Post -Uri "http://<Vault URL>:8200/v1/secret/data/slack" -Body ($jsonPayload | ConvertTo-Json -Depth 10) -Headers $header

Kubernetes Service Account authentication and authorization

There are a few files I had to create to configure the integration between my Kubernetes cluster and my HashiCorp vault instance:

  • vault-auth-service-account.yaml
  • vault-auth-secret.yaml
  • configmap-json.yaml
  • vault-policy.hcl (this one is generated in the script)

vault-auth-service-account.yaml

This file creates a ClusterRoleBinding that grants the Service Account token-review permissions via the system:auth-delegator ClusterRole. If you are using vanilla Argo Workflows, uncomment the top section to also create the vault-auth Service Account.

# Uncomment if using vanilla Argo Workflows
#apiVersion: v1
#kind: ServiceAccount
#metadata:
#  name: vault-auth
#  namespace: argo
#---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: role-tokenreview-binding
  namespace: codefresh-gitops-runtime # Change namespace to either your GitOps Runtime namespace or argo if you're using vanilla Argo Workflows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: cf-default-promotion-workflows-sa # Change to vault-auth if using vanilla Argo Workflows
  namespace: codefresh-gitops-runtime  # Change namespace to either your GitOps Runtime namespace or argo if you're using vanilla Argo Workflows

vault-auth-secret.yaml

The Service Account being used needs a token to perform the authentication operations. This token is handed to Vault as the token reviewer JWT so that Vault can authenticate Kubernetes clients and allow them to retrieve secrets.

apiVersion: v1
kind: Secret
metadata:
  name: vault-auth-secret
  namespace: codefresh-gitops-runtime  # Change namespace to either your GitOps Runtime namespace or argo if you're using vanilla Argo Workflows
  annotations:
    #kubernetes.io/service-account.name: vault-auth # Uncomment this line and comment out the next if using vanilla Argo Workflows
    kubernetes.io/service-account.name: cf-default-promotion-workflows-sa
type: kubernetes.io/service-account-token

configmap-json.yaml

This file creates a ConfigMap resource that the Vault agent container uses to connect to your Vault instance. The template section defines what you want the agent to do; in this case, it writes the retrieved secrets to a JSON file.

apiVersion: v1
data:
  vault-agent-config.hcl: |
    # Comment this out if running as sidecar instead of initContainer
    exit_after_auth = true

    pid_file = "/home/vault/pidfile"

    auto_auth {
        method "kubernetes" {
            mount_path = "auth/kubernetes"
            config = {
                role = "argo"
            }
        }

        sink "file" {
            config = {
                path = "/home/vault/.vault-token"
            }
        }
    }

    template {
    destination = "/etc/secrets/slack.json"
    contents = <<EOT
    {{- with secret "secret/data/slack" }}
    {
        "SLACK_CHANNEL": "{{ .Data.data.SLACK_CHANNEL }}",
        "SLACK_TOKEN": "{{ .Data.data.SLACK_TOKEN }}"
    }
    {{ end }}
    EOT
    }
kind: ConfigMap
metadata:
  name: vault-agent-config
  namespace: codefresh-gitops-runtime # Change namespace to either your GitOps Runtime namespace or argo if you're using vanilla Argo Workflows

vault-policy.hcl

This file grants the Service Account permissions in Vault so it can read secrets.  This example grants the Service Account read and list permissions to any secret. In a real-world situation, I’d limit what the Service Account has access to. This is just a simple example to get started.

path "secret/data/*" {
  capabilities = ["read", "list"]
}
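For instance, to scope the policy down to just the Slack secret used in this project, you could write something like the following sketch (adjust the path if your KV secrets engine is mounted elsewhere):

```hcl
# Restrict the role to the single secret this workflow needs (KV v2 data path)
path "secret/data/slack" {
  capabilities = ["read"]
}
```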

Helper script

Since I was doing this over and over while testing, I wrote a PowerShell script to automate the process.

  # Reference: https://developer.hashicorp.com/vault/tutorials/kubernetes/kubernetes-external-vault
  #            https://developer.hashicorp.com/vault/tutorials/kubernetes/agent-kubernetes

# Declare working variables
$vaultUrl = "http://<Vault URL>:8200"
$vaultToken = "<Vault Token>"
$namespaceName = "<Namespace>"
$serviceAccountName = "<Service Account Name>"

# Set environment variables
$env:VAULT_ADDR = $vaultUrl
$env:VAULT_TOKEN = $vaultToken

# Create the Kubernetes service account and secret
kubectl apply -f vault-auth-service-account.yaml
kubectl apply -f vault-auth-secret.yaml

# Get the secret
$secret = (kubectl get secrets -n $namespaceName --output json | ConvertFrom-Json)
$secret = ($secret.Items | Where-Object {$_.metadata.name -eq "vault-auth-secret"})

# Get JWT token
$jwtToken = (kubectl get secret $secret.metadata.name --output 'go-template={{ .data.token }}' -n $namespaceName)
$jwtToken = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($jwtToken))

# Get CA certificate 
$saCaCRT = (kubectl config view --raw --minify --flatten --output 'jsonpath={.clusters[].cluster.certificate-authority-data}' -n $namespaceName)
$saCaCRT = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($saCaCRT))

# Get cluster hostname
$k8sHost = (kubectl config view --raw --minify --flatten --output 'jsonpath={.clusters[].cluster.server}')

# Create read-only policy for Kubernetes
$vaultPolicy = @"
path "secret/data/*" {
  capabilities = ["read", "list"]
}
"@

Set-Content -Path .\vault-policy.hcl -Value $vaultPolicy

.\vault policy write k8s-ro $PWD/vault-policy.hcl

# Enable Kubernetes authentication in Vault
.\vault auth enable kubernetes

# Configure the Kubernetes authentication
.\vault write auth/kubernetes/config `
token_reviewer_jwt="$jwtToken" `
kubernetes_host="$k8sHost" `
kubernetes_ca_cert="$saCaCRT" `
issuer="https://kubernetes.default.svc.cluster.local"

# Create a role for the Kubernetes authentication
.\vault write auth/kubernetes/role/argo `
bound_service_account_names=$serviceAccountName `
bound_service_account_namespaces=$namespaceName `
token_policies=k8s-ro `
ttl=24h

# Create config map for agent
kubectl apply -f configmap-json.yaml

This tutorial from HashiCorp contains the same commands used in this post, but in bash format.

Workflow template

If you haven’t worked with Argo Workflows before, this template may look intimidating. I’ll break it down by section to make it more digestible.

# DO NOT REMOVE the following attributes:
# annotations.codefresh.io/workflow-origin (identifies type of Workflow Template as Promotion Workflow)
# annotations.version (identifies version of Promotion Workflow used)
# annotations.description (identifies intended use of the Promotion Workflow)
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: slack-notification
  annotations:
    codefresh.io/workflow-origin: promotion
    version: 0.0.1
    description: promotion workflow template
spec:
  arguments:
    parameters:
        - name: APP_NAME
        - name: RUNTIME
  serviceAccountName: cf-default-promotion-workflows-sa
  entrypoint: vault-auth
  volumes:
    - configMap:
        items:
          - key: vault-agent-config.hcl
            path: vault-agent-config.hcl
        name: vault-agent-config
      name: config
    - emptyDir: {}
      name: shared-data
  templates:
    - name: vault-auth
      steps:
        - - name: get-slack-data
            template: call-vault
        - - name: post-slack-message
            template: post-message
            arguments:
              parameters:
                - name: SLACK_CHANNEL
                  value: >-
                    {{=jsonpath(steps['get-slack-data'].outputs.parameters['slack-data'],
                    '$.SLACK_CHANNEL')}}
                - name: SLACK_TOKEN 
                  value: "{{=jsonpath(steps['get-slack-data'].outputs.parameters['slack-data'], '$.SLACK_TOKEN')}}"
                - name: SLACK_MESSAGE
                  value: "{{workflow.parameters.APP_NAME}} promotion has started on runtime {{workflow.parameters.RUNTIME}}"

    - name: call-vault
      container:
        command:
          - vault
        args:
          - agent
          - '-config=/etc/vault/vault-agent-config.hcl'
          - '-log-level=debug'
        env:
          - name: VAULT_ADDR
            value: http://<Vault URL>:8200
        image: hashicorp/vault
        name: vault-agent
        volumeMounts:
          - mountPath: /etc/vault
            name: config
          - mountPath: /etc/secrets
            name: shared-data
      outputs:
        parameters:
          - name: slack-data
            valueFrom:
              path: /etc/secrets/slack.json

    - name: post-message	# we also have an existing plugin at https://github.com/codefresh-io/argo-hub/blob/main/workflows/slack/versions/0.0.2/docs/post-to-channel.md
      inputs:
        parameters:
          - name: SLACK_CHANNEL
          - name: SLACK_TOKEN
          - name: SLACK_MESSAGE
      script:
        image: curlimages/curl
        command:
          - sh
        source: |
          curl -vvv -X POST -H "Authorization: Bearer {{inputs.parameters.SLACK_TOKEN}}" \
          -H "Content-type: application/json" \
          --url https://slack.com/api/chat.postMessage \
          --data "{ 'token': '{{inputs.parameters.SLACK_TOKEN}}', 'channel': '{{inputs.parameters.SLACK_CHANNEL}}', 'text' : 'Workflow beginning:star:', 'attachments': [{'color': '#ADD8E6','blocks': [ { 'type': 'section', 'fields': [{ 'type': 'mrkdwn', 'text': '{{inputs.parameters.SLACK_MESSAGE}}'}] } ] }]  }" 

Kind

The Kind of the Workflow manifest is WorkflowTemplate, using the argoproj.io/v1alpha1 API. The annotations applied to this template designate it as a Promotion Workflow in Codefresh, which allows it to display on the correct dashboards. If you’re using vanilla Argo Workflows, you can remove the annotations section.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: slack-notification # Name of the workflow
  annotations: # Codefresh annotations
    codefresh.io/workflow-origin: promotion
    version: 0.0.1
    description: promotion workflow template

Spec

The spec section is where we define the parameters, volumes, and entrypoint. Additional components are explained in the inline comments.

spec:
  arguments:
    parameters: # Define parameters for the workflow
      - name: APP_NAME # APP_NAME and RUNTIME are automatically set when used within a promotion
      - name: RUNTIME
  serviceAccountName: cf-default-promotion-workflows-sa # The service account used during workflow execution
  entrypoint: vault-auth # Name of the first template to call
  volumes: # Defines workflow wide volume usable by all templates and steps
    - configMap: # This config map is specific to the Vault work we'll be doing, it references the config map we created in the HashiCorp Vault configuration steps
        items: # This section specifies that we're going to write the config map contents to a file which the Vault container will use as a configuration
          - key: vault-agent-config.hcl
            path: vault-agent-config.hcl 
        name: vault-agent-config
      name: config # Name of the volume
    - emptyDir: {} # Creates an empty directory
      name: shared-data # Name of the volume

Templates

The templates section defines the different templates used during the execution of the workflow. This example contains 3 templates:

  • vault-auth: This is the name we specified in the entrypoint of the spec section, so execution begins here. This template calls the other two.
  • call-vault: This is the template that will perform the call from the cluster to Vault to retrieve the secrets and produce output parameters.
  • post-message: This template posts the message to Slack and will receive the secret as an input parameter.

The "- -" syntax indicates that a step executes sequentially (each "- -" starts a new step group); a single "-" means the step executes in parallel with the previous one.
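As a minimal illustration (step and template names are hypothetical), steps in the same group run in parallel, while each new "- -" group waits for the previous group to finish:

```yaml
steps:
  - - name: step-a        # first group starts
      template: task-a
    - name: step-b        # single '-': runs in parallel with step-a
      template: task-b
  - - name: step-c        # new '- -' group: runs only after step-a AND step-b complete
      template: task-c
```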

templates:
    - name: vault-auth
      steps:
        - - name: get-slack-data
            template: call-vault
        - - name: post-slack-message
            template: post-message
            arguments:
              parameters:
                - name: SLACK_CHANNEL
                  value: >-
                    {{=jsonpath(steps['get-slack-data'].outputs.parameters['slack-data'],
                    '$.SLACK_CHANNEL')}}
                - name: SLACK_TOKEN 
                  value: "{{=jsonpath(steps['get-slack-data'].outputs.parameters['slack-data'], '$.SLACK_TOKEN')}}"
                - name: SLACK_MESSAGE
                  value: "Test message"


    - name: call-vault
      container:
        command:
          - vault
        args:
          - agent
          - '-config=/etc/vault/vault-agent-config.hcl'
          - '-log-level=debug'
        env:
          - name: VAULT_ADDR
            value: http://<Vault URL>:8200
        image: hashicorp/vault
        name: vault-agent
        volumeMounts:
          - mountPath: /etc/vault
            name: config
          - mountPath: /etc/secrets
            name: shared-data
      outputs:
        parameters:
          - name: slack-data
            valueFrom:
              path: /etc/secrets/slack.json


    - name: post-message
      inputs:
        parameters:
          - name: SLACK_CHANNEL
          - name: SLACK_TOKEN
          - name: SLACK_MESSAGE
      script:
        image: curlimages/curl
        command:
          - sh
        source: |
          curl -vvv -X POST -H "Authorization: Bearer {{inputs.parameters.SLACK_TOKEN}}" \
          -H "Content-type: application/json" \
          --url https://slack.com/api/chat.postMessage \
          --data "{ 'token': '{{inputs.parameters.SLACK_TOKEN}}', 'channel': '{{inputs.parameters.SLACK_CHANNEL}}', 'text' : 'Workflow beginning:star:', 'attachments': [{'color': '#ADD8E6','blocks': [ { 'type': 'section', 'fields': [{ 'type': 'mrkdwn', 'text': '{{inputs.parameters.SLACK_MESSAGE}}'}] } ] }]  }" 

The result

After going through some trial and error learning how everything functions, I successfully posted a message to my designated channel!

Slack notification showing the workflow successfully beginning.

Conclusion

I needed something to help me understand how Argo Workflows and Codefresh Promotions Workflows worked. Setting up my own project with a specific purpose demystified not only how they worked, but also how to construct a Promotion Workflow myself. I hope this post helps you in the same way it helped me.

Happy deployments!

The post Configuring Slack notifications with Argo Workflow – a learning experience appeared first on Codefresh.

]]>
https://codefresh.io/blog/configuring-slack-notifications-argo-workflow/feed/ 0
Laser Focused Kubernetes Deployments Using Argo Rollouts and Header Based Routing https://codefresh.io/blog/argo-rollouts-header-based-routing/ https://codefresh.io/blog/argo-rollouts-header-based-routing/#respond Mon, 23 Jun 2025 13:39:25 +0000 https://codefresh.io/?p=17067 A Kubernetes cluster with default configuration has access to only two deployment strategies: To get access to more advanced deployment strategies such as blue/green and canaries you need to use a dedicated Progressive Delivery controller such as Argo Rollouts.  We have previously covered several basic and advanced scenarios for Argo Rollouts in our blog. Today […]

The post Laser Focused Kubernetes Deployments Using Argo Rollouts and Header Based Routing appeared first on Codefresh.

]]>
A Kubernetes cluster with default configuration has access to only two deployment strategies:

  • Recreate (causes downtime)
  • Rolling Update (avoids downtime but you cannot preview or validate the next version in advance)

To get access to more advanced deployment strategies such as blue/green and canaries you need to use a dedicated Progressive Delivery controller such as Argo Rollouts

We have previously covered several basic and advanced scenarios for Argo Rollouts on our blog. Today we answer another common question: how can you select which of your live users will have access to the canary deployment?

As a reminder, with a canary deployment you gradually shift live user traffic to the new version’s pods. The canary is finished when 100% of live users see the new pods, or when something goes wrong and you revert all of them to the previous/stable version.

In the example above, we start the canary by shifting 20% of network requests to the v2 container, then 50%, and finally 100%. The key point is that unless you do something special, the network requests that go to the new version are random. Some users might even see both application versions if you are not careful.
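The traffic shift described above maps to canary steps like the following sketch (the pause durations are hypothetical and not part of the original example):

```yaml
strategy:
  canary:
    steps:
      - setWeight: 20          # send 20% of requests to the new version
      - pause: {duration: 10m}
      - setWeight: 50          # then 50%
      - pause: {duration: 10m}
      - setWeight: 100         # finally shift all traffic to the new version
```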

In the real world you almost always want specific groups of people to be part of the canary process. Some examples are:

  • “Only our internal users should see this new version”
  • “Only French users must be part of the canary”
  • “Asia should stay in the old version, the US will see the canary only”
  • “Only users who have checked the ‘preview checkbox’ must see the canary”
  • “The payment gateway should stay in the old version. The intranet should see the new version”

So can we still use Argo Rollouts to cover these use cases? The answer is yes. In this guide we explain two approaches, one basic and one advanced, and compare their advantages and disadvantages.

The methods we are going to use to decide which users see the canary are:

  1. Static routing with extra URLs (limited but simple to implement)
  2. Header-based routing (more powerful but also more complex to implement)

If you want to try the examples on your own, all resources are available at https://github.com/kostis-codefresh/rollouts-header-routing-example 

Understanding the blast radius of your deployments

The central promise of Argo Rollouts is automatic rollbacks. You deploy a new version and then within 1-2 hours (ideally 15 minutes) either the new version is promoted as stable or it is automatically reverted. 

This sounds great in theory, but in practice, you need to understand who will be affected if a deployment fails. Let’s say you are doing canary deployments and need 1 hour to get good metrics to decide about the new version’s health. If the metrics fail, some users will have issues for 1 hour. Is this acceptable? Could you control which users face the disruption and which never participate in the canary?

If you read the official Argo Rollouts documentation, the assumption is that the Rollout controller only focuses on a single application.

In a big organization most services have dependencies. This is especially true for companies that have adopted microservices. So instead of looking at a single service independently, you need to understand how the application works inside the whole cluster. 

A more realistic example would be the following:

Here we have an e-shop application with different kinds of users

  • External partners can interact with the inventory of the application
  • Internal/Intranet users provide customer support and handle the store management
  • The general public accesses the public website to order/buy items.

If we choose Progressive Delivery for the “auth” service shown in the middle, we see that even though it is a single service, it is a runtime requirement for 3 other services (portal, admin, store). So even if we apply a canary approach, a failed deployment will affect all users of our application.

Therefore, if you need 2 hours for a canary deployment and that deployment fails, ALL your users will be affected for 2 hours. Wouldn’t it be nice if you could control which user groups are affected and which are not?

Isolating specific users instead of random network requests

Making a decision about which users see the canary process and which do not is only one aspect of the deployment process. The other aspect is verifying whether a user is part of the canary. Then, all network requests should always be directed to the preview/canary version of the application.

A widespread misconception about Argo Rollouts is that integrating with a traffic provider allows you to send specific users to the canary version. Unfortunately, this is not true in the default configuration.

Even if you use a traffic provider, the percentage of requests that go to the canary application is completely RANDOM. If you set up Argo Rollouts with a canary step of 30%, Argo Rollouts will only guarantee that 30% of all network requests will go to the canary process. But there is no guarantee that these are from the same users.

This leads to a very common problem for several organizations: Multiple requests from the same user result in different application versions (both old/stable and preview/canary).

In the example of 30%, Argo Rollouts will indeed send 30% of the total network requests to the canary version, but if you look at the network requests for a single user, you might have the case where the first request is not part of the 30%, the next one is, the next is not and so on. This limitation can be catastrophic for applications with a Graphical Interface, as the user might see different components on screen with each subsequent network request (if the canary version also affects the application’s UI).

In the real world, companies don’t want a random percentage of requests to go to the canary version. You want to apply the percentage to individual users.

The expectation is that if you set up a canary of 30%, you expect 30% of users to see the canary and 70% of them are still on the old/stable version. If you log the network requests of a single user however, you want all of them to go to EITHER the canary version OR the stable version and never both.

So can Argo Rollout support this use case of user segmentation instead of network request segmentation?

Example application – Visualize your canary

Our example Rollout can be found at https://github.com/kostis-codefresh/rollouts-header-routing-example/ 

This repository includes the example application and all the manifests used in this guide.

The highlight of the example application is that you can see visually which requests hit one version or the other.

In the screenshot above, a canary is in progress between the application’s v1 and v2. The dashboard performs multiple requests (one for each box shown), allowing you to examine your canary networking in a very simple way.

Approach 1 – Static URL routing

Let’s start with the first use case—reducing the blast radius from a failed deployment. The solution is to create a different URL for each group of users who participate in the deployment process.

We have 3 URLs:

  1. The canary/default URL where requests are routed to the canary. Users of this URL will follow the canary traffic as it increases
  2. A URL that ALWAYS sends requests to the canary/preview version regardless of the defined percentage
  3. A URL that ALWAYS sends requests to the stable/old version regardless of the defined percentage.

Instead of having a single URL for the canary, we can give each user group a different URL according to their risk acceptance.

In the previous example of the e-shop application we can easily accommodate the following imaginary requirements:

  1. We want our external partner never to see the canary at all. They will be shown the stable version until the last possible moment
  2. We want our public users to be part of the canary process as normal
  3. We want our own employees to “see” the new version right away so that they can detect problems as early as possible

In this example we use different URL paths, but we could do the same thing with hostnames (e.g. canary.auth.com, preview.auth.com, stable.auth.com).

Now when a canary process is started:

  • Users who follow the /stable endpoint will always see the old/stable application version
  • Users who follow the /preview endpoint go to the new version straight away
  • Users who follow the /canary endpoint participate in the canary as usual.

Here is a timeline for each user group. Blue indicates that network requests go to the old/stable version, and green indicates that they go to the new canary version.

The end result for all groups is precisely the same: they see the new version of the application. The big difference is in failed deployments. If a deployment fails and the canary reverts, users that follow /stable (external partners in our example above) see no impact at all.

Instead of affecting everybody, we have completely isolated our external partners and applied a different risk acceptance to the general public and our own internal users.

Implementing this approach with Argo Rollouts is straightforward. Instead of using just one network endpoint (for the canary), you create an additional one pointing at the stable service and one more for the preview service.
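As a sketch of what the three endpoints could look like with the Gateway API (the gateway name and port are hypothetical; the service names match the Rollout used later in this guide):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-routing-demo
spec:
  parentRefs:
    - name: demo-gateway               # hypothetical gateway
  rules:
    - matches:
        - path: {type: PathPrefix, value: /stable}
      backendRefs:
        - name: smart-stable-service   # always the old/stable version
          port: 8080
    - matches:
        - path: {type: PathPrefix, value: /preview}
      backendRefs:
        - name: smart-canary-service   # always the new/preview version
          port: 8080
    - matches:
        - path: {type: PathPrefix, value: /canary}
      backendRefs:                     # weights on this route are managed by Argo Rollouts
        - name: smart-stable-service
          port: 8080
        - name: smart-canary-service
          port: 8080
```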

If you run our example, you can now access 3 URLs (/canary, /stable, /preview). If you start a canary process, only /canary will gradually move to the new version.

Users who visit /preview will see the new version right away:

Users who visit /stable will always view the stable version regardless of the state of the canary process:

This approach needs no source code changes and can be implemented quickly. It has however three significant limitations:

  • It is static in the sense that you need to decide in advance which user groups will visit which URL
  • You need to notify all dependent services about the new URLs if they don’t want to follow the default behaviour
  • It still works at the level of network requests instead of actual users

Approach 2 – Dynamic URL routing

The main limitation of static routing is that you need to identify which user group will use which service in advance. Once you make this selection, you cannot change it after the canary has started.

We still haven’t addressed the problem of users versus requests: with the canary endpoint, a random subset of network requests sees the canary, rather than a consistent set of real users.

Using HTTP headers instead of simple endpoints can improve network isolation. Argo Rollouts can detect optional HTTP headers and make decisions accordingly.

In the example above Argo Rollouts will send to the canary all requests that have an HTTP header “X-Canary:true”.

Now we have the capability to run canaries for users instead of just network requests. We can modify our application source code to enable this header on the fly.

All requests from users with this header present will be redirected to the canary version. This user group will always see the canary, so this approach works well even for graphical applications.

HTTP headers are fully dynamic. You can change them on the fly. There are several networking products, such as load balancers, API gateways, and service meshes, that allow you to inject or modify headers in a network request.

We can activate this pattern by creating a standard HTTP route and then instructing the canary to create a second one on the fly only if a specific header exists.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: smart-rollouts-demo
spec:
  replicas: 5
  strategy:
    canary:
      canaryService: smart-canary-service
      stableService: smart-stable-service
      trafficRouting:
        managedRoutes:
        - name: always-preview
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoutes:
              - name: my-smart-route
                useHeaderRoutes: true
            namespace: default
      steps:
        - setHeaderRoute:
            name: always-preview
            match:
              - headerName: X-Canary
                headerValue:
                  exact: "yes"  
        - setWeight: 25                        
        - pause: {}
        - setWeight: 100

If you launch the application and this HTTP header is not present, you will see a standard canary with both versions.

If you activate the header then all requests of this user go to the canary!

This setting is now per user. If you open another browser (to simulate a different user), you will see the standard canary behavior again.

In this simple example, the application itself controls the HTTP header. In a real application, a networking component might do this for you (for example, adding this header only to French users).
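As a hypothetical illustration, a service mesh such as Istio could inject the header for a subset of users. In this sketch, the "x-user-country" match header, the host names, and the backing service are assumptions, not part of the example repository; the actual canary routing is still performed by the Rollout-managed route that matches on X-Canary:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inject-canary-header
spec:
  hosts:
    - demo.example.com                  # hypothetical host
  http:
    - match:
        - headers:
            x-user-country:             # hypothetical header set by a geo-IP filter
              exact: "FR"
      headers:
        request:
          set:
            X-Canary: "yes"             # the header the Rollout's setHeaderRoute matches
      route:
        - destination:
            host: smart-rollouts-demo   # hypothetical backing service
    - route:                            # everyone else: no header injected
        - destination:
            host: smart-rollouts-demo
```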

Conclusion

In this guide we have seen two approaches for deciding which users see the new version during a canary, instead of relying on a random percentage of requests (the default behavior for Argo Rollouts).

With these approaches

  • We have complete control over the impact of a failed deployment. We can choose user groups that will always be redirected to the stable version even when a canary is in progress
  • Splitting user groups according to their risk acceptance can be decided in advance or updated on the fly
  • Canary behavior is now per user instead of per network request

In both cases, you might need to make source code changes to use a different endpoint or enable/disable a specific HTTP header.

The post Laser Focused Kubernetes Deployments Using Argo Rollouts and Header Based Routing appeared first on Codefresh.

]]>
https://codefresh.io/blog/argo-rollouts-header-based-routing/feed/ 0
Distribute Your Argo CD Applications to Different Kubernetes Clusters Using Application Sets https://codefresh.io/blog/argocd-clusters-labels-with-apps/ https://codefresh.io/blog/argocd-clusters-labels-with-apps/#comments Tue, 17 Jun 2025 10:44:02 +0000 https://codefresh.io/?p=17027 In the previous article in this series, we explained how Argo CD application Sets work and how to use them for organizing your applications in different environments or groups. We received a lot of positive feedback from our readers, and many teams now use the associated Git repository as a starting point for their own […]

The post Distribute Your Argo CD Applications to Different Kubernetes Clusters Using Application Sets appeared first on Codefresh.

]]>
In the previous article in this series, we explained how Argo CD Application Sets work and how to use them for organizing your applications in different environments or groups. We received a lot of positive feedback from our readers, and many teams now use the associated Git repository as a starting point for their own Argo CD setup.

Even though we covered Application Sets, and more specifically the Git generator, we never explained how to assign different applications to different clusters. This is a common question from teams managing multiple clusters with different application settings per environment.

In this article, we complete the Application Set puzzle and analyze:

  • How to decide which application goes to which cluster
  • How to have different application settings per environment
  • How to split your clusters into different groups with cluster labels
  • How to combine the Argo CD Git Generator with the Cluster generator
  • How you can simplify your day-to-day operations using cluster labels. 

For more details, we’ve again included an example Git repository.

Managing multiple Kubernetes clusters with Argo CD

Argo CD ApplicationSets let you automate your Application manifests in Argo CD. If you adopt ApplicationSets, you no longer need to deal with individual Argo CD applications’ YAML. You can simply point Argo CD to your clusters and folders, and all the possible combinations get created on the fly for you.  

We’ve already seen that you can use ApplicationSets to deploy multiple applications on a single cluster.

We’ve also seen the other dimension—how to deploy the same application to different clusters:

In this guide, we cover the most complex scenario where we have multiple applications and multiple clusters.

To achieve this scenario, we need to use the Cluster Generator of Argo CD. This means you need to connect all your clusters to a single Argo CD instance. This is the hub-and-spoke setup of Argo CD. See our Argo CD architecture guide for different configurations and the advantages and disadvantages of each one.

Using a combination of the cluster and the Git generator, we can create a 2-dimensional matrix of all the pairs (cluster-app) and have Argo CD deploy everything with a single file.

This approach is a great starting point, but in a real organization, we need 2 more capabilities:

  1. The ability to enable/disable some applications for some clusters
  2. The ability to have different configurations (for example, Helm values) according to the cluster the application belongs to.

The final result is not a full 2-D matrix because some applications won’t exist in all environments. We want to achieve this:

In the example above, Sealed Secrets is NOT present in Cluster C, and Cert Manager is not present in Cluster A. In addition, the “Billing Application” needs a different configuration for each cluster.

So, can we achieve these requirements with Application Sets?

Anti-pattern – Creating Snowflake servers with ad-hoc combinations

When faced with the problem of distributing different applications to different clusters, many teams jump straight into very complex solutions that combine multiple Application Set generators. Unfortunately, most hard-code custom combinations in the Application Set files.

A classic example of this approach is trying to individually enable/disable a specific application for a particular cluster. We advise AGAINST using such Application Set structures.

 ## DO NOT DO THIS 
- merge:
      mergeKeys:
        - app
      generators:
        - list:
            elements:
              - app: external-dns
                appPath: infra/helm-charts/external-dns
                namespace: dns
              - app: argocd
                appPath: infra/helm-charts/argocd
                namespace: argocd
              - app: external-secrets
                appPath: infra/helm-charts/external-secrets
                namespace: external-secrets
              - app: kyverno
                appPath: infra/helm-charts/kyverno
                namespace: kyverno
        - list:
            elements:
              - app: external-dns
                enabled: "true"
              - app: argocd
                enabled: "true"
              - app: external-secrets
                enabled: "false"
              - app: kyverno
                enabled: "true"
    selector:
      matchLabels:
        enabled: "true"

This file creates snowflake/pet servers where you need to define exactly what they contain. The final result is brittle, requiring significant effort when any major change happens. There are several challenges with this setup:

  • It works directly on individual clusters (instead of cluster groups, as we’ll see later in the guide), so it never scales as your requirements change.
  • It forces you to hardcode application combinations inside Application Sets. This makes the generators your new unit of work instead of your Kubernetes manifests.
  • It makes all day-2 operations lengthy and cumbersome procedures.
  • It makes reasoning about your clusters super difficult. Understanding what’s deployed where is no longer trivial.

The final two points cannot be overstated. This approach might look ok at first glance, but the more clusters you have, the more complex it will become.

  1. If somebody asks which clusters contain kyverno, you need to scan all individual files for the “enabled” property of the “kyverno” line.
  2. Every time you add a new cluster to your setup, you need to copy/paste the list of components from another cluster and start enabling/disabling each individual component. If you have many components and many clusters, this is an error-prone process that you should avoid at all costs.
  3. If you add a new component, you need to go to all your existing files and add it to all the enabled/disabled lists.
  4. It only addresses the first requirement (enabling/disabling applications for clusters) but not the second one (having different configurations per cluster for the same application).

There is a better way to distribute applications to Argo CD clusters. The approach we DO recommend is using cluster generator labels.

Working with cluster groups instead of individual clusters

In a large organization, you don’t really care about individual clusters. You care about cluster groups. Argo CD doesn’t model the concept of a cluster group on its own, but you can replicate it using cluster labels.

You need to spend some time thinking about the different types of clusters you have and then assign labels to them.

The labels can be anything that makes sense to your organization:

  • Environment types (for example, QA/staging/prod)
  • Regions or countries
  • Department or teams
  • Cloud provider or other technical difference
  • Any other special configuration that distinguishes one or more clusters from the rest

After you have those labels, you can slice and dice your clusters across any dimension and start thinking about cluster groups instead of individual clusters.

Ultimately, 99% of use cases revolve around cluster groups rather than individual clusters.

  • “I want all my production clusters to have application X with Y settings.”
  • “I want all my AWS clusters to have X authentication enabled.”
  • “Team X will control this environment while team Y will control that environment.”
  • “All European clusters need this application.”
  • “Application X is installed on both US-East and US-West regions, but with different configurations.”
  • “Just for our QA environment, we need this load testing app deployed.”

We’ll see in detail all the advantages when using cluster labels, but one of the easiest ways to understand the flexibility of this approach is to examine what happens for a very common scenario—adding a brand new cluster.

In most cases, a new cluster is “similar” to another cluster. A human operator needs to “clone” an existing cluster, or at the very least, define the new properties of the new cluster in the configuration file.

If you use cluster labels (as we suggest), the whole process requires zero modifications to your application set files.

  1. You create the cluster with your favorite infra tool (Terraform/Pulumi/Crossplane, etc)
  2. You assign the labels on this cluster (for example, it’s a new QA cluster in US East)
  3. Finished!

Argo CD automatically detects this new cluster when it collects all its clusters, and deploys everything that needs to be deployed in it according to its labels. There’s no configuration file to edit to “enable/disable” your apps. The process cannot get any easier than this.

Notice that this setup also helps with communication between developers and operators/infrastructure people. Opening a ticket for a new cluster and having several discussions about the contents of the new cluster significantly slows down development time.

Manual cluster creation

In most cases, developers want a cluster that either mimics an existing one or has similar configuration to another cluster group. This makes your job very easy, as you can map directly to cluster labels what developers need.

Creating a new cluster can be a hectic process because you need to validate that it matches the expected workloads and is “similar” to your other clusters. If you use cluster labels, then Argo CD takes care of everything in minutes instead of hours.

Organizing your Argo CD clusters with different labels

Let’s see how all our use cases can work together with a semi-realistic example. You can find all Argo CD manifests at https://github.com/kostis-codefresh/multi-app-multi-value-argocd if you want to follow along.

The repository contains:

Here are the 7 clusters that we define with K3d. In a real organization, these clusters would be created with Terraform or another similar tool.

All clusters

We’ve assigned several example labels on those clusters. Notice that even before talking about applications, the clusters themselves exist in 2 dimensions:

  • A promotion flow (QA -> staging -> production) on the horizontal axis
  • A region setting (US/EU/Asia) on the vertical axis

The “hub” cluster contains the Argo CD instance that manages all the other clusters. In our example, this cluster only has Argo CD and no end-user applications, so it doesn’t take part in our application sets (it has a label type=hub instead of type=workload).


You can verify or change the labels of each cluster by inspecting the corresponding Cluster Secret in the main Argo CD instance. Here’s an example of a QA cluster that shows the assigned labels as created by our example GitHub repository.

apiVersion: v1
data:
  [...snip..]
kind: Secret
metadata:
  annotations:
    managed-by: argocd.argoproj.io
  labels:
    argocd.argoproj.io/secret-type: cluster
    cloud: gcp
    department: billing
    env: qa
    region: eu
    type: workload
  name: cluster-k3d-qa-eu-serverlb-1347542961
  namespace: argocd

We’re now ready to look at some typical scenarios. It’s impossible to cover all possible use cases, so we’ll see some representative scenarios for each use case.

The major question that you need to ask yourself is whether you want to deploy an application across different environments with the exact same configuration OR you want a different configuration per environment. The latter is obviously more complex and requires a good understanding of your Kustomize Overlays and Helm value hierarchies, but it’s closer to how a real organization works:


Here are the scenarios we’ll see:

Scenario                        Type             Configuration
1 – “workload clusters”         Plain Manifests  Same across all environments
2 – “GCP only”                  Plain Manifests  Same across all environments
3 – “Europe only”               Plain Manifests  Same across all environments
4 – “Production/Asia”           Plain Manifests  Same across all environments
5 – “QA US and EU”              Kustomize        Same across all environments
6 – “Production EU/US”          Kustomize        Different per environment
7 – “QA US and EU”              Helm             Same across all environments
8 – “Europe Only”               Helm             Different per environment
9 – “Production EU/US/Asia”     Helm             Different per environment

Notice that in our example repository, our applications are grouped in folders by type: manifests, Kustomize, or Helm apps.

In a real organization, you might have different sub-folders for each type, but it’s simpler if you only have to manage one type of application (for example, Kustomize for your own developers and Helm charts for external applications).

Scenario 1 – Run some applications on all workload clusters

Let’s see a very simple use case. We want to deploy a set of common applications to all our workload clusters, excluding the Argo CD “hub” cluster. We can take advantage of the “workload” label and point Argo CD to a folder that has all our common applications.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: simple-apps/*
        - clusters:    
            selector:
              matchLabels:
                type: "workload"

You can see the full Application Set at 01-common-apps.yml. This file instructs Argo CD to:

  1. Gather all connected clusters that have the “type=workload” label
  2. Gather all the Kubernetes manifests found under “simple-apps”
  3. Create all the combinations between those clusters and those apps
  4. Deploy the resulting Argo CD applications.

If you’re not familiar with generators, please read our Application Set Guide. If you deploy this file, you’ll see the following:

We got 18 applications (6 clusters multiplied by 3 apps) in a single step. Isn’t this cool? 
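The cartesian product that the matrix generator performs can be sketched in plain Python (a simplified model, not Argo CD code; the app and cluster names below are hypothetical):

```python
from itertools import product

# Hypothetical app directory names found by the Git generator under simple-apps/
apps = ["guestbook", "invoices", "billing"]
# Hypothetical names for the 6 connected clusters labeled type=workload (the hub is excluded)
clusters = ["qa-us", "qa-eu", "staging-us", "staging-asia", "prod-eu", "prod-asia"]

# The matrix generator emits one Application per (app, cluster) combination
applications = [f"{app}-{cluster}" for app, cluster in product(apps, clusters)]
print(len(applications))  # 3 apps x 6 clusters = 18 Applications
```

Adding a new workload cluster or a new app folder grows this product automatically, with no change to the Application Set itself.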

Scenario 2 – Choose only GCP clusters and exclude those in AWS

In the next example, we want to install all the applications under the `simple-apps` folder only in our Google Cloud clusters; those applications should not exist in our Amazon clusters. Again, we have created the appropriate labels in advance. In our imaginary organization, all non-production clusters run in GCP.

Choose GCP only clusters

The admin server also runs in AWS, and this is why it won’t get picked up by our application set. You can find the full manifest at 02-gcp-only.yml.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: simple-apps/*
        - clusters:    
            selector:
              matchLabels:
                type: "workload"    
                cloud: "gcp"

This Application Set is similar to the previous one, but now we’re matching 2 labels—one for Google Cloud and one for all our “workload” clusters.

If you apply it, you get several applications, deployed only to the non-prod environments.

Argo CD created a list of applications for only the QA and Staging cluster groups, as they contain clusters that run on Google Cloud.

Scenario 3 – Choose only European Clusters

The real power of labels becomes clear when you get requirements that cut across your clusters in an unusual or non-linear way. Let’s imagine a scenario where you need to do something specific to all European clusters because of GDPR regulations.

At this point, most teams realize that the primary way of organizing their clusters was by type (qa/staging/prod), and they modelled the region as a secondary parameter. This creates several challenges and makes people ask the same question, “Does product X support deployments in regions?”.

But when using the cluster generator, all labels are first-level constructs, allowing you to make any selection possible. We can focus on European clusters by just defining our region the same way as any other scenario.

Today, only the QA and Production environments have a European cluster. But tomorrow, you might add one in the Staging environment WITHOUT any modifications in your Application Set.


We select all European servers by region with file 03-eu-only.yml

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: simple-apps/*
        - clusters:    
            selector:
              matchLabels:
                type: "workload"    
                region: "eu"  

Deploying this application set instructs Argo CD to place all the applications under the simple-apps folder only on the European clusters:

If you add a new European cluster in the Staging environment, then on the next Argo CD sync, that cluster also gets all applications defined for Europe, with zero effort from the administrator.

Scenario 4 – Choose a specific cluster among a cluster group

If it wasn’t clear from the previous examples, the label selector for clusters works in an “AND” manner by default. So the more labels you add in the selector, the more specific the application set becomes.

This means that even if you really want to select a single cluster among a group, you can just define all the labels that correctly identify it.
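This AND behavior can be modeled with a few lines of Python (a sketch of the selector semantics, not the actual Argo CD implementation; the cluster names and labels are hypothetical):

```python
def match_labels(cluster_labels, selector):
    # matchLabels semantics: every key/value pair in the selector
    # must be present on the cluster (logical AND)
    return all(cluster_labels.get(k) == v for k, v in selector.items())

# Hypothetical cluster names and labels, mirroring the example setup
clusters = {
    "prod-asia": {"type": "workload", "region": "asia", "env": "prod"},
    "prod-eu":   {"type": "workload", "region": "eu",   "env": "prod"},
    "qa-asia":   {"type": "workload", "region": "asia", "env": "qa"},
}

selector = {"type": "workload", "region": "asia", "env": "prod"}
matched = [name for name, labels in clusters.items() if match_labels(labels, selector)]
print(matched)  # only the production cluster in Asia matches all three labels
```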

We want to select the Asian Environment for Production (which is a specific Kubernetes cluster).

Choose prod asia only

The application set that selects this cluster is at 04-specific-cluster.yml.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: simple-apps/*
        - clusters:    
            selector:
              matchLabels:
                type: "workload"  
                region: "asia"    
                env: "prod"    

The labels we have defined in the application set map only to one cluster. Argo CD will look at this application set and find all clusters that have type=workload AND region=asia AND env=prod. 

Applying the file, you will see the following:

As expected, Argo CD deployed all the applications under the simple-apps folder only to the production cluster in Asia.

Scenario 5 – Different Kustomize overlays for the QA clusters

For simplicity, in all the previous examples, all our applications use the same configuration across all clusters. So even if our cluster generator selected multiple clusters, they all used the plain manifests we defined. 

While this approach can work for some trivial applications, you almost certainly want to use a different configuration per cluster. This can take the form of DNS names, database credentials, security controls, rate limiting settings, etc.


For our next example, we’ll use Kustomize overlays. For each application, we have the base configuration plus extra settings in overlays or Kustomize components.

We have covered Kustomize overlays in detail in the promotion article and explained how they work with Argo CD in our Application Set guide, so make sure you read those first if you’re not familiar with overlays.

For the cluster selector, we’ll choose the QA environment this time (which corresponds to 2 clusters).

The application set that selects the QA clusters and deploys applications with the respective configuration is at 05-my-qa-appset.yml.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: kustomize-apps/*/envs/qa  
        - clusters:    
            selector:
              matchLabels:
                type: "workload"    
                env: "qa"  

The Matrix generator selects all clusters that match the QA/Workload labels and applies only the applications that have a QA overlay.

Apply the file, and you see all QA deployments:

The important point here is that for each application, only the QA overlay is selected.

QA overlay selected

You can see in the Git repository that the “Invoices” application comes with configurations for all environments, but we appropriately employ only the QA one in our application set.

Scenario 6 – Different Kustomize settings for US and EU in production

There are many more examples we can show with this setup. Be sure to read the documentation of the cluster generator. One important point is that you can use the output of this generator as input to another generator.

As a final example with Kustomize, let’s see a scenario where we want to deploy our applications to Production Europe and Production US, but not in Asia.

Remember that, by default, cluster labels work in “AND” mode. So if we simply list “us” and “eu” as labels, Argo CD will try to find all clusters that have both labels at the same time. We don’t want this, as no cluster matches this description.

Also, unlike the previous example where we specifically asked for the “QA” overlay, now we want to choose the overlays that match whatever the cluster type/region is (either prod-us or prod-eu).


You can find the full application set at 06-my-prod-appset.yml.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - clusters:    
            selector:
              matchLabels:
                type: "workload"      
                env: "prod"
              matchExpressions:
              - key: region
                operator: In
                values:
                  - "eu"
                  - "us"        
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: 'kustomize-apps/*/envs/{{.name}}'

The first thing to show here is the matchExpressions block. This lets you choose clusters in an “OR” manner. We want all clusters that are either EU or US AND in production.

The second point is using the output of the cluster generator as input to the Git generator. The “{{.name}}” variable will render to the name of the cluster matched, forcing the Git generator to load the respective Kustomize overlay for each environment.
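The combined matchLabels/matchExpressions filtering can be modeled like this (again a hypothetical sketch of the selector semantics, not Argo CD code):

```python
def selects(labels, match_labels, match_expressions):
    # All matchLabels pairs must hold (AND)...
    if any(labels.get(k) != v for k, v in match_labels.items()):
        return False
    # ...and every matchExpressions entry must also hold; the "In" operator
    # accepts any one of its listed values (OR within a single expression)
    for expr in match_expressions:
        if expr["operator"] == "In" and labels.get(expr["key"]) not in expr["values"]:
            return False
    return True

# Hypothetical production clusters, one per region
clusters = {
    "prod-us":   {"type": "workload", "env": "prod", "region": "us"},
    "prod-eu":   {"type": "workload", "env": "prod", "region": "eu"},
    "prod-asia": {"type": "workload", "env": "prod", "region": "asia"},
}

exprs = [{"key": "region", "operator": "In", "values": ["eu", "us"]}]
matched = [name for name, labels in clusters.items()
           if selects(labels, {"type": "workload", "env": "prod"}, exprs)]
print(matched)  # prod-us and prod-eu match; prod-asia is filtered out
```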

Apply the file and you will see production deployment in EU and US but not in Asia:

And most importantly, you see that each server loads the configuration for its own region:

You should now understand how to select any combination of clusters and apply your exact choice of Kustomize overlays according to the “type” of each cluster.

Scenario 7 – A Helm hierarchy of values for the QA environment

Cluster labels can also work with your Helm charts and values.

As a starting example, let’s deploy our Helm charts to the two QA clusters using the same configuration for both.

You can find the full application set at 07-helm-qa-only.yml.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: charts/*
        - clusters:    
            selector:
              matchLabels:
                type: "workload"    
                env: "qa"  

The generator part of the file selects our charts and applies them to all clusters with the QA/workload label.

We have 2 example charts in the Git repository, so Argo CD created 4 applications for us (one per chart for each of the two QA clusters).

Scenario 8 – Different Helm values for the European environments

Like the Kustomize example, we want to make our examples more advanced and have different value files per environment. 

The same Git repository also contains a set of Helm values for each environment.

We have covered Helm value hierarchies and Argo CD applications in our Helm guide, so please read that guide first if you don’t know how to create your own value hierarchies.

Let’s deploy our Helm charts to all European clusters:

This time, however, we want to specifically load the European values only instead of all values.


You can find the full application set at 08-helm-eu.yml

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - clusters:    
            selector:
              matchLabels:
                type: "workload"      
                region: "eu"                        
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: charts/*  

The generator part is straightforward. It applies all charts to clusters with the EU/Workload labels. The smart selection of values happens in the “sources” section of the generated application:

sources:
  - repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
    path: '{{.path.path}}'
    targetRevision: HEAD
    helm:
      valueFiles:
      - '$my-values/values/{{index .path.segments 1}}/common-values.yaml'  
      - '$my-values/values/{{index .path.segments 1}}/app-version/{{index .metadata.labels "env"}}-values.yaml'                
      - '$my-values/values/{{index .path.segments 1}}/regions/eu-values.yaml'              
      - '$my-values/values/{{index .path.segments 1}}/envs/{{index .metadata.labels "env"}}-eu-values.yaml'                  
  - repoURL: 'https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git'
    targetRevision: HEAD
    ref: my-values

Here we apply the appropriate values according to: 

  1. The chart name (index .path.segments 1)
  2. The environment label that exists on the cluster (index .metadata.labels “env”)

In this example, you see how you can query the cluster itself for its own metadata.
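How these Go-template expressions resolve can be illustrated with a small Python model (the chart name “billing” and the cluster labels are hypothetical, and the `$my-values` ref prefix is omitted for brevity):

```python
# {{.path.segments}} is the matched Git directory split on "/";
# index 1 is therefore the chart name.
path_segments = "charts/billing".split("/")   # "billing" is a hypothetical chart name
chart = path_segments[1]
labels = {"env": "qa", "region": "eu"}        # labels read from the matched cluster secret

# The valueFiles templates render into concrete file paths for this cluster
value_files = [
    f"values/{chart}/common-values.yaml",
    f"values/{chart}/app-version/{labels['env']}-values.yaml",
    f"values/{chart}/regions/eu-values.yaml",
    f"values/{chart}/envs/{labels['env']}-eu-values.yaml",
]
print(value_files[-1])  # values/billing/envs/qa-eu-values.yaml
```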

If you apply this file, you see both charts deployed on the European clusters.

But most importantly, you see that each environment gets the correct values according to its type:

Notice that in both cases, we still have some common values that apply to both environments.

Scenario 9 – Different Helm values for all 3 Production regions

As a final example with Helm, let’s deploy to all production regions with the appropriate settings for each one.

We choose all 3 regions in our cluster generator.

You can find the full application set at 09-helm-prod.yml

Like before, we select all 3 regions in an “OR” manner and apply our charts.

spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - matrix:
      generators:
        - clusters:    
            selector:
              matchLabels:
                type: "workload"      
                env: "prod"
              matchExpressions:
              - key: region
                operator: In
                values:
                  - "eu"
                  - "us"  
                  - "asia"                      
        - git:
            repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
            revision: HEAD
            directories:
            - path: charts/* 

For each application, we query each cluster for its environment and region.

sources:
  - repoURL: https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git
    path: '{{.path.path}}'
    targetRevision: HEAD
    helm:
      valueFiles:
      - '$my-values/values/{{index .path.segments 1}}/common-values.yaml'  
      - '$my-values/values/{{index .path.segments 1}}/app-version/{{index .metadata.labels "env"}}-values.yaml'              
      - '$my-values/values/{{index .path.segments 1}}/env-type/{{index .metadata.labels "env"}}-values.yaml'  
      - '$my-values/values/{{index .path.segments 1}}/regions/{{index .metadata.labels "region"}}-values.yaml'              
      - '$my-values/values/{{index .path.segments 1}}/envs/{{index .metadata.labels "env"}}-{{index .metadata.labels "region"}}-values.yaml'                  
  - repoURL: 'https://github.com/kostis-codefresh/multi-app-multi-value-argocd.git'
    targetRevision: HEAD
    ref: my-values

All charts are now deployed in all regions:

You can also verify that each environment picks the correct settings from the value hierarchy:

You have now seen how to apply value hierarchies with Application Sets and cluster labels.

Day 2 operations

We now reach the most important point of this guide. We’ve seen how cluster labels let you define exactly what goes into which cluster. You might be wondering why this is the recommended solution and how it’s better than other approaches you’ve seen.

The answer is that with cluster labels, you treat your Application Sets as “create-and-forget” resources. After the initial setup, you shouldn’t need to touch your Application Sets at all. Maintenance effort drops to zero, which is the OPTIMAL outcome when evaluating any architecture decision.

Let’s see some semi-realistic scenarios of using our recommendation in a real organization.

Imagine you just organized all your application sets with cluster labels. All files are committed in Git, and all applications are successfully deployed. Everything runs smoothly.

Scenario A – Removing a server

On Monday, you need to decommission the US/prod server. You remove the “us” and “prod” labels from the cluster. On the next sync, the cluster generator of all related application sets doesn’t pick it up, and nothing gets deployed there. You don’t really care how many application sets touched this cluster. They will all stop deploying there automatically.


Changes you had to do in your Application Sets: ZERO

Scenario B – Deploying a new application

On Tuesday, a developer wants a new application in the QA environment. You commit a new overlay for QA configuration for that app. All QA application sets pick it up in the next sync and deploy it to any/all clusters that deal with QA. You don’t really care how many application sets affect QA or how many clusters are contained in QA. They will all get the new application in the same manner.


Changes you had to do in your Application Sets: ZERO

Scenario C – Adding a new Cluster

On Wednesday, you need to add a new cluster to replace the decommissioned one. You create the new cluster with Terraform/Pulumi/Crossplane/whatever and just assign it the appropriate labels (“us”, “prod”, “workload”, “aws”). All respective Application Sets see the new cluster in the next sync and deploy whatever needs to be deployed there. You don’t really care how many Application Sets touch this cluster. The cluster will get the exact same applications as it had before.


Changes you had to do in your Application Sets: ZERO

Scenario D – Copying an application

On Thursday, a developer says that a specific application that exists in staging also needs to go to QA.

You copy the staging overlay to a QA overlay for this application and ask the developer about the correct settings. In the next sync, all the QA Application Sets pick it up and deploy it. The developer doesn’t need to know anything about application sets or cluster labels. In fact, they could just do this deployment on their own if they had access to the Kustomize overlays.


Changes you had to do in your Application Sets: ZERO

Scenario E – Central cluster change

On Friday, you’re told that ALL your clusters now need sealed-secrets installed. You add a new configuration for sealed-secrets in your “common” folder and commit it to Git. The “common” Application Set (which applies to all clusters) then picks it up and applies it everywhere.


Changes you had to do in your Application Sets: ZERO

Essentially, the Application Sets only need to change when you add another dimension to your clusters (i.e., new labels) for something that was not anticipated. If you did a proper evaluation in the beginning and communicated to all parties how the clusters are going to be used, this scenario won’t happen very often. For daily operations, the Application Sets just sit in the Git repository without anybody (operators or developers) having to change them at all.

The other big advantage of cluster labels is that they work the same regardless of how many clusters you have. Application Sets that work with labels update automatically on their own, whether they manage 1, 10, or 100s of clusters connected to the central Argo CD instance.
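Such a label-based Application Set looks roughly like this (the repository URL, paths, label keys, and app name are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: us-prod-my-app
  namespace: argocd
spec:
  generators:
    - clusters:                 # selects every cluster Secret carrying these labels
        selector:
          matchLabels:
            env: prod
            region: us
  template:
    metadata:
      name: 'my-app-{{name}}'   # {{name}} is the matched cluster's name
    spec:
      project: default
      source:
        repoURL: https://github.com/example/deployments.git
        targetRevision: HEAD
        path: apps/my-app/overlays/prod
      destination:
        server: '{{server}}'    # {{server}} is the matched cluster's API endpoint
        namespace: my-app
```

Note that `matchLabels` gives “AND” semantics (all listed labels must match), while `matchExpressions` with the `In` operator gives “OR” semantics for the values of a single key.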
Let’s compare those same scenarios with the approach we do NOT recommend, where Application Sets explicitly enable/disable components/apps on each specific cluster.

Do not do this
- merge:
      mergeKeys:
        - app
      generators:
        - list:
            elements:
              - app: external-dns
                appPath: infra/helm-charts/external-dns
                namespace: dns
              - app: argocd
                appPath: infra/helm-charts/argocd
                namespace: argocd
              - app: external-secrets
                appPath: infra/helm-charts/external-secrets
                namespace: external-secrets
              - app: kyverno
                appPath: infra/helm-charts/kyverno
                namespace: kyverno
        - list:
            elements:
              - app: external-dns
                enabled: "true"
              - app: argocd
                enabled: "true"
              - app: external-secrets
                enabled: "false"
              - app: kyverno
                enabled: "true"
  selector:
    matchLabels:
      enabled: "true"

What actions do you need for each scenario?

  • Scenario A – Removing a server
    1. You first need to locate all Application Sets that “choose” this cluster.
    2. You then need to edit all those Application Sets and “disable” all the components they contain.
    3. You need to commit and sync all changes.
    4. There’s a risk that you either forgot an Application Set or forgot to “disable” a line.
    5. The more clusters you have, the more complex the process.
  • Scenario B – Deploying a new application
    1. You first need to locate all Application Sets that choose the clusters that need this application.
    2. You need to edit all those Application Sets and add a new line for this application.
    3. You need to commit and sync all changes.
    4. There’s a risk that you either forgot an Application Set or forgot to add a line for the new application.
    5. The more clusters you have, the more complex the process.
  • Scenario C – Adding a new Cluster
    1. You need to understand how this cluster is “similar” to other clusters.
    2. You either need to create a new Application Set for this cluster or locate all Application Sets that touch it.
    3. You need to add all the new lines of enabled/disabled components for this cluster.
    4. There’s a risk that you either forgot an Application Set or forgot a line for one of the components.
  • Scenario D – Copying an application
    1. You first need to locate all Application Sets that choose the new clusters for this application.
    2. You need to edit all those Application Sets, locate the line for this application, and change it to “enabled”.
    3. You need to commit and sync all changes.
    4. There’s a risk that you either forgot an Application Set or forgot to “enable” the component.
    5. The more clusters you have, the more complex the process.
  • Scenario E – Central cluster change
    1. You first need to locate all the Application Sets that you manage.
    2. You need to edit all those Application Sets and add a new line for this common application.
    3. You need to commit and sync all changes.
    4. There’s a risk that you either forgot an Application Set or forgot to add a line for the new application.
    5. The more clusters you have, the more complex the process.

It shouldn’t be a surprise that having snowflake clusters where you must enable/disable each application individually is a much more complex process than working with cluster groups identified by labels.

Developers and self-service

At the start of this guide, we talked about effective communication with developers. Another major reason that makes cluster labels the optimal solution is that they’re fully automated. At each sync, the Argo CD cluster generator detects which clusters have the appropriate labels and does whatever needs to be done (deploy or undeploy an application).

Your developers don’t need to know anything about cluster labels. In fact, they don’t even need to know about Application Sets. Developers can work with a Git repository that holds standard Helm charts/Kustomize overlays/plain manifests, and their instructions are super simple:

  • If they add a new overlay in the QA folder, that application will be deployed to the QA environments, regardless of the number of clusters.
  • If they delete an overlay, that application will be undeployed.
  • If they want a brand new application, they can just commit the new overlays or Helm values in a specific folder, and Argo CD will pick it up.

Developers can work on their own without opening any tickets or waiting for you to do something for them. 
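One way to wire up this self-service flow (a sketch, assuming a repository layout with per-environment overlay folders; all names and URLs are illustrative) is a matrix generator that pairs the QA clusters with the Git directories developers commit to:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: qa-apps
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters:                     # every cluster labeled as QA
              selector:
                matchLabels:
                  env: qa
          - git:                          # every app that has a QA overlay
              repoURL: https://github.com/example/deployments.git
              revision: HEAD
              directories:
                - path: 'apps/*/overlays/qa'
  template:
    metadata:
      name: '{{path[1]}}-{{name}}'        # app name + cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/example/deployments.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: '{{server}}'
        namespace: '{{path[1]}}'
```

With a setup like this, adding an overlay under `apps/<app>/overlays/qa` deploys the app to all QA clusters on the next sync, and deleting the folder undeploys it. Developers never touch the Application Set itself.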

This is in complete contrast to the anti-pattern we explained above, where you manually enable/disable applications in each specific cluster. If you follow that approach, all actions become a two-step process:

  1. The developer adds their overlay or Helm values in a Git repository.
  2. Then you MUST go to all your Application Sets and manually “enable” the new application.

Forcing developers to wait for you before they can deploy their applications is the fastest way to create bottlenecks in your organization.

There is no reason for this complexity when cluster labels offer a much better alternative.

Conclusion

In this guide, we explained in detail how to create cluster groups with Argo CD using custom labels. We have also seen:

  • How to use the cluster generator to select clusters in an “AND” or “OR” fashion.
  • How to deploy applications to multiple clusters using the same configuration.
  • How to deploy Kustomize applications with different overlays per cluster.
  • How to deploy Helm applications with different value sets per cluster.
  • How to perform common day-2 operations with many Argo CD clusters.
  • Why our recommended approach is the optimal one, as the number of clusters and developers grows in your organization.

You can find all Application Sets and manifests at https://github.com/kostis-codefresh/multi-app-multi-value-argocd.

Happy labeling 🙂

The post Distribute Your Argo CD Applications to Different Kubernetes Clusters Using Application Sets appeared first on Codefresh.

]]>
https://codefresh.io/blog/argocd-clusters-labels-with-apps/feed/ 9
Why Environments Beat Clusters For Dev Experience https://codefresh.io/blog/why-environments-beat-clusters-for-dev-experience/ https://codefresh.io/blog/why-environments-beat-clusters-for-dev-experience/#respond Mon, 02 Jun 2025 08:41:54 +0000 https://codefresh.io/?p=17000 The cloud ecosystem has reached a turning point. Tools for operators/administrators are now mature and can handle most day-to-day operations that deal with Kubernetes clusters. Finally, we can turn our focus to application developers and their needs. If you look at all the Kubernetes tools available, you’ll understand that most of them treat Kubernetes as […]

The post Why Environments Beat Clusters For Dev Experience appeared first on Codefresh.

]]>
The cloud ecosystem has reached a turning point. Tools for operators/administrators are now mature and can handle most day-to-day operations that deal with Kubernetes clusters. Finally, we can turn our focus to application developers and their needs.

If you look at all the Kubernetes tools available, you’ll understand that most of them treat Kubernetes as another form of infrastructure. You can easily find tools that install Kubernetes, monitor Kubernetes, secure Kubernetes, do cost estimations for Kubernetes, etc. But how many Kubernetes tools can you find that target application developers and their day-to-day responsibilities?

Several companies even try to hide Kubernetes completely from developers by using leaky abstractions or so-called developer portals. These adoption efforts almost always fail simply because nobody asked the developers what they really need. Don’t fall into this trap. 

In this article, we see some common examples of what companies “think” about developers’ needs versus what developers need in practice, in the context of application development for Kubernetes.

Confusing clusters with environments

When designing a deployment workflow, most teams center the discussion around individual clusters. You often hear people talk about direct mappings between clusters and “environments”.

  • “This is our production cluster.”
  • “Our staging cluster is down.”
  • “We need a new cluster for QA.”

This forces all Kubernetes tools to expose clusters as first-level constructs in their user interface (UI). If your team has created any kind of dashboard for Kubernetes, I can bet that the left-panel navigation contains a “cluster” entry where people can look at individual clusters.

In reality, developers never care about individual clusters. Most times, they care about: 

  • Cluster groups that behave the same (e.g., prod-us, prod-asia, prod-eu).
  • A set of namespaces within a big cluster (most common in shared qa/staging clusters).
  • A combination of the above.

Developers have a different mindset:

  • “I am ready to ship this feature to production.”
  • “Oh, my new feature is failing in the QA environment.”
  • “That is strange, application 1.23 works ok in QA, but presents an error in the staging environment.” 

So, how many clusters are in production, QA, or staging? It does NOT matter to developers. Developers only care about environment settings and, more specifically, what differs between environment configurations.

This means that if your internal portal/developer platform looks like this:

You need to redesign it like this:

Let me repeat that again. Developers do NOT care about individual Kubernetes clusters. They mostly care about the different settings between the cluster groups that represent each “environment”.

Promotions are more important than deployments

We’ve established that developers prefer thinking about environments and not individual clusters. Let’s see another common misconception with tools that target developers. If you look at the most typical scenario of how a feature reaches production, this is the process:

  1. A developer performs an initial deployment in the first environment—let’s call it QA environment.
  2. After passing several tests and reviews, the feature gets promoted to the staging environment.
  3. After passing several tests, the feature gets promoted to production.
  4. Depending on the company, there might be several other intermediate environments where promotions happen (e.g., load testing).

Several tools promise to simplify deployments for Kubernetes developers. It turns out that developers are actually interested in promotions. There are 3 main reasons for that:

  1. A deployment (where code is packaged in a brand new artifact) happens only once, in the first environment. In all subsequent environments, developers want to promote an existing image/configuration/release/artifact, not deploy anything from scratch.
  2. Problems with promotions have more impact. Promoting to production is always a risky process. Deploying to QA is not.
  3. Promotions can often fail due to external configuration not directly controlled by developers.

The last point cannot be overstated. One of the most common cases for failed deployments in production is the difference in environment configuration. Developers dutifully tested their application in QA and staging, and everything worked fine. Then the application failed to deploy in production because of an unexpected change in production settings completely outside the scope of the application container.

Continuing our wireframe from an imaginary developer portal, most teams think that developers need this:

The UI is problematic for many reasons:

  1. Developers can deploy an older version to any environment by mistake.
  2. Developers need to manually correlate versions between environments and understand what was in the next/previous environment.
  3. Most times, versions correspond to Docker image versions that exclude external configuration.

But as explained already, developers just want to promote. So they would prefer this:

Notice that this makes a developer’s job very easy. There are also several guardrails in place. At a minimum, the production drop-down can ONLY promote what is currently in staging and nothing else (or maybe the last 3 versions that are available in staging). You can also perform checks and present warnings in several scenarios (for example, if a developer tries to promote something to production that is clearly failing in staging).

Developers don’t care about Git hashes

Git hashes are great when it comes to source code operations. When developers perform basic merges, cherry-picks, and rebase scenarios, they do care about Git hashes. But when it comes to deployments, Git hashes mean nothing:

  • Git hashes have no inherent order. You cannot look at 2 hashes and tell which one is newer or older.
  • Git hashes can only capture the state of a source code snapshot or the specific commit for Kubernetes manifests, but never both at the same time.
  • They allow for mistakes when it comes to deployments, especially when developers have to copy/paste hashes between different tools.

What developers care about instead is software versions. Version numbers are simple to read, simple to reason, and simple to understand.

Products and dashboards that expose Git hashes as a central concept are cumbersome and difficult to use. It’s ok if Git hashes are an additional piece of information for a deployment, but they should never be used for direct promotions or other day-to-day operations.

What developers really want to see are versions. Or, at least some kind of numbering where ordering is easy to understand.

So, where do these versions come from?

Here we reach one of the biggest misconceptions about container images for Kubernetes. Several tools (and dashboards) simply use a special tag with a version number on a container that gets “promoted” from each environment to the next. Using a container image version for the promotion process might make sense at first glance.

However, this approach misses 2 facts completely:

  • Docker tags are mutable by default. Just because you see a container called my-image:v1.0 in one environment and another container my-image:v1.0 in another environment, that doesn’t mean they’re the same application. They might be completely different.
  • Used as an application version, the container tag only works in 80% of cases. Developers sometimes need to promote a container image AND associated configuration (like configmaps and secrets). In those cases, the container tag is NOT enough to understand what gets promoted and where.

You can partially solve the first problem with tooling support. You need to instruct ALL tools that take part in the software lifecycle to treat container tags as immutable. This is also a very basic requirement for any package registry that your organization is using.
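Enforcement is easiest at the registry itself when the registry supports it. For example, AWS ECR exposes a repository-level setting, shown here as a CloudFormation sketch (repository name illustrative):

```yaml
Resources:
  AppRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: my-app
      ImageTagMutability: IMMUTABLE   # pushing an existing tag again is rejected
```

With this in place, a tag like my-image:v1.0 can only ever point to one image digest.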

You can solve the second problem with Helm charts. Helm charts give you access to 2 additional “version” properties (in addition to your container tag). You can annotate a Helm chart with an “application” version and also have a different version for the chart itself. This way, when you promote a Helm chart, you can promote a container image PLUS additional configuration in a single step.
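For reference, the two extra version fields live in the chart’s `Chart.yaml` (values illustrative):

```yaml
apiVersion: v2
name: my-app
description: An example application chart
version: 1.4.0        # chart version: bumped when templates or bundled config change
appVersion: "2.3.1"   # application version: typically the container image release
```

Promoting chart version 1.4.0 therefore moves the container image and its configuration together as one unit.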

However, not all organizations use Helm charts. Several teams prefer to use Kustomize or even plain manifests. In this case, the versioning problem still exists.

Unfortunately, most tools assume that developers only care about container images and center their whole interface around image tags.

Developers would prefer a system that lets them promote configuration and container tags at the same time. This would cover 100% of their needs and cater even to edge case scenarios where they only promote configuration, while the container image stays the same.

Tools that expose Git hashes and assume developers only work with container tags completely miss the way Kubernetes applications work.

Don’t abuse pipelines as promotion mechanisms

Many developers see Continuous Delivery (CD) as the next evolution of Continuous Integration (CI). After all, before you deploy a container image, you need to build and test it first.

Several teams that switch to cloud-native development make their first step in CD by abusing their existing CI pipelines. Developers love to see a single pipeline that shows the whole picture for a specific feature, from the initial code commit all the way through to production.

The main problem here is that the typical CI pipeline only knows what is happening WHILE it’s running. After it finishes, it has no visibility into the actual cluster.

This leads to the classic problem of failed deployments in the following manner:

  1. A developer commits a new feature (or merges something in a branch).
  2. The CI pipeline starts and then builds/tests a container image with success.
  3. The CI pipeline deploys the image to a Kubernetes cluster and optionally runs some checks.
  4. Everything looks good, and the pipeline shows its status as “green”.
  5. Ten minutes later, the application has issues (memory leaks, wrong dependencies, missing DB).
  6. Developers get paged about a failed deployment, even though the pipeline STILL shows as green.

Your organization probably falls into this trap if developer teams always talk about “lack of deployment visibility”, “wasting time to troubleshoot deployments”, “not enough production access”, and similar complaints. 

In reality, developers look at the CI system for the basic build. Then they need to go to another system (usually a metrics/monitoring solution) to understand what’s happening with their application.

The key takeaway here is that instead of a basic CI pipeline, you need a system that gives developers real-time information about deployments and promotions. In that system, the “green” status means that the application is healthy RIGHT NOW, not 5 minutes before.

Now developers can use a single interface for deploying/promoting, and understanding if the application is healthy. They can go to their metrics solution when things go wrong, but in the happy path scenario, a single system can tell them if the application is successfully running in a Kubernetes cluster (or a specific environment, as we saw earlier in the article).

Stop deploying and start promoting

You should now understand what developers actually need and why existing solutions aren’t designed with Kubernetes/GitOps in mind. There are several initiatives right now for investing in developer portals in big organizations, and, unfortunately, developers don’t always have a say about exactly what they need from an internal platform.

The next question is whether you actually need to create a platform like this from scratch. You might think that Argo CD is a solution that helps developers with Kubernetes deployments and that simply adopting Argo CD will make developers happy. In reality, Argo CD is a great sync engine, but doesn’t try to solve changes between applications, promotions, or environments. For example, Argo CD doesn’t have the concept of environments, instead only operating with individual clusters.
This is why we extended Argo CD to create Codefresh GitOps Cloud. We looked at existing solutions and understood that developer experience is always an afterthought, even in newer platforms that are supposedly designed with Kubernetes in mind.

Codefresh GitOps Cloud implements all the best patterns we explained that developers need:

  1. It works with environments instead of individual clusters. Each environment can be one cluster, a set of clusters, a set of namespaces, application labels, or any combination of those.
  2. It’s based around promotions. Developers can easily understand what’s different between 2 environments and how to move an application from one environment to the next.
  3. Git hashes are there if you need them. The central construct, however, is products and their versions. A product is a new entity that includes an application along with its configuration and its container images. When you promote from one environment to another, you promote the whole application and not individual container images.
  4. The graphical dashboard always shows real-time information. When you see a “green” checkmark, it means that an application is running successfully in an environment right now. Developers can detect right away what deployment was successful and what failed without going to another system.
  5. Argo CD and its amazing sync engine power everything behind the scenes.

You can use Codefresh GitOps Cloud today along with your existing CI system (like Jenkins or GitHub Actions). GitOps Cloud doesn’t replace your CI solution. It makes developers happy by giving them a dedicated platform for application promotions, which is what developers really need (instead of plain deployments).

Oh, one more thing. If you already have your own Argo CD instance, you can bring it along.

Ready to start your GitOps journey with Codefresh? Try GitOps Cloud free for 45 days now.

The post Why Environments Beat Clusters For Dev Experience appeared first on Codefresh.

]]>
https://codefresh.io/blog/why-environments-beat-clusters-for-dev-experience/feed/ 0
Combine the Codefresh GitOps Cloud with your existing Argo CD instance https://codefresh.io/blog/bring-your-own-argocd/ https://codefresh.io/blog/bring-your-own-argocd/#respond Wed, 30 Apr 2025 08:16:16 +0000 https://codefresh.io/?p=16968 We recently announced the new Codefresh GitOps Cloud, the easiest way to promote changes across Argo CD applications–even across different clusters. With Codefresh GitOps Cloud, you can model your own promotion flow with a graphical editor (although YAML is still available). You define exactly how an application reaches production, including all the requirements and approval […]

The post Combine the Codefresh GitOps Cloud with your existing Argo CD instance appeared first on Codefresh.

]]>
We recently announced the new Codefresh GitOps Cloud, the easiest way to promote changes across Argo CD applications–even across different clusters.

With Codefresh GitOps Cloud, you can model your own promotion flow with a graphical editor (although YAML is still available). You define exactly how an application reaches production, including all the requirements and approval gates your organization needs.

With Codefresh, environment information is modelled in the platform itself. An environment can be any cluster, any namespace in a cluster, or any combination of the two. This means that for Codefresh, environments are first-level constructs, not just naming conventions.

At the same time, you can enrich your applications with extra information, like source code features, JIRA tickets, unit testing results, and other aspects of the software lifecycle that aren’t normally known to Argo CD.

After speaking with several teams about our new offering, we realized the most exciting feature for many folks is the new capability of connecting your existing Argo CD instance to the Codefresh platform.

We briefly described in the announcement blog post that you can now keep your existing Argo CD installation and still get all the benefits of the GitOps Cloud. We didn’t explain, however, how the integration works under the hood and why it helps teams that have already progressed in their GitOps journey. In this post, we share the technical details on how Codefresh interacts with your own Argo CD instance.

The user experience

It all starts with the installation instructions.

Here, you verify that you have all the requirements for installing the Codefresh GitOps runtime. The wizard also provides you with a full Helm command that you can run in your own Kubernetes cluster to get everything up and running. It’s also possible to use Terraform or any other similar method to install your Codefresh runtime.

Note that the traditional method (installing all Argo projects from scratch) is still available. Our documentation page has more information, including all the requirements and limitations for both options.

So, how does this work? What does Codefresh install in your cluster?

Architecture of the GitOps runtime

The image below shows all the components of the GitOps runtime. These include the other 3 Argo projects (Rollouts, Events, Workflows) and a set of custom components that contain part of the Codefresh control plane.

You can find a full description of what all these components do on our runtime documentation page.

Notice that Argo Rollouts is special. You also need to run it on each cluster that you wish to deploy applications to (if you follow progressive delivery). 

Because our users operate in a wide variety of network environments, we support 2 communication modes for the runtime: 

  • Outbound-only communication is more secure and easier to install, but might be less performant (because the runtime polls the Codefresh control plane).
  • Inbound communication is more complex to set up as you need to open firewall ports in your infrastructure, but can offer performance benefits. 

The Codefresh runtime supports both ways! You can install the Codefresh runtime and expose it via standard Kubernetes ingress or set up a tunnel-based solution that needs no exposed ports.

This means you can decide what’s best for your organization based on your risk requirements versus ease of installation.

Below is the updated architecture diagram with a tunnel-based solution.

Now you can clearly see the direction of traffic. All network streams start from your own cluster and point to the Codefresh control plane. This means you can easily install the GitOps runtime in clusters that are behind a firewall and not accessible on the public internet.

Here’s the architecture diagram for an ingress-based installation.

Notice again the direction of traffic and, more specifically, the arrow that joins the “GitOps Client(UI)” with the GitOps runtime.

Communication with your own Argo CD instance

We also need to explain how the GitOps runtime retrieves info from your Argo CD instance. As we explained in the announcement, Codefresh GitOps is not just a wrapper over Argo CD. It adds more features and defines all the missing pieces (like environments and promotions) that teams require when they move to GitOps-based CD.

The Codefresh platform needs to know what your Argo CD instance is doing. This information is retrieved by standard Kubernetes events.

When you want to deploy a new version of your application (or promote it via the Codefresh UI), a commit happens in Git. Argo CD then syncs the changes and publishes the details in the Kubernetes API. 

The event reporter component of the GitOps runtime subscribes to the same Kubernetes API and retrieves all changes that happened to the Argo CD application. The reporter also asks the Argo CD server for the live manifests, since both reside on the same cluster.

All this information is then forwarded to the Codefresh control plane and is accessible to the Codefresh dashboards.

Now you know exactly how that GitOps runtime interacts with your own Argo CD instance. As you can see, there’s nothing you need to change in your existing Argo CD installation.

Conclusion

We hope you now understand better how Codefresh GitOps Cloud works, and, more specifically:

  • The requirements
  • What gets installed in your cluster
  • The role of the Codefresh control plane
  • How network traffic flows between all components
  • What data stays within your premises, and what’s accessed by the Codefresh platform

Ready to start your GitOps journey with Codefresh? Try GitOps Cloud free for 45 days now.

The post Combine the Codefresh GitOps Cloud with your existing Argo CD instance appeared first on Codefresh.

]]>
https://codefresh.io/blog/bring-your-own-argocd/feed/ 0
Introducing Codefresh GitOps Cloud – Seamless environment promotions across clusters using your existing Argo CD https://codefresh.io/blog/introducing-codefresh-gitops-cloud/ https://codefresh.io/blog/introducing-codefresh-gitops-cloud/#respond Sun, 30 Mar 2025 23:48:40 +0000 https://codefresh.io/?p=16872 We’re excited to announce our new Codefresh GitOps Cloud offering that lets you bring your GitOps deployments to the next level. You get:  In this blog post, we’ll dive into some of the key features of Codefresh GitOps Cloud. Multi-environment application promotions with Argo CD As more organizations adopt Argo CD, platform engineers need to […]

The post Introducing Codefresh GitOps Cloud – Seamless environment promotions across clusters using your existing Argo CD appeared first on Codefresh.

]]>
We’re excited to announce our new Codefresh GitOps Cloud offering that lets you bring your GitOps deployments to the next level. You get: 

  • Multi-environment application promotions with Argo CD, including:
    • Full visibility for application engineers
    • GitHub checks to keep everyone informed 
    • Smart concurrency settings to keep changes flowing
    • Complex promotion modeling with workflow hooks and actions
  • The ability to bring your own Argo CD instance or install our Codefresh runtime
  • Straightforward pricing
  • A roadmap of features to come

In this blog post, we’ll dive into some of the key features of Codefresh GitOps Cloud.

Multi-environment application promotions with Argo CD

As more organizations adopt Argo CD, platform engineers need to work out how to implement a promotion workflow between different environments. Developers need to gradually move their application between several pre-production environments to safely test it and confidently deploy it to production. 

If you’re already familiar with Argo CD, you’ll know that Argo CD doesn’t have concepts for environments or promotions. While Argo CD is a great sync engine for deploying application manifests to Kubernetes clusters, it doesn’t understand how a single application gets promoted between environments. This forces many teams to abuse CI tools for environment promotions and resort to custom scripts.

These custom promotion scripts are hard to maintain, difficult to debug, and, most importantly, require time and effort that your organization could spend creating features instead. The problem is even more evident if you have a large number of applications and have adopted microservices across several different Kubernetes clusters. Adding a new environment often needs updates in several places in these scripts. Application engineers can never really self-serve.

But there is a better way: with GitOps Cloud. Connect your existing Argo CD instances to start managing environment promotion without all the custom scripting. With GitOps Cloud, you can compare and promote changes between Argo CD apps, Argo CD instances, and clusters. Promotion workflows can take into account the different requirements of different environments while providing a simple dashboard for at-a-glance visibility, no matter where your apps and environments are deployed.

Codefresh GitOps Cloud makes it easy for application teams to gain full visibility, keeps everyone informed, and keeps changes flowing. You can also model complex workflows with pre/post hooks. Here are some of the features of environment promotion with Codefresh.

Full visibility for application engineers

Unlike using Argo CD on its own, Codefresh GitOps Cloud understands your environments, including different clusters, different namespaces, or any other combination of them. This gives you full visibility on how the same application advances through different environments, even across several Kubernetes clusters and Argo CD installations.

Codefresh makes environment promotion so easy that you can even promote with simple drag-and-drop actions. 

We know every organization has different promotion workflows. You can model your exact promotion process with the workflow editor and then enforce it across your applications as a golden path for your developers with a Kubernetes CRD. Create any kind of serial or parallel flow with a friendly graphical editor, or directly in YAML if you prefer. 
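To make the idea concrete, here is a sketch of what a promotion process expressed as a Kubernetes CRD could look like. The `kind`, API group, and field names below are hypothetical and chosen only for readability; they are not the actual Codefresh schema, so consult the product documentation for the real resource definitions.

```yaml
# Hypothetical sketch of a promotion flow as a Kubernetes custom resource.
# Field names and kinds are illustrative, not the actual Codefresh schema.
apiVersion: codefresh.io/v1alpha1
kind: PromotionFlow
metadata:
  name: my-app-flow
spec:
  trigger:
    environment: dev          # promotions start when the dev environment changes
  environments:
    - name: qa
      dependsOn: [dev]
    - name: staging
      dependsOn: [qa]
    - name: production
      dependsOn: [staging]    # a serial flow; parallel branches are also possible
```

Because the flow lives in a declarative resource like this, it can be enforced as a golden path across many applications instead of being re-implemented in per-team scripts.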

Following a single application as it moves through environments is not enough if you want to scale your Argo CD adoption. You also need a way to trace your applications back to individual changes.

When something breaks down, the first question is always, “What was the last change deployed?” You can answer this question in seconds using the timeline view in Codefresh GitOps Cloud.

The timeline view connects your Argo CD applications with the source code features they contain. This lets you understand exactly what’s deployed in each environment on a feature level. This dashboard solves a major limitation of Argo CD: the lack of visibility into the CI process. Argo CD only knows the version of a container image and nothing else.

For more information on how Codefresh models GitOps environments, please read our dedicated blog posts about:

  • The grouping of different applications into products

Keep everyone informed with GitHub checks

We know application engineers want to avoid context-switching between multiple tools to stay in the flow. With GitHub checks, developers can see what happened with their promotion right where the action happens—on a pull request (PR).

This means that in simple scenarios, developers don’t even need to visit the Codefresh promotion dashboard. Assuming the promotion finishes successfully, they can use the familiar GitHub UI to understand what happened to their deployment.

Keep changes flowing with smart concurrency settings

Frequent deployments are a key characteristic of fast-moving organizations. Several customers have told us that having frequent commits on a big monorepo that holds Argo CD manifests becomes challenging after scaling to a certain size.

The main issue is that Argo CD only synchronizes Kubernetes resources according to Git contents, without any insight into how important or “fresh” a commit is.

Now, in Codefresh GitOps Cloud, you can explicitly define what happens when too many commits land at the same time and a promotion starts while another is already underway.

You choose whether to terminate the previous promotion deployments (keeping only the latest one) or force everything into a queue for a more gradual promotion process without gaps.
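The two behaviors described above could be captured as a concurrency policy on the promotion flow. The field names in this fragment are illustrative only, not the actual Codefresh configuration keys:

```yaml
# Hypothetical concurrency settings on a promotion flow (illustrative fields).
spec:
  concurrency:
    # "terminate": cancel the in-flight promotion and keep only the latest commit
    # "queue":     run promotions one after another so no change is skipped
    policy: terminate
```

Which policy fits depends on your environments: terminating is useful for fast-moving pre-production clusters, while queuing preserves every change for environments that need a full audit trail.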

This feature is a game changer for teams that have adopted GitOps in all their environments and currently have to “gate” developers in the CI system instead of handling this at the CD level.

Model complex promotions with workflow hooks and actions

In several scenarios, you need to accompany your promotions with other supporting tasks that aren’t directly modeled as Kubernetes resources. Some examples are:

  1. Sending a Slack notification when a promotion starts or finishes
  2. Updating a ticket when a feature reaches a specific environment
  3. Waiting for an approval before moving to the next step of the promotion workflow
  4. Performing a verification check before a lengthy migration action

Codefresh promotion workflows let you define several checks before and after a promotion happens. These checks can be anything you want, from simple smoke tests to preflight verifications to comprehensive load testing suites.

A new feature available in Codefresh GitOps Cloud is promotion hooks.

Like traditional pipelines, promotion hooks let you define extra events or requirements that happen when a promotion starts, ends, or fails. For example, you could set up a hook to send a Slack message if a promotion fails, or a webhook call to your Grafana instance when a deployment starts.

Promotion hooks are fully integrated with the promotion flow dashboards. This means they’re saved as YAML (the GitOps way) and you can edit them using a friendly graphical interface.
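Since hooks are saved as YAML, a failure notification like the Slack example above might be sketched roughly as follows. The resource kind, fields, and templating syntax here are hypothetical, and the webhook URL is a placeholder:

```yaml
# Hypothetical promotion hook (illustrative, not the actual Codefresh schema).
apiVersion: codefresh.io/v1alpha1
kind: PromotionHook
metadata:
  name: notify-on-failure
spec:
  on: failure                  # fire only when a promotion fails
  action:
    webhook:
      url: https://hooks.slack.com/services/EXAMPLE   # placeholder endpoint
      body: "Promotion of {{ .app }} to {{ .environment }} failed"
```

Storing hooks this way keeps them versioned alongside the rest of your GitOps configuration, while the graphical editor remains available for teams that prefer not to write YAML by hand.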

Runs on your existing Argo CD infrastructure

For several years, we’ve shipped the Codefresh runtime, which packages all four Argo projects. But some customers have struggled with the migration effort required to get full value from the Codefresh platform: migrating from custom Argo configurations demanded extensive testing to ensure nothing broke.

We’ve been working to overcome this challenge, and we’re excited to provide the ability to connect existing Argo CD infrastructure to Codefresh GitOps Cloud. In this new mode, you can connect your existing Argo CD instance to the Codefresh platform and still get access to all the groundbreaking features, such as environment promotions and workflow hooks. The agent lets you bring your own Argo CD instance and works in a true plug-and-play mode. You install it in minutes, and if you change your mind, you can remove it without affecting your existing Argo CD instance.

You can still choose the Codefresh runtime to manage all Argo services in one bundle—Argo CD, Rollouts, Workflows, and Events. We’ve made it even easier to deploy with a new installation process that reduces the effort required.

These options let you decide what’s best for you—simplified infrastructure management, or easier adoption on top of existing Argo CD infrastructure.

Straightforward pricing

We believe your team shouldn’t have to maintain custom scripts that are difficult to update and debug. You can start using Codefresh GitOps Cloud today with your existing Argo CD instances, starting at $4,170/year. If you purchase by June 1, 2025, you also get 3 bundles for our GitOps training and certification to further enhance your Argo CD experience.

You can try GitOps Cloud for free for 45 days to see how easy it is to solve environment promotion once and for all, while keeping your existing Argo CD investment.

What’s coming next

We know environment promotion is one of the biggest challenges when adopting GitOps. We’re directly involved with the GitOps working group and are active maintainers of Argo CD, leading releases, security patches, and more. This means we see firsthand how teams struggle with promotions and, more specifically, with applications that span different clusters. It’s why we’ve implemented a number of features to help make environment promotion easier for teams.

But environment promotion is just the beginning. We want to make adopting GitOps easy for teams of all sizes. We’re already working on the next major feature for Codefresh GitOps Cloud.

Ready to start your GitOps journey with Codefresh? Try GitOps Cloud free for 45 days now.

You can also join us for a live webinar on April 9 to learn how Codefresh GitOps Cloud simplifies and accelerates multi-environment application promotions using Argo CD. You’ll see how teams can connect multiple Argo CD instances to a single control plane—no extra software required—and get a sneak peek at what’s coming next. Don’t miss this chance to see GitOps Cloud in action and get your questions answered by the product team. Register for the webinar.
