
perf(ci): add Docker buildx layer caching for image builds #19806

Draft
davdhacs wants to merge 19 commits into master from davdhacs/docker-buildx-cache

Conversation

@davdhacs
Contributor

@davdhacs davdhacs commented Apr 2, 2026

Description

Enable GHA buildx cache for Docker image builds in the build workflow. Docker layers (base image pulls, microdnf upgrade, package installs) are cached across CI runs, avoiding redundant work on every build.

How it works

  • scripts/docker-build.sh checks for DOCKER_BUILDX_CACHE env var
  • When set, adds --cache-from type=gha and --cache-to type=gha,mode=max to buildx
  • Cache scoped per image + architecture (e.g., main-amd64, operator-arm64)
  • Opt-in: no effect on local builds (env var not set)
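A minimal sketch of how such a conditional could look in scripts/docker-build.sh (the function name here is illustrative, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch: build the extra buildx flags only when a cache scope is given.
# An empty scope (local builds) produces no flags, leaving the build unchanged.

build_cache_args() {
    local scope="${1:-}"
    if [ -n "$scope" ]; then
        printf '%s' "--cache-from type=gha,scope=${scope} --cache-to type=gha,mode=max,scope=${scope}"
    fi
}

# Example: flags for the amd64 main image build.
echo "flags: $(build_cache_args main-amd64)"
```

The scope string carries the image + architecture pair, so concurrent matrix jobs never overwrite each other's cache entries.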

Images cached

  • build-and-push-main: main, central-db, and roxctl images
  • build-and-push-operator: operator image

Expected impact

The "Build main images" step currently takes ~115s. Most of that time is:

  • Base image pull (~10s)
  • microdnf upgrade --nobest (~40s)
  • microdnf install util-linux (~15s)
  • fetch-stackrox-data.sh (~20s)

With layer caching, these RUN steps are cached after the first build. Only the COPY steps with new binaries rebuild. Expected warm time: ~30-40s (vs 115s cold).
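The arithmetic behind that estimate, assuming the four steps above are the only cacheable work:

```shell
#!/usr/bin/env bash
# Rough sanity check: the four RUN steps above account for ~85s of the
# ~115s cold build, leaving ~30s of COPY work on a warm cache.
cold_total=115
cached=$((10 + 40 + 15 + 20))        # base pull + upgrade + install + fetch
warm_estimate=$((cold_total - cached))
echo "cached work: ${cached}s, expected warm time: ~${warm_estimate}s"
```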

User-facing documentation

  • CHANGELOG not needed
  • documentation PR not needed

Testing and quality

  • the change is production ready
  • CI results are inspected

How I validated my change

CI validation — compare "Build main images" step timing between first run (cold, saves cache) and second run (warm, restores cache).

🤖 Generated with Claude Code

Enable GHA buildx cache for main, roxctl, and operator image builds.
Docker layers (base image pulls, package installs) are cached across
CI runs, avoiding redundant microdnf upgrade/install on every build.

Cache is opt-in via DOCKER_BUILDX_CACHE env var to avoid affecting
local builds. Scoped per image and architecture to prevent collisions.

Expected savings: ~60-80s off the 115s "Build main images" step on
warm runs (package install layers cached).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@openshift-ci

openshift-ci bot commented Apr 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've reviewed your changes and they look great!



@coderabbitai

coderabbitai bot commented Apr 2, 2026


Summary by CodeRabbit

  • Chores
    • Improved CI Docker build caching to speed up image builds and reduce repeated work during automated builds.
  • Refactor
    • Reworked container image build stages to centralize OS-level package setup, reducing redundant package operations and producing leaner runtime images.

Walkthrough

Adds Docker Buildx layer caching support in CI and build scripts, and restructures the RHEL image Dockerfile to move OS-level package installation into a new intermediate stage (base-with-packages) while keeping the final runtime stage lean.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/build.yaml
Sets DOCKER_BUILDX_CACHE to arch-scoped values (main-${{ matrix.arch }}, operator-${{ matrix.arch }}) for relevant build steps.
Build script
scripts/docker-build.sh
Conditionally constructs Buildx cache arguments (--cache-from type=gha, --cache-to type=gha,mode=max,scope=...) when DOCKER_BUILDX_CACHE is non-empty and injects them into docker buildx build.
RHEL image Dockerfile
image/rhel/Dockerfile
Introduces base-with-packages intermediate stage to perform GPG import, RPM installs, microdnf operations, cache cleanup, and ownership setup; final runtime stage now FROM base-with-packages and only copies application artifacts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately and concisely summarizes the main change: enabling Docker buildx layer caching for CI image builds to improve build performance.
  • Description check: ✅ Passed. The PR description is thorough and well-structured. It clearly explains the feature, implementation details, affected images, expected performance impact, and validation approach.
  • Docstring coverage: ✅ Passed. No functions found in the changed files; skipping the docstring coverage check.


@github-actions
Contributor

github-actions bot commented Apr 2, 2026

🚀 Build Images Ready

Images are ready for commit 9096228. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-561-g9096228e8f

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.60%. Comparing base (a73bc3a) to head (9096228).
⚠️ Report is 15 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #19806    +/-   ##
========================================
  Coverage   49.59%   49.60%            
========================================
  Files        2763     2763            
  Lines      208167   208271   +104     
========================================
+ Hits       103250   103312    +62     
- Misses      97252    97292    +40     
- Partials     7665     7667     +2     
Flag Coverage Δ
go-unit-tests 49.60% <ø> (+<0.01%) ⬆️


☔ View full report in Codecov by Sentry.

Move OS package installation (microdnf upgrade, postgres RPMs,
util-linux) into a separate 'base-with-packages' stage. The final
stage uses FROM base-with-packages and only COPYs binaries.

Before: COPY binaries → RUN microdnf (rebuilds packages every commit)
After:  base-with-packages stage (cached) → COPY binaries (fast)

With Docker buildx GHA cache, the package install layer (~60s) is
cached across CI runs. Only the binary COPY steps rebuild per commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@image/rhel/Dockerfile`:
- Around line 25-39: The build fails because the save-dir-contents binary
(image/rhel/static-bin/save-dir-contents) is used in the base-with-packages
stage before it is copied from later stages; copy that static binary into the
base-with-packages stage (add a COPY of image/rhel/static-bin/save-dir-contents
into the base-with-packages stage before the RUN that invokes it, mirroring the
pattern used in scanner/image/scanner/Dockerfile) so the RUN can execute it.
Also add a simple cache-bust/refresh mechanism for the base-with-packages layer
(e.g., introduce an optional build ARG like CACHEBUST or use
--pull/force-refresh in CI) so microdnf upgrade and RPM installs aren’t silently
skipped on warm cached base-with-packages layers.
- Around line 44-52: The Dockerfile calls the save-dir-contents script in the
base-with-packages stage before the script is copied in, causing build failures;
either copy the static-bin directory (or at least the save-dir-contents script)
into the base-with-packages stage before the chown/save-dir-contents invocation,
or move the entire directory-initialization block that calls save-dir-contents
(the chown/mkdir lines) to after the static-bin copy in the final stage where
save-dir-contents is available; locate references to save-dir-contents and the
static-bin copy in the Dockerfile to apply the fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 731e455b-f445-4230-b81b-0f4830c8677a

📥 Commits

Reviewing files that changed from the base of the PR and between c55e335 and c917dc8.

📒 Files selected for processing (1)
  • image/rhel/Dockerfile

Comment on lines +25 to +39
# Install OS packages in a separate stage so Docker can cache this layer
# independently of binary changes. Package installs rarely change, but
# binaries change every commit — this ordering avoids rebuilding packages
# when only binaries change.
FROM ${BASE_REGISTRY}/${BASE_IMAGE}:${BASE_TAG} AS base-with-packages

COPY signatures/RPM-GPG-KEY-CentOS-Official /
COPY --from=downloads /output/rpms/ /tmp/

RUN rpm --import RPM-GPG-KEY-CentOS-Official && \
microdnf -y upgrade --nobest && \
rpm -i --nodeps /tmp/postgres-libs.rpm && \
rpm -i --nodeps /tmp/postgres.rpm && \
microdnf install --setopt=install_weak_deps=0 --nodocs -y util-linux && \
microdnf clean all -y && \

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed against stackrox/stackrox (outputs elided):

  • fd -t f "Dockerfile" | grep -E "(rhel|image)" | head -20
  • cat -n image/rhel/Dockerfile | head -100
  • rg "CACHE_BUST|cache.*bust|--cache-from|BUILD_ARG" image/rhel/ -A 2 -B 2
  • fd -t f -e yml -e yaml -e sh | grep -iE "(workflow|ci|build|deploy|publish)" | head -20
  • rg "save-dir-contents" --max-count 20 -B 2 -A 2
  • find . -type f \( -name "*.yml" -o -name "*.yaml" -o -name "Makefile" -o -name "*.mk" \) | grep -iE "(\.github|\.gitlab|\.circleci|workflow|build|ci)" | head -30
  • fd -t f "save-dir-contents"
  • rg "BASE_TAG|LABEL_VERSION|build" image/rhel/ -B 1 -A 1


Fix critical build failure: save-dir-contents unavailable in base-with-packages stage.

Line 48 invokes save-dir-contents but the binary is not copied until line 77 in the final stage. The save-dir-contents script exists at image/rhel/static-bin/save-dir-contents but must be explicitly copied into the base-with-packages stage before use, similar to the pattern in scanner/image/scanner/Dockerfile. This will cause an immediate build failure.

Additionally, while BASE_TAG defaults to latest, the base-with-packages layer will still be cache-hit on warm builds, preventing microdnf upgrade and Postgres RPM installs from rerunning even when the base image receives security updates. Consider adding a cache-busting mechanism or disabling this optimization in production build workflows.
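One way such a cache-busting mechanism could look (illustrative sketch; assumes the Dockerfile declares an `ARG CACHEBUST` just before the package-install RUN step, which this PR does not yet do):

```shell
#!/usr/bin/env bash
# Sketch: derive a daily cache-bust value so the base-with-packages layer
# is rebuilt at most once per day, letting microdnf upgrade pick up
# security updates even on an otherwise warm GHA cache.

daily_cachebust() {
    date -u +%Y-%m-%d
}

# Illustrative invocation; flags mirror the PR's cache setup.
echo docker buildx build \
    --build-arg "CACHEBUST=$(daily_cachebust)" \
    --cache-from type=gha \
    --cache-to type=gha,mode=max \
    -f image/rhel/Dockerfile .
```

Because the ARG value only changes once a day, intra-day builds still hit the cached layer.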


Comment on lines +44 to +52
# The contents of paths mounted as emptyDir volumes in Kubernetes are saved
# by the script `save-dir-contents` during the image build. The directory
# contents are then restored by the script `restore-all-dir-contents`
# during the container start.
chown -R 4000:4000 /etc/pki/ca-trust && save-dir-contents /etc/pki/ca-trust/source && \
mkdir -p /var/lib/stackrox && chown -R 4000:4000 /var/lib/stackrox && \
mkdir -p /var/log/stackrox && chown -R 4000:4000 /var/log/stackrox && \
mkdir -p /var/cache/stackrox && chown -R 4000:4000 /var/cache/stackrox && \
chown -R 4000:4000 /tmp

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed against stackrox/stackrox (outputs elided):

  • cat image/rhel/Dockerfile | head -100
  • rg "save-dir-contents" -l
  • rg "save-dir-contents" -B 3 -A 3
  • git ls-files | grep -i "static-bin"
  • cat image/rhel/Dockerfile | sed -n '25,80p'
  • ls -la image/rhel/static-bin/save-dir-contents


save-dir-contents is unavailable at line 48—the build will fail.

The save-dir-contents script resides in image/rhel/static-bin/ but is not copied into the base-with-packages stage before line 48 attempts to call it. The script is only copied at line 72, which occurs in the final stage—after the base-with-packages stage has already completed. Move the directory-initialization block (lines 48–52) to after line 72 where the script is available, or copy the script into this stage explicitly before line 48.


davdhacs and others added 15 commits on April 2, 2026 at 18:07
The RUN step calls save-dir-contents which is in static-bin/.
Copy just the helper scripts needed for package installation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace make docker-build-main-image with docker/build-push-action@v6
which handles GHA buildx cache natively. The action manages cache
tokens and builder configuration that our docker-build.sh wrapper
was missing.

The base-with-packages Dockerfile stage (package installs) should
now cache across CI runs, skipping microdnf upgrade/install when
only binaries change.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Each COPY --link creates an independent overlay layer. Changing one
binary (e.g. bin/central) doesn't invalidate other COPY layers
(ui, static-data, etc). Combined with GHA buildx cache, this means
only the changed binaries need to be re-copied on warm builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use push: true instead of load: true on docker/build-push-action.
This pushes layers directly to the registry from buildkit, avoiding
the slow --load export to local docker (~90s overhead even with all
layers cached).

The main image is now built and pushed in one step. roxctl and
central-db still use the separate push step.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The container env vars removed by Tomecz's PR are no longer available
in the workflow. Use secrets.* directly in docker/login-action.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The push-main-manifests step expects per-arch tags (e.g. main:tag-amd64)
to create multi-arch manifest lists.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Passing registry: quay.io/org to docker/login-action doesn't work for
quay.io. Use the existing registry_rw_login helper, which handles
quay.io authentication correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…r stackrox-io

docker/login-action can only authenticate to one quay.io org at a time.
Push main to rhacs-eng via build-push-action (fast, cached layers).
Use existing push_main_image_set for stackrox-io and other images
(handles multi-org login correctly by re-authenticating per org).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
build-push-action pushes directly to registry without loading locally.
The existing push_main_image_set expects local images. Instead:
- Push roxctl/central-db from local docker (built by make)
- Copy main from rhacs-eng to stackrox-io via skopeo

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
push:true creates manifest lists instead of plain images, breaking
the multi-arch manifest creation step. Revert to load:true which
loads into local docker and uses the existing push_main_image_set
pipeline unchanged.

GHA layer cache still works with load:true (17 cached layers confirmed).
Build step: 105s warm vs 110s baseline (5% faster). The --load export
overhead limits the savings but the Dockerfile restructuring and
COPY --link provide the foundation for future improvements.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use push:true + provenance:false to push main image directly from
buildx to the registry. provenance:false produces a plain image
manifest (not a manifest list), compatible with the downstream
push-main-manifests job that creates multi-arch manifest lists.

Login to both quay.io orgs before the build step so buildx can push
to both registries. roxctl and central-db still use docker push
(built locally by make).

Expected: Build+push main image ~55s (vs 105s with load:true).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…g copy

buildx's docker-container driver doesn't share host docker credentials.
Use docker/login-action (which injects creds into the buildx builder)
for rhacs-eng push. Copy main to stackrox-io via skopeo (lightweight,
blobs shared on quay.io).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
docker login can only hold one quay.io credential at a time. Use
skopeo --src-creds and --dest-creds to authenticate to both orgs
simultaneously for the rhacs-eng → stackrox-io copy.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
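A sketch of what that dual-credential copy could look like (credential variable names are illustrative, not the workflow's actual secrets):

```shell
#!/usr/bin/env bash
# Sketch: copy the freshly pushed main image from rhacs-eng to stackrox-io,
# authenticating to each quay.io org independently. quay.io deduplicates
# blobs server-side, so the copy is lightweight.
tag="4.11.x-561-g9096228e8f"

cmd="skopeo copy \
  --src-creds \${RHACS_ENG_USER}:\${RHACS_ENG_TOKEN} \
  --dest-creds \${STACKROX_IO_USER}:\${STACKROX_IO_TOKEN} \
  docker://quay.io/rhacs-eng/main:${tag} \
  docker://quay.io/stackrox-io/main:${tag}"

echo "$cmd"
```

Unlike docker login, which keeps one credential per registry host, skopeo takes both credentials on the command line, so no re-login dance is needed.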