perf(ci): add Docker buildx layer caching for image builds#19806
Conversation
Enable GHA buildx cache for main, roxctl, and operator image builds. Docker layers (base image pulls, package installs) are cached across CI runs, avoiding redundant microdnf upgrade/install on every build. Cache is opt-in via DOCKER_BUILDX_CACHE env var to avoid affecting local builds. Scoped per image and architecture to prevent collisions. Expected savings: ~60-80s off the 115s "Build main images" step on warm runs (package install layers cached). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
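The opt-in wiring described above might look like the following sketch. Only the `DOCKER_BUILDX_CACHE` variable name and the per-image/per-arch scoping come from this PR; the function and variable names are assumptions, not the real contents of `scripts/docker-build.sh`:

```shell
#!/bin/sh
# Sketch of opt-in GHA buildx cache flags, assuming hypothetical names.
# Emits cache flags only when DOCKER_BUILDX_CACHE is set, scoped per
# image and architecture to prevent cache collisions between builds.
build_cache_args() {
    image="$1"
    arch="$2"
    if [ -n "${DOCKER_BUILDX_CACHE:-}" ]; then
        scope="${image}-${arch}"
        printf -- '--cache-from type=gha,scope=%s --cache-to type=gha,mode=max,scope=%s' \
            "$scope" "$scope"
    fi
}

# With the env var set, the buildx invocation gains cache flags:
DOCKER_BUILDX_CACHE=1
export DOCKER_BUILDX_CACHE
echo "docker buildx build $(build_cache_args main amd64) -t main:latest ."
```

Without `DOCKER_BUILDX_CACHE` in the environment, the function emits nothing, so local builds are unaffected.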
Skipping CI for Draft Pull Request.
📝 Summary by CodeRabbit

Walkthrough: Adds Docker Buildx layer caching support in CI and build scripts, and restructures the RHEL image Dockerfile to move OS-level package installation into a new intermediate stage (`base-with-packages`).
🚀 Build Images Ready: images are ready for commit 9096228. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-561-g9096228e8f
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##           master   #19806     +/-  ##
=========================================
  Coverage   49.59%   49.60%
=========================================
  Files        2763     2763
  Lines      208167   208271    +104
=========================================
+ Hits       103250   103312     +62
- Misses      97252    97292     +40
- Partials     7665     7667      +2
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Move OS package installation (microdnf upgrade, postgres RPMs, util-linux) into a separate 'base-with-packages' stage. The final stage uses FROM base-with-packages and only COPYs binaries.

Before: COPY binaries → RUN microdnf (rebuilds packages every commit)
After: base-with-packages stage (cached) → COPY binaries (fast)

With Docker buildx GHA cache, the package install layer (~60s) is cached across CI runs. Only the binary COPY steps rebuild per commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
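The reordering can be sketched as follows. The stage name and package commands come from this PR's diff; the binary path in the final stage is illustrative:

```dockerfile
ARG BASE_REGISTRY
ARG BASE_IMAGE
ARG BASE_TAG

# Before: binaries were copied ahead of package installs, so any binary
# change invalidated the expensive microdnf layer below it.
#   COPY bin/ /stackrox/bin/
#   RUN microdnf -y upgrade ... && microdnf install ...

# After: packages live in their own stage and cache independently.
FROM ${BASE_REGISTRY}/${BASE_IMAGE}:${BASE_TAG} AS base-with-packages
RUN microdnf -y upgrade --nobest && \
    microdnf install --setopt=install_weak_deps=0 --nodocs -y util-linux && \
    microdnf clean all -y

# Final stage: only cheap COPY steps rebuild per commit.
FROM base-with-packages
COPY bin/central /stackrox/central
```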
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@image/rhel/Dockerfile`:
- Around line 25-39: The build fails because the save-dir-contents binary
(image/rhel/static-bin/save-dir-contents) is used in the base-with-packages
stage before it is copied from later stages; copy that static binary into the
base-with-packages stage (add a COPY of image/rhel/static-bin/save-dir-contents
into the base-with-packages stage before the RUN that invokes it, mirroring the
pattern used in scanner/image/scanner/Dockerfile) so the RUN can execute it.
Also add a simple cache-bust/refresh mechanism for the base-with-packages layer
(e.g., introduce an optional build ARG like CACHEBUST or use
--pull/force-refresh in CI) so microdnf upgrade and RPM installs aren’t silently
skipped on warm cached base-with-packages layers.
- Around line 44-52: The Dockerfile calls the save-dir-contents script in the
base-with-packages stage before the script is copied in, causing build failures;
either copy the static-bin directory (or at least the save-dir-contents script)
into the base-with-packages stage before the chown/save-dir-contents invocation,
or move the entire directory-initialization block that calls save-dir-contents
(the chown/mkdir lines) to after the static-bin copy in the final stage where
save-dir-contents is available; locate references to save-dir-contents and the
static-bin copy in the Dockerfile to apply the fix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 731e455b-f445-4230-b81b-0f4830c8677a
📒 Files selected for processing (1)
image/rhel/Dockerfile
```dockerfile
# Install OS packages in a separate stage so Docker can cache this layer
# independently of binary changes. Package installs rarely change, but
# binaries change every commit — this ordering avoids rebuilding packages
# when only binaries change.
FROM ${BASE_REGISTRY}/${BASE_IMAGE}:${BASE_TAG} AS base-with-packages

COPY signatures/RPM-GPG-KEY-CentOS-Official /
COPY --from=downloads /output/rpms/ /tmp/

RUN rpm --import RPM-GPG-KEY-CentOS-Official && \
    microdnf -y upgrade --nobest && \
    rpm -i --nodeps /tmp/postgres-libs.rpm && \
    rpm -i --nodeps /tmp/postgres.rpm && \
    microdnf install --setopt=install_weak_deps=0 --nodocs -y util-linux && \
    microdnf clean all -y && \
```
Fix critical build failure: save-dir-contents unavailable in base-with-packages stage.
Line 48 invokes save-dir-contents but the binary is not copied until line 77 in the final stage. The save-dir-contents script exists at image/rhel/static-bin/save-dir-contents but must be explicitly copied into the base-with-packages stage before use, similar to the pattern in scanner/image/scanner/Dockerfile. This will cause an immediate build failure.
Additionally, while BASE_TAG defaults to latest, the base-with-packages layer will still be cache-hit on warm builds, preventing microdnf upgrade and Postgres RPM installs from rerunning even when the base image receives security updates. Consider adding a cache-busting mechanism or disabling this optimization in production build workflows.
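One common shape for the cache-busting mechanism the reviewer suggests (a sketch only; `CACHEBUST` is the reviewer's example name, not something this PR implements):

```dockerfile
ARG BASE_REGISTRY
ARG BASE_IMAGE
ARG BASE_TAG
FROM ${BASE_REGISTRY}/${BASE_IMAGE}:${BASE_TAG} AS base-with-packages

# Changing CACHEBUST in CI (e.g. --build-arg CACHEBUST=$(date +%F))
# changes the cache key of every instruction that consumes it, forcing
# the package layer below to re-run even on a warm cache.
ARG CACHEBUST=default
RUN echo "cache-bust: ${CACHEBUST}" && \
    microdnf -y upgrade --nobest && \
    microdnf clean all -y
```

Passing a date-based value once a day would bound how stale the cached `microdnf upgrade` layer can get while keeping within-day builds fully cached.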
```dockerfile
# The contents of paths mounted as emptyDir volumes in Kubernetes are saved
# by the script `save-dir-contents` during the image build. The directory
# contents are then restored by the script `restore-all-dir-contents`
# during the container start.
    chown -R 4000:4000 /etc/pki/ca-trust && save-dir-contents /etc/pki/ca-trust/source && \
    mkdir -p /var/lib/stackrox && chown -R 4000:4000 /var/lib/stackrox && \
    mkdir -p /var/log/stackrox && chown -R 4000:4000 /var/log/stackrox && \
    mkdir -p /var/cache/stackrox && chown -R 4000:4000 /var/cache/stackrox && \
    chown -R 4000:4000 /tmp
```
save-dir-contents is unavailable at line 48—the build will fail.
The save-dir-contents script resides in image/rhel/static-bin/ but is not copied into the base-with-packages stage before line 48 attempts to call it. The script is only copied at line 72, which occurs in the final stage—after the base-with-packages stage has already completed. Move the directory-initialization block (lines 48–52) to after line 72 where the script is available, or copy the script into this stage explicitly before line 48.
The RUN step calls save-dir-contents which is in static-bin/. Copy just the helper scripts needed for package installation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace make docker-build-main-image with docker/build-push-action@v6 which handles GHA buildx cache natively. The action manages cache tokens and builder configuration that our docker-build.sh wrapper was missing. The base-with-packages Dockerfile stage (package installs) should now cache across CI runs, skipping microdnf upgrade/install when only binaries change. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
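The workflow step might look roughly like this. The action version and the cache settings follow the commit message; the step name, context, Dockerfile path, and tag are assumptions:

```yaml
- name: Build main image
  uses: docker/build-push-action@v6
  with:
    context: .
    file: image/rhel/Dockerfile
    tags: quay.io/rhacs-eng/main:${{ env.MAIN_IMAGE_TAG }}
    # The PR later experiments with both load: true and push: true here.
    load: true
    cache-from: type=gha,scope=main-amd64
    cache-to: type=gha,mode=max,scope=main-amd64
```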
Each COPY --link creates an independent overlay layer. Changing one binary (e.g. bin/central) doesn't invalidate other COPY layers (ui, static-data, etc). Combined with GHA buildx cache, this means only the changed binaries need to be re-copied on warm builds. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
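Illustratively (the destination paths are assumptions), each `--link` layer is independent of the layers before it:

```dockerfile
# Each --link layer is built as an independent overlay. Changing
# bin/central re-creates only its own layer; the ui/ and static-data/
# layers are reused from cache unchanged.
COPY --link bin/central /stackrox/central
COPY --link ui/build /ui/
COPY --link static-data /stackrox/static-data/
```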
Use push: true instead of load: true on docker/build-push-action. This pushes layers directly to the registry from buildkit, avoiding the slow --load export to local docker (~90s overhead even with all layers cached). The main image is now built and pushed in one step. roxctl and central-db still use the separate push step. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The registry credentials that Tomecz's PR removed from the container are no longer available as env vars. Use secrets.* directly in docker/login-action. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The push-main-manifests step expects per-arch tags (e.g. main:tag-amd64) to create multi-arch manifest lists. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
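The downstream step then combines the per-arch tags into a single manifest list, along these lines (a sketch; the real push-main-manifests step may use different tooling, and the tag is the example from this PR's build comment):

```shell
TAG="4.11.x-561-g9096228e8f"
docker manifest create "quay.io/rhacs-eng/main:${TAG}" \
    "quay.io/rhacs-eng/main:${TAG}-amd64" \
    "quay.io/rhacs-eng/main:${TAG}-arm64"
docker manifest push "quay.io/rhacs-eng/main:${TAG}"
```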
docker/login-action with registry: quay.io/org doesn't work for quay.io. Use the existing registry_rw_login helper which handles quay.io authentication correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
docker/login-action can only authenticate to one quay.io org at a time. Push main to rhacs-eng via build-push-action (fast, cached layers). Use existing push_main_image_set for stackrox-io and other images (handles multi-org login correctly by re-authenticating per org). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
build-push-action pushes directly to registry without loading locally. The existing push_main_image_set expects local images. Instead: - Push roxctl/central-db from local docker (built by make) - Copy main from rhacs-eng to stackrox-io via skopeo Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
push:true creates manifest lists instead of plain images, breaking the multi-arch manifest creation step. Revert to load:true which loads into local docker and uses the existing push_main_image_set pipeline unchanged. GHA layer cache still works with load:true (17 cached layers confirmed). Build step: 105s warm vs 110s baseline (5% faster). The --load export overhead limits the savings but the Dockerfile restructuring and COPY --link provide the foundation for future improvements. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use push:true + provenance:false to push main image directly from buildx to the registry. provenance:false produces a plain image manifest (not a manifest list), compatible with the downstream push-main-manifests job that creates multi-arch manifest lists. Login to both quay.io orgs before the build step so buildx can push to both registries. roxctl and central-db still use docker push (built locally by make). Expected: Build+push main image ~55s (vs 105s with load:true). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
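A sketch of the direct-push configuration (tag format and scope names are assumptions; `push` and `provenance` follow the commit message):

```yaml
- name: Build and push main image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    # provenance: false keeps the output a plain image manifest rather
    # than a manifest list, so the downstream push-main-manifests job
    # can itself wrap the per-arch tags into a multi-arch list.
    provenance: false
    tags: quay.io/rhacs-eng/main:${{ env.MAIN_IMAGE_TAG }}-amd64
    cache-from: type=gha,scope=main-amd64
    cache-to: type=gha,mode=max,scope=main-amd64
```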
buildx's docker-container driver doesn't share host docker credentials. Use docker/login-action (which injects creds into the buildx builder) for rhacs-eng push. Copy main to stackrox-io via skopeo (lightweight, blobs shared on quay.io). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
docker login can only hold one quay.io credential at a time. Use skopeo --src-creds and --dest-creds to authenticate to both orgs simultaneously for the rhacs-eng → stackrox-io copy. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
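The dual-credential copy might look like this sketch (the credential variable names are assumptions; `--src-creds`/`--dest-creds` let one skopeo invocation authenticate to both quay.io orgs at once):

```shell
# Copy the freshly pushed image from rhacs-eng to stackrox-io. Quay
# deduplicates shared blobs, so this is a lightweight metadata copy.
skopeo copy \
    --src-creds  "${RHACS_ENG_USER}:${RHACS_ENG_PASSWORD}" \
    --dest-creds "${STACKROX_IO_USER}:${STACKROX_IO_PASSWORD}" \
    "docker://quay.io/rhacs-eng/main:${MAIN_IMAGE_TAG}" \
    "docker://quay.io/stackrox-io/main:${MAIN_IMAGE_TAG}"
```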
Description

Enable GHA buildx cache for Docker image builds in the build workflow. Docker layers (base image pulls, microdnf upgrade, package installs) are cached across CI runs, avoiding redundant work on every build.

How it works

- `scripts/docker-build.sh` checks for the `DOCKER_BUILDX_CACHE` env var
- When set, it passes `--cache-from type=gha` and `--cache-to type=gha,mode=max` to buildx
- Cache scopes are per image and architecture (e.g. `main-amd64`, `operator-arm64`)

Images cached

- `build-and-push-main`: main image + central-db image
- `build-and-push-main`: roxctl image
- `build-and-push-operator`: operator image

Expected impact

The "Build main images" step currently takes ~115s. Most of that time is:

- `microdnf upgrade --nobest` (~40s)
- `microdnf install util-linux` (~15s)
- `fetch-stackrox-data.sh` (~20s)

With layer caching, these RUN steps are cached after the first build. Only the COPY steps with new binaries rebuild. Expected warm time: ~30-40s (vs 115s cold).

User-facing documentation
Testing and quality
How I validated my change
CI validation — compare "Build main images" step timing between first run (cold, saves cache) and second run (warm, restores cache).
🤖 Generated with Claude Code