Hi @afrimberger, cc: @trizzi,
Thanks for the detailed feedback!
For UC2 (Docker Desktop error):
@afrimberger, can you share what version of Docker and containerd you're using?
This looks similar to a known issue where containerd v2.2.0+ (shipped with Docker 29.1.x) attempts to access the OCI Referrers API using paths like */referrers/*, which GitLab doesn't recognize. That would also explain why it works with crane but not with docker.
As a temporary workaround, you could downgrade to Docker 29.0.x, but we recognize that's not ideal. A fix is on the way for containerd, and we've also addressed this on GitLab's side.
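To quickly tell whether an installation falls in the suspected range, here is a minimal sketch. The 29.1.0 cutoff is taken from the versions mentioned above, and `affected_docker_version?` is a hypothetical helper, not a GitLab or Docker API:

```ruby
require 'rubygems' # Gem::Version for semantic version comparison

# Docker 29.1.x bundles containerd v2.2.0+, which probes the
# OCI Referrers API with */referrers/* paths.
AFFECTED_FROM = Gem::Version.new('29.1.0')

# Hypothetical helper: true if the given Docker Engine version is in
# the range suspected to trigger the UC2 error.
def affected_docker_version?(version)
  Gem::Version.new(version) >= AFFECTED_FROM
end

puts affected_docker_version?('29.0.3') # false: the suggested downgrade target
puts affected_docker_version?('29.1.1') # true: suspected affected
```

`docker version` on the affected machine would give the actual engine version string to feed into a check like this.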
For UC1 (Kubernetes image pull failure):
You might be right that this is due to the floating tag causing the system to return stale manifests. I recognize that this is a blocker for your use case, so I'll prioritize investigating this.
To help us narrow it down, would you be able to test the following scenarios:
Direct upstream pull: Does pulling the image directly (without the virtual registry) work consistently in your Kubernetes cluster?
Pinned tag via virtual registry: Does pulling a pinned tag (e.g., a specific image SHA) through the virtual registry work? Does the issue only occur with floating tags like main-latest?
Thanks!
Status: Merged and completed
Merged on March 5, this made it to %18.10
%18.10 continues to move at a strong pace! We've closed 5 out of our 12 Deliverable issues, with the remaining issues all on track. We're progressing well toward closing our 3 epics:
These enable us to release the Container Virtual Registry fully in Beta with the following configs:
A recap on our Deliverable issues with the respective statuses:
Thank you and feel free to reach out for any questions/comments/feedback
cc: @crystalpoole @jaime @trizzi @bonnie-tsang @radbatnag @rchanila @iamphill @pskorupa @fmccawley
Minor (non-blocking): I think it's more common to add the full URL that can easily be copy pasted to the browser's address bar. WDYT?
# TODO: Remove this logging in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/195939 when the error path is handled gracefully

Thanks @radbatnag! Looks good
Hi @mkhalifa3, requesting your backend maintainer review if you have capacity, please. Thanks!
When a push fails, two log entries appear for the same request, but they don't obviously connect:
An INFO from the IP restriction enforcer:
message: "Attempting to access IP restricted group"
caller_id: "JwtController#auth"
correlation_id: "9cf7084d218353f7-ATL"
ip_address: "34.148.163.108"
allowed: false
An exception/error log from the unhandled ArgumentError:
exception.class: "ArgumentError"
exception.message: "Unauthorized"
correlation_id: "9cf7084d218353f7-ATL"
controller: "JwtController"
action: "auth"
What's missing: there's nothing that explicitly says "the push protection rule check failed, and here's why." You can infer a connection from the shared correlation ID, but the exception log doesn't tell you what check failed or why the service said unauthorized.
A WARN log now appears between those two, also sharing the same correlation ID:
message: "Container registry push protection rule check failed"
reason: "unauthorized"
error_message: "Unauthorized"
repository_path: "coupehealth/platform-enablement/container-images/..."
project_path: "coupehealth/platform-enablement/container-images/container-terraform"
username: "some-ci-user"
correlation_id: "9cf7084d218353f7-ATL"
Verifies the IP restriction theory. Right now we suspect the 500 is always caused by IP
restrictions, based on one reproduction. With this log, we can search for all occurrences of
message: "Container registry push protection rule check failed" and check whether each one has
a corresponding "Attempting to access IP restricted group" entry with the same correlation ID.
If the match is 1:1, the theory is confirmed. If some do not have a matching IP restriction log,
there's a second cause we haven't found yet.
Distinguishes the reason. The reason field comes directly from ERROR_RESPONSE_UNAUTHORIZED
(reason: :unauthorized). If there were ever a different error path producing a 500 here (e.g. a
database timeout causing can? to blow up), the reason or message would differ, and we'd see it
in the logs. This makes the log useful beyond this specific investigation.
Bridges the two existing log entries. With the correlation ID tying all three together, a Kibana search on a given correlation ID now gives a readable narrative: IP restriction fired → push protection check called and got unauthorized → exception raised → 500. Without the new WARN, step 2 was invisible.
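The 1:1 matching described above can be sketched as a small script over parsed log entries. The field names match the entries shown earlier; the sample lines are illustrative only:

```ruby
require 'json'

PUSH_FAIL   = 'Container registry push protection rule check failed'
IP_RESTRICT = 'Attempting to access IP restricted group'

# Illustrative sample; in practice these lines would come from a Kibana
# export or log/auth.log.
raw_lines = [
  '{"message":"Attempting to access IP restricted group","correlation_id":"9cf7084d218353f7-ATL"}',
  '{"message":"Container registry push protection rule check failed","reason":"unauthorized","correlation_id":"9cf7084d218353f7-ATL"}'
]

entries = raw_lines.map { |line| JSON.parse(line) }

# Correlation IDs that produced an IP restriction log entry
ip_ids = entries.select { |e| e['message'] == IP_RESTRICT }
                .map { |e| e['correlation_id'] }

# Push-protection failures with no matching IP restriction entry
unmatched = entries.select { |e| e['message'] == PUSH_FAIL }
                   .reject { |e| ip_ids.include?(e['correlation_id']) }

# Empty means the theory holds for this sample; any leftovers point to
# a second, unidentified cause.
puts unmatched.empty? # true
```

If `unmatched` stays empty across all occurrences, the IP restriction theory is confirmed; otherwise each leftover entry is a lead on the second cause.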
https://gitlab.com/gitlab-com/request-for-help/-/work_items/4244
NA
The goal is to trigger the real IP restriction code path and confirm the new WARN log appears.
1. Create a group with IP restrictions
In the GitLab UI, set the group's allowed IP range (Settings > General > Permissions and group features > Restrict access by IP address) to `1.2.3.4/32`, an address that will not match your machine, so the restriction fires.
2. Create a container registry protection rule for a project under that group
This is required to hit the protection_rule_for_push_exists? code path at all:
```ruby
# rails console
project = Project.find_by_full_path('your-group/your-project')

ContainerRegistry::Protection::Rule.create!(
  project: project,
  repository_path_pattern: project.full_path,
  minimum_access_level_for_push: :maintainer
)
```
3. Attempt a docker push from your machine
```shell
docker login gdk.test:5000 -u your-username -p your-token
docker build -t gdk.test:5000/your-group/your-project/test:latest .
docker push gdk.test:5000/your-group/your-project/test:latest
```
4. Check the Rails auth log for the new WARN entry
```shell
grep "push protection rule check failed" log/auth.log
```
You should see a structured entry with `reason: unauthorized`, `repository_path`, `project_path`, and `username`, all sharing the same `correlation_id` as the `ArgumentError` exception log.
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Thanks @tigerwnz for the feedback! Replied
Thanks @trizzi!
MR is merged
Self-managed installs will have the feature flag `container_virtual_registries` enabled by default in the %18.10 release.
Thank you for the explanation, @tigerwnz!
I see, so using the same sequence doesn't guarantee unique values. Good point also that iid can be set without the sequence.
Yes, I also think that this is unlikely for this specific migration since we are in beta and not default-enabled for self-managed. But it's good to know that we cannot safely assume this for other features that are more established. Great heads-up also for Cells when it's in place. Thank you.
In the context of this migration, if we are confident that there will be similarly low usage even for self-managed installs that have opted into using the feature, I think we are fine to proceed.
Yes, I would say so. On GitLab.com, there were 53 rows as of last week (a Teleport request is pending to get the latest numbers). And on self-managed, it is also not enabled by default.
@radbatnag, WDYT?