Exclude “default” build-args from image history #31584
Exclude “default” build-args from image history #31584vdemeester merged 1 commit intomoby:masterfrom
Conversation
|
Thanks! ping @tonistiigi ptal; IIRC you mentioned that the build arg should be included if it's explicitly defined in the Dockerfile, but perhaps I misunderstood |
|
Yes, I think adding only explicit ones is slightly better than skipping the builtin ones. |
713e45f to
d829e47
Compare
|
@thaJeztah @tonistiigi updated to only remove from history if is a builtin arg and a corresponding |
|
If I'm following this correctly, I disagree with this PR. In particular this:
A changed HTTP_PROXY MUST change the image being built. We have no idea what RUN cmd will actually make use of this variable and in what way. For all we know the RUN cmd will echo the entire list of env vars into a file and save that in the image. I would hope a changed HTTP_PROXY would cause a new image to be built. |
|
@duglin Honestly I think this is fairly surprising behavior. The cache is there to help the user get faster builds. If the user wants to break the cache, they certainly can do so manually, meanwhile it's not possible to not break the build cache if the user doesn't want to break it. |
|
We're discussing this again in the maintainers meeting, and we think that busting the cache when switching to a different proxy is really a corner case. If someone wants it in the history, they need to define it in the |
|
Not to mention that if we started to itemize all the ways environmental variations could affect an image (example: switching networking config on the system hosting the daemon and getting a different DNS server without a recent (hours old) update to a CNAME/A record for a site used to download code), we would be able to never satisfy ourselves that cacheing was even a reasonable commitment based on the information we do know. |
|
In the meeting we also said that this should cover defaults such as HTTP_PROXY, NO_PROXY, FTP_PROXY and HTTPS_PROXY. Any new build-args in the future should be excluded from this behavior. |
|
My comments were not about allowing people to bust the cache, it was about making sure that when the cache is used its because we know for sure that its because we have the exact same inputs. Changing env vars available to a RUN are supposed to bust the cache. |
|
@dave-tucker can you please add a comment where the list of default build args are defined, so that if we ever add more default build args, it is understood that this new behavior is applied to that whole list. And if that is not desired for some newly added args, then there will have to be two different lists: "default" one and the "exclude_from_history" one. Today there is no reason to make them separate. |
|
Maybe @cyli or @diogomonica could have a look at this too ? |
|
To be clear.. if I have: and in doit.sh I have: Then with this PR, building once with HTTP_PROXY set and once w/o will yield the exact same results because the cache will be used in the 2nd run even though it should not have been. |
|
@duglin we discussed using |
|
I feel like we're talking about different things here. How does Let's back up... first, in order for HTTP_PROXY to be used as a build-arg it needs to be specified on the So, would people really expect (ignoring any config file): and to yield the same results if they used my "doit.sh" Dockerfile example above? I would claim no. Asking the user to know how an image was previously built so that they could add |
|
@duglin Why would anyone use your This is exactly what people are expecting to get the same results; the months spent arguing over even giving users |
|
I'm not suggesting that we can or should try to take all aspects of a container into account. I've used this: as an example of something people shouldn't do in their Dockerfiles if they really want "build.timestamp" to be the last time they ran the build because caching will muck with that logic. And in cases like that, the limitation is easy to understand... Docker doesn't know what's going on in the cmd string so it can't make an educated guess as to whether or not to use the cache (at least for the cmd string part of it). This case is different, as I mentioned above. This is about 1) an env var that the user is explicitly mentioning and 2) an env var the Docker is setting in the container, which means it knows whether its been changed. I've heard people complain about how we have 2 types of env vars during builds: |
|
The proposal is that if the ARG HTTP_PROXY
COPY doit.sh /
RUN /doit.shIf it's defined in the Dockerfile, it's an explicit choice of the image-author, and is not ignored in the cache. There probably are edge cases where a different proxy should produce a different result, but my expectation is that 99.9% of the cases, a proxy is just needed to give people internet access. Producing a different image because a proxy was used for those case, would be the same as invalidating the cache if the IP-address changes (/different network), or the DNS server changed (which would far more likely being a differentiator when building) |
|
@tiborvass the list of build args used comes from https://github.com/docker/docker/blob/master/builder/dockerfile/builder.go#L40. This is the only place these are defined in the code so this approach should be still be ok if new args are added. |
|
@dave-tucker we were discussing, and thought it would be good to put a comment there so that in case new "default" variables are added in future, we should review if those should be stripped from history as well, or if we should "split" the list at that point |
d829e47 to
71b979f
Compare
|
@thaJeztah @tiborvass added some comments to the code! |
|
LGTM |
4f9794f to
c20aa7f
Compare
|
Addressed @thaJeztah's comments. |
docs/reference/builder.md
Outdated
There was a problem hiding this comment.
Is this actually un-knowable? Things like "may be" inject uncertainty "Will they be visible? Won't they? How to know?" Can we explain when they would or wouldn't? Maybe we can just link to the section below? And instead of "may be" change it to "are visible if you specify them as ARG lines in the Dockerfile?
docs/reference/builder.md
Outdated
There was a problem hiding this comment.
Please use US spelling of "behavior". :) Also prefer active voice "To override this behavior, provide an explicit...."
docs/reference/builder.md
Outdated
There was a problem hiding this comment.
Isn't this the same content as above? Why not have the content one place and then provide a link at the other place?
docs/reference/builder.md
Outdated
There was a problem hiding this comment.
argh, missed this one; s/exluded/excluded/
c20aa7f to
057d190
Compare
|
Updated docs to address comments from @mstanleyjones and @thaJeztah |
thaJeztah
left a comment
There was a problem hiding this comment.
Have some suggestions, but I'm ok with addressing them in a follow up 😅
docs/reference/builder.md
Outdated
docs/reference/builder.md
Outdated
There was a problem hiding this comment.
I'd not refer to "both" docker files here, because the other Dockerfile is now move after the next paragraph. Perhaps just refer to the first one.
There was a problem hiding this comment.
Would something like this work?
By default, these pre-defined variables are excluded from the output of
`docker history`. Excluding them reduces the risk of accidentally leaking
sensitive authentication information in an `HTTP_PROXY` variable.
For example, consider building the following Dockerfile using
`--build-arg HTTP_PROXY=http://user:[email protected]`
``` Dockerfile
FROM ubuntu
RUN echo "Hello World"
```
In this case, the value of the `HTTP_PROXY` variable is not available in the
`docker history` and is not cached. If you were to change location, and your
proxy server changed to `http://user:[email protected]`, a subsequent
build does not result in a cache miss.
If you need to override this behaviour then you may do so by adding an `ARG`
statement in the Dockerfile as follows;
``` Dockerfile
FROM ubuntu
ARG HTTP_PROXY
RUN echo "Hello World"
```
When building this Dockerfile, the `HTTP_PROXY` is preserved in the
`docker history`, and changing its value invalidates the build cache.
There was a problem hiding this comment.
+1 this sounds much better.
Removes the build-args from the image history if they are in the BuiltinAllowedBuildArgs map unless they are explicitly defined in an ARG instruction. Signed-off-by: Dave Tucker <[email protected]>
057d190 to
89a2a88
Compare
|
@mstanleyjones @thaJeztah thanks for the suggestions! Updated once again... |
- What I did
Based on the discussion in #30588 (comment). This PR removes any of the "default" build args from the image history.
This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the
HTTP_PROXYvariable.- How I did it
Edited the builder code to make sure that the any
defaultvariables are filtered from thesaveCmdunless there is a corresponding ARG statement`- How to verify it
Use the
forcetests Luke...Or alternatively.
Dockerfile:
Then
The
HTTP_PROXYvariable will be absent from the history buthttp_proxywill be present.- Description for the changelog
The values of default build time arguments (e.g
HTTP_PROXY) are no longer displayed indocker image historyunless a correspondingARGinstruction is written in the Dockerfile.- A picture of a cute animal (not mandatory but encouraged)