Skip to content

Exclude “default” build-args from image history #31584

Merged
vdemeester merged 1 commit intomoby:masterfrom
dave-tucker:ignore-default-build-args
Mar 22, 2017
Merged

Exclude “default” build-args from image history #31584
vdemeester merged 1 commit intomoby:masterfrom
dave-tucker:ignore-default-build-args

Conversation

@dave-tucker
Copy link
Contributor

@dave-tucker dave-tucker commented Mar 6, 2017

- What I did

Based on the discussion in #30588 (comment). This PR removes any of the "default" build args from the image history.
This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the HTTP_PROXY variable.

- How I did it

Edited the builder code to make sure that the any default variables are filtered from the saveCmd unless there is a corresponding ARG statement`

- How to verify it

Use the force tests Luke...

Or alternatively.

Dockerfile:

FROM busybox
ARG http_proxy
RUN echo "Test build arg filterage"

Then

$ docker build -t ba-test --build-arg="HTTP_PROXY=http://user:[email protected]" --build-arg="http_proxy=http://someproxy.example.com" .
$ docker image history ba-test

The HTTP_PROXY variable will be absent from the history but http_proxy will be present.

- Description for the changelog

The values of default build time arguments (e.g HTTP_PROXY) are no longer displayed in docker image history unless a corresponding ARG instruction is written in the Dockerfile.

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah
Copy link
Member

Thanks! ping @tonistiigi ptal; IIRC you mentioned that the build arg should be included if it's explicitly defined in the Dockerfile, but perhaps I misunderstood

@tonistiigi
Copy link
Member

Yes, I think adding only explicit ones is slightly better than skipping the builtin ones.

@dave-tucker dave-tucker force-pushed the ignore-default-build-args branch from 713e45f to d829e47 Compare March 7, 2017 09:50
@dave-tucker
Copy link
Contributor Author

@thaJeztah @tonistiigi updated to only remove from history if is a builtin arg and a corresponding ARG instruction is not present in the Dockerfile.

@duglin
Copy link
Contributor

duglin commented Mar 7, 2017

If I'm following this correctly, I disagree with this PR. In particular this:

This is due to these being considered transparent - i.e the result of an image shouldn't differ based on the HTTP_PROXY variable.

A changed HTTP_PROXY MUST change the image being built. We have no idea what RUN cmd will actually make use of this variable and in what way. For all we know the RUN cmd will echo the entire list of env vars into a file and save that in the image. I would hope a changed HTTP_PROXY would cause a new image to be built.

@cpuguy83
Copy link
Member

cpuguy83 commented Mar 9, 2017

@duglin Honestly I think this is fairly surprising behavior. The cache is there to help the user get faster builds. If the user wants to break the cache, they certainly can do so manually, meanwhile it's not possible to not break the build cache if the user doesn't want to break it.

@thaJeztah
Copy link
Member

We're discussing this again in the maintainers meeting, and we think that busting the cache when switching to a different proxy is really a corner case. If someone wants it in the history, they need to define it in the Dockerfile. If they need to explicitly bust the cache, --no-cache is probably the best approach.

@estesp
Copy link
Contributor

estesp commented Mar 9, 2017

Not to mention that if we started to itemize all the ways environmental variations could affect an image (example: switching networking config on the system hosting the daemon and getting a different DNS server without a recent (hours old) update to a CNAME/A record for a site used to download code), we would be able to never satisfy ourselves that cacheing was even a reasonable commitment based on the information we do know.

@alexellis
Copy link
Contributor

In the meeting we also said that this should cover defaults such as HTTP_PROXY, NO_PROXY, FTP_PROXY and HTTPS_PROXY. Any new build-args in the future should be excluded from this behavior.

@duglin
Copy link
Contributor

duglin commented Mar 9, 2017

My comments were not about allowing people to bust the cache, it was about making sure that when the cache is used its because we know for sure that its because we have the exact same inputs. Changing env vars available to a RUN are supposed to bust the cache.

@tiborvass
Copy link
Contributor

@dave-tucker can you please add a comment where the list of default build args are defined, so that if we ever add more default build args, it is understood that this new behavior is applied to that whole list. And if that is not desired for some newly added args, then there will have to be two different lists: "default" one and the "exclude_from_history" one. Today there is no reason to make them separate.

@tiborvass
Copy link
Contributor

Maybe @cyli or @diogomonica could have a look at this too ?

@duglin
Copy link
Contributor

duglin commented Mar 9, 2017

To be clear.. if I have:

RUN doit.sh

and in doit.sh I have:

if [[ -n "$HTTP_PROXY" ]; then
   app1
else
   app2
fi

Then with this PR, building once with HTTP_PROXY set and once w/o will yield the exact same results because the cache will be used in the 2nd run even though it should not have been.

@alexellis
Copy link
Contributor

@duglin we discussed using --no-cache for this. As a long-term proxy user - this would mean changing from the office network to hotel WiFi would invalidate all my images' cache if I did a rebuild.

@duglin
Copy link
Contributor

duglin commented Mar 9, 2017

I feel like we're talking about different things here. How does --no-cache help someone - aside from basically having them start from scratch?

Let's back up... first, in order for HTTP_PROXY to be used as a build-arg it needs to be specified on the docker build command line. This means the user is explicitly aware of it. Now, I realize that with this PR we're making it implicit but to me a config file property is just a short-cut so that people don't need to type something over and over - it shouldn't change the semantics of how things work.

So, would people really expect (ignoring any config file):

docker build --build-arg HTTP_PROXY=myproxy.com .

and

docker build .

to yield the same results if they used my "doit.sh" Dockerfile example above? I would claim no.

Asking the user to know how an image was previously built so that they could add --no-cache when the issue is related to an env var the Docker itself knows about and has total control over its value in the container seems more than a little odd to me.

@estesp
Copy link
Contributor

estesp commented Mar 9, 2017

@duglin Why would anyone use your doit.sh example..it is quite remarkably a stupid thing to do that no one has any example of anyone doing :)

This is exactly what people are expecting to get the same results; the months spent arguing over even giving users http_proxy was an understanding that for some people to have "normal" internet access, this is a required feature. What has come up now is that many corporations have usernames and passwords embedded in this variable which are now stored in their (possibly public) images on DockerHub as a historical build layer. That is extremely unnecessary for a feature that is merely giving them access to the external internet and was never designed and should never be used to make build-time decisions. It's like saying I might decide to do something different in my Dockerfile because the TZ setting is different. The principle of least surprise is trying to work in our favor here. Some people like to come up with far-fetched reasons to surprise people with behavior, but I'm guessing that kind of Dockerfile is not exactly going to be a viral hit :)

@duglin
Copy link
Contributor

duglin commented Mar 9, 2017

I'm not suggesting that we can or should try to take all aspects of a container into account. I've used this:

RUN date > build.timestamp

as an example of something people shouldn't do in their Dockerfiles if they really want "build.timestamp" to be the last time they ran the build because caching will muck with that logic.

And in cases like that, the limitation is easy to understand... Docker doesn't know what's going on in the cmd string so it can't make an educated guess as to whether or not to use the cache (at least for the cmd string part of it).

This case is different, as I mentioned above. This is about 1) an env var that the user is explicitly mentioning and 2) an env var the Docker is setting in the container, which means it knows whether its been changed.

I've heard people complain about how we have 2 types of env vars during builds:
1 - known, saved in the image, part of cache check
2 - known, not saved in the image, part of cache check
I'm sure they'll be thrilled to know that now we're adding a third:
3 - known, not saved in the image and not part of the cache checking
:-)

@thaJeztah
Copy link
Member

The proposal is that if the HTTP_PROXY is an essential part of the image (i.e., the doit.sh example), that people should define it in the image;

ARG HTTP_PROXY
COPY doit.sh /
RUN /doit.sh

If it's defined in the Dockerfile, it's an explicit choice of the image-author, and is not ignored in the cache.

There probably are edge cases where a different proxy should produce a different result, but my expectation is that 99.9% of the cases, a proxy is just needed to give people internet access. Producing a different image because a proxy was used for those case, would be the same as invalidating the cache if the IP-address changes (/different network), or the DNS server changed (which would far more likely being a differentiator when building)

@dave-tucker
Copy link
Contributor Author

@tiborvass the list of build args used comes from https://github.com/docker/docker/blob/master/builder/dockerfile/builder.go#L40. This is the only place these are defined in the code so this approach should be still be ok if new args are added.

@thaJeztah
Copy link
Member

@dave-tucker we were discussing, and thought it would be good to put a comment there so that in case new "default" variables are added in future, we should review if those should be stripped from history as well, or if we should "split" the list at that point

@dave-tucker dave-tucker force-pushed the ignore-default-build-args branch from d829e47 to 71b979f Compare March 13, 2017 16:16
@dave-tucker
Copy link
Contributor Author

@thaJeztah @tiborvass added some comments to the code!

@tiborvass
Copy link
Contributor

LGTM

@dave-tucker dave-tucker force-pushed the ignore-default-build-args branch from 4f9794f to c20aa7f Compare March 17, 2017 14:55
@dave-tucker
Copy link
Contributor Author

Addressed @thaJeztah's comments.
@alexellis' comment re: behaviour/behavior still stands. IMO there is only one correct way 🇬🇧 :trollface:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually un-knowable? Things like "may be" inject uncertainty "Will they be visible? Won't they? How to know?" Can we explain when they would or wouldn't? Maybe we can just link to the section below? And instead of "may be" change it to "are visible if you specify them as ARG lines in the Dockerfile?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use US spelling of "behavior". :) Also prefer active voice "To override this behavior, provide an explicit...."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same content as above? Why not have the content one place and then provide a link at the other place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh, missed this one; s/exluded/excluded/

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dave-tucker
Copy link
Contributor Author

Updated docs to address comments from @mstanleyjones and @thaJeztah

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some suggestions, but I'm ok with addressing them in a follow up 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/lon..example/lon.example/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd not refer to "both" docker files here, because the other Dockerfile is now move after the next paragraph. Perhaps just refer to the first one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this work?

By default, these pre-defined variables are excluded from the output of
`docker history`. Excluding them reduces the risk of accidentally leaking
sensitive authentication information in an `HTTP_PROXY` variable.

For example, consider building the following Dockerfile using
`--build-arg HTTP_PROXY=http://user:[email protected]`

``` Dockerfile
FROM ubuntu
RUN echo "Hello World"
```

In this case, the value of the `HTTP_PROXY` variable is not available in the
`docker history` and is not cached. If you were to change location, and your
proxy server changed to `http://user:[email protected]`, a subsequent
build does not result in a cache miss.

If you need to override this behaviour then you may do so by adding an `ARG`
statement in the Dockerfile as follows;

``` Dockerfile
FROM ubuntu
ARG HTTP_PROXY
RUN echo "Hello World"
```

When building this Dockerfile, the `HTTP_PROXY` is preserved in the
`docker history`, and changing its value invalidates the build cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this sounds much better.

Removes the build-args from the image history if they are in the
BuiltinAllowedBuildArgs map unless they are explicitly defined in an ARG
instruction.

Signed-off-by: Dave Tucker <[email protected]>
@dave-tucker dave-tucker force-pushed the ignore-default-build-args branch from 057d190 to 89a2a88 Compare March 21, 2017 16:37
@dave-tucker
Copy link
Contributor Author

@mstanleyjones @thaJeztah thanks for the suggestions! Updated once again...

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.