add missing extensions in TensorFlow 2.0.0 easyconfigs (+ update to tensorboard/tensorflow-estimator 2.0.1)#9338
Conversation
ee69679 to
e968939
Compare
|
@boegel I'm using |
|
@Flamefire Hmm, Can you share the output of |
|
@Flamefire We'll have to pick between #9308 (which is now require to go imho) or this one... |
Sure:
I guess both are (at least IMO). Although I'm a bit confused, that |
|
@Flamefire W.r.t. W.r.t. both PRs: this one doesn't have a successful test report yet, it uses a patch for So, my vote currently goes to #9308, but of course I'm a bit biased. ;) If you're OK with going forward with #9308, do you mind submitting a test report for it? |
I tried but EB failed me :(
Sure. That was my initial comment, but as ML guys usually want the latest and greatest...
Well it actually follows a defined workflow to order the packages which can be verified automatically. I consider this a plus
Both are TF 2.0 so I figured it makes sense to have both ECs be (almost) the same.
I can for both if you can help fix the EB issue I'm experiencing. I'd consider this a (serious) bug if it tests something else then it should. But this might probably sometime next week. |
That's true in general, but in the case of
OK, but then we should add the patch where it's missing, not drop it (it's there for a reason, see #6089
You're right that it seems fishy, but it's not clear to me how you deduced that it's using the wrong easyconfig file? The upload log in the test report is not a debug log, so it's hard to tell what went wrong. Maybe try again with Also, try this: |
Ok, then I can revert that.
Quoting you from #9254 (comment): I don't mind readding this but to me it seemed to not be required anymore.
Because it is using a patch file from my repo: BTW: I can no longer test ECs with EB 4.0.1 because the Java EC uses a new version format the old EB cannot handle. Does EB use semantic versioning? If so, the next version should be 5.x |
I was under the impression then that the existing patch didn't work anymore for TensorFlow 2.0.0, but it does work, so it's a no-brainer to keep using it...
That patch file is not touched in this PR, only files that are touched in this PR are downloaded from the PR.
You'll have to clarify this... Which |
I missed that, thanks.
The error message is caused by which has the dependency version a dict, instead of a string. It seems support for this is in current EB, but EB 4.0.1 does not know how to handle this: |
You shouldn't use easyconfigs from the Support for specifying dependency versions as a |
This sounds strange. How am I supposed to test the new ECs if I have to install (how?) the latest dev of EB too? Shouldn't (unless a new major version of the EC repo is announced) the new ECs work with the current EB?
Well while the next EB can read current and next ECs the current EB can only read current ECs. IMO this is a non-backwards-compatible change. At least for the ECs. But well... Is there some guide on how to track new EB and EasyBlocks when using the new ECs? Something easy? |
In this particular case, it should work just fine if you use the latest EasyBuild v4.0.1 release + In general though, PRs should be tested using the
The changes in EasyBuild releases are not forward-compatible though: easyconfigs in
Using the There's some scripts available to facilitate this, see https://easybuild.readthedocs.io/en/latest/Installation_Alternative.html#installation-of-latest-development-version . |
It wouldn't because I need all the GCCCore 8.3.0 dependencies that do not exist in current ECs. Or would they be downloaded too? |
e968939 to
f1d7ef0
Compare
f1d7ef0 to
135c4cc
Compare
ba32c8b to
9b0206b
Compare
|
Test report by @Flamefire |
Good question... Only stuff touched in the PR is pulled in when using So it boils down to: testing a PR with |
|
@Flamefire Now that we have successful test reports both here and in #9308, we'll have to pick what we go forward with. For the reasons mentioned in #9338 (comment), my preference still goes to #9308 . Is there anything included here that is not included in #9308 that is crucial? |
|
Have you verified that the TensorFlow-1.14.0_fix-cuda-mpi.patch is not needed for 2.0? |
It is required. You need to update the repo at |
I'll double check yours again, but this seems more complete. Yours e.g. is not installing |
|
|
This will add quite some more modules and you'll probably arrive at the same as mine here ;-) Re:
I added the lrt patch and removed the astor patch by downgrading setuptools. So IMO this is complete now. Only thing is the shuffeling but that is due to using a tool to determine the order |
Actually, I'd prefer not updating to |
|
You mean "I'd prefer not updating to tensorboard 2.0.1," ?? |
Yes, sorry.
I'm not touching that in #9308 @Flamefire I don't want to let your efforts go to waste, let me be clear on that. |
|
No but this PR changes to 2.0.1 for both tensorboard and tensorflow-estimator |
|
@Flamefire What I meant with my question about fix-cuda-mpi.patch is that it is not included in this PR (The fix-cuda-build.patch I had has been removed now) |
with
and
IMO the currently released TF 2.0 EC is faulty and must be reinstalled due to missing dependencies. The ECS here were derived by a programmatic approach determining the current dependencies as installed by pip and their dependencies sorted by a deterministic order. So while I agree to not tweak versions in existing ECs IMO it is reasonable to do here especially to keep both TF 2.0 ECS in sync (because the existing EC should not be used). To be clear: This is not updating an existing EC to tweak versions, but a structured approach to fix a broken EC by relying on officially available dependencies (as reported by pip)
IMO it should be updated according to the release notes: https://github.com/tensorflow/estimator/releases
Sorry?: https://github.com/easybuilders/easybuild-easyconfigs/pull/9338/files#diff-966a97db0e97e986124e9b34a2198830R124 Or do you mean for the foss? There it is not required because no CUDA |
|
@Flamefire What I meant is that there is a TensorFlow-1.14.0_fix-cuda-mpi.patch that is no longer used in the TF 2.0 ECs. I was wondering if you had verified that it is no longer needed. |
|
That release note for tensorflow-estimator makes it very clear that we need to update it. So I fully agree on that. |
|
@Flamefire Thanks a lot for the clarification, the And with the cleanups you did, this PR is in better shape than #9308, so let's go forward with this instead.. W.r.t. https://gist.github.com/boegel/fd9a636d652aa5c8e57778088e9c0a21 and your enhanced version https://gist.github.com/Flamefire/49426e502cd8983757bd01a08a10ae0d, I think we should look into integrating that in |
Ah, I had the wrong one. Yes that patch is in upstream now: https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/mpi_collectives/kernels/ring.cu.cc#L23 |
|
Regarding the deep check thingie, should that be a standalone thing to be used before creating a new PR too? |
|
Test report by @boegel edit: ignore this failure, outdated patch... |
|
Test report by @boegel |
|
@akesandgren The The feature is about automatically constructing |
|
Test report by @boegel edit: failed due to missing extensions from #9329, fixing... |
|
Test report by @akesandgren |
|
Test report by @boegel |
|
Test report by @boegel |
|
Going in, thanks @Flamefire! |
|
Test report by @boegel |
(created using
eb --new-pr)