fix patch for torchtext to use system libraries, to fix linking of RE2 in both PyTorch-bundle and torchtext#23823
Conversation
|
Test report by @Flamefire |
|
Test report by @Flamefire |
|
Test report by @Flamefire |
|
@Flamefire What's up with the failing test reports? |
|
@Flamefire Can you clarify why these changes are needed? Seems linked to the change made in: |
0380f0b to
61ea463
Compare
52431c2 to
2aa20cb
Compare
I think it's partly motivated by the linking problems I have been seeing with |
|
Ah thanks for the issue link, added this and description to the PR description above. Failures are partially fixed by the recent patch update (wrong C++ version used for Sentencepiece) while others are accuracy issues in the tests. |
Can you submit a test report for PyTorch-bundle-2.1.2-foss-2023a-CUDA-12.1.1.eb to confirm the fix from your end? |
|
Have a build running, those nodes aren't setup yet to send test reports. Will comment here anyway. |
|
Started another build on a node with test report upload enabled. |
|
Test report by @Flamefire |
|
The first build failed with what I think is an unrelated error which I've raised elsewhere, relating to |
|
@boegelbot please test @ jsc-zen3-a100 |
|
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 3275073540 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
|
Test report by @verdurin |
Still failing #23823 (comment) Although I can't see why in all the warnings. Manually rebuilding now |
|
Test report by @Flamefire Failure due to H100 not supporting CC 8.6 while CUDA 11.7 doesn't support CC 9.0 |
|
Test report by @boegel |
|
@boegel My error seems to be related to CUDA capabilities. Looks like we need to set them for the torch-extensions. Yours: "No space left on device" |
|
I think my failure is because I am using MIG slices on this node - will go back to non-MIG mode and re-try. @Flamefire any ideas for that |
|
I found that we need to pass CUDA compute capabilities or it will use (unsuitable) defaults. Testing a fix right now.
It seems I'm not seeing this on my side, so not sure what even causes this. What is the LD_PRELOAD value/lib you add? |
|
Test report by @verdurin |
|
Test report by @Flamefire |
|
Test report by @Flamefire Failures in tests, likely existed before so I'd ignore them |
|
Test report by @Flamefire |
|
Test report by @Flamefire |
|
Going in, thanks @Flamefire! |
(created using
eb --new-pr)torchtextlinks against our RE2 library which links against Abseil.Since #22805 they are static libraries so dependents of RE2 need to link against them too.
Our patch using system libraries did:
However torchtext still links against
re2 sentencepiecewhich causes linker flags-lre2 -lsentencepiecei.e. the above statements had no effect.Fix was to link against the target
re2::re2which can be easily done by introducing an ALIAS. The same (but in the other direction) is done in the RE2 sources which used to beadd_subdirectoryed.And this target has the correct dependencies set.
Similar sentencepiece is missing a target causing the link against the library directly (w/o targets). But the installation has no target and turns out to require C++17 for
std::string_view. So add a target for that and set the C++ standardFixes #23762