Replace nvgpu with nvidia-ml-py#2160
Replace nvgpu with nvidia-ml-py#2160matthewfeickert wants to merge 2 commits intoPlasmaControl:masterfrom
Conversation
3671f39 to
f63fb54
Compare
Memory benchmark result| Test Name | %Δ | Master (MB) | PR (MB) | Δ (MB) | Time PR (s) | Time Master (s) |
| -------------------------------------- | ------------ | ------------------ | ------------------ | ------------ | ------------------ | ------------------ |
test_objective_jac_w7x | 3.39 % | 4.139e+03 | 4.279e+03 | 140.50 | 41.78 | 38.96 |
test_proximal_jac_w7x_with_eq_update | 0.12 % | 6.615e+03 | 6.623e+03 | 7.99 | 165.34 | 161.01 |
test_proximal_freeb_jac | 0.33 % | 1.337e+04 | 1.341e+04 | 44.06 | 87.76 | 88.57 |
test_proximal_freeb_jac_blocked | -0.19 % | 7.792e+03 | 7.777e+03 | -14.85 | 78.93 | 80.35 |
test_proximal_freeb_jac_batched | 0.89 % | 7.675e+03 | 7.743e+03 | 68.34 | 77.82 | 78.32 |
test_proximal_jac_ripple | 0.41 % | 3.602e+03 | 3.616e+03 | 14.74 | 65.10 | 62.70 |
test_proximal_jac_ripple_bounce1d | 2.63 % | 3.796e+03 | 3.896e+03 | 100.02 | 77.56 | 75.24 |
test_eq_solve | -0.03 % | 2.202e+03 | 2.201e+03 | -0.69 | 98.78 | 100.58 |For the memory plots, go to the summary of |
f63fb54 to
11c073b
Compare
|
This is ready for review, but needs a maintainer to approve the CI runs. Let me know if you have any questions. 👍 |
aba1019 to
37170df
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2160 +/- ##
==========================================
- Coverage 94.45% 94.42% -0.04%
==========================================
Files 101 101
Lines 28604 28617 +13
==========================================
+ Hits 27018 27021 +3
- Misses 1586 1596 +10
🚀 New features to boost your workflow:
|
Given that the changes in this PR should be covered by DESC/tests/benchmarks/memory_funcs.py Line 21 in 2471d55 I assume the lack of coverage reported is that the benchmarks haven't finished running / need additional approval to run. |
Benchmarks don't increase coverage, and our CI cannot run GPU things, so this will just have 0 coverage which is fine with the devs |
Great. Thanks for the very fast follow up! |
There was a problem hiding this comment.
Thanks! It mostly looks good, but I think pynvml issue has to be documented.
@dpanici @ddudt can you also test this on a cluster? I tested on my laptop GPU and on Della. It works. However, on Della login node, you cannot run python; from desc import set_device; set_device("gpu");, it says the drivers are not there. I am not sure if we could run this on the login node before, though.
* As nvgpu is no longer maintained and uses pynvml, which directly tells the user
to use nvidia-ml-py instead at import
"The pynvml package is deprecated. Please install nvidia-ml-py instead. If you
did not install pynvml directly, please report this to the maintainers of the
package that installed pynvml for you."
drop nvgpu and replace its nvgpu.gpu_info() call with a single function using
nvidia-ml-py (which uses the pynvml namespace).
* Place a lower bound on nvidia-ml-py of 12.535.77, which was the first release
to support nvmlMemory_v2 which properly accounts for system-reserved memory.
* Remove all mentions of nvgpu in other areas of the codebase and replace them
with nvidia-ml-py, except for publications/ as this is historical information.
- Do NOT add nvidia-ml-py to dependabot.yml as pinning this tightly is an
anti-pattern in library design that will cause installation issues,
especially with NVIDIA libraries.
* Note that nvgpu has been replaced with nvidia-ml-py.
37170df to
89fd614
Compare
YigitElma
left a comment
There was a problem hiding this comment.
It looks good to me. I have tested on my personal laptop and Della.
We should wait for one more developer to test it.
Resolves #2159
nvgpuis no longer maintained and usespynvml, which directly tells the user to usenvidia-ml-pyinstead at importdrop
nvgpuand replace itsnvgpu.gpu_info()call with a single function usingnvidia-ml-py(which uses thepynvmlnamespace).nvidia-ml-pyof12.535.77, which was the first release to supportnvmlMemory_v2which properly accounts for system-reserved memory.nvgpuin other areas of the codebase and replace them withnvidia-ml-py, except forpublications/as this is historical information.nvidia-ml-pytodependabot.ymlas pinning this tightly will cause installation issues, especially with NVIDIA libraries.Example:
main(2471d55)As I made it a guarded import I can't directly import it from
desc, but is the same codeso
So the memory consumption is effectively the same (good), and an unmaintained dependency can be replaced with a maintained one.