Implement CUDA backend with cuDSS solver #31
Merged
govindchari merged 23 commits into main on Dec 3, 2025
- Add nt2kktcsr and ntdiag2kktcsr mappings for direct CSR updates
- Create optimized kernels to update CSR values directly
- Avoid expensive update_csr_from_csc_kernel for NT blocks
- Handle edge cases: m=0, Wnnz=0, valid_ntdiag_count=0

- Add d_PregtoKKTcsr, d_AttoKKTcsr, d_GttoKKTcsr mappings
- Create update_csr_matrix_data_kernel for direct CSR updates
- Update P, A, G matrices directly in CSR format on GPU
- Maintain host CSC matrix for consistency
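The commits above replace a CSC-to-CSR conversion pass with precomputed position maps. The following is a hedged, CPU-only sketch in C, not the PR's CUDA code. It shows how a map of this kind can be built once from a CSC sparsity pattern and then reused for a plain scatter on every update. The names `build_csc_to_csr_map`, `update_csr_values`, and `grid_size` are hypothetical. The sketch assumes 0-based `int` indices and exactly one CSR slot per CSC entry. Composing an existing CSC position map with `csc2csr` (e.g. `csr_map[i] = csc2csr[csc_map[i]]`) would give a direct CSR map. Whether the PR builds its `*csr` maps this way is an assumption.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the PR): for an m x n CSC matrix
 * (colptr, rowidx), fill rowptr (length m + 1) with CSR row pointers and
 * csc2csr (length nnz) so that csc2csr[k] is the CSR position of CSC entry k.
 * Returns 0 on success, -1 if allocation fails. */
int build_csc_to_csr_map(int m, int n, const int *colptr, const int *rowidx,
                         int *rowptr, int *csc2csr)
{
    int nnz = colptr[n];

    /* Count nonzeros per row, then prefix-sum into row start offsets. */
    for (int i = 0; i <= m; ++i) rowptr[i] = 0;
    for (int k = 0; k < nnz; ++k) rowptr[rowidx[k] + 1]++;
    for (int i = 0; i < m; ++i) rowptr[i + 1] += rowptr[i];

    /* Edge case (m == 0 or no nonzeros): nothing to map. */
    if (nnz == 0) return 0;

    int *next = malloc((size_t)m * sizeof *next);
    if (next == NULL) return -1;
    for (int i = 0; i < m; ++i) next[i] = rowptr[i];

    /* Visiting columns in order keeps column indices sorted within each row. */
    for (int j = 0; j < n; ++j)
        for (int k = colptr[j]; k < colptr[j + 1]; ++k)
            csc2csr[k] = next[rowidx[k]]++;

    free(next);
    return 0;
}

/* CPU analogue of a direct CSR update: each source nonzero (e.g. from P, A,
 * or G) is written straight to its precomputed slot. On a GPU, each loop
 * iteration would be handled by one thread. */
void update_csr_values(double *csr_val, const double *src_val,
                       const int *map, int nnz)
{
    for (int k = 0; k < nnz; ++k)
        csr_val[map[k]] = src_val[k];
}

/* Blocks needed for nnz work items. A return of 0 means "skip the launch":
 * CUDA rejects a zero-sized grid as an invalid configuration. */
int grid_size(int nnz, int block)
{
    return nnz <= 0 ? 0 : (nnz + block - 1) / block;
}
```

Building the map is a one-time setup cost, linear in the matrix size. Each later update is then a single pass with no sorting or searching. Guards like `grid_size` returning 0 correspond to empty-input edge cases such as `Wnnz=0` listed above.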
Benchmark Summary
Runtime regressions (> 5.0%)
Runtime improvements (> 5.0%)
This PR adds a CUDA backend that uses cuDSS for the linear system solve.
All computation is done on the CPU except for the linear system factorization and solve.
Note that since the GitHub CI machines do not have a GPU, the CUDA backend can only be tested locally.
To use the cuDSS backend, you must have CUDA, cuSPARSE, cuBLAS, and cuDSS installed, and pass `-DQOCO_ALGEBRA_BACKEND:str="cuda"` when building QOCO via CMake.

Note: `lcvx_bad_scaling_test` fails.

Saved for follow-up PRs: