
Implement CUDA backend with cuDSS solver #31

Merged
govindchari merged 23 commits into main from gc/cuda-backend on Dec 3, 2025


govindchari (Member) commented Nov 19, 2025

This PR adds a CUDA backend that uses cuDSS for the linear system solve.

All computation is done on the CPU except for the factorization and solve of the linear system, which run on the GPU.

Note that because the GitHub CI machines do not have GPUs, the CUDA backend can only be tested locally.

To use the cuDSS backend, you must have CUDA, cuSPARSE, cuBLAS, and cuDSS installed, and pass -DQOCO_ALGEBRA_BACKEND:str="cuda" to CMake when building QOCO.

Note: lcvx_bad_scaling_test fails.

Saved for follow-up PRs:

  • Move all computation to the GPU

- Add nt2kktcsr and ntdiag2kktcsr mappings for direct CSR updates
- Create optimized kernels to update CSR values directly
- Avoid expensive update_csr_from_csc_kernel for NT blocks
- Handle edge cases: m=0, Wnnz=0, valid_ntdiag_count=0
- Add d_PregtoKKTcsr, d_AttoKKTcsr, d_GttoKKTcsr mappings
- Create update_csr_matrix_data_kernel for direct CSR updates
- Update P, A, G matrices directly in CSR format on GPU
- Maintain host CSC matrix for consistency
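The bullets above describe writing new P, A, G, and NT-block values straight into the KKT matrix's CSR value array through precomputed index maps (d_PregtoKKTcsr, d_AttoKKTcsr, d_GttoKKTcsr, nt2kktcsr), avoiding update_csr_from_csc_kernel. The following is a minimal serial C sketch of that idea, assuming a single m x n matrix stored in CSC with 0-based int indices and double values. build_csc_to_csr_map and update_csr_values are hypothetical names, not QOCO functions. In a CUDA kernel, each iteration of the scatter loop would map to one thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not QOCO's API): for an m x n matrix in CSC form
 * (colptr, rowidx), build the CSR row pointer and column indices, and
 * record for each CSC nonzero k the slot csc_to_csr[k] it occupies in the
 * CSR value array. Scanning columns in increasing order keeps each CSR
 * row sorted by column index. */
static void build_csc_to_csr_map(int m, int n, const int *colptr,
                                 const int *rowidx, int *rowptr,
                                 int *colidx, int *csc_to_csr) {
    int nnz = colptr[n];

    /* Count the nonzeros in each row (rowptr[i + 1] holds row i's count). */
    for (int i = 0; i <= m; ++i) rowptr[i] = 0;
    for (int k = 0; k < nnz; ++k) {
        assert(rowidx[k] >= 0 && rowidx[k] < m);
        rowptr[rowidx[k] + 1]++;
    }

    /* Prefix sum turns per-row counts into row start offsets. */
    for (int i = 0; i < m; ++i) rowptr[i + 1] += rowptr[i];

    /* next[i] is the next free CSR slot in row i; allocate at least one
     * element so m == 0 avoids a zero-size malloc. */
    int *next = malloc((size_t)(m > 0 ? m : 1) * sizeof *next);
    if (next == NULL) abort();
    for (int i = 0; i < m; ++i) next[i] = rowptr[i];

    for (int j = 0; j < n; ++j) {
        for (int k = colptr[j]; k < colptr[j + 1]; ++k) {
            int pos = next[rowidx[k]]++;
            colidx[pos] = j;
            csc_to_csr[k] = pos;
        }
    }
    free(next);
}

/* Scatter fresh CSC values into an existing CSR value array through the
 * precomputed map; the sparsity pattern is untouched. On a GPU, each k
 * would be handled by one thread. nnz == 0 is a no-op. */
static void update_csr_values(int nnz, const double *csc_vals,
                              const int *csc_to_csr, double *csr_vals) {
    for (int k = 0; k < nnz; ++k) csr_vals[csc_to_csr[k]] = csc_vals[k];
}
```

Because the map depends only on the sparsity pattern, it can be built once during setup, and each later value update is then a single pass over the nonzeros. For a block embedded in the full KKT matrix, the map entries would presumably point into the KKT CSR value array rather than into a standalone matrix.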
@github-actions

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DPKLO1: diff=0.0002s, baseline=0.0001s, Δ=+17.5%
  • DUALC5: diff=0.0007s, baseline=0.0006s, Δ=+7.7%
  • HS21: diff=0.0000s, baseline=0.0000s, Δ=+7.7%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+45.5%
  • HS53: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • HS76: diff=0.0000s, baseline=0.0000s, Δ=+6.7%

Runtime improvements (> 5.0%)

  • HS268: diff=0.0000s, baseline=0.0000s, Δ=-5.9%
  • QADLITTL: diff=0.0004s, baseline=0.0004s, Δ=-7.1%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=-9.1%
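The Δ figures are consistent with the relative change (diff - baseline) / baseline computed from unrounded timings. For example, CONT-300 in a later summary gives (21.4696 - 22.8941) / 22.8941 ≈ -6.2%, matching the reported value. This also explains entries such as HS35, where both times print as 0.0000s yet Δ is nonzero. The following is a minimal C sketch of that classification, assuming this formula and the 5.0% threshold from the headers. The type and function names are hypothetical, and this is not the benchmark workflow's actual code.

```c
#include <assert.h>

/* Hypothetical classification of one benchmark entry (not the CI script). */
typedef enum {
    BENCH_UNCHANGED = 0,
    BENCH_REGRESSION,
    BENCH_IMPROVEMENT
} bench_class;

/* Relative runtime change in percent, computed from unrounded times.
 * Positive means the diff branch is slower than the baseline. */
static double runtime_delta_pct(double diff_s, double baseline_s) {
    assert(baseline_s > 0.0);
    return (diff_s - baseline_s) / baseline_s * 100.0;
}

/* Flag entries whose change exceeds threshold_pct in either direction. */
static bench_class classify_runtime(double diff_s, double baseline_s,
                                    double threshold_pct) {
    double delta = runtime_delta_pct(diff_s, baseline_s);
    if (delta > threshold_pct) return BENCH_REGRESSION;
    if (delta < -threshold_pct) return BENCH_IMPROVEMENT;
    return BENCH_UNCHANGED;
}
```

For problems that finish in microseconds, a tiny absolute difference is enough to cross the threshold. For instance, hypothetical times of 11 µs and 16 µs both print as 0.0000s but give Δ ≈ +45.5%.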

@github-actions

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DUAL2: diff=0.0023s, baseline=0.0022s, Δ=+6.4%
  • HS118: diff=0.0001s, baseline=0.0001s, Δ=+19.0%
  • HS53: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • S268: diff=0.0000s, baseline=0.0000s, Δ=+131.2%

Runtime improvements (> 5.0%)

  • CVXQP1_S: diff=0.0005s, baseline=0.0005s, Δ=-6.3%
  • HS21: diff=0.0000s, baseline=0.0000s, Δ=-7.1%
  • TAME: diff=0.0000s, baseline=0.0000s, Δ=-27.3%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=-7.7%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DPKLO1: diff=0.0002s, baseline=0.0001s, Δ=+13.1%
  • HS268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • QPCBLEND: diff=0.0006s, baseline=0.0006s, Δ=+6.0%

Runtime improvements (> 5.0%)

  • CVXQP2_S: diff=0.0004s, baseline=0.0004s, Δ=-5.3%
  • HS118: diff=0.0001s, baseline=0.0001s, Δ=-28.4%
  • HS21: diff=0.0000s, baseline=0.0000s, Δ=-7.1%
  • LOTSCHD: diff=0.0000s, baseline=0.0000s, Δ=-10.3%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DPKLO1: diff=0.0002s, baseline=0.0001s, Δ=+16.8%
  • HS268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • S268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%

Runtime improvements (> 5.0%)

  • DUAL4: diff=0.0013s, baseline=0.0014s, Δ=-5.3%
  • HS35MOD: diff=0.0000s, baseline=0.0000s, Δ=-6.2%
  • HS76: diff=0.0000s, baseline=0.0000s, Δ=-6.2%
  • PRIMAL2: diff=0.0056s, baseline=0.0060s, Δ=-6.4%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=+10.0%

Runtime improvements (> 5.0%)

  • CVXQP2_S: diff=0.0004s, baseline=0.0004s, Δ=-5.0%
  • HS21: diff=0.0000s, baseline=0.0000s, Δ=-7.1%
  • QADLITTL: diff=0.0004s, baseline=0.0004s, Δ=-5.3%
  • QAFIRO: diff=0.0001s, baseline=0.0002s, Δ=-9.1%
  • QSCTAP1: diff=0.0024s, baseline=0.0028s, Δ=-15.6%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • CVXQP3_S: diff=0.0007s, baseline=0.0006s, Δ=+5.6%
  • S268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=+8.3%

Runtime improvements (> 5.0%)

  • CONT-300: diff=21.4696s, baseline=22.8941s, Δ=-6.2%
  • DPKLO1: diff=0.0001s, baseline=0.0002s, Δ=-5.7%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=-20.0%
  • HS35MOD: diff=0.0000s, baseline=0.0000s, Δ=-6.2%
  • QADLITTL: diff=0.0003s, baseline=0.0004s, Δ=-7.7%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • CVXQP3_S: diff=0.0007s, baseline=0.0006s, Δ=+5.0%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • QSCAGR7: diff=0.0006s, baseline=0.0006s, Δ=+5.4%

Runtime improvements (> 5.0%)

  • HS268: diff=0.0000s, baseline=0.0000s, Δ=-62.8%
  • HS35MOD: diff=0.0000s, baseline=0.0000s, Δ=-34.8%
  • QAFIRO: diff=0.0001s, baseline=0.0002s, Δ=-10.4%
  • QPCBLEND: diff=0.0006s, baseline=0.0006s, Δ=-8.3%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • LOTSCHD: diff=0.0001s, baseline=0.0000s, Δ=+71.4%
  • QAFIRO: diff=0.0002s, baseline=0.0001s, Δ=+17.6%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=+10.0%

Runtime improvements (> 5.0%)

  • DUALC5: diff=0.0006s, baseline=0.0007s, Δ=-10.8%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=-20.0%
  • GOULDQP2: diff=0.0019s, baseline=0.0020s, Δ=-5.1%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=-7.7%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • CVXQP2_S: diff=0.0005s, baseline=0.0004s, Δ=+10.0%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • QAFIRO: diff=0.0002s, baseline=0.0001s, Δ=+13.9%

Runtime improvements (> 5.0%)

  • AUG2DCQP: diff=0.1230s, baseline=0.1310s, Δ=-6.1%
  • DPKLO1: diff=0.0001s, baseline=0.0001s, Δ=-5.5%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=-20.0%
  • GOULDQP2: diff=0.0020s, baseline=0.0022s, Δ=-7.9%
  • GOULDQP3: diff=0.0014s, baseline=0.0016s, Δ=-9.3%
  • HS118: diff=0.0001s, baseline=0.0001s, Δ=-5.6%
  • HS268: diff=0.0000s, baseline=0.0000s, Δ=-11.1%
  • HS53: diff=0.0000s, baseline=0.0000s, Δ=-5.6%
  • HUESTIS: diff=0.0158s, baseline=0.0168s, Δ=-5.8%
  • QRECIPE: diff=0.0011s, baseline=0.0013s, Δ=-16.4%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=-14.3%

github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DUAL2: diff=0.0024s, baseline=0.0022s, Δ=+7.4%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • HS53: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • HS76: diff=0.0000s, baseline=0.0000s, Δ=+6.7%
  • POWELL20: diff=0.1076s, baseline=0.0852s, Δ=+26.2%
  • QISRAEL: diff=0.0041s, baseline=0.0038s, Δ=+6.5%
  • QSHARE2B: diff=0.0010s, baseline=0.0008s, Δ=+16.2%
  • S268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%

Runtime improvements (> 5.0%)

  • DPKLO1: diff=0.0001s, baseline=0.0002s, Δ=-9.2%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=-20.0%
  • QPILOTNO: diff=0.1276s, baseline=0.1433s, Δ=-10.9%
  • QSCAGR7: diff=0.0006s, baseline=0.0006s, Δ=-6.5%

govindchari merged commit fd7a85b into main on Dec 3, 2025
13 checks passed
github-actions bot commented Dec 3, 2025

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • CVXQP1_S: diff=0.0005s, baseline=0.0005s, Δ=+5.1%
  • CVXQP2_L: diff=24.4839s, baseline=22.5203s, Δ=+8.7%
  • CVXQP3_S: diff=0.0007s, baseline=0.0006s, Δ=+7.5%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=+25.0%
  • HS21: diff=0.0000s, baseline=0.0000s, Δ=+7.7%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+10.0%
  • STADAT2: diff=0.0146s, baseline=0.0137s, Δ=+6.5%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=+16.7%

Runtime improvements (> 5.0%)

  • DPKLO1: diff=0.0001s, baseline=0.0002s, Δ=-12.7%

govindchari deleted the gc/cuda-backend branch on December 4, 2025 22:08

Labels

None yet

Projects

None yet


1 participant