How to profile with Nsight Systems

$ JULIA_CUDA_USE_BINARYBUILDER=false nsys launch julia nsys_OMEinsum.jl

Then open the Nsight Systems UI on your local machine to inspect the result.

How to profile the register pressure

Run your code on the remote GPU host:

$ sudo JULIA_CUDA_USE_BINARYBUILDER=false /home/ubuntu/.local/bin/ncu -o profile /home/ubuntu/.local/bin/julia permutedims-ncu.jl

Download the profile output (profile.ncu-rep) and open it locally:

$ ncu-ui profile.ncu-rep

Analyse the profile results. The "Registers Per Thread" value matters a lot: it should be below 64 for good performance.
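Why the register count matters can be seen from a rough occupancy estimate. The sketch below is not part of the original scripts. It assumes an SM with 65536 32-bit registers and at most 2048 resident threads (true for several data-center GPUs, but check your own device), and it ignores register allocation granularity, shared memory, and block-size limits. The function names are hypothetical.

```python
def max_resident_threads(regs_per_thread,
                         regs_per_sm=65536,        # assumed register file size per SM
                         max_threads_per_sm=2048,  # assumed hardware thread limit per SM
                         warp_size=32):
    """Upper bound on resident threads per SM imposed by register usage alone."""
    by_registers = regs_per_sm // regs_per_thread
    by_registers -= by_registers % warp_size  # threads are scheduled in whole warps
    return min(by_registers, max_threads_per_sm)


def occupancy_bound(regs_per_thread, max_threads_per_sm=2048, **kwargs):
    """Register-limited occupancy as a fraction of the SM thread limit."""
    threads = max_resident_threads(
        regs_per_thread, max_threads_per_sm=max_threads_per_sm, **kwargs
    )
    return threads / max_threads_per_sm


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(f"{regs:3d} registers/thread -> occupancy bound {occupancy_bound(regs):.3f}")
```

Under these assumptions, 64 registers per thread already caps occupancy at one half, and every doubling beyond that halves it again.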

How to profile a PyTorch program

$ nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu --capture-range=cudaProfilerApi --stop-on-range-end=true --cudabacktrace=true -x true -o my_profile python benchmark_pytorch.py profilegpu
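With `--capture-range=cudaProfilerApi`, nsys records only between the cudaProfilerStart and cudaProfilerStop calls, so the script itself has to issue them. The contents of benchmark_pytorch.py are not shown here. The following is a minimal sketch of how such a script might mark the range, assuming the `profilegpu` argument is what switches profiling on. `cuda_ready`, `capture_range`, and `workload` are hypothetical names, and the matrix product is only a stand-in for the real benchmark.

```python
import contextlib
import sys

try:
    import torch  # assumed to be installed on the GPU host
except ImportError:  # lets the script still run (unprofiled) elsewhere
    torch = None


def cuda_ready():
    """True only when PyTorch is importable and sees a CUDA device."""
    return torch is not None and torch.cuda.is_available()


@contextlib.contextmanager
def capture_range(enabled):
    """Bracket a region with cudaProfilerStart/Stop so nsys records only that region."""
    active = bool(enabled) and cuda_ready()
    if active:
        torch.cuda.synchronize()       # finish earlier GPU work before recording
        torch.cuda.profiler.start()    # wraps cudaProfilerStart
    try:
        yield active
    finally:
        if active:
            torch.cuda.synchronize()   # make sure the profiled kernels have completed
            torch.cuda.profiler.stop() # wraps cudaProfilerStop


def workload(n=256):
    """Hypothetical stand-in for the real benchmark body."""
    if not cuda_ready():
        return None
    x = torch.randn(n, n, device="cuda")
    torch.cuda.nvtx.range_push("matmul")  # named range, visible via -t nvtx
    y = x @ x
    torch.cuda.nvtx.range_pop()
    return float(y.sum())


if __name__ == "__main__":
    profiling = "profilegpu" in sys.argv[1:]
    with capture_range(profiling) as active:
        workload()
    print("capture range active:", active)
```

Keeping warm-up iterations outside the `capture_range` block leaves one-time costs such as CUDA context creation out of the recorded trace.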