Skip to content

Latest commit

 

History

History
962 lines (181 loc) · 12.5 KB

File metadata and controls

962 lines (181 loc) · 12.5 KB

nvrtc

Error Handling

NVRTC defines the following enumeration type and function for API call error handling.

.. autoclass:: cuda.nvrtc.nvrtcResult

    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_SUCCESS


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_OUT_OF_MEMORY


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_PROGRAM_CREATION_FAILURE


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_INVALID_INPUT


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_INVALID_PROGRAM


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_INVALID_OPTION


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_COMPILATION


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_BUILTIN_OPERATION_FAILURE


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID


    .. autoattribute:: cuda.nvrtc.nvrtcResult.NVRTC_ERROR_INTERNAL_ERROR

.. autofunction:: cuda.nvrtc.nvrtcGetErrorString

General Information Query

NVRTC defines the following function for general information query.

.. autofunction:: cuda.nvrtc.nvrtcVersion
.. autofunction:: cuda.nvrtc.nvrtcGetNumSupportedArchs
.. autofunction:: cuda.nvrtc.nvrtcGetSupportedArchs

Compilation

NVRTC defines the following type and functions for actual compilation.

.. autoclass:: cuda.nvrtc.nvrtcProgram
.. autofunction:: cuda.nvrtc.nvrtcCreateProgram
.. autofunction:: cuda.nvrtc.nvrtcDestroyProgram
.. autofunction:: cuda.nvrtc.nvrtcCompileProgram
.. autofunction:: cuda.nvrtc.nvrtcGetPTXSize
.. autofunction:: cuda.nvrtc.nvrtcGetPTX
.. autofunction:: cuda.nvrtc.nvrtcGetCUBINSize
.. autofunction:: cuda.nvrtc.nvrtcGetCUBIN
.. autofunction:: cuda.nvrtc.nvrtcGetNVVMSize
.. autofunction:: cuda.nvrtc.nvrtcGetNVVM
.. autofunction:: cuda.nvrtc.nvrtcGetLTOIRSize
.. autofunction:: cuda.nvrtc.nvrtcGetLTOIR
.. autofunction:: cuda.nvrtc.nvrtcGetOptiXIRSize
.. autofunction:: cuda.nvrtc.nvrtcGetOptiXIR
.. autofunction:: cuda.nvrtc.nvrtcGetProgramLogSize
.. autofunction:: cuda.nvrtc.nvrtcGetProgramLog
.. autofunction:: cuda.nvrtc.nvrtcAddNameExpression
.. autofunction:: cuda.nvrtc.nvrtcGetLoweredName

Supported Compile Options

NVRTC supports the compile options below. Option names with two preceding dashs (None) are long option names and option names with one preceding dash (-) are short option names. Short option names can be used instead of long option names. When a compile option takes an argument, an assignment operator (=) is used to separate the compile option argument from the compile option name, e.g., "--gpu-architecture=compute_60". Alternatively, the compile option name and the argument can be specified in separate strings without an assignment operator, .e.g, "--gpu-architecture" "compute_60". Single-character short option names, such as -D, -U, and -I, do not require an assignment operator, and the compile option name and the argument can be present in the same string with or without spaces between them. For instance, "-D=<def>", "-D<def>", and "-D <def>" are all supported. The valid compiler options are:

  • Compilation targets

    • --gpu-architecture=<arch> (-arch)

      Specify the name of the class of GPU architectures for which the input must be compiled.

      • Valid <arch>s:
        • compute_50
        • compute_52
        • compute_53
        • compute_60
        • compute_61
        • compute_62
        • compute_70
        • compute_72
        • compute_75
        • compute_80
        • compute_87
        • compute_89
        • compute_90
        • sm_50
        • sm_52
        • sm_53
        • sm_60
        • sm_61
        • sm_62
        • sm_70
        • sm_72
        • sm_75
        • sm_80
        • sm_87
        • sm_89
        • sm_90
      • Default: compute_52
  • Separate compilation / whole-program compilation

    • --device-c (-dc)

      Generate relocatable code that can be linked with other relocatable device code. It is equivalent to --relocatable-device-code=true.

    • --device-w (-dw)

      Generate non-relocatable code. It is equivalent to --relocatable-device-code=false.

    • --relocatable-device-code={true|false} (-rdc)

      Enable (disable) the generation of relocatable device code.

      • Default: false
    • --extensible-whole-program (-ewp)

      Do extensible whole program compilation of device code.

      • Default: false
  • Debugging support

    • --device-debug (-G)

      Generate debug information. If --dopt is not specified, then turns off all optimizations.

    • --generate-line-info (-lineinfo)

      Generate line-number information.

  • Code generation

    • --dopt on (-dopt)

    • --dopt=on

      Enable device code optimization. When specified along with '-G', enables limited debug information generation for optimized device code (currently, only line number information). When '-G' is not specified, '-dopt=on' is implicit.

    • --ptxas-options <options> (-Xptxas)

    • --ptxas-options=<options>

      Specify options directly to ptxas, the PTX optimizing assembler.

    • --maxrregcount=<N> (-maxrregcount)

      Specify the maximum amount of registers that GPU functions can use. Until a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value of this option will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount value is the result of a trade-off. If this option is not specified, then no maximum is assumed. Value less than the minimum registers required by ABI will be bumped up by the compiler to ABI minimum limit.

    • --ftz={true|false} (-ftz)

      When performing single-precision floating-point operations, flush denormal values to zero or preserve denormal values. --use_fast_math implies --ftz=true.

      • Default: false
    • --prec-sqrt={true|false} (-prec-sqrt)

      For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-sqrt=false.

      • Default: true
    • --prec-div={true|false} (-prec-div)

      For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-div=false.

      • Default: true
    • --fmad={true|false} (-fmad)

      Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). --use_fast_math implies --fmad=true.

      • Default: true
    • --use_fast_math (-use_fast_math)

      Make use of fast math operations. --use_fast_math implies --ftz=true --prec-div=false --prec-sqrt=false --fmad=true.

    • --extra-device-vectorization (-extra-device-vectorization)

      Enables more aggressive device code vectorization in the NVVM optimizer.

    • --modify-stack-limit={true|false} (-modify-stack-limit)

      On Linux, during compilation, use setrlimit() to increase stack size to maximum allowed. The limit is reset to the previous value at the end of compilation. Note: setrlimit() changes the value for the entire process.

      • Default: true
    • --dlink-time-opt (-dlto)

      Generate intermediate code for later link-time optimization. It implies -rdc=true. Note: when this option is used the nvrtcGetLTOIR API should be used, as PTX or Cubin will not be generated.

    • --gen-opt-lto (-gen-opt-lto)

      Run the optimizer passes before generating the LTO IR.

    • --optix-ir (-optix-ir)

      Generate OptiX IR. The Optix IR is only intended for consumption by OptiX through appropriate APIs. This feature is not supported with link-time-optimization (-dlto)

. Note: when this option is used the nvrtcGetOptiX API should be used, as PTX or Cubin will not be generated.

  • Preprocessing

    • --define-macro=<def> (-D)

      <def> can be either <name> or <name=definitions>.

      • <name>

        Predefine <name> as a macro with definition 1.

      • <name>=<definition>

        The contents of <definition> are tokenized and preprocessed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded new line characters.

    • --undefine-macro=<def> (-U)

      Cancel any previous definition of <def>.

    • --include-path=<dir> (-I)

      Add the directory <dir> to the list of directories to be searched for headers. These paths are searched after the list of headers given to nvrtcCreateProgram.

    • --pre-include=<header> (-include)

      Preinclude <header> during preprocessing.

    • --no-source-include (-no-source-include) The preprocessor by default adds the directory of each input sources to the include path. This option disables this feature and only considers the path specified explicitly.

  • Language Dialect

    • --std={c++03|c++11|c++14|c++17|c++20} (-std={c++11|c++14|c++17|c++20})

      Set language dialect to C++03, C++11, C++14, C++17 or C++20

    • --builtin-move-forward={true|false} (-builtin-move-forward)

      Provide builtin definitions of std::move and std::forward, when C++11 language dialect is selected.

      • Default: true
    • --builtin-initializer-list={true|false} (-builtin-initializer-list)

      Provide builtin definitions of std::initializer_list class and member functions when C++11 language dialect is selected.

      • Default: true
  • Misc.

    • --disable-warnings (-w)

      Inhibit all warning messages.

    • --restrict (-restrict)

      Programmer assertion that all kernel pointer parameters are restrict pointers.

    • --device-as-default-execution-space (-default-device)

      Treat entities with no execution space annotation as device entities.

    • --device-int128 (-device-int128)

      Allow the __int128 type in device code. Also causes the macro CUDACC_RTC_INT128 to be defined.

    • --optimization-info=<kind> (-opt-info)

      Provide optimization reports for the specified kind of optimization. The following kind tags are supported:

      • inline : emit a remark when a function is inlined.
    • --version-ident={true|false} (-dQ)

      Embed used compiler's version info into generated PTX/CUBIN

      • Default: false
    • --display-error-number (-err-no)

      Display diagnostic number for warning messages. (Default)

    • --no-display-error-number (-no-err-no)

      Disables the display of a diagnostic number for warning messages.

    • --diag-error=<error-number>,... (-diag-error)

      Emit error for specified diagnostic message number(s). Message numbers can be separated by comma.

    • --diag-suppress=<error-number>,... (-diag-suppress)

      Suppress specified diagnostic message number(s). Message numbers can be separated by comma.

    • --diag-warn=<error-number>,... (-diag-warn)

      Emit warning for specified diagnostic message number(s). Message numbers can be separated by comma.