cuNumeric.jl

Documentation dev codecov License: MIT

The cuNumeric.jl package wraps the cuPyNumeric C++ API from NVIDIA to bring simple distributed computing on GPUs and CPUs to Julia! We provide a simple array abstraction, the NDArray, which supports most of the operations you would expect from a normal Julia array.

Warning

cuNumeric.jl is under active development. This is an alpha API and is subject to change. Stability is not guaranteed until the first official release. We are actively working to improve the build experience to be more seamless and Julia-friendly.

Quick Start

cuNumeric.jl can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add cuNumeric

Or, to install the latest development version from the main branch using the Pkg API:

using Pkg; Pkg.add(url = "https://github.com/JuliaLegate/cuNumeric.jl", rev = "main")

The first run might take a while, as it has to install multiple large dependencies such as the CUDA SDK (if you have an NVIDIA GPU). For more install instructions, please visit our install guide in the documentation.

To see information about your cuNumeric install, run the versioninfo function.

cuNumeric.versioninfo()

Warning

Starting more than one instance of cuNumeric.jl can lead to a hard crash, because the default hardware configuration reserves all available resources. For more details, please visit our hardware configuration documentation.

Monte-Carlo Example

using cuNumeric

integrand = (x) -> exp.(-x.^2)

N = 1_000_000

x_max = 10.0f0
domain = [-x_max, x_max]
Ω = domain[2] - domain[1]

samples = Ω*cuNumeric.rand(N) .- x_max
estimate = (Ω/N) * sum(integrand(samples))

println("Monte-Carlo Estimate: $(estimate)")
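As a sanity check, the same estimator can be run on the CPU with ordinary Julia arrays and compared against the known value of the Gaussian integral, sqrt(π). The sketch below is not part of cuNumeric.jl: it uses only Base Julia and the Random standard library, and the fixed seed and the names `exact` and `samples` are illustrative choices.

```julia
using Random

# Plain-Julia (CPU) reference for the Monte-Carlo estimator; no cuNumeric required.
Random.seed!(1234)                 # fixed seed so repeated runs give the same samples

integrand = x -> exp.(-x .^ 2)

N = 1_000_000
x_max = 10.0f0
Ω = 2 * x_max                      # width of the interval [-x_max, x_max]

# Uniform samples on [-x_max, x_max]
samples = Ω .* rand(Float32, N) .- x_max

# Interval width times the mean value of the integrand
estimate = (Ω / N) * sum(integrand(samples))

# Integral of exp(-x^2) over the real line; the tails beyond ±10 are negligible
exact = sqrt(π)

println("Monte-Carlo estimate: $(estimate), reference value: $(exact)")
```

With one million samples the statistical error of this estimator is on the order of a few thousandths, so the two printed values should agree to roughly two decimal places.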

Helping the Garbage Collector

Every intermediate NDArray (from a slice, broadcast, or function call) allocates a fresh buffer and waits for the Julia GC to free it. Because the GC only runs when it detects memory pressure, many dead buffers can accumulate and strain cuNumeric's allocator.

@analyze_lifetimes performs a static last-use analysis at macro-expansion time and inserts eager maybe_insert_delete calls immediately after each temporary's final use. Freed buffers are returned to cuNumeric's pool and recycled by the next same-sized allocation, skipping new buffer allocation.

This macro can improve runtime and reduce memory overhead.

@analyze_lifetimes begin
    result = A[1:end, :] .+ B[1:end, :]
    C .= result .* 2.0
end
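To illustrate the idea of a last-use analysis, here is a toy version written in plain Julia. It is not cuNumeric's implementation: the helper names `collect_symbols!` and `last_uses` are hypothetical, and the sketch only records, for each symbol in a block, the index of the last statement that mentions it. A real pass would also need to decide which symbols are temporaries and handle control flow.

```julia
# Toy last-use analysis over a `begin ... end` block (NOT cuNumeric's implementation).

# Collect every symbol that appears anywhere inside an expression.
collect_symbols!(acc, x::Symbol) = push!(acc, x)
collect_symbols!(acc, x::Expr) = (foreach(a -> collect_symbols!(acc, a), x.args); acc)
collect_symbols!(acc, x) = acc     # literals, LineNumberNodes, etc. contribute nothing

# Map each symbol to the index of the last statement that mentions it.
function last_uses(block::Expr)
    stmts = filter(s -> !(s isa LineNumberNode), block.args)
    last_idx = Dict{Symbol,Int}()
    for (i, stmt) in enumerate(stmts)
        for sym in collect_symbols!(Set{Symbol}(), stmt)
            last_idx[sym] = i      # later statements overwrite earlier ones
        end
    end
    return last_idx
end

block = quote
    tmp = A .+ B
    C = tmp .* 2
    D = C .- A
end

uses = last_uses(block)
println("tmp is last used in statement ", uses[:tmp])
println("B is last used in statement ", uses[:B])
```

In this toy block, `tmp` is not mentioned after the second statement, so that is the point after which a tool could safely release its buffer.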

Requirements

We require an x86 Linux platform and Julia >=1.10. For GPU support we require an NVIDIA GPU and a CUDA driver which supports CUDA 13.0. ARM support is theoretically possible, but we do not make binaries or test on ARM. Please open an issue if ARM support is of interest.
