cuNumeric.jl

The cuNumeric.jl package wraps the cuPyNumeric C++ API from NVIDIA to bring simple distributed computing on GPUs and CPUs to Julia! We provide a simple array abstraction, the NDArray, which supports most of the operations you would expect from a normal Julia array.

Warning

cuNumeric.jl is under active development. This is an alpha API and is subject to change. Stability is not guaranteed until the first official release. We are actively working to make the build experience more seamless and Julia-friendly.

Quick Start

cuNumeric.jl can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add cuNumeric

Or, using the Pkg API:

using Pkg; Pkg.add(url = "https://github.com/JuliaLegate/cuNumeric.jl", rev = "main")

The first run might take a while, as it has to install several large dependencies such as the CUDA SDK (if you have an NVIDIA GPU). For more installation instructions, please visit our install guide in the documentation.

To see information about your cuNumeric installation, run the versioninfo function.

cuNumeric.versioninfo()

Warning

Starting more than one instance of cuNumeric.jl can lead to a hard crash, because the default hardware configuration reserves all available resources. For more details, please visit our hardware configuration documentation.

Monte-Carlo Example

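using cuNumeric

# Integrand: a Gaussian, applied elementwise to the samples
integrand = (x) -> exp.(-x.^2)

# Number of random samples
N = 1_000_000

# Integrate over [-x_max, x_max]; Ω is the length of that interval
x_max = 10.0f0
domain = [-x_max, x_max]
Ω = domain[2] - domain[1]

# Rescale the random samples from cuNumeric.rand to the domain
samples = Ω*cuNumeric.rand(N) .- x_max
# Estimate: interval length times the mean value of the integrand
estimate = (Ω/N) * sum(integrand(samples))

println("Monte-Carlo Estimate: $(estimate)")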
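As a sanity check, the same estimator can be written with plain Julia arrays and compared against the exact answer: the integral of exp(-x^2) over the real line is sqrt(π) ≈ 1.7725, and the tails beyond ±10 are negligible. The sketch below is a CPU-only reference that does not use cuNumeric at all. It assumes the standard-library Random module in place of cuNumeric.rand, and the seed is an arbitrary choice for illustration.

```julia
using Random

# Seeded generator so repeated runs draw the same samples (arbitrary seed).
rng = MersenneTwister(42)

# Same Gaussian integrand as above, applied elementwise.
integrand(x) = exp.(-x .^ 2)

N = 1_000_000
x_max = 10.0f0
Ω = 2 * x_max                      # length of the interval [-x_max, x_max]

# Uniform samples in [0, 1), rescaled to [-x_max, x_max].
samples = Ω .* rand(rng, Float32, N) .- x_max

# Interval length times the sample mean of the integrand.
estimate = (Ω / N) * sum(integrand(samples))

exact = sqrt(π)
println("Monte-Carlo estimate: $(estimate), exact: $(exact)")
```

With one million samples the statistical error of this estimator is on the order of 0.005, so the two printed values should agree to roughly two decimal places.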

Helping the Garbage Collector

Every intermediate NDArray (from a slice, broadcast, or function call) allocates a fresh buffer that stays alive until the Julia GC frees it. Because the GC is triggered by memory pressure, dead buffers can accumulate in the meantime and strain cuNumeric's allocator.

@analyze_lifetimes performs a static last-use analysis at macro-expansion time and inserts eager maybe_insert_delete calls immediately after each temporary's final use. Freed buffers are returned to cuNumeric's pool and recycled by the next same-sized allocation, skipping new buffer allocation.

Using this macro can improve runtime and reduce memory overhead.

@analyze_lifetimes begin
    result = A[1:end, :] .+ B[1:end, :]
    C .= result .* 2.0
end
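To give a feel for what a last-use analysis computes, here is a toy version in plain Julia. It is only an illustration of the general idea, not cuNumeric.jl's implementation: the helper names collect_symbols! and last_uses are hypothetical, and the sketch simply records the last statement in a block that mentions each symbol, without inserting any delete calls.

```julia
# Toy last-use analysis over a `begin ... end` block.
# Illustration only; helper names are hypothetical and this is not
# how cuNumeric.jl implements @analyze_lifetimes.

# Collect every Symbol that appears anywhere inside an expression.
collect_symbols!(acc, s::Symbol) = push!(acc, s)
collect_symbols!(acc, ::Any) = acc   # literals, LineNumberNodes, ...
function collect_symbols!(acc, ex::Expr)
    foreach(arg -> collect_symbols!(acc, arg), ex.args)
    return acc
end

# Map each symbol to the index of the last statement that mentions it.
function last_uses(block::Expr)
    stmts = filter(s -> !(s isa LineNumberNode), block.args)
    last_idx = Dict{Symbol,Int}()
    for (i, stmt) in enumerate(stmts)
        for s in collect_symbols!(Set{Symbol}(), stmt)
            last_idx[s] = i          # later statements overwrite earlier ones
        end
    end
    return last_idx
end

# The block is only quoted, never evaluated, so A, B and C need not exist.
block = quote
    result = A .+ B
    C .= result .* 2.0
end

uses = last_uses(block)
println("result is last used in statement ", uses[:result])
```

In this toy block, A and B are last mentioned in the first statement, while result is last mentioned in the second, which is the point after which a temporary like result could be released.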

Requirements

We require an x86 Linux platform and Julia >=1.10. For GPU support we require an NVIDIA GPU and a CUDA driver which supports CUDA 13.0. ARM support is theoretically possible, but we do not make binaries or test on ARM. Please open an issue if ARM support is of interest.
