TinyKernels.jl

TinyKernels.jl provides a tiny abstraction for GPU (and CPU) kernels, with full support for the CUDA (Nvidia) and ROCm (AMD) backends, limited support for the Metal backend (GPU programming on macOS ARM), and support for multi-threaded CPU execution.

TinyKernels.jl is mostly a heavily stripped-down version of KernelAbstractions.jl, supporting only the bare minimum of features. This package provides a sandbox for Julia GPU tooling and for measuring kernel performance in a GPU-agnostic way. While the API of KernelAbstractions.jl is in a "transient" state, this package will provide a thin abstraction layer on top of the CUDA.jl, AMDGPU.jl and Metal.jl packages.

TinyKernels.jl allows GPU kernels to be launched explicitly and asynchronously on different streams with a given priority. This feature facilitates overlapping computation with memory transfers in distributed configurations.

TinyKernels.jl supports automatic differentiation with Enzyme.jl by overloading the Enzyme.autodiff function to enable reverse-mode AD of GPU (and CPU) kernels.
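
To give a feel for what reverse-mode AD of an in-place kernel looks like, here is a minimal CPU-only sketch that uses plain Enzyme.jl, not the TinyKernels.jl overload. The function `square!` and all variable names are hypothetical, and the exact `autodiff` calling convention may differ between Enzyme.jl versions, so treat this as an illustration under those assumptions rather than as the package's API.

```julia
using Enzyme

# Hypothetical in-place "kernel", written as a plain CPU loop: out[i] = a[i]^2.
# This is NOT a TinyKernels.jl kernel; it only mimics the mutate-and-return-nothing style.
function square!(out, a)
    for i in eachindex(a)
        out[i] = a[i] * a[i]
    end
    return nothing
end

a    = [1.0, 2.0, 3.0]
out  = zeros(3)
da   = zeros(3)   # shadow of `a`: accumulates the gradient
dout = ones(3)    # shadow of `out`: seed for the reverse pass

# Reverse-mode AD: each array is paired with its shadow via `Duplicated`;
# `Const` states that the function's return value (nothing) is not differentiated.
Enzyme.autodiff(Reverse, square!, Const, Duplicated(out, dout), Duplicated(a, da))

# Analytically, d(out[i])/d(a[i]) = 2 * a[i], so `da` can be compared against 2 .* a.
```

Pairing every mutated array with a shadow array is the usual pattern for differentiating kernels that write their results in place instead of returning them.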

Preliminary benchmarks can be found in TinyBenchmarks.jl, and a Metal playground in MetalGPU.

Stay tuned 🚀

Notes

⚠️ Metal backend:

Only Float32 is supported. For Float64, one could try using a construct from DoubleFloats.jl, which may impact performance.
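
Because of this restriction, it helps to check that data and scalar constants stay in Float32 before they reach a Metal kernel. The sketch below is plain Julia with no GPU or TinyKernels.jl dependency; `scale_f32` is a hypothetical helper used only to show how a stray Float64 literal promotes a result.

```julia
# Hypothetical helper (not part of TinyKernels.jl): scale an array without leaving Float32.
function scale_f32(A::AbstractArray{Float32}, s::Real)
    s32 = Float32(s)   # convert the scalar once, up front
    return s32 .* A    # Float32 * Float32 stays Float32
end

A = rand(Float32, 4, 4)    # allocate directly in Float32
B = Float32.(ones(4, 4))   # or convert existing Float64 data

# A Float64 literal silently promotes the whole result:
promoted = 2.0 .* A        # element type becomes Float64
kept     = 2.0f0 .* A      # the f0 suffix keeps the element type Float32

scaled = scale_f32(A, 0.5) # Float64 scalar in, Float32 array out
```

Checking `eltype` on the host like this is a cheap way to catch accidental promotion before launching on a Float32-only backend.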

utkinis/TinyKernels.jl

Folders and files

Latest commit

History

Repository files navigation

TinyKernels.jl

Notes

About

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 0

Contributors 3

Languages

Packages