utkinis/TinyKernels.jl

TinyKernels.jl

TinyKernels.jl provides a tiny abstraction for GPU (and CPU) kernels, with full support for the CUDA (Nvidia) and ROCm (AMD) backends, limited support for the Metal backend (GPU programming on macOS ARM), and multi-threaded CPU execution.

TinyKernels.jl is mostly a heavily stripped-down version of KernelAbstractions.jl that supports only the bare minimum of features. This package provides a sandbox for Julia GPU tooling and a way to measure kernel performance in a GPU-agnostic manner. While the API of KernelAbstractions.jl is in a "transient" state, this package will provide a thin abstraction layer on top of the CUDA.jl, AMDGPU.jl and Metal.jl packages.

TinyKernels.jl allows GPU kernels to be launched explicitly and asynchronously on different streams with a given priority. This feature facilitates overlapping computations with memory transfers in distributed configurations.
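The overlap pattern can be illustrated with a CPU-only analogy. The sketch below is not the TinyKernels.jl API: it uses tasks from Julia's `Base.Threads` in place of GPU streams, and the names `update!` and `halo` are hypothetical. The boundary region is launched first, the interior is launched next, and the buffer copy (a stand-in for a memory transfer) starts as soon as the boundary is done, while the interior may still be running.

```julia
# CPU-only analogy with Base tasks; NOT the TinyKernels.jl API.
using Base.Threads: @spawn

# Hypothetical 4-point stencil update restricted to a range of columns.
function update!(B, A, cols)
    for j in cols, i in 2:size(A, 1)-1
        B[i, j] = 0.25f0 * (A[i-1, j] + A[i+1, j] + A[i, j-1] + A[i, j+1])
    end
    return nothing
end

n = 64
A = rand(Float32, n, n)
B = copy(A)
halo = zeros(Float32, n)  # stand-in for a send buffer

# 1. Launch the boundary columns first (the "high-priority" work).
boundary = @spawn begin
    update!(B, A, 2:2)
    update!(B, A, n-1:n-1)
end

# 2. Launch the interior columns; they are independent of the boundary work.
inner = @spawn update!(B, A, 3:n-2)

# 3. Once the boundary is done, start the "transfer" while the interior may still run.
wait(boundary)
halo .= @view B[:, 2]

# 4. Synchronize before using the full result.
wait(inner)
```

On a GPU, the two tasks would correspond to kernels launched on streams of different priority, and the copy would correspond to a device-to-host or inter-process transfer.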

TinyKernels.jl supports automatic differentiation with Enzyme.jl by overloading the Enzyme.autodiff function to enable reverse-mode AD of GPU (and CPU) kernels.

Preliminary benchmarks can be found in TinyBenchmarks.jl, and a Metal playground in MetalGPU.

Stay tuned 🚀

Notes

⚠️ Metal backend:

  • Only Float32 is supported. For Float64, one could try using a construct from DoubleFloats.jl, which may impact performance.
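A practical consequence of the Float32 restriction is that Float64 values can slip into otherwise Float32 code through literals and default array constructors. The following plain-Julia sketch (no GPU or TinyKernels.jl call involved; the variable names are illustrative) shows how the promotion happens and a few ways to avoid it:

```julia
# Plain Julia, runs on the CPU; illustrates type promotion only.
x = rand(Float32, 4)

# A Float64 literal silently promotes the result to Float64.
y_promoted = x .* 2.0

# A Float32 literal keeps the result in Float32.
y_single = x .* 2.0f0

# Constants can also be converted to the element type of the data.
half = eltype(x)(0.5)
z = x .* half

# Arrays created with the default element type are Float64; convert them explicitly.
host64 = rand(4)
host32 = Float32.(host64)
```

Checking `eltype` of the arrays passed to a kernel before launching it is a cheap way to catch accidental promotion.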