Hi, I noticed that SparseAttention is implemented with Triton for CUDA execution. When we tried to implement SparseAttention on other accelerators, we found that Triton can be a blocker, because it is not available on every accelerator. In that case, implementing a SparseAttention OpBuilder and kernels would be a natural option.
I'm wondering whether DeepSpeed could allow a Triton implementation to coexist with an OpBuilder implementation, to improve extensibility. The idea is to implement a special PythonBuilder class that lets the module loaded through the OpBuilder call Python functions; inside those functions we can call either plain Python code or a Triton implementation. A demonstration of the concept can be found at the following link, with a minimal sketch after it:
https://github.com/delock/DeepSpeedSYCLSupport/blob/gma/kernel-python-study/op_builder/cpu/transformer_inference.py
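To make the idea concrete, here is a minimal sketch of what such a PythonBuilder could look like. This is illustrative only: the class stands alone rather than subclassing DeepSpeed's real `OpBuilder`, and all names are placeholders; the linked file above is the actual demonstration.

```python
import importlib


class PythonBuilder:
    """OpBuilder-style class whose load() returns a plain Python module
    instead of a JIT-compiled C++/CUDA extension. The returned module can
    forward each op to a Triton kernel or to accelerator-native code."""

    def __init__(self, name, python_module):
        self.name = name
        self.python_module = python_module

    def sources(self):
        # Nothing to compile: the op is backed by Python/Triton code.
        return []

    def load(self):
        # Return the Python module implementing the op's functions; the
        # caller uses it exactly like a compiled extension module.
        return importlib.import_module(self.python_module)


# Hypothetical usage (module path is made up):
# ops = PythonBuilder('sparse_attn', 'my_pkg.triton_ops').load()
```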
With the OpBuilder brought back, DeepSpeed would have the flexibility to implement a function with either Triton or accelerator-native code behind a unified interface. This would improve extensibility on accelerators where Triton is not available yet.
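For example, a call site could stay identical across backends. The snippet below assumes DeepSpeed's accelerator abstraction and its `create_op_builder()` entry point; which backend the returned module uses would be an internal decision of each accelerator's op_builder implementation.

```python
from deepspeed.accelerator import get_accelerator

# The caller only sees the builder interface; whether load() returns a
# Triton-backed Python module or a compiled native extension is decided
# by the accelerator's own op_builder implementation.
builder = get_accelerator().create_op_builder('SparseAttnBuilder')
sparse_attn_ops = builder.load()
```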
Replies: 1 comment

Hi @tjruwase, any comments on this idea? We are thinking about combining the op builder interface with Triton kernels, so we open the possibility of writing non-Triton kernels for sparse attention while keeping the CUDA path on Triton.
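A rough sketch of the dispatch this describes, with made-up function names; the real kernels would live behind the Triton code and a SparseAttention OpBuilder respectively:

```python
try:
    import triton  # noqa: F401  (present on the CUDA/Triton path)
    HAS_TRITON = True
except ImportError:
    HAS_TRITON = False


def block_sparse_softmax(x, layout):
    """Single entry point; the backend choice is an internal detail."""
    if HAS_TRITON:
        return _triton_block_sparse_softmax(x, layout)
    return _native_block_sparse_softmax(x, layout)


def _triton_block_sparse_softmax(x, layout):
    # Stands in for the existing Triton kernel kept on the CUDA path.
    raise NotImplementedError("call the Triton kernel here")


def _native_block_sparse_softmax(x, layout):
    # Stands in for a non-Triton kernel loaded via a SparseAttention OpBuilder.
    raise NotImplementedError("call the accelerator-native kernel here")
```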