Release v0.3.2

@Ying1123 Ying1123 released this 02 Oct 17:19
· 415 commits to main since this release
37c5899

Highlights

  • Support torch.compile and CUDA graph for the Triton attention backend and DeepSeek MLA #1442 #1422
  • Initial support for multi-LoRA serving #1307
  • Integrate torchao for quantization #1341
  • Reduce CPU scheduler overhead
  • Multiple critical bug fixes for Llama and LLaVA (tokenizer, modality)
  • Support AMD backend #1420
  • New models: MiniCPM3, OLMoE

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.3.2