> To fix the bottleneck at the memory width, I propose:
I will be making this one of my coding projects to get models running on the ANE. After trying MLX, I've finally accepted that my 2020 MacBook Pro with 8GB of RAM will never be able to run large models. On top of that, llama.cpp doesn't use the ANE either. I believe Apple understands that an open ANE API would be a massive advantage in the AI race, so they're holding it back and commercializing it themselves before opening the gates. I will be taking a crack at reproducing *LLM in a Flash: Efficient Large Language Model Inference with Limited Memory* on my own; I can see how it would further help with the memory issues. I wonder if it would be beneficial to code it in Swift, Apple's native language.
-
Apple released a new ANE-optimized transformers implementation (repo, blog).
h/t @antmikinka for pointing it out
It seems they had some obstacles to overcome that were specific to vision, but they mention three things that apply to ANE transformers in general:
It's interesting to see that they're using a CNN-Transformer hybrid here too. The iOS 17 speech-to-text model uses one as well. Wonder if we'll see more of that in the future.
Footnotes
Apple's Tiny-MOAT-1 (TM1) models have ~5M parameters (with 256x256 input) and ~10M parameters (with 512x512 input), vs. the smallest GPT-2 at 117M parameters (80M excluding embeddings). ↩