
CoreML support? #18

Open
ArtemBernatskyy opened this issue May 24, 2023 · 3 comments

@ArtemBernatskyy

How can we add CoreML support? Thx!

@stlukey (Owner) commented Jun 3, 2023

Whisper.cpp now has CoreML support:

ggerganov/whisper.cpp#566

Using whisper.cpp directly, it should be as simple as compiling with the appropriate flag:

mkdir -p build && cd build
cmake -DWHISPER_COREML=1 ..
make -j
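
You'll also need the Core ML encoder model itself; per the whisper.cpp README at the time, it is generated with the conversion script (base.en here matches the example run below):

pip install ane_transformers openai-whisper coremltools
./models/generate-coreml-model.sh base.en

This writes models/ggml-base.en-encoder.mlmodelc, which a CoreML-enabled build picks up automatically next to the ggml model.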

Check by running:

./main -m models/ggml-base.en.bin -f samples/gb0.wav

...

whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 | 

...

Note the COREML = 1 in the system_info line.

For whispercpp.py, we can add the flag for CoreML here inside setup.py:

if sys.platform == 'darwin':
    os.environ['CFLAGS']   = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=c++11'
    os.environ['LDFLAGS']  = '-framework Accelerate'

First, update the whisper.cpp submodule inside whispercpp.py (rough sketch below). Check that it still runs; it might need some changes if the whisper.cpp API has changed. Provided it still works, add the flag inside setup.py.
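
Something like this, assuming the submodule lives at ./whisper.cpp as in this repo:

cd whisper.cpp
git fetch origin && git checkout master    # or pin a known-good commit
cd ..
git add whisper.cpp
pip install -e .                           # rebuild the extension and smoke-test it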

I can't test this at the moment, but feel free to make the pull request, and we can get this feature added.

@ArtemBernatskyy (Author)

Thanks!
I decided to use OpenAI's Whisper API for now; in my tests it beats local Whisper with CoreML by 3-4x (measured on a MacBook M1 with 32 GB RAM).

@ryzn0518 commented Jul 21, 2023

> (quoting @stlukey's comment above in full)

@stlukey

I have verified that my machine is an M2, and CoreML does not seem to be enabled by this build.

I also added the compile flags, which do not seem to work either:

if sys.platform == 'darwin':
    print("run here.....")
    os.environ['CFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=c++11'
    os.environ['LDFLAGS'] = '-framework Accelerate'

That is, I added -DWHISPER_COREML=1 and used the latest whisper.cpp code. When I run the generated whisper.xxxx.so to transcribe an audio file, it takes 12 minutes. But if I compile whisper.cpp at the same commit with cmake -DWHISPER_COREML=1 and run ./main on the same audio, it only takes 7 minutes, and I can see in the loading output:

whisper_init_state: loading Core ML model from 'models/ggml-large-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
system_info: n_threads = 4 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | COREML = 1 | OPENVINO = 0 |

It loads the Core ML model and takes only 8 minutes. I expected the .so build to do the same. How can I modify it?
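
A plausible explanation, going by whisper.cpp's CMakeLists at the time: cmake -DWHISPER_COREML=1 does more than define a macro. It also compiles the Objective-C Core ML bridging sources and links the CoreML framework, so setting -DWHISPER_COREML=1 in CFLAGS alone leaves the Core ML code path out of the .so. A hedged sketch of what setup.py would additionally need (source paths are from whisper.cpp circa mid-2023 and may have moved; verify against the submodule):

import os
import sys

if sys.platform == 'darwin':
    os.environ['CFLAGS']   = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=gnu11'
    os.environ['CXXFLAGS'] = '-DWHISPER_COREML=1 -DGGML_USE_ACCELERATE -O3 -std=c++11'
    # CoreML and Foundation are needed on top of Accelerate once the
    # Core ML bridging sources are compiled in.
    os.environ['LDFLAGS']  = '-framework Accelerate -framework Foundation -framework CoreML'

# Hypothetical: the extension's source list would also need the files that
# cmake -DWHISPER_COREML=1 compiles (paths as of mid-2023):
coreml_sources = [
    'whisper.cpp/coreml/whisper-encoder.mm',
    'whisper.cpp/coreml/whisper-encoder-impl.m',
]

Even then, the ggml-large-encoder.mlmodelc directory has to sit next to the ggml model, exactly as in the ./main run above.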
