Error in onnxruntime while aligning speech to transcript using whisper #66
Comments
Never seen it before. Could be a lot of things, including an ONNX runtime issue. Can you send the audio and transcript that produces it? |
No, unfortunately I cannot share the audio, it is confidential. I am getting this error on about 10 out of 20 audio files that I processed. |
The error looks like it has to do with some issue with an input tensor or its dimensions.
Edit: you can run with |
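For context on the dimension remark above: the ONNX Expand operator broadcasts its input against a requested shape using NumPy-style rules, so two aligned dimensions are compatible only when they are equal or one of them is 1. The sketch below illustrates that rule only. It is not onnxruntime's implementation, the function names are hypothetical, and the shapes are made up rather than taken from the Whisper decoder.

```typescript
// Illustration of the NumPy-style broadcasting rule used by ONNX Expand.
// Hypothetical helper names; not onnxruntime code.

// Read a dimension counted from the end of the shape; missing leading
// dimensions are treated as 1, as in broadcasting.
function dimFromEnd(shape: number[], offset: number): number {
  return shape[shape.length - 1 - offset] ?? 1;
}

// Compute the shape produced by expanding `inputShape` to `targetShape`,
// or throw if the two shapes cannot be broadcast together.
function expandOutputShape(inputShape: number[], targetShape: number[]): number[] {
  const rank = Math.max(inputShape.length, targetShape.length);
  const result: number[] = [];

  for (let offset = 0; offset < rank; offset++) {
    const inDim = dimFromEnd(inputShape, offset);
    const targetDim = dimFromEnd(targetShape, offset);

    // Compatible only if the dimensions match or one of them is 1.
    if (inDim !== targetDim && inDim !== 1 && targetDim !== 1) {
      throw new Error(
        `invalid expand shape: [${inputShape.join(", ")}] vs [${targetShape.join(", ")}]`
      );
    }
    result.unshift(inDim === 1 ? targetDim : inDim);
  }
  return result;
}

console.log(expandOutputShape([3, 1], [2, 1, 6])); // [ 2, 3, 6 ]
```

A call such as `expandOutputShape([1, 8, 4, 64], [1, 8, 5, 64])` throws, because 4 and 5 differ and neither is 1. A mismatch of that kind appearing partway through decoding would be consistent with the message reported here, though that is an assumption and not a diagnosis.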
I just ran it with I am using Windows. This is the output. I have redacted some parts, but not the last words.
Prepare audio part at time position 1440.60.. 2.8ms
Extract mel spectogram from audio part.. 65.9ms
Normalize mel spectogram.. 16.8ms
Encode mel spectogram with Whisper encoder model.. 59.7ms
Decode text tokens with Whisper decoder model.. [REDACTED...] Daar kan hier echter geen sprake van
2024-07-31 13:30:11.7071441 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/blocks.0/attn/Expand' Status Message: invalid expand shape
Error: Non-zero status code returned while running Expand node. Name:'/blocks.0/attn/Expand' Status Message: invalid expand shape
at Immediate.<anonymous> (C:\Users\luik001c\echogarden-github\node_modules\onnxruntime-node\dist\backend.js:45:108)
at process.processImmediate (node:internal/timers:483:21) |
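When the failure shows up on many files, it can help to pull the operator type, node name, and status message out of each captured log and count them, to see whether every file fails at the same node. The following is a minimal sketch that assumes the error line always has the form shown in the output above. The pattern is derived only from that one message, the function and type names are hypothetical, and other providers or runtime versions may format errors differently.

```typescript
// Hypothetical triage helper; the pattern is based solely on the error
// line quoted in this thread.
interface OrtNodeFailure {
  opType: string;        // e.g. "Expand"
  nodeName: string;      // e.g. "/blocks.0/attn/Expand"
  statusMessage: string; // e.g. "invalid expand shape"
}

const nodeFailurePattern =
  /Non-zero status code returned while running (\S+) node\. Name:'([^']*)' Status Message: (.*)/;

// Return the first node failure found in the log text, or null if none.
function parseOrtNodeFailure(logText: string): OrtNodeFailure | null {
  const match = nodeFailurePattern.exec(logText);
  if (!match) return null;
  const [, opType = "", nodeName = "", statusMessage = ""] = match;
  return { opType, nodeName, statusMessage: statusMessage.trim() };
}

// Count failures per node name across several captured logs.
function countFailuresByNode(logs: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const log of logs) {
    const failure = parseOrtNodeFailure(log);
    if (failure) {
      counts.set(failure.nodeName, (counts.get(failure.nodeName) ?? 0) + 1);
    }
  }
  return counts;
}
```

If every failing file reports the same node and message, that points toward one shared cause and away from several unrelated ones.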
It happens during the call to decode a single next token. Since the log shows that other tokens were decoded before it, the problem isn't with initializing the decoder but with the decoding itself, which narrows it down. I've tested the Whisper implementation with many different inputs and have never encountered this particular error. Without a way to reproduce it, I can't know exactly what causes it. You could try running the Whisper recognition model on the same audio input and see if you get an error. Most likely you won't. If you don't, it could have something to do with the particular tokens that are decoded during forced decoding. Maybe it's something about the language being Dutch, or special tokens that are used there. I don't know; it's really hard to determine. You say it's common in the inputs you are trying. You can also try English inputs to see if it happens with them as well. If you have anything that produces this error and that you can send, it will really help. |
I just ran it successfully with the Whisper |
Update: still getting the error on some files, although on fewer files when using
|
It looks like a different exception. DmlExecutionProvider means it's using DirectML (Windows GPU acceleration). I get a lot of errors with that provider but usually on other models, not Whisper. May be an issue with the ONNX runtime. They only added support for GPU processing on |
Then I'll try rerunning in CPU mode |
The new Try to see if these issues still occur with the new version. Anyway, based on my testing I've still never encountered them myself. It could be related to the particular combination of OS/hardware I'm testing on. It's unlikely that a particular Whisper model has an issue, since these ONNX models are derived from the original ones from OpenAI. Maybe something in one of the models' internal configuration (parameters like number of heads, constants, etc.) is triggering the issue without really causing it. |
I am getting the following error when using the whisper engine with align. Command I am running, for reference:
echogarden align "audio.wav" "transcript.txt" "result.srt" "result.json" --language=nl --crop=false --engine=whisper