@helena-intel
This works:
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, GenerationConfig, pipeline
from pathlib import Path
from optimum.intel.openvino import OVModelForSpeechSeq2Seq
import openvino as ov
import json

model_id = "openai/whisper-small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
generation_config = GenerationConfig.from_pretrained(model_id)

ov_config = {"CACHE_DIR": ""}
model_path = Path(model_id.replace('/', '_'))

if not model_path.exists():
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        model_id, ov_config=ov_config, export=True, compile=False, load_in_8bit=False
    )
    ov_model.half()
    ov_model.save_pretrained(model_path)
else:
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        model_path, ov_config=ov_config, compile=False
    )

ov_model.generation_config = generation_config

device = 'gpu'
ov_model.to(device)
ov_model.compile()

pipe = pipeline(
    "automatic-speech-recognition",
    model=ov_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=18,
)

result = pipe("./4.wav")

with open("sample.json", "w") as outfile:
    json.dump(result, outfile)

print(result["text"])
```
This doesn't (the same script with `return_timestamps="word"` added to the `pipe()` call):
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, GenerationConfig, pipeline
from pathlib import Path
from optimum.intel.openvino import OVModelForSpeechSeq2Seq
import openvino as ov
import json

model_id = "openai/whisper-small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
generation_config = GenerationConfig.from_pretrained(model_id)

ov_config = {"CACHE_DIR": ""}
model_path = Path(model_id.replace('/', '_'))

if not model_path.exists():
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        model_id, ov_config=ov_config, export=True, compile=False, load_in_8bit=False
    )
    ov_model.half()
    ov_model.save_pretrained(model_path)
else:
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        model_path, ov_config=ov_config, compile=False
    )

ov_model.generation_config = generation_config

device = 'gpu'
ov_model.to(device)
ov_model.compile()

pipe = pipeline(
    "automatic-speech-recognition",
    model=ov_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=18,
)

result = pipe("./4.wav", return_timestamps="word")

with open("sample.json", "w") as outfile:
    json.dump(result, outfile)

print(result["text"])
```
and fails with:
```
INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
Compiling the encoder to GPU ...
Compiling the decoder to GPU ...
Compiling the decoder to GPU ...
device must be of type <class 'str'> but got <class 'torch.device'> instead
Traceback (most recent call last):
  File "/run/media/greggy/1a4fd6d7-1f9d-42c6-9324-661804695013/D/owisp/./n2.py", line 44, in <module>
    result = pipe("./4.wav", return_timestamps="word")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 292, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1154, in __call__
    return next(
           ^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1068, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 507, in _forward
    tokens = self.model.generate(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/greggy/.local/lib/python3.11/site-packages/optimum/intel/openvino/modeling_seq2seq.py", line 1018, in generate
    outputs["token_timestamps"] = self._extract_token_timestamps(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: '_OVModelForWhisper' object has no attribute '_extract_token_timestamps'
```
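The `AttributeError` suggests that `_OVModelForWhisper` takes the word-timestamp branch in `generate()` but never received the `_extract_token_timestamps` helper that Transformers defines on its Whisper model. Until this is fixed in optimum-intel, one possible stopgap is to bind that method onto the OV model before building the pipeline. This is only an untested sketch: it borrows a private Transformers method and may break between versions:

```python
import types

from transformers import WhisperForConditionalGeneration

# Untested workaround sketch: bind the private _extract_token_timestamps
# helper from Transformers' Whisper model onto the OpenVINO model so the
# generate() call in optimum-intel can find it. This relies on Transformers
# internals, not a public API.
ov_model._extract_token_timestamps = types.MethodType(
    WhisperForConditionalGeneration._extract_token_timestamps, ov_model
)
```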
Of course, the equivalent pure-Transformers (non-OpenVINO) script works and returns word-level timestamps successfully.
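For reference, here is a minimal pure-Transformers version (a sketch assuming the same `openai/whisper-small` checkpoint and `./4.wav` input as above) that returns word-level timestamps:

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline

model_id = "openai/whisper-small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
)

# Word-level timestamps work here because WhisperForConditionalGeneration
# provides the _extract_token_timestamps helper that the OV model lacks.
result = pipe("./4.wav", return_timestamps="word")
print(result["chunks"])
```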
I noticed too late that this is already assigned to @eaidova in the OpenVINO repo; this report is more precise, though.
Issue in OpenVINO repo: openvinotoolkit/openvino#22794. @eaidova, could you have a look?