
Error: Token '4' not found in text #70

Open
IMBAepsilon opened this issue Sep 14, 2024 · 1 comment

Comments

@IMBAepsilon

When I run:

echogarden align-transcript-and-translation 01.mp3 01.txt 01_translate.txt 01.json 01.srt

I get:

Echogarden v1.5.0

Start stage 1: Align speech to transcript
Transcode with command-line ffmpeg.. 1102.4ms
Convert wave buffer to raw audio.. 384.1ms
Resample audio to 16kHz mono.. 962.1ms
Crop using voice activity detection.. 1263.1ms
Normalize and trim audio.. 181.2ms
No language specified. Detect language using reference text.. 84.4ms
Language detected: Japanese (ja)
Load alignment module.. 0.2ms
Synthesize alignment reference with eSpeak.. 5911.2ms

Starting alignment pass 1/1: granularity: low, max window duration: 189s
Compute reference MFCC features.. 1069.2ms
Compute source MFCC features.. 721.3ms
DTW cost matrix memory size: 685.4MB
Align reference and source MFCC features using DTW.. 2345.1ms

Convert path to timeline.. 20.7ms
Postprocess timeline.. 54.9ms
Total alignment time: 14195.5ms

Start stage 2: Align timeline to translated transcript
No source language specified. Detect source language.. 0.9ms
Source language detected: Japanese (ja)
No target language specified. Detect target language.. 0.6ms
Target language detected: Chinese (zh)
Load e5 module
Prepare text for semantic alignment.. 331.4ms
Initialize E5 embedding model.. 1184.6ms
Extract embeddings from source 1.. Error: Token '4' not found in text
@rotemdan (Member) commented Oct 4, 2024

Thanks for the report.

align-transcript-and-translation is a complex operation that combines alignment engines with a specialized word embedding model.

Due to how the text is tokenized before being passed to the embedding model, there are likely edge cases where tokenization followed by de-tokenization fails to match the original text.
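As an illustration (this is a hedged sketch, not Echogarden's actual code, and `map_tokens_to_offsets` is a hypothetical helper): when subword tokens are mapped back to character positions in the original text, the lookup can fail if the tokenizer normalized a character, for example a full-width digit `４` (U+FF14) being emitted as the ASCII `4`, which is common in Japanese text:

```python
# Hedged sketch: mapping tokens back to character offsets in the
# original text, scanning left to right. Fails when a token string
# no longer appears verbatim in the text.

def map_tokens_to_offsets(text, tokens):
    """Return the character offset of each token in `text`."""
    offsets = []
    cursor = 0
    for token in tokens:
        index = text.find(token, cursor)
        if index == -1:
            raise ValueError(f"Token '{token}' not found in text")
        offsets.append(index)
        cursor = index + len(token)
    return offsets

# Works when tokens match the text exactly:
map_tokens_to_offsets("track 4 start", ["track", "4", "start"])

# Fails when the text contains a full-width '４' but the tokenizer
# emitted the normalized ASCII '4':
try:
    map_tokens_to_offsets("track ４ start", ["track", "4", "start"])
except ValueError as e:
    print(e)  # Token '4' not found in text
```

If something like this is the cause, a full-width digit (or other NFKC-normalized character) in the transcript or translation would trigger exactly the reported message.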

I'll need the exact inputs used so I can reproduce the error and determine how to fix it.
