
Whisper Advanced Parameters

jhj0517 edited this page May 31, 2024 · 6 revisions

Advanced Parameters

Parameter: Description
beam_size: Number of candidate sequences kept at each step of the beam search algorithm.
TL;DR: A higher beam size gives higher quality but slower transcription; a smaller beam size gives lower quality but faster transcription.
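To make beam_size concrete, here is a minimal beam search over a hypothetical toy model in which each next-token probability depends only on the previous token. It is a sketch of the general technique under that assumption, not Whisper's actual decoder, which also handles end-of-text tokens, patience and length penalties.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model: next-token probabilities given the previous token.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.3, "y": 0.3, "z": 0.4},
    "b": {"x": 0.9, "y": 0.1},
}


def beam_search(
    model: Dict[str, Dict[str, float]], steps: int, beam_size: int
) -> Tuple[List[str], float]:
    """Return the best sequence found and its total log probability."""
    beams: List[Tuple[List[str], float]] = [(["<s>"], 0.0)]
    for _ in range(steps):
        # Extend every kept sequence by every possible next token.
        candidates = [
            (tokens + [token], score + math.log(prob))
            for tokens, score in beams
            for token, prob in model.get(tokens[-1], {}).items()
        ]
        if not candidates:
            break
        # Keep only the beam_size highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    best_tokens, best_score = beams[0]
    return best_tokens[1:], best_score  # drop the start token


print(beam_search(TOY_MODEL, steps=2, beam_size=1))  # returns ['a', 'z'] with p = 0.24
print(beam_search(TOY_MODEL, steps=2, beam_size=2))  # returns ['b', 'x'] with p = 0.36
```

With beam_size=1 the search is greedy: it commits to "a" (p = 0.6) and ends at p = 0.24. With beam_size=2 it keeps "b" alive and finds the better sequence (p = 0.36). Scoring the extra candidates at every step is why larger beams are slower.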
log_prob_threshold: Parameter related to how Whisper handles the "silent" parts of the audio. If the average log probability over the sampled tokens is below this value, the decoding is treated as failed.
TL;DR: Lower this value if you want Whisper to be more "sensitive" to quiet sounds. Adjust it together with no_speech_threshold and see what happens.
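The "treat as failed" check can be sketched as below. This is a simplified version: it takes a plain mean of natural-log token probabilities, whereas the real implementation may normalize slightly differently. The per-token probabilities and the threshold values (-1.0, -2.0) are hypothetical examples.

```python
import math
from typing import List


def average_log_prob(token_probs: List[float]) -> float:
    """Mean natural-log probability of the sampled tokens (simplified)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def failed_log_prob_check(token_probs: List[float], log_prob_threshold: float) -> bool:
    """The decoding counts as failed when its average log prob is below the threshold."""
    return average_log_prob(token_probs) < log_prob_threshold


confident = [0.9, 0.8, 0.95]  # hypothetical per-token probabilities
uncertain = [0.3, 0.2, 0.4]

print(failed_log_prob_check(confident, log_prob_threshold=-1.0))  # False
print(failed_log_prob_check(uncertain, log_prob_threshold=-1.0))  # True
print(failed_log_prob_check(uncertain, log_prob_threshold=-2.0))  # False: a lower threshold is more lenient
```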
no_speech_threshold: Parameter related to how Whisper handles the "silent" parts of the audio. If the no_speech probability is higher than this value AND the average log probability over the sampled tokens is below log_prob_threshold, the segment is considered silent and skipped.
TL;DR: Raise this value if you want Whisper to be more "sensitive" to quiet sounds, since fewer segments will then be skipped as silent. Adjust it together with log_prob_threshold and see what happens.
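The interplay between the two thresholds can be sketched as a single predicate: a segment is skipped only when both conditions hold. The segment values and thresholds below are hypothetical examples, not measured outputs.

```python
def is_silent(
    no_speech_prob: float,
    avg_log_prob: float,
    no_speech_threshold: float,
    log_prob_threshold: float,
) -> bool:
    """Skip a segment only if it looks like silence AND decoding was unconfident."""
    return no_speech_prob > no_speech_threshold and avg_log_prob < log_prob_threshold


# Hypothetical segment: fairly likely "no speech", low decoding confidence.
segment = {"no_speech_prob": 0.65, "avg_log_prob": -1.2}

print(is_silent(**segment, no_speech_threshold=0.6, log_prob_threshold=-1.0))  # True: skipped
print(is_silent(**segment, no_speech_threshold=0.7, log_prob_threshold=-1.0))  # False: kept
print(is_silent(**segment, no_speech_threshold=0.6, log_prob_threshold=-1.5))  # False: kept
```

Raising no_speech_threshold or lowering log_prob_threshold both keep more segments, which is why the two are tuned together.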
compute_type: Numeric data type used for computation, such as float16 or float32. Defaults to float16 if CUDA is available, otherwise float32.
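The default rule and the precision trade-off can be illustrated with the standard library alone. The helper names are hypothetical; in practice the CUDA flag might come from something like torch.cuda.is_available() if PyTorch is installed. The struct "e" format stores a value as IEEE 754 half precision, which shows how much float16 rounds compared with float32.

```python
import struct


def default_compute_type(cuda_available: bool) -> str:
    """Hypothetical helper for the default described above."""
    return "float16" if cuda_available else "float32"


def round_trip_float16(x: float) -> float:
    """Store x as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(default_compute_type(True))   # float16
print(default_compute_type(False))  # float32
print(round_trip_float16(0.1))      # 0.0999755859375
print(struct.calcsize("<e"), struct.calcsize("<f"))  # 2 4: bytes per value
```

float16 uses half the memory per value and is fast on GPUs, at the cost of roughly three significant decimal digits of precision.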
best_of: Number of candidates to sample when decoding with a non-zero temperature; the highest-scoring candidate is kept.
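A sketch of best_of over a hypothetical 3-token vocabulary: tokens are drawn independently from a temperature-scaled softmax, and candidates are scored with the untempered distribution. Both the independent-token model and that scoring choice are simplifying assumptions, not Whisper's exact procedure.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_best_of(
    logits: List[float], length: int, temperature: float, best_of: int, seed: int = 0
) -> Tuple[List[int], float]:
    """Sample best_of candidate sequences and keep the highest-scoring one."""
    rng = random.Random(seed)
    sampling_probs = softmax(logits, temperature)
    # Score candidates with the untempered distribution (a simplification).
    log_probs = [math.log(p) for p in softmax(logits)]
    candidates = []
    for _ in range(best_of):
        tokens = rng.choices(range(len(logits)), weights=sampling_probs, k=length)
        candidates.append((tokens, sum(log_probs[t] for t in tokens)))
    return max(candidates, key=lambda c: c[1])


toy_logits = [2.0, 1.0, 0.1]  # hypothetical logits
print(sample_best_of(toy_logits, length=5, temperature=1.0, best_of=1))
print(sample_best_of(toy_logits, length=5, temperature=1.0, best_of=5))
```

With the same seed, the first candidate is identical in both calls, so a larger best_of can only match or improve the kept score, at the cost of decoding more candidates.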
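patience: Beam search patience factor. With patience p, beam search keeps decoding until roughly beam_size × p finished candidates have been collected, so values above 1.0 explore more hypotheses at the cost of speed.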
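The stopping rule can be sketched as follows, assuming (to our understanding of the reference openai-whisper beam search decoder) that decoding stops once round(beam_size * patience) finished hypotheses exist. The per-step finished hypotheses and their scores are hypothetical; real decoders also apply a length penalty when ranking.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]  # (text, score)


def finished_candidates_needed(beam_size: int, patience: float) -> int:
    """Assumed rule: stop after about beam_size * patience finished hypotheses."""
    return round(beam_size * patience)


def collect_finished(
    finished_per_step: List[List[Hypothesis]], beam_size: int, patience: float
) -> Tuple[List[Hypothesis], int]:
    """Walk hypothetical decoding steps until enough hypotheses are finished.

    Returns the finished hypotheses and the number of steps decoded.
    """
    needed = finished_candidates_needed(beam_size, patience)
    finished: List[Hypothesis] = []
    steps = 0
    for step_outputs in finished_per_step:
        steps += 1
        finished.extend(step_outputs)
        if len(finished) >= needed:
            break
    return finished, steps


# Hypothetical hypotheses that reach end-of-text at each decoding step.
toy_steps = [
    [("hello", -2.1)],
    [("hello there", -1.8)],
    [],
    [("hello there friend", -1.5), ("hello they're", -3.0)],
]

for patience in (1.0, 2.0):
    finished, steps = collect_finished(toy_steps, beam_size=2, patience=patience)
    print(patience, steps, max(finished, key=lambda c: c[1]))
# 1.0 2 ('hello there', -1.8)
# 2.0 4 ('hello there friend', -1.5)
```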
condition_on_previous_text: If True, the previous output of the model is provided as a prompt for the next window. Disabling it may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
TL;DR: If a failure loop (repetitive hallucination) occurs, consider setting this to False.
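A simplified sketch of how the flag changes the prompt each window sees. The window texts are hypothetical, and the real implementation works on tokens and truncates long prompts, which this sketch ignores.

```python
from typing import List


def prompts_for_windows(
    window_texts: List[str], condition_on_previous_text: bool
) -> List[str]:
    """Return the prompt each window would be decoded with (simplified)."""
    prompts: List[str] = []
    history: List[str] = []
    for text in window_texts:
        prompts.append(" ".join(history) if condition_on_previous_text else "")
        history.append(text)
    return prompts


windows = ["Hello everyone.", "Welcome to the show.", "Welcome to the show."]
print(prompts_for_windows(windows, condition_on_previous_text=True))
# ['', 'Hello everyone.', 'Hello everyone. Welcome to the show.']
print(prompts_for_windows(windows, condition_on_previous_text=False))
# ['', '', '']
```

When conditioning is on, any repeated or hallucinated text is fed back into the next prompt, which is how a repetition loop can sustain itself; turning it off breaks that feedback.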
initial_prompt: Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for the transcription, e.g. custom vocabularies or proper nouns, to make it more likely to predict those words correctly.
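One way to assemble an initial_prompt from custom vocabulary is sketched below. The helper and the "Glossary:" phrasing are hypothetical choices, not a format Whisper requires; the resulting string would be passed as the initial_prompt argument (for example to faster-whisper's WhisperModel.transcribe).

```python
from typing import Iterable


def build_initial_prompt(terms: Iterable[str], context: str = "") -> str:
    """Hypothetical helper: combine a short context and a glossary of proper nouns."""
    glossary = ", ".join(terms)
    parts = [context.strip(), f"Glossary: {glossary}." if glossary else ""]
    return " ".join(p for p in parts if p)


prompt = build_initial_prompt(
    ["Whisper-WebUI", "faster-whisper", "CTranslate2"],
    context="A tutorial about speech recognition tools.",
)
print(prompt)
# A tutorial about speech recognition tools. Glossary: Whisper-WebUI, faster-whisper, CTranslate2.
```

Writing the prompt in the spelling and punctuation style you want in the output is a common heuristic, since the model tends to continue in the style of its prompt.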
temperature: Temperature for sampling. It can be a tuple of temperatures, which are tried in order whenever decoding fails according to either compression_ratio_threshold or log_prob_threshold.
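The fallback behavior can be sketched as a loop over the temperature tuple. The decoder here is hypothetical: it returns canned (text, avg_log_prob, compression_ratio) results, and the thresholds 2.4 and -1.0 are example values.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical decoder output: (text, avg_log_prob, compression_ratio).
DecodeResult = Tuple[str, float, float]


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float],
    compression_ratio_threshold: float,
    log_prob_threshold: float,
) -> Optional[Tuple[DecodeResult, float]]:
    """Try each temperature in order; stop at the first result that passes both checks."""
    last: Optional[Tuple[DecodeResult, float]] = None
    for temperature in temperatures:
        result = decode(temperature)
        _, avg_log_prob, ratio = result
        last = (result, temperature)
        failed = ratio > compression_ratio_threshold or avg_log_prob < log_prob_threshold
        if not failed:
            return last
    return last  # every temperature failed: keep the last attempt


# Hypothetical canned results, one per temperature.
FAKE_RESULTS: Dict[float, DecodeResult] = {
    0.0: ("the the the the the the", -0.3, 4.1),  # repetition loop: ratio too high
    0.2: ("uh hello", -1.4, 1.2),                 # low confidence: log prob too low
    0.4: ("Hello, everyone.", -0.5, 1.1),         # passes both checks
}


def fake_decode(temperature: float) -> DecodeResult:
    return FAKE_RESULTS[temperature]


outcome = decode_with_fallback(fake_decode, (0.0, 0.2, 0.4), 2.4, -1.0)
print(outcome)  # (('Hello, everyone.', -0.5, 1.1), 0.4)
```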
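compression_ratio_threshold: If the gzip compression ratio of the decoded text is above this value, the decoding is treated as failed. A high ratio means the text is highly repetitive, which usually indicates a repetition loop.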
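The ratio can be sketched with Python's zlib, which uses the same DEFLATE algorithm as gzip; to our understanding the reference Whisper code computes it in a similar way. The sample texts are hypothetical and 2.4 is used as an example threshold.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by DEFLATE-compressed size (higher = more repetitive)."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


normal = "The quick brown fox jumps over the lazy dog."
looping = "Thank you. " * 20  # typical repetition-loop hallucination

threshold = 2.4  # example value
for text in (normal, looping):
    ratio = compression_ratio(text)
    # The normal sentence stays below the threshold; the looping text exceeds it.
    print(f"{ratio:.2f}", "failed" if ratio > threshold else "ok")
```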