I want to use Parler TTS as the base model, is it possible? #312

Hassan-jr · 2024-09-19T09:13:06Z

OpenVoice is amazing and I like it a lot. However, I want to experiment with a different TTS model, such as Parler TTS from Hugging Face, as the base model instead of MeloTTS.

How can I achieve that?
Thanks.

dosubot · 2024-09-19T09:14:02Z

Hi @Hassan-jr! I'm here to assist you with any bugs, questions, or contributions you have. Let's solve this together!

Yes, it is possible to use a different TTS model like Parler TTS as the base model. You can refer to a similar open issue where a solution was suggested for replacing the base model with something different ^[1].

Here is a summarized approach to achieve this:

Generate Audio with Parler TTS: Use Parler TTS to generate an audio file of a sentence (preferably longer than 10 seconds).
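A minimal sketch of this step, assuming the `parler_tts`, `transformers`, `torch`, and `soundfile` packages are installed. The checkpoint name `parler-tts/parler-tts-mini-v1` and the helper names `synthesize_with_parler` and `wav_duration_seconds` are illustrative choices, not part of OpenVoice. The duration helper is only there to check the "longer than 10 seconds" recommendation.

```python
import wave


def synthesize_with_parler(text, description, out_path,
                           model_id="parler-tts/parler-tts-mini-v1"):
    """Generate speech with Parler TTS and save it as a WAV file."""
    # Heavy dependencies are imported lazily so the duration helper below
    # can be used without them.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Parler TTS takes a voice description and the text to speak as separate inputs.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

If `wav_duration_seconds` reports less than about 10 seconds, a longer input sentence is probably worth trying before extracting the tone color vector.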

Extract Tone Color Vector: Use the get_se function to extract the tone color vector from the generated audio. get_se is implemented in openvoice/se_extractor.py, and its input and output specifications are as follows:

Input:
- audio_path (str): The path to the audio file.
- vc_model (object): The voice cloning model object.
- target_dir (str, optional): The directory where the processed files will be saved. Default is 'processed'.
- vad (bool, optional): A flag indicating whether to use Voice Activity Detection (VAD) for splitting the audio. When False, the audio is split with Whisper instead. Default is True.
Output:
- Returns a tuple containing:
  - The speaker embedding extracted by the vc_model.
  - The audio_name which is a string derived from the audio file name, model version, and a hash of the audio content.

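# Excerpt from openvoice/se_extractor.py. hash_numpy_array, split_audio_vad and
# split_audio_whisper are helpers defined in the same module.
import os
from glob import glob

def get_se(audio_path, vc_model, target_dir='processed', vad=True):
    device = vc_model.device
    version = vc_model.version
    print("OpenVoice version:", version)

    # Name the output folder after the file name, model version, and a content hash.
    audio_name = f"{os.path.basename(audio_path).rsplit('.', 1)[0]}_{version}_{hash_numpy_array(audio_path)}"
    se_path = os.path.join(target_dir, audio_name, 'se.pth')

    # Split the audio into segments, using VAD or Whisper.
    if vad:
        wavs_folder = split_audio_vad(audio_path, target_dir=target_dir, audio_name=audio_name)
    else:
        wavs_folder = split_audio_whisper(audio_path, target_dir=target_dir, audio_name=audio_name)

    audio_segs = glob(f'{wavs_folder}/*.wav')
    if len(audio_segs) == 0:
        raise NotImplementedError('No audio segments found!')

    # Extract the speaker embedding from the segments and save it to se_path.
    return vc_model.extract_se(audio_segs, se_save_path=se_path), audio_name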
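As a rough usage sketch, get_se can be called once on the Parler TTS audio (the base speaker) and once on the reference recording (the target speaker). The checkpoint directory layout below follows the OpenVoice demo notebooks and may differ in your setup. The function names and the injectable `get_se` parameter are hypothetical conveniences added here so the logic can be exercised without the real models.

```python
def load_converter(ckpt_dir, device="cpu"):
    """Load the OpenVoice tone color converter from a checkpoint directory."""
    # Imported lazily; requires the openvoice package to be installed.
    from openvoice.api import ToneColorConverter

    converter = ToneColorConverter(f"{ckpt_dir}/config.json", device=device)
    converter.load_ckpt(f"{ckpt_dir}/checkpoint.pth")
    return converter


def extract_source_and_target_se(base_wav, target_wav, converter, get_se=None):
    """Extract tone color vectors for the base (Parler TTS) and target speakers."""
    if get_se is None:
        # Default to the real extractor from openvoice/se_extractor.py.
        from openvoice import se_extractor
        get_se = se_extractor.get_se

    # get_se returns (speaker_embedding, audio_name); only the embedding is kept.
    source_se, _ = get_se(base_wav, converter, vad=True)
    target_se, _ = get_se(target_wav, converter, vad=True)
    return source_se, target_se
```

A call might look like `extract_source_and_target_se("parler_base.wav", "reference.wav", load_converter("checkpoints_v2/converter"))`, where both file names and the checkpoint path are placeholders.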

Use the Tone Color Vector: Once you have the tone color vector of the base speaker (extracted from the Parler TTS audio), pass it to the tone color converter together with a new sentence spoken by the base speaker (Parler TTS) and the tone color vector of the target speaker.
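The conversion step could be sketched as follows, assuming `converter` is a loaded OpenVoice `ToneColorConverter` and both tone color vectors were extracted beforehand. The wrapper name `convert_to_target_voice` is hypothetical. The keyword arguments mirror the `convert` call used in the OpenVoice demo notebooks.

```python
def convert_to_target_voice(converter, base_wav, source_se, target_se, output_path):
    """Re-render base-speaker audio in the target speaker's tone color."""
    converter.convert(
        audio_src_path=base_wav,  # new sentence spoken by the base speaker (Parler TTS)
        src_se=source_se,         # tone color vector of the base speaker
        tgt_se=target_se,         # tone color vector of the target speaker
        output_path=output_path,  # where the converted audio is written
    )
    return output_path
```

Keeping the wrapper thin makes it easy to swap the base TTS model later, since only `base_wav` and `source_se` depend on it.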

For more detailed steps, you can refer to demo_part2.ipynb in the OpenVoice repository, which provides an example of using an external TTS model as the base speaker.


meet1919 · 2024-09-29T13:13:15Z

Does this help with voice cloning quality? When I use MeloTTS's English model as the base, the cloned voice differs significantly from the reference voice.

Hassan-jr · 2024-09-29T13:38:32Z

@meet1919 I am not sure. I wanted to experiment with it.

dosubot bot added the question Further information is requested label Sep 19, 2024
