I want to use Parler TTS as the base model, is it possible? #312

Open
Hassan-jr opened this issue Sep 19, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Hassan-jr

OpenVoice is amazing and I like it a lot. However, I want to experiment with different TTS models, such as Parler-TTS from Hugging Face, as the base model instead of MeloTTS.

How can I achieve that?
Thanks.

dosubot bot commented Sep 19, 2024

Hi @Hassan-jr! I'm here to assist you with any bugs, questions, or contributions you have. Let's solve this together!

Yes, it is possible to use a different TTS model like Parler TTS as the base model. You can refer to a similar open issue where a solution was suggested for replacing the base model with something different [1].

Here is a summarized approach to achieve this:

  1. Generate Audio with Parler TTS: Use Parler TTS to generate an audio file of a sentence (preferably longer than 10 seconds).

  2. Extract Tone Color Vector: Use the get_se function from the se_extractor module to extract the tone color vector from the generated audio. The function is implemented in openvoice/se_extractor.py, and its input and output specifications are as follows:

    • Input:
      • audio_path (str): The path to the audio file.
      • vc_model (object): The voice cloning model object.
      • target_dir (str, optional): The directory where the processed files will be saved. Default is 'processed'.
      • vad (bool, optional): A flag indicating whether to use Voice Activity Detection (VAD) for splitting the audio. Default is True.
    • Output:
      • Returns a tuple containing:
        • The speaker embedding extracted by the vc_model.
        • The audio_name which is a string derived from the audio file name, model version, and a hash of the audio content.

    import os
    from glob import glob

    # hash_numpy_array, split_audio_vad and split_audio_whisper are helpers
    # from openvoice/se_extractor.py.
    def get_se(audio_path, vc_model, target_dir='processed', vad=True):
        device = vc_model.device
        version = vc_model.version
        print("OpenVoice version:", version)

        # Name the output folder after the file name, model version, and audio hash.
        audio_name = f"{os.path.basename(audio_path).rsplit('.', 1)[0]}_{version}_{hash_numpy_array(audio_path)}"
        se_path = os.path.join(target_dir, audio_name, 'se.pth')

        # Split the audio into segments, using VAD or Whisper.
        if vad:
            wavs_folder = split_audio_vad(audio_path, target_dir=target_dir, audio_name=audio_name)
        else:
            wavs_folder = split_audio_whisper(audio_path, target_dir=target_dir, audio_name=audio_name)

        audio_segs = glob(f'{wavs_folder}/*.wav')
        if len(audio_segs) == 0:
            raise NotImplementedError('No audio segments found!')

        # Extract the speaker embedding from all segments and save it to se_path.
        return vc_model.extract_se(audio_segs, se_save_path=se_path), audio_name

  3. Use the Tone Color Vector: Pass three inputs to the tone color converter: a new sentence spoken by your base speaker (Parler TTS), the base speaker's tone color vector extracted in step 2, and the tone color vector of the target speaker.
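
Step 1 could look roughly like the sketch below, which follows the usage pattern shown in the Parler-TTS README. The checkpoint name `parler-tts/parler-tts-mini-v1` and the helper names `synthesize_with_parler` and `is_long_enough` are assumptions made for illustration; they are not part of OpenVoice.

```python
def is_long_enough(num_samples, sampling_rate, min_seconds=10.0):
    """Return True if the clip meets the suggested minimum duration."""
    return num_samples / sampling_rate >= min_seconds


def synthesize_with_parler(text, description, out_path,
                           model_id="parler-tts/parler-tts-mini-v1"):
    """Generate speech with Parler-TTS and write it to `out_path`.

    Heavy dependencies are imported lazily, so the duration helper above
    can be used without Parler-TTS installed. `model_id` is an assumed
    checkpoint; substitute whichever Parler-TTS checkpoint you use.
    """
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Parler-TTS takes a voice description and, separately, the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)

    # Report whether the clip reaches the suggested 10 seconds.
    return is_long_enough(len(audio), model.config.sampling_rate)


# Example call (downloads the model on first use):
# ok = synthesize_with_parler(
#     "A long passage of text to read aloud ...",
#     "A female speaker with a clear, close-sounding voice.",
#     "parler_base.wav",
# )
```

The return value only flags clips shorter than the suggested 10 seconds; it does not stop you from using them.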

For more detailed steps, you can refer to demo_part2.ipynb in the OpenVoice repository, which provides an example of using an external TTS model as the base speaker.
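
Steps 2 and 3 might be wired together as in the following sketch, which mirrors the calls used in the OpenVoice demo notebooks (`se_extractor.get_se` and `ToneColorConverter.convert`). The function name `convert_with_external_base`, the file names, and the checkpoint paths in the trailing comments are assumptions; adjust them to your setup.

```python
def convert_with_external_base(base_sample_wav, base_sentence_wav, reference_wav,
                               output_wav, converter, get_se=None):
    """Re-voice a base-speaker sentence in the tone color of a reference speaker.

    base_sample_wav:   longer clip from the base TTS (e.g. Parler-TTS), used
                       only to extract the base speaker's tone color vector.
    base_sentence_wav: the new sentence spoken by the base TTS.
    reference_wav:     recording of the target speaker to imitate.
    converter:         a loaded OpenVoice ToneColorConverter.
    get_se:            extractor function; defaults to OpenVoice's get_se.
    """
    if get_se is None:
        # Imported lazily so the wiring can be tested without OpenVoice installed.
        from openvoice.se_extractor import get_se

    # Step 2: tone color vectors for the base speaker and the target speaker.
    source_se, _ = get_se(base_sample_wav, converter, vad=True)
    target_se, _ = get_se(reference_wav, converter, vad=True)

    # Step 3: replace the base speaker's tone color with the target's.
    converter.convert(
        audio_src_path=base_sentence_wav,
        src_se=source_se,
        tgt_se=target_se,
        output_path=output_wav,
    )
    return output_wav


# Example setup (checkpoint paths are assumptions):
# from openvoice.api import ToneColorConverter
# converter = ToneColorConverter("checkpoints_v2/converter/config.json", device="cpu")
# converter.load_ckpt("checkpoints_v2/converter/checkpoint.pth")
# convert_with_external_base("parler_base.wav", "parler_sentence.wav",
#                            "reference.mp3", "output.wav", converter)
```

Passing the extractor in as an argument keeps the sketch independent of any particular base TTS: only the WAV files it produces matter to the converter.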

@dosubot dosubot bot added the question Further information is requested label Sep 19, 2024
@meet1919

meet1919 commented Sep 29, 2024

Does this improve voice cloning quality? I ask because, with MeloTTS's English model as the base, the cloned voice differs greatly from the reference voice.

@Hassan-jr
Author

@meet1919 I am not sure. I wanted to experiment with it.
