HF Gradio demo: sudden gender flip for slider #202
Comments
Hi, thanks for including Toucan in the Arena! The gender slider is not related to pitch. It specifies a rotation around a principal component axis in the latent space of the speaker embedding generator. If no voice reference is given, the system uses an artificial speaker embedding that is not linked to any real human. Instead, it is generated by a GAN that learned to match the distribution of speaker embeddings, and this rotation manipulates that generation process. The direction of the rotation is not always the same, because a generated artificial speaker embedding might be flipped upside-down by a rotation on another axis. So the slider does not have a static direction: we can never know whether moving it left or right makes the voice more masculine or more feminine. This differs for every speaker embedding, and a new set of speaker embeddings is generated every time the space restarts, so every day there are new voices. For the arena, it's probably a good idea to always keep the same speaker, right? I can make the random seed static, so we always get the same voices. Or, since the arena only supports English, I can make a separate space with an API that uses the real default embedding instead of a generated artificial one.
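A minimal sketch of the static-seed idea, assuming the artificial embedding is drawn from a seeded random generator. `generate_artificial_embedding` and `EMBEDDING_DIM` are hypothetical names, not Toucan's API, and the GAN is replaced by the raw random draw; the point is only that a fixed seed makes every restart produce the same "voice".

```python
import random

EMBEDDING_DIM = 8  # hypothetical size; the real embedding size is not stated in this thread


def generate_artificial_embedding(seed=None):
    """Hypothetical stand-in for the GAN sampler.

    Draws a latent vector from a private random generator. With seed=None the
    generator is seeded from system entropy, so each call (or restart) yields a
    new vector; with a fixed seed the same vector comes back every time. A real
    generator would pass this latent through a trained network.
    """
    rng = random.Random(seed)  # local generator: does not disturb global random state
    return [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]


voice_a = generate_artificial_embedding(seed=1234)
voice_b = generate_artificial_embedding(seed=1234)  # e.g. after a restart of the space
print(voice_a == voice_b)  # True
```

Using a private `random.Random` instance keeps the seed from affecting any other randomness in the app. If the generator is PyTorch-based, calling `torch.manual_seed(...)` before sampling would play the same role.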
Ok, so I am not going crazy. Also, cloning never works for me; it still seems to use the generated artificial speaker. I am thinking of using multiple voices and languages for the arena in the future, but for now it is a single female American-English voice, so I would still need a more deterministic outcome.
I made a space that you can use for this. It features just a female American English voice, and the inputs are greatly simplified: it's just the text and nothing else. https://huggingface.co/spaces/Flux9665/EnglishToucan Without the artificial speaker embeddings, I expect much better and more consistent results that more accurately reflect what the model is capable of.
I've added Toucan to the TTS Arena fork by using the MassivelyMultilingualTTS space.
Arena: https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
TTS Space: https://huggingface.co/spaces/Flux9665/MassivelyMultilingualTTS
After some time, the values of the "Gender of artificial Voice" slider get flipped. I assumed it was always meant to go from -10 as the lowest average pitch to +10 as the highest, making it a male/female slider in that order. Yet it sometimes flips in reverse.
Is something in the model being reconfigured?
Right now, a positive value means male gender on the space.
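The flip described here matches the maintainer's explanation above, and it can be reproduced in a toy setting. This sketch assumes a 3-dimensional embedding, a slider in [-10, 10] that maps linearly to a rotation angle in the plane of axes 0 and 1, and a hypothetical "gender" readout along axis 1. None of these names or values come from Toucan; they only show why the same slider value can push two embeddings in opposite directions once one of them has been flipped by a rotation in a different plane.

```python
import math


def rotate_in_plane(vec, i, j, theta):
    """Return a copy of vec rotated by theta radians in the plane spanned by axes i and j."""
    out = list(vec)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def slider_effect(embedding, slider_value, max_angle=math.pi / 8):
    """Change in the hypothetical 'gender' component (axis 1) when the slider
    rotates the embedding in the (0, 1) plane. max_angle is an assumed scale."""
    theta = slider_value / 10 * max_angle
    rotated = rotate_in_plane(embedding, 0, 1, theta)
    return rotated[1] - embedding[1]


# Toy "speaker embedding" and a copy flipped by a half-turn in another plane (axes 0 and 2).
emb = [0.8, 0.1, 0.3]
flipped = rotate_in_plane(emb, 0, 2, math.pi)

print(slider_effect(emb, 10))      # positive: +10 pushes the readout up
print(slider_effect(flipped, 10))  # negative: the same +10 pushes it down
```

Both embeddings have the same norm and go through identical slider code; only their orientation differs, and that alone reverses the direction. Since a restart samples new embeddings, which way the slider points can change from one day to the next.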