Anonymization dataset and model format #10
From the README (https://github.com/DigitalPhonetics/VoicePAT?tab=readme-ov-file#anonymization) it says that to anonymize my own data I should modify the following fields in the config file:

```
data_dir: # path to original data in Kaldi-format for anonymization
results_dir: # path to location for all (intermediate) results of the anonymization
models_dir: # path to models location
```

Just wondering what exactly the Kaldi-format is. I guess it refers to a text file with each line of the format `<id> <wav path>`, but I just want to double-check.

The README also says:

> Pretrained models for this anonymization can be found at https://github.com/DigitalPhonetics/speaker-anonymization/releases/tag/v2.0 and earlier releases.

But the link contains several zip files to download and it is very unclear what should be done here.

Would appreciate it if some more details could be provided. I totally understand this toolkit is under construction -- just raising my questions here.

Will reply soon
Regarding the first part of your question (Kaldi format): Kaldi is a Swiss-army knife for various speech processing tasks, and the Kaldi format is the dataset organization used by this toolkit. For a complete reference, see https://kaldi-asr.org/doc/data_prep.html. Alternatively, you can request access to the VoicePrivacy Challenge datasets; the instructions for this are available at https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022. This framework primarily utilizes the wav.scp, utt2spk, and spk2utt files.
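For illustration, a minimal Kaldi-style data directory contains plain-text files like these (the speaker/utterance IDs and paths below are made up):

```
# wav.scp — one "<utt-id> <path-to-wav>" pair per line
spk001-utt001 /path/to/wavs/spk001-utt001.wav
spk001-utt002 /path/to/wavs/spk001-utt002.wav

# utt2spk — one "<utt-id> <spk-id>" pair per line
spk001-utt001 spk001
spk001-utt002 spk001

# spk2utt — one speaker per line, followed by all of its utterance IDs
spk001 spk001-utt001 spk001-utt002
```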
Regarding the second part of your question (installation): currently the Bash script 01_download_data_model.sh takes care of downloading and unpacking the required data and pretrained models, so you do not need to pick the zip files from the release page by hand.
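Concretely (assuming you run it from the repository root):

```bash
# Fetches and unpacks the data and pretrained-model archives
bash 01_download_data_model.sh
```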
Hi @unilight, thanks for the question. Yeah, the toolkit is still under construction and will be updated soon. If you would like to use your own anonymized data, simply use the VoicePAT pretrained ASV and ASR models for evaluation.
Thank you for the replies! So for now it would be better to use the pretrained models. Do you mean that I should prepare one or several folders, each containing the wav files I want to anonymize?
Yes.
Yes, but currently the toolkit only supports the 12 dev+test datasets provided by the VoicePrivacy Challenge. These datasets include wav.scp/trial/utt2spk/spk2utt files, and they are also indicated in the config file: https://github.com/DigitalPhonetics/VoicePAT/blob/vpc/configs/eval_pre_from_anon_datadir.yaml#L7-L27

The first step of the evaluation script is to prepare wav.scp/trial/utt2spk/spk2utt for the anonymized data and evaluation subdatasets. You can find the implementation details here: https://github.com/DigitalPhonetics/VoicePAT/blob/vpc/run_evaluation.py#L154-L175

We strictly follow the VPC data design; I understand this is complicated at the beginning. Sorry for any confusion. If you would like to use your own anonymized data instead of the VPC datasets, you will need to prepare your own data (including wav.scp/trial/utt2spk/spk2utt) and modify (or skip) https://github.com/DigitalPhonetics/VoicePAT/blob/vpc/run_evaluation.py#L154-L175.
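As a rough sketch (not part of the toolkit; the folder layout and the `<spk>-<utt>.wav` naming scheme are assumptions for illustration), the three per-utterance files could be generated like this. The trial file defines the enrollment/trial pairs and has to follow the VPC design, so it is easiest to copy it from the corresponding VPC subdataset:

```python
# Sketch: build wav.scp / utt2spk / spk2utt for a folder of anonymized wavs.
# Assumes file names look like "<spk>-<utt>.wav"; adapt to your corpus.
from collections import defaultdict
from pathlib import Path

wav_dir = Path("anon_wavs/libri_test_enrolls")   # hypothetical input folder
out_dir = Path("data/libri_test_enrolls_anon")   # hypothetical output dir
out_dir.mkdir(parents=True, exist_ok=True)

spk2utt = defaultdict(list)
with (out_dir / "wav.scp").open("w") as wav_scp, \
     (out_dir / "utt2spk").open("w") as utt2spk:
    for wav in sorted(wav_dir.glob("*.wav")):
        utt_id = wav.stem
        spk_id = utt_id.split("-")[0]            # assumed naming scheme
        wav_scp.write(f"{utt_id} {wav.resolve()}\n")
        utt2spk.write(f"{utt_id} {spk_id}\n")
        spk2utt[spk_id].append(utt_id)

# spk2utt: one speaker per line, followed by all of its utterance IDs
with (out_dir / "spk2utt").open("w") as f:
    for spk, utts in sorted(spk2utt.items()):
        f.write(f"{spk} {' '.join(utts)}\n")
```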
@xiaoxiaomiao323 Thank you for the kind replies (and sorry for getting back so late). So I tried running the VPC2022 baseline repo and I think I have a better understanding now of how the directories should look. I have another question. Say I have developed an anonymization system by myself and I want to use this toolkit to do evaluation on the VPC2022 dataset. After anonymizing the official 12 dev/test sets, is there anything else I should do other than just putting them in separate folders? Do I need to prepare the wav.scp/utt2spk/... etc.? Also, can you suggest what modifications I should make to the config files?
Hi, glad to know! We updated the README.

Note: the VPC2024 plan (the challenge plan will be released soon) is to remove the VCTK dev/test datasets, so the corresponding dataset entry in the config no longer includes them by default. If you still want to include VCTK, please modify that entry in the config file.
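For reference, a hedged sketch of what that part of the config looks like (the key names here are assumptions; see configs/eval_pre_from_anon_datadir.yaml, lines 7-27, for the real keys and values):

```yaml
# Hypothetical sketch of the evaluation dataset list.
datasets:
  - libri_dev
  - libri_test
  - vctk_dev    # remove these two entries to follow the VPC2024 plan,
  - vctk_test   # or keep them if you still want VCTK results
```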
@xiaoxiaomiao323 Thank you for the reply! I mainly want to align with the evaluation protocol in VPC2022, which, in my understanding, requires training $ASV_{eval}^{anon}$ on the anonymized data. Roughly how long does that training take?
No problem! After many testing runs, we found that the training time really depends on the hard drives.
@xiaoxiaomiao323 Thank you for the reply! (I know you probably do not have the official answer, but) just wondering whether training $ASV_{eval}^{anon}$ is really necessary for the evaluation?
I agree with your opinion. Actually, we already decided not to use it.
@xiaoxiaomiao323 I see, thank you for the answers! :) So to understand the evaluation process I synced the latest code and ran the scripts, which leads to an error.
@unilight, sorry, I think this is just a naming mistake. When did you download "data.zip"? There should be no problem if you download the new version of "data.zip" (https://github.com/DigitalPhonetics/VoicePAT/blob/vpc/01_download_data_model.sh#L65) and rerun.
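If you want to keep the archive you already have, a rename along these lines should also work (the old file name here is a placeholder; check the name the script actually expects at the linked line):

```bash
# Hypothetical: rename the earlier download to the name the updated script
# expects, then rerun the download/unpack step.
mv old_data.zip data.zip        # "old_data.zip" is a placeholder name
bash 01_download_data_model.sh
```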
@xiaoxiaomiao323 Thank you for the reply! Yes, I indeed downloaded the files earlier (in late December). I'll rename and rerun, and let you know if I have more problems. (Sorry for running the scripts while the work is still in progress...)
@unilight, no problem, thanks for your patience. Feel free to post any questions.
@xiaoxiaomiao323 I have finished running the evaluation. I am trying to interpret the numbers and compare them with those I find in the VoicePAT paper, but I am having a hard time. I have the following results:

Can you kindly tell me which numbers to look at? Also, I am very interested in GVD, and am wondering which variables in the config I should modify to get those GVD numbers?
Yeah, other people also pointed that out. It really depends on the I/O speed of the machine. Try increasing num_workers next time; it helps.

For the EER results, there are two ASV models:

1) $ASV_{eval}$, pretrained on the original LibriSpeech-360;
2) $ASV_{eval}^{anon}$, which you trained using anonymized LibriSpeech-360.

The results you listed are obtained from the $ASV_{eval}^{anon}$ model you trained. And we didn't test the DSP system when we wrote the VoicePAT paper.

GVD is computed from the voice similarity matrices as a separate metric in the config; comment out the asr entry if you want to skip the ASR evaluation.
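In config terms this is roughly the following (a sketch with assumed key names; check the evaluation config in the repo for the real ones):

```yaml
# Hypothetical sketch of the evaluation metrics section.
metrics:
  - eer    # privacy: ASV equal error rate
  - gvd    # utility: gain of voice distinctiveness
# - asr    # comment out the ASR entry like this to skip WER evaluation
```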