Choice and effect of "segment size" #159

Open
BenjSta opened this issue Nov 30, 2023 · 0 comments

BenjSta commented Nov 30, 2023

My question concerns the "segment size" parameter. You use 8192, which is about 372 ms at a 22050 Hz sampling rate. If I have computed correctly, the receptive field is roughly 300 ms wide in the v1/v2 configurations. That would mean that during training most of the generated audio is affected by padding in the convolutions. How did you choose the "segment size" parameter? Is there a trade-off between reducing the effect of padding and achieving enough speaker variability within each batch in multi-speaker training? Or does the padding even act as a regularizer?
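
For reference, here is a minimal sketch of the arithmetic behind these numbers. The helper names are hypothetical and not part of the repository. The 300 ms receptive field is only my rough estimate, and the sketch assumes a symmetric receptive field measured in output samples, so a sample counts as "affected" whenever its receptive field extends past either edge of the segment.

```python
def segment_duration_ms(segment_size: int, sample_rate: int) -> float:
    """Duration of one training segment in milliseconds."""
    return 1000.0 * segment_size / sample_rate


def padding_affected_fraction(
    segment_size: int, sample_rate: int, receptive_field_ms: float
) -> float:
    """Fraction of output samples whose receptive field crosses a segment edge.

    Assumption: the receptive field is a symmetric window in the output
    sample domain, so only segment_size - rf_samples + 1 positions see
    no padding at all.
    """
    rf_samples = round(receptive_field_ms * sample_rate / 1000.0)
    clean_samples = max(segment_size - rf_samples + 1, 0)
    return 1.0 - clean_samples / segment_size


if __name__ == "__main__":
    duration = segment_duration_ms(8192, 22050)
    affected = padding_affected_fraction(8192, 22050, 300.0)  # 300 ms is an estimate
    print(f"segment duration: {duration:.1f} ms")
    print(f"fraction of samples affected by padding: {affected:.2f}")
```

Under these assumptions roughly four fifths of each generated segment would depend on padded values, which is what prompts the question.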

Regards
