My question relates to the "segment size" parameter. You use 8192 samples, which is about 372 ms at a 22050 Hz sampling rate. If I have computed correctly, the receptive field width is roughly 300 ms (?) in the v1/v2 configurations. That would mean that during training, most of the generated audio is affected by padding in the convolutions. How did you choose the "segment size" parameter? Is there a trade-off between reducing the effect of padding and achieving enough speaker variability within each batch in multi-speaker training? Or does the padding even act as a regularizer?
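To make the numbers above concrete, here is a small sketch that converts the segment size to milliseconds and estimates the receptive field of a stack of dilated 1-D convolutions. The kernel sizes and dilations in the example stack are hypothetical placeholders for illustration, not the actual v1/v2 configuration, and the estimate ignores upsampling strides (which rescale the receptive field in output samples):

```python
SAMPLE_RATE = 22050
SEGMENT_SIZE = 8192

# Segment duration in milliseconds: 8192 / 22050 Hz ~= 372 ms
segment_ms = 1000 * SEGMENT_SIZE / SAMPLE_RATE
print(f"segment: {segment_ms:.1f} ms")

def receptive_field(layers):
    """Receptive field in samples of a stack of dilated convolutions.

    Each layer is given as (kernel_size, dilation); every layer widens
    the receptive field by (kernel_size - 1) * dilation samples.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Hypothetical stack: three ResBlock-style groups of dilated convs.
layers = [(3, 1), (3, 3), (3, 5)] * 3
rf = receptive_field(layers)
print(f"receptive field: {rf} samples "
      f"({1000 * rf / SAMPLE_RATE:.1f} ms)")
```

With padded ("same") convolutions, roughly the outer half of the receptive field at each segment edge sees padding instead of real audio, which is why a receptive field close to the segment length matters here.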
Regards