Authors: Tian Huey Teh*, Vivian Hu*, Devang S Ram Mohan, Zack Hodari, Christopher Wallis, Tomás Gómez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King (*: contact)
Abstract: Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ensemble of models.
We apply ensemble learning to prosody prediction. We construct simple ensembles of prosody predictors by varying either model architecture or model parameter values. To automatically select amongst the models in the ensemble when performing Text-to-Speech, we propose a novel, and computationally trivial, variance-based criterion.
We demonstrate that even a small ensemble of prosody predictors yields useful diversity, which, combined with the proposed selection criterion, outperforms any individual model from the ensemble.
We conducted an A/B listening test to measure preference between the individual models in the ensemble (CONV and RNN). We then created a human ORACLE by choosing, for each utterance, the rendition that listeners preferred most.
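As a rough illustration of how such an ORACLE could be assembled from A/B responses, here is a minimal sketch. It assumes each utterance has a list of listener votes naming the preferred model; the function name `oracle_choice`, the vote format, the utterance IDs, and the tie handling are all hypothetical and not taken from the paper.

```python
from collections import Counter

def oracle_choice(votes):
    """Return the model most listeners preferred for one utterance.

    `votes` is a list of model names, one per listener response
    (hypothetical format). Returns None if there are no votes or
    the top two models are tied.
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear crowd favorite
    return ranked[0][0]

# Illustrative votes only; these are not results from the listening test.
votes_by_utterance = {
    "utt_01": ["CONV", "RNN", "CONV"],
    "utt_02": ["RNN", "RNN", "CONV", "RNN"],
}
oracle = {utt: oracle_choice(v) for utt, v in votes_by_utterance.items()}
```

How ties and abstentions were actually treated in the listening test is not stated here, so returning `None` for a tie is only one possible convention.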
Have a listen to some of the samples produced by each model. (We encourage readers to listen to each rendition before revealing the “answer”.) What do you think? Do you agree with the crowd favorite? Was your preference based on intonation or some other factor?
[Audio samples: 2 male-voice and 2 female-voice utterances; the ORACLE choice can be revealed on the original page.]
Using the F0 variance-based criterion proposed in our paper, we can predict listener preference more accurately than by always using a single model. However, greater variance does not always correspond to crowd preference.
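To make the idea concrete, the sketch below picks, for one utterance, the rendition whose predicted F0 contour has the largest variance. This is an assumption-laden illustration rather than the paper's exact formulation: it assumes F0 is given in Hz per frame, that unvoiced frames are marked with 0 and ignored, and that the highest-variance rendition is the one selected. The names `f0_variance` and `select_rendition` are hypothetical, and details such as log-F0 or normalization may differ in the paper.

```python
from statistics import pvariance

def f0_variance(f0_contour):
    """Population variance of F0 over voiced frames.

    Assumption: unvoiced frames are encoded as 0 and excluded.
    """
    voiced = [f for f in f0_contour if f > 0]
    return pvariance(voiced) if voiced else 0.0

def select_rendition(predictions):
    """Return the model name whose predicted F0 has the highest variance.

    `predictions` maps a model name to its predicted F0 contour
    (hypothetical data layout).
    """
    return max(predictions, key=lambda name: f0_variance(predictions[name]))

# Made-up contours for illustration; not data from the paper.
predictions = {
    "CONV": [0, 110, 112, 111, 0, 109],
    "RNN": [0, 100, 130, 95, 0, 140],
}
chosen = select_rendition(predictions)
```

The appeal of a criterion like this is its cost: it needs only the predicted contours and no additional model, which is consistent with the "computationally trivial" description above.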
Below, we’ve shared some samples where the selection criterion agrees with the ORACLE choice and some where it does not. Which do you agree with?
[Audio samples: 3 male-voice and 3 female-voice utterances; the ORACLE choice can be revealed on the original page.]
[Audio samples: 3 male-voice and 3 female-voice utterances; the ORACLE choice can be revealed on the original page.]
© 2023 Papercup Technologies Ltd.