Research Area: Speech Synthesis
Year: 2010
Type of Publication: In Proceedings
Keywords: speech synthesis, acoustic driven phrasing, semisupervised
Authors: Kishore S. Prahallad, Veera Raghavendra Elluru, Alan W. Black
Book title: 5th International Conference on Speech Prosody (Speech Prosody 2010)
In this paper, we propose semi-supervised learning of acoustic-driven phrase breaks and examine its usefulness for text-to-speech systems. We first derive a set of initial hypotheses of phrase breaks in a speech signal using pauses as an acoustic cue. Because these initial estimates are based on knowledge of speech production and speech signal processing, the hypothesized phrase break regions can be treated as labeled data. Features such as duration, F0, and energy are extracted from these labeled regions, and a machine learning model is trained to classify these acoustic features as belonging to a phrase break or not. We then attempt to bootstrap the machine learning model using unlabeled data (i.e., the rest of the data).
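The procedure outlined in the abstract (pause-based seed labels, a classifier over acoustic features, then bootstrapping on the remaining data) can be sketched as a generic self-training loop. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, the pause thresholds, or how negative examples are chosen, so the nearest-centroid model, the `break_pause` and `no_break_pause` values, the `margin` confidence rule, and all function names below are hypothetical. Each feature vector is assumed to hold normalized duration, F0, and energy measurements for one word boundary.

```python
import math

def seed_labels(boundaries, break_pause=0.15, no_break_pause=0.02):
    """Split (pause_sec, features) pairs into seed labels and an unlabeled pool.

    Assumption: long pauses mark breaks, near-zero pauses mark non-breaks,
    and everything in between is left unlabeled. Thresholds are illustrative.
    """
    seed, unlabeled = [], []
    for pause, feats in boundaries:
        if pause >= break_pause:
            seed.append((feats, True))
        elif pause <= no_break_pause:
            seed.append((feats, False))
        else:
            unlabeled.append(feats)
    return seed, unlabeled

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(labeled):
    """Nearest-centroid stand-in for the (unspecified) classifier.

    Requires at least one example of each class.
    """
    breaks = [f for f, is_break in labeled if is_break]
    others = [f for f, is_break in labeled if not is_break]
    return centroid(breaks), centroid(others)

def score(model, feats):
    """Positive when feats lie closer to the break centroid than to the other."""
    c_break, c_other = model
    return math.dist(feats, c_other) - math.dist(feats, c_break)

def self_train(seed, unlabeled, margin=1.0, max_rounds=10):
    """Repeatedly move confidently classified pool items into the labeled set."""
    labeled, pool = list(seed), list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        scored = [(f, score(model, f)) for f in pool]
        confident = [(f, s > 0) for f, s in scored if abs(s) >= margin]
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [f for f, s in scored if abs(s) < margin]
        model = train(labeled)
    return model, labeled, pool

# Toy data: (pause in seconds, [duration, F0 change, energy change]), all made up.
toy = [
    (0.30, [1.0, -1.0, -1.0]), (0.25, [0.8, -0.9, -1.1]),   # long pauses
    (0.00, [-1.0, 0.5, 0.6]), (0.01, [-0.9, 0.4, 0.5]),     # no pause
    (0.08, [0.9, -0.8, -0.9]), (0.05, [-0.8, 0.5, 0.4]),    # unlabeled
    (0.06, [0.0, -0.2, -0.25]),                             # unlabeled, ambiguous
]
seed, unlabeled = seed_labels(toy)
model, labeled, pool = self_train(seed, unlabeled)
print(len(labeled), "labeled;", len(pool), "still unlabeled")
```

The confidence margin is the key design choice in a loop like this: only pool items the current model classifies with a clear margin are added as new training data, so ambiguous boundaries stay unlabeled rather than reinforcing an early mistake.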