Speech and Vision Lab

Semi-Supervised Learning of Acoustic Driven Prosodic Phrase Breaks for Text-to-Speech Systems
Research Area: Speech Synthesis
Year: 2010
Type of Publication: In Proceedings
Keywords: speech synthesis, acoustic driven phrasing, semi-supervised
Authors: Kishore S. Prahallad, Veera Raghavendra Elluru, Alan W. Black  
Book title: 5th International Conference on Speech Prosody (Speech Prosody 2010)
Address: Chicago, Illinois
Month: May
   
Abstract:
In this paper, we propose a semi-supervised approach to learning acoustic-driven phrase breaks and examine its usefulness for text-to-speech systems. We first derive a set of initial hypotheses of phrase breaks in a speech signal, using pause as the acoustic cue. Because these initial estimates are grounded in knowledge of speech production and speech signal processing, the hypothesized phrase break regions can be treated as labeled data. Features such as duration, F0 and energy are extracted from these labeled regions, and a machine learning model is trained to classify each region as a phrase break or not a phrase break. We then attempt to bootstrap the model using unlabeled data (i.e., the rest of the data).
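The bootstrapping step described in the abstract follows the general pattern of self-training. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical toy representation in which each candidate word boundary carries a pause duration and three normalized (duration, F0, energy) features. It also assumes hypothetical pause thresholds (a pause of at least 150 ms seeds a "break" label and no pause seeds a "no break" label) and uses a nearest-centroid classifier purely for brevity, in place of whatever model the paper actually uses. The names `Boundary`, `seed_labels`, `centroids`, `predict` and `self_train` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Boundary:
    """A candidate word boundary (hypothetical representation)."""
    pause_ms: float               # silence duration detected at the boundary
    features: Vec                 # assumed normalized (duration, F0, energy) features
    label: Optional[bool] = None  # True = break, False = no break, None = unlabeled


def seed_labels(bounds: List[Boundary], break_ms: float = 150.0) -> None:
    """Initial hypotheses from the pause cue (thresholds are assumptions)."""
    for b in bounds:
        if b.pause_ms >= break_ms:
            b.label = True   # long pause: hypothesized phrase break
        elif b.pause_ms == 0.0:
            b.label = False  # no pause: hypothesized non-break
        # short, ambiguous pauses are left unlabeled


def centroids(bounds: List[Boundary]) -> Dict[bool, List[float]]:
    """Mean feature vector of each class over the currently labeled boundaries."""
    sums = {True: [0.0, 0.0, 0.0], False: [0.0, 0.0, 0.0]}
    counts = {True: 0, False: 0}
    for b in bounds:
        if b.label is None:
            continue
        counts[b.label] += 1
        for i, x in enumerate(b.features):
            sums[b.label][i] += x
    return {c: [s / counts[c] for s in sums[c]] for c in (True, False) if counts[c]}


def predict(cents: Dict[bool, List[float]], x: Vec) -> Tuple[bool, float]:
    """Nearest-centroid label plus a confidence in [0, 1] from the distance margin."""
    d_break = math.dist(x, cents[True])
    d_none = math.dist(x, cents[False])
    total = d_break + d_none
    conf = abs(d_none - d_break) / total if total > 0 else 0.0
    return d_break < d_none, conf


def self_train(bounds: List[Boundary], min_conf: float = 0.3, max_iter: int = 10) -> None:
    """Retrain on labeled data, then absorb confident predictions on unlabeled data."""
    for _ in range(max_iter):
        cents = centroids(bounds)  # "train" on the current labeled set
        if len(cents) < 2:
            break  # both classes are needed to classify
        added = 0
        for b in bounds:
            if b.label is None:
                lab, conf = predict(cents, b.features)
                if conf >= min_conf:
                    b.label = lab
                    added += 1
        if added == 0:
            break  # converged: no confident predictions left to add


bounds = [
    Boundary(300.0, (1.0, 1.0, 1.0)),
    Boundary(250.0, (0.9, 0.8, 1.0)),
    Boundary(0.0, (0.0, 0.1, 0.0)),
    Boundary(0.0, (0.1, 0.0, 0.1)),
    Boundary(40.0, (0.8, 0.9, 0.9)),  # short pause, break-like prosody
    Boundary(20.0, (0.1, 0.2, 0.0)),  # short pause, non-break-like prosody
    Boundary(60.0, (0.5, 0.5, 0.5)),  # genuinely ambiguous
]
seed_labels(bounds)
self_train(bounds)
print([b.label for b in bounds])
# → [True, True, False, False, True, False, None]
```

In this toy run, the two short-pause boundaries are absorbed into the class whose prosody they resemble. The ambiguous boundary stays unlabeled because its confidence never reaches `min_conf`. Raising that threshold makes the bootstrap more conservative, and lowering it adds more data at the risk of propagating errors.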