Speech and Vision Lab

Exploiting phone-class specific landmarks for refinement of segment boundaries in TTS databases
Research Area: Speech Synthesis
Year: 2011
Type of Publication: In Proceedings
Keywords: speech segmentation, under-resourced language, landmarks
Authors: Vijayaditya P., Kishore S. Prahallad  
High-accuracy speech segmentation methods invariably depend on manually labelled data. However, under-resourced languages do not have the annotated speech corpora required for training these segmentors. In this paper we propose a boundary refinement technique which uses knowledge of phone-class specific subband energy events, in place of manual labels, to guide the refinement process. The use of this knowledge enables proper placement of boundaries in regions with multiple spectral discontinuities in close proximity. It also helps in the correction of large alignment errors. The proposed refinement technique places 82% of boundaries within 20 ms of the actual boundary. Combining the proposed technique with an iterative isolated HMM training technique boosts this accuracy to 89%, without the use of any manually labelled data.
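The accuracy figures above are stated as the share of boundaries that fall within 20 ms of the actual boundary. A minimal sketch of how such a tolerance-based score could be computed is given below. It is an illustration only, not the authors' evaluation code: the function name `boundary_accuracy`, the use of seconds as the time unit, and the one-to-one pairing of predicted and reference boundaries (as would be the case when both come from the same phone sequence) are all assumptions.

```python
from typing import Sequence


def boundary_accuracy(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.020,  # 20 ms, expressed in seconds (assumed unit)
) -> float:
    """Return the fraction of predicted boundaries within `tolerance` of the reference.

    Assumes predicted[i] and reference[i] mark the same phone boundary,
    i.e. both lists come from the same phone sequence.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one boundary is required")

    # Count boundaries whose absolute deviation is within the tolerance.
    hits = sum(
        1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance
    )
    return hits / len(reference)


# Hypothetical boundary times in seconds, for illustration only.
predicted = [0.105, 0.250, 0.430, 0.610]
reference = [0.100, 0.240, 0.400, 0.600]
score = boundary_accuracy(predicted, reference)
```

In this made-up example three of the four boundaries deviate by 10 ms or less and one deviates by 30 ms, so the score is 0.75.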