Speech and Vision Lab

Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system
Research Area: Uncategorized Year: 2005
Type of Publication: Article Keywords: neural nets, speaker recognition, dynamic time warping technique, fixed-text speaker verification system, neural network models, source feature, spectral features, suprasegmental feature
Authors: B. Yegnanarayana, S.R.M. Prasanna, J.M. Zachariah, C.S. Gupta  
   
Abstract:
This paper proposes a text-dependent (fixed-text) speaker verification system which uses different types of information for making a decision regarding the identity claim of a speaker. The baseline system uses the dynamic time warping (DTW) technique for matching. Detection of the end-points of an utterance is crucial for the performance of DTW-based template matching. A method based on the vowel onset point (VOP) is proposed for locating the end-points of an utterance. The proposed method for speaker verification uses suprasegmental and source features in addition to spectral features. Suprasegmental features such as pitch and duration are extracted using the warping path information from the DTW algorithm. Features of the excitation source, extracted using neural network models, are also used in the text-dependent speaker verification system. Although the suprasegmental and source features individually may not yield good performance, combining the evidence from these features seems to improve the performance of the system significantly. Neural network models are used to combine the evidence from multiple sources of information.
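
The DTW matching and the use of the warping path mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean local distance, the three-way step pattern, and the names dtw and segment_duration are all assumptions chosen for illustration, and the abstract does not say how the duration feature is actually computed from the path.

```python
import math


def dtw(ref, test):
    """Align two sequences of feature vectors with basic DTW.

    Assumptions (not from the paper): Euclidean local distance and the
    simple step pattern (diagonal, vertical, horizontal) with equal weights.
    Returns (total_cost, path), where path is a list of
    (ref_index, test_index) pairs from start to end.
    """
    n, m = len(ref), len(test)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the accumulated-cost matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def segment_duration(path, ref_start, ref_end):
    """Hypothetical duration measure: count the distinct test frames that
    the warping path aligns to reference frames ref_start..ref_end."""
    return len({t for r, t in path if ref_start <= r <= ref_end})


# Toy one-dimensional "feature vectors"; real systems would use
# spectral features per frame.
reference = [(0.0,), (1.0,), (2.0,)]
claimed = [(0.0,), (0.0,), (1.0,), (2.0,)]

total_cost, path = dtw(reference, claimed)
print(total_cost, path)
print(segment_duration(path, 0, 0))
```

In this toy case the first reference frame is stretched over two test frames, which is the kind of timing information a path-based duration feature could capture. Accurate end-points matter here because the path is forced to start and end at the first and last frames of both sequences.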