Speech and Vision Lab

Speaker change detection in casual conversations using excitation source features
Research Area: Uncategorized
Year: 2008
Type of Publication: Article
Keywords: Speaker change detection; Multispeaker conversation; Autoassociative neural network (AANN) models; Excitation source features; Linear prediction (LP) residual
Authors: Dhananjaya N., B. Yegnanarayana  
In this paper we propose a method for speaker change detection using features of the excitation source of the speech production mechanism. The method uses neural network models to capture the speaker-specific information from a signal that predominantly represents the excitation source. The focus is on speaker change detection in casual telephone conversations, in which short (<5 s) speaker turns are common. When only a limited amount of speech data is available, excitation source features are a better choice for modeling a speaker than vocal tract system features. The linear prediction (LP) residual is used as an estimate of the excitation source signal. Autoassociative neural network (AANN) models are proposed to capture the higher-order relations among the samples of the residual signal. Speaker models are generated for every one second of voiced speech from the first few seconds of the conversation. These models are used to detect the speaker change points. The performance of the proposed method is evaluated on a database containing several two-speaker conversations.
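
To make the role of the LP residual concrete, the following is a minimal sketch of how a residual can be obtained from one speech frame by inverse filtering. It is not the authors' implementation: the function names, the prediction order of 10, the Hamming window, and the small regularization term are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import numpy as np


def lp_coefficients(frame, order=10):
    """Estimate LP coefficients a[1..order] with the autocorrelation method,
    so that s[n] is approximated by sum_k a[k] * s[n - k].
    The order and the Hamming window are illustrative assumptions."""
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(windowed[:len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny ridge term for numerical stability
    return np.linalg.solve(R, r[1:])


def lp_residual(frame, order=10):
    """Inverse-filter the frame with A(z) = 1 - sum_k a[k] z^-k.
    The output is the prediction error, used as an estimate of the excitation."""
    a = lp_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[:len(frame)]
```

In a pipeline like the one the abstract describes, short blocks of residual samples from voiced regions would then be fed to a speaker model. That stage is not shown here because the abstract gives no details of the network structure or training.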