Speech and Vision Lab

Speaker Localization Using Excitation Source Information in Speech
Research Area: Uncategorized Year: 2005
Type of Publication: Article Keywords: correlation methods, delay estimation, mean square error methods, speech processing, excitation source information, generalized cross correlation, root mean square error, speaker localization, speech production, time delay estimation
Authors: V.C. Raykar, B. Yegnanarayana, S.R.M. Prasanna, R. Duraiswami  
This paper presents the results of simulation and real-room studies on localizing a moving speaker using information about the excitation source of speech production. The first step in localization is estimating the time delay between the speech signals collected by a pair of microphones. Time-delay estimation methods generally use spectral features, which correspond mostly to the shape of the vocal tract during speech production. These spectral features are degraded by noise and reverberation. This paper proposes a method for localizing a speaker using features that arise from the excitation source during speech production. Experiments were conducted under simulated noise and reverberation conditions to compare the time-delay estimation and source localization performance of the proposed method against the spectrum-based generalized cross correlation (GCC) methods. The results show that the proposed method produces fewer discrepancies in the estimated time delays. The bias, variance, and root mean square error (RMSE) of the proposed method are consistently equal to or less than those of the GCC methods. The locations of a moving speaker estimated from the time delays obtained with the proposed method are closer to the actual values than those obtained with the GCC method.
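To make the contrast between spectral and excitation-source features concrete, here is a minimal NumPy sketch of both kinds of time-delay estimate for one microphone pair. Several assumptions apply. GCC with PHAT weighting stands in for the spectrum-based GCC baseline; the abstract does not say which GCC weightings were compared. For the excitation-source features we assume the Hilbert envelope of the linear prediction (LP) residual, computed frame by frame, which is an illustrative choice and not necessarily the paper's exact procedure. The test signal is synthetic: an impulse train passed through a single 500 Hz resonance. All function names, the LP order, the frame length, and the lag search range are hypothetical.

```python
import numpy as np

def cross_corr_delay(x, y, max_lag, phat=False):
    """Lag (samples) by which y trails x, searched over [-max_lag, max_lag].
    With phat=True this is GCC-PHAT: only the cross-spectrum phase is kept."""
    n = len(x) + len(y)  # zero-pad so the FFT correlation is linear, not circular
    cross = np.fft.rfft(y, n) * np.conj(np.fft.rfft(x, n))
    if phat:
        cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..max_lag
    return int(np.argmax(cc)) - max_lag

def lp_coeffs(frame, order):
    """Autocorrelation-method LP: solve R a = r for predictor coefficients a_1..a_p."""
    L = len(frame)
    r = np.correlate(frame, frame, mode="full")[L - 1:L + order]  # lags 0..order
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx] + (1e-9 * r[0] + 1e-12) * np.eye(order)  # tiny ridge for silent frames
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(x, order=10, frame_len=256):
    """Inverse-filter each frame with its own A(z) = 1 - sum_k a_k z^-k."""
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    for start in range(0, len(x), frame_len):
        seg = x[start:start + frame_len]
        if len(seg) <= order:  # too short to fit a predictor; pass through
            e[start:start + len(seg)] = seg
            continue
        inv = np.concatenate(([1.0], -lp_coeffs(seg, order)))
        lo = max(start - order, 0)  # prepend past samples so the filter has history
        ext = x[lo:start + len(seg)]
        e[start:start + len(seg)] = np.convolve(ext, inv)[:len(ext)][start - lo:]
    return e

def hilbert_envelope(e):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    n = len(e)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(e) * h))

def excitation_delay(x, y, max_lag, order=10):
    """Hypothetical excitation-based estimate: correlate Hilbert envelopes of LP residuals."""
    hx = hilbert_envelope(lp_residual(x, order))
    hy = hilbert_envelope(lp_residual(y, order))
    return cross_corr_delay(hx - hx.mean(), hy - hy.mean(), max_lag)

def synthetic_voiced(n, fs=8000, f0=100, seed=0):
    """Impulse train (plus weak noise) through one all-pole resonance at 500 Hz."""
    rng = np.random.default_rng(seed)
    exc = np.zeros(n)
    exc[::fs // f0] = 1.0
    exc += 0.05 * rng.standard_normal(n)
    r, theta = 0.95, 2 * np.pi * 500 / fs
    a1, a2 = 2 * r * np.cos(theta), -r * r
    s = np.zeros(n)
    y1 = y2 = 0.0
    for i in range(n):
        s[i] = exc[i] + a1 * y1 + a2 * y2
        y1, y2 = s[i], y1
    return s

# Two microphones: mic2 hears the same source 7 samples after mic1.
true_delay, n, start = 7, 4000, 50
src = synthetic_voiced(n + 100)
rng = np.random.default_rng(1)
mic1 = src[start:start + n] + 0.01 * np.std(src) * rng.standard_normal(n)
mic2 = src[start - true_delay:start - true_delay + n] + 0.01 * np.std(src) * rng.standard_normal(n)

print("GCC-PHAT delay:", cross_corr_delay(mic1, mic2, 20, phat=True))
print("Excitation-based delay:", excitation_delay(mic1, mic2, 20))
```

In this clean, noise-light setting both estimates should recover the simulated 7-sample delay. The excitation-based variant relies on the sharp peaks that the LP residual's Hilbert envelope shows at the excitation instants, rather than on the spectral shape that the resonance imposes; the sketch does not model reverberation, so it does not reproduce the paper's comparison.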