Speech and Vision Lab

Robust pitch estimation in noisy speech using ZTW and group delay function
Research Area: Signal Processing
Year: 2015
Type of Publication: In Proceedings
Keywords: HNGD, Pitch, Noisy speech
Authors: Ravi Shankar Prasad, B. Yegnanarayana
Abstract:
Identifying pitch in speech signals recorded in noisy environments is a fundamental and long-standing problem in speech research. Several time-domain techniques attempt to exploit the periodic nature of the waveform using the autocorrelation function and its variants. Another set of techniques uses the harmonic structure in the spectral domain to identify pitch values. Both kinds of techniques suffer significant degradation in performance on noisy speech signals with low SNRs. The paper presents a robust technique to identify pitch values for speech signals. The proposed algorithm uses a speech analysis method called zero-time windowing (ZTW), in which the signal is processed using a heavily decaying window, and the spectral characteristics are highlighted using the numerator of the group delay function. The amplitude contour of the dominant resonances in the spectra is extracted and processed further using a Gaussian window. The resulting contour reflects the energy profile of the signal, which is used to estimate the pitch values. The proposed algorithm is robust to degradations and has been tested on several utterances with added noise. The algorithm shows a significant improvement in performance compared to existing techniques.
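The processing chain described in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation. The abstract does not give the window formula, the resonance picker, or the final pitch estimator, so the following are all assumptions made for illustration: the window shape 1 / (4 sin^2(pi n / 2N)), taking the maximum of the group delay numerator as the resonance strength, and an autocorrelation peak picker on the smoothed contour. All function names and parameters (`ztw_window`, `win_len`, `hop`, `sigma`, `fmin`, `fmax`) are hypothetical.

```python
import numpy as np

def ztw_window(length):
    """Heavily decaying window (assumed form): w[0] = 0,
    w[n] = 1 / (4 sin^2(pi n / (2 N))) for n = 1..N-1."""
    n = np.arange(1, length)
    w = np.zeros(length)
    w[1:] = 1.0 / (4.0 * np.sin(np.pi * n / (2.0 * length)) ** 2)
    return w

def group_delay_numerator(segment, nfft=1024):
    """Numerator of the group delay function, X_R*Y_R + X_I*Y_I,
    with X = DFT(x[n]) and Y = DFT(n * x[n])."""
    n = np.arange(len(segment))
    X = np.fft.rfft(segment, nfft)
    Y = np.fft.rfft(n * segment, nfft)
    return X.real * Y.real + X.imag * Y.imag

def resonance_contour(signal, win_len, hop, nfft=1024):
    """Strength of the dominant spectral peak at each analysis instant.
    Assumption: the peak is taken as the maximum of the numerator."""
    w = ztw_window(win_len)
    starts = range(0, len(signal) - win_len, hop)
    return np.array([
        np.max(group_delay_numerator(signal[s:s + win_len] * w, nfft))
        for s in starts
    ])

def gaussian_smooth(contour, sigma):
    """Smooth the contour with a unit-area Gaussian kernel (sigma in samples)."""
    half = int(3 * sigma)
    k = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(contour, kernel, mode="same")

def pitch_from_contour(contour, rate, fmin=50.0, fmax=400.0):
    """Assumed estimator: strongest autocorrelation peak of the
    mean-removed contour within the lag range for [fmin, fmax] Hz."""
    c = contour - np.mean(contour)
    ac = np.correlate(c, c, mode="full")[len(c) - 1:]
    lo = max(1, int(np.ceil(rate / fmax)))
    hi = min(len(ac) - 1, int(rate / fmin))
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return rate / lag
```

In use, a contour would be computed with `resonance_contour(x, win_len, hop)`, smoothed with `gaussian_smooth`, and passed to `pitch_from_contour` with `rate` set to the sampling rate divided by `hop`. For a real recording the contour would be analysed in short frames so that the estimate can track pitch changes over time.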