Speech and Vision Lab

Discriminating Neutral and Emotional Speech using Neural Networks
Research Area: Neural Networks Year: 2014
Type of Publication: In Proceedings  
Authors: Sudarsana Reddy Kadiri, P. Gangamohan, B. Yegnanarayana  
In this paper, we address the issue of speaker-specific emotion detection (neutral vs emotional) from speech signals, using models of neutral speech as a reference. Since emotional speech is produced by the human speech production mechanism, emotion information is expected to lie in the features of both the excitation source and the vocal tract system. The Linear Prediction (LP) residual is used as the excitation source component, and Linear Prediction Coefficients (LPCs) as the vocal tract system component. A pitch-synchronous analysis is performed. Separate Autoassociative Neural Network (AANN) models are developed to capture the information specific to neutral speech from the excitation and vocal tract system components. Experimental results show that the excitation source carries more information than the vocal tract system: the accuracy of neutral vs emotional classification using excitation source information is 91%, which is 8% higher than the accuracy obtained using vocal tract system information. The Berlin EMO-DB database is used in this study. It is observed that the proposed emotion detection system provides an improvement of approximately 10% using excitation source features and 3% using vocal tract system features over a recently proposed emotion detection method that uses energy and pitch contour modeling with functional data analysis.
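The two feature streams described above can be illustrated with a short sketch. The fragment below is not the authors' implementation; it is a minimal, self-contained example (NumPy only) of frame-wise LP analysis via the autocorrelation method and Levinson-Durbin recursion, producing the LPC vector (vocal tract component) and the LP residual (excitation source component). The AR model order and frame handling are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def lp_analysis(frame, order):
    """Estimate LPCs of one speech frame and compute its LP residual.

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    Returns (a, residual) where a = [1, a1, ..., ap] are the LP
    coefficients and residual is the prediction error signal.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]

    # Levinson-Durbin recursion.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Prediction error of the order-(i-1) model against lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                 # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k

    # LP residual: pass the frame through the inverse filter A(z).
    residual = np.convolve(a, frame)[:n]
    return a, residual
```

In a pitch-synchronous setting, `lp_analysis` would be applied to frames anchored at glottal closure instants rather than at fixed hops; the residual frames then feed the excitation-source AANN and the LPC vectors feed the vocal-tract AANN, with classification based on how well each model reconstructs its input.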