Speech and Vision Lab

Analysis of laugh signals for automatic detection and synthesis
Research Area: Speech Synthesis
Year: 2010
Type of Publication: Master's thesis
Keywords: Laughter, epoch, pitch period, strength of excitation, breathiness, loudness, zero-frequency resonator, synthesis
Authors: Sudheer K.
Abstract:
Laughter is a nonverbal vocalization that occurs often in continuous speech. It is produced by the speech production mechanism through a highly variable physiological process. The vocalized expression of laughter varies across gender, individuals and context, yet despite this variability, laughter is perceived naturally by humans. Since laughter is produced by the human speech production mechanism, spectral features have generally been used to study laughter acoustics. This work mainly aims at showing the significance of excitation information for the analysis of laugh signals. We propose acoustic features motivated by the production characteristics of laughter. The features are based on the pitch period (T0), the strength of excitation derived from the zero-frequency filtered signal, the amount of breathiness and a loudness measure. It is observed that there are sudden bursts of air flow through the vocal tract in the case of laughter. This results in faster vibration of the vocal folds, and hence a reduction in the pitch period. Apart from the decrease in the pitch period, there is also a rising pattern in the pitch period contour. Similar observations are made for the strength of excitation, which rises sharply and then falls at almost the same rate. Since there is more air flow, laughter is typically accompanied by some amount of breathiness, perceived as softer and noisier speech. The loudness and the non-deterministic (noise) component of the signal are estimated using measures based on the Hilbert envelope. A method is proposed that uses these features to detect laughter in continuous speech. The method is tested on noisy data (TV show data) and clean data (the AMI Corpus), and the results are reported. A method is also proposed for synthesizing isolated laughter by modifying the above features. The perceptual significance of each feature is studied using an analysis-by-synthesis approach: the effect of each feature is studied by de-emphasizing it in original laugh signals and performing perceptual evaluation on the resulting signals. The study indicates that the pitch period pattern is the most significant factor, followed by call-intercall duration, breathiness and strength of excitation.
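The excitation features named in the abstract (epochs, pitch period T0, strength of excitation, and a Hilbert-envelope basis for loudness/breathiness measures) can be illustrated with the commonly used zero-frequency filtering recipe. The sketch below is not the thesis code; it assumes a mono signal x at sampling rate fs, and the function names and the trend-removal window length are illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert


def zero_frequency_filter(x, fs, trend_win_ms=10.0):
    """Zero-frequency filtered (ZFF) signal of x (sketch, assumed parameters)."""
    # Difference the signal to remove any DC offset.
    d = np.diff(x, prepend=x[0])
    # Pass through a cascade of two zero-frequency resonators (double integration).
    y = np.cumsum(np.cumsum(d))
    # Remove the slowly varying trend by subtracting a local mean, applied twice;
    # the window is roughly one average pitch period (here assumed ~10 ms).
    win = max(3, int(trend_win_ms * 1e-3 * fs) | 1)  # force odd length
    kernel = np.ones(win) / win
    for _ in range(2):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def epochs_and_excitation(x, fs):
    """Epoch locations (samples), pitch periods T0 (s), strength of excitation."""
    z = zero_frequency_filter(x, fs)
    # Epochs: negative-to-positive zero crossings of the ZFF signal.
    epochs = np.where((z[:-1] < 0) & (z[1:] >= 0))[0]
    # Pitch period: interval between successive epochs.
    t0 = np.diff(epochs) / fs
    # Strength of excitation: slope of the ZFF signal at each epoch.
    soe = np.abs(z[epochs + 1] - z[epochs])
    return epochs, t0, soe


def hilbert_envelope(x):
    """Hilbert envelope of x, a starting point for loudness/breathiness measures."""
    return np.abs(hilbert(x))
```

In laugh calls, the abstract's observations would appear in these quantities as shortened and rising T0 between epochs and a sharp rise-and-fall in the strength-of-excitation contour; how the thesis turns the Hilbert envelope into its specific loudness and noise measures is not reproduced here.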
Digital version