Speech and Vision Lab

About SVL

The thrust of our activity in the Speech and Vision Lab (SVL) at IIIT-H is the development of natural input and output interfaces to a computer through speech and images. One focus is the development of speech input/output systems for Indian languages, with the objective of achieving speech translation from one Indian language to another. Another objective is to develop person authentication systems that provide secure access to information using biometrics, with inputs such as speech, video, and fingerprints, and to develop systems for content-based information retrieval from non-textual data.

We have been focusing on the development of speech-to-text and text-to-speech systems for Indian languages. An unrestricted text-to-speech system has been developed for Hindi and other Indian languages. The system incorporates prosodic rules derived for Hindi. Currently we are developing a speech-to-text system for Indian languages using syllable-like units as the basic sound units. We are also working on speaker recognition tasks, both text-dependent and text-independent, mainly for secure access to information.

Enhancement of degraded speech has been another area of active pursuit in the laboratory over the past few years. Several new techniques have been developed for enhancing speech degraded by additive noise and reverberation.

The SVL has also been active in developing signal processing algorithms for speech and image processing. Several signal processing methods have been developed for dealing with remotely sensed data. In particular, new models based on constraint satisfaction neural networks have been developed for classification of multi-spectral remotely sensed data. Neural network models are being explored for several tasks in speech, image processing, and decision making.

The Speech and Vision Lab is involved in research on:

  • Speech signal processing
  • Speech-to-text conversion
  • Text-to-speech conversion
  • Speaker recognition
  • Speech enhancement
  • Applications of neural networks
  • Image processing
  • Person authentication using biometrics
  • Information retrieval using audio and video indexing

Current Thrust

We are currently working on several areas related to speech and vision. The key activities are as follows:

  • Event-based analysis of speech
  • Speech enhancement using source features
  • Speech enhancement in multispeaker environments
  • Development of a phonetic engine for Indian languages
  • Preparation of speech corpora for Indian languages
  • Speaker segmentation and tracking
  • Automatic prosody modeling and manipulation
  • Speaker recognition using source features
  • Information retrieval using audio and video indexing
  • Speaker verification using audio and visual cues
  • Person authentication using biometrics
  • Applications of neural networks

