Auditory Modelling for Speech Analysis and Recognition

Cochlear transformations of speech signals result in an auditory neural firing pattern significantly different from the spectrogram, a popular time-frequency-energy representation of speech. Phonetic features may correspond in a rather straightforward manner to the neural discharge pattern with which speech is coded by the auditory nerve. For these reasons, even an ear model that is just an approximation of physical reality appears to be a suitable system for identifying those aspects of the speech signal that are relevant for recognition.
A recently developed joint Synchrony/Mean-Rate (S/M-R) Auditory Speech Processing (ASP) scheme [8] was successfully applied in speech recognition tasks, where promising results were obtained for speech segmentation and labelling [9]. Moreover, results reported elsewhere in the literature show that a combination of the same ASP scheme with multi-layer artificial neural networks produced an effective generalisation amongst speakers in classifying vowels both for English [1] and Italian [2].
The joint S/M-R ASP scheme will be very briefly described, and its application to the problem of speech segmentation and labelling, both for clean and noisy speech, will be introduced and analysed.

Publication type: 
Book chapter
Author or Creator: 
Cosi P.
Publisher: J. Wiley & Sons, New York, USA
Source: Visual Representations of Speech Signals, edited by M. Cooke, S. Beet and M. Crawford, pp. 205–212. New York: J. Wiley & Sons, 1992
ISTC Author: 
Piero Cosi