HMM/Neural Network-Based System for Italian Continuous Digit Recognition

An Italian speaker-independent continuous-speech digit
recognizer is described. The CSLU Toolkit was used to develop
and implement the system. In the first set of experiments, the
SPK-IRST corpus, a collection of digit sentences recorded in a
clean environment, was used both for training and testing the
system. In the second set, a band-pass-filtered version (300 Hz
to 3400 Hz) of the SPK-IRST corpus was used for
training, while the telephone PANDA-CSELT corpus was used
for testing the system. A hybrid HMM/NN architecture was
applied; in this architecture, a three-layer neural network is used
as a state emission probability estimator and the conventional
forward-backward algorithm is applied for estimating
continuous targets for the NN training patterns. The final
network, trained to estimate the probability of 116 context-dependent
phonetic categories at every 10-msec frame, was not
trained on binary target values, but on the probabilities of each
phonetic category belonging to each frame. Training and testing
will be described in detail and recognition results will be
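
The use of forward-backward posteriors as continuous training targets, as described in the abstract, can be illustrated with a minimal sketch. This is not the CSLU Toolkit's code or the authors' implementation: the function name `soft_targets`, the two-state left-to-right model, and all numeric values are assumptions chosen for illustration (the system described above uses 116 categories). The sketch computes, for each frame, the posterior probability of each state given the whole utterance; these per-frame rows would then replace binary (one-hot) targets when training the network.

```python
import numpy as np

def soft_targets(emis, trans, init):
    """Per-frame state posteriors via the scaled forward-backward algorithm.

    emis:  (T, S) emission scores for each frame and state
    trans: (S, S) transition probabilities, trans[i, j] = P(j | i)
    init:  (S,)   initial state probabilities
    """
    T, S = emis.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)

    # Forward pass, rescaling each frame to avoid numerical underflow.
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scale factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]

    # Posterior of each state at each frame; every row sums to 1.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy example (hypothetical values): 3 frames, 2 left-to-right states.
init = np.array([1.0, 0.0])
trans = np.array([[0.5, 0.5],
                  [0.0, 1.0]])
emis = np.array([[0.9, 0.1],
                 [0.5, 0.5],
                 [0.1, 0.9]])
gamma = soft_targets(emis, trans, init)
print(gamma)
```

Each row of `gamma` is a probability distribution over states for one frame, so an ambiguous frame near a boundary receives a graded target instead of a hard 0/1 label.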

Publication type: 
Conference proceedings paper
Author or Creator: 
Cosi P.
Hosom J.P.
American Institute of Physics, Melville [NY], USA
ICPhS-99 - XIV International Congress of Phonetic Sciences, pp. 1669–1672, San Francisco, California, USA, 14-18 August, 1999