Statistical Spectral Envelope Transformation applied to Emotional Speech

Transformation of sound by statistical techniques is a promising method for a new range of digital audio effects. In this paper, a data-driven voice transformation algorithm is used to alter the timbre of a neutral (non-emotional) voice in order to reproduce a particular emotional vocal timbre. Perceptually based Mel-Cepstral analysis and a Mel Log Spectral Approximation (MLSA) digital filter are used to represent the speech timbre and to synthesize speech with a modified spectral envelope. The transformation function adopts a GMM (Gaussian Mixture Model) based parametrization in order to convert the spectral envelopes. Experiments with the first- and second-order derivatives of the mel-cepstral coefficients have been undertaken to prove the benefit of including dynamic information in the model. The proposed algorithm has been evaluated by means of objective measures in the neutral-to-happy and neutral-to-sad tasks.
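As an illustration of what a GMM-based transformation function can look like, the sketch below maps one source mel-cepstral frame to a target frame using the joint-density formulation that is common in the voice conversion literature. The abstract does not say which GMM variant the authors used, so this formulation is an assumption. The function name gmm_convert and all parameter names are hypothetical, and the model parameters are assumed to have been trained beforehand on time-aligned neutral and emotional frames.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map a source frame x (D,) to a target frame (D,).

    Hypothetical helper, not the paper's code. Assumed shapes:
      weights: (M,)       mixture weights
      mu_x, mu_y: (M, D)  source / target means per component
      cov_xx: (M, D, D)   source covariance per component
      cov_yx: (M, D, D)   target-source cross-covariance per component
    """
    n_mix, dim = mu_y.shape

    # Log posterior of each component given x. The (2*pi)^D term is the
    # same for every component, so it cancels in the normalization.
    log_post = np.empty(n_mix)
    for m in range(n_mix):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = diff @ np.linalg.solve(cov_xx[m], diff)
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of per-component linear regressions:
    # y = sum_m p(m|x) * (mu_y[m] + cov_yx[m] cov_xx[m]^-1 (x - mu_x[m]))
    y = np.zeros(dim)
    for m in range(n_mix):
        y += post[m] * (mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m]))
    return y
```

In a setup of this kind, dynamic information would typically be included by appending first- and second-order derivative coefficients to each frame before training, which enlarges D but leaves the mapping above unchanged.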

Publication type:

Contributo in volume

Author or Creator:

Fabio Tesser

Enrico Zovato

Piero Cosi

Publisher:

Helmut Schmidt University - University of the Federal Armed Forces, Hamburg, DEU

Source:

Proceedings of DAFx-10, 13th International Conference on Digital Audio Effects, edited by Hannes Pomberger, Franz Zotter and Alois Sontacchi, pp. 479–482. Hamburg: Helmut Schmidt University - University of the Federal Armed Forces, 2010

Date:

2010

Resource Identifier:

http://www.cnr.it/prodotto/i/140185

https://dx.doi.org/10.1002/9781119991298

info:doi:10.1002/9781119991298

http://www.scopus.com/record/display.url?eid=2-s2.0-84872703286&origin=inward

urn:isbn:978-3-200-01940-9

Language:

Eng

ISTC Author:

Real name:

Piero Cosi