Multimodal Score: an ANVIL Based Annotation Scheme for Multimodal Audio-Video Analysis

Face-to-face communication is multimodal: we communicate with voice, face, eyes, hands, and body. But only a small part of these communicative instruments has been studied thoroughly: while Linguistics has, for 2000 years, studied the rules that govern verbal behaviour, much less has been done for the other modalities.
To fully understand human multimodal communication, then, our task is to write down the "lexicon" and the "alphabet" of nonverbal signals (Poggi, 2001): that is, on the one hand, to find out the systematic correspondences between signals and meanings in each mode-specific communication system; on the other hand, to single out the minimal elements that compose the signals of each communication system.
Discovering the elements and rules that make up communication systems, as well as the rules of their simultaneous and sequential combination, is useful both for theoretical purposes and for practical applications, such as, among others, the construction of Embodied Agents (Cassell et al., 2000).
But to do so, it is necessary to analyse corpora of Multimodal Communication using precise methods for the segmentation, transcription, and annotation of signals in the different modalities.
In a sense, this is a somewhat circular endeavour. Our first task is to construct the alphabet and the lexicon of a communication system, for instance, to find the correspondences between particular patterns of gaze signals and their particular meanings; to discover these correspondences it is necessary to analyse numerous items of gaze, and to this end one must use a procedure for such analysis. But once an alphabet or a lexicon has been singled out, it becomes much easier and clearer how to analyse further corpora, to such an extent that it may eventually be possible to provide a tool for the automatic analysis of Multimodal Communication. This is why the construction of tools for the analysis and annotation of multimodal data is an endless job.
In the last ten years, several tools have been proposed for this task: for example, Martin et al. (2001), Kipp (2001), Ingenhoff and Schmitz (2003), and the Annual Reports of the ISLE and NITE EU Projects. In this paper we present the ANVIL-based Multimodal Score, a system for the annotation of multimodal data.

Publication Type: 
Conference proceedings contribution
Author or Creator: 
Magno Caldognetto E.
Poggi I.
Cosi P.
Cavicchio F.
Merola G.
Source: 
LREC 2004 Workshop on ‘Multimodal Corpora, Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces’, pp. 29–33, Lisbon, Portugal, 25 May 2004
Date: 
2004
Resource Identifier: 
http://www.cnr.it/prodotto/i/93379
http://www2.pd.istc.cnr.it/Papers/EmanuelaMagno/em-LREC2004.pdf
Language: 
English