Publications catalog - books



Affective Computing and Intelligent Interaction: 2nd International Conference, ACII 2007 Lisbon, Portugal, September 12-14, 2007 Proceedings

Ana C. R. Paiva ; Rui Prada ; Rosalind W. Picard (eds.)

Conference: 2nd International Conference on Affective Computing and Intelligent Interaction (ACII), Lisbon, Portugal, September 12-14, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Institution: none detected · Publication year: 2007 · Browse at: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74888-5

Electronic ISBN

978-3-540-74889-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

A Systematic Comparison of Different HMM Designs for Emotion Recognition from Acted and Spontaneous Speech

Johannes Wagner; Thurid Vogt; Elisabeth André

In this work we elaborate the use of hidden Markov models (HMMs) for speech emotion recognition as a dynamic alternative to static modelling approaches. Since previous work in this field has not yet established which HMM design should be preferred for this task, we run a systematic analysis of different HMM configurations. Furthermore, experiments are carried out on an acted and a spontaneous emotion corpus, since little is known about the suitability of HMMs for spontaneous speech. Additionally, we consider two different segmentation levels, namely words and utterances. Results are compared with the outcome of a support vector machine classifier trained on global statistics features. While similar performance was observed for both databases at the utterance level, the HMM-based approach outperformed static classification at the word level. However, establishing general guidelines as to which kind of model is best suited proved rather difficult.

- Affective Speech Processing | Pp. 114-125
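
The abstract sketches a systematic sweep over HMM designs. As a rough illustration of that kind of comparison (not the authors' implementation), the snippet below trains one Gaussian HMM per emotion class for several state counts and classifies held-out sequences by log-likelihood; the synthetic MFCC-like features, class means, and the hmmlearn library are all assumptions of this sketch.

```python
# Illustrative sketch: compare HMM designs (here, state count) for
# per-class emotion models, scored on held-out sequences.
# Synthetic data stands in for real acted/spontaneous speech features.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def make_sequences(n_seq, mean, n_frames=40, dim=13):
    """Synthetic MFCC-like frame sequences for one emotion class."""
    return [mean + rng.normal(size=(n_frames, dim)) for _ in range(n_seq)]

train = {"angry": make_sequences(20, 0.5), "neutral": make_sequences(20, -0.5)}
test = {"angry": make_sequences(10, 0.5), "neutral": make_sequences(10, -0.5)}

for n_states in (1, 3, 5):               # candidate HMM designs
    models = {}
    for label, seqs in train.items():    # one HMM per emotion class
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=20, random_state=0)
        m.fit(X, lengths)
        models[label] = m
    correct = total = 0
    for label, seqs in test.items():     # classify by max log-likelihood
        for s in seqs:
            pred = max(models, key=lambda k: models[k].score(s))
            correct += (pred == label)
            total += 1
    print(f"{n_states} states: accuracy {correct / total:.2f}")
```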

On the Necessity and Feasibility of Detecting a Driver’s Emotional State While Driving

Michael Grimm; Kristian Kroschel; Helen Harris; Clifford Nass; Björn Schuller; Gerhard Rigoll; Tobias Moosmayr

This paper brings together two important aspects of human-machine interaction in cars: the psychological aspect and the engineering aspect. The psychologically motivated part of this study addresses questions such as why it is important to automatically assess the driver's affective state, which states are important, and what a machine's response should look like. The engineering part studies how the emotional state of a driver can be estimated by extracting acoustic features from the speech signal and mapping them to an emotion state in a multidimensional, continuous-valued emotion space. This feasibility study is performed in an experiment in which spontaneous, authentic emotional utterances are superimposed with car noise from several car types and various road surfaces.

- Affective Speech Processing | Pp. 126-138
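
The engineering part maps acoustic features to a continuous valence/activation space. A minimal sketch of such a mapping, assuming synthetic per-utterance features and support vector regression as the estimator (the paper's actual feature set and regressor are not given here):

```python
# Illustrative sketch: map acoustic feature vectors to a continuous
# valence/activation emotion space with one regressor per dimension.
# Synthetic features stand in for the paper's noise-superimposed speech.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))          # per-utterance acoustic features
targets = {
    "valence":    0.8 * X[:, 0] + rng.normal(scale=0.1, size=200),
    "activation": 0.6 * X[:, 1] + rng.normal(scale=0.1, size=200),
}

for dim, y in targets.items():
    model = SVR(kernel="rbf").fit(X[:150], y[:150])   # one regressor per axis
    mse = np.mean((model.predict(X[150:]) - y[150:]) ** 2)
    print(f"{dim}: held-out MSE {mse:.3f}")
```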

Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing

Bogdan Vlasenko; Björn Schuller; Andreas Wendemuth; Gerhard Rigoll

Opposing the predominant turn-wise statistics of acoustic Low-Level-Descriptors followed by static classification, we re-investigate dynamic modeling directly at the frame level in speech-based emotion recognition. This seems beneficial, as it is well known that important information exists on temporal sub-turn layers. Most promisingly, we integrate this frame-level information within a state-of-the-art large-feature-space emotion recognition engine. In order to investigate frame-level processing we employ a typical speaker-recognition set-up tailored for emotion classification: a GMM for classification and MFCC plus speed and acceleration coefficients as features. We also consider the use of multiple states, i.e. an HMM. In order to fuse this information with turn-based modeling, output scores are added to a super-vector combined with static acoustic features. A variety of Low-Level-Descriptors and functionals covering prosodic, speech-quality, and articulatory aspects are thereby considered. Starting from 1.4k features we select optimal configurations including and excluding GMM information. The final decision task is realized with an SVM. Extensive test runs are carried out on two popular public databases, EMO-DB and SUSAS, to investigate acted and spontaneous data. As we face the current challenge of speaker-independent analysis, we also discuss benefits arising from speaker normalization. The results obtained clearly emphasize the superior power of integrating diverse time levels.

- Affective Speech Processing | Pp. 139-147
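
The fusion idea (frame-level GMM scores appended to a super-vector of static functionals, with an SVM making the final decision) can be illustrated in a few lines. This is a toy sketch under invented data, not the authors' 1.4k-feature engine:

```python
# Illustrative sketch of the fusion: per-emotion GMM log-likelihoods
# computed on frame-level features are appended to turn-level static
# functionals, and an SVM decides on the combined super-vector.
import random
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(2)
emotions = ["neutral", "angry"]

def frames(label, n=60):
    """Synthetic MFCC-like frames for one speech turn."""
    shift = 0.6 if label == "angry" else -0.6
    return shift + rng.normal(size=(n, 13))

turns = [(frames(e), e) for e in emotions for _ in range(30)]
random.Random(0).shuffle(turns)
train, test = turns[:40], turns[40:]

# Frame-level modeling: one GMM per emotion, fit on that emotion's frames.
gmms = {e: GaussianMixture(n_components=4, random_state=0)
            .fit(np.vstack([f for f, lab in train if lab == e]))
        for e in emotions}

def supervector(frame_mat):
    """Turn-level functionals (mean/std) plus per-emotion GMM scores."""
    static = np.concatenate([frame_mat.mean(axis=0), frame_mat.std(axis=0)])
    scores = np.array([gmms[e].score(frame_mat) for e in emotions])
    return np.concatenate([static, scores])

X_train = np.array([supervector(f) for f, _ in train])
X_test = np.array([supervector(f) for f, _ in test])
y_train = [lab for _, lab in train]
y_test = np.array([lab for _, lab in test])

clf = SVC().fit(X_train, y_train)
print("held-out accuracy:", (clf.predict(X_test) == y_test).mean())
```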

Characterizing Emotion in the Soundtrack of an Animated Film: Credible or Incredible?

Noam Amir; Rachel Cohen

In this study we present a novel emotional speech corpus, consisting of dialog extracted from an animated film. This type of corpus presents an interesting compromise between the sparsity of emotion found in spontaneous speech and the contrived emotion found in speech acted solely for research purposes. The dialog was segmented into 453 short units and judged for emotional content by native and non-native English speakers. Emotion was rated on two scales: Activation and Valence. Acoustic analysis gave a comprehensive set of 100 features covering F0, intensity, voice quality, and spectrum. We found that Activation is more strongly correlated with our acoustic features than Valence. Activation was correlated with several types of features, whereas Valence was correlated mainly with intensity-related features. Further, ANOVA analysis showed some interesting contrasts between the two scales, and interesting differences in the judgments of native vs. non-native English speakers.

- Affective Speech Processing | Pp. 148-158
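
A minimal sketch of the correlation analysis the abstract describes: Pearson correlations of each acoustic feature with per-unit Activation and Valence ratings. The feature names, ratings, and data are invented stand-ins for the paper's 100-feature set:

```python
# Illustrative sketch: correlate each acoustic feature with Activation
# and Valence judgments over the corpus units. Data is synthetic.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_units = 453                          # dialog units, as in the corpus
features = {"mean_F0": rng.normal(size=n_units),
            "mean_intensity": rng.normal(size=n_units)}
activation = 0.7 * features["mean_F0"] + rng.normal(scale=0.5, size=n_units)
valence = 0.4 * features["mean_intensity"] + rng.normal(scale=0.9, size=n_units)

for scale_name, ratings in [("Activation", activation), ("Valence", valence)]:
    for feat_name, values in features.items():
        r, p = pearsonr(values, ratings)
        print(f"{scale_name} vs {feat_name}: r={r:+.2f}, p={p:.3g}")
```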

Time- and Amplitude-Based Voice Source Correlates of Emotional Portrayals

Irena Yanushevskaya; Michelle Tooher; Christer Gobl; Ailbhe Ní Chasaide

A detailed analysis of glottal source parameters is presented for emotional portrayals covering both low and high activation states. Time- and amplitude-based glottal source parameters F0, RG, RK, RA, OQ, FA, EE, and RD were analysed. The results show statistically significant differentiation of all emotions in terms of all the glottal parameters analysed. Results furthermore suggest that the dynamics of the individual parameters are likely to be important in differentiating among the emotions.

- Affective Speech Processing | Pp. 159-170
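
The statistical differentiation reported here is the kind of result a one-way ANOVA across emotion groups yields. A toy sketch with a synthetic stand-in for one glottal parameter (the group means and sizes are assumptions, not the paper's data):

```python
# Illustrative sketch: one-way ANOVA testing whether a glottal source
# parameter (a synthetic stand-in for, e.g., OQ) differs significantly
# across emotion portrayals.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
# Synthetic per-utterance parameter values for three portrayed emotions.
groups = {"low_activation_1": rng.normal(0.55, 0.05, 30),
          "low_activation_2": rng.normal(0.60, 0.05, 30),
          "high_activation":  rng.normal(0.45, 0.05, 30)}

stat, p = f_oneway(*groups.values())
print(f"F={stat:.1f}, p={p:.3g}")   # small p: parameter differentiates emotions
```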

Temporal Organization in Listeners’ Perception of the Speakers’ Emotions and Characteristics: A Way to Improve the Automatic Recognition of Emotion-Related States in Human Voice

Valérie Maffiolo; Noël Chateau; Gilles Le Chenadec

We propose to improve the automatic detection and characterization of emotion-related expressions in human voice with an approach based on human auditory perception. In order to determine the temporal hierarchical organization of human perception of speakers' emotions and characteristics, a listening test was set up with seventy-two listeners. The corpus consisted of eighteen voice messages extracted from a real-life application. Listeners heard message segments of different temporal lengths and were asked to verbalize their perception. Fourteen meta-categories were obtained, relating to age, gender, regional accent, timbre, personality, emotion, sound quality, expression style, and so on. The temporal windows of listening necessary for listeners to perceive and verbalize these categories are defined and could underlie the building of sub-models relevant to the automatic recognition of emotion-related expressions.

- Affective Speech Processing | Pp. 171-178

Recognizing Social Attitude in Speech-Based Dialogues with an ECA

Fiorella de Rosis; Anton Batliner; Nicole Novielli; Stefan Steidl

We propose a method to recognize the 'social attitude' of users towards an Embodied Conversational Agent (ECA) from a combination of linguistic and prosodic features. After describing the method and the results of applying it to a corpus of dialogues collected in a Wizard of Oz study, we discuss the advantages and disadvantages of statistical and machine learning methods compared with knowledge-based methods.

- Affective Text and Dialogue Processing | Pp. 179-190
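
One plausible reading of "a combination of linguistic and prosodic features" is a single classifier over a fused feature space. The sketch below concatenates bag-of-words counts with a couple of invented prosodic columns; the labels, features, and logistic-regression classifier are illustrative assumptions, not the authors' setup:

```python
# Illustrative sketch: combine bag-of-words linguistic features with a few
# prosodic measurements and train one classifier for 'social attitude'.
# Utterances, labels, and prosodic columns are invented for the example.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

utterances = ["thank you so much, that is lovely",
              "just answer the question",
              "you are really very kind",
              "stop repeating yourself"]
prosody = np.array([[210.0, 62.0],      # e.g. mean F0 (Hz), energy (dB)
                    [140.0, 70.0],
                    [205.0, 60.0],
                    [150.0, 72.0]])
labels = ["warm", "cold", "warm", "cold"]

linguistic = CountVectorizer().fit_transform(utterances)
X = hstack([linguistic, csr_matrix(prosody)])   # fused feature space
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))                            # sanity check on train data
```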

Assessing Sentiment of Text by Semantic Dependency and Contextual Valence Analysis

Mostafa Al Masum Shaikh; Helmut Prendinger; Mitsuru Ishizuka

Text is not only an important medium for describing facts and events, but also for effectively communicating information about the writer's (positive or negative) sentiment underlying an opinion, and an affect or emotion (e.g. happy, fearful, surprised). We consider sentiment assessment and emotion sensing from text as two different problems, whereby sentiment assessment is a prior task to emotion sensing. This paper presents an approach to sentiment assessment, i.e. the recognition of the negative or positive sense of a sentence. We perform semantic dependency analysis on the semantic verb frames of each sentence, and apply a set of rules to each dependency relation to calculate the contextual valence of the whole sentence. By employing a domain-independent, rule-based approach, our system is able to automatically identify sentence-level sentiment. Empirical results indicate that our system outperforms another state-of-the-art approach.

- Affective Text and Dialogue Processing | Pp. 191-202
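
The core mechanism (rules applied to dependency relations to compute a contextual valence, e.g. negation flipping a word's prior valence) can be sketched compactly. The lexicon, relation format, and rules below are toy stand-ins for the paper's semantic verb frames:

```python
# Illustrative sketch of rule-based contextual valence: word-level prior
# valences are combined over dependency relations, with a negation rule
# flipping the sign. Lexicon and rules are tiny toy examples.
PRIOR = {"succeed": +1.0, "fail": -1.0, "help": +0.8}

def sentence_valence(dependencies):
    """dependencies: list of (relation, head, dependent) triples."""
    negated_heads = {head for rel, head, dep in dependencies if rel == "neg"}
    words = {w for _, head, dep in dependencies for w in (head, dep)}
    valence = 0.0
    for word in words:
        score = PRIOR.get(word)
        if score is None:
            continue                # word carries no prior valence
        if word in negated_heads:
            score = -score          # negation flips contextual valence
        valence += score
    return valence

# "The plan did not fail": neg(fail, not) flips the negative prior.
deps = [("nsubj", "fail", "plan"), ("neg", "fail", "not")]
print(sentence_valence(deps))       # +1.0: negated 'fail' reads positive
```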

How Rude Are You?: Evaluating Politeness and Affect in Interaction

Swati Gupta; Marilyn A. Walker; Daniela M. Romano

Recent research on conversational agents emphasises the need to build affective conversational systems with social intelligence. Politeness is an integral part of socially appropriate and affective conversational behaviour; consider, for example, the difference in pragmatic effect between realizing the same communicative goal as “Get me a glass of water mate!” or as “I wonder if I could possibly have some water please?” This paper presents POLLy (Politeness for Language Learning), a system which combines a spoken language generator with an artificial intelligence planner to model Brown and Levinson's theory of politeness in collaborative task-oriented dialogue, with the ultimate goal of providing a fun and stimulating environment for learning English as a second language. An evaluation of politeness perceptions of POLLy's output shows that: (1) perceptions are generally consistent with Brown and Levinson's predictions for choice of form and for discourse situation, i.e. utterances to strangers need to be much more polite than those to friends; (2) our indirect strategies, which should be the politest forms, are seen as the rudest; and (3) English and Indian native speakers of English have different perceptions of politeness.

- Affective Text and Dialogue Processing | Pp. 203-217
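
The generator's key move is realizing one communicative goal with different Brown and Levinson strategies depending on the social situation. A toy sketch of that choice, with invented surface forms and an assumed social-distance threshold (note the paper found indirect forms were actually judged rudest):

```python
# Illustrative sketch: pick a politeness strategy for the same request,
# keyed to social distance. Forms and thresholds are invented, not
# POLLy's actual generator output.
FORMS = {
    "direct":            "Get me {thing}!",
    "approval_oriented": "Could you get me {thing}, please?",
    "indirect":          "I wonder if I could possibly have {thing}?",
}

def realize(goal_thing, social_distance):
    """More social distance selects a (nominally) more polite form."""
    if social_distance < 0.3:
        strategy = "direct"            # close friends tolerate bald requests
    elif social_distance < 0.7:
        strategy = "approval_oriented"
    else:
        strategy = "indirect"          # strangers get the most polite form
    return FORMS[strategy].format(thing=goal_thing)

print(realize("a glass of water", 0.1))   # friend
print(realize("a glass of water", 0.9))   # stranger
```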

Textual Affect Sensing for Sociable and Expressive Online Communication

Alena Neviarouskaya; Helmut Prendinger; Mitsuru Ishizuka

In this paper, we address the tasks of recognition and interpretation of affect communicated through text messaging. The evolving nature of language in online conversations is a main issue in affect sensing from this media type, since sentence parsing might fail during syntactical structure analysis. The Affect Analysis Model we developed was designed to handle not only correctly written text, but also informal messages written in an abbreviated or expressive manner. The proposed rule-based approach processes each sentence in sequential stages, including symbolic cue processing, detection and transformation of abbreviations, sentence parsing, and word-, phrase-, and sentence-level analyses. In a study based on 160 sentences, the system's result agrees with at least two out of three human annotators in 70% of the cases. In order to reflect the detected affective information and social behaviour, an avatar was created.

- Affective Text and Dialogue Processing | Pp. 218-229
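
The sequential stages named in the abstract (symbolic cue processing, abbreviation transformation, then word-level analysis) suggest a simple pipeline shape. The sketch below is a toy rendering with tiny invented lexicons, not the actual Affect Analysis Model:

```python
# Illustrative sketch of the staged, rule-based pipeline: symbolic cues
# (emoticons), abbreviation expansion, then a word-level affect lookup.
# All lexicons here are tiny toy stand-ins.
import re

EMOTICONS = {":)": "joy", ":(": "sadness"}
ABBREV = {"gr8": "great", "thx": "thanks", "u": "you"}
AFFECT_WORDS = {"great": "joy", "thanks": "gratitude", "awful": "sadness"}

def analyze(message):
    # Stage 1: symbolic cue processing (emoticons dominate if present).
    for cue, emotion in EMOTICONS.items():
        if cue in message:
            return emotion
    # Stage 2: detect and expand informal abbreviations.
    tokens = [ABBREV.get(t, t) for t in re.findall(r"[\w']+", message.lower())]
    # Stage 3: word-level affect analysis over the normalized tokens.
    for t in tokens:
        if t in AFFECT_WORDS:
            return AFFECT_WORDS[t]
    return "neutral"

print(analyze("thx, that was gr8 :)"))   # emoticon decides: 'joy'
print(analyze("u did gr8"))              # expanded 'gr8' -> 'joy'
```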