Publications catalog - books

Advances in Natural Multimodal Dialogue Systems

Jan C. J. van Kuppevelt; Laila Dybkjær; Niels Ole Bernsen (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution: Not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-1-4020-3932-4

Electronic ISBN

978-1-4020-3933-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer 2005

Table of contents

Natural and Multimodal Interactivity Engineering - Directions and Needs

Niels Ole Bernsen; Laila Dybkjær

This introductory chapter discusses the field of natural and multimodal interactivity engineering and presents the following 15 chapters in this context. A brief presentation of each chapter is given, their contributions to specific natural and multimodal interactivity engineering needs are discussed, and the concepts of multimodality and natural interactivity are explained along with an overview of the modalities investigated in the 15 chapters.

Natural and Multimodal Interactivity Engineering - Directions and Needs | Pp. 1-19

Social Dialogue with Embodied Conversational Agents

Timothy Bickmore; Justine Cassell

The functions of social dialogue between people in the context of performing a task are discussed, as well as approaches to modelling such dialogue in embodied conversational agents. A study of an agent’s use of social dialogue is presented, comparing embodied interactions with similar interactions conducted over the phone, and assessing the impact these media have on a wide range of behavioural, task and subjective measures. Results indicate that subjects’ perceptions of the agent are sensitive to both interaction style (social vs. task-only dialogue) and medium.

Part I - Making Dialogues More Natural: Empirical Work and Applied Theory | Pp. 23-54

A First Experiment in Engagement for Human-Robot Interaction in Hosting Activities

Candace L. Sidner; Myroslava Dzikovska

To participate in collaborations with people, robots must not only see and talk with people but also make use of the conventions of conversation and of the means to be connected to their human counterparts. This chapter reports on initial research on engagement in human-human interaction and applications to stationary robots interacting with humans in hosting activities.

Part I - Making Dialogues More Natural: Empirical Work and Applied Theory | Pp. 55-76

FORM

Craig H. Martell

Annotated corpora have played a critical role in speech and natural language research, and there is increasing interest in corpus-based research in sign language and gesture as well. We present a non-semantic, geometrically-based annotation scheme, FORM, which allows an annotator to capture the kinematic information in a gesture just from videos of speakers. In addition, FORM stores this gestural information in Annotation Graph format, allowing for easy integration of gesture information with other types of communication information, e.g., discourse structure, parts of speech, and intonation.

Part II - Annotation and Analysis of Multimodal Data: Speech and Gesture | Pp. 79-95

On the Relationships Among Speech, Gestures, and Object Manipulation in Virtual Environments: Initial Evidence

Andrea Corradini; Philip R. Cohen

This chapter reports on a study whose goal was to investigate how people make use of gestures and spoken utterances while playing a videogame without the support of standard input devices. We deploy a Wizard of Oz technique to collect audio-, video-, and body-movement data on people’s free use of gesture and speech input. Data was collected from ten subjects for up to 60 minutes of game interaction each. We provide information on preferential mode use, as well as the predictability of gesture based on the objects in the scene. The long-term goal of this ongoing study is to collect natural and reliable data from different input modalities, which could provide training data for the design and development of a robust multimodal recognizer.

Part II - Annotation and Analysis of Multimodal Data: Speech and Gesture | Pp. 97-112

Analysing Multimodal Communication

Patrick G. T. Healey; Marcus Colman; Mike Thirlwell

There are few techniques available to inform the design of systems that support human-human interaction. Psycholinguistic models have the potential to fill this gap; however, existing approaches have some conceptual and practical limitations. This chapter presents a technique, based on the conversation analytic model of breakdown and repair, for modality- and task-independent analysis of communicative exchanges. The rationale for the approach is presented and a protocol for coding repair is described. The potential of this approach for analysing multimodal interactions is discussed.

Part II - Annotation and Analysis of Multimodal Data: Speech and Gesture | Pp. 113-129

Do Oral Messages Help Visual Search?

Noëlle Carbonell; Suzanne Kieffer

A preliminary experimental study is presented that aims to elicit the contribution of oral messages to facilitating visual search tasks on crowded displays. Results of quantitative and qualitative analyses suggest that appropriate verbal messages can improve both target selection time and accuracy. In particular, multimodal messages comprising a visual presentation of the isolated target together with absolute spatial oral information on its location in the displayed scene seem most effective. These messages also received the highest ratings from most subjects.

Part II - Annotation and Analysis of Multimodal Data: Speech and Gesture | Pp. 131-157

Geometric and Statistical Approaches to Audiovisual Segmentation

Trevor Darrell; John W. Fisher; Kevin W. Wilson; Michael R. Siracusa

Multimodal approaches are proposed for segmenting multiple speakers using geometric or statistical techniques. When multiple microphones and cameras are available, 3-D audiovisual tracking is used for source segmentation and array processing. With just a single camera and microphone, an information-theoretic criterion separates speakers in a video sequence and associates relevant portions of the audio signal. Results are shown for each approach, and an initial integration effort is discussed.

Part II - Annotation and Analysis of Multimodal Data: Speech and Gesture | Pp. 159-180

The Psychology and Technology of Talking Heads: Applications in Language Learning

Dominic W. Massaro

Given the value of visible speech, our persistent goal has been to develop, evaluate, and apply animated agents to produce accurate visible speech. The goal of our recent research has been to increase the number of agents and to improve the accuracy of visible speech. Perceptual tests indicated positive results of this work. Given this technology and the framework of the fuzzy logical model of perception (FLMP), we have developed computer-assisted speech and language tutors for deaf, hard of hearing, and autistic children. Baldi, as the conversational agent, guides students through a variety of exercises designed to teach vocabulary and grammar, to improve speech articulation, and to develop linguistic and phonological awareness. The results indicate that the psychology and technology of Baldi hold great promise in language learning and speech therapy.

Part III - Animated Talking Heads and Evaluation | Pp. 183-214

Effective Interaction with Talking Animated Agents in Dialogue Systems

Björn Granström; David House

At the Centre for Speech Technology at KTH, we have for the past several years been developing spoken dialogue applications that include animated talking agents. Our motivation for moving into audiovisual output is to investigate the advantages of multimodality in human-system communication. While the mainstream character animation area has focussed on the naturalness and realism of the animated agents, our primary concern has been the possible increase of intelligibility and efficiency of interaction resulting from the addition of a talking face. In our first dialogue system, Waxholm, the agent used the deictic function of indicating specific information on the screen by eye gaze. In another project, Synface, we were specifically concerned with the advantages in intelligibility that a talking face could provide. In recent studies we have investigated the use of facial gesture cues to convey such dialogue-related functions as feedback and turn-taking as well as prosodic functions such as prominence. Results show that cues such as eyebrow and head movement can independently signal prominence. Current results also indicate that there can be considerable differences in cue strengths among visual cues such as smiling and nodding and that such cues can contribute in an additive manner together with auditory prosody as cues to different dialogue functions. Results from some of these studies are presented in the chapter along with examples of spoken dialogue applications using talking heads.

Part III - Animated Talking Heads and Evaluation | Pp. 215-243