Publications catalog - books

Perception and Interactive Technologies: International Tutorial and Research Workshop, PIT 2006, Kloster Irsee, Germany, June 19-21, 2006, Proceedings.

Elisabeth André; Laila Dybkjær; Wolfgang Minker; Heiko Neumann; Michael Weber (eds.)

Conference: International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems (PIT), Kloster Irsee, Germany, June 19-21, 2006

Abstract/description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Image Processing and Computer Vision; User Interfaces and Human Computer Interaction

Availability

Detected institution: not detected
Year of publication: 2006
Online access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-34743-9

Electronic ISBN

978-3-540-34744-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Help Strategies for Speech Dialogue Systems in Automotive Environments

Alexander Hof; Eli Hagen

In this paper we discuss advanced help concepts for speech dialogues. Based on current research results in the field of human-machine interfaces, we describe two advanced help concepts based on hierarchical structuring of help dialogues. Furthermore, we explain the test design for our usability experiments and present the methods and measures we used to collect our test data. Finally, we report the results from our usability tests and discuss our findings.

- Spoken Dialogue Systems | Pp. 107-116
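
The abstract does not detail how the hierarchical help dialogues are structured; as a loose, hypothetical illustration of the general idea, the Python sketch below (all names and prompt texts invented, not from the paper) stores help prompts per topic as a list of increasingly detailed levels, so that repeated help requests escalate to more concrete guidance.

```python
# Hypothetical sketch: hierarchically structured help prompts per topic.
# Each repeated help request for the same topic yields more detail.

HELP_TREE = {
    "navigation": [
        "You can enter a destination.",                         # terse hint
        "Say a city name, for example: 'Drive to Munich'.",     # with example
        "Say 'Drive to', then city, street and house number.",  # full syntax
    ],
    "audio": [
        "You can control the radio.",
        "Say a station name, for example: 'Play Bayern 3'.",
    ],
}

class HelpManager:
    """Walks down one branch of the help hierarchy per repeated request."""

    def __init__(self, tree):
        self.tree = tree
        self.level = {}  # topic -> current escalation level

    def help_prompt(self, topic):
        prompts = self.tree[topic]
        lvl = self.level.get(topic, 0)
        self.level[topic] = min(lvl + 1, len(prompts) - 1)
        return prompts[lvl]

hm = HelpManager(HELP_TREE)
print(hm.help_prompt("navigation"))  # level 0: terse hint
print(hm.help_prompt("navigation"))  # level 1: more detail
```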

Information Fusion for Visual Reference Resolution in Dynamic Situated Dialogue

Geert-Jan M. Kruijff; John D. Kelleher; Nick Hawes

Human-Robot Interaction (HRI) invariably involves dialogue about objects in the environment in which the agents are situated. The paper focuses on the issue of resolving discourse references to such visual objects. The paper addresses the problem using strategies for intra-modal fusion (identifying that different occurrences concern the same object) and inter-modal fusion (relating object references across different modalities). Core to these strategies are sensorimotoric coordination and ontology-based mediation between content in different modalities. The approach has been fully implemented and is illustrated with several working examples.

- Multimodal and Situated Dialogue Systems | Pp. 117-128
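
The fusion strategies themselves are not spelled out in the abstract; the toy Python sketch below illustrates only one ingredient the authors name, ontology-based mediation, by matching a spoken noun to visual objects through a shared supertype. The ontology, object list, and function names are all invented for illustration.

```python
# Illustrative sketch (hypothetical names): ontology-mediated binding of a
# spoken referent to a visual object, in the spirit of inter-modal fusion.

# Toy is-a ontology mediating between linguistic and visual categories.
ONTOLOGY = {"mug": "container", "cup": "container", "box": "container"}

# Visual objects as a vision system might report them.
visual_objects = [
    {"id": 1, "category": "cup", "position": (0.4, 0.1)},
    {"id": 2, "category": "ball", "position": (0.9, 0.3)},
]

def resolve_reference(noun, objects, ontology):
    """Return visual objects whose category matches the spoken noun,
    directly or via a shared ontological supertype."""
    def compatible(a, b):
        return a == b or ontology.get(a) == b or ontology.get(a) == ontology.get(b)
    return [o for o in objects if compatible(noun, o["category"])]

print(resolve_reference("mug", visual_objects, ONTOLOGY))  # object 1 ('cup' via 'container')
```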

Speech and 2D Deictic Gesture Reference to Virtual Scenes

Niels Ole Bernsen

Humans make ample use of deictic gesture and spoken reference in referring to perceived phenomena in the spatial environment, such as visible objects, sound sources, tactile objects, or even sources of smell and taste. Multimodal and natural interactive systems developers are beginning to face the challenges involved in making systems correctly interpret user input belonging to this general class of multimodal references. This paper addresses a first fragment of the general problem, i.e., spoken and/or 2D on-screen deictic gesture reference to graphics output scenes. The approach is to confront existing sketchy theory with new data and generalise the results to what may be a more comprehensive understanding of the problem.

- Multimodal and Situated Dialogue Systems | Pp. 129-140
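
As a rough, hypothetical illustration of the input class the paper studies (none of this comes from the paper itself), a combined spoken-plus-pointing reference can be resolved by filtering scene objects on the spoken type and picking the one nearest the 2D gesture point:

```python
import math

# Hypothetical sketch: fuse a spoken type constraint ("this house") with a
# 2D deictic gesture by choosing the type-compatible object nearest the
# pointing location. Scene contents are invented.

scene = [
    {"name": "red house", "type": "house", "xy": (120, 80)},
    {"name": "blue house", "type": "house", "xy": (300, 90)},
    {"name": "tree", "type": "tree", "xy": (180, 200)},
]

def resolve_deictic(spoken_type, point_xy, objects):
    candidates = [o for o in objects if o["type"] == spoken_type]
    if not candidates:
        return None
    return min(candidates, key=lambda o: math.dist(o["xy"], point_xy))

# "this house" plus a pointing gesture near (130, 85) -> the red house
print(resolve_deictic("house", (130, 85), scene)["name"])
```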

Combining Modality Theory and Context Models

Andreas Ratzka

This paper outlines a research plan with the purpose of combining model-based methodology and multimodal interaction. This work picks up frameworks such as modality theory, TYCOON and CARE and correlates them to approaches for context modelling such as the interaction constraints model and the unifying reference framework for multi-target user interfaces. This research shall result in methodological design support for multimodal interaction. The resulting framework will consist of methodological design support, such as a design pattern language for multimodal interaction and a set of model-based notational elements.

- Multimodal and Situated Dialogue Systems | Pp. 141-151

Visual Interaction in Natural Human-Machine Dialogue

Joseph Machrouh; Franck Panaget

In this article, we describe a visual component able to detect and track a human face in a video stream. This component is integrated into an embodied conversational agent. Depending on the presence or absence of a user in front of the camera and the orientation of his head, the system begins, continues, resumes or closes the interaction. Several constraints have been taken into account: a simple webcam, a low error rate and a minimum computing time that permits the whole system to run on a standard PC.

- Integration of Perceptive Technologies and Animation | Pp. 152-163
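
The abstract outlines control logic (begin, continue, resume, or close the interaction based on face presence and head orientation) without giving details; a minimal hypothetical state machine in that spirit might look like the following, with invented state names and no timeouts.

```python
# Hypothetical state machine (invented names, not from the paper): dialogue
# state driven by whether a face is present and oriented toward the camera.

def next_state(state, face_present, facing_camera):
    if state == "IDLE":
        # Begin the interaction only when a user looks at the camera.
        return "ENGAGED" if (face_present and facing_camera) else "IDLE"
    if state == "ENGAGED":
        if face_present and facing_camera:
            return "ENGAGED"    # continue the dialogue
        return "SUSPENDED"      # user looked away or left: pause
    if state == "SUSPENDED":
        if face_present and facing_camera:
            return "ENGAGED"    # user is attending again: resume
        if face_present:
            return "SUSPENDED"  # still there but not attending
        return "CLOSED"         # user gone (a real system would wait first)
    return "CLOSED"

state = "IDLE"
for present, facing in [(True, True), (True, False), (True, True), (False, False)]:
    state = next_state(state, present, facing)
    print(state)  # ENGAGED, SUSPENDED, ENGAGED, SUSPENDED
```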

Multimodal Sensing, Interpretation and Copying of Movements by a Virtual Agent

Elisabetta Bevacqua; Amaryllis Raouzaiou; Christopher Peters; George Caridakis; Kostas Karpouzis; Catherine Pelachaud; Maurizio Mancini

We present a scenario whereby an agent senses, interprets and copies a range of facial and gesture expressions from a person in the real world. Input is obtained via a video camera and processed initially using computer vision techniques. It is then processed further in a framework for agent perception, planning and behaviour generation in order to perceive, interpret and copy a number of gestures and facial expressions corresponding to those made by the human. By copying, we mean that the copied behaviour may not be an exact duplicate of the behaviour made by the human and sensed by the agent, but may rather be based on some level of interpretation of the behaviour. Thus, the copied behaviour may be altered and need not share all of the characteristics of the original made by the human.

- Integration of Perceptive Technologies and Animation | Pp. 164-174
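
The distinction between replaying raw motion and copying at the level of interpretation can be made concrete with a small hypothetical sketch: classify the sensed behaviour into a symbolic label, then regenerate the agent's own version of that label. All names and thresholds below are illustrative, not taken from the paper.

```python
# Illustrative sketch of interpretation-level copying: the agent maps raw
# sensed features to a symbolic label and regenerates its own rendition of
# that label, rather than duplicating the observed motion exactly.

def interpret(sensed_features):
    """Map raw features (e.g., mouth-corner displacement) to a label."""
    return "smile" if sensed_features["mouth_corners_up"] > 0.5 else "neutral"

AGENT_REPERTOIRE = {
    "smile": {"expression": "smile", "intensity": 0.7},  # agent's own style
    "neutral": {"expression": "neutral", "intensity": 0.0},
}

def copy_behaviour(sensed_features):
    return AGENT_REPERTOIRE[interpret(sensed_features)]

print(copy_behaviour({"mouth_corners_up": 0.8}))  # agent's smile, not a replay
```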

Perception of Dynamic Facial Expressions of Emotion

Holger Hoffmann; Harald C. Traue; Franziska Bachmayr; Henrik Kessler

In order to assess subjects’ ability to recognize facially expressed emotions, it is more realistic to present dynamic instead of static facial expressions. So far, no time windows for the optimal presentation of that kind of stimuli have been reported. We presented normal subjects with dynamic displays in which the face evolves from a neutral to an emotional expression. This study measured the optimal velocities at which facial expressions were perceived as being natural. Subjects (N=46) viewed morphed sequences with facial emotions and could adjust the velocity until satisfied with the natural appearance. Velocities for each emotion are reported. Emotions differed significantly in their optimal velocities.

- Poster Session | Pp. 175-178
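
The procedure resembles the classical psychophysical method of adjustment; a hypothetical sketch of such a loop, with invented key bindings, step sizes and stubbed display functions, follows.

```python
# Hypothetical method-of-adjustment loop: the subject speeds up or slows
# down a neutral-to-emotion morph until it looks natural, then accepts.

def adjust_velocity(play_morph, get_key, duration_ms=1000, step_ms=100):
    """Loop until the subject accepts the current morph duration."""
    while True:
        play_morph(duration_ms)      # show the morph at the current speed
        key = get_key()              # 'f' faster, 's' slower, 'a' accept
        if key == "f":
            duration_ms = max(step_ms, duration_ms - step_ms)
        elif key == "s":
            duration_ms += step_ms
        elif key == "a":
            return duration_ms       # accepted duration for this emotion

# Toy demo with stubbed I/O: two speed-ups, then accept.
responses = iter(["f", "f", "a"])
print(adjust_velocity(lambda ms: None, lambda: next(responses)))  # -> 800
```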

Multi-level Face Tracking for Estimating Human Head Orientation in Video Sequences

Tobias Bausch; Pierre Bayerl; Heiko Neumann

We propose a hierarchical scheme of tracking facial regions in video sequences. The hierarchy uses the face structure, facial regions and their components, such as eyes and mouth, to achieve improved robustness against structural deformations and the temporal loss of image components due to, e.g., self-occlusion. The temporal deformation of facial eye regions is mapped to estimate the head orientation around the yaw axis. The performance of the algorithm is demonstrated for free head motions.

- Poster Session | Pp. 179-182
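
The abstract does not give the mapping from eye-region deformation to yaw; purely as a toy illustration of the underlying idea (the eye region nearer the camera appears wider as the head turns), one could estimate yaw from the asymmetry of the two tracked eye-region widths. The formula below is an invented approximation, not the paper's model.

```python
import math

# Toy yaw estimate from tracked eye-region widths (hypothetical mapping).

def estimate_yaw_deg(left_eye_width, right_eye_width):
    """Positive yaw = head turned toward its own left, in this convention."""
    asymmetry = (left_eye_width - right_eye_width) / (left_eye_width + right_eye_width)
    return math.degrees(math.asin(max(-1.0, min(1.0, asymmetry))))

print(round(estimate_yaw_deg(30.0, 24.0), 1))  # small turn -> ~6.4 degrees
```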

The Effect of Prosodic Features on the Interpretation of Synthesised Backchannels

Åsa Wallers; Jens Edlund; Gabriel Skantze

A study of the interpretation of prosodic features in backchannels (Swedish /a/ and /m/) produced by speech synthesis is presented. The study is part of work in progress towards endowing conversational spoken dialogue systems with the ability to produce and use backchannels and other feedback.

- Poster Session | Pp. 183-187

Unsupervised Learning of Spatio-temporal Primitives of Emotional Gait

Lars Omlor; Martin A. Giese

Experimental and computational studies suggest that complex motor behavior is based on simpler spatio-temporal primitives. This has been demonstrated by application of dimensionality reduction techniques to signals from electrophysiological and EMG recordings during execution of limb movements. However, the existence of such primitives on the level of kinematics, i.e., the joint trajectories of complex human full-body movements, remains less explored. Known blind source separation techniques, e.g., PCA and ICA, tend to extract relatively large numbers of components or source signals from such trajectories that are typically difficult to interpret. For the analysis of emotional human gait patterns, we present a new method for blind source separation that is based on a nonlinear generative model with additional time delays. The resulting model is able to approximate high-dimensional movement trajectories very accurately with very few source components. Combining this method with sparse regression, we identified spatio-temporal primitives for the encoding of individual emotions in gait. We verified that these primitives match features that psychophysical studies have shown to be important for the perception of emotions from gait. This suggests the existence of emotion-specific movement primitives that might be useful for the simulation of emotional behavior in technical applications.

- Poster Session | Pp. 188-192
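
The generative model is described only at a high level in the abstract. Its forward direction, mixing a few source signals with per-channel time delays, can be sketched as below; the weights, delays and sinusoidal sources are made up, and the demixing algorithm itself is not reproduced here.

```python
import numpy as np

# Sketch of a delayed-mixture forward model consistent with "a nonlinear
# generative model with additional time delays":
#   x_i(t) = sum_j a[i, j] * s_j(t - tau[i, j])
# All weights, delays and source signals are invented for illustration.

t = np.linspace(0.0, 2.0, 400)
freqs = [1.0, 3.0]  # toy "primitive" frequencies, Hz

def source(j, tt):
    """Analytic source signals, so delayed evaluation is exact."""
    return np.sin(2 * np.pi * freqs[j] * tt)

a = np.array([[1.0, 0.3],     # mixing weights a[i, j]
              [0.5, 0.8]])
tau = np.array([[0.00, 0.10],  # per-channel delays tau[i, j], seconds
                [0.25, 0.00]])

def mix_with_delays(t, source, a, tau):
    """Evaluate x_i(t) = sum_j a[i, j] * s_j(t - tau[i, j])."""
    n_obs, n_src = a.shape
    return np.stack([sum(a[i, j] * source(j, t - tau[i, j])
                         for j in range(n_src))
                     for i in range(n_obs)])

x = mix_with_delays(t, source, a, tau)
print(x.shape)  # (2, 400): two observed joint trajectories from two sources
```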