Publications catalog - books
Perception and Interactive Technologies: International Tutorial and Research Workshop, PIT 2006, Kloster Irsee, Germany, June 19-21, 2006, Proceedings.
Elisabeth André; Laila Dybkjær; Wolfgang Minker; Heiko Neumann; Michael Weber (eds.)
Conference: International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems (PIT). Kloster Irsee, Germany. June 19-21, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Artificial Intelligence (incl. Robotics); Image Processing and Computer Vision; User Interfaces and Human Computer Interaction
Availability
Detected institution | Year of publication | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-34743-9
Electronic ISBN
978-3-540-34744-6
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11768029_11
Help Strategies for Speech Dialogue Systems in Automotive Environments
Alexander Hof; Eli Hagen
In this paper we discuss advanced help concepts for speech dialogues. Based on current research results in the field of human-machine interfaces, we describe two advanced help concepts based on hierarchical structuring of help dialogues. Furthermore, we explain the test design for our usability experiments and present the methods and measures we used to collect our test data. Finally, we report the results from our usability tests and discuss our findings.
- Spoken Dialogue Systems | Pp. 107-116
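The Hof and Hagen abstract above centres on hierarchically structured help dialogues. As a purely illustrative sketch (not taken from the paper; the topic tree, prompts and traversal policy are invented), such a hierarchy can be modelled as a tree that the dialogue manager descends one level per help request:

```python
# Illustrative sketch of a hierarchical help structure for a speech dialogue
# system. The topic tree and traversal policy are hypothetical, not the
# authors' actual design.

class HelpNode:
    def __init__(self, prompt, children=None):
        self.prompt = prompt              # what the system says at this level
        self.children = children or {}    # keyword -> deeper HelpNode

# The top level gives a brief overview; deeper levels give progressively
# more detailed assistance (one plausible reading of "hierarchical help").
HELP_TREE = HelpNode(
    "You can control navigation, audio and phone. Say one for details.",
    {
        "navigation": HelpNode(
            "Navigation: say 'enter destination' or 'cancel route'.",
            {"enter destination": HelpNode(
                "Spell the city name or say a stored address book entry.")}),
        "audio": HelpNode("Audio: say 'next station' or 'volume up'."),
        "phone": HelpNode("Phone: say 'dial number' or a contact name."),
    },
)

def help_prompt(path):
    """Walk the tree along the user's help requests and return the prompt."""
    node = HELP_TREE
    for keyword in path:
        node = node.children.get(keyword, node)  # stay put on unknown input
    return node.prompt

if __name__ == "__main__":
    print(help_prompt([]))                                    # overview
    print(help_prompt(["navigation"]))                        # level 1
    print(help_prompt(["navigation", "enter destination"]))   # level 2
```

Each further help request moves the user one level deeper, so the overview stays short while detailed guidance remains reachable.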
doi: 10.1007/11768029_12
Information Fusion for Visual Reference Resolution in Dynamic Situated Dialogue
Geert-Jan M. Kruijff; John D. Kelleher; Nick Hawes
Human-Robot Interaction (HRI) invariably involves dialogue about objects in the environment in which the agents are situated. The paper focuses on the issue of resolving discourse references to such visual objects. It addresses the problem using strategies for identifying that different occurrences concern the same object, and for relating object references across different modalities. Core to these strategies are sensorimotoric coordination and ontology-based mediation between content in different modalities. The approach has been fully implemented, and is illustrated with several working examples.
- Multimodal and Situated Dialogue Systems | Pp. 117-128
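Kruijff, Kelleher and Hawes mention ontology-based mediation for relating object references across modalities. The following toy sketch (entirely hypothetical; the ontology, object records and recency heuristic are not from the paper) shows the basic shape of binding a spoken referring expression to a tracked visual object by type subsumption and recency:

```python
# Hypothetical sketch of cross-modal reference resolution: bind a spoken
# referring expression to a tracked visual object by matching its category
# through a toy "ontology" (a type-subsumption table). Not the paper's system.

ONTOLOGY = {            # subtype -> supertype
    "mug": "container",
    "box": "container",
    "ball": "object",
    "container": "object",
}

def is_a(subtype, supertype):
    """True if subtype is subsumed by supertype in the toy ontology."""
    t = subtype
    while t is not None:
        if t == supertype:
            return True
        t = ONTOLOGY.get(t)
    return False

def resolve(expression_type, visual_objects):
    """Return the most recently seen object whose type matches the expression."""
    candidates = [o for o in visual_objects if is_a(o["type"], expression_type)]
    return max(candidates, key=lambda o: o["last_seen"], default=None)

if __name__ == "__main__":
    scene = [
        {"id": "obj1", "type": "mug", "last_seen": 4.2},
        {"id": "obj2", "type": "ball", "last_seen": 7.9},
        {"id": "obj3", "type": "box", "last_seen": 6.1},
    ]
    # "the container" binds to the most recently observed container, obj3
    print(resolve("container", scene))
```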
doi: 10.1007/11768029_13
Speech and 2D Deictic Gesture Reference to Virtual Scenes
Niels Ole Bernsen
Humans make ample use of deictic gesture and spoken reference in referring to perceived phenomena in the spatial environment, such as visible objects, sound sources, tactile objects, or even sources of smell and taste. Multimodal and natural interactive systems developers are beginning to face the challenges involved in making systems correctly interpret user input belonging to this general class of multimodal references. This paper addresses a first fragment of the general problem, i.e., spoken and/or 2D on-screen deictic gesture reference to graphics output scenes. The approach is to confront existing sketchy theory with new data and generalise the results to what may be a more comprehensive understanding of the problem.
- Multimodal and Situated Dialogue Systems | Pp. 129-140
doi: 10.1007/11768029_14
Combining Modality Theory and Context Models
Andreas Ratzka
This paper outlines a research plan aimed at combining model-based methodology and multimodal interaction. The work picks up frameworks such as modality theory, TYCOON and CARE and relates them to approaches such as the interaction constraints model and the unifying reference framework for multi-target user interfaces. This research shall result in methodological design support for multimodal interaction. The resulting framework will include a design pattern language for multimodal interaction and a set of model-based notational elements.
- Multimodal and Situated Dialogue Systems | Pp. 141-151
doi: 10.1007/11768029_15
Visual Interaction in Natural Human-Machine Dialogue
Joseph Machrouh; Franck Panaget
In this article, we describe a visual component able to detect and track a human face in a video stream. This component is integrated into an embodied conversational agent. Depending on the presence or absence of a user in front of the camera and the orientation of their head, the system begins, continues, resumes or closes the interaction. Several constraints have been taken into account: a simple webcam, a low error rate, and a computational cost low enough for the whole system to run on an ordinary PC.
- Integration of Perceptive Technologies and Animation | Pp. 152-163
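Machrouh and Panaget's component detects a face with a simple webcam and uses presence or absence to drive the interaction state. A minimal sketch of that control loop, assuming OpenCV is available and using its stock Haar-cascade frontal-face detector (head-orientation handling is omitted; this is not the authors' implementation):

```python
# Minimal sketch (not the authors' component): use a webcam and OpenCV's
# stock Haar-cascade face detector to decide whether a user is present,
# and begin or suspend the interaction accordingly.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def user_present(frame):
    """Return True if at least one frontal face is detected in the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def main():
    cap = cv2.VideoCapture(0)          # a simple webcam, as in the abstract
    interacting = False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        present = user_present(frame)
        if present and not interacting:
            print("user detected -> begin/resume dialogue")
            interacting = True
        elif not present and interacting:
            print("user left -> suspend dialogue")
            interacting = False
        cv2.imshow("camera", frame)
        if cv2.waitKey(30) & 0xFF == 27:   # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```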
doi: 10.1007/11768029_16
Multimodal Sensing, Interpretation and Copying of Movements by a Virtual Agent
Elisabetta Bevacqua; Amaryllis Raouzaiou; Christopher Peters; George Caridakis; Kostas Karpouzis; Catherine Pelachaud; Maurizio Mancini
We present a scenario whereby an agent senses, interprets and copies a range of facial and gesture expressions from a person in the real world. Input is obtained via a video camera and processed initially using computer vision techniques. It is then processed further in a framework for agent perception, planning and behaviour generation in order to perceive, interpret and copy a number of gestures and facial expressions corresponding to those made by the human. By copying, we mean that the copied behaviour may not be an exact duplicate of the behaviour made by the human and sensed by the agent, but may rather be based on some level of interpretation of the behaviour. Thus, the copied behaviour may be altered and need not share all of the characteristics of the original made by the human.
- Integration of Perceptive Technologies and Animation | Pp. 164-174
doi: 10.1007/11768029_17
Perception of Dynamic Facial Expressions of Emotion
Holger Hoffmann; Harald C. Traue; Franziska Bachmayr; Henrik Kessler
In order to assess subjects’ ability to recognize facially expressed emotions, it is more realistic to present dynamic rather than static facial expressions. So far, no time windows for the optimal presentation of such stimuli have been reported. We presented normal subjects with dynamic displays in which the face evolves from a neutral to an emotional expression. This study measured the optimal velocities at which facial expressions were perceived as natural. Subjects (N=46) viewed morphed sequences with facial emotions and could adjust the velocity until satisfied with the natural appearance. Velocities for each emotion are reported. Emotions differed significantly in their optimal velocities.
- Poster Session | Pp. 175-178
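Hoffmann et al. let subjects adjust the playback velocity of neutral-to-emotion morphs until they appeared natural, a classic method-of-adjustment design. A hypothetical console-only sketch of that adjustment loop (frame count, step size and key bindings are invented; rendering is a placeholder):

```python
# Hypothetical sketch of a method-of-adjustment procedure: the subject speeds
# up or slows down a neutral-to-emotion morph sequence until its velocity
# looks natural. Display code is omitted; only the adjustment logic is shown.

N_FRAMES = 30                 # frames in the morph sequence (assumed)

def play(duration_ms):
    """Placeholder for rendering the morph over the given duration."""
    print(f"playing {N_FRAMES}-frame morph over {duration_ms:.0f} ms")

def adjust_velocity(start_ms=1000.0, step=0.1):
    """Let the subject adjust playback duration until satisfied."""
    duration = start_ms
    while True:
        play(duration)
        answer = input("faster (f), slower (s), or accept (a)? ").strip().lower()
        if answer == "f":
            duration *= (1.0 - step)      # shorter duration = faster morph
        elif answer == "s":
            duration *= (1.0 + step)
        elif answer == "a":
            return duration

if __name__ == "__main__":
    chosen = adjust_velocity()
    print(f"accepted duration: {chosen:.0f} ms")
```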
doi: 10.1007/11768029_18
Multi-level Face Tracking for Estimating Human Head Orientation in Video Sequences
Tobias Bausch; Pierre Bayerl; Heiko Neumann
We propose a hierarchical scheme of tracking facial regions in video sequences. The hierarchy uses the face structure, facial regions and their components, such as eyes and mouth, to achieve improved robustness against structural deformations and the temporal loss of image components due to, e.g., self-occlusion. The temporal deformation of facial eye regions is mapped to estimate the head orientation around the yaw axis. The performance of the algorithm is demonstrated for free head motions.
- Poster Session | Pp. 179-182
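Bausch, Bayerl and Neumann map the deformation of tracked eye regions to head orientation about the yaw axis. A toy illustration of one way such a mapping could look (the asymmetry measure and the linear gain are invented, not the paper's algorithm):

```python
# Toy illustration (not the authors' algorithm): estimate head yaw from the
# relative apparent widths of the left and right eye regions. Under yaw
# rotation the eye turning away from the camera appears narrower, so the
# width asymmetry is a rough monotone function of the yaw angle.

def yaw_from_eye_widths(w_left, w_right, gain_deg=60.0):
    """Map eye-width asymmetry to a yaw angle in degrees.

    gain_deg is an invented calibration constant; a real system would fit
    it (or a nonlinear mapping) from labelled head-pose data.
    """
    asymmetry = (w_left - w_right) / float(w_left + w_right)
    return gain_deg * asymmetry

if __name__ == "__main__":
    print(yaw_from_eye_widths(32, 32))   #   0.0  -> facing the camera
    print(yaw_from_eye_widths(36, 28))   #   7.5  -> turned to one side
    print(yaw_from_eye_widths(26, 38))   # -11.25 -> turned to the other side
```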
doi: 10.1007/11768029_19
The Effect of Prosodic Features on the Interpretation of Synthesised Backchannels
Åsa Wallers; Jens Edlund; Gabriel Skantze
A study of the interpretation of prosodic features in backchannels (Swedish /a/ and /m/) produced by speech synthesis is presented. The study is part of work-in-progress towards endowing conversational spoken dialogue systems with the ability to produce and use backchannels and other feedback.
- Poster Session | Pp. 183-187
doi: 10.1007/11768029_20
Unsupervised Learning of Spatio-temporal Primitives of Emotional Gait
Lars Omlor; Martin A. Giese
Experimental and computational studies suggest that complex motor behavior is based on simpler spatio-temporal primitives. This has been demonstrated by application of dimensionality reduction techniques to signals from electrophysiological and EMG recordings during execution of limb movements. However, the existence of such primitives on the level of kinematics, i.e., the joint trajectories of complex human full-body movements, remains less explored. Known blind source separation techniques, e.g., PCA and ICA, tend to extract relatively large numbers of components or source signals from such trajectories that are typically difficult to interpret. For the analysis of emotional human gait patterns, we present a new method for blind source separation that is based on a nonlinear generative model with additional time delays. The resulting model is able to approximate high-dimensional movement trajectories very accurately with very few source components. Combining this method with sparse regression, we identified spatio-temporal primitives for the encoding of individual emotions in gait. We verified that these primitives match features that are important for the perception of emotions from gait in psychophysical studies. This suggests the existence of emotion-specific movement primitives that might be useful for the simulation of emotional behavior in technical applications.
- Poster Session | Pp. 188-192
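Omlor and Giese contrast their time-delayed nonlinear source model with standard blind source separation techniques such as PCA and ICA, which tend to need many components. For reference only, a plain-PCA baseline on placeholder joint-trajectory data might look like the sketch below; none of it reflects the authors' method:

```python
# Baseline sketch only: plain PCA on joint-angle trajectories, the kind of
# standard dimensionality reduction the abstract contrasts with the authors'
# nonlinear, time-delayed source model. Data here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
T, J = 200, 17                      # time steps x joint angles (invented sizes)
trajectories = rng.standard_normal((T, J))

# Centre the data and compute principal components via the SVD.
centred = trajectories - trajectories.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

# How many components are needed to explain 95% of the variance?
explained = (S**2) / np.sum(S**2)
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
print(f"components needed for 95% variance: {k}")

# Reconstruct the trajectories from the first k components.
reconstruction = U[:, :k] * S[:k] @ Vt[:k] + trajectories.mean(axis=0)
error = np.linalg.norm(reconstruction - trajectories) / np.linalg.norm(trajectories)
print(f"relative reconstruction error with {k} components: {error:.3f}")
```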