Publications catalog - books



Perception and Interactive Technologies: International Tutorial and Research Workshop, PIT 2006, Kloster Irsee, Germany, June 19-21, 2006, Proceedings.

Elisabeth André; Laila Dybkjær; Wolfgang Minker; Heiko Neumann; Michael Weber (eds.)

Conference: International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems (PIT). Kloster Irsee, Germany. June 19, 2006 - June 21, 2006

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Artificial Intelligence (incl. Robotics); Image Processing and Computer Vision; User Interfaces and Human Computer Interaction

Availability

Detected institution: Not detected
Publication year: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-34743-9

Electronic ISBN

978-3-540-34744-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Guiding Eye Movements for Better Communication and Augmented Vision

Erhardt Barth; Michael Dorr; Martin Böhme; Karl Gegenfurtner; Thomas Martinetz

This paper briefly summarises our results on gaze guidance, so as to complement the demonstrations that we plan to present at the workshop. Our goal is to integrate gaze into visual communication systems by measuring and guiding eye movements. Our strategy is to predict a set of about ten salient locations and then change the probability for one of these candidates to be attended: for one candidate the probability is increased, for the others it is decreased. To increase saliency, in our current implementation, we show a natural-scene movie and overlay red dots so briefly that they are hardly perceived consciously. To decrease the probability, for example, we locally reduce the temporal frequency content of the movie. Here we present preliminary results, which show that the three steps of the above strategy are feasible. The long-term goal is to find the optimal real-time video transformation that minimises the difference between the actual and the desired eye movements without being obtrusive. Applications lie in the areas of vision-based communication, augmented vision, and learning.
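
A minimal sketch of the guidance loop described above, assuming a precomputed saliency map and an RGB video frame as NumPy arrays (function names and parameters are illustrative, not the authors' implementation; the paper attenuates the temporal frequency content of the movie, whereas simple intensity damping is used here as a stand-in):

```python
# Illustrative sketch only: pick ~10 salient candidate locations, then boost one
# candidate with a brief red-dot overlay while damping the remaining candidates.
import numpy as np

def top_candidates(saliency, k=10):
    """Return (row, col) coordinates of the k most salient pixels."""
    flat = np.argsort(saliency, axis=None)[-k:]
    return np.column_stack(np.unravel_index(flat, saliency.shape))

def modulate_frame(frame, candidates, target_idx, dot_radius=3, attenuation=0.5):
    """Increase saliency at the chosen candidate, decrease it at the others."""
    out = frame.astype(float).copy()
    h, w, _ = out.shape
    yy, xx = np.mgrid[0:h, 0:w]
    for i, (r, c) in enumerate(candidates):
        mask = (yy - r) ** 2 + (xx - c) ** 2 <= dot_radius ** 2
        if i == target_idx:
            out[mask] = [255, 0, 0]      # brief red-dot overlay on the target
        else:
            out[mask] *= attenuation     # crude stand-in for local frequency damping
    return out.clip(0, 255).astype(np.uint8)

# usage sketch:
# saliency = np.random.rand(240, 320)
# frame = np.zeros((240, 320, 3), dtype=np.uint8)
# guided = modulate_frame(frame, top_candidates(saliency), target_idx=0)
```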

- Head Pose and Eye Gaze Tracking | Pp. 1-8

Detection of Head Pose and Gaze Direction for Human-Computer Interaction

Ulrich Weidenbacher; Georg Layher; Pierre Bayerl; Heiko Neumann

In this contribution we extend existing methods for head pose estimation and investigate the use of local image phase for gaze detection. Moreover, we describe how a small database of face images with given ground truth for head pose and gaze direction was acquired. With this database we compare two different computational approaches for extracting the head pose. We demonstrate that a simple implementation of the proposed methods, without extensive training sessions or calibration, is sufficient to accurately detect the head pose for human-computer interaction. Furthermore, we propose how eye gaze can be extracted based on the outcome of local filter responses and the detected head pose. In all, we present a framework in which different approaches are combined into a single system for extracting information about the attentional state of a person.
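
As a concrete, purely hypothetical baseline for the kind of head pose extraction being compared, the sketch below correlates a face patch with a small set of pose-labelled prototype images and returns the best-matching pose; it is not the pipeline evaluated in the paper, and all names and shapes are assumptions:

```python
# Hypothetical prototype-matching baseline for head pose estimation.
import numpy as np

def estimate_pose(face_patch, prototypes):
    """prototypes: dict mapping pose angle (degrees) -> template of the same shape."""
    def ncc(a, b):
        # normalized cross-correlation between two equally sized patches
        a = (a - a.mean()) / (a.std() + 1e-9)
        b = (b - b.mean()) / (b.std() + 1e-9)
        return float((a * b).mean())
    return max(prototypes, key=lambda angle: ncc(face_patch, prototypes[angle]))

# usage sketch:
# templates = {-30: left_template, 0: frontal_template, 30: right_template}
# pose = estimate_pose(face_patch, templates)
```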

- Head Pose and Eye Gaze Tracking | Pp. 9-19

Modelling and Simulation of Spontaneous Perception Switching with Ambiguous Visual Stimuli in Augmented Vision Systems

Norbert Fürstenau

A behavioral nonlinear dynamics model of multistable perception due to ambiguous visual stimuli is presented. The perception state is formalized as the dynamic phase variable v(t) of a recursive process with a cosinusoidal transfer characteristic which is created by superposition (interference) of neuronal mean fields. Two parameters, the difference of meaning between the alternative percepts and the attention parameter G, control the transition between unambiguous and ambiguous stimuli (e.g. from stimulus off to stimulus on) and attention fatigue, respectively. Mean field interference with delayed phase feedback enables transitions between chaotic and limit cycle attractors of v(t) representing the perception states. Perceptual reversals are induced by attention fatigue G(t) (via the adaptive gain g(v)) with a characteristic time constant, and by an attention bias which determines the relative duration of the percepts. The coupled attention-perception dynamics with an additive stochastic noise term reproduces the experimentally observed Γ-distribution of the reversal time statistics. Mean reversal times of typically 3-5 s, as reported in the literature, are correctly predicted if the delay T is associated with the delay of 40 ms between stimulus onset and primary visual cortex (V1) response. Numerically determined perceptual transition times of 3-5 T are in reasonable agreement with the stimulus-to-conscious-perception delay of 150-200 ms [11]. Eigenfrequencies of the limit cycle oscillations are in the range of 10-100 Hz, in agreement with typical EEG frequencies.
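
The model equations themselves are given in the paper; the toy recursion below only illustrates the ingredients named in the abstract (delayed phase feedback, a cosinusoidal transfer characteristic, a slowly adapting attention gain, and additive noise). Every parameter value here is an arbitrary assumption, not a fitted value from the paper:

```python
# Toy delayed-feedback recursion with a squared-cosine transfer, adapting gain,
# and additive noise; reversal times could then be read off from threshold
# crossings of the trace v(t).
import numpy as np

def simulate(steps=20000, delay=40, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(steps)          # perception phase variable v(t)
    G = 2.0                      # attention gain (adapts over time)
    bias = 0.5                   # attention bias between the two percepts
    for t in range(delay, steps):
        drive = G * np.cos(v[t - delay]) ** 2 + bias
        G += 0.001 * (1.5 - drive)                       # slow gain adaptation
        v[t] = 0.9 * v[t - 1] + 0.1 * drive + noise * rng.standard_normal()
    return v

trace = simulate()
```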

- Modelling and Simulation of Perception | Pp. 20-31

Neural Network Architecture for Modeling the Joint Visual Perception of Orientation, Motion, and Depth

Daniel Oberhoff; Andy Stynen; Marina Kolesnik

We present a methodology and a neural network architecture for the modeling of low- and mid-level visual processing. The network architecture uses local filter operators as basic processing units which can be combined into a network via flexible connections. Using this methodology we design a neuronal network that models the joint processing of oriented contrast changes, their motion and depth. The network reflects the structure and the functionality of visual pathways. We present network responses to a stereo video sequence, highlight the correspondence to biological counterparts, outline the limitations of the methodology, and discuss specific aspects of the processing and the extent of visual tasks that can be successfully carried out by the suggested neuronal architecture.

- Modelling and Simulation of Perception | Pp. 32-39

AutoSelect: What You Want Is What You Get: Real-Time Processing of Visual Attention and Affect

Nikolaus Bee; Helmut Prendinger; Arturo Nakasone; Elisabeth André; Mitsuru Ishizuka

While the objects of our focus of attention (“where we are looking”) and the accompanying affective responses to those objects are part of our daily experience, little research exists on the relation between attention and positive affective evaluation. The purpose of our research is to process users’ emotion and attention in real-time, with the goal of designing systems that may recognize a user’s affective response to a particular visually presented stimulus in the presence of other stimuli, and respond accordingly. In this paper, we introduce a system that automatically detects a user’s preference based on eye movement data and physiological signals in a two-alternative forced choice task. In an exploratory study involving the selection of neckties, the system could correctly classify subjects’ choices in 81% of cases. In this instance of AutoSelect, the gaze ‘cascade effect’ played a dominant role, whereas pupil size could not be shown to be a reliable predictor of preference.
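
As a rough illustration of reading a preference from gaze in a two-alternative forced choice trial, the sketch below simply predicts the item that receives the larger share of gaze samples toward the end of the trial; it is a crude stand-in for the gaze ‘cascade effect’ mentioned above, not the AutoSelect classifier, and it ignores the physiological signals entirely:

```python
# Hypothetical gaze-only preference predictor for a two-alternative task.
def predict_choice(gaze_targets, tail_fraction=0.3):
    """gaze_targets: time-ordered list of 'A'/'B' labels, one per gaze sample."""
    tail = gaze_targets[int(len(gaze_targets) * (1 - tail_fraction)):]
    return 'A' if tail.count('A') >= tail.count('B') else 'B'

# usage: predict_choice(list("ABABABAAAAAA")) -> 'A'
```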

- Integrating Information from Multiple Channels | Pp. 40-52

Emotion Recognition Using Physiological and Speech Signal in Short-Term Observation

Jonghwa Kim; Elisabeth André

Recently, there has been a significant amount of work on the recognition of emotions from visual, verbal or physiological information. Most approaches to emotion recognition so far concentrate, however, on a single modality, while work on the integration of multimodal information, in particular on fusing physiological signals with verbal or visual data, is scarce. In this paper, we analyze various methods for fusing physiological and vocal information and compare the recognition results of the bimodal recognition approach with the results of the unimodal approach.
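
The two generic fusion strategies such a comparison involves can be outlined as follows (this is a schematic sketch, not the paper's classifiers or feature sets): feature-level fusion concatenates the physiological and vocal feature vectors before a single classifier, while decision-level fusion combines the class posteriors produced by per-modality classifiers:

```python
# Schematic feature-level vs. decision-level fusion of two modalities.
import numpy as np

def feature_level_fusion(physio_features, speech_features):
    """Concatenate modality features into one vector for a single classifier."""
    return np.concatenate([physio_features, speech_features])

def decision_level_fusion(physio_posteriors, speech_posteriors, w=0.5):
    """Weighted average of per-modality class posteriors; returns the winning class index."""
    fused = w * np.asarray(physio_posteriors) + (1 - w) * np.asarray(speech_posteriors)
    return int(np.argmax(fused))

# usage: decision_level_fusion([0.2, 0.8], [0.4, 0.6]) -> 1
```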

- Integrating Information from Multiple Channels | Pp. 53-64

Visual Attention in Auditory Display

Thorsten Mahler; Pierre Bayerl; Heiko Neumann; Michael Weber

The interdisciplinary field of image sonification aims at the transformation of images into auditory signals. It brings together researchers from different fields of computer science such as sound synthesis, data mining and human-computer interaction. Its goal is to use sound and all its attributes to display the data sets themselves, thus making the highly developed human aural system usable for data analysis. Unlike previous approaches, we aim to sonify images of any kind. We propose that models of visual attention and visual grouping can be utilized to dynamically select relevant visual information to be sonified. For the auditory synthesis we employ an approach that takes advantage of the sparseness of the selected input data. The presented approach proposes a combination of data sonification approaches, such as auditory scene generation, and models of human visual perception. It extends previous pixel-based transformation algorithms by incorporating mid-level vision coding and high-level control. The mapping utilizes elaborated sound parameters that allow non-trivial orientation and positioning in 3D space.
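
One simple way to picture such a mapping is sketched below, assuming an attention model has already selected a handful of image locations; pitch, stereo pan and loudness are derived from position and intensity. The parameter ranges are invented for illustration and do not come from the paper:

```python
# Toy mapping from attended image locations to sound-event parameters.
def sonify_points(points, image_height, image_width):
    """points: iterable of (row, col, intensity) tuples for attended locations."""
    events = []
    for r, c, intensity in points:
        pitch_hz = 200 + (1 - r / image_height) * 1800   # higher in the image -> higher pitch
        pan = 2 * (c / image_width) - 1                   # -1 (left) .. +1 (right)
        gain = max(0.0, min(1.0, intensity))              # clipped loudness
        events.append({"pitch_hz": pitch_hz, "pan": pan, "gain": gain})
    return events

# usage: sonify_points([(10, 40, 0.9)], image_height=240, image_width=320)
```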

- Visual and Auditory Displays Driven by Perceptive Principles | Pp. 65-72

A Perceptually Optimized Scheme for Visualizing Gene Expression Ratios with Confidence Values

Hans A. Kestler; Andre Müller; Malte Buchholz; Thomas M. Gress; Günther Palm

Gene expression data studies are often concerned with comparing experimental versus control conditions. Ratios of gene expression values, fold changes, are therefore commonly used as biologically meaningful markers. Visual representations are indispensable for the explorative analysis of such data. Fold changes alone are not reliable markers, since low signal intensities may lead to unreliable ratios and should therefore be visually marked as less important than the more trustworthy ratios of larger expression values.

We propose a new visualization scheme showing ratios and their confidence together in a single diagram, enabling a more precise explorative assessment of gene expression data. The basis of the visualization scheme is a set of near-uniformly perceptible color scales that improve the readability of the commonly used red-green color scale. A sub-sampling algorithm for optimizing color scales is presented. Instead of hard-to-read bivariate color maps that encode two variables into a single color, we propose the use of colored patches (rectangles) of different sizes representing the absolute values, while ratios are represented by a univariate color map. Different pre-processing steps for visual bandwidth limitation and reliability value estimation are proposed.

The proposed bivariate visualization scheme shows a clearly perceptible order in ratio and reliability values, leading to better and more clearly interpretable diagrams. The proposed color scales were specifically adapted to human visual perception. Psychophysically optimized color scales are superior to traditional sRGB red-green maps. This leads to an improved explorative assessment of gene expression data.
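
The core encoding idea can be paraphrased in a few lines, assuming per-gene expression values for the two conditions; the colour scale and thresholds below are placeholders rather than the perceptually optimized scales developed in the paper:

```python
# Sketch: log fold change -> univariate colour, absolute signal -> patch size,
# so that ratios built from weak (unreliable) signals are drawn less prominently.
import numpy as np

def encode_gene(expr_a, expr_b, max_signal):
    log_ratio = np.log2((expr_a + 1e-9) / (expr_b + 1e-9))
    t = np.clip((log_ratio + 3) / 6, 0, 1)        # map log2 ratios in [-3, 3] to [0, 1]
    color = (t, 0.0, 1 - t)                        # placeholder blue-to-red scale
    patch_size = np.clip(min(expr_a, expr_b) / max_signal, 0.1, 1.0)
    return color, patch_size

# usage: encode_gene(expr_a=900.0, expr_b=120.0, max_signal=1000.0)
```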

- Visual and Auditory Displays Driven by Perceptive Principles | Pp. 73-84

Combining Speech User Interfaces of Different Applications

Dongyi Song

This paper describes a novel approach to automatically or semi-automatically constructing a multi-application dialogue system from existing dialogue systems. Nowadays, different dialogue systems exist whose general architecture supports multiple applications, yet there is no efficient way to endow such multi-application dialogue systems with the corresponding applications. The approach presented in this paper provides an efficient way to integrate different applications into one dialogue system. It addresses three issues in multi-application dialogue systems (transparent application switching, task sharing and information sharing) by merging the dialogue specifications of the different applications, as automatically as possible, into a unified dialogue specification that provides the necessary domain information for a multi-application dialogue system.
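
A hypothetical data layout for such a merge is sketched below (the paper's actual dialogue specification format is not reproduced here): each application contributes named tasks, merged task names are prefixed with their application, and an explicit switch entry makes application changes reachable from anywhere:

```python
# Hypothetical merge of per-application dialogue specifications.
def merge_dialogue_specs(specs):
    """specs: dict mapping application name -> dict of task name -> task definition."""
    unified = {}
    for app, tasks in specs.items():
        for task, definition in tasks.items():
            unified[f"{app}.{task}"] = definition
    # transparent application switching: every application can hand over to any other
    apps = list(specs)
    unified["_switch"] = {"from": apps, "to": apps}
    return unified

# usage: merge_dialogue_specs({"navigation": {"set_destination": {}}, "phone": {"dial": {}}})
```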

- Spoken Dialogue Systems | Pp. 85-96

Learning and Forgetting of Speech Commands in Automotive Environments

Alexander Hof; Eli Hagen

In this paper we deal with the learning and forgetting of speech commands in speech dialogue systems. We discuss two mathematical models of learning and four models of forgetting. Furthermore, we describe the experiments used to determine the learning and forgetting curves in our environment. These findings are compared to the theoretical models, and on this basis we deduce the equations that describe learning and forgetting in our automotive environment most adequately.
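
Two textbook curve forms of the kind such a comparison typically involves are sketched below; the parameter values are arbitrary placeholders, not the fitted values from the experiments described in the paper:

```python
# Power-law learning curve (time per trial) and exponential forgetting curve (retention).
import numpy as np

def learning_curve(trial, t_first=20.0, rate=0.3):
    """Power law of practice: time needed on trial n decays as n ** (-rate)."""
    return t_first * np.asarray(trial, dtype=float) ** (-rate)

def forgetting_curve(days, retention_0=1.0, decay=0.2):
    """Exponential forgetting: retention decays with elapsed time in days."""
    return retention_0 * np.exp(-decay * np.asarray(days, dtype=float))

# usage: learning_curve([1, 2, 5, 10]); forgetting_curve([0, 1, 7, 30])
```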

- Spoken Dialogue Systems | Pp. 97-106