Publications catalog - Books



SmartKom: Foundations of Multimodal Dialogue Systems

Wolfgang Wahlster (ed.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution: Not detected
Publication year: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-23732-7

Electronic ISBN

978-3-540-36678-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Dialogue Systems Go Multimodal: The SmartKom Experience

Wolfgang Wahlster

Multimodal dialogue systems exploit one of the major characteristics of human-human interaction: the coordinated use of different modalities. Allowing all of the modalities to refer to and depend upon each other is a key to the richness of multimodal communication. We introduce the notion of symmetric multimodality for dialogue systems in which all input modes (e.g., speech, gesture, facial expression) are also available for output, and vice versa. A dialogue system with symmetric multimodality must not only understand and represent the user’s multimodal input, but also its own multimodal output. We present an overview of the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. SmartKom represents a new generation of multimodal dialogue systems that deal not only with simple modality integration and synchronization but cover the full spectrum of dialogue phenomena that are associated with symmetric multimodality (including crossmodal references, one-anaphora, and backchannelling). We show that SmartKom’s plug-and-play architecture supports multiple recognizers for a single modality; e.g., the user’s speech signal can be processed by three unimodal recognizers in parallel (speech recognition, emotional prosody, boundary prosody). We detail SmartKom’s three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer. We discuss the limitations of SmartKom and how they are overcome in the follow-up project SmartWeb. In addition, we present the research roadmap for multimodality, addressing the key open research questions in this young field. To conclude, we discuss the economic and scientific impact of the SmartKom project, which has led to more than 50 patents and 29 spin-off products.
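
As a rough illustration of the parallel recognizer fan-out mentioned in this abstract (one speech signal processed by speech recognition, emotional prosody, and boundary prosody in parallel), here is a minimal Python sketch; the three recognizer functions and their outputs are invented placeholders, not SmartKom interfaces.

    # Minimal sketch: one speech signal fanned out to three unimodal
    # recognizers running in parallel. All recognizers are hypothetical stubs.
    from concurrent.futures import ThreadPoolExecutor

    def speech_recognizer(signal):      # hypothetical: word hypotheses
        return {"words": ["show", "me", "the", "map"]}

    def emotional_prosody(signal):      # hypothetical: user-state estimate
        return {"user_state": "neutral", "confidence": 0.8}

    def boundary_prosody(signal):       # hypothetical: phrase-boundary marks
        return {"boundaries": [0.42, 1.10]}

    def analyze(signal):
        recognizers = [speech_recognizer, emotional_prosody, boundary_prosody]
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(r, signal) for r in recognizers]
            return [f.result() for f in futures]

    if __name__ == "__main__":
        print(analyze(b"...raw audio bytes..."))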

Keywords: Dialogue System; Multimodal Interface; Multimodal Interaction; Emotional Prosody; Deictic Gesture.

Part I - Introduction | Pp. 3-27

Facts and Figures About the SmartKom Project

Anselm Blocher

The SmartKom project "Dialogue-Based Human Computer Interaction by Coordinated Analysis and Generation of Multiple Modalities" was one of six lead projects in the area of human-computer interaction funded by the Federal Ministry of Education and Research and the Federal Ministry of Economics and Labour of Germany. We describe the intention of this initiative and summarize the organizational and funding structure of the SmartKom project. We also compile the final functionalities of the demonstrator system and give an overview of the reception of the project in the research community and in the media.

Keywords: Human Computer Interaction; Module Coordination; Lead Project; Augmented Reality System; Multimodal Interaction.

Part I - Introduction | Pp. 29-39

An Exemplary Interaction with SmartKom

Norbert Reithinger; Gerd Herzog

The different instantiations of the SmartKom demonstration system offer a broad range of application functions and sophisticated dialogue capabilities. We provide a first look at the final SmartKom prototype from the point of view of the end user. In particular, a typical interaction sequence is presented in order to illustrate the functionality of the integrated multimodal dialogue system.

Keywords: Scanning Area; Dialogue Protocol; Pedestrian Navigation; Document Scanning; Multimodal Dialogue.

Part I - Introduction | Pp. 41-52

The SmartKom Architecture: A Framework for Multimodal Dialogue Systems

Gerd Herzog; Norbert Reithinger

SmartKom provides an adaptive and reusable dialogue shell for multimodal interaction that has been employed successfully to realize fully-fledged prototype systems for various application scenarios. Taking the perspective of system architects, we review the overall design and the specific architecture framework applied within SmartKom. The basic design principles underlying our approach are described, and the different instantiations of the conceptual architecture are presented to illustrate the adaptability and flexibility of the generic framework.
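
A minimal sketch of the plug-and-play idea described above, assuming a simple publish/subscribe hub over named data pools; the pool names, message shapes, and handlers are illustrative and do not reproduce the actual SmartKom architecture framework.

    # Sketch of modules exchanging messages over named data pools.
    from collections import defaultdict

    class DataPoolHub:
        def __init__(self):
            self.subscribers = defaultdict(list)

        def subscribe(self, pool, handler):
            self.subscribers[pool].append(handler)

        def publish(self, pool, message):
            for handler in self.subscribers[pool]:
                handler(message)

    hub = DataPoolHub()
    # A hypothetical analyzer consumes recognizer output and publishes intentions.
    hub.subscribe("speech.hypotheses",
                  lambda m: hub.publish("user.intention",
                                        {"intent": "route_request", "from": m}))
    hub.subscribe("user.intention",
                  lambda m: print("dialogue manager received:", m))
    hub.publish("speech.hypotheses", {"words": ["take", "me", "to", "the", "station"]})

New modules can be swapped in by subscribing to the same pools, which is the sense in which such a shell is reusable across application scenarios.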

Keywords: Application Scenario; Service Component; Multimodal Interface; Multimodal Interaction; Intelligent User Interface.

Part II - Multimodal Input Analysis | Pp. 55-70

Modeling Domain Knowledge: Know-How and Know-What

Iryna Gurevych; Robert Porzel; Rainer Malaka

We present the approach to knowledge representation taken in the multimodal, multidomain, and multiscenario dialogue system SmartKom. We focus on the ontological and representational issues and choices that help construct an ontology which is shared by multiple components of the system, can be reused in different projects, and can be applied to various tasks. Finally, two applications of the ontology that highlight the usefulness of our approach are described.
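
To make the know-what/know-how distinction concrete, here is a toy Python sketch of domain concepts with typed slots (know-what) and a process concept (know-how); all names are invented for illustration and are not taken from the SmartKom ontology.

    # Toy ontology: concepts form an inheritance hierarchy; slots give typed
    # relations; a process concept models "know-how" over domain objects.
    from dataclasses import dataclass, field

    @dataclass
    class Concept:
        name: str
        parent: "Concept | None" = None
        slots: dict = field(default_factory=dict)  # slot name -> range concept

        def is_a(self, other):
            c = self
            while c is not None:
                if c is other:
                    return True
                c = c.parent
            return False

    event = Concept("Event")
    broadcast = Concept("Broadcast", parent=event,
                        slots={"channel": Concept("Channel")})
    watch = Concept("WatchProcess", slots={"object": broadcast})  # know-how

    print(broadcast.is_a(event))  # True: inheritance lets components share semantics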

Keywords: Knowledge Representation; Natural Language Processing; Dialogue System; Semantic Coherence; Natural Language Processing System.

Part II - Multimodal Input Analysis | Pp. 71-84

Speech Recognition

André Berton; Alfred Kaltenmeier; Udo Haiber; Olaf Schreiner

Human-machine interaction in SmartKom is a very complex task, characterized by natural, spontaneous language, speaker independence, large vocabularies, and background noise. Speech recognition is an integral part of the multimodal dialogue system: it transforms the acoustic input signal into an orthographic transcription representing the speaker’s utterance. This contribution discusses how to enhance and customize the speech recognizer for the SmartKom applications. Significant improvements were achieved by adapting the speech recognizer to the environment, to the speaker, and to the task. Speech recognition confidence measures were investigated to reject unreliable user input and to detect input containing unknown words, i.e., words that are not contained in the vocabulary of the speech recognizer. Finally, we present new ideas for future work.
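
A minimal sketch of confidence-based rejection as described in this abstract: word hypotheses below a threshold are treated as unreliable input or as hints that the speaker used a word outside the recognizer's vocabulary. The threshold and scores below are illustrative values, not figures from the chapter.

    # Sketch: split word hypotheses into accepted and rejected sets by a
    # confidence threshold; rejected items flag unreliable or possibly
    # out-of-vocabulary input.
    def filter_hypotheses(hypotheses, threshold=0.45):
        accepted, rejected = [], []
        for word, confidence in hypotheses:
            (accepted if confidence >= threshold else rejected).append(
                (word, confidence))
        return accepted, rejected

    hyps = [("show", 0.92), ("me", 0.88), ("heidelbrg", 0.31)]
    print(filter_hypotheses(hyps))
    # -> ([('show', 0.92), ('me', 0.88)], [('heidelbrg', 0.31)])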

Keywords: Speech Recognition; Gaussian Mixture Model; Word Error Rate; Unknown Word; Speech Recognizer.

Part II - Multimodal Input Analysis | Pp. 85-107

Class-Based Language Model Adaptation

Martin C. Emele; Zica Valsan; Yin Hay Lam; Silke Goronzy

In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation by generating several clusters of topic-specific language models and combined them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach for thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering. These key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results provide a significant performance improvement depending on the chosen key term selection method.
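
The mixture-based adaptation step can be illustrated with a toy Python sketch that linearly interpolates topic-specific models with domain-dependent weights. The chapter works with full n-gram models and tuned weights; the unigram models, corpora, and weights below are invented for the example.

    # Sketch: combine topic-specific unigram LMs with fixed mixture weights.
    from collections import Counter

    def unigram_lm(corpus):
        counts = Counter(corpus)
        total = sum(counts.values())
        return lambda w: counts[w] / total if total else 0.0

    news = unigram_lm("the minister said the budget".split())
    tv   = unigram_lm("record the movie on channel two".split())

    weights = {"news": 0.2, "tv": 0.8}  # hypothetical weights for a TV/EPG domain

    def mixture_prob(word):
        return weights["news"] * news(word) + weights["tv"] * tv(word)

    print(mixture_prob("movie"))  # dominated by the in-domain TV model

In the full approach the weights would be chosen per application domain, e.g., by minimizing perplexity on held-out in-domain text.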

Keywords: Language Model; Automatic Speech Recognition; Word Error Rate; Text Cluster; Star Trek.

Part II - Multimodal Input Analysis | Pp. 109-121

The Dynamic Lexicon

Silke Goronzy; Stefan Rapp; Martin Emele

The dynamic lexicon is one of the central knowledge sources in SmartKom; it provides the whole system with the capability to dynamically update the vocabulary. The corresponding multilingual pronunciations, which are needed by all speech-related components, are generated automatically. Furthermore, a novel approach for generating nonnative pronunciation variants, and therefore for adapting the speech recognizer to nonnative accents, is presented. The capability to deal with nonnative accents is crucial for dialogue systems dealing with multimedia and therefore often with multilingual content.
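
A toy sketch of a dynamic lexicon update, assuming a trivial grapheme-to-phoneme table and a single accent substitution rule; both are invented placeholders for the automatic pronunciation generation and nonnative-variant derivation described above.

    # Sketch: new vocabulary entries get an auto-generated base pronunciation
    # plus a nonnative variant derived by phoneme substitution.
    G2P = {"a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
           "m": "m", "p": "p", "s": "s", "t": "t", "r": "r", "k": "k"}

    ACCENT_RULES = {"r": "R"}  # hypothetical: a nonnative rolled /r/

    class DynamicLexicon:
        def __init__(self):
            self.entries = {}

        def add(self, word):
            base = [G2P.get(ch, ch) for ch in word.lower()]
            variants = [[ACCENT_RULES.get(p, p) for p in base]]
            self.entries[word] = [base] + variants
            return self.entries[word]

    lex = DynamicLexicon()
    print(lex.add("Matrix"))  # pronunciations now available to all speech components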

Keywords: Speech Synthesis; Source Language; Dialogue System; Electronic Program Guide; Phonemic Representation.

Part II - Multimodal Input Analysis | Pp. 123-138

The Prosody Module

Viktor Zeißler; Johann Adelhardt; Anton Batliner; Carmen Frank; Elmar Nöth; Rui Ping Shi; Heinrich Niemann

In multimodal dialogue systems, several input and output modalities are used for user interaction. The most important modality for human-computer interaction is speech. As in human-human interaction, it is necessary in human-computer interaction that the machine recognize the spoken word chain in the user’s utterance. For better communication with the user it is advantageous to also recognize the user’s internal emotional state, because the dialogue strategy can then be adapted to the situation in order to reduce, for example, the user’s anger or uncertainty. In the following sections we first describe the state of the art in emotion and user state recognition with the help of prosody. The next section describes the prosody module. After that we present experiments and results for the recognition of user states. We summarize our results in the last section.
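
As a rough illustration of prosody-based user state recognition, here is a toy nearest-centroid classifier over a few prosodic features; the feature set, values, and user states are invented for the example and do not reflect the module's actual feature extraction or models.

    # Sketch: classify a prosodic feature vector (mean pitch in Hz, energy,
    # speaking rate in syllables/s) by distance to per-state centroids.
    import math

    CENTROIDS = {
        "neutral":   (120.0, 0.50, 4.0),
        "angry":     (180.0, 0.85, 5.5),
        "uncertain": (110.0, 0.35, 3.0),
    }

    def classify_user_state(features):
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return min(CENTROIDS, key=lambda s: dist(features, CENTROIDS[s]))

    print(classify_user_state((175.0, 0.8, 5.2)))  # -> "angry"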

Part II - Multimodal Input Analysis | Pp. 139-152

The Sense of Vision: Gestures and Real Objects

Jens Racky; Michael Lützeler; Hans Röttger

Natural human communication is based on visual and acoustical signals. Thus a technical system needs the same senses to allow intuitive interaction. The acoustical interface has been described in earlier chapters. In this chapter we concentrate on the perception of conscious visual utterances, for example, hand gestures and handling of dedicated (real) objects, e.g., paper documents.

Keywords: Real Object; Gesture Recognition; Panoramic Image; Dynamic Gesture; Portable Network Graphic.

Part II - Multimodal Input Analysis | Pp. 153-165