Publications catalog - books

Text, Speech and Dialogue: 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005, Proceedings

Václav Matoušek; Pavel Mautner; Tomáš Pavelka (eds.)

Conference: 8th International Conference on Text, Speech and Dialogue (TSD), Karlovy Vary, Czech Republic, September 12-15, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet)

Availability
Detected institution: not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28789-6

Electronic ISBN

978-3-540-31817-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Language Modeling Experiments with Random Forests

Frederick Jelinek

L. Breiman recently introduced the concept of random forests (randomly constructed collection of decision trees) for classification. We have modified the method for regression and applied it to language modeling for speech recognition. Random forests achieve excellent results in both perplexity and error rate. They can be regarded as a language model in HMM form and have interesting properties that achieve very robust smoothing.

- Invited Talks | Pp. 1-1
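The core idea behind random-forest language modeling, averaging the predictions of many randomly constructed models to obtain robust smoothing, can be illustrated with a toy sketch. Everything here is an illustrative assumption (a random hash of histories stands in for one randomized decision tree; the paper's actual tree construction is not shown in the abstract):

```python
import math
import random
from collections import defaultdict

def train_clustered_bigram(corpus, n_clusters, seed, alpha=0.5):
    """One randomized member model: a bigram model whose histories are
    merged into random clusters (a stand-in for one random tree)."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    cluster_of = {w: rng.randrange(n_clusters) for w in vocab}
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for sent in corpus:
        for h, w in zip(sent, sent[1:]):
            c = cluster_of[h]
            counts[c][w] += 1
            totals[c] += 1
    v = len(vocab)

    def prob(h, w):
        # Additive smoothing inside the history's cluster.
        c = cluster_of.get(h, 0)
        return (counts[c][w] + alpha) / (totals[c] + alpha * v)
    return prob

def forest_prob(models, h, w):
    """Forest prediction: average the member models' probabilities."""
    return sum(m(h, w) for m in models) / len(models)

def perplexity(prob, corpus):
    """Perplexity of a conditional model over the bigrams of a corpus."""
    logp, n = 0.0, 0
    for sent in corpus:
        for h, w in zip(sent, sent[1:]):
            logp += math.log(prob(h, w))
            n += 1
    return math.exp(-logp / n)
```

Each member model alone is crudely smoothed; averaging several differently clustered members spreads probability mass more evenly, which is the smoothing effect the abstract credits to the forest.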

The Role of Speech in Multimodal Human-Computer Interaction

Hynek Hermansky; Petr Fousek; Mikko Lehtonen

A natural audio-visual interface between a human user and a machine requires understanding of the user’s audio-visual commands. This does not necessarily require full speech and image recognition. It does require, just as interaction with any working animal does, that the machine be capable of reacting to certain particular sounds and/or gestures while ignoring the rest. Towards this end, we are working on sound identification and classification approaches that ignore most of the acoustic input and react only to a particular sound (keyword).

- Invited Talks | Pp. 2-8

Why Is the Recognition of Spontaneous Speech so Hard?

Sadaoki Furui; Masanobu Nakamura; Tomohisa Ichiba; Koji Iwano

Although read speech and similar types of speech, e.g. speech from reading newspapers or from news broadcasts, can be recognized with high accuracy, recognition accuracy decreases drastically for spontaneous speech. This is due to the fact that spontaneous speech and read speech differ significantly both acoustically and linguistically. This paper reports analysis and recognition of spontaneous speech using the large-scale spontaneous speech database “Corpus of Spontaneous Japanese (CSJ)”. Recognition results in this experiment show that recognition accuracy increases significantly as a function of the size of the acoustic and language model training data, and that the improvement levels off at approximately 7M words of training data. This means that the acoustic and linguistic variation of spontaneous speech is so large that a very large corpus is needed to encompass it. Spectral analysis of various styles of utterances in the CSJ shows that the spectral distribution/difference of phonemes is significantly reduced in spontaneous speech compared to read speech. Experimental results also show a strong correlation between the mean spectral distance between phonemes and phoneme recognition accuracy. This indicates that spectral reduction is one major reason for the decrease in recognition accuracy for spontaneous speech.

- Invited Talks | Pp. 9-22
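The mean spectral distance between phonemes reported above can be pictured as the average distance between phoneme centroids. The following is a hedged toy computation (the paper works with real CSJ spectra; the two-dimensional vectors here are made up for illustration):

```python
import math

def class_means(frames_by_phoneme):
    """Mean spectral vector (centroid) per phoneme class."""
    means = {}
    for ph, frames in frames_by_phoneme.items():
        dim = len(frames[0])
        means[ph] = [sum(f[d] for f in frames) / len(frames)
                     for d in range(dim)]
    return means

def mean_spectral_distance(frames_by_phoneme):
    """Average Euclidean distance between all pairs of phoneme centroids.

    The abstract reports that this quantity shrinks in spontaneous speech
    and correlates with phoneme recognition accuracy.
    """
    means = list(class_means(frames_by_phoneme).values())
    dists = [math.dist(a, b)
             for i, a in enumerate(means) for b in means[i + 1:]]
    return sum(dists) / len(dists)
```

When the centroids move closer together (spectral reduction), the classes overlap more, which is the mechanism the abstract offers for the accuracy drop.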

On the Acoustic Components in Multimedia Presentations

Klaus Fellbaum; Bettina Ketzmerick

This paper describes some of our activities in the area of elearning with a special focus on the speech (or more general: on the acoustic) component. We found out that in many (probably the most) electronic tutorials the main emphasis is on the visual presentation with partly excellent 3D video and graphical material, but the acoustic components are more or less primitive ones or forgotten. That’s why we made several investigations on how speech, music, sounds, noise etc. can enrich and improve the elearning material, above all when these components are used with their synergy and completion to the visual components in the sense of a real multimedia presentation.

- Invited Talks | Pp. 23-32

Fusing Data Streams in Continuous Audio-Visual Speech Recognition

Leon J. M. Rothkrantz; Jacek C. Wojdeł; Pascal Wiggers

Speech recognition still lacks robustness when faced with changing noise characteristics. Automatic lip reading on the other hand is not affected by acoustic noise and can therefore provide the speech recognizer with valuable additional information, especially since the visual modality contains information that is complementary to information in the audio channel. In this paper we present a novel way of processing the video signal for lip reading and a post-processing data transformation that can be used alongside it. The presented Lip Geometry Estimation (LGE) is compared with other geometry- and image intensity-based techniques typically deployed for this task. A large vocabulary continuous audio-visual speech recognizer for Dutch using this method has been implemented. We show that a combined system improves upon audio-only recognition in the presence of noise.

- Invited Talks | Pp. 33-44
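One standard way to combine audio and visual streams, and a plausible reading of the "combined system" above, is weighted log-linear fusion of per-class scores. The weights and numbers below are illustrative assumptions, not the authors' trained recognizer:

```python
def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Log-linear stream fusion: weight each modality's per-class
    log-likelihood and pick the best-scoring class.

    audio_weight would normally be lowered as acoustic noise increases,
    shifting the decision towards the lip-reading stream.
    """
    fused = {c: audio_weight * audio_ll[c] + (1.0 - audio_weight) * video_ll[c]
             for c in audio_ll}
    best = max(fused, key=fused.get)
    return best, fused
```

With noisy audio mildly favoring the wrong class and clean video favoring the right one, a video-leaning weight recovers the correct decision, which is the robustness benefit the abstract describes.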

Speech Based User Interface for Users with Special Needs

Pavel Slavík; Vladislav Němec; Adam J. Sporka

The number of people using computers has been increasing steadily in recent years. Not all potential users have the capabilities that would allow them to use computers without obstacles. This is especially true for handicapped and elderly users. For this class of users a special approach to the design and implementation of user interfaces is needed: the capabilities these users lack must be substituted by capabilities they have. In most cases the use of sound and speech offers a natural solution to this problem. The paper outlines the problems related to such special user interfaces and gives examples of user interfaces that use special forms of speech and related acoustic communication.

- Invited Talks | Pp. 45-55

Automatic Construction of a Valency Lexicon of Czech Adjectives

Drahomíra “Johanka” Doležalová

This paper describes conversion of a surface valency lexicon of Czech verbs to a surface valency lexicon of adjectives that can be derived from these verbs and that use their (possibly modified) valency frames. After preparing the necessary data by hand, the conversion can be fully automatic and every change of the source lexicon can be automatically reflected in the destination lexicon. We have successfully converted the verb valency lexicon “Brief” with about 15,000 verbs to a valency lexicon of about 27,000 deverbal adjectives. The paper also describes some interesting peculiarities in the process of creating passive adjectives and their valency frames.

- Text | Pp. 56-60

WebTranscribe – An Extensible Web-Based Speech Annotation Framework

Christoph Draxler

WebTranscribe is a platform independent and extensible web-based annotation framework for speech research and spoken language technology. The framework consists of an annotation editor front-end running as a Java Web Start application on a client computer, and a DBMS on a server. The framework implements a “select – annotate – save” annotation workflow.

The annotation capabilities are determined by annotation editors, implemented as plug-ins to the general framework. An annotation configuration generally consists of an editor, editing buttons, a signal display and a quality assessment panel. A configuration file determines which plug-ins to use for a given annotation project.

WebTranscribe has been used in numerous projects at BAS and has now reached a mature state. The software is freely available [19].

- Text | Pp. 61-68
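The "configuration file determines which plug-ins to use" design can be sketched in miniature. WebTranscribe itself is a Java Web Start application whose real plug-in API is not shown in the abstract, so the registry and plug-in names below are purely hypothetical:

```python
# Hypothetical plug-in registry illustrating configuration-driven assembly.
PLUGIN_REGISTRY = {}

def register(name):
    """Decorator that adds an annotation-editor plug-in to the registry."""
    def wrap(cls):
        PLUGIN_REGISTRY[name] = cls
        return cls
    return wrap

@register("orthographic_editor")
class OrthographicEditor:
    def annotate(self, signal):
        return f"transcript of {signal}"

@register("quality_panel")
class QualityPanel:
    def annotate(self, signal):
        return "quality: ok"

def build_annotation_tool(config):
    """Instantiate exactly the plug-ins a project's configuration names."""
    return [PLUGIN_REGISTRY[name]() for name in config["plugins"]]
```

The point of the design is that a new annotation project only needs a new configuration, not new framework code.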

Learning Syntactic Patterns Using Boosting and Other Classifier Combination Schemas

András Hócza; László Felföldi; András Kocsor

This paper presents a method for the syntactic parsing of Hungarian natural language texts using a machine learning approach. This method learns tree patterns with various phrase types described by regular expressions from an annotated corpus. The PGS algorithm, an improved version of the RGLearn method, is developed and applied as a classifier in classifier combination schemas. Experiments show that classifier combinations, especially the Boosting algorithm, can effectively improve the recognition accuracy of the syntactic parser.

- Text | Pp. 69-76
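Boosting, the combination schema the abstract singles out, can be sketched with a minimal AdaBoost loop over decision stumps. This is a generic illustration of the technique, not the PGS/RGLearn parser combination of the paper:

```python
import math

def stump(threshold, sign):
    """Decision stump on a scalar feature: returns sign for x < threshold."""
    return lambda x: sign if x < threshold else -sign

def adaboost(samples, labels, weak_learners, rounds):
    """Minimal AdaBoost over a fixed pool of ±1 weak classifiers."""
    n = len(samples)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the weak learner with the lowest weighted training error.
        best, best_err = None, float("inf")
        for h in weak_learners:
            err = sum(wi for wi, x, y in zip(w, samples, labels) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:
            break  # no weak learner beats chance on the current weights
        alpha = 0.5 * math.log((1 - best_err) / max(best_err, 1e-12))
        ensemble.append((alpha, best))
        # Re-weight: emphasize the samples this learner got wrong.
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, samples, labels)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict
```

No single stump can separate a label pattern with two positive regions, but a few boosted rounds combine stumps into an accurate voter, which is the accuracy gain the abstract reports for classifier combinations.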

Text Classification with Tournament Methods

Louise Guthrie; Wei Liu; Yunqing Xia

This paper compares the effectiveness of n-way (n > 2) classification using a probabilistic classifier to the use of multiple binary probabilistic classifiers. We describe the use of binary classifiers in both Round Robin and Elimination tournaments, and compare both tournament methods with n-way classification when determining the language of origin of speakers (both native and non-native English speakers) speaking English. We conducted hundreds of experiments, varying the number of categories as well as the categories themselves. In all experiments the tournament methods performed better than the n-way classifier, and of the tournament methods, Round Robin performs on average slightly better than the Elimination tournament.

- Text | Pp. 77-84
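The two tournament schemes can be sketched with toy pairwise deciders. The marker-word classifiers below are illustrative stand-ins for the paper's probabilistic binary classifiers:

```python
from collections import Counter
from itertools import combinations

def make_pairwise(markers_a, markers_b, a, b):
    """Toy binary decider: pick the class whose marker words occur more."""
    def decide(text):
        words = text.split()
        score_a = sum(w in markers_a for w in words)
        score_b = sum(w in markers_b for w in words)
        return a if score_a >= score_b else b
    return decide

def round_robin(classes, pairwise, sample):
    """Round Robin: every pair of classes plays one match;
    the class with the most wins is predicted."""
    wins = Counter()
    for a, b in combinations(classes, 2):
        wins[pairwise[(a, b)](sample)] += 1
    return wins.most_common(1)[0][0]

def elimination(classes, pairwise, sample):
    """Elimination: winners advance round by round until one remains."""
    pool = list(classes)
    while len(pool) > 1:
        a, b = pool.pop(0), pool.pop(0)
        match = pairwise.get((a, b)) or pairwise[(b, a)]
        pool.append(match(sample))
    return pool[0]
```

Round Robin runs all n(n-1)/2 binary classifiers per sample, while Elimination runs only n-1, which is the cost-versus-accuracy trade-off behind the comparison in the abstract.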