Publication catalogue - books

Text, Speech and Dialogue: 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005, Proceedings

Václav Matoušek ; Pavel Mautner ; Tomáš Pavelka (eds.)

Conference: 8th International Conference on Text, Speech and Dialogue (TSD). Karlovy Vary, Czech Republic. September 12, 2005 - September 15, 2005

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet)

Availability

Detected institution	Year of publication	Browse	Download	Request
Not detected	2005	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28789-6

Electronic ISBN

978-3-540-31817-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Subject coverage

Computer and information sciences

Electrical engineering, electronic engineering, information engineering

Languages and literature

Table of contents

doi: 10.1007/11551874_41

Mapping the Speech Signal onto Electromagnetic Articulography Trajectories Using Support Vector Regression

Asterios Toutios; Konstantinos Margaritis

We report work on the mapping between the speech signal and articulatory trajectories from the MOCHA database. In contrast to previous work that used neural networks for the same task, we employ Support Vector Regression as our main tool and Principal Component Analysis as an auxiliary one. Our results are comparable, even though, due to training-time considerations, we use only a small portion of the available data.

- Speech | Pp. 318-325
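Support Vector Regression differs from least-squares regression mainly in its loss: prediction errors smaller than a tolerance epsilon cost nothing. The sketch below shows that epsilon-insensitive loss and the primal objective of a *linear* SVR. It is only an illustration of the technique named in the abstract; the abstract does not state the kernel, epsilon or C values the authors used, and the function names here are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: errors inside the epsilon tube are ignored, larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_primal_objective(w, b, xs, ys, C=1.0, epsilon=0.1):
    """Primal objective of a linear SVR: 0.5*||w||^2 + C * sum of epsilon-insensitive losses.

    w: weight vector, b: bias, xs: list of feature vectors, ys: list of targets.
    """
    regulariser = 0.5 * sum(wi * wi for wi in w)
    data_term = sum(
        epsilon_insensitive_loss(y, sum(wi * xi for wi, xi in zip(w, x)) + b, epsilon)
        for x, y in zip(xs, ys)
    )
    return regulariser + C * data_term


# Example: the first point lies inside the tube (no cost), the second misses by 0.5.
print(svr_primal_objective([2.0], 0.0, [[1.0], [2.0]], [2.05, 4.5], C=1.0, epsilon=0.1))
```

In an acoustic-to-articulatory setting one such regressor would typically be trained per articulatory trajectory channel, with acoustic features (possibly reduced by PCA) as inputs; this arrangement is an assumption, not a detail given in the abstract.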

doi: 10.1007/11551874_42

Automatic Transcription of Numerals in Inflectional Languages

Jan Zelinka; Jakub Kanis; Luděk Müller

In this paper we describe the part of the text preprocessing module of our text-to-speech synthesis system that converts numerals written as figures into a readable full-length form, which can then be processed by a phonetic transcription module. Numeral conversion is a significant issue in inflectional languages such as Czech, Russian or Slovak, because morphological and semantic information is necessary to make the conversion unambiguous. Three part-of-speech tagging methods are compared. Furthermore, a method that reduces the tagset in order to increase numeral conversion accuracy is presented.

- Speech | Pp. 326-333
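Why morphological tags matter can be seen with the single figure 2 in Czech: it is read *dva*, *dvě*, *dvou* or *dvěma* depending on case and gender. The toy lookup below is a hypothetical illustration, not the authors' module; the tag names (`"nom"`, `"gen"`, ...) are invented here, and in a real system the case and gender would come from a part-of-speech tagger.

```python
# Hypothetical mapping from (case, gender) to the spelled-out Czech numeral 2.
# Oblique cases of "dva" do not distinguish gender, so they use gender None.
DVA_FORMS = {
    ("nom", "masc"): "dva",
    ("nom", "fem"): "dvě",
    ("nom", "neut"): "dvě",
    ("gen", None): "dvou",
    ("dat", None): "dvěma",
    ("loc", None): "dvou",
    ("ins", None): "dvěma",
}


def spell_two(case, gender=None):
    """Return the form of '2' for a tagged context; fall back to the gender-free entry."""
    return DVA_FORMS.get((case, gender)) or DVA_FORMS[(case, None)]


# "se 2 přáteli" (instrumental) must be read "se dvěma přáteli".
print(spell_two("ins", "masc"))
```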

doi: 10.1007/11551874_43

Experimental Evaluation of Tree-Based Algorithms for Intonational Breaks Representation

Panagiotis Zervas; Gerasimos Xydas; Nikolaos Fakotakis; George Kokkinakis; Georgios Kouroupetroglou

The prosodic specification of an utterance to be spoken by a text-to-speech synthesis system can be expressed in terms of break indices, pitch accents and boundary tones. In particular, the identification of break indices determines the intonational phrase breaks that affect all subsequent prosody-related procedures. In the present paper we use tree-structured predictors, specifically CART, which is commonly used in similar tasks, and C4.5, which we introduce for this task, to address break placement in the presence of shallow textual features. We used two 500-utterance prosodic corpora provided by two Greek universities in order to compare the machine learning approaches and to discuss the robustness they offer for Greek break modelling. The evaluation of the resulting models showed that both approaches compare favourably with similar work published for other languages, while the accuracy of C4.5 was 1% to 2.7% higher than that of CART.

- Speech | Pp. 334-341
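One textbook difference between the two tree learners compared above is the split criterion: CART usually ranks candidate splits by the decrease in Gini impurity, while C4.5 uses the gain ratio (information gain divided by split information). The sketch below computes both for a candidate split of break ("B") and no-break ("N") labels. The labels and function names are hypothetical, and the paper may use further settings (pruning, feature handling) not reflected here.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_decrease(parent, children):
    """CART-style criterion: parent impurity minus weighted child impurity."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent, children):
    """C4.5-style criterion: information gain normalised by split information."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum((len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch)
    return info_gain / split_info if split_info > 0 else 0.0


parent = ["B", "B", "N", "N"]
print(gini_decrease(parent, [["B", "B"], ["N", "N"]]), gain_ratio(parent, [["B", "B"], ["N", "N"]]))
```

A perfectly separating split scores 0.5 under Gini decrease and 1.0 under gain ratio here; a split that leaves both children mixed scores 0 under both.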

doi: 10.1007/11551874_44

Compact Representation of Speech Using 2-D Cepstrum – An Application to Slovak Digits Recognition

Roman Jarina; Michal Kuba; Martin Paralic

An HMM speech recogniser that uses a small number of acoustic observations based on the 2-D cepstrum (TDC) is proposed. TDC represents both static and dynamic features of speech implicitly in matrix form. It is shown that TDC analysis enables a compact representation of speech signals. A major advantage of the proposed model is therefore a large reduction in the number of speech features used for recognition, which lowers computational and memory requirements, so it may be favourable for low-power ASR applications. Experiments on an isolated Slovak digit recognition task show that the method gives results comparable to those of the conventional MFCC approach. For speech degraded by additive white noise, it performs better than the MFCC method.

- Speech | Pp. 342-347
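A 2-D cepstrum is often described as a two-dimensional DCT of a time-by-frequency block of log spectral energies: the DCT along frequency gives ordinary cepstral ("static") coefficients, and the DCT along time captures how they change ("dynamic"). Keeping only a low-order corner of the result gives the compact representation the abstract mentions. The sketch below follows that general formulation with an unnormalised DCT-II; the block size, the number of retained coefficients and the function names are assumptions, not the paper's exact configuration.

```python
import math


def dct_ii(vec):
    """Unnormalised DCT-II of a list of numbers."""
    n = len(vec)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(vec)) for k in range(n)]


def two_d_cepstrum(log_energies, n_quef=4, n_mod=3):
    """log_energies: list of frames (time), each a list of log filter-bank energies (frequency).

    Returns an n_quef x n_mod matrix: rows are quefrency indices, columns are
    modulation (time-DCT) indices.
    """
    # Step 1: DCT along frequency within each frame -> ordinary cepstrum per frame.
    cep = [dct_ii(frame) for frame in log_energies]
    # Step 2: DCT along time for each cepstral coefficient trajectory.
    n_frames = len(cep)
    traj_dct = [dct_ii([cep[t][q] for t in range(n_frames)]) for q in range(len(cep[0]))]
    # Step 3: keep only the low-order corner as the compact feature matrix.
    return [[traj_dct[q][m] for m in range(n_mod)] for q in range(n_quef)]


# A 10-frame x 20-band block (200 values) is reduced to 4 x 3 = 12 coefficients.
block = [[math.sin(0.3 * t + 0.1 * f) for f in range(20)] for t in range(10)]
print(len(two_d_cepstrum(block)), len(two_d_cepstrum(block)[0]))
```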

doi: 10.1007/11551874_45

An Alternative Way of Semantic Interpretation

Miloslav Konopík; Roman Mouček

In this work we deal with interpretation methods for speech utterances. We describe the basics of interpretation theory as well as a classic approach to interpretation. We then propose an alternative method based on modern knowledge in artificial intelligence. We describe the main points of that methodology and show its advantages, drawbacks and success in a selected restricted domain.

- Speech | Pp. 348-355

doi: 10.1007/11551874_46

Robust Rule-Based Method for Automatic Break Assignment in Russian Texts

Ilya Oparin

In this paper a new rule-based approach to break assignment for the Russian language is discussed. It is a flexible and robust method for segmenting Russian texts into prosodic units. We implemented it in the recent “Orator” text-to-speech (TTS) system. The model was developed for inflective languages as an alternative to both statistical and strict rule-based algorithms. It is designed so that all potentially tunable context dependencies are exposed in the interface grammar and can be easily modified by linguists. Thanks to this simple and intuitive grammar, built on an elaborate mechanism of morpho-grammatical analysis, the algorithm performs well on different kinds of texts. The juncture correct rate ranges from more than 98% for simple literary texts to 85% for raw transcripts of spontaneous speech.

- Speech | Pp. 356-363

doi: 10.1007/11551874_47

Introduction of Improved UWB Speaker Verification System

Aleš Padrta; Jan Vaněk

In this paper, improvements to the speaker verification system used at the Department of Cybernetics at the University of West Bohemia are introduced. The paper summarizes our current knowledge of acoustic modelling, model creation and score normalization based on universal background models. The constituent components of the state-of-the-art verification system were modified or replaced on the basis of this knowledge. A set of experiments was performed to evaluate and compare the performance of the improved verification system and a baseline verification system based on the HTK toolkit. The results show that the improved verification system outperforms the baseline system on both of the reviewed criteria: the equal error rate and the time consumption.

- Speech | Pp. 364-370
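The equal error rate (EER) used as a criterion above is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (true speakers rejected). The sketch below estimates it by sweeping the decision threshold over the observed scores. With UBM-based normalization each score would typically be a log-likelihood ratio, log p(X | speaker model) minus log p(X | UBM), but the scores in the usage line are invented for illustration and the function name is hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by trying every observed score as the acceptance threshold.

    A trial is accepted when its score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < thr for s in genuine) / len(genuine)     # true speakers wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


# Invented scores: one genuine trial (0.4) falls below one impostor trial (0.6).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]))
```

With many trials, the threshold grid becomes fine enough that FAR and FRR nearly coincide; for small sets, interpolating between the two closest thresholds is a common refinement.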

doi: 10.1007/11551874_48

Formal Prosodic Structures and Their Application in NLP

Jan Romportl; Jindřich Matoušek

A formal prosody description framework is introduced together with its relation to language semantics and NLP. The framework incorporates deep prosodic structures based on a generative grammar of abstract prosodic functionally involved units. This grammar creates for each sentence a structure of immediate prosodic constituents in the form of a tree. A speech corpus manually annotated by such prosodic structures is presented and its quantitative characteristics are discussed.

- Speech | Pp. 371-378

doi: 10.1007/11551874_49

The VoiceTRAN Speech-to-Speech Communicator

Jerneja Žganec-Gros; France Mihelič; Tomaž Erjavec; Špela Vintar

The paper presents the design concept of the VoiceTRAN Communicator, which integrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust speech-to-speech translation communicator able to translate simple domain-specific sentences in the Slovenian-English language pair. The project is a joint collaboration between several Slovenian research organizations active in human language technologies. We provide an overview of the task and describe the system architecture and the individual servers. We also describe the language resources that will be used and developed within the project. We conclude with plans for the evaluation of the VoiceTRAN Communicator.

- Speech | Pp. 379-384

doi: 10.1007/11551874_50

Cluster Analysis of Railway Directory Inquire Dialogs

Mikhail Alexandrov; Emilio Sanchis Arnal; Paolo Rosso

Cluster analysis of dialogs with a transport directory service reveals the typical dialog scenarios, which is useful for designing automatic dialog systems. We show how to parameterize dialogs and how to control the clustering process. The parameters include both transport service data and features of passengers' behavior. Control of clustering consists of manipulating the parameters' weights and checking the stability of the results. This technique resembles Makagonov's approach to the analysis of dwellers' complaints to a city administration. We briefly describe B. Stein's new MajorClust method and demonstrate its operation on real person-to-person dialogs provided by the Spanish railway service.

- Dialogue | Pp. 385-392
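MajorClust, as it is commonly described, works on a similarity graph: every object starts in its own cluster, and each object repeatedly joins the cluster to which its summed edge weights are strongest, until no assignment changes. The number of clusters is therefore not fixed in advance. The sketch below is a simplified version under stated assumptions: nodes are visited in a fixed order (rather than randomly), ties go to the smallest cluster id, and a node only moves when the new cluster is strictly stronger. The graph of dialog similarities in the usage line is hypothetical.

```python
def majorclust(n, weights, max_iter=100):
    """Simplified MajorClust on an undirected weighted graph.

    n: number of nodes (0..n-1); weights: dict {(i, j): similarity > 0}.
    Returns a list of cluster labels, one per node.
    """
    adj = {i: {} for i in range(n)}
    for (i, j), w in weights.items():
        adj[i][j] = w
        adj[j][i] = w
    label = list(range(n))  # every node starts in its own cluster
    for _ in range(max_iter):
        changed = False
        for v in range(n):
            if not adj[v]:
                continue  # isolated nodes keep their own cluster
            # Sum the edge weights from v to each neighbouring cluster.
            score = {}
            for u, w in adj[v].items():
                score[label[u]] = score.get(label[u], 0.0) + w
            # Strongest cluster; deterministic tie-break on the smallest label.
            best = max(score, key=lambda c: (score[c], -c))
            if best != label[v] and score[best] > score.get(label[v], 0.0):
                label[v] = best
                changed = True
        if not changed:
            break
    return label


# Two tightly connected groups of dialogs joined by one weak similarity edge.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 0.1}
print(majorclust(6, edges))
```

In the setting described above, the edge weights would come from similarities between parameterized dialogs, and changing the parameter weights changes the graph, which is one way to check the stability of the resulting clusters.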