Publications catalog - books
Text, Speech and Dialogue: 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005, Proceedings
Václav Matoušek ; Pavel Mautner ; Tomáš Pavelka (eds.)
Conference: 8th International Conference on Text, Speech and Dialogue (TSD). Karlovy Vary, Czech Republic. September 12, 2005 - September 15, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28789-6
Electronic ISBN
978-3-540-31817-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11551874_41
Mapping the Speech Signal onto Electromagnetic Articulography Trajectories Using Support Vector Regression
Asterios Toutios; Konstantinos Margaritis
We report work on the mapping between the speech signal and articulatory trajectories from the MOCHA database. In contrast to previous work that used neural networks for the same task, we employ Support Vector Regression as our main tool and Principal Component Analysis as an auxiliary one. Our results are comparable, even though, owing to training-time considerations, we use only a small portion of the available data.
- Speech | Pp. 318-325
doi: 10.1007/11551874_42
Automatic Transcription of Numerals in Inflectional Languages
Jan Zelinka; Jakub Kanis; Luděk Müller
In this paper we describe the part of the text preprocessing module in our text-to-speech synthesis system that converts numerals written as figures into a readable full-length form that can be processed by a phonetic transcription module. Numeral conversion is a significant issue in inflectional languages such as Czech, Russian or Slovak, because morphological and semantic information is necessary to make the conversion unambiguous. Three part-of-speech tagging methods are compared. Furthermore, a method that reduces the tagset in order to increase the accuracy of numeral conversion is presented.
- Speech | Pp. 326-333
doi: 10.1007/11551874_43
Experimental Evaluation of Tree-Based Algorithms for Intonational Breaks Representation
Panagiotis Zervas; Gerasimos Xydas; Nikolaos Fakotakis; George Kokkinakis; Georgios Kouroupetroglou
The prosodic specification of an utterance to be spoken by a Text-to-Speech synthesis system can be expressed in terms of break indices, pitch accents and boundary tones. In particular, the identification of break indices determines the intonational phrase breaks, which affect all subsequent prosody-related procedures. In the present paper we use tree-structured predictors, specifically CART, which is commonly used in similar tasks, and the newly introduced C4.5, to cope with the task of break placement in the presence of shallow textual features. We used two 500-utterance prosodic corpora provided by two Greek universities in order to compare the machine learning approaches and to assess the robustness they offer for Greek break modeling. The evaluation of the resulting models showed that both approaches compare favorably with similar work published for other languages, while the accuracy of the C4.5 method was 1% to 2.7% better than that of CART.
- Speech | Pp. 334-341
doi: 10.1007/11551874_44
Compact Representation of Speech Using 2-D Cepstrum – An Application to Slovak Digits Recognition
Roman Jarina; Michal Kuba; Martin Paralic
An HMM speech recogniser with a small number of acoustic observations based on the 2-D cepstrum (TDC) is proposed. TDC represents both static and dynamic features of speech implicitly in matrix form. It is shown that TDC analysis enables a compact representation of speech signals. A great advantage of the proposed model is therefore a massive reduction in the number of speech features used for recognition, which lessens computational and memory requirements, so it may be favourable for limited-power ASR applications. Experiments on an isolated Slovak digit recognition task show that the method gives results comparable to those of the conventional MFCC approach. For speech degraded by additive white noise, it performs better than the MFCC method.
- Speech | Pp. 342-347
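The matrix form mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a real-valued 2-D DCT over a small matrix of log-spectral frames as a stand-in for the 2-D cepstrum, and the function names, the toy input and the truncation sizes are all hypothetical.

```python
import math


def dct2(vector):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(vector)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(vector))
        for k in range(n)
    ]


def two_d_cepstrum(log_spectra, n_quefrency, n_time):
    """Return an n_time x n_quefrency matrix from frames x bins log spectra.

    Assumption: a DCT along frequency followed by a DCT along time,
    keeping only the low-order coefficients on both axes.
    """
    # Transform each frame along the frequency axis and truncate.
    per_frame = [dct2(frame)[:n_quefrency] for frame in log_spectra]
    # Transform each retained coefficient along the time axis and truncate.
    columns = [
        dct2([frame[q] for frame in per_frame])[:n_time]
        for q in range(n_quefrency)
    ]
    # Transpose so that row 0 holds the static (time-averaged) part
    # and later rows hold the variation over time.
    return [[columns[q][t] for q in range(n_quefrency)] for t in range(n_time)]


# Hypothetical toy input: 4 identical frames of 4 log-spectral bins.
frames = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
tdc = two_d_cepstrum(frames, n_quefrency=3, n_time=2)
```

Because the toy frames do not change over time, only row 0 of `tdc` carries energy; the whole 4 x 4 input is summarised by a 2 x 3 matrix, which is the kind of feature reduction the abstract refers to.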
doi: 10.1007/11551874_45
An Alternative Way of Semantic Interpretation
Miloslav Konopík; Roman Mouček
In this work we deal with methods for interpreting speech utterances. We describe the basics of interpretation theory as well as a classic approach to interpretation. We then suggest an alternative method based on modern knowledge in artificial intelligence. We describe the main points of that methodology and show its advantages, its drawbacks and its success in a selected restricted domain.
- Speech | Pp. 348-355
doi: 10.1007/11551874_46
Robust Rule-Based Method for Automatic Break Assignment in Russian Texts
Ilya Oparin
In this paper a new rule-based approach to break assignment for the Russian language is discussed. It is a flexible and robust method for segmenting Russian texts into prosodic units. We implemented it in the recent “Orator” text-to-speech (TTS) system. The model was developed for use with inflectional languages as an alternative to both statistical and strict rule-based algorithms. It is designed in such a way that all potentially tunable context dependencies are brought up to the interface grammar and can be easily modified by linguists. The algorithm performs well on different kinds of texts thanks to this simple and intuitive grammar, which is built upon an elaborate mechanism of morpho-grammatical analysis. The juncture correct rate ranges from more than 98% for simple literary texts to 85% for raw transcripts of spontaneous speech.
- Speech | Pp. 356-363
doi: 10.1007/11551874_47
Introduction of Improved UWB Speaker Verification System
Aleš Padrta; Jan Vaněk
In this paper, improvements to the speaker verification system used at the Department of Cybernetics of the University of West Bohemia are introduced. The paper summarizes our current knowledge in the domains of acoustic modeling, model creation and score normalization based on universal background models. The constituent components of the state-of-the-art verification system were modified or replaced on the basis of this knowledge. A set of experiments was performed to evaluate and compare the performance of the improved verification system and the baseline verification system based on the HTK toolkit. The results show that the improved verification system outperforms the baseline system on both of the reviewed criteria: the equal error rate and the time consumption.
- Speech | Pp. 364-370
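The abstract above names the equal error rate as one of its two evaluation criteria. The following is a minimal sketch of how that figure is commonly estimated from verification scores; the function name and the score lists are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the equal error rate from two lists of verification scores.

    A trial is accepted when its score is >= the threshold. The function
    scans every observed score as a candidate threshold and returns the
    point where the false acceptance and false rejection rates are closest.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        # Impostor trials that would be wrongly accepted.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # Genuine trials that would be wrongly rejected.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical scores: higher means "more likely the claimed speaker".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores one genuine trial in four is rejected and one impostor trial in four is accepted at the threshold 0.6, so the estimate is an equal error rate of 0.25.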
doi: 10.1007/11551874_48
Formal Prosodic Structures and Their Application in NLP
Jan Romportl; Jindřich Matoušek
A formal prosody description framework is introduced together with its relation to language semantics and NLP. The framework incorporates deep prosodic structures based on a generative grammar of abstract, functionally involved prosodic units. For each sentence, this grammar creates a structure of immediate prosodic constituents in the form of a tree. A speech corpus manually annotated with such prosodic structures is presented and its quantitative characteristics are discussed.
- Speech | Pp. 371-378
doi: 10.1007/11551874_49
The VoiceTRAN Speech-to-Speech Communicator
Jerneja Žganec-Gros; France Mihelič; Tomaž Erjavec; Špela Vintar
The paper presents the design concept of the VoiceTRAN Communicator, which integrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust speech-to-speech translation communicator able to translate simple domain-specific sentences in the Slovenian-English language pair. The project is a collaboration between several Slovenian research organizations that are active in human language technologies. We provide an overview of the task and describe the system architecture and the individual servers. We further describe the language resources that will be used and developed within the project. We conclude the paper with plans for the evaluation of the VoiceTRAN Communicator.
- Speech | Pp. 379-384
doi: 10.1007/11551874_50
Cluster Analysis of Railway Directory Inquire Dialogs
Mikhail Alexandrov; Emilio Sanchis Arnal; Paolo Rosso
Cluster analysis of dialogs with a transport directory service reveals the typical dialog scenarios, which is useful for designing automatic dialog systems. We show how to parameterize dialogs and how to control the clustering process. The parameters include both data about the transport service and features of the passenger's behavior. Control of clustering consists of manipulating the parameter's weights and checking the stability of the results. This technique resembles Makagonov's approach to the analysis of dweller's complaints to the city administration. We briefly describe B. Stein's new MajorClust method and demonstrate how it works on real person-to-person dialogs provided by the Spanish railway service.
- Dialogue | Pp. 385-392
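The control loop described in the abstract above (weight the parameters, cluster, then check whether the result is stable) can be sketched as follows. This is an illustrative simplification and not the authors' code: the similarity measure, the toy dialog parameters and all function names are assumptions, and the clustering step is a basic MajorClust-style relabelling in which each item joins the cluster that attracts it most strongly.

```python
def weighted_similarity(a, b, weights):
    """Similarity in (0, 1] derived from a weighted city-block distance."""
    distance = sum(w * abs(x - y) for x, y, w in zip(a, b, weights))
    return 1.0 / (1.0 + distance)


def major_clust(items, weights, max_rounds=100):
    """MajorClust-style clustering: start with one cluster per item, then
    repeatedly move each item to the cluster with the largest total
    similarity to it, until no label changes."""
    n = len(items)
    sim = [[weighted_similarity(items[i], items[j], weights) for j in range(n)]
           for i in range(n)]
    labels = list(range(n))
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            attraction = {}
            for j in range(n):
                if j != i:
                    attraction[labels[j]] = attraction.get(labels[j], 0.0) + sim[i][j]
            # Ties are broken in favour of the smallest label.
            best = max(sorted(attraction), key=attraction.get)
            if best != labels[i] and attraction[best] > attraction.get(labels[i], 0.0):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def as_partition(labels):
    """Turn a label list into a set of index groups, ignoring label names."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(group) for group in groups.values()}


# Hypothetical dialogs described by two numeric parameters each.
dialogs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
base = as_partition(major_clust(dialogs, weights=[1.0, 1.0]))
reweighted = as_partition(major_clust(dialogs, weights=[2.0, 1.0]))
stable = base == reweighted
```

On this toy data both weightings yield the same two groups, so the result would count as stable. On dense similarity graphs with many items it may be necessary to drop weak edges before clustering, since summed attraction grows with cluster size.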