Publications catalog - books
Text, Speech and Dialogue: 10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007. Proceedings
Václav Matoušek ; Pavel Mautner (eds.)
Conference: 10th International Conference on Text, Speech and Dialogue (TSD). Pilsen, Czech Republic. September 3, 2007 - September 7, 2007
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Storage and Retrieval; Information Systems Applications (incl. Internet)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-74627-0
Electronic ISBN
978-3-540-74628-7
Responsible publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
Festival-si: A Sinhala Text-to-Speech System
Ruvan Weerasinghe; Asanka Wasala; Viraj Welgama; Kumudu Gamage
This paper describes the development of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework, together with its practical applications. Construction of a diphone database and implementation of the natural language processing modules are described. The paper also presents the development methodology for direct Sinhala Unicode text input, achieved by rewriting letter-to-sound rules in Festival’s context-sensitive rule format, and the implementation of a Sinhala syllabification algorithm. A Modified Rhyme Test (MRT) was conducted to evaluate the intelligibility of the synthesized speech and yielded a score of 71.5% for the TTS system described.
- Speech | Pp. 472-479
Voice Conversion Based on Probabilistic Parameter Transformation and Extended Inter-speaker Residual Prediction
Zdeněk Hanzlíček; Jindřich Matoušek
Voice conversion is a process that modifies speech produced by one speaker so that it sounds as if it were uttered by another speaker. In this paper a new voice conversion system is presented. The system requires parallel training data. Using linear prediction analysis, speech is described by line spectral frequencies (LSFs) and the corresponding residua. LSFs are converted together with instantaneous F0 by a joint probabilistic function. The residua are transformed by employing residual prediction. A new modification of residual prediction is introduced that uses information on the desired target F0 to determine a proper residuum and also allows efficient control of F0 in the resulting speech.
- Speech | Pp. 480-487
Automatic Czech – Sign Speech Translation
Jakub Kanis; Luděk Müller
This paper is devoted to the problem of automatic translation between Czech and SC in both directions. We introduce our simple monotone phrase-based decoder, suitable for fast translation, and compare its results with those of a state-of-the-art phrase-based decoder. We compare the translation accuracy of handcrafted and automatically derived phrases and introduce a "class-based" language model and a post-processing step in order to increase the translation accuracy according to several criteria. Finally, we use the described methods and decoding techniques in the task of SC to Czech automatic translation and report the first results for this direction.
- Speech | Pp. 488-495
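The idea of a monotone phrase-based decoder mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' decoder: the function name `monotone_decode`, the toy phrase table, and its scores are all assumptions made for illustration, and no language model is included. The source is covered left to right without reordering, and dynamic programming picks the segmentation with the highest total log-probability.

```python
import math

def monotone_decode(source, phrase_table, max_phrase_len=3):
    """Translate `source` (a list of tokens) left to right without reordering.

    phrase_table maps a tuple of source tokens to a list of
    (target_phrase, log_prob) pairs. Returns (target_tokens, total_log_prob),
    or (None, -inf) if the source cannot be fully covered.
    """
    n = len(source)
    # best[i] = (score, backpointer, target phrase) for the best cover of source[:i]
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            prev_score = best[start][0]
            if prev_score == -math.inf:
                continue  # source[:start] has no valid cover
            src_phrase = tuple(source[start:end])
            for target, log_prob in phrase_table.get(src_phrase, []):
                score = prev_score + log_prob
                if score > best[end][0]:
                    best[end] = (score, start, target)
    if best[n][0] == -math.inf:
        return None, -math.inf
    # Follow backpointers from the end to recover the target phrases in order.
    phrases, pos = [], n
    while pos > 0:
        _, start, target = best[pos]
        phrases.append(target)
        pos = start
    phrases.reverse()
    return [tok for phrase in phrases for tok in phrase.split()], best[n][0]


# Toy phrase table with invented entries and scores (not real Czech or SC data).
toy_table = {
    ("a",): [("A", math.log(0.5))],
    ("b",): [("B", math.log(0.5))],
    ("a", "b"): [("AB", math.log(0.4))],
    ("c",): [("C", math.log(0.9))],
}
tokens, score = monotone_decode(["a", "b", "c"], toy_table)
```

Because the two-word phrase scores higher than the product of the two single-word entries, the sketch prefers the longer phrase. A practical decoder would add a language model score to each extension.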
Maximum Likelihood and Maximum Mutual Information Training in Gender and Age Recognition System
Valiantsina Hubeika; Igor Szöke; Lukáš Burget; Jan Černocký
Gender and age estimation based on Gaussian Mixture Models (GMMs) is introduced. Telephone recordings from the Czech SpeechDat-East database are used as training and test data sets. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted from the speech recordings. Maximum Likelihood (ML) training is applied to estimate the GMM parameters. These estimates are subsequently used as the baseline for Maximum Mutual Information (MMI) training. Results achieved with both ML and MMI training are presented and discussed.
- Speech | Pp. 496-501
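As a rough illustration of the ML baseline described in the abstract above, the sketch below fits one model per class and classifies by log-likelihood. It is deliberately simplified: each class is a single diagonal-covariance Gaussian (a one-component GMM) rather than a full mixture trained with EM, the 2-D feature vectors are invented stand-ins for MFCCs, and MMI training is not shown. All function names are hypothetical.

```python
import math

def fit_diag_gaussian(frames, var_floor=1e-3):
    """ML estimate of the mean and diagonal variance from a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def classify(frames, models):
    """Pick the class whose model gives the highest log-likelihood (equal priors assumed)."""
    return max(models, key=lambda label: log_likelihood(frames, models[label]))

# Toy 2-D "features" standing in for MFCC vectors; the values are invented.
train = {
    "female": [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2], [2.1, 1.1]],
    "male": [[-1.0, 0.0], [-1.2, 0.2], [-0.8, -0.2], [-1.1, 0.1]],
}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}
predicted = classify([[1.9, 0.9], [2.05, 1.0]], models)
```

ML training fits each class model to its own data only. MMI training, by contrast, adjusts the parameters to raise the posterior of the correct class relative to the competing classes.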
Pitch Marks at Peaks or Valleys?
Milan Legát; Daniel Tihelka; Jindřich Matoušek
This paper deals with the problem of speech waveform polarity. As the polarity of a speech waveform can influence the performance of pitch marking algorithms (see Sec. 4), a simple method for determining speech signal polarity is presented. We call this problem peak/valley decision making, i.e. deciding whether pitch marks should be placed at peaks (local maxima) or at valleys (local minima) of a speech waveform. In addition, the proposed method can be used to check the polarity consistency of a speech corpus, which is important for the concatenation of speech units in speech synthesis.
- Speech | Pp. 502-507
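The abstract above does not spell out the proposed method, so the following is only a naive stand-in that shows what a peak/valley decision could look like. It assumes that the dominant excursion within each frame indicates polarity, and it votes over frames. The function names and the synthetic pulse-train signal are hypothetical.

```python
import math

def peak_or_valley(samples, frame_len):
    """Return "peaks" if positive extremes dominate, otherwise "valleys".

    Naive heuristic (an assumption, not the paper's method): in each frame,
    compare the largest positive sample with the largest negative excursion,
    then take a majority vote over all full frames.
    """
    peak_votes = valley_votes = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if max(frame) >= -min(frame):
            peak_votes += 1
        else:
            valley_votes += 1
    return "peaks" if peak_votes >= valley_votes else "valleys"

def synthetic_voiced(n, period):
    """Toy asymmetric signal: a sharp unit pulse each period plus a weak sinusoid."""
    out = []
    for i in range(n):
        pulse = 1.0 if i % period == 0 else 0.0
        out.append(pulse - 0.2 * math.sin(2 * math.pi * i / period))
    return out

signal = synthetic_voiced(800, period=80)
flipped = [-s for s in signal]  # same signal with inverted polarity
```

Running the same check over every recording in a corpus would flag files whose decision differs from the majority, which is the consistency check the abstract refers to.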
Quality Deterioration Factors in Unit Selection Speech Synthesis
Daniel Tihelka; Jindřich Matoušek; Jiří Kala
The purpose of the present paper is to examine the relationships between target and concatenation costs and the quality (with a focus on naturalness) of generated speech. Several synthetic phrases were examined by listeners with the aim of finding unnatural artefacts in them, and the relation between these artefacts and the behaviour of the features used in the given unit selection algorithm was examined.
- Speech | Pp. 508-515
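For readers unfamiliar with the target and concatenation costs discussed in the abstract above, the following sketch shows the usual way they are combined: a Viterbi search picks the candidate sequence that minimises the summed costs. The cost functions, unit representation, and numbers are invented for illustration and are not taken from the paper.

```python
def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi search for the candidate sequence with minimal total cost.

    targets: list of target specifications, one per position.
    candidates: list (same length) of lists of candidate units.
    Returns (total_cost, chosen_units).
    """
    # best holds one (accumulated cost, path) entry per candidate of the current position
    best = [(target_cost(targets[0], u), [u]) for u in candidates[0]]
    for t, units in zip(targets[1:], candidates[1:]):
        new_best = []
        for u in units:
            # Cheapest way to reach unit u from any candidate of the previous position.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(t, u), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Toy setup: units are (label, pitch) pairs and targets are desired pitch values.
def toy_target_cost(target_pitch, unit):
    return abs(target_pitch - unit[1])

def toy_concat_cost(prev_unit, unit):
    return 0.5 * abs(prev_unit[1] - unit[1])

targets = [100, 110, 120]
candidates = [
    [("a1", 100), ("a2", 130)],
    [("b1", 90), ("b2", 112)],
    [("c1", 121), ("c2", 150)],
]
total_cost, chosen = select_units(targets, candidates, toy_target_cost, toy_concat_cost)
```

In this toy example the search returns the units a1, b2, c1 with a total cost of 13.5. Inspecting how each cost term contributes along the chosen path is the kind of analysis that can then be related to artefacts reported by listeners.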
Topic-Focus Articulation Algorithm on the Syntax-Prosody Interface of Romanian
Neculai Curteanu; Diana Trandabăţ; Mihai Alex Moruz
We propose in this paper an implementation of the Prague School’s TFA (Topic-Focus Articulation) algorithm to support the Romanian , relying on the experience with FDG (Functional Dependency Grammar) and SCD (Segmentation-Cohesion-Dependency) parsing strategies for the classical, predication-driven, but Information Structure (IS) non-dependent, syntax. As contributions worth to be mentioned are: Outlining the and organization of linguistic within SCD and FDG local-global parsing, on both sides of the of Romanian. Pointing out the relationship between classical (IS-free) syntactic structures, IS (topic-focus, communicative dynamism) depending textual spans, and the corresponding units. and the TFA for to Romanian prosodic structures, to be continued with TFA sentence-level refinements, its rhetorical-level extension, and embedding into local-global .
- Speech | Pp. 516-523
Translation and Conversion for Czech Sign Speech Synthesis
Zdeněk Krňoul; Miloš Železný
Recent research progress in developing a Czech Sign Speech synthesizer is presented. The current goal is to improve the automatic synthesis system so that it produces accurate Sign Speech. The synthesis system converts written text to an animation of an artificial human model. This includes translation of the text into sign phrases and their conversion to the animation of an avatar. The animation is composed of movements and deformations of segments of the hands, the head and the face. The system has been evaluated in two initial perceptual tests, which indicate that the designed synthesis system is capable of producing intelligible Sign Speech.
- Speech | Pp. 524-531
A Wizard-of-Oz System Evaluation Study
Melita Hajdinjak; France Mihelič
In order to evaluate the performance of the dialogue-manager component of a Slovenian and Croatian spoken dialogue system under development, two Wizard-of-Oz experiments were performed. The only difference between the two experimental settings was the manner of dialogue management: in the first experiment it was performed by a human, the wizard, while in the second it was performed by the newly implemented dialogue-manager component. The data from both Wizard-of-Oz experiments were evaluated with the PARADISE evaluation framework, a potential general methodology for evaluating and comparing different versions of spoken-language dialogue systems. The study ascertains a remarkable difference in the performance functions when taking different satisfaction-measure sums or even individual scores as the target to be predicted, it proves the indispensableness of the recently introduced when evaluating information-providing dialogue systems, and it confirms the dialogue manager’s cooperativity subject to the incorporated knowledge representation.
- Dialog | Pp. 532-539
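The PARADISE performance function referred to in the abstract above is commonly written as alpha * N(kappa) - sum_i w_i * N(c_i), where kappa measures task success, c_i are dialogue costs, and N is Z-score normalization. The sketch below computes it for a few dialogues. The weights and data are invented, whereas in PARADISE they are estimated by regression against user satisfaction; the use of the population standard deviation is also an assumption.

```python
import statistics

def z_normalize(values):
    """Z-score normalization N(x) = (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / stdev for v in values]

def paradise_performance(kappa, costs, alpha, weights):
    """Per-dialogue performance: alpha * N(kappa) - sum_i w_i * N(c_i)."""
    norm_kappa = z_normalize(kappa)
    norm_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    return [
        alpha * norm_kappa[d]
        - sum(weights[name] * norm_costs[name][d] for name in costs)
        for d in range(len(kappa))
    ]

# Three toy dialogues: task success (kappa) and one cost measure (number of turns).
kappa = [0.9, 0.5, 0.7]
costs = {"turns": [10, 20, 15]}
performance = paradise_performance(kappa, costs, alpha=0.5, weights={"turns": 0.3})
```

Changing the satisfaction measure used as the regression target changes the fitted weights, which is why the study reports different performance functions for different targets.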
New Measures for Open-Domain Question Answering Evaluation Within a Time Constraint
Elisa Noguera; Fernando Llopis; Antonio Ferrández; Alberto Escapa
Previous work on evaluating the performance of Question Answering (QA) systems has focused on precision. In this paper, we develop a mathematical procedure in order to explore new evaluation measures for QA systems that take the answer time into account. We also carried out an exercise for the evaluation of QA systems within a time constraint in the CLEF-2006 campaign, using the proposed measures. The main conclusion is that real-time evaluation can be a new scenario for the evaluation of QA systems.
- Dialog | Pp. 540-547