Publication catalog - books

Text, Speech and Dialogue: 10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007. Proceedings

Václav Matoušek ; Pavel Mautner (eds.)

Conference: 10th International Conference on Text, Speech and Dialogue (TSD). Pilsen, Czech Republic. September 3-7, 2007

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Storage and Retrieval; Information Systems Applications (incl. Internet)

Availability
Detected institution: not detected
Publication year: 2007
Online access: SpringerLink

Information

Resource type

books

Print ISBN

978-3-540-74627-0

Electronic ISBN

978-3-540-74628-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Festival-si: A Sinhala Text-to-Speech System

Ruvan Weerasinghe; Asanka Wasala; Viraj Welgama; Kumudu Gamage

This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala, built with the Festival framework, and its practical applications. The construction of a diphone database and the implementation of the natural language processing modules are described. The paper also presents a methodology for direct Sinhala Unicode text input, achieved by rewriting the letter-to-sound rules in Festival’s context-sensitive rule format, and the implementation of a Sinhala syllabification algorithm. A Modified Rhyme Test (MRT) was conducted to evaluate the intelligibility of the synthesized speech and yielded a score of 71.5% for the TTS system described.

- Speech | Pp. 472-479
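
The letter-to-sound step described above can be illustrated with a minimal Python sketch of ordered, context-sensitive rewrite rules in the spirit of Festival's `( LEFT [ FOCUS ] RIGHT = OUTPUT )` rule notation. The sketch makes three assumptions. Contexts are literal strings, although Festival also supports letter sets. `#` marks a word boundary. The names `Rule`, `apply_rules` and `TOY_RULES` are hypothetical, and the rules are placeholders over a Latin transliteration, not the authors' Sinhala rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One rewrite rule, loosely modelled on Festival's
    ( LEFT [ FOCUS ] RIGHT = OUTPUT ); contexts are literal strings here."""
    left: str     # required left context ("" = any, "#" = word start)
    focus: str    # letters consumed by the rule (must be non-empty)
    right: str    # required right context ("" = any, "#" = word end)
    output: list  # phones emitted for the focus


def apply_rules(word: str, rules: list) -> list:
    """Scan the word left to right; at each position apply the first rule
    (in list order) whose focus and both contexts match."""
    padded = "#" + word + "#"  # "#" marks both word boundaries
    phones = []
    i = 1  # skip the leading boundary marker
    while i < len(padded) - 1:
        for rule in rules:
            end = i + len(rule.focus)
            if (rule.focus
                    and padded.startswith(rule.focus, i)
                    and padded[:i].endswith(rule.left)
                    and padded.startswith(rule.right, end)):
                phones.extend(rule.output)
                i = end
                break
        else:
            raise ValueError(f"no rule matches {padded[i]!r} in {word!r}")
    return phones


# Hypothetical toy rules over a Latin transliteration; NOT real Sinhala rules.
TOY_RULES = [
    Rule("", "a", "#", ["a"]),  # word-final "a" keeps its full quality
    Rule("", "a", "", ["@"]),   # any other "a" becomes a schwa (toy choice)
    Rule("", "k", "", ["k"]),
    Rule("", "m", "", ["m"]),
]

print(apply_rules("kama", TOY_RULES))  # ['k', '@', 'm', 'a']
```

The word-final rule is placed before the general rule for "a" because the first matching rule wins. Rule order therefore encodes priority.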

Voice Conversion Based on Probabilistic Parameter Transformation and Extended Inter-speaker Residual Prediction

Zdeněk Hanzlíček; Jindřich Matoušek

Voice conversion is a process that modifies speech produced by one speaker so that it sounds as if it were uttered by another speaker. In this paper, a new voice conversion system is presented. The system requires parallel training data. Using linear prediction analysis, speech is described by line spectral frequencies (LSFs) and the corresponding residua. The LSFs are converted together with the instantaneous F0 by a joint probabilistic function, and the residua are transformed by residual prediction. The paper also introduces a new modification of residual prediction that uses information on the desired target F0 to determine a proper residuum. This modification also allows efficient control of F0 in the resulting speech.

- Speech | Pp. 480-487
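
The abstract does not give the form of the joint probabilistic function. A common choice for this kind of mapping is joint-density GMM regression. In that approach, each mixture component contributes a linear regression from source to target, weighted by the component's posterior probability given the source value. The sketch below is a one-dimensional Python version with hypothetical parameters and illustrative names (`JointComponent`, `convert`). Real systems apply the mapping to whole LSF (and F0) vectors with covariance matrices, and the paper's exact formulation may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One component of a GMM trained on joint (source, target) data."""
    weight: float  # mixture weight w_i
    mean_x: float  # source mean mu_x,i
    mean_y: float  # target mean mu_y,i
    var_xx: float  # source variance sigma_xx,i
    cov_yx: float  # target-source covariance sigma_yx,i


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def convert(x: float, components: list) -> float:
    """y = sum_i p(i|x) * (mu_y,i + sigma_yx,i / sigma_xx,i * (x - mu_x,i))."""
    likelihoods = [c.weight * gaussian_pdf(x, c.mean_x, c.var_xx) for c in components]
    total = sum(likelihoods)
    y = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # p(i | x)
        y += posterior * (c.mean_y + c.cov_yx / c.var_xx * (x - c.mean_x))
    return y


# Hypothetical single-component model: the mapping reduces to y = 10 + 2x.
print(convert(1.0, [JointComponent(1.0, 0.0, 10.0, 1.0, 2.0)]))  # 12.0
```

With several components, the mapping is piecewise-smooth. Each region of the source space is dominated by the component whose posterior is highest there.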

Automatic Czech – Sign Speech Translation

Jakub Kanis; Luděk Müller

This paper is devoted to the problem of automatic translation between Czech and SC in both directions. We introduce our simple monotone phrase-based decoder, which is suitable for fast translation, and compare its results with those of a state-of-the-art phrase-based decoder. We compare the translation accuracy of handcrafted and automatically derived phrases. We also introduce a "class-based" language model and a post-processing step to increase the translation accuracy according to several criteria. Finally, we apply the described methods and decoding techniques to SC-to-Czech automatic translation and report the first results for this direction.

- Speech | Pp. 488-495
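
A monotone phrase-based decoder translates source phrases in source order, without reordering. The best segmentation can therefore be found by dynamic programming over source positions. The Python sketch below illustrates the monotone search only and is not the authors' decoder. It assumes a hypothetical phrase table with made-up probabilities and omits the language model and post-processing described in the abstract. The function name `monotone_decode` and the parameter `max_phrase_len` are illustrative.

```python
import math


def monotone_decode(source, phrase_table, max_phrase_len=3):
    """Return the highest-scoring monotone translation of `source`.

    source:       list of source tokens
    phrase_table: dict mapping source-phrase tuples to lists of
                  (target_phrase, probability) pairs
    Returns a list of target tokens, or None if no segmentation is covered.
    """
    n = len(source)
    # best[i] = (log score of best translation of source[:i], back-pointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            if best[start][0] == -math.inf:
                continue  # source[:start] cannot be translated
            for target, prob in phrase_table.get(tuple(source[start:end]), []):
                score = best[start][0] + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, (start, target))
    if best[n][0] == -math.inf:
        return None
    # Follow back-pointers from the end, prepending each target phrase.
    output, pos = [], n
    while pos > 0:
        start, target = best[pos][1]
        output[:0] = target.split()
        pos = start
    return output


# Hypothetical phrase table with made-up probabilities.
TOY_PHRASES = {
    ("dobrý",): [("good", 0.9)],
    ("den",): [("day", 0.8)],
    ("dobrý", "den"): [("hello", 0.9)],
}
print(monotone_decode(["dobrý", "den"], TOY_PHRASES))  # ['hello']
```

The two-word phrase wins here because its probability (0.9) beats the product of the single-word phrases (0.9 × 0.8). A real decoder would also add language-model scores at each step.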

Maximum Likelihood and Maximum Mutual Information Training in Gender and Age Recognition System

Valiantsina Hubeika; Igor Szöke; Lukáš Burget; Jan Černocký

Gender and age estimation based on Gaussian Mixture Models (GMMs) is introduced. Telephone recordings from the Czech SpeechDat-East database are used as training and test data. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted from the speech recordings. The GMM parameters are estimated with Maximum Likelihood (ML) training, and these estimates then serve as the baseline for Maximum Mutual Information (MMI) training. Results achieved with both ML and MMI training are presented and discussed.

- Speech | Pp. 496-501
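
As a rough illustration of how trained GMMs are used for recognition, the Python sketch below scores a sequence of feature frames against one diagonal-covariance GMM per class. The frames stand in for MFCC vectors, and the class with the highest total log-likelihood is chosen. The class labels, model parameters and function names are hypothetical. Training is not shown: ML training typically uses EM, while MMI instead maximizes the posterior probability of the correct class.

```python
import math


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_log_likelihood(frame, gmm) -> float:
    """gmm: list of (weight, means, variances) with diagonal covariances.
    Uses log-sum-exp over components for numerical stability."""
    terms = [
        math.log(weight) + sum(log_gaussian(f, m, v) for f, m, v in zip(frame, means, variances))
        for weight, means, variances in gmm
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models) -> str:
    """Pick the label whose GMM gives the highest summed log-likelihood."""
    scores = {label: sum(gmm_log_likelihood(f, gmm) for f in frames)
              for label, gmm in models.items()}
    return max(scores, key=scores.get)


# Hypothetical 2-D models; real systems use many components over MFCC vectors.
TOY_MODELS = {
    "male": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "female": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
print(classify([[2.8, 3.1], [3.2, 2.9]], TOY_MODELS))  # female
```

Frame scores are summed in the log domain, which treats frames as independent. This is the usual simplification in GMM-based speaker, gender and age classification.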

Pitch Marks at Peaks or Valleys?

Milan Legát; Daniel Tihelka; Jindřich Matoušek

This paper deals with the problem of speech waveform polarity. Since the polarity of a speech waveform can influence the performance of pitch-marking algorithms (see Sec. 4), a simple method for determining speech signal polarity is presented. We call this problem peak/valley decision making, i.e., deciding whether pitch marks should be placed at the peaks (local maxima) or at the valleys (local minima) of a speech waveform. In addition, the proposed method can be used to check the polarity consistency of a speech corpus, which is important for concatenating speech units in speech synthesis.

- Speech | Pp. 502-507

Quality Deterioration Factors in Unit Selection Speech Synthesis

Daniel Tihelka; Jindřich Matoušek; Jiří Kala

The purpose of this paper is to examine the relationships between target and concatenation costs and the quality (with a focus on naturalness) of the generated speech. Listeners examined several synthetic phrases with the aim of finding unnatural artefacts in them. The relation between these artefacts and the behaviour of the features used in the given unit selection algorithm was then analysed.

- Speech | Pp. 508-515
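
In the usual unit selection formulation, the synthesizer picks one candidate unit per target position so that the sum of target costs and concatenation costs along the sequence is minimal. A Viterbi-style dynamic programme solves this problem. The Python sketch below assumes that formulation. Its toy costs use a single hypothetical pitch feature, because the features and weights of the algorithm studied in the paper are not given here. The names `select_units`, `target_cost` and `concat_cost` are illustrative.

```python
def select_units(candidates, target_cost, concat_cost):
    """Viterbi search over candidate units.

    candidates:  list with one list of candidate units per target position
    target_cost: function (position, unit) -> cost of using unit at position
    concat_cost: function (previous_unit, unit) -> cost of joining them
    Returns (total_cost, chosen_units) minimizing the summed costs.
    """
    # Each entry: (cost of best path ending in this candidate, that path)
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for t in range(1, len(candidates)):
        new_best = []
        for u in candidates[t]:
            # Cheapest predecessor path once the join cost to u is added.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(t, u), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Hypothetical units as (name, pitch in Hz) with toy pitch-based costs.
TARGET_PITCH = [100, 120]
units = [[("a1", 100), ("a2", 130)], [("b1", 120), ("b2", 101)]]
cost, path = select_units(
    units,
    target_cost=lambda t, u: abs(u[1] - TARGET_PITCH[t]),
    concat_cost=lambda a, b: 0.5 * abs(a[1] - b[1]),
)
print(cost, [u[0] for u in path])  # 10.0 ['a1', 'b1']
```

Because the total cost is a sum of the two cost types, a perceptually unnatural join can still win if its target costs are low enough. This kind of interaction is the one the paper's listening test probes.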

Topic-Focus Articulation Algorithm on the Syntax-Prosody Interface of Romanian

Neculai Curteanu; Diana Trandabăţ; Mihai Alex Moruz

We propose in this paper an implementation of the Prague School’s TFA (Topic-Focus Articulation) algorithm to support the Romanian …. The implementation relies on experience with FDG (Functional Dependency Grammar) and SCD (Segmentation-Cohesion-Dependency) parsing strategies for the classical, predication-driven, but Information Structure (IS) non-dependent syntax. The contributions worth mentioning are: (1) outlining the … and organization of linguistic … within SCD and FDG local-global parsing, on both sides of the … of Romanian; (2) pointing out the relationship between classical (IS-free) syntactic structures, IS-dependent (topic-focus, communicative dynamism) textual spans, and the corresponding … units; (3) … and the TFA … for … to Romanian prosodic structures, to be continued with sentence-level TFA refinements, its rhetorical-level extension, and embedding into local-global ….

- Speech | Pp. 516-523

Translation and Conversion for Czech Sign Speech Synthesis

Zdeněk Krňoul; Miloš Železný

Recent progress in the development of a Czech Sign Speech synthesizer is presented. The current goal is to improve the automatic synthesis system so that it produces accurate Sign Speech. The synthesis system converts written text into an animation of an artificial human model. This involves translating the text into sign phrases and converting these phrases into the animation of an avatar. The animation is composed of movements and deformations of segments of the hands, the head, and the face. The system has been evaluated in two initial perceptual tests, which indicate that the designed synthesis system is capable of producing intelligible Sign Speech.

- Speech | Pp. 524-531

A Wizard-of-Oz System Evaluation Study

Melita Hajdinjak; France Mihelič

Two Wizard-of-Oz experiments were performed to evaluate the performance of the dialogue-manager component of a Slovenian and Croatian spoken dialogue system under development. The only difference between the two experimental settings was the manner of dialogue management. In the first experiment, dialogue management was performed by a human (the wizard), whereas in the second it was performed by the newly implemented dialogue-manager component. The data from both Wizard-of-Oz experiments were evaluated with the PARADISE evaluation framework, a potential general methodology for evaluating and comparing different versions of spoken-language dialogue systems. The study makes three findings. First, the performance functions differ markedly depending on whether different satisfaction-measure sums or individual scores are taken as the target to be predicted. Second, the recently introduced … is indispensable when evaluating information-providing dialogue systems. Third, the dialogue manager’s cooperativity is confirmed, subject to the incorporated knowledge representation.

- Dialog | Pp. 532-539

New Measures for Open-Domain Question Answering Evaluation Within a Time Constraint

Elisa Noguera; Fernando Llopis; Antonio Ferrández; Alberto Escapa

Previous work on evaluating the performance of Question Answering (QA) systems has focused on precision. In this paper, we develop a mathematical procedure to explore new evaluation measures for QA systems that take answer time into account. We also carried out an exercise in evaluating QA systems within a time constraint at the CLEF-2006 campaign, using the proposed measures. The main conclusion is that real-time evaluation can be a new scenario for the evaluation of QA systems.

- Dialog | Pp. 540-547