Publications catalog - books
Text, Speech and Dialogue: 10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007. Proceedings
Václav Matoušek ; Pavel Mautner (eds.)
In conference: 10th International Conference on Text, Speech and Dialogue (TSD). Pilsen, Czech Republic. September 3, 2007 - September 7, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Storage and Retrieval; Information Systems Applications (incl. Internet)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-74627-0
Electronic ISBN
978-3-540-74628-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
Spanish-Basque Parallel Corpus Structure: Linguistic Annotations and Translation Units
A. Casillas; A. Díaz de Illarraza; J. Igartua; R. Martínez; K. Sarasola; A. Sologaistoa
In this paper we propose a corpus structure which represents and manages an aligned parallel corpus. The corpus structure is based on a stand-off annotation model, which is composed of several XML documents. A bilingual parallel corpus represented in the proposed structure will contain: (1) the entire corpus together with its corresponding linguistic information, (2) translation units and alignment relations between units of the two languages: paragraphs, sentences and named entities. The proposed structure makes it possible to work with the corpus both as an annotated corpus with linguistic information and as a translation memory.
- Speech | Pp. 230-237
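The stand-off model in the Spanish-Basque corpus abstract above can be pictured with a small sketch. The abstract does not give the schema, so every element name, attribute name and sample sentence below is a hypothetical illustration, not the authors' format. The point shown is only that annotations and alignments live in separate XML documents and refer to the text by id.

```python
import xml.etree.ElementTree as ET

# Base text documents, one per language (hypothetical element/attribute names).
ES_DOC = ('<text lang="es"><s id="es-s1"><w id="es-w1">Vivo</w>'
          '<w id="es-w2">en</w><w id="es-w3">Bilbao</w></s></text>')
EU_DOC = ('<text lang="eu"><s id="eu-s1"><w id="eu-w1">Bilbon</w>'
          '<w id="eu-w2">bizi</w><w id="eu-w3">naiz</w></s></text>')

# Stand-off linguistic annotation: points at token ids, never copies the text.
ES_ANNOTATION = '<annotations><entity type="LOC" target="es-w3"/></annotations>'

# Stand-off alignment document: links units of the two languages by id.
ALIGNMENT = ('<alignment><link level="sentence" source="es-s1" '
             'target="eu-s1"/></alignment>')


def index_by_id(xml_text):
    """Map every id attribute in a document to its element."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def sentence_text(sentence):
    """Join the token texts of a sentence element with spaces."""
    return " ".join(w.text for w in sentence.iter("w"))


def translation_units(es_doc, eu_doc, alignment):
    """Resolve sentence-level links into (source, target) text pairs."""
    es_index, eu_index = index_by_id(es_doc), index_by_id(eu_doc)
    pairs = []
    for link in ET.fromstring(alignment).iter("link"):
        if link.get("level") == "sentence":
            pairs.append((sentence_text(es_index[link.get("source")]),
                          sentence_text(eu_index[link.get("target")])))
    return pairs


def entities(doc, annotation):
    """Resolve entity annotations into (type, token text) pairs."""
    index = index_by_id(doc)
    return [(e.get("type"), index[e.get("target")].text)
            for e in ET.fromstring(annotation).iter("entity")]
```

Read through `translation_units`, the same files behave like a translation memory; read through `entities`, they behave like an annotated corpus. That is the dual use the abstract describes.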
An Automatic Version of the Post-Laryngectomy Telephone Test
Tino Haderlein; Korbinian Riedhammer; Andreas Maier; Elmar Nöth; Hikmet Toy; Frank Rosanowski
Tracheoesophageal (TE) speech is one way to restore the ability to speak after total laryngectomy, i.e. the removal of the larynx. The quality of the substitute voice has to be evaluated during therapy. For the intelligibility evaluation of German speakers over telephone, the Post-Laryngectomy Telephone Test (PLTT) was defined. Each patient reads out 20 of 400 different monosyllabic words and 5 out of 100 sentences. A human listener writes down the words and sentences understood and computes an overall score. This paper presents a means of objective and automatic evaluation that can replace the subjective method. The scores of 11 naïve raters for a set of 31 test speakers were compared to the word recognition rate of speech recognizers. Correlation values of about 0.9 were reached.
- Speech | Pp. 238-245
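The comparison in the PLTT abstract above, human listener scores against a recognizer's word recognition rate, comes down to a correlation between two lists of per-speaker numbers. The sketch below assumes the Pearson coefficient; the abstract does not say which correlation measure was used, and the function and variable names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical per-speaker values, invented for illustration only:
# mean human PLTT score and automatic word recognition rate (both in percent).
human_scores = [35.0, 48.0, 52.0, 61.0, 70.0]
recognition_rates = [20.0, 31.0, 40.0, 44.0, 58.0]
agreement = pearson(human_scores, recognition_rates)
```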
Speaker Normalization Via Springy Discriminant Analysis and Pitch Estimation
Dénes Paczolay; András Bánhalmi; András Kocsor
Speaker normalization techniques are widely used to improve the accuracy of speaker independent speech recognition. One of the most popular groups of such methods is Vocal Tract Length Normalization (VTLN). These methods try to reduce the inter-speaker variability by transforming the input feature vectors into a more compact domain, to achieve better separations between the phonetic classes. Among others, two algorithms are commonly applied: the Maximum Likelihood criterion-based, and the Linear Discriminant criterion-based normalization algorithms. Here we propose the use of the Springy Discriminant criterion for the normalization task. In addition we propose a method for the VTLN parameter determination that is based on pitch estimation. In the experiments this proves to be an efficient and swift way to initialize the normalization parameters for training, and to estimate them for the voice samples of new test speakers.
- Speech | Pp. 246-253
A Study on Speech with Manifest Emotions
Horia-Nicolai Teodorescu; Silvia Monica Feraru
We present a study of the prosody – seen in a broader sense – that supports the theory of the interrelationship function of speech. “Pure emotions” are meant to show a relationship of the speaker with the general context. The analysis goes beyond the basic prosody, as related to pitch trajectory; namely, the analysis also aims to determine the change in higher formants. The refinement in the analysis asks for improved tools. Methodological aspects are discussed, including a discussion of the limitations of the currently available tools. Some conclusions are drawn.
- Speech | Pp. 254-261
Speech Recognition Supported by Prosodic Information for Fixed Stress Languages
György Szaszák; Klára Vicsi
In our paper we examine the use of prosodic features in speech recognition, with special attention paid to agglutinating and fixed stress languages. The prosodic features used, the acoustic-prosodic pre-processing, and the segmentation in terms of prosodic units are presented in detail. We use the expression “prosodic unit” in order to make a distinction from prosodic phrases, which are longer. We trained an HMM-based prosodic segmenter relying on fundamental frequency and intensity of speech. The output of the prosodic segmenter is used for N-best lattice rescoring in parallel with a simplified bigram language model in a continuous speech recognizer, in order to improve speech recognition performance. Experiments for the Hungarian language show a WER reduction of about 4% using a simple lattice rescoring.
- Speech | Pp. 262-269
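Several abstracts in this volume report results as word error rate (WER). As a reminder of what that figure measures, here is a minimal sketch of the usual definition: word-level edit distance divided by the number of reference words. It is a generic illustration, not the scoring tool used in any of the papers, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.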
TRAP-Based Techniques for Recognition of Noisy Speech
František Grézl; Jan Černocký
This paper presents a systematic study of the performance of TempoRAl Patterns (TRAP) based features and their proposed modifications and combinations for speech recognition in noisy environments. The experimental results are obtained on the AURORA 2 database with clean training data. We observed a large dependency of the performance of different TRAP modifications on noise level. Earlier proposed TRAP system modifications help in clean conditions but degrade the system performance in the presence of noise. The combination techniques, on the other hand, can bring a large improvement in the case of weak noise and degrade only slightly for strong noise cases. The vector concatenation combination technique improves the system performance up to strong noise levels.
- Speech | Pp. 270-277
Intelligibility Is More Than a Single Word: Quantification of Speech Intelligibility by ASR and Prosody
Andreas Maier; Tino Haderlein; Maria Schuster; Emeka Nkenke; Elmar Nöth
In this paper we examine the quality of the prediction of intelligibility scores of human experts. Furthermore, we investigate the differences between subjective expert raters who evaluated speech disorders of laryngectomees and children with cleft lip and palate. We use the recognition rate of a word recognizer and prosodic features to predict the intelligibility score of each individual expert. For each expert and the mean opinion of all experts we present the best features to model their scoring behavior according to the mean rank obtained during a 10-fold cross-validation. In this manner all individual speech experts were modeled with a correlation coefficient above .75. The mean opinion of all raters is predicted with a correlation of .90 for the laryngectomees and .86 for the children.
- Speech | Pp. 278-285
Appositions Versus Double Subject Sentences – What Information the Speech Analysis Brings to a Grammar Debate
Horia-Nicolai Teodorescu; Diana Trandabăţ
We propose a method based on spoken language analysis to deal with controversial syntactic issues; we apply the method to the problem of the double subject sentences in the Romanian language. The double subject construction is a controversial linguistic phenomenon in Romanian. While some researchers accept it as a language ‘curiosity’ (specific only to the Asian languages, but not to the European ones), others consider it an apposition-type structure, in order to fit its behavior into the already existing theories. This paper sheds fresh light on the debate by presenting what we believe to be the first study on the phonetic analysis of double-subject sentences, in order to account for how they differ from appositional constructions.
- Speech | Pp. 286-293
Automatic Evaluation of Pathologic Speech – from Research to Routine Clinical Use
Elmar Nöth; Andreas Maier; Tino Haderlein; Korbinian Riedhammer; Frank Rosanowski; Maria Schuster
Previously we have shown that ASR technology can be used to objectively evaluate pathologic speech. Here we report on progress for routine clinical use: 1) We introduce an easy-to-use recording and evaluation environment. 2) We confirm our previous results for a larger group of patients. 3) We show that telephone speech can be analyzed with the same methods with only a small loss of agreement with human experts. 4) We show that prosodic information leads to more robust results. 5) We show that text reference instead of transliteration can be used for evaluation. Using word accuracy of a speech recognizer and prosodic features as features for SVM regression, we achieve a correlation of .90 between the automatic analysis and human experts.
- Speech | Pp. 294-301
The LIA Speech Recognition System: From 10xRT to 1xRT
G. Linarès; P. Nocera; D. Massonié; D. Matrouf
The LIA developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox made it possible to build a Broadcast News (BN) transcription system that was involved in the ESTER evaluation campaign ([11]), on and tasks. In this paper, we describe the techniques we used to reach real time, starting from our baseline 10xRT system. We focus on some aspects of the A* search algorithm which are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system performs about 5.6% absolute WER worse than the standard 10xRT system, with an absolute WER (Word Error Rate) of about 26.8%.
- Speech | Pp. 302-308
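The "10xRT" and "1xRT" labels in the LIA title above are real-time factors: processing time divided by the duration of the audio being transcribed. A minimal sketch of that ratio follows. The `decode` callable is a hypothetical stand-in for a recognizer and is not part of the LIA toolkit.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time as a multiple of audio duration (xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode, audio, audio_seconds):
    """Time one call to a decoder (hypothetical callable) and report its xRT."""
    start = time.perf_counter()
    decode(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

Under this definition, a system that needs 600 seconds to transcribe 60 seconds of audio runs at 10xRT, and one that keeps pace with the audio runs at 1xRT.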