Publications catalog - books
Chinese Spoken Language Processing: 5th International Symposium, ISCSLP 2006, Singapore, December 13-16, 2006, Proceedings
Qiang Huo; Bin Ma; Eng-Siong Chng; Haizhou Li (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Data Mining and Knowledge Discovery; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request
---|---|---|---|---
Not detected | 2006 | SpringerLink | |
Information
Resource type:
books
Print ISBN
978-3-540-49665-6
Electronic ISBN
978-3-540-49666-3
Publisher
Springer Nature
Country of publication
China
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11939993_41
Unsupervised Speaker Adaptation Using Reference Speaker Weighting
Tsz-Chung Lai; Brian Mak
Recently, we revisited the fast adaptation method called reference speaker weighting (RSW) and suggested a few modifications. We then showed that this algorithmically simplest technique actually outperformed conventional adaptation techniques like MAP and MLLR for 5- or 10-second supervised adaptation on the Wall Street Journal 5K task. In this paper, we further investigate the performance of RSW in unsupervised adaptation mode, which is the more natural way of doing adaptation in practice. Moreover, various analyses were carried out on the reference speakers computed by the method.
- Speech Adaptation/Normalization | Pp. 380-389
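The abstract does not spell out the weight estimation. As a minimal sketch of the RSW idea only, with least squares standing in for the paper's likelihood-based estimation under the HMM, the adapted model's mean supervector is expressed as a weighted combination of the reference speakers' supervectors:

```python
import numpy as np

def rsw_adapt(ref_supervectors, target_stats):
    """Estimate reference-speaker weights (here by least squares, an
    assumption of this sketch) and return the adapted mean supervector.

    ref_supervectors: (R, D) matrix, one mean supervector per reference speaker.
    target_stats:     (D,) vector of mean statistics from the adaptation data.
    """
    A = np.asarray(ref_supervectors, dtype=float).T   # (D, R)
    b = np.asarray(target_stats, dtype=float)         # (D,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)         # weights over references
    return A @ w, w

# toy example: the target speaker lies in the span of two reference speakers
refs = [[1.0, 0.0], [0.0, 1.0]]
adapted, w = rsw_adapt(refs, [0.3, 0.7])
```

With only a handful of free weights (one per reference speaker), very little adaptation data suffices, which is why RSW suits 5- to 10-second adaptation.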
doi: 10.1007/11939993_42
Automatic Construction of Regression Class Tree for MLLR Via Model-Based Hierarchical Clustering
Shih-Sian Cheng; Yeong-Yuh Xu; Hsin-Min Wang; Hsin-Chia Fu
In this paper, we propose a model-based hierarchical clustering algorithm that automatically builds a regression class tree for the well-known speaker adaptation technique, Maximum Likelihood Linear Regression (MLLR). When building a regression class tree, the mean vectors of the Gaussian components of the model set of a speaker-independent CDHMM-based speech recognition system are collected as the input data for clustering. The proposed algorithm comprises two stages. First, the input data (i.e., all the Gaussian mean vectors of the CDHMMs) is iteratively partitioned by a divisive hierarchical clustering strategy, and the Bayesian Information Criterion (BIC) is applied to determine the number of clusters (i.e., the base classes of the regression class tree). Then, the regression class tree is built by iteratively merging these base clusters using an agglomerative hierarchical clustering strategy, which also uses BIC as the merging criterion. We evaluated the proposed regression class tree construction algorithm on a Mandarin Chinese continuous speech recognition task. Compared to the regression class tree implementation in HTK, the proposed algorithm is more effective in building the regression class tree and can determine the number of regression classes automatically.
- Speech Adaptation/Normalization | Pp. 390-398
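The split/merge decisions in such a clustering hinge on a BIC comparison between modeling two clusters separately versus merged. A minimal sketch of that test for full-covariance Gaussians (the exact penalty weighting in the paper may differ):

```python
import numpy as np

def gaussian_loglik(X):
    """Log-likelihood of data X under a single ML-fitted full-covariance Gaussian."""
    n, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True) + 1e-6 * np.eye(d)  # ML covariance
    sign, logdet = np.linalg.slogdet(cov)
    # for the ML fit the Mahalanobis terms sum to n*d
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)

def delta_bic(X, Y, lam=1.0):
    """BIC difference: two separate Gaussians vs. one merged Gaussian.
    A positive value favors keeping the two clusters apart."""
    d = X.shape[1]
    n = len(X) + len(Y)
    p = d + d * (d + 1) / 2                    # parameters of one extra Gaussian
    sep = gaussian_loglik(X) + gaussian_loglik(Y)
    merged = gaussian_loglik(np.vstack([X, Y]))
    return sep - merged - 0.5 * lam * p * np.log(n)

# toy data: two well-separated clusters vs. two draws from the same cluster
rng = np.random.default_rng(0)
far_a, far_b = rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))
same_a, same_b = rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2))
```

Because the criterion is data-driven, the number of base classes falls out of the clustering itself rather than being fixed in advance, which is the advantage the abstract claims over the HTK tree.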
doi: 10.1007/11939993_43
A Minimum Boundary Error Framework for Automatic Phonetic Segmentation
Jen-Wei Kuo; Hsin-Min Wang
This paper presents a novel framework for HMM-based automatic phonetic segmentation that improves the accuracy of placing phone boundaries. In the framework, both training and segmentation approaches are proposed according to the minimum boundary error (MBE) criterion, which tries to minimize the expected boundary errors over a set of possible phonetic alignments. This framework is inspired by the recently proposed minimum phone error (MPE) training approach and the minimum Bayes risk decoding algorithm for automatic speech recognition. To evaluate the proposed MBE framework, we conduct automatic phonetic segmentation experiments on the TIMIT acoustic-phonetic continuous speech corpus. MBE segmentation with MBE-trained models can identify 80.53% of human-labeled phone boundaries within a tolerance of 10 ms, compared to 71.10% identified by conventional ML segmentation with ML-trained models. Moreover, by using the MBE framework, only 7.15% of automatically labeled phone boundaries have errors larger than 20 ms.
- General Topics in Speech Recognition | Pp. 399-409
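The decoding side of the MBE criterion can be sketched in a few lines: among candidate alignments, pick the one whose posterior-weighted boundary distance to all other candidates is smallest. This toy version (absolute frame displacement as the boundary error, a small explicit candidate list instead of a lattice) is an illustration, not the paper's implementation:

```python
def expected_boundary_error(candidate, alignments, posteriors):
    """Expected boundary error of `candidate` under the posterior distribution
    over alignments; each alignment is a tuple of boundary frame indices."""
    return sum(
        p * sum(abs(a, ) if False else abs(a - b) for a, b in zip(candidate, ali))
        for ali, p in zip(alignments, posteriors)
    )

def mbe_decode(alignments, posteriors):
    """Minimum Bayes risk pick: the candidate with minimum expected boundary error."""
    return min(alignments,
               key=lambda c: expected_boundary_error(c, alignments, posteriors))

# toy set of three candidate phone alignments with posteriors
cands = [(10, 20, 30), (12, 20, 30), (11, 20, 31)]
post = [0.5, 0.3, 0.2]
best = mbe_decode(cands, post)
```

This contrasts with conventional ML segmentation, which keeps the single most likely alignment and ignores how far its boundaries sit from other plausible alignments.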
doi: 10.1007/11939993_44
Advances in Mandarin Broadcast Speech Transcription at IBM Under the DARPA GALE Program
Yong Qin; Qin Shi; Yi Y. Liu; Hagai Aronowitz; Stephen M. Chu; Hong-Kwang Kuo; Geoffrey Zweig
This paper describes the technical and system building advances in the automatic transcription of Mandarin broadcast speech made at IBM in the first year of the DARPA GALE program. In particular, we discuss the application of minimum phone error (MPE) discriminative training and a new topic-adaptive language modeling technique. We present results on both the RT04 evaluation data and two larger community-defined test sets designed to cover both the broadcast news and the broadcast conversation domains. It is shown that with the described advances, the new transcription system achieves a 26.3% relative reduction in character error rate over our previous best-performing system, and is competitive with published numbers on these datasets. The results are further analyzed to give a comprehensive account of the relationship between the errors and the properties of the test data.
- Large Vocabulary Continuous Speech Recognition | Pp. 410-421
doi: 10.1007/11939993_45
Improved Large Vocabulary Continuous Chinese Speech Recognition by Character-Based Consensus Networks
Yi-Sheng Fu; Yi-Cheng Pan; Lin-shan Lee
Word-based consensus networks have been verified to be very useful in minimizing word error rates (WER) in large vocabulary continuous speech recognition for Western languages. By considering the special structure of the Chinese language, this paper points out that character-based rather than word-based consensus networks should work better for Chinese. This is verified by extensive experimental results also reported in the paper.
- Large Vocabulary Continuous Speech Recognition | Pp. 422-434
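The intuition is that Chinese words segment into characters, so competing word hypotheses can be aligned and voted on character by character. A deliberately simplified sketch (it assumes all hypotheses segment into the same number of characters, so each position forms one confusion bin; real consensus networks align a full lattice):

```python
from collections import defaultdict

def character_consensus(hyps):
    """Position-by-position character voting over recognizer hypotheses.

    hyps: list of (character_sequence, posterior) pairs, all of equal length
    in this sketch.
    """
    n = len(hyps[0][0])
    bins = [defaultdict(float) for _ in range(n)]
    for chars, post in hyps:
        for i, ch in enumerate(chars):
            bins[i][ch] += post           # accumulate posterior mass per bin
    return "".join(max(b, key=b.get) for b in bins)

# toy example: three hypotheses over a 3-character utterance
hyps = [("ABD", 0.4), ("ABC", 0.35), ("XBC", 0.25)]
best = character_consensus(hyps)
```

Note that the consensus output "ABC" differs from the 1-best hypothesis "ABD": the last position's mass (0.35 + 0.25 for C vs. 0.4 for D) flips the decision, which is exactly how consensus decoding reduces error rate below 1-best decoding.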
doi: 10.1007/11939993_46
All-Path Decoding Algorithm for Segmental Based Speech Recognition
Yun Tang; Wenju Liu; Bo Xu
In conventional speech processing, researchers adopt a dividable assumption: a speech utterance can be divided into non-overlapping feature sequences, with each segment representing an acoustic event or a label. The probability of a label sequence given an utterance is then approximated by the probability of the best segmentation of the utterance for that label sequence. In reality, however, the feature sequences of acoustic events may partially overlap, especially for neighboring phonemes within a syllable, and the best-segmentation approximation further reinforces the distortion introduced by the dividable assumption. In this paper, we propose an all-path decoding algorithm that fuses the information obtained from different segmentations (or paths) without obvious extra computation, so that the weakness of the dividable assumption can be alleviated. Our experiments show that the new decoding algorithm effectively improves system performance on tasks with heavy insertion and deletion errors.
- Large Vocabulary Continuous Speech Recognition | Pp. 435-444
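The core contrast can be made concrete with a small dynamic program: scoring a label sequence by summing over every segmentation of the utterance (log-sum-exp) instead of keeping only the best one (max). This is a generic illustration of the all-path idea under a user-supplied segment scorer, not the paper's decoder:

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sequence_logprob(seg_logprob, n_frames, labels, all_path=True):
    """Log-score of `labels` over an utterance of n_frames frames.

    seg_logprob(label, start, end) scores one segment covering [start, end).
    all_path=True sums over every segmentation; all_path=False keeps only
    the best segmentation (the conventional approximation).
    """
    combine = logsumexp if all_path else max
    NEG = float("-inf")
    # dp[k][t]: score of the first k labels covering frames [0, t)
    dp = [[NEG] * (n_frames + 1) for _ in range(len(labels) + 1)]
    dp[0][0] = 0.0
    for k, lab in enumerate(labels, 1):
        for t in range(k, n_frames + 1):
            scores = [dp[k - 1][s] + seg_logprob(lab, s, t)
                      for s in range(k - 1, t) if dp[k - 1][s] > NEG]
            if scores:
                dp[k][t] = combine(scores)
    return dp[len(labels)][n_frames]
```

With a uniform per-frame score, two labels over three frames admit two segmentations; the all-path score aggregates both, so it always dominates the best-path score while reusing the same dynamic-programming table.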
doi: 10.1007/11939993_47
Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models
Huanliang Wang; Yao Qian; Frank Soong; Jian-Lai Zhou; Jiqing Han
Tone plays an important lexical role in spoken tonal languages like Mandarin Chinese. In this paper we propose a two-pass search strategy for improving tonal syllable recognition performance. In the first pass, instantaneous F0 information is employed along with corresponding cepstral information in a 2-stream HMM based decoding. The F0 stream, which incorporates both discrete voiced/unvoiced information and the continuous F0 contour, is modeled with a multi-space distribution. With the first-pass decoding alone, we recently reported a 24% relative reduction in tonal syllable recognition errors on a Mandarin Chinese database [5]. In the second pass, F0 information over a longer, horizontal time span is used to build explicit tone models for rescoring the lattice generated in the first pass. Experimental results on the same Mandarin database show that an additional 8% relative reduction in tonal syllable recognition errors is obtained by the second-pass search, i.e., lattice rescoring with enhanced tone models.
- Large Vocabulary Continuous Speech Recognition | Pp. 445-453
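The second pass reduces, in essence, to adding a weighted tone-model score to each first-pass lattice hypothesis and re-ranking. A toy sketch (hypotheses listed explicitly instead of a lattice; the weight and score shapes are assumptions of this sketch):

```python
def rescore(hypotheses, tone_score, weight=0.3):
    """Second-pass rescoring: add a weighted tone-model log score to each
    first-pass hypothesis and return the best re-ranked syllable sequence.

    hypotheses: list of (tonal_syllable_sequence, first_pass_logscore).
    tone_score: maps a syllable sequence to a tone-model log score.
    """
    rescored = [(syls, s + weight * tone_score(syls)) for syls, s in hypotheses]
    return max(rescored, key=lambda x: x[1])[0]

# toy lattice: two hypotheses differing only in tone; the first pass slightly
# prefers the wrong tone, and the explicit tone model flips the decision
hyps = [(("ma1",), -1.0), (("ma3",), -1.2)]
tone_lm = {("ma1",): -2.0, ("ma3",): 0.0}
best = rescore(hyps, tone_lm.get)
```

Keeping tone modeling in a separate rescoring pass lets the tone models look at longer F0 spans than the frame-synchronous first pass can.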
doi: 10.1007/11939993_48
On Using Entropy Information to Improve Posterior Probability-Based Confidence Measures
Tzan-Hwei Chen; Berlin Chen; Hsin-Min Wang
In this paper, we propose a novel approach that reduces the confidence error rate of traditional posterior probability-based confidence measures in large vocabulary continuous speech recognition systems. The method enhances the discriminability of confidence measures by applying entropy information to the posterior probability-based confidence measures of word hypotheses. The experiments conducted on the Chinese Mandarin broadcast news database MATBN show that entropy-based confidence measures outperform traditional posterior probability-based confidence measures. The relative reductions in the confidence error rate are 14.11% and 9.17% for experiments conducted on field reporter speech and interviewee speech, respectively.
- Large Vocabulary Continuous Speech Recognition | Pp. 454-463
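One simple way to fold entropy into a posterior-based confidence measure, shown here purely as an illustration (the paper's exact combination may differ): keep the top posterior when the distribution over competing hypotheses is peaked, and discount it when the distribution is flat.

```python
import math

def entropy_confidence(posteriors):
    """Scale the top posterior by one minus the normalized entropy of the
    hypothesis distribution: peaked distributions keep their confidence,
    flat (ambiguous) ones are penalized."""
    top = max(posteriors)
    h = -sum(p * math.log(p) for p in posteriors if p > 0)
    h_max = math.log(len(posteriors)) if len(posteriors) > 1 else 1.0
    return top * (1.0 - h / h_max)

peaked = entropy_confidence([0.97, 0.01, 0.01, 0.01])  # clear winner
flat = entropy_confidence([0.4, 0.3, 0.2, 0.1])        # ambiguous slot
```

A raw posterior of 0.4 looks moderately confident on its own; the entropy term exposes that the competing hypotheses carry almost as much mass, which is the discriminability gain the abstract describes.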
doi: 10.1007/11939993_49
Vietnamese Automatic Speech Recognition: The FLaVoR Approach
Quan Vu; Kris Demuynck; Dirk Van Compernolle
Automatic speech recognition for languages in Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and language at the syllable level. This paper presents a new approach for recognizing those languages by exploiting information at the word level. The new approach, adapted from our FLaVoR architecture [1], consists of two layers. In the first layer, a pure acoustic-phonemic search generates a dense phoneme network enriched with metadata. In the second layer, word decoding is performed on the composition of a series of finite-state transducers (FSTs), combining various knowledge sources across sub-lexical, word-lexical and word-based language models. Experimental results on the Vietnamese Broadcast News corpus showed that our approach is both effective and flexible.
- Large Vocabulary Continuous Speech Recognition | Pp. 464-474
doi: 10.1007/11939993_50
Language Identification by Using Syllable-Based Duration Classification on Code-Switching Speech
Dau-cheng Lyu; Ren-yuan Lyu; Yuang-chin Chiang; Chun-nan Hsu
Many approaches to automatic spoken language identification (LID) are successful on monolingual speech, but LID on code-switching speech, which requires identifying at least two languages within one acoustic utterance, challenges these approaches. In [6], we successfully used a one-pass approach to recognize Chinese characters in Mandarin-Taiwanese code-switching speech. In this paper, we introduce a classification method (named syllable-based duration classification) based on three clues: the recognized common tonal syllable, its corresponding duration, and the speech signal, to identify the specific language in code-switching speech. Experimental results show that the performance of the proposed LID approach on code-switching speech is close to that of a parallel tonal syllable recognition LID system on monolingual speech.
- Multilingual Recognition and Identification | Pp. 475-484