Publications catalog - books
Chinese Spoken Language Processing: 5th International Symposium, ISCSLP 2006, Singapore, December 13-16, 2006, Proceedings
Qiang Huo ; Bin Ma ; Eng-Siong Chng ; Haizhou Li (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Data Mining and Knowledge Discovery; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-49665-6
Electronic ISBN
978-3-540-49666-3
Publisher
Springer Nature
Country of publication
China
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11939993_51
CCC Speaker Recognition Evaluation 2006: Overview, Methods, Data, Results and Perspective
Thomas Fang Zheng; Zhanjiang Song; Lihong Zhang; Michael Brasser; Wei Wu; Jing Deng
For the special session on speaker recognition at the 5th International Symposium on Chinese Spoken Language Processing (ISCSLP 2006), the Chinese Corpus Consortium (CCC), the session organizer, developed a speaker recognition evaluation (SRE) to serve as a platform for developers in this field to evaluate their speaker recognition systems using two databases provided by the CCC. This paper describes the objective of the evaluation, along with the methods and the data used. The results of the evaluation are also presented.
- Speaker Recognition and Characterization | Pp. 485-493
doi: 10.1007/11939993_52
The IIR Submission to CSLP 2006 Speaker Recognition Evaluation
Kong-Aik Lee; Hanwu Sun; Rong Tong; Bin Ma; Minghui Dong; Changhuai You; Donglai Zhu; Chin-Wei Eugene Koh; Lei Wang; Tomi Kinnunen; Eng-Siong Chng; Haizhou Li
This paper describes the design and implementation of a practical automatic speaker recognition system for the CSLP speaker recognition evaluation (SRE). The speaker recognition system is built upon four subsystems using speaker information from acoustic spectral features. In addition to the conventional spectral features, a novel temporal discrete cosine transform (TDCT) feature is introduced in order to capture long-term speech dynamics. The speaker information is modeled using two complementary speaker modeling techniques, namely, the Gaussian mixture model (GMM) and the support vector machine (SVM). The resulting subsystems are then integrated at the score level through a multilayer perceptron (MLP) neural network. Evaluation results confirm that the feature selection, classifier design, and fusion strategy are successful, giving rise to an effective speaker recognition system.
- Speaker Recognition and Characterization | Pp. 494-505
doi: 10.1007/11939993_53
A Novel Alternative Hypothesis Characterization Using Kernel Classifiers for LLR-Based Speaker Verification
Yi-Hsiang Chao; Hsin-Min Wang; Ruei-Chuan Chang
In a log-likelihood ratio (LLR)-based speaker verification system, the alternative hypothesis is usually ill-defined and hard to characterize a priori, since it should cover the space of all possible impostors. In this paper, we propose a new LLR measure in an attempt to characterize the alternative hypothesis in a more effective and robust way than conventional methods. This LLR measure can be further formulated as a non-linear discriminant classifier and solved by kernel-based techniques, such as the Kernel Fisher Discriminant (KFD) and Support Vector Machine (SVM). The results of experiments on two speaker verification tasks show that the proposed methods outperform classical LLR-based approaches.
- Speaker Recognition and Characterization | Pp. 506-517
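The classical LLR decision that the paper above takes as its starting point can be sketched compactly: score the test frames under the target speaker model and under the UBM, then threshold the difference. Below is a minimal NumPy sketch with toy diagonal-covariance GMMs; all models, dimensions, and data are invented for illustration.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a
    diagonal-covariance GMM given by (weights, means, variances)."""
    diff = X[:, None, :] - means[None, :, :]                     # (n, c, d)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)  # (c,)
    exponent = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + exponent
    m = log_comp.max(axis=1, keepdims=True)                      # log-sum-exp
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).mean())

def llr_score(X, target, ubm):
    """Classical LLR: log p(X | target) - log p(X | UBM)."""
    return gmm_loglik(X, *target) - gmm_loglik(X, *ubm)

rng = np.random.default_rng(0)
d = 2
ubm = (np.array([0.5, 0.5]),                     # toy "all speakers" model
       np.array([[-1.0, 0.0], [1.0, 0.0]]),
       np.ones((2, d)))
target = (np.array([0.5, 0.5]),                  # toy target speaker model
          np.array([[-1.0, 2.0], [1.0, 2.0]]),
          np.ones((2, d)))
X_test = rng.normal([0.0, 2.0], 1.0, size=(200, d))  # frames near the target
score = llr_score(X_test, target, ubm)
accept = score > 0.0                             # threshold at zero
```

In the paper's terms, the proposed kernel classifiers (KFD, SVM) replace this fixed LLR form with a learned non-linear discriminant.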
doi: 10.1007/11939993_54
Speaker Verification Using Complementary Information from Vocal Source and Vocal Tract
Nengheng Zheng; Ning Wang; Tan Lee; P. C. Ching
This paper describes a speaker verification system that uses two complementary acoustic features: Mel-frequency cepstral coefficients (MFCC) and wavelet octave coefficients of residues (WOCOR). While MFCC mainly characterizes the spectral envelope, or the formant structure of the vocal tract system, WOCOR aims at representing the spectro-temporal characteristics of the vocal source excitation. Speaker verification experiments carried out on the ISCSLP 2006 SRE database demonstrate the complementary contributions of MFCC and WOCOR to speaker verification. In particular, WOCOR performs even better than MFCC in the single-channel speaker verification task. Combining MFCC and WOCOR achieves higher performance than using MFCC alone in both the single- and cross-channel speaker verification tasks.
- Speaker Recognition and Characterization | Pp. 518-528
doi: 10.1007/11939993_55
ISCSLP SR Evaluation, UVA–CS_es System Description. A System Based on ANNs
Carlos E. Vivaracho
This paper describes the system used in the ISCSLP06 Speaker Recognition Evaluation, in the text-independent cross-channel speaker verification task. It is a discriminative Artificial Neural Network-based system that uses the Non-Target Incremental Learning method to select world representatives. Two different training strategies were followed: (i) using world-representative samples with the same channel type as the true model, and (ii) selecting the world representatives from a pool of samples without channel-type identification. The best results were achieved with the first alternative, though it introduces the additional problem of recognizing the channel type of the true model. The system used for this task is also described.
- Speaker Recognition and Characterization | Pp. 529-538
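The Non-Target Incremental Learning selection loop can be pictured, very roughly, as repeatedly adding the impostor samples the current model finds most confusable, then retraining. The nearest-centroid "model" below is a deliberately simplified stand-in for the paper's ANN, and every number and name here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
target = rng.normal(0.0, 1.0, size=(50, d))   # invented true-speaker data
pool = rng.normal(0.0, 2.0, size=(500, d))    # invented unlabeled world pool
centroid = target.mean(axis=0)

def closeness_to_target(samples):
    """Stand-in score: samples closer to the target model are the most
    confusable impostors, hence the most useful negatives."""
    return -np.linalg.norm(samples - centroid, axis=1)

selected, remaining = [], list(range(len(pool)))
for _ in range(3):                             # a few incremental rounds
    scores = closeness_to_target(pool[remaining])
    hardest = np.argsort(scores)[::-1][:20]    # 20 hardest negatives
    picked = [remaining[i] for i in hardest]
    selected.extend(picked)
    remaining = [i for i in remaining if i not in picked]
    # the real system would retrain the ANN on the enlarged negative
    # set here, changing which samples look hardest in the next round

world_representatives = pool[selected]
```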
doi: 10.1007/11939993_56
Evaluation of EMD-Based Speaker Recognition Using ISCSLP2006 Chinese Speaker Recognition Evaluation Corpus
Shingo Kuroiwa; Satoru Tsuge; Masahiko Kita; Fuji Ren
In this paper, we present the evaluation results of our proposed text-independent speaker recognition method based on the Earth Mover's Distance (EMD), using the corpus developed by the Chinese Corpus Consortium (CCC). EMD-based speaker recognition (EMD-SR) was originally designed for a distributed speaker identification system, in which feature vectors are compressed by vector quantization at a terminal and sent to a server that executes a pattern matching process. In this structure, we had to train speaker models using quantized data, so we adopted a non-parametric speaker model and EMD. In experiments on a Japanese speech corpus, EMD-SR showed higher robustness to quantized data than the conventional GMM technique, and it achieved higher accuracy than the GMM even when the data were not quantized. Hence, we took on this evaluation using EMD-SR. Since the identification tasks defined in the evaluation were on an open-set basis, we introduce a new speaker verification module in this paper. Evaluation results showed that EMD-SR achieves 99.3% accuracy in a closed-channel speaker identification task.
- Speaker Recognition and Characterization | Pp. 539-548
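The Earth Mover's Distance itself is easiest to see in one dimension, where the optimal transport has a closed form: the integral of the absolute difference between the two cumulative distributions (the paper applies EMD to vector-quantized codebooks in a high-dimensional feature space). A minimal sketch:

```python
import numpy as np

def emd_1d(p, q, positions):
    """EMD between two 1-D histograms p and q defined on the same
    sorted bin positions, via the CDF-difference closed form."""
    p = np.asarray(p, float); p = p / p.sum()
    q = np.asarray(q, float); q = q / q.sum()
    cdf_gap = np.cumsum(p - q)                 # CDF_p - CDF_q at each bin
    return float(np.sum(np.abs(cdf_gap[:-1]) * np.diff(positions)))

positions = np.array([0.0, 1.0, 2.0, 3.0])
d_far = emd_1d([1, 0, 0, 0], [0, 0, 0, 1], positions)   # all mass moves 3 units
d_same = emd_1d([1, 0, 0, 0], [1, 0, 0, 0], positions)  # identical histograms
```

Identical histograms give distance 0, and moving all mass across the bin range gives the full span, which is the "work to move earth" intuition behind the name.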
doi: 10.1007/11939993_57
Integrating Complementary Features with a Confidence Measure for Speaker Identification
Nengheng Zheng; P. C. Ching; Ning Wang; Tan Lee
This paper investigates the effectiveness of integrating complementary acoustic features for improved speaker identification performance. The complementary contributions to speaker identification of two acoustic features, the conventional vocal-tract-related MFCC features and the recently proposed vocal-source-related WOCOR features, are studied. An integrated system, which performs score-level fusion of MFCC and WOCOR with a confidence measure as the weighting parameter, is proposed to take full advantage of the complementarity between the two features. The confidence measure is derived from the speaker-discrimination power of MFCC and WOCOR in each individual identification trial, so as to give more weight to the feature with higher confidence in speaker discrimination. Experiments show that information fusion with such a confidence-based varying weight outperforms fusion with a pre-trained fixed weight in speaker identification.
- Speaker Recognition and Characterization | Pp. 549-557
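The general shape of such confidence-weighted score-level fusion can be sketched as follows. The confidence measure here, a squashed margin between the best and second-best speaker scores, is a simplified stand-in for the paper's measure, and all scores are invented.

```python
import numpy as np

def confidence(scores):
    """Per-trial discrimination confidence: margin between the best and
    second-best speaker scores, squashed into (0, 1)."""
    top2 = np.sort(scores)[::-1][:2]
    return 1.0 / (1.0 + np.exp(-(top2[0] - top2[1])))

def fuse(scores_a, scores_b):
    """Score-level fusion with a per-trial varying weight: the stream
    that separates its candidates better gets more weight."""
    ca, cb = confidence(scores_a), confidence(scores_b)
    w = ca / (ca + cb)
    return w * scores_a + (1.0 - w) * scores_b

# Toy trial with 4 enrolled speakers: the MFCC-like stream is confident,
# the WOCOR-like stream is nearly flat, so fusion leans on the former.
mfcc_scores = np.array([2.5, -1.0, -1.2, -0.8])
wocor_scores = np.array([0.1, 0.2, 0.0, 0.1])
fused = fuse(mfcc_scores, wocor_scores)
identified = int(np.argmax(fused))   # speaker 0 wins
```

A fixed-weight fusion would apply the same `w` to every trial; the point of the paper is that letting `w` vary with per-trial discrimination power works better.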
doi: 10.1007/11939993_58
Discriminative Transformation for Sufficient Adaptation in Text-Independent Speaker Verification
Hao Yang; Yuan Dong; Xianyu Zhao; Jian Zhao; Haila Wang
In conventional Gaussian Mixture Model-Universal Background Model (GMM-UBM) text-independent speaker verification applications, the discriminability between speaker models and the universal background model (UBM) is crucial to the system's performance. In this paper, we present a method based on heteroscedastic linear discriminant analysis (HLDA) that can enhance the discriminability between speaker models and the UBM. This technique aims to discriminate the individual Gaussian distributions of the feature space. After the discriminative transformation, the overlap between Gaussian distributions is reduced. As a result, some Gaussian components of a target speaker model can be adapted more sufficiently during Maximum a Posteriori (MAP) adaptation, and these components gain more discriminative capability over the UBM. Results on the NIST 2004 Speaker Recognition corpus show that this method provides significant performance improvements over the baseline system.
- Speaker Recognition and Characterization | Pp. 558-565
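For context, standard mean-only MAP adaptation interpolates each UBM mean toward the speaker's data in proportion to the soft count that component receives, so components that see little data barely move; this is the "insufficient adaptation" the proposed transform is meant to reduce. A toy sketch, with the relevance factor and all data invented:

```python
import numpy as np

def map_adapt_means(ubm_means, frames, resp, r=16.0):
    """Mean-only MAP adaptation: each UBM mean is interpolated toward
    the data mean of the frames softly assigned to it; the relevance
    factor r controls how much data is needed before a mean moves."""
    n_k = resp.sum(axis=0)                                   # soft counts (C,)
    data_means = (resp.T @ frames) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + r))[:, None]                       # adaptation weight
    return alpha * data_means + (1.0 - alpha) * ubm_means

ubm_means = np.zeros((2, 3))                # invented 2-component, 3-dim UBM
frames = np.full((100, 3), 2.0)             # speaker data centred at 2
resp = np.tile([1.0, 0.0], (100, 1))        # every frame hits component 0
adapted = map_adapt_means(ubm_means, frames, resp)
# component 0 is well observed and moves most of the way toward 2;
# component 1 is unobserved and stays exactly at the UBM mean
```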
doi: 10.1007/11939993_59
Fusion of Acoustic and Tokenization Features for Speaker Recognition
Rong Tong; Bin Ma; Kong-Aik Lee; Changhuai You; Donglai Zhu; Tomi Kinnunen; Hanwu Sun; Minghui Dong; Eng-Siong Chng; Haizhou Li
This paper describes our recent efforts in exploring effective discriminative features for speaker recognition. Recent research has indicated that the appropriate fusion of features is critical to improving the performance of speaker recognition systems. In this paper we describe our approaches for the NIST 2006 Speaker Recognition Evaluation. Our system integrates cepstral GMM modeling, cepstral SVM modeling, and tokenization at both the phone level and the frame level. Experimental results on both the NIST 2005 SRE corpus and the NIST 2006 SRE corpus are presented. The fused system achieved an 8.14% equal error rate on the 1conv4w-1conv4w test condition of the NIST 2006 SRE.
- Speaker Recognition and Characterization | Pp. 566-577
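The equal error rate quoted above is the operating point at which the false-acceptance rate equals the false-rejection rate. A simple threshold sweep over synthetic scores illustrates the computation:

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep the decision threshold over every observed score and
    return the point where false acceptance ~= false rejection."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, eer = 2.0, None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors wrongly accepted
        frr = np.mean(target_scores < t)      # targets wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

rng = np.random.default_rng(2)
genuine = rng.normal(2.0, 1.0, 1000)    # synthetic genuine-trial scores
impostor = rng.normal(0.0, 1.0, 1000)   # synthetic impostor-trial scores
eer = equal_error_rate(genuine, impostor)
```

For two unit-variance normal score distributions two standard deviations apart, the EER lands around 16%; real systems like the one above report far better separation.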
doi: 10.1007/11939993_60
Contextual Maximum Entropy Model for Edit Disfluency Detection of Spontaneous Speech
Jui-Feng Yeh; Chung-Hsien Wu; Wei-Yen Wu
This study describes an approach to edit disfluency detection based on maximum entropy (ME) modeling with contextual features, for rich transcription of spontaneous speech. The contextual features comprise word-level, chunk-level, and sentence-level features for edit disfluency modeling. Due to the problem of data sparsity, word-level features are determined according to the taxonomy of the primary features of the words defined in HowNet. Chunk-level features are extracted based on the mutual information of the words. Sentence-level features are identified according to verbs and their corresponding features. The Improved Iterative Scaling (IIS) algorithm is employed to estimate the optimal weights in the maximum entropy models. Evaluations are conducted on edit disfluency detection and interruption point detection. Experimental results show that the proposed method outperforms the DF-gram approach.
- Spoken Language Understanding | Pp. 578-589
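A maximum entropy model over binary indicator features is mathematically a logistic regression on those features. The paper fits it with Improved Iterative Scaling, but plain gradient ascent on the same convex conditional log-likelihood gives a compact, runnable sketch; the features and labels below are synthetic stand-ins for the contextual features described above.

```python
import numpy as np

# Synthetic stand-in for binary contextual features (word/chunk/sentence
# indicators) and a binary decision such as interruption-point detection.
rng = np.random.default_rng(3)
n, d = 400, 6
X = rng.integers(0, 2, size=(n, d)).astype(float)
true_w = np.array([2.0, -1.5, 0.0, 1.0, 0.0, -2.0])   # invented weights
y = ((1.0 / (1.0 + np.exp(-(X @ true_w)))) > rng.random(n)).astype(float)

# Maximum-entropy training = maximizing conditional log-likelihood.
# (IIS is the paper's optimizer; gradient ascent solves the same objective.)
w = np.zeros(d)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # model P(y=1 | x)
    w += 0.5 * (X.T @ (y - p)) / n            # gradient of mean log-likelihood

p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
accuracy = float(np.mean((p_hat > 0.5) == (y > 0.5)))
```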