Publications catalogue - books
Advances in Nonlinear Speech Processing: International Conference on Non-Linear Speech Processing, NOLISP 2007 Paris, France, May 22-25, 2007 Revised Selected Papers
Mohamed Chetouani ; Amir Hussain ; Bruno Gas ; Maurice Milgram ; Jean-Luc Zarader (eds.)
Conference: International Conference on Nonlinear Speech Processing (NOLISP), Paris, France, May 22-25, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Theory of Computation; Artificial Intelligence (incl. Robotics); Language Translation and Linguistics; Biometrics; Computer Appl. in Arts and Humanities; Image Processing and Computer Vision
Availability
Detected institution | Publication year | Browse | Download | Request
---|---|---|---|---
Not detected | 2007 | SpringerLink | |
Information
Resource type:
books
Print ISBN
978-3-540-77346-7
Electronic ISBN
978-3-540-77347-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
Phase-Based Methods for Voice Source Analysis
Christophe d’Alessandro; Baris Bozkurt; Boris Doval; Thierry Dutoit; Nathalie Henrich; Vu Ngoc Tuan; Nicolas Sturmel
Voice source analysis is an important but difficult issue for speech processing. In this talk, three aspects of voice source analysis recently developed at LIMSI (Orsay, France) and FPMs (Mons, Belgium) are discussed. In the first part, time-domain and spectral-domain modelling of glottal flow signals are presented. It is shown that the glottal flow can be modelled as an anticausal filter (maximum phase) before the glottal closing, and as a causal filter (minimum phase) after the glottal closing. In the second part, taking advantage of this phase structure, causal and anticausal components of the speech signal are separated according to the location in the Z-plane of the zeros of the Z-Transform (ZZT) of the windowed signal. This method is useful for voice source parameter analysis and source-tract deconvolution. Results of a comparative evaluation of the ZZT and linear prediction for source/tract separation are reported. In the third part, glottal closing instant detection using the phase of the wavelet transform is discussed. A method based on the lines of maximum phase in the time-scale plane is proposed. This method is compared to EGG for robust glottal closing instant analysis.
- Non-Linear and Non-Conventional Techniques | Pp. 1-27
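The ZZT separation described above rests on a simple fact: the Z-transform of an N-sample windowed frame is, up to a power of z, a polynomial whose coefficients are the samples themselves, so its zeros can be found by polynomial root finding. A minimal sketch (the helper `zzt_split` is hypothetical; real use would centre an appropriate analysis window on a glottal closure instant):

```python
import numpy as np

def zzt_split(frame):
    """Zeros of the Z-Transform (ZZT) of a windowed frame.

    For x[0..N-1], X(z) = sum_n x[n] z^(-n) = z^(-(N-1)) * P(z), where
    P(z) = sum_n x[n] z^(N-1-n); the zeros of X(z) are the roots of the
    polynomial whose coefficients are the frame samples. Roots inside
    the unit circle belong to the causal (minimum-phase) component,
    roots outside to the anticausal (maximum-phase) component.
    """
    roots = np.roots(frame)
    inside = roots[np.abs(roots) < 1.0]    # causal / minimum-phase part
    outside = roots[np.abs(roots) >= 1.0]  # anticausal / maximum-phase part
    return inside, outside

# A decaying (minimum-phase) exponential has all its zeros inside the
# unit circle; a growing (maximum-phase) one has them all outside.
inside, outside = zzt_split(0.5 ** np.arange(8))
inside2, outside2 = zzt_split(2.0 ** np.arange(8))
```

Grouping the roots by modulus and rebuilding two polynomials from each group yields the separated causal and anticausal signal components.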
Some Experiments in Audio-Visual Speech Processing
G. Chollet; R. Landais; T. Hueber; H. Bredin; C. Mokbel; P. Perrot; L. Zouari
Natural speech is produced by the vocal organs of a particular talker. The acoustic features of the speech signal must therefore be correlated with the movements of the articulators (lips, jaw, tongue, velum, ...). For instance, hearing-impaired people (and not only them) improve their understanding of speech by lip reading. This chapter is an overview of audiovisual speech processing, with emphasis on some experiments concerning recognition, speaker verification, indexing, and corpus-based synthesis from tongue and lip movements.
- Non-Linear and Non-Conventional Techniques | Pp. 28-56
Exploiting Nonlinearity in Adaptive Signal Processing
Phebe Vayanos; Mo Chen; Beth Jelfs; Danilo P. Mandic
Quantitative performance criteria for the analysis of machine learning architectures and algorithms have long been established. However, qualitative performance criteria, e.g., nonlinearity assessment, are still emerging. To that end, we employ some recent developments in signal characterisation and derive criteria for the assessment of the changes in the nature of the processed signal. In addition, we also propose a novel online method for tracking the system nonlinearity. A comprehensive set of simulations in both the linear and nonlinear settings and their combination supports the analysis.
- Non-Linear and Non-Conventional Techniques | Pp. 57-77
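One way to realise online nonlinearity tracking of this flavour (an illustrative sketch under assumed design choices, not the authors' exact criterion) is to run a linear NLMS predictor and a quadratically extended one in parallel and monitor their error-power ratio:

```python
import numpy as np

def nonlinearity_ratio(x, order=4, mu=0.5, win=200):
    """Parallel NLMS predictors: one linear, one augmented with a
    quadratic (Volterra-style) expansion of the same past samples.
    The smoothed ratio of their squared prediction errors acts as a
    crude online nonlinearity indicator: values well above 1 suggest
    the linear model is under-fitting the signal."""
    N = len(x)
    n_quad = order + order * (order + 1) // 2
    w_lin, w_qd = np.zeros(order), np.zeros(n_quad)
    e_lin, e_qd = np.zeros(N), np.zeros(N)
    iu = np.triu_indices(order)
    for n in range(order, N):
        past = x[n - order:n][::-1]
        phi = np.concatenate([past, np.outer(past, past)[iu]])
        e_lin[n] = x[n] - w_lin @ past
        e_qd[n] = x[n] - w_qd @ phi
        # normalised LMS updates for step-size robustness
        w_lin += mu * e_lin[n] * past / (1e-8 + past @ past)
        w_qd += mu * e_qd[n] * phi / (1e-8 + phi @ phi)
    k = np.ones(win) / win
    return np.convolve(e_lin ** 2, k, "same") / (np.convolve(e_qd ** 2, k, "same") + 1e-12)

r = nonlinearity_ratio(np.random.default_rng(0).standard_normal(2000))
```

The window length `win`, prediction `order`, and step size `mu` are assumptions for illustration; any hybrid linear/nonlinear adaptive pair could play the same role.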
Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation
Xavi Gonzalvo; Ignasi Iriondo; Joan Claudi Socoró; Francesc Alías; Carlos Monzo
Hidden Markov Model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled jointly. The aim of this work is to describe a Spanish HMM-TTS system that uses an external machine learning technique to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus, whose units were clustered using contextual factors based on the Spanish language. The results show that the CBR-based F0 estimation is capable of improving on the HMM-based baseline when synthesizing non-declarative short sentences, while duration accuracy is similar for the CBR and HMM systems.
- Speech Synthesis | Pp. 78-85
Objective and Subjective Evaluation of an Expressive Speech Corpus
Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Francesc Alías
This paper presents the validation of the expressiveness of an acted oral corpus produced for use in speech synthesis. Firstly, an objective validation was conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test was performed with a subset of utterances. The relationship between the objective and subjective evaluations is analyzed, and the conclusions obtained can be useful for improving the subsequent steps of expressive speech synthesis.
- Speech Synthesis | Pp. 86-94
On the Usefulness of Linear and Nonlinear Prediction Residual Signals for Speaker Recognition
Marcos Faundez-Zanuy
This paper compares the identification rates of a speaker recognition system using several parameterizations, with special emphasis on the residual signal obtained from linear and nonlinear predictive analysis. It is found that the residual signal is still useful even when using a high dimensional linear predictive analysis. On the other hand, it is shown that the residual signal of a nonlinear analysis contains less useful information, even for a prediction order of 10, than the linear residual signal. This shows the inability of the linear models to cope with nonlinear dependences present in speech signals, which are useful for recognition purposes.
- Speaker Recognition | Pp. 95-104
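The linear-prediction residual studied above is obtained by inverse filtering the signal with its own predictor. A minimal numpy sketch using the autocorrelation method (frame handling and windowing omitted for brevity):

```python
import numpy as np

def lpc_residual(x, order=10):
    """LP residual via the autocorrelation method: solve the normal
    equations R a = r for the predictor coefficients a, then inverse
    filter with A(z) = 1 - sum_k a_k z^(-k)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.convolve(x, np.concatenate(([1.0], -a)))[:len(x)]

# Synthetic AR(2) signal: inverse filtering should roughly recover the
# white excitation, so the residual variance is far below the signal's.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + e[n]
res = lpc_residual(x)
```

For real speech this residual is not white, which is exactly why the paper finds speaker-dependent information left in it.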
Multi Filter Bank Approach for Speaker Verification Based on Genetic Algorithm
Christophe Charbuillet; Bruno Gas; Mohamed Chetouani; Jean Luc Zarader
Speaker recognition systems usually need a feature extraction stage that aims at obtaining the best signal representation. State-of-the-art speaker verification systems are based on cepstral features such as MFCC, LFCC or LPCC. In this article, we propose a feature extraction system based on the combination of three feature extractors adapted to the speaker verification task. A genetic algorithm is used to optimise the complementarity of the features. This optimisation consists of designing a set of three nonlinearly scaled filter banks. Experiments are carried out using a state-of-the-art speaker verification system. Results show that the proposed method significantly improves system performance on the 2005 NIST SRE database. Furthermore, the obtained feature extractors show the importance of some specific spectral information for speaker verification.
- Speaker Recognition | Pp. 105-113
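A nonlinearly scaled filter bank of the kind such a GA searches over can be parameterised very simply. This sketch uses a single power-law warping exponent per bank (an illustrative assumption; the paper's banks are more general, and the GA would evolve these parameters against verification fitness):

```python
import numpy as np

def warped_filterbank(n_filters, n_bins, alpha):
    """Triangular filter bank whose band edges follow the power-law
    warping f ∝ t ** alpha for t in [0, 1]: alpha = 1 gives a linear
    scale, alpha > 1 crowds filters toward low frequencies (mel-like),
    alpha < 1 toward high frequencies."""
    edges = (np.arange(n_filters + 2) / (n_filters + 1)) ** alpha * (n_bins - 1)
    fb = np.zeros((n_filters, n_bins))
    k = np.arange(n_bins)
    for i in range(n_filters):
        l, c, r = edges[i], edges[i + 1], edges[i + 2]
        rise = (k - l) / (c - l + 1e-9)   # ascending slope up to the centre
        fall = (r - k) / (r - c + 1e-9)   # descending slope after the centre
        fb[i] = np.clip(np.minimum(rise, fall), 0.0, None)
    return fb

warped = warped_filterbank(10, 257, alpha=2.0)   # low-frequency emphasis
linear = warped_filterbank(10, 257, alpha=1.0)   # uniform baseline
```

Evolving a small vector of such shape parameters per bank, with verification error as the fitness, reproduces the spirit of the optimisation.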
Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features
Lara Stoll; Joe Frankel; Nikki Mirghafori
We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.
- Speaker Recognition | Pp. 114-123
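The two feature-extraction styles above differ only in which layer of the MLP is read out; both can be sketched with a single forward pass (the weights below are random stand-ins for a trained network, and the layer sizes are arbitrary):

```python
import numpy as np

def mlp_features(x, W1, b1, W2, b2, use_hidden=True):
    """One-hidden-layer MLP forward pass. Speaker-MLP style: return the
    hidden activations (features for an SVM). Tandem style: return the
    log output posteriors (features for a GMM)."""
    h = np.tanh(x @ W1 + b1)                 # hidden activations
    if use_hidden:
        return h
    z = h @ W2 + b2
    z = z - z.max(axis=-1, keepdims=True)    # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return np.log(p)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((13, 20)), np.zeros(20)  # 13 cepstra -> 20 hidden
W2, b2 = rng.standard_normal((20, 5)), np.zeros(5)    # 20 hidden -> 5 targets
x = rng.standard_normal((4, 13))                      # 4 frames of features
svm_feats = mlp_features(x, W1, b1, W2, b2, use_hidden=True)
gmm_feats = mlp_features(x, W1, b1, W2, b2, use_hidden=False)
```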
Perceptron-Based Class Verification
Michael Gerber; Tobias Kaufmann; Beat Pfister
We present a method to use multilayer perceptrons (MLPs) for a verification task, i.e. to verify whether two vectors are from the same class or not. In tests with synthetic data we show that the verification MLPs are almost optimal from a Bayesian point of view. With speech data we show that verification MLPs generalize well enough to be deployed even for classes that were not seen during training.
- Speaker Recognition | Pp. 124-131
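Training data for such a verification MLP can be built from labelled vectors as balanced same/different-class pairs; a sketch of one plausible setup (the helper and its balancing scheme are illustrative assumptions, not the authors' exact procedure):

```python
import numpy as np

def make_pairs(X, y, n_pairs, seed=0):
    """Sample verification training pairs: each input is the
    concatenation of two vectors, the target is 1 if they share a
    class label and 0 otherwise, balanced half and half."""
    rng = np.random.default_rng(seed)
    cap = n_pairs // 2 + n_pairs % 2
    pairs, targets = [], []
    while len(pairs) < n_pairs:
        i, j = rng.integers(0, len(X), size=2)
        same = int(y[i] == y[j])
        if targets.count(same) < cap:        # keep the two targets balanced
            pairs.append(np.concatenate([X[i], X[j]]))
            targets.append(same)
    return np.array(pairs), np.array(targets)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))             # toy feature vectors
y = rng.integers(0, 5, size=50)              # toy class labels
P, t = make_pairs(X, y, 20)
```

Because the target depends only on label agreement, the trained network can score pairs drawn from classes never seen in training, which is the property the paper exploits.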
Manifold Learning-Based Feature Transformation for Phone Classification
Andrew Errity; John McKenna; Barry Kirkpatrick
This study aims to investigate approaches for low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may exist on a low dimensional manifold nonlinearly embedded in high dimensional space. A number of manifold learning techniques have been developed in recent years that attempt to discover this type of underlying geometric structure. The manifold learning techniques locally linear embedding and Isomap are considered in this study. The low dimensional representations produced by applying these techniques to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to conventional MFCC features and those transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning are capable of yielding higher classification accuracy than these baseline features. The best phone classification accuracy in general is demonstrated by feature transformation with Isomap.
- Speech Recognition | Pp. 132-141
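The comparison above can be reproduced in miniature with scikit-learn, substituting a synthetic swiss-roll manifold for MFCC vectors (purely illustrative; the paper's experiments use TIMIT features):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# Stand-in for MFCC vectors: points on a nonlinear 2-D manifold
# embedded in 3-D space.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Nonlinear embeddings (manifold learning) vs. a linear baseline.
iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                             random_state=0).fit_transform(X)
pca = PCA(n_components=2).fit_transform(X)
```

Feeding each 2-D representation to the same classifier, as the paper does with phone labels, isolates the effect of the dimensionality-reduction step.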