Publications catalogue - books

Advances in Nonlinear Speech Processing: International Conference on Non-Linear Speech Processing, NOLISP 2007 Paris, France, May 22-25, 2007 Revised Selected Papers

Mohamed Chetouani ; Amir Hussain ; Bruno Gas ; Maurice Milgram ; Jean-Luc Zarader (eds.)

Conference: International Conference on Nonlinear Speech Processing (NOLISP), Paris, France, May 22-25, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Theory of Computation; Artificial Intelligence (incl. Robotics); Language Translation and Linguistics; Biometrics; Computer Appl. in Arts and Humanities; Image Processing and Computer Vision

Availability

Detected institution: not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-77346-7

Electronic ISBN

978-3-540-77347-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Phase-Based Methods for Voice Source Analysis

Christophe d’Alessandro; Baris Bozkurt; Boris Doval; Thierry Dutoit; Nathalie Henrich; Vu Ngoc Tuan; Nicolas Sturmel

Voice source analysis is an important but difficult issue for speech processing. In this talk, three aspects of voice source analysis recently developed at LIMSI (Orsay, France) and FPMs (Mons, Belgium) are discussed. In the first part, time-domain and spectral-domain modelling of glottal flow signals are presented. It is shown that the glottal flow can be modelled as an anticausal (maximum-phase) filter before the glottal closing, and as a causal (minimum-phase) filter after the glottal closing. In the second part, taking advantage of this phase structure, causal and anticausal components of the speech signal are separated according to the location in the Z-plane of the zeros of the Z-Transform (ZZT) of the windowed signal. This method is useful for voice source parameter analysis and source-tract deconvolution. Results of a comparative evaluation of the ZZT and linear prediction for source/tract separation are reported. In the third part, glottal closing instant detection using the phase of the wavelet transform is discussed. A method based on the lines of maximum phase in the time-scale plane is proposed. This method is compared to electroglottography (EGG) for robust glottal closing instant analysis.

- Non-Linear and Non-Conventional Techniques | Pp. 1-27
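The root-splitting step behind the ZZT decomposition can be sketched in a few lines of numpy. This is a toy illustration on a synthetic frame, not the authors' implementation, and it omits the source-tract deconvolution and evaluation stages:

```python
import numpy as np

def zzt_split(frame):
    """Zeros of the Z-Transform (ZZT) of a windowed frame, split by radius.

    For a frame x[0..N-1], X(z) equals the polynomial with coefficients
    x[0..N-1] up to a factor z^-(N-1), so its zeros are that polynomial's
    roots.  In the ZZT view, roots inside the unit circle belong to the
    causal (minimum-phase) part and roots outside to the anticausal
    (maximum-phase) part.
    """
    roots = np.roots(frame)
    inside = roots[np.abs(roots) < 1.0]
    outside = roots[np.abs(roots) >= 1.0]
    return inside, outside

# Toy frame: a decaying (causal-like) plus a growing (anticausal-like)
# exponential, Hamming-windowed; no real speech is used here.
n = np.arange(64)
frame = (0.9 ** n + 0.2 * 1.05 ** n) * np.hamming(64)
inside, outside = zzt_split(frame)
```

The Hamming window is chosen because its nonzero endpoints keep the polynomial degree at 63; a Hanning window would zero the leading coefficient.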

Some Experiments in Audio-Visual Speech Processing

G. Chollet; R. Landais; T. Hueber; H. Bredin; C. Mokbel; P. Perrot; L. Zouari

Natural speech is produced by the vocal organs of a particular talker. The acoustic features of the speech signal must therefore be correlated with the movements of the articulators (lips, jaw, tongue, velum, ...). For instance, hearing-impaired people (and not only them) improve their understanding of speech by lip reading. This chapter is an overview of audio-visual speech processing, with emphasis on some experiments concerning recognition, speaker verification, indexing and corpus-based synthesis from tongue and lip movements.

- Non-Linear and Non-Conventional Techniques | Pp. 28-56

Exploiting Nonlinearity in Adaptive Signal Processing

Phebe Vayanos; Mo Chen; Beth Jelfs; Danilo P. Mandic

Performance criteria for the analysis of machine learning architectures and algorithms have long been established. However, criteria for assessing the nature of the processed signal, such as its degree of nonlinearity, are still emerging. To that end, we employ some recent developments in signal characterisation and derive criteria for the assessment of changes in the nature of the processed signal. In addition, we propose a novel online method for tracking the system nonlinearity. A comprehensive set of simulations in linear, nonlinear and mixed settings supports the analysis.

- Non-Linear and Non-Conventional Techniques | Pp. 57-77

Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation

Xavi Gonzalvo; Ignasi Iriondo; Joan Claudi Socoró; Francesc Alías; Carlos Monzo

Hidden Markov Model based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled jointly. The aim of this work is to describe a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus, whose units were clustered using contextual factors based on the Spanish language. The results show that the CBR-based F0 estimation improves on the HMM-based baseline when synthesizing non-declarative short sentences, while duration accuracy is similar for the CBR and HMM systems.

- Speech Synthesis | Pp. 78-85
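The retrieve-and-reuse cycle at the heart of case-based F0 estimation can be sketched minimally; the context vectors and stored contours below are invented stand-ins for the paper's linguistic features:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy case base: each case maps a contextual feature vector (a stand-in for
# the linguistic context of a unit) to a stored F0 contour in Hz.
contexts = rng.standard_normal((50, 6))
contours = 120.0 + 20.0 * rng.standard_normal((50, 10))

def cbr_f0(query, contexts, contours, k=3):
    """Case-based F0 estimation: retrieve the k nearest stored cases
    and average their contours (a minimal retrieve-and-reuse cycle)."""
    dist = np.linalg.norm(contexts - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return contours[nearest].mean(axis=0)

# A query identical to a stored context with k=1 returns that case's contour.
est = cbr_f0(contexts[7], contexts, contours, k=1)
```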

Objective and Subjective Evaluation of an Expressive Speech Corpus

Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Francesc Alías

This paper presents a validation of the expressiveness of an acted oral corpus produced for use in speech synthesis. Firstly, an objective validation was conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test was performed with a subset of utterances. The relationship between the objective and subjective evaluations is analyzed, and the conclusions obtained can be useful for improving subsequent steps in expressive speech synthesis.

- Speech Synthesis | Pp. 86-94
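An objective validation pipeline of this general kind (utterance-level statistics of prosodic tracks fed to a classifier) might be sketched as follows; the toy tracks, the two "emotion" classes and the nearest-centroid classifier are assumptions for illustration, not the techniques of the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def prosodic_stats(f0, energy):
    """Utterance-level statistics of prosodic tracks: mean, standard
    deviation and range of F0 and of energy (a 6-dimensional feature)."""
    return np.array([f0.mean(), f0.std(), np.ptp(f0),
                     energy.mean(), energy.std(), np.ptp(energy)])

def make_utterance(excited):
    """Toy prosody: 'excited' speech has higher, more variable F0 and energy."""
    n = 100
    f0 = (220.0 if excited else 120.0) \
        + (40.0 if excited else 10.0) * rng.standard_normal(n)
    energy = (0.7 if excited else 0.3) + 0.1 * rng.standard_normal(n)
    return prosodic_stats(f0, energy)

train = np.array([make_utterance(i % 2 == 1) for i in range(40)])
train_labels = np.arange(40) % 2
centroids = np.array([train[train_labels == c].mean(axis=0) for c in (0, 1)])

test_feats = np.array([make_utterance(i % 2 == 1) for i in range(20)])
dists = np.linalg.norm(test_feats[:, None] - centroids[None], axis=2)
pred = np.argmin(dists, axis=1)
accuracy = (pred == np.arange(20) % 2).mean()
```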

On the Usefulness of Linear and Nonlinear Prediction Residual Signals for Speaker Recognition

Marcos Faundez-Zanuy

This paper compares the identification rates of a speaker recognition system using several parameterizations, with special emphasis on the residual signal obtained from linear and nonlinear predictive analysis. It is found that the residual signal is still useful even when a high-dimensional linear predictive analysis is used. On the other hand, it is shown that the residual signal of a nonlinear analysis contains less useful information than the linear residual signal, even for a prediction order of 10. This shows the inability of the linear models to cope with nonlinear dependences present in speech signals, which are useful for recognition purposes.

- Speaker Recognition | Pp. 95-104
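The linear half of such an analysis can be sketched with numpy: estimate order-10 LP coefficients from the autocorrelation normal equations and form the residual. The AR(2) test signal is synthetic, and the paper's nonlinear predictors are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def lp_coefficients(x, order):
    """Linear prediction coefficients a[1..p] from the autocorrelation
    normal equations R a = r (solved directly for this small order)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(x, a):
    """Residual e[n] = x[n] - sum_k a[k] x[n-k]."""
    pred = np.zeros_like(x)
    for k in range(1, len(a) + 1):
        pred[k:] += a[k - 1] * x[:-k]
    return x - pred

# Synthetic AR(2) signal; an order-10 linear predictor should leave a
# residual close to the driving noise.
N = 4000
x = np.zeros(N)
w = 0.1 * rng.standard_normal(N)
for t in range(2, N):
    x[t] = 1.3 * x[t - 1] - 0.6 * x[t - 2] + w[t]

a = lp_coefficients(x, 10)
e = lp_residual(x, a)
```

For speech one would apply this frame by frame; the point here is only that the residual carries the part of the signal the linear model cannot predict.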

Multi Filter Bank Approach for Speaker Verification Based on Genetic Algorithm

Christophe Charbuillet; Bruno Gas; Mohamed Chetouani; Jean Luc Zarader

Speaker recognition systems usually need a feature extraction stage that aims at obtaining the best signal representation. State-of-the-art speaker verification systems are based on cepstral features such as MFCC, LFCC or LPCC. In this article, we propose a feature extraction system based on the combination of three feature extractors adapted to the speaker verification task. A genetic algorithm is used to optimise the complementarity of the features. This optimisation consists of designing a set of three nonlinearly scaled filter banks. Experiments are carried out using a state-of-the-art speaker verification system. Results show that the proposed method significantly improves system performance on the 2005 NIST SRE database. Furthermore, the obtained feature extractors show the importance of some specific spectral information for speaker verification.

- Speaker Recognition | Pp. 105-113
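A toy genetic algorithm of the general kind used here (tournament selection, uniform crossover, Gaussian mutation, elitism) over filter-bank edge frequencies might look as follows. The fitness function is an invented stand-in, since the paper's actual fitness is speaker-verification performance on NIST SRE data:

```python
import numpy as np

rng = np.random.default_rng(2)

K = 8  # number of free filter-bank edge frequencies, normalised to (0, 1)

# Stand-in fitness: reward edge sets close to a hidden mel-like warping.
target = np.log1p(np.linspace(1.0, 10.0, K)) / np.log(11.0)

def fitness(edges):
    return -np.mean((np.sort(edges) - target) ** 2)

def evolve(pop_size=40, gens=60, p_mut=0.2, sigma=0.05):
    pop = rng.uniform(0.0, 1.0, (pop_size, K))
    history = []
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in pop])
        history.append(scores.max())
        # binary tournament selection
        pairs = rng.integers(0, pop_size, (pop_size, 2))
        winners = np.where(scores[pairs[:, 0]] >= scores[pairs[:, 1]],
                           pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        # uniform crossover between consecutive parents
        mask = rng.random((pop_size, K)) < 0.5
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # Gaussian mutation, clipped to the valid frequency range
        mut = rng.random((pop_size, K)) < p_mut
        children = np.clip(children + mut * rng.normal(0.0, sigma, (pop_size, K)),
                           0.0, 1.0)
        children[0] = pop[np.argmax(scores)]   # elitism: keep the best individual
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    history.append(scores.max())
    return pop[np.argmax(scores)], history

best, history = evolve()
```

Elitism makes the best score non-decreasing across generations, which is what the test checks.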

Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features

Lara Stoll; Joe Frankel; Nikki Mirghafori

We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.

- Speaker Recognition | Pp. 114-123
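The two feature-extraction modes, output activations (Tandem-style) versus hidden activations (Speaker-MLP style), can be sketched with an untrained single-hidden-layer MLP in numpy; the layer sizes and random weights below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(5)

def mlp_features(X, W1, b1, W2, b2, mode="hidden"):
    """Forward pass of a one-hidden-layer MLP over feature vectors X.

    mode="hidden": return hidden-layer activations (as in the Speaker-MLP,
    fed to an SVM); mode="output": return softmax output activations
    (Tandem-style, fed to a GMM).  Weights here are random, i.e. untrained.
    """
    h = np.tanh(X @ W1 + b1)
    if mode == "hidden":
        return h
    z = h @ W2 + b2
    z -= z.max(axis=1, keepdims=True)          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

d_in, d_hid, d_out = 13, 32, 46   # toy sizes: 13 cepstra -> 46 phone targets
W1 = 0.1 * rng.standard_normal((d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = 0.1 * rng.standard_normal((d_hid, d_out)); b2 = np.zeros(d_out)

X = rng.standard_normal((100, d_in))          # stand-in for cepstral frames
hid = mlp_features(X, W1, b1, W2, b2, "hidden")
post = mlp_features(X, W1, b1, W2, b2, "output")
```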

Perceptron-Based Class Verification

Michael Gerber; Tobias Kaufmann; Beat Pfister

We present a method for using multilayer perceptrons (MLPs) in a verification task, i.e. to verify whether two vectors are from the same class or not. In tests with synthetic data we showed that the verification MLPs are almost optimal from a Bayesian point of view. With speech data we showed that verification MLPs generalize well, such that they can also be deployed for classes which were not seen during training.

- Speaker Recognition | Pp. 124-131
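A minimal sketch of the pairwise-verification idea, assuming (this is not the authors' architecture) a single logistic unit trained on the absolute feature difference of each pair, with synthetic two-class data:

```python
import numpy as np

rng = np.random.default_rng(6)

def make_pairs(n_pairs, dim=5, sep=4.0):
    """Same-class pairs share a class mean; different-class pairs do not.
    Returns |x1 - x2| features and same/different labels."""
    means = np.stack([np.zeros(dim), np.full(dim, sep)])
    X1, X2, y = [], [], []
    for i in range(n_pairs):
        same = i % 2 == 0
        c1 = rng.integers(2)
        c2 = c1 if same else 1 - c1
        X1.append(means[c1] + rng.standard_normal(dim))
        X2.append(means[c2] + rng.standard_normal(dim))
        y.append(1.0 if same else 0.0)
    return np.abs(np.array(X1) - np.array(X2)), np.array(y)

def train_logistic(F, y, lr=0.1, iters=300):
    """Single logistic unit trained by full-batch gradient descent."""
    w = np.zeros(F.shape[1]); b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        g = p - y
        w -= lr * F.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

F_tr, y_tr = make_pairs(400)
w, b = train_logistic(F_tr, y_tr)
F_te, y_te = make_pairs(200)
acc = (((F_te @ w + b) > 0) == (y_te > 0.5)).mean()
```

Same-class pairs produce small coordinate-wise differences and different-class pairs large ones, so even this single unit verifies pairs well above chance.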

Manifold Learning-Based Feature Transformation for Phone Classification

Andrew Errity; John McKenna; Barry Kirkpatrick

This study aims to investigate approaches for low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may exist on a low dimensional manifold nonlinearly embedded in high dimensional space. A number of manifold learning techniques have been developed in recent years that attempt to discover this type of underlying geometric structure. The manifold learning techniques locally linear embedding (LLE) and Isomap are considered in this study. The low dimensional representations produced by applying these techniques to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to conventional MFCC features and those transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning are capable of yielding higher classification accuracy than these baseline features. The best phone classification accuracy in general is demonstrated by feature transformation with Isomap.

- Speech Recognition | Pp. 132-141
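A compact, self-contained version of the Isomap pipeline (kNN graph, geodesic distances, classical MDS) applied to a synthetic 1-D manifold; in practice one would use sklearn.manifold.Isomap, and the helix below merely stands in for MFCC vectors lying on a curved manifold:

```python
import numpy as np

def isomap(X, n_neighbors=6, n_components=2):
    """Minimal Isomap: kNN graph -> geodesic distances -> classical MDS.
    Illustrative only; not tuned or optimised for real data."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # Euclidean distances
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]      # k nearest neighbours
    for i in range(n):
        G[i, nn[i]] = d[i, nn[i]]
        G[nn[i], i] = d[i, nn[i]]                         # symmetrise the graph
    for k in range(n):                                    # Floyd-Warshall geodesics
        G = np.minimum(G, G[:, k, None] + G[None, k, :])
    # classical MDS on the squared geodesic distance matrix
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# A 1-D manifold (a helix) nonlinearly embedded in 3-D.
t = np.linspace(0, 4 * np.pi, 120)
X = np.stack([np.cos(t), np.sin(t), 0.2 * t], axis=1)
Y = isomap(X, n_neighbors=6, n_components=2)
```

Because geodesic distances along the helix grow with arc length, the first Isomap coordinate recovers the underlying 1-D parameter (up to sign), which is exactly the kind of structure the study seeks in speech features.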