Advances in Nonlinear Speech Processing: International Conference on Non-Linear Speech Processing, NOLISP 2007 Paris, France, May 22-25, 2007 Revised Selected Papers

Mohamed Chetouani ; Amir Hussain ; Bruno Gas ; Maurice Milgram ; Jean-Luc Zarader (eds.)

Conference: International Conference on Nonlinear Speech Processing (NOLISP). Paris, France. May 22, 2007 - May 25, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Theory of Computation; Artificial Intelligence (incl. Robotics); Language Translation and Linguistics; Biometrics; Computer Appl. in Arts and Humanities; Image Processing and Computer Vision

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-77346-7

Electronic ISBN

978-3-540-77347-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

An Efficient VAD Based on a Generalized Gaussian PDF

Oscar Pernía; Juan M. Górriz; Javier Ramírez; Carlos G. Puntonet; Ignacio Turias

Emerging applications of wireless speech communication demand increasing levels of performance in adverse noise environments, together with the design of speech processing systems with a high response rate. This is a serious obstacle to meeting the demands of modern applications, and these systems therefore often need a noise reduction algorithm working in combination with a precise voice activity detector (VAD). This paper presents a new VAD for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving multiple and correlated observations (MCO). An analysis of the methodology for 2 and 3 observations shows the robustness of the proposed approach by means of a clear reduction of the classification error as the number of observations is increased. The algorithm is also compared to different VAD methods, including the G.729, AMR and AFE standards, as well as recently reported algorithms, showing a sustained advantage in speech/non-speech detection accuracy and speech recognition performance.

- Exploitation of non-linear techniques | Pp. 246-254
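
As a rough illustration of the likelihood ratio test described in the abstract above, the following sketch decides speech/non-speech from several consecutive observations. It is not the paper's algorithm: it assumes a zero-mean Gaussian model (not the generalized Gaussian PDF of the title) and treats the observations as independent (the paper models them as correlated). The function names, variances, and threshold are hypothetical.

```python
import math

def log_likelihood_ratio(x, noise_var, speech_var):
    """Log-LR of one zero-mean Gaussian observation.

    H0: noise only (variance noise_var).
    H1: speech plus noise (variance noise_var + speech_var).
    Assumption: plain Gaussian model, chosen only to keep the sketch short.
    """
    var1 = noise_var + speech_var
    return (0.5 * math.log(noise_var / var1)
            + 0.5 * x * x * (1.0 / noise_var - 1.0 / var1))

def multi_observation_vad(samples, noise_var, speech_var, n_obs=3, threshold=0.0):
    """Flag each observation as speech (True) or non-speech (False).

    The decision averages the log-LRs of the current and up to n_obs - 1
    previous observations. Assumption: observations are independent, so
    their log-LRs simply add.
    """
    llrs = [log_likelihood_ratio(x, noise_var, speech_var) for x in samples]
    decisions = []
    for i in range(len(llrs)):
        start = max(0, i - n_obs + 1)
        window = llrs[start:i + 1]
        decisions.append(sum(window) / len(window) > threshold)
    return decisions

# Three low-amplitude (noise-like) values followed by three large ones.
samples = [0.1, -0.1, 0.05, 3.0, -2.5, 2.8]
decisions = multi_observation_vad(samples, noise_var=1.0, speech_var=8.0, n_obs=3)
```

Averaging over more observations smooths isolated outliers, which is the intuition behind the reduction in classification error that the abstract reports as the number of observations grows.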

Estimating the Dispersion of the Biometric Glottal Signature in Continuous Speech

Pedro Gómez; Agustín Álvarez; Luis Miguel Mazaira; Roberto Fernández; Victoria Rodellar; Rafael Martínez; Cristina Muñoz

The biometric voice signature may be derived from voice as a whole, or from the separate vocal tract and glottal source after inverse filtering extraction. This last approach has been used by the authors in earlier work, where it was shown that the biometric signature obtained from the glottal source provides a good description of speaker characteristics such as gender or age. In the present work, more accurate estimations of the singularities in the power spectral density of the glottal source are obtained using an adaptive version of the inverse filtering to carefully follow the spectral changes in continuous speech. Therefore, the resulting biometric signature gives a better description of intra-speaker variability. Typical male and female samples chosen from a database of 100 normal speakers are used to determine certain gender-specific patterns useful in validating pathology treatment. The low intra-speaker variability present in the biometric signature makes it suitable for speaker identification applications as well as for pathology detection and other fields of speech characterization.

- Exploitation of non-linear techniques | Pp. 255-262
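
The inverse filtering mentioned in the abstract above can be sketched, in its simplest non-adaptive form, as linear prediction followed by filtering the signal with the inverse of the estimated all-pole model. This is an assumption-laden illustration and not the authors' adaptive method: it uses the standard autocorrelation method with the Levinson-Durbin recursion, and all function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error
    return a[1:]

def inverse_filter(x, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k-1] * x[n - k]."""
    residual = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x[n] - pred)
    return residual

# Toy signal (assumption): impulse response of a one-pole filter with pole 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs = levinson_durbin(autocorr(signal, 2), order=2)
residual = inverse_filter(signal, coeffs)
```

On this toy signal the estimated model recovers the pole and the residual collapses back to the exciting impulse. In the glottal setting the residual plays the role of the source estimate, and an adaptive scheme would re-estimate the coefficients as the spectrum changes.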

Trajectory Mixture Density Networks with Multiple Mixtures for Acoustic-Articulatory Inversion

Korin Richmond

We have previously proposed a trajectory model based on a mixture density network (MDN) trained with target variables augmented with dynamic features, together with an algorithm for estimating maximum likelihood trajectories which respects the constraints between those features. In this paper, we have extended that model to allow diagonal covariance matrices and multiple mixture components in the trajectory MDN (TMDN) output probability density functions. We have evaluated this extended model on an inversion mapping task and found that the trajectory model works well, outperforming smoothing of equivalent trajectories using low-pass filtering. Increasing the number of mixture components in the TMDN improves results further.

- Exploitation of non-linear techniques | Pp. 263-272
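
To make the "multiple mixture components with diagonal covariance" idea in the abstract above concrete, the sketch below evaluates the negative log-likelihood of a target vector under a mixture of diagonal-covariance Gaussians, which is the kind of quantity an MDN is typically trained to minimise. It covers only the output density: the network itself, the dynamic features, and the trajectory estimation algorithm are omitted, and the function names are hypothetical.

```python
import math

def diag_gaussian_logpdf(y, mean, var):
    """Log density of y under a Gaussian with diagonal covariance (per-dimension var)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))

def mixture_nll(y, weights, means, variances):
    """Negative log-likelihood of y under a mixture of diagonal Gaussians.

    weights, means, variances stand in for the outputs of an MDN
    (assumption: weights are positive and sum to one).
    """
    logs = [math.log(w) + diag_gaussian_logpdf(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp for numerical stability
    return -(top + math.log(sum(math.exp(l - top) for l in logs)))

# A one-to-many target: the value 2.0 is explained better by a two-component
# mixture centred at -2 and 2 than by a single Gaussian centred at 0.
single = mixture_nll([2.0], [1.0], [[0.0]], [[1.0]])
double = mixture_nll([2.0], [0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
```

The comparison between `single` and `double` shows why extra components can help in inversion mapping, where one acoustic input may correspond to several plausible articulatory configurations.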

Application of Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Speech

Aitor Álvarez; Idoia Cearreta; Juan Miguel López; Andoni Arruti; Elena Lazkano; Basilio Sierra; Nestor Garay

The study of emotions in human-computer interaction is a growing research area. Focusing on automatic emotion recognition, work is being performed in order to achieve good results, particularly in speech and facial gesture recognition. In this paper we present a study performed to analyze the validity of different machine learning techniques in the area of automatic speech emotion recognition. Using a bilingual affective database, different speech parameters have been calculated for each audio recording. Then, several machine learning techniques have been applied to evaluate their usefulness in speech emotion recognition, including techniques based on evolutionary algorithms (EDA) to select speech feature subsets that optimize the automatic emotion recognition success rate. The experimental results show an appreciable increase in the success rate.

- Exploitation of non-linear techniques | Pp. 273-281
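
The feature subset selection described in the abstract above can be illustrated with one of the simplest estimation of distribution algorithms, the univariate marginal distribution algorithm (UMDA). This is a sketch under assumptions, not the configuration used in the paper: the fitness function here is a hypothetical toy stand-in for a classifier's success rate, and the population sizes and probability bounds are arbitrary.

```python
import random

def umda_feature_selection(fitness, n_features, pop_size=60, n_select=20,
                           generations=30, seed=0):
    """Search for a binary feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    probs = [0.5] * n_features            # P(feature i is selected)
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of masks from the current marginals.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        ranked = sorted(population, key=fitness, reverse=True)
        top_score = fitness(ranked[0])
        if top_score > best_score:
            best_mask, best_score = ranked[0], top_score
        # Re-estimate the marginals from the best individuals,
        # clamped so that no feature is ever fixed permanently.
        elite = ranked[:n_select]
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / n_select))
                 for i in range(n_features)]
    return best_mask, best_score

# Hypothetical fitness: features 0, 3 and 5 help, every other feature hurts.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    return sum(1.0 if i in INFORMATIVE else -0.5
               for i, bit in enumerate(mask) if bit)

mask, score = umda_feature_selection(toy_fitness, n_features=8)
```

In a real experiment `toy_fitness` would be replaced by the cross-validated success rate of a classifier trained on the selected speech parameters, which is far more expensive to evaluate and is the reason a sampling-based search is used instead of exhaustive enumeration.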