Publications catalog - books
Advances in Nonlinear Speech Processing: International Conference on Non-Linear Speech Processing, NOLISP 2007 Paris, France, May 22-25, 2007 Revised Selected Papers
Mohamed Chetouani ; Amir Hussain ; Bruno Gas ; Maurice Milgram ; Jean-Luc Zarader (eds.)
Conference: International Conference on Nonlinear Speech Processing (NOLISP). Paris, France. May 22, 2007 - May 25, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Theory of Computation; Artificial Intelligence (incl. Robotics); Language Translation and Linguistics; Biometrics; Computer Appl. in Arts and Humanities; Image Processing and Computer Vision
Availability
Institution detected | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-77346-7
Electronic ISBN
978-3-540-77347-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
An Efficient VAD Based on a Generalized Gaussian PDF
Oscar Pernía; Juan M. Górriz; Javier Ramírez; Carlos G. Puntonet; Ignacio Turias
Emerging applications of wireless speech communication demand increasing levels of performance in noise-adverse environments, together with the design of high-response-rate speech processing systems. This is a serious obstacle to meeting the demands of modern applications, and therefore these systems often need a noise reduction algorithm working in combination with a precise voice activity detector (VAD). This paper presents a new VAD for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving Multiple and correlated Observations (MCO). An analysis of the methodology for = {2,3} shows the robustness of the proposed approach by means of a clear reduction of the classification error as the number of observations is increased. The algorithm is also compared to different VAD methods, including the G.729, AMR and AFE standards, as well as recently reported algorithms, showing a sustained advantage in speech/non-speech detection accuracy and speech recognition performance.
- Exploitation of non-linear techniques | Pp. 246-254
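As a rough illustration of the likelihood ratio test idea mentioned in the abstract above, the following minimal sketch sums per-observation log-likelihood ratios and compares the total with a threshold. It is not the paper's method: it assumes zero-mean Gaussian models and independent observations, whereas the paper uses a generalized Gaussian PDF and correlated observations. The function names, variances and threshold are hypothetical.

```python
import math


def log_likelihood_ratio(observations, noise_var, speech_var):
    """Log-likelihood ratio of 'speech + noise' versus 'noise only'.

    Assumed models (illustrative only, not taken from the paper):
      H0 (noise only):     x ~ N(0, noise_var)
      H1 (speech present): x ~ N(0, noise_var + speech_var)
    Observations are treated as independent, so their log ratios add up.
    """
    var0 = noise_var
    var1 = noise_var + speech_var
    llr = 0.0
    for x in observations:
        # log p(x | H1) - log p(x | H0) for zero-mean Gaussians
        llr += 0.5 * math.log(var0 / var1) + 0.5 * x * x * (1.0 / var0 - 1.0 / var1)
    return llr


def is_speech(observations, noise_var, speech_var, threshold=0.0):
    """Decide 'speech' when the summed log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(observations, noise_var, speech_var) > threshold


loud = [3.0, -2.5, 2.8]    # high-energy observations
quiet = [0.1, -0.2, 0.05]  # low-energy observations
print(is_speech(loud, noise_var=1.0, speech_var=4.0))
print(is_speech(quiet, noise_var=1.0, speech_var=4.0))
```

Using several observations rather than one averages out the randomness of any single frame, which is the intuition behind the reduced classification error the abstract reports as the number of observations grows.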
Estimating the Dispersion of the Biometric Glottal Signature in Continuous Speech
Pedro Gómez; Agustín Álvarez; Luis Miguel Mazaira; Roberto Fernández; Victoria Rodellar; Rafael Martínez; Cristina Muñoz
The biometric voice signature may be derived from voice as a whole, or from the separate vocal tract and glottal source after inverse filtering extraction. This last approach has been used by the authors in earlier work, where it has been shown that the biometric signature obtained from the glottal source provides a good description of the speaker’s characteristics, such as gender or age. In the present work more accurate estimations of the singularities in the power spectral density of the glottal source are obtained using an adaptive version of the inverse filtering to carefully follow the spectral changes in continuous speech. Therefore the resulting biometric signature gives a better description of intra-speaker variability. Typical male and female samples chosen from a database of 100 normal speakers are used to determine certain gender-specific patterns useful in pathology treatment availing. The low intra-speaker variability present in the biometric signature makes it suitable for speaker identification applications as well as for pathology detection and other fields of speech characterization.
- Exploitation of non-linear techniques | Pp. 255-262
Trajectory Mixture Density Networks with Multiple Mixtures for Acoustic-Articulatory Inversion
Korin Richmond
We have previously proposed a trajectory model which is based on a mixture density network (MDN) trained with target variables augmented with dynamic features together with an algorithm for estimating maximum likelihood trajectories which respects the constraints between those features. In this paper, we have extended that model to allow diagonal covariance matrices and multiple mixture components in the trajectory MDN output probability density functions. We have evaluated this extended model on an inversion mapping task and found the trajectory model works well, outperforming smoothing of equivalent trajectories using low-pass filtering. Increasing the number of mixture components in the TMDN improves results further.
- Exploitation of non-linear techniques | Pp. 263-272
Application of Feature Subset Selection Based on Evolutionary Algorithms for Automatic Emotion Recognition in Speech
Aitor Álvarez; Idoia Cearreta; Juan Miguel López; Andoni Arruti; Elena Lazkano; Basilio Sierra; Nestor Garay
The study of emotions in human-computer interaction is a growing research area. Focusing on automatic emotion recognition, work is being performed in order to achieve good results, particularly in speech and facial gesture recognition. In this paper we present a study performed to analyze the validity of different machine learning techniques in the area of automatic speech emotion recognition. Using a bilingual affective database, different speech parameters have been calculated for each audio recording. Then, several machine learning techniques have been applied to evaluate their usefulness in speech emotion recognition, including techniques based on evolutionary algorithms (EDA) to select speech feature subsets that optimize the automatic emotion recognition success rate. The experimental results achieved show a representative increase in the success rate.
- Exploitation of non-linear techniques | Pp. 273-281
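To make the idea of EDA-based feature subset selection in the abstract above more concrete, here is a minimal sketch of a univariate estimation-of-distribution algorithm over binary feature masks. It is an assumption-laden toy, not the authors' setup: the abstract does not say which EDA variant, classifier or features were used, so the function `umda_feature_selection`, the toy fitness function and the set of "useful" feature indices are all hypothetical. In practice the fitness would be a classifier's validation accuracy on the selected features.

```python
import random


def umda_feature_selection(n_features, fitness, pop_size=40, n_elite=10,
                           generations=30, seed=0):
    """Search for a feature mask that maximizes `fitness` (hypothetical helper).

    Each generation samples masks from independent per-feature probabilities,
    keeps the best `n_elite` masks, and re-estimates the probabilities from them.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start with every feature equally likely
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        population = [[rng.random() < p for p in probs] for _ in range(pop_size)]
        elite = sorted(population, key=fitness, reverse=True)[:n_elite]
        top_score = fitness(elite[0])
        if top_score > best_score:
            best_mask, best_score = elite[0], top_score
        # Re-estimate marginals from the elite, clamped to keep some exploration
        probs = [min(0.95, max(0.05, sum(m[i] for m in elite) / n_elite))
                 for i in range(n_features)]
    return best_mask, best_score


# Toy stand-in for classifier accuracy: features 0, 3 and 5 are assumed useful,
# and every selected feature carries a small cost.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


mask, score = umda_feature_selection(8, toy_fitness)
print([i for i, keep in enumerate(mask) if keep], score)
```

The per-feature cost in the toy fitness mimics the usual preference for smaller subsets; with a real classifier, that pressure typically comes from overfitting on redundant features.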