Publications catalog - books
Verbal and Nonverbal Communication Behaviours: COST Action 2102 International Workshop, Vietri sul Mare, Italy, March 29-31, 2007, Revised Selected and Invited Papers
Anna Esposito; Marcos Faundez-Zanuy; Eric Keller; Maria Marinaro (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Communication Studies; Artificial Intelligence (incl. Robotics); User Interfaces and Human Computer Interaction; Information Systems Applications (incl. Internet); Computers and Society; Computers and Education
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-76441-0
Electronic ISBN
978-3-540-76442-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
On the Use of NonVerbal Speech Sounds in Human Communication
Nick Campbell
Recent work investigating the interaction of the speech signal with the meaning of the verbal content has revealed interactions not yet modelled in either speech recognition technology or contemporary linguistic science. In this paper we describe paralinguistic speech features that co-exist alongside linguistic content, propose a model of their function and usage, and discuss methods for incorporating them into real-world applications and devices.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 117-128
Speech Spectrum Envelope Modeling
Robert Vích; Martin Vondra
A new method for speech analysis is described. It is based on finding the extremes in the magnitude spectrum of a speech frame, followed by interpolation. The interpolated spectrum envelope can be used for speech synthesis and also for the estimation of the excitation and background noise. In this contribution the proposed method is illustrated on a noisy speech frame and compared with the LPC spectrum and with spectra obtained by classical and hidden cepstral smoothing.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 129-137
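As a concrete illustration of the envelope-by-interpolation idea summarized above, the following minimal Python sketch picks the local maxima of a frame's magnitude spectrum and interpolates between them. The windowing, peak criterion, and cubic interpolation are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import interp1d

def spectral_envelope(frame, n_fft=1024):
    """Smooth spectrum envelope from interpolated local maxima of the
    magnitude spectrum (illustrative sketch, not the paper's method)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    log_mag = 20 * np.log10(mag + 1e-12)            # work in dB
    peaks, _ = find_peaks(log_mag)                  # spectral extremes
    # include the band edges so the interpolant covers the full spectrum
    knots = np.unique(np.concatenate(([0], peaks, [log_mag.size - 1])))
    return interp1d(knots, log_mag[knots], kind="cubic")(
        np.arange(log_mag.size))                    # envelope in dB
```

The residual between `log_mag` and the returned envelope then gives a rough handle on the excitation and background noise, in the spirit of the abstract.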
Using Prosody in Fixed Stress Languages for Improvement of Speech Recognition
György Szaszák; Klára Vicsi
In this chapter we examine the use of prosodic features in speech recognition, with special attention paid to agglutinating, fixed-stress languages. Current knowledge on the exploitation of speech prosody is reviewed in the introduction. The prosodic features used, the acoustic-prosodic pre-processing, and the segmentation into prosodic units are presented in detail. We use the expression “prosodic unit” to differentiate these units from prosodic phrases, which are usually longer. We trained an HMM-based prosodic segmenter relying on the fundamental frequency and intensity of speech. The output of this prosodic segmenter is used for N-best lattice rescoring, in parallel with a simplified bigram language model, in a continuous speech recognizer in order to improve recognition performance. Experiments on Hungarian show a WER reduction of about 4% using a simple lattice rescoring. The performance of the prosodic segmenter is also investigated in comparison with our earlier experiments.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 138-149
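The two acoustic-prosodic streams the segmenter relies on, fundamental frequency and intensity, can be extracted per frame along the following lines. The autocorrelation-based f0 estimator, frame sizes, and voicing threshold below are illustrative assumptions, not the authors' pre-processing.

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_ms=32, hop_ms=10,
                      f0_min=60.0, f0_max=400.0):
    """Per-frame (f0, intensity) features of the kind an HMM-based
    prosodic segmenter could be trained on (illustrative sketch)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    lo, hi = int(sr / f0_max), int(sr / f0_min)     # candidate lag range
    feats = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hanning(frame)
        intensity = 10 * np.log10(np.mean(x ** 2) + 1e-12)
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0 = sr / lag if ac[lag] > 0.3 * ac[0] else 0.0   # 0 marks unvoiced
        feats.append((f0, intensity))
    return np.array(feats)                          # shape: (n_frames, 2)
```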
Single-Channel Noise Suppression by Wavelets in Spectral Domain
Zdeněk Smékal; Petr Sysel
The paper describes the design of a new single-channel speech enhancement method that employs the wavelet transform. Signal decomposition is conventionally performed in the time domain, with noise removed on the individual decomposition levels using thresholding techniques; here, the wavelet transform is instead applied in the spectral domain. The basis is the method of spectral subtraction, which is suitable for real-time implementation because of its simplicity. The greatest problem in spectral subtraction is obtaining a trustworthy noise estimate, in particular when the noise is non-stationary. Using the wavelet transform, a more accurate power spectral density can be obtained even for non-stationary noise. Listening tests and SNR measurements yield satisfactory results in comparison with previously reported approaches.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 150-164
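Since the backbone of the method is spectral subtraction, a minimal sketch may help situate the wavelet step: below, a noise magnitude estimate is first smoothed by wavelet thresholding and then subtracted from a noisy frame's magnitude spectrum. The pywt-based smoother, db4 wavelet, and spectral floor are stand-ins chosen for illustration, not the paper's design.

```python
import numpy as np
import pywt

def wavelet_smooth(x, wavelet="db4", level=3):
    """Soft-threshold the wavelet coefficients of a spectral sequence;
    a stand-in for the paper's spectral-domain wavelet processing."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # robust noise scale
    thr = sigma * np.sqrt(2 * np.log(x.size))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:x.size]

def spectral_subtraction(frame, noise_mag, n_fft=512, floor=0.02):
    """Magnitude spectral subtraction with a wavelet-smoothed noise
    estimate; the noisy phase is reused unchanged."""
    spec = np.fft.rfft(frame, n_fft)
    mag, phase = np.abs(spec), np.angle(spec)
    clean = np.maximum(mag - wavelet_smooth(noise_mag), floor * mag)
    return np.fft.irfft(clean * np.exp(1j * phase), n_fft)[:len(frame)]
```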
Voice Source Change During Fundamental Frequency Variation
Peter J. Murphy
Prosody refers to certain properties of the speech signal, including audible changes in pitch, loudness, and syllable length. The acoustic manifestation of prosody is typically measured in terms of fundamental frequency (f0), amplitude, and duration; these three cues have formed the basis for extensive studies of prosody in natural speech. The present work seeks to go beyond this level of representation and to examine additional factors that arise from the underlying production mechanism. For example, intonation is studied with reference to the f0 contour; however, changing f0 requires changes in the laryngeal configuration that in turn alter the glottal flow parameters. These glottal changes may serve as important psychoacoustic markers in addition to (or in conjunction with) the f0 targets. The present work examines changes in open quotient with f0 in connected speech, using the electroglottogram and the volume velocity at the lips. This preliminary study suggests that individual differences may exist in the glottal changes accompanying a particular f0 variation.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 165-173
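The open quotient measure can be illustrated with a simple cycle-wise computation on an electroglottogram trace. The fixed contact threshold and nominal-period segmentation below are simplifying assumptions, not the study's measurement protocol.

```python
import numpy as np

def open_quotient(egg, sr, f0_hint=120.0, contact_thresh=0.35):
    """Crude per-cycle open quotient from an EGG signal: the fraction
    of each glottal cycle spent below a contact threshold
    (illustrative sketch using a fixed nominal period)."""
    period = int(sr / f0_hint)                  # nominal cycle length
    oq = []
    for start in range(0, len(egg) - period, period):
        cycle = egg[start:start + period]
        lo, hi = cycle.min(), cycle.max()
        if hi - lo < 1e-9:
            continue                            # skip silent stretches
        norm = (cycle - lo) / (hi - lo)         # 0 = open, 1 = full contact
        oq.append(float(np.mean(norm < contact_thresh)))
    return np.array(oq)                         # one value per cycle
```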
A Gesture-Based Concept for Speech Movement Control in Articulatory Speech Synthesis
Bernd J. Kröger; Peter Birkholz
An articulatory speech synthesizer comprising a three-dimensional vocal tract model and a gesture-based concept for the control of articulatory movements is introduced and discussed in this paper. A modular learning concept based on speech perception is outlined for the creation of gestural control rules. The learning concept draws on sensory feedback information for articulatory states produced by the model itself, and on auditory and visual information from speech items produced by external speakers. The complete model (control module and synthesizer) is capable of producing high-quality synthetic speech signals and introduces a scheme for modeling the natural speech production and speech perception processes.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 174-189
A Novel Psychoacoustically Motivated Multichannel Speech Enhancement System
Amir Hussain; Simone Cifani; Stefano Squartini; Francesco Piazza; Tariq Durrani
The ubiquitous noise reduction / speech enhancement problem has attracted increasing interest in recent years, owing both to progress in microphone-array systems and to the successful introduction of perceptual models. In the last decade, several methods incorporating psychoacoustic criteria into single-channel speech enhancement systems have been proposed; however, very few works exploit these features in the multichannel case. In this paper we present a novel psychoacoustically motivated multichannel speech enhancement system that exploits spatial information together with psychoacoustic concepts. The proposed framework offers enhanced flexibility, allowing for a multitude of perceptually based post-filtering solutions. Moreover, the system has been devised on a frame-by-frame basis to facilitate real-time implementation. Objective performance measures and informal subjective listening tests, for speech signals corrupted with real car and F-16 cockpit noise, demonstrate the enhanced performance of the proposed system in terms of musical residual noise reduction compared with conventional multichannel techniques.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 190-199
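To place the perceptual post-filter in context, the sketch below shows a generic frame-by-frame multichannel pipeline: time-aligned channel spectra are averaged (delay-and-sum) and a simple Wiener-style gain is applied where a psychoacoustically weighted post-filter would go. This is a generic baseline for orientation only, not the authors' system.

```python
import numpy as np

def enhance_frame(aligned_frames, noise_psd, n_fft=512, gain_floor=0.05):
    """One frame of a generic multichannel enhancer: average the
    time-aligned channel spectra, then apply a Wiener-style gain --
    the slot a perceptual weighting rule would occupy."""
    specs = np.fft.rfft(aligned_frames, n_fft, axis=1)   # (channels, bins)
    beam = specs.mean(axis=0)                            # delay-and-sum
    psd = np.abs(beam) ** 2
    gain = np.maximum(1.0 - noise_psd / (psd + 1e-12), gain_floor)
    return np.fft.irfft(gain * beam, n_fft)              # enhanced frame
```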
Analysis of Verbal and Nonverbal Acoustic Signals with the Dresden UASR System
Rüdiger Hoffmann; Matthias Eichner; Matthias Wolff
Over the last few years, a framework for the development of speech analysis and synthesis algorithms has been implemented. The algorithms are connected to common databases at the different levels of a hierarchical structure. This framework, called UASR (Unified Approach for Speech Synthesis and Recognition), is described together with some related experiments and applications. Special focus is placed on the suitability of the system for processing nonverbal signals; this part relates to the analysis methods now addressed in the COST 2102 initiative. A potential application field in interaction research is discussed.
- IV – Analysis and Algorithms for Verbal and Nonverbal Speech | Pp. 200-218
VideoTRAN: A Translation Framework for Audiovisual Face-to-Face Conversations
Jerneja Žganec Gros
Face-to-face communication remains the most powerful form of human interaction. Electronic devices can never fully replace the intimacy and immediacy of people conversing in the same room, or at least via a videophone. Many subtle cues provided by facial expressions and vocal intonation let us know how what we are saying is affecting the other person, and transmitting these nonverbal cues is very important when translating conversations from a source language into a target language. This chapter describes VideoTRAN, a conceptual framework for translating audiovisual face-to-face conversations. A simple method for audiovisual alignment in the target language is proposed, and the process of audiovisual speech synthesis is described. The VideoTRAN framework has been tested in a translating videophone: an H.323 software-client translating videophone allows for the transmission and translation of a set of multimodal verbal and nonverbal cues in a multilingual face-to-face communication setting.
- V – Machine Multimodal Interaction | Pp. 219-226
Spoken and Multimodal Communication Systems in Mobile Settings
Markku Turunen; Jaakko Hakulinen
Mobile devices, such as smartphones, have become powerful enough to implement efficient speech-based and multimodal interfaces, and there is an increasing need for such systems. This chapter gives an overview of the design and development issues involved in implementing mobile speech-based and multimodal systems. It reviews infrastructure design solutions that make it possible to distribute the user interface between servers and mobile devices, and to support user interface migration from server-based to distributed services. An example is given of how an existing server-based spoken timetable application is turned into a multimodal distributed mobile application.
- V – Machine Multimodal Interaction | Pp. 227-241