Publications catalog - books


Sound Capture for Human/Machine Interfaces: Practical Aspects of Microphone Array Signal Processing

Wolfgang Herbordt

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-23954-3

Electronic ISBN

978-3-540-31592-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin/Heidelberg 2005

Table of contents

1 Introduction

Wolfgang Herbordt

With a continuously increasing desire for natural and comfortable human/machine interaction, the acoustic interface of any terminal for multimedia or telecommunication services is challenged to allow seamless and hands-free audio communication in such diverse acoustic environments as passenger cabins of cars or office and home environments. Typical applications include audio/video conferencing, dialog systems, computer games, command-and-control interfaces, dictation systems, and high-quality audio recordings. Compared to speech and audio capture using a microphone next to the desired source, seamless audio interfaces cause the desired source signal to be impaired by reverberation due to reflective acoustic environments, local interference and noise, and acoustic echoes from loudspeakers. All these interferers are not only annoying to human listeners but, more importantly, they are detrimental, for example, when speech recognition is involved.

Pp. 1-4

2 Space-Time Signals

Wolfgang Herbordt

In single-channel techniques for hands-free acoustic human/machine interfaces, we deal with waveforms which are functions of continuous time. The aim of multi-channel sound capture is to exploit the structure of propagating waves, i.e., spatial and temporal properties, in order to better meet the requirements of speech enhancement. The received signals are thus deterministic functions of position and of time, and, therefore, are called space-time signals or spatio-temporal signals. They have properties which are governed by the laws of physics, in particular the wave equation. Just as temporal filtering can be described by temporal impulse responses, the wave propagation in acoustic environments can be modeled using space-time filters which are described by spatio-temporal impulse responses. Often, the deterministic model of space-time signals cannot be applied to acoustic signals, since audio signals can hardly be described by functions where each time instant is assigned a unique numerical value. The deterministic model of room impulse responses is not appropriate if the spatial extension of the source cannot be neglected, since such spatio-temporal impulse responses of acoustic environments can generally not be described analytically. In such situations, it is more convenient to use statistical random fields, which are the extension of stochastic processes to multi-dimensional parameter spaces.

Pp. 5-24

3 Optimum Linear Filtering

Wolfgang Herbordt

In this chapter, we introduce the concept of optimum linear filtering for multiple-input multiple-output (MIMO) digital systems for solving multichannel linear digital filtering problems. A linear MIMO system is depicted in Fig. 3.1.

Pp. 25-39

4 Optimum Beamforming for Wideband Non-stationary Signals

Wolfgang Herbordt

Array processing techniques strive for extraction of maximum information from a propagating wave field using groups of sensors, which are located at distinct spatial locations. The sensors transduce propagating waves into signals describing both a finite spatial and a temporal aperture. In accordance with temporal sampling which leads to the discrete time domain, spatial sampling by sensor arrays forms the discrete space domain. Thus, with sensor arrays, signal processing operates in a multi-dimensional space-time domain. The processor which combines temporal and spatial filtering using sensor arrays is called a beamformer. Many properties and techniques which are known from temporal FIR filtering directly translate to beamforming based on finite spatial apertures.

Pp. 41-97
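
As a rough illustration of the beamformer idea in the Chapter 4 abstract (combining spatial and temporal filtering across sensors), the sketch below shows a fixed delay-and-sum beamformer. This is the simplest textbook variant, not the optimum data-dependent designs the chapter covers. The function name `delay_and_sum`, the integer per-sensor delays, and the toy pulse are all assumptions made for this example.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-length sample lists, one per sensor.
    delays:   assumed per-sensor arrival delays in samples (non-negative ints).
    """
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(n - d):
            # Advance the channel by d samples to undo its propagation delay.
            out[t] += x[t + d]
    return [v / len(channels) for v in out]


# Toy scenario (hypothetical): one pulse reaches three sensors 0, 1, and 2 samples late.
pulse = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delays = [0, 1, 2]
channels = [[0.0] * d + pulse[: len(pulse) - d] for d in delays]

aligned = delay_and_sum(channels, delays)        # steered at the source
mismatched = delay_and_sum(channels, [0, 0, 0])  # steered elsewhere
```

With matching delays the three copies of the pulse add coherently. With mismatched delays they are spread over several samples and the peak is attenuated, which is the spatial selectivity a beamformer relies on.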

5 A Practical Audio Acquisition System Using a Robust GSC (RGSC)

Wolfgang Herbordt

In the preceding chapter, we have seen that data-dependent beamformers can be efficiently realized in GSC structures. However, GSCs, or more generally LCLSE/LCMV beamformers, are sensitive to steering errors, to array perturbations, and to reverberation w.r.t. the desired signal. Therefore, the performance of LCLSE/LCMV beamformers may decrease in realistic implementations: The desired signal may be distorted, and/or interference rejection may be reduced. In order to overcome these problems and to fully profit from the advantages of the GSC, we derive in this chapter a robust version of the GSC (RGSC). The problems of LCLSE/LCMV beamformers are resolved by using time-varying spatio-temporal constraints instead of purely spatial constraints [HK02a, HK03]. In contrast to spatial constraints, these spatio-temporal constraints explicitly take multi-path propagation w.r.t. the desired source into account by modeling the propagation by room impulse responses, such that the direct signal path and the secondary signal paths are cancelled at the output of the blocking matrix of the RGSC. Therefore, the RGSC is robust against cancellation of the desired signal in reverberant acoustic environments. In Chap. 2, it was shown that not only the propagation in a reflective environment can be modeled by impulse responses, but that non-perfect transducer characteristics, i.e., direction-dependent and time-varying tolerances of the complex gain of the transducers, can be captured in impulse responses, too. Therefore, the RGSC is robust against tolerances of the characteristics or positions of the transducers without requiring explicit calibration. Time-varying spatio-temporal constraints take into account the temporal variations of the impulse responses between the desired source and the sensor signals due to speaker movements, changes in the acoustic environment, or time-varying complex gains of the sensors.

Pp. 99-132

6 Beamforming Combined with Multi-channel Acoustic Echo Cancellation

Wolfgang Herbordt

For audio signal acquisition, beamforming microphone arrays can be efficiently used for enhancing a desired signal while suppressing interference-plus-noise. For full-duplex communication systems, not only local interferers and noise corrupt the desired signal, but also acoustic echoes of loudspeaker signals. So far, we did not distinguish between local interferers and acoustic echoes. However, for suppressing acoustic echoes, more efficient techniques exist, which exploit the available loudspeaker signals as reference information. These methods are called acoustic echo cancellers (AECs) [SK91, BDH+99, GB00, BH03]. To cancel the acoustic echoes in the sensor channels, replicas of the echo signals are estimated and subtracted from the sensor signals. Acoustic echo cancellation is an application of system identification (Sect. 3.2.1). While the problem of monophonic acoustic echo cancellation has been studied for many years now, acoustic echo cancellation was only recently extended to more than one reproduction channel [SMH95, BBK03].

Pp. 133-161
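
The Chapter 6 abstract describes echo cancellation as system identification: a replica of the echo is estimated from the loudspeaker signal and subtracted from the sensor signal. A minimal single-channel sketch of that idea follows, using a normalized LMS (NLMS) adaptive filter. NLMS is chosen here only because it is a common textbook algorithm. The abstract does not say which adaptation rule the book uses, and the function name, step size, filter length, and the three-tap `echo_path` are all assumptions made for illustration.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Estimate the echo path with NLMS and return (weights, residual)."""
    w = [0.0] * taps      # current estimate of the echo path
    buf = [0.0] * taps    # most recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_hat = sum(wi * bi for wi, bi in zip(w, buf))  # echo replica
        e = d - echo_hat                                   # echo-cancelled output
        norm = sum(b * b for b in buf) + eps               # input power (regularized)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return w, residual


# Toy scenario (hypothetical): white-noise loudspeaker signal, short echo path,
# no local speech and no noise at the microphone.
random.seed(0)
echo_path = [0.6, -0.3, 0.1]
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

w, residual = nlms_echo_canceller(far_end, mic)
```

In this noise-free toy case the weights approach the assumed echo path and the residual decays toward zero. A real system must also handle local speech (double talk), noise, and much longer room responses, none of which is modeled here.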

7 Efficient Real-Time Realization of an Acoustic Human/Machine Front-End

Wolfgang Herbordt

In the previous chapters, we discussed options for data-dependent optimum beamforming for acoustic human/machine front-ends on cost-sensitive platforms of limited dimension. The main difficulties for realizing adaptive data-dependent optimum beamforming were studied, which resulted in an attractive solution for practical audio signal acquisition systems using a robust generalized sidelobe canceller with spatio-temporal constraints. Various techniques for combining the RGSC with multi-channel acoustic echo cancellation as a complementary speech enhancement technique for full-duplex applications were analyzed.

Pp. 163-203

8 Summary and Conclusions

Wolfgang Herbordt

Convenient human/machine interaction requires acoustic front-ends which allow seamless and hands-free audio communication. For suppressing interference, noise, and acoustic echoes of loudspeakers, interference/noise-reduction and acoustic echo cancellation should be integrated into the human/machine interface for maximum output signal quality and for optimum performance of speech recognizers.

Pp. 205-208

A Estimation of Signal-to-Interference-Plus-Noise Ratios (SINRs) Exploiting Non-stationarity

Wolfgang Herbordt

Our discussion of optimum data-dependent beamforming has shown that the second-order statistics of the sensor signals w.r.t. the desired signal and w.r.t. interference-plus-noise are required for realizing robust optimum data-dependent beamformers. For realizations in the DFT domain, especially, estimates of power spectral densities and of spatio-spectral correlation matrices w.r.t. the desired signal and w.r.t. interference-plus-noise are necessary. Consider, e.g., the optimum MMSE beamformer in the DTFT domain after (4.110) that requires the PSD and the spatio-spectral correlation matrix of interference-plus-noise. Such estimates can be obtained with, e.g., the minimum statistics after [Mar01a] using the generalization of [Bit02] for estimating spatio-spectral correlation matrices w.r.t. interference-plus-noise. However, these methods assume a slowly time-varying PSD of interference-plus-noise relative to a strongly time-varying PSD of the desired signal.

Pp. 209-223

B Experimental Setups and Acoustic Environments

Wolfgang Herbordt

In this chapter, the experimental setup and the acoustic environments are described, which are used in this work for illustrating the properties of the proposed algorithms. For our experiments, we use a linear microphone array with a variable number of sensors (4, 8, 12, or 16). The geometry is illustrated in Fig. B.1.

Pp. 225-228