Publications catalog - books


DSP for In-Vehicle and Mobile Systems

Hüseyin Abut ; John H.L. Hansen ; Kazuya Takeda (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Publication year Browse Download Request
Not detected 2005 SpringerLink

Information

Resource type:

books

Print ISBN

978-0-387-22978-2

Electronic ISBN

978-0-387-22979-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer Science + Business Media, Inc. 2005

Table of contents

Speech Enhancement Based on F-Norm Constrained Truncated SVD Algorithm

Guo Chen; Soo Ngee Koh; Ing Yann Soon

Traditional singular value decomposition (SVD) based speech enhancement algorithms are usually limited by the use of a fixed order of retained singular values, which may not be optimal for speech corrupted by time-varying noise. In this chapter, we propose the use of a Frobenius-norm (F-norm) constrained truncated SVD (FCTSVD) algorithm in an analysis-by-synthesis procedure for choosing the appropriate order of retained singular values for speech enhancement. It allows for self-adaptation in time and for different noise and noisy speech characteristics. Also, it leads to the best approximation of the original speech in terms of SNR. The proposed algorithm has been tested and compared with a traditional SVD algorithm for different noise types and levels. Simulation results show that it achieves higher SNR improvements for both additive white noise and colored noise as compared to a traditional SVD algorithm.

Pp. 169-178
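
The idea of adapting the number of retained singular values per frame can be sketched as follows. This is only an illustration, not the chapter's FCTSVD procedure: the Hankel embedding, the selection rule (keep the smallest order whose discarded energy, i.e. the squared Frobenius norm of the residual, stays within an estimated noise energy), and all function names are assumptions introduced here.

```python
import numpy as np

def hankel_matrix(frame, rows):
    """Stack overlapping windows of the frame into a Hankel matrix."""
    cols = len(frame) - rows + 1
    return np.array([frame[i:i + cols] for i in range(rows)])

def choose_order(singular_values, noise_energy):
    """Smallest order k whose discarded energy sum(s[k:]**2) is at most
    noise_energy (an assumed rule, not taken from the chapter)."""
    squared = singular_values ** 2
    # tail[k] = sum of squared[k:], the squared F-norm of the discarded part
    tail = np.concatenate([np.cumsum(squared[::-1])[::-1], [0.0]])
    for k in range(len(singular_values) + 1):
        if tail[k] <= noise_energy:
            return k
    return len(singular_values)

def truncated_svd_enhance(frame, rows, noise_energy):
    """Return (enhanced_frame, order) using a rank-k Hankel approximation."""
    frame = np.asarray(frame, dtype=float)
    H = hankel_matrix(frame, rows)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    k = choose_order(s, noise_energy)
    H_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Average along anti-diagonals to map the matrix back to a signal
    out = np.zeros(len(frame))
    counts = np.zeros(len(frame))
    for i in range(H_k.shape[0]):
        out[i:i + H_k.shape[1]] += H_k[i]
        counts[i:i + H_k.shape[1]] += 1
    return out / counts, k
```

In this sketch `noise_energy` would have to come from a separate noise estimate (for example, from speech-free frames); how the chapter obtains it is not stated in the abstract.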

Verbkey - A Single-Chip Speech Control for the Automobile Environment

Rico Petrick; Diane Hirschfeld; Thomas Richter; Rüdiger Hoffmann

The article deals with a novel speech recognizer technology which has the potential to overcome some problems of in-car speech control. The recognizer is based on the Associative-Dynamic (ASD) algorithm, which differs from established techniques such as HMM or DTW. The speech recognition technology is designed to run on a 16-bit, fixed-point DSP platform. It enables high recognition performance and robustness. At the same time, it is highly cost-efficient due to its low memory consumption and low computational complexity. Typical applications such as dialling, word spotting or menu structures for device control are processed by the continuous, real-time recognition engine with an accuracy higher than 98% for a 20-word vocabulary. The article describes a hardware prototype for command and control applications and the measures taken to improve robustness against environmental noise. Finally, the authors discuss some ergonomic aspects to obtain a higher level of traffic safety.

Pp. 179-192

Real-Time Transmission of H.264 Video over 802.11B-Based Wireless ad hoc Networks

E. Masala; C. F. Chiasserini; M. Meo; J. C. De Martin

This chapter evaluates a number of Quality of Service (QoS) indices for real-time video transmission over an 802.11b ad hoc wireless network. Video is coded with the state-of-the-art ITU-T H.264 encoder and its transmission is simulated by means of the network simulator. Objective quality measurements are presented. Moreover, the impact of different parameters (both at the encoder and at the MAC level), of background interfering traffic, and of the number of relay nodes is studied, showing the various trade-offs involved.

Pp. 193-207

DWT Image Compression for Mobile Communication

Lifeng Zhang; Tahaharu Kouda; Hiroshi Kondo; Teruo Shimomura

DWT image compression for mobile communication is presented. The discrete wavelet transform (DWT) with the Haar mother function is utilized in this paper. The exact location information of the important DWT coefficients is generally needed for reconstructing the image. In this work, however, such information is not needed because it can be obtained from the DWT-approximation. Through a one-dimensional directional difference operator, not only the exact location information of the DWT coefficients but also a rough estimate of the coefficients themselves can be obtained from the DWT-approximation when the Haar mother wavelet function is utilized. The direction of the difference operator differs according to the DWT-details (horizontal, vertical and diagonal detail). This paper shows that highly efficient image compression can be achieved when such DWT-approximation information is utilized well.

Pp. 209-218
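
For readers unfamiliar with the terms, a one-level 2D Haar DWT and its approximation and detail subbands can be sketched as below. The subband names, the orthonormal scaling, and the `directional_difference` helper are assumptions made for illustration; the helper only shows what a one-dimensional difference taken on the approximation looks like and is not the estimator described in the chapter.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar DWT of an array with even height and width."""
    x = np.asarray(image, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    det_cols = (a - b + c - d) / 2.0      # difference across columns
    det_rows = (a + b - c - d) / 2.0      # difference across rows
    det_diag = (a - b - c + d) / 2.0      # diagonal difference
    return approx, det_cols, det_rows, det_diag

def haar_idwt2(approx, det_cols, det_rows, det_diag):
    """Invert haar_dwt2 exactly."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (approx + det_cols + det_rows + det_diag) / 2.0
    x[0::2, 1::2] = (approx - det_cols + det_rows - det_diag) / 2.0
    x[1::2, 0::2] = (approx + det_cols - det_rows - det_diag) / 2.0
    x[1::2, 1::2] = (approx - det_cols - det_rows + det_diag) / 2.0
    return x

def directional_difference(approx, axis):
    """First difference of the approximation along one axis
    (axis=1: across columns, axis=0: across rows)."""
    return np.diff(approx, axis=axis)
```

A large value of `directional_difference(approx, axis=1)` marks a change across columns in the coarse image, which is the kind of cue the abstract says can stand in for explicitly transmitted coefficient locations.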

Link-Adaptive Variable Bit-Rate Speech Transmission over 802.11 Wireless LANs

Antonio Servetti; Juan Carlos De Martin

We present an adaptive technique to transmit speech over 802.11 wireless packet networks. According to the proposed scheme, the speech coding rate of a network-driven variable bit-rate coder is selected to match the instantaneous wireless channel conditions: higher rates (i.e., larger packets) for low error rates, lower rates (i.e., smaller packets) when the channel is noisy. Packet size is, in fact, directly related to the probability of retransmission, one of the major sources of delay and losses in contention-based medium access control. Network simulations using the 3GPP GSM-AMR speech coding standard show that the adaptive approach can address the stringent quality of service requirements for two-way interactive speech applications over wireless packet networks, reducing packet loss rates and end-to-end delays.

Pp. 219-235
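
The link between packet size and error probability that motivates this scheme can be sketched with a simple model. The eight bit rates are the standard AMR narrowband modes with 20 ms frames; everything else here is an assumption for illustration: independent bit errors, the `max_per` target, the `overhead_bits` parameter, and the selection rule itself, which is not the chapter's algorithm.

```python
# AMR narrowband codec modes in kbit/s; each speech frame covers 20 ms.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)
FRAME_MS = 20

def payload_bits(rate_kbps):
    """Speech bits per 20 ms frame (kbit/s times ms gives bits)."""
    return round(rate_kbps * FRAME_MS)

def packet_error_prob(bits, ber):
    """Probability that at least one bit is corrupted, assuming
    independent bit errors with bit error rate `ber`."""
    return 1.0 - (1.0 - ber) ** bits

def select_mode(ber, overhead_bits=0, max_per=0.05):
    """Highest AMR rate whose packet error probability stays within
    max_per; falls back to the lowest rate if none qualifies.
    overhead_bits and max_per are hypothetical tuning parameters."""
    best = AMR_MODES_KBPS[0]
    for rate in AMR_MODES_KBPS:  # ascending, so the last match is the highest
        bits = payload_bits(rate) + overhead_bits
        if packet_error_prob(bits, ber) <= max_per:
            best = rate
    return best
```

On a real 802.11 link the protocol headers are far larger than the speech payload, so `overhead_bits` would strongly affect how much is gained by switching modes; the default of zero only keeps the example easy to follow.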

Joint Audio-Video Processing for Robust Biometric Speaker Identification in Car

Engin Erzin; Yücel Yemez; A. Murat Tekalp

In this chapter, we present our recent results on the multilevel Bayesian decision fusion scheme for the multimodal audio-visual speaker identification problem. The objective is to improve the recognition performance over conventional decision fusion schemes. The proposed system decomposes the information existing in a video stream into three components: speech, lip trace and face texture. Lip trace features are extracted based on the 2D-DCT of the successive active lip frames. The mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal are extracted in parallel to the lip features. The resulting two parallel and synchronous feature vectors are used to train and test a two-stream Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Reliability-based ordering in multilevel decision fusion is observed to be significantly robust at all SNR levels.

Pp. 237-256

Is Our Driving Behavior Unique?

Kei Igarashi; Kazuya Takeda; Fumitada Itakura; Hüseyin Abut

In this chapter, the uniqueness of driver behavior in vehicles and the possibility of using it for personal identification are investigated, with the objectives of achieving safer driving, assisting the driver in case of emergencies, and forming part of a multi-mode biometric signature for driver identification. Towards that end, the distributions and the spectra of pressure readings from the accelerator and brake pedals of drivers are measured. We have attempted to use the linear combination of these pedal pressure signals as the feature set. Preliminary results indicate that drivers apply pressure to pedals differently. Are they distinct enough to be used as an independent biometric to identify the individual? Even though our findings at this time are not conclusive, additional features, time-series analysis of the collected data and/or integration of these features with audio and video inputs are being investigated.

Pp. 257-274

Robust ASR inside a Vehicle Using Blind Probabilistic Based Under-Determined Convolutive Mixture Separation Technique

Shubha Kadambe

Spoken dialogue based information retrieval systems are being used in mobile environments such as cars. However, the car environment is noisy and the user's speech signal gets corrupted due to the dynamically changing acoustic environment and the number of interference signals inside the car. The interference signals get mixed with speech signals convolutively due to the chamber impulse response. This tends to degrade the performance of a speech recognition system, which is an integral part of a spoken dialogue based information retrieval system. One solution to alleviate this problem is to enhance speech signals such that the recognition accuracy does not degrade much. In this chapter, we describe a blind source separation technique that would enhance convolutively mixed speech signals by separating the interference signals from the genuine speech. This technique is applicable to the under-determined case, i.e., the number of microphones is less than the number of signal sources, and uses a probabilistic approach in a sparse transformed domain. We have collected speech data inside a car with a variable number of interference sources such as wipers on, radio on, A/C on. We have applied our blind convolutive mixture separation technique to enhance the mixed speech signals. We conducted experiments to obtain speech recognition accuracy with and without enhanced speech signals. For these experiments we used a continuous speech recognizer. Our results indicate a 15–35% improvement in speech recognition accuracy.

Pp. 276-292

In-car Speech Recognition Using Distributed Microphones

Tetsuya Shinde; Kazuya Takeda; Fumitada Itakura

In this paper, we describe a method for multichannel noisy speech recognition that can adapt to various in-car noise situations during driving. Our proposed technique enables us to estimate the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by a set of distributed microphones. Through clustering of the spatial noise distributions under various driving conditions, the regression weights for MRLS are effectively adapted to the driving conditions. The experimental evaluation shows an average error rate reduction of 43% in isolated word recognition under 15 different driving conditions.

Pp. 293-307
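
The regression step described in this abstract can be sketched as an ordinary least-squares fit per frequency bin. The array layout, the bias term, and the function names are assumptions made here; the clustering of driving conditions mentioned in the abstract is not shown, and this is not presented as the authors' implementation.

```python
import numpy as np

def fit_mrls(distant_logspec, close_logspec):
    """Fit one linear regression per frequency bin.

    distant_logspec: (frames, mics, bins) log spectra of distributed mics.
    close_logspec:   (frames, bins) log spectra of the close-talking mic.
    Returns weights of shape (bins, mics + 1); the last column is a bias
    term (an assumption of this sketch)."""
    frames, mics, bins = distant_logspec.shape
    weights = np.empty((bins, mics + 1))
    ones = np.ones((frames, 1))
    for b in range(bins):
        A = np.hstack([distant_logspec[:, :, b], ones])
        solution = np.linalg.lstsq(A, close_logspec[:, b], rcond=None)[0]
        weights[b] = solution
    return weights

def apply_mrls(distant_logspec, weights):
    """Estimate the close-talking log spectrum from the distant mics."""
    weighted = np.einsum('fmb,bm->fb', distant_logspec, weights[:, :-1])
    return weighted + weights[:, -1]
```

Under this reading, adapting to driving conditions would amount to keeping one `weights` array per noise cluster and choosing among them at recognition time.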