Publications catalog - books



DSP for In-Vehicle and Mobile Systems

Hüseyin Abut; John H.L. Hansen; Kazuya Takeda (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Institution detected: None detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Printed ISBN

978-0-387-22978-2

Electronic ISBN

978-0-387-22979-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer Science + Business Media, Inc. 2005

Table of contents

Construction and Analysis of a Multi-Layered In-car Spoken Dialogue Corpus

Nobuo Kawaguchi; Shigeki Matsubara; Itsuki Kishida; Yuki Irie; Hiroya Murao; Yukiko Yamaguchi; Kazuya Takeda; Fumitada Itakura

In this chapter, we discuss the construction of a multi-layered in-car spoken dialogue corpus and preliminary results of its analysis. We have developed a system built into a Data Collection Vehicle (DCV) that supports synchronous recording of multi-channel audio from 16 microphones placed in flexible positions, multi-channel video from 3 cameras, and vehicle-related data. Multimedia data were collected for three sessions of spoken dialogue with different types of navigator during an approximately 60-minute drive by each of 800 subjects. We have defined a Layered Intention Tag for analyzing the dialogue structure of each speech unit, and have applied the tag to all of the dialogues, covering over 35,000 speech units. Using a dialogue sequence viewer we developed, we can analyze the basic dialogue strategy of the human navigator. We also report a preliminary analysis of the relation between intention and linguistic phenomena.

Pp. 1-17

CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation

John H.L. Hansen; Xianxian Zhang; Murat Akbacak; Umit H. Yapanel; Bryan Pellom; Wayne Ward; Pongtep Angkititrakul

In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager with information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Data from a total of 500 speakers were collected across the United States of America during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system focuses on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations with built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section, with a discussion at the end of the chapter.

Pp. 19-45

A Spoken Dialog Corpus for Car Telematics Services

Masahiko Tateishi; Katsushi Asami; Ichiro Akahori; Scott Judy; Yasunari Obuchi; Teruko Mitamura; Eric Nyberg; Nobuo Hataoka

Spoken corpora provide a critical resource for the research, development, and evaluation of spoken dialog systems. This chapter describes the spoken dialog corpus used in the design of CAMMIA (Conversational Agent for Multimedia Mobile Information Access), which employs a novel dialog management system that allows users to switch dialog tasks in a flexible manner. The corpus for car telematics services was collected from 137 male and 113 female speakers. The age distribution of speakers is balanced across five age brackets: 20s, 30s, 40s, 50s, and 60s. Analysis of the gathered dialogs reveals that the average number of dialog tasks per speaker was 8.1. The three most frequently requested types of information in the corpus were traffic information, tourist attraction information, and restaurant information. Analysis of speaker utterances shows that the implied vocabulary size is approximately 5,000 words. The results are used for the development and evaluation of automatic speech recognition (ASR) and dialog management software.

Pp. 47-64

Experiences of Multi-Speaker Dialogue System for Vehicular Information Retrieval

Hsien-Chang Wang; Jhing-Fa Wang

Currently, most spoken dialogue systems deal only with interaction between the system and a single speaker. In some situations, however, interactions may occur between several speakers and the system, and new functions and improvements are needed to handle such multi-user situations. Studies of human-computer interaction systems involving multiple users are in their initial stages, and published work on the subject is very limited. For these reasons, we were motivated to conduct a further study of multi-speaker dialogue systems. In this chapter, the interactions between multiple speakers and the system are classified into three types: independent, cooperative, and conflicting. An algorithm for multi-speaker dialogue management is proposed to determine the interaction type and to keep the interaction going smoothly. Experimental results show that the proposed algorithm can properly handle the interactions that occur in a multi-speaker dialogue system and provides useful vehicular information to the speakers.

Pp. 65-81

Robust Dialog Management Architecture Using VoiceXML for Car Telematics Systems

Yasunari Obuchi; Eric Nyberg; Teruko Mitamura; Scott Judy; Michael Duggan; Nobuo Hataoka

This chapter describes a dialog management architecture for car telematics systems. The system supports spontaneous user utterances and variable communication conditions between the in-car client and the remote server. The communication is based on VoiceXML over HTTP, and the design of the server-side application is based on DialogXML and ScenarioXML, which are layered extensions of VoiceXML. These extensions provide support for state-and-transition dialog programming, access to dynamic external databases, and sharing of commonly used dialogs via templates. The client system includes a set of small grammars and lexicons for various tasks; only relevant grammars and lexicons are activated under the control of the dialog manager. The server-side applications are integrated via an abstract interface, and the client system may include compact versions of the same applications. The VoiceXML interpreter can switch between applications on both sides intelligently. This helps to reduce bandwidth utilization and allows the system to continue even if the communication channel is lost.

Pp. 83-96
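The grammar-activation scheme described above can be sketched as a simple lookup from dialog state to the small set of grammars the client should enable. This is a hypothetical illustration; the state names and grammar names below are invented, not taken from the CAMMIA system.

```python
# Hypothetical sketch: the client holds many small task grammars, and the
# dialog manager activates only the ones relevant to the current dialog
# state, keeping recognition vocabulary (and bandwidth) small.
# All state and grammar names here are illustrative assumptions.

TASK_GRAMMARS = {
    "navigation": {"destinations", "route_commands"},
    "weather":    {"cities", "dates"},
    "traffic":    {"roads", "cities"},
}

def active_grammars(dialog_state):
    """Return the grammar set the client should enable for this state."""
    return TASK_GRAMMARS.get(dialog_state, set())

print(sorted(active_grammars("traffic")))   # only traffic-relevant grammars
print(sorted(active_grammars("unknown")))   # nothing active outside known tasks
```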

Use of Multiple Speech Recognition Units in an In-car Assistance System

Alessio Brutti; Paolo Coletti; Luca Cristoforetti; Petra Geutner; Alessandro Giacomini; Mirko Maistrello; Marco Matassoni; Maurizio Omologo; Frank Steffens; Piergiorgio Svaizer

This chapter presents an advanced dialogue system based on in-car hands-free voice interaction, conceived for obtaining driving assistance and for accessing tourist information while driving. Part of the activities aimed at developing this “Virtual Intelligent Codriver” are being conducted under the European VICO project. The architecture of the dialogue system is presented here, with a description of its main modules: Front-end Speech Processing, Recognition Engine, Natural Language Understanding, Dialogue Manager, and Car Wide Web. The use of a set of HMM recognizers running in parallel is being investigated within this project to ensure low complexity, modularity, and fast response, and to allow real-time reconfiguration of the language models and grammars according to the dialogue context. A corpus of spontaneous speech interactions was collected at ITC-irst using the Wizard-of-Oz method in a real driving situation. Multiple recognition units specialized in geographical subdomains, together with simpler language models, were evaluated on the resulting corpus. This investigation shows that, in the presence of large lists of names (e.g. cities, streets, hotels), choosing the output with maximum likelihood among the active units, although a simple approach, provides better results than using a single comprehensive language model.

Pp. 97-111
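The selection strategy described above — several domain-specialized recognition units running in parallel, with the maximum-likelihood hypothesis among the active units winning — can be sketched as follows. The unit names and scores are illustrative assumptions, not data from the VICO project.

```python
# Hypothetical sketch of maximum-likelihood selection among parallel
# recognition units. Each unit returns a (hypothesis, log-likelihood)
# pair; only units activated by the dialogue context compete.

def select_best_hypothesis(unit_outputs, active_units):
    """Pick the (unit, hypothesis, score) triple with the highest
    log-likelihood among the currently active units."""
    best = None
    for unit, (hypothesis, log_likelihood) in unit_outputs.items():
        if unit not in active_units:
            continue
        if best is None or log_likelihood > best[2]:
            best = (unit, hypothesis, log_likelihood)
    return best

# Illustrative outputs from three subdomain units (scores are made up):
outputs = {
    "cities":  ("go to Trento",        -42.7),
    "streets": ("go to Via Trento",    -40.1),
    "hotels":  ("Hotel Trento please", -55.3),
}
print(select_best_hypothesis(outputs, active_units={"cities", "streets"}))
```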

Hi-Speed Error Correcting Code LSI for Mobile Phone

Yuuichi Hamasuna; Masayasu Hata; Ichi Takumi

In recent years, the transmission speed of next-generation cellular phones has reached 100 Mbps, and the transmission speed of optical communication amounts to 40 Gbps. Accordingly, demand for robust error correction codes with Gbps-class high-speed processing is increasing. The proposed code, the “high-dimensional torus knot code,” performs well in channels with many errors. In a performance comparison, the proposed code outperforms the Reed-Solomon code in an environment with an error rate of 10^-10. Moreover, in simulations of a CDMA communication environment, no fluttering of the error property occurred with the product code of convolutional codes (as inner code; rate 1/2) and the proposed code (as outer code; rate 0.53), whereas under the same conditions a fluttering error occurred with the Turbo code. Applying LSI technology, we developed an ASIC of the proposed code and an FPGA for a high-speed MPEG communication device. We developed three-dimensional size-nine (3Dm9) and four-dimensional size-five (4Dm5) chips. More specifically, the 3Dm9-code chip (developed in 2001), with a rate of 0.70 and a block length of 729 bits, was implemented on a 100-kilogate, 0.35-micron LSI, and the 4Dm5-code chip (r=0.41, block=625, developed in 1999) was implemented on a 50-kilogate, 0.6-micron LSI. The 3Dm9-code chip operated at a clock speed of 66.6 MHz with a throughput of 48 Gbps. Finally, using the developed FPGA, the high-speed MPEG communication device can transmit a movie signal of 33 Mbps.

Pp. 113-122
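The block lengths quoted above follow directly from the code geometry: an m-dimensional, size-s construction has s^m symbols per block (9^3 = 729 for 3Dm9, 5^4 = 625 for 4Dm5). The information payload estimated below from the quoted code rates is an approximation for illustration; exact payload sizes of the torus knot code are not given in the abstract.

```python
# Block length of an m-dimensional, size-s code construction: s**m.
def block_length(size, dimension):
    return size ** dimension

print(block_length(9, 3))  # 3Dm9 chip: 729-bit block
print(block_length(5, 4))  # 4Dm5 chip: 625-bit block

# Approximate information bits per block, estimated from the quoted rates:
print(round(0.70 * block_length(9, 3)))  # about 510 info bits (rate 0.70)
print(round(0.41 * block_length(5, 4)))  # about 256 info bits (rate 0.41)
```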

Modified Cerebellar Model Articulation Controller (MCMAC) as an Amplitude Spectral Estimator for Speech Enhancement

Abdul Wahab; Tan Eng Chong; Hüseyin Abut

In this chapter, we present a modified cerebellar model articulation controller (MCMAC) to be used together with an amplitude spectral estimator (ASE) for enhancing noisy speech. The MCMAC training overcomes the limitations of the CMAC technique we have previously employed for noise/echo cancellation in a vehicular environment. While CMAC in training mode trains only the trajectory it has visited by controlling the reference input, the modified MCMAC-ASE system architecture proposed in this work includes multiple MCMAC memories trainable for different noise sources.

Pp. 123-137

Noise Robust Speech Recognition Using Prosodic Information

Koji Iwano; Takahiro Seki; Sadaoki Furui

This paper proposes a noise-robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information; consequently, it also conveys information about prosodic phrase and word boundaries. The paper first proposes a noise-robust F0 extraction method using the Hough transform, which achieves high extraction accuracy under various noise environments. It then proposes a robust speech recognition method using syllable HMMs which model both segmental spectral features and F0 contours. Two prosodic features are combined with ordinary cepstral parameters: a derivative of the time function of log F0 (Δ log F0) and a maximum accumulated voting value of the Hough transform, representing a measure of F0 continuity. Speaker-independent experiments were conducted using connected digits uttered by 11 male speakers under various kinds of noise and SNR conditions. It was confirmed that both prosodic features improve recognition accuracy in all noise conditions, and that their effects are additive. When both prosodic features are used, the best absolute improvement in digit accuracy is about 4.5%. This improvement was achieved by improving digit boundary detection using the robust prosodic information.

Pp. 139-152
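The Δ log F0 feature described above is the time derivative of log F0, conventionally approximated by a linear-regression difference over neighboring frames. The sketch below is a minimal illustration of that approximation; the window width and the F0 values are assumptions for the example, not parameters from the paper.

```python
# Minimal sketch of a delta (regression) feature over log F0.
# Edge frames are handled by clamping indices, a common convention.
import math

def delta_log_f0(f0_track, k=2):
    """Delta of log F0 over a regression window of +/- k frames."""
    logs = [math.log(f) for f in f0_track]
    denom = 2.0 * sum(t * t for t in range(1, k + 1))
    deltas = []
    for i in range(len(logs)):
        num = 0.0
        for t in range(1, k + 1):
            left = logs[max(i - t, 0)]
            right = logs[min(i + t, len(logs) - 1)]
            num += t * (right - left)
        deltas.append(num / denom)
    return deltas

track = [120.0, 125.0, 130.0, 128.0, 122.0]  # illustrative F0 values in Hz
print(delta_log_f0(track))
```

A flat F0 track yields zero deltas, while a rising contour yields positive ones, which is what makes the feature useful for detecting prosodic boundaries.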

Reduction of Diffuse Noise in Mobile and Vehicular Applications

Hamid Sheikhzadeh; Hamid Reza Abutalebi; Robert L. Brennan; George H. Freeman

In this chapter, we describe a hybrid subband adaptive speech enhancement system implemented on an efficient ultra-low-resource hardware platform utilizing oversampled generalized DFT filterbanks. Two analysis filterbanks decompose the two inputs (reference noise and noisy speech) into two sets of subband signals. In each subband, a subband adaptive filtering (SAF) noise reduction block processes the two subband signals to reduce the noise, producing a single signal that undergoes further noise reduction through Wiener filtering. A synthesis filterbank then converts the processed subband signals back into the time domain. We have evaluated the performance of the hybrid noise reduction system in various real-life noise fields occurring in mobile and vehicular applications. Two closely spaced microphones make recordings in these noise fields: signals from one microphone are used directly as the reference noise signal, while signals from the other microphone are added to speech materials chosen from the TIMIT database to form the contaminated primary signal. It is demonstrated that all the noise recordings closely obey a diffuse noise field model. As the hybrid enhancement system is specifically designed to handle diffuse noise fields, it outperforms both SAF alone and standard Wiener filtering on all sets of recordings. The superiority of the hybrid system is especially notable under lowpass noise and intense noise conditions.

Pp. 153-168
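The diffuse noise field model mentioned above predicts that the spatial coherence between two closely spaced microphones follows a sinc function of frequency and spacing, Γ(f) = sinc(2πfd/c). The sketch below evaluates this standard model; the 2 cm spacing and the speed of sound are illustrative assumptions, not values from the chapter.

```python
# Theoretical coherence of an ideal diffuse (spherically isotropic)
# noise field between two microphones spaced d metres apart.
import math

def diffuse_coherence(freq_hz, spacing_m, c=343.0):
    """sinc-shaped coherence: sin(2*pi*f*d/c) / (2*pi*f*d/c)."""
    x = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x

# Coherence stays high at low frequencies (where simple two-microphone
# subtraction struggles) and falls off as frequency or spacing grows:
for f in (100.0, 1000.0, 4000.0):
    print(f, diffuse_coherence(f, spacing_m=0.02))
```

This falling coherence with frequency is why a hybrid of subband adaptive filtering and Wiener post-filtering suits closely spaced microphone pairs in diffuse fields.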