Publications catalog - books

Advances in Multimedia Information Processing: 7th Pacific Rim Conference on Multimedia, Hangzhou, China, November 2-4, 2006, Proceedings

Yueting Zhuang ; Shi-Qiang Yang ; Yong Rui ; Qinming He (eds.)

Conference: 7th Pacific-Rim Conference on Multimedia (PCM). Hangzhou, China. November 2-4, 2006

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Computer Applications; Multimedia Information Systems; Information Storage and Retrieval; Computer Communication Networks; Information Systems Applications (incl. Internet); Image Processing and Computer Vision

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2006	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-48766-1

Electronic ISBN

978-3-540-48769-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

Subject coverage

Computer and information sciences

Table of contents

Check that your institution provides access to download or request the full book or any of its chapters.

doi: 10.1007/11922162_26

Varying Microphone Patterns for Meeting Speech Segmentation Using Spatial Audio Cues

Eva Cheng; Ian Burnett; Christian Ritz

Meetings, common to many business environments, generally involve stationary participants. Thus, participant location information can be used to segment meeting speech recordings into each speaker’s ‘turn’. The authors’ previous work proposed the use of spatial audio cues to represent the speaker locations. This paper studies the validity of using spatial audio cues for meeting speech segmentation by investigating the effect of varying the microphone pattern on the spatial cues. Experiments conducted on recordings made in a real acoustic environment indicate that the relationship between speaker location and spatial audio cues strongly depends on the microphone pattern.

Pp. 221-228

doi: 10.1007/11922162_27

Region-Based Sub-pixel Motion Estimation from Noisy, Blurred, and Down-Sampled Sequences

Osama A. Omer; Toshihisa Tanaka

Motion estimation is one of the most important steps in super-resolution algorithms for video sequences, which must estimate motion from a noisy, blurred, and down-sampled sequence; the motion estimation therefore has to be robust. In this paper, we propose a robust sub-pixel motion estimation algorithm based on region matching. Non-rectangular regions are first extracted using the watershed transform. For each region, the best-matching region in a previous frame is found to obtain the integer-pixel motion vector. Then, to refine the accuracy of the estimated motion vector, we search the eight sub-pixel positions around it for a sub-pixel motion vector. The performance of the proposed algorithm is compared with the well-known full search at both integer-pixel and sub-pixel accuracy. It is also compared with the integer-pixel region matching algorithm on several noisy video sequences with various noise variances. The results show that the proposed algorithm is the most suitable of these algorithms for noisy, blurred, and down-sampled sequences.

Pp. 229-236

doi: 10.1007/11922162_28

Differential Operation Based Palmprint Authentication for Multimedia Security

Xiangqian Wu; Kuanquan Wang; David Zhang

This paper presents a novel approach to palmprint authentication for multimedia security using a differential operation. In this approach, a differential operation is first applied to the palmprint image in the horizontal direction. The palmprint is then encoded according to the sign of each pixel value in the differential image. This code is called the DiffCode of the palmprint. The DiffCode is 128 bytes in size, which is the smallest among existing palmprint features and makes it suitable for multimedia security. The similarity of two DiffCodes is measured by their Hamming distance. The approach is tested on the public PolyU Palmprint Database and achieves an EER of 0.6%, which is comparable with existing palmprint recognition methods.

Pp. 237-244
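
As a rough illustration of the encoding and matching steps named in this abstract, the sketch below takes a horizontal difference, keeps one sign bit per pixel, and compares two codes by normalized Hamming distance. The abstract does not specify the differential operator, the image size, or how zero differences are encoded, so the neighbour difference, the treatment of zero as a 1 bit, and the function names are all assumptions.

```python
def diff_code(image):
    """Encode the sign of the horizontal neighbour difference as a flat list of bits.

    image is a list of rows of grayscale values. A non-negative difference
    maps to 1 and a negative one to 0 (an assumed convention).
    """
    bits = []
    for row in image:
        for left, right in zip(row, row[1:]):
            bits.append(1 if right - left >= 0 else 0)
    return bits


def hamming_distance(code_a, code_b):
    """Return the fraction of bit positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)
```

A 128-byte code corresponds to 1024 sign bits; a smaller distance indicates more similar palmprints, and a decision threshold would have to be tuned on data.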

doi: 10.1007/11922162_29

A Broadcast Model for Web Image Annotation

Jia Li; Ting Liu; Weiqiang Wang; Wen Gao

Automatic annotation of Web images has great potential to improve the performance of Web image retrieval. This paper presents a Broadcast Model (BM) for Web image annotation. In this model, pages are divided into blocks, and an image is annotated through the interaction of information from blocks and relevant Web pages. Here, "broadcast" means that each block receives information (like signals) from relevant Web pages and modifies its feature vector accordingly. Compared with most existing image annotation systems, the proposed algorithm uses associated information not only from the page where the image is located but also from other related pages. Based on the generated annotations, a retrieval application is implemented to evaluate the proposed annotation algorithm. Preliminary experimental results show that the model is effective for annotating Web images and reduces both the number of result images and the time cost of retrieval.

Pp. 245-251

doi: 10.1007/11922162_30

An Approach to the Compression of Residual Data with GPCA in Video Coding

Lei Yao; Jian Liu; Jiangqin Wu

Generalized Principal Component Analysis (GPCA) is a global solution for identifying a mixture of linear models for signals. The method has proved efficient at compressing natural images. In this paper, we introduce GPCA into video coding. We focus on encoding residual frames with GPCA in place of the classical DCT, and also propose using it in MCTF-based scalable video coding. Experiments show that GPCA achieves better PSNR than DCT with the same number of data components, and that the method is promising for our scalable video coding scheme.

Pp. 252-261

doi: 10.1007/11922162_31

A Robust Approach for Object Recognition

Yuanning Li; Weiqiang Wang; Wen Gao

In this paper, we present a robust, unsupervised approach to recognizing object categories, RTSI-pLSA, which overcomes the weakness of TSI-pLSA in recognizing rotated objects in images. Our approach uses a radial template to describe the spatial information (position, scale, and orientation) of an object. A bottom-up, heuristic, unsupervised scheme is also proposed to estimate the spatial parameters of the object. Experimental results show that RTSI-pLSA can effectively recognize object categories, especially rotated, translated, or scaled objects in images. It lowers the error rate by about 10% compared with TSI-pLSA. Thus, it is a more robust approach for unsupervised object recognition.

Pp. 262-269

doi: 10.1007/11922162_32

A Novel Method for Spoken Text Feature Extraction in Semantic Video Retrieval

Juan Cao; Jintao Li; Yongdong Zhang; Sheng Tang

We propose a novel method for extracting text features from automatic speech recognition (ASR) results in semantic video retrieval. We combine HowNet-rule-based knowledge with statistical information to build special concept lexicons, which can rapidly narrow the vocabulary and improve retrieval precision. Furthermore, we use the term precision (TP) weighting method to analyze ASR texts. This weighting method is sensitive to sparse but important terms in the relevant documents. Experiments show that the proposed method is effective for semantic video retrieval.

Pp. 270-278

doi: 10.1007/11922162_33

A Semantic Image Category for Structuring TV Broadcast Video Streams

Jinqiao Wang; Lingyu Duan; Hanqing Lu; Jesse S. Jin

A TV broadcast video stream consists of various kinds of programs such as sitcoms, news, sports, commercials, and weather. In this paper, we propose a semantic image category, named Program Oriented Informative Images (POIM), to facilitate the segmentation, indexing, and retrieval of different programs. The assumption is that most stations tend to insert lead-in/-out video shots to explicitly introduce the current program and to indicate the transitions between consecutive programs within TV streams. Such shots often overlay text, graphics, and storytelling images to create an image sequence of POIM as a visual representation of the current program. With the advance of post-editing effects, POIM is becoming an effective indicator for structuring TV streams, and is also a fairly common “prop” in program content production. We have developed a POIM recognizer involving a set of global/local visual features and supervised/unsupervised learning. Comparison experiments have been carried out. A promising result, F1 = 90.2%, has been achieved on part of the TRECVID 2005 video corpus. The recognition of POIM, together with other audiovisual features, can be used to further determine program boundaries.

Pp. 279-286

doi: 10.1007/11922162_34

Markov Chain Monte Carlo Super-Resolution Image Reconstruction with Simultaneous Adaptation of the Prior Image Model

Jing Tian; Kai-Kuang Ma

In our recent work, the Markov chain Monte Carlo (MCMC) technique has been successfully exploited and shown to be an effective approach to super-resolution image reconstruction. However, one major challenge lies in selecting the hyperparameter of the prior image model, which affects the degree of regularity imposed by the prior image model and, consequently, the quality of the estimated high-resolution image. To tackle this challenge, in this paper we propose a novel approach that automatically adapts the model’s hyperparameter during the MCMC process, rather than relying on an exhaustive, off-line search. The experimental results show that the proposed hyperparameter adaptation method yields performance extremely close to that of the optimal prior image model.

Pp. 287-294

doi: 10.1007/11922162_36

Robust Mandarin Speech Recognition for Car Navigation Interface

Pei Ding; Lei He; Xiang Yan; Rui Zhao; Jie Hao

This paper presents a robust automatic speech recognition (ASR) system as a multimedia interface for car navigation. In the front end, we use minimum mean-square error (MMSE) enhancement to suppress the background in-car noise and then apply smoothing techniques to compensate for the spectral components distorted by noise over-reduction. In acoustic model training, an immunity learning scheme is adopted, in which pre-recorded car noises are artificially added to clean training utterances to imitate the in-car environment. The immunity scheme makes the system robust to both residual noise and speech enhancement distortion. In the context of Mandarin speech recognition, a special issue is the diversity of Chinese dialects: pronunciation differences among accents decrease recognition performance if the acoustic models are trained on a database with a mismatched accent. We propose training the models with multiple accented Mandarin databases to solve this problem. The effectiveness of the proposed ASR system is confirmed in evaluations.

Pp. 302-309
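
The data-preparation step behind immunity learning, adding pre-recorded noise to clean training utterances, can be sketched as follows. The abstract does not say how noise levels are chosen, so the target signal-to-noise ratio (SNR) parameter, the power-based scaling, and the function name `mix_at_snr` are assumptions made for illustration only.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal, scaled so the mixture has the requested SNR in dB.

    clean and noise are sequences of samples; noise must be at least as long
    as clean and is truncated to the same length.
    """
    if not clean:
        raise ValueError("clean signal must not be empty")
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # Choose the gain so that clean_power / (gain**2 * noise_power) equals 10**(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]
```

In a training pipeline, each clean utterance would typically be mixed with several noise recordings at a range of SNRs before feature extraction.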