Publications catalog - books
Advances in Multimedia Information Processing: 7th Pacific Rim Conference on Multimedia, Hangzhou, China, November 2-4, 2006, Proceedings
Yueting Zhuang ; Shi-Qiang Yang ; Yong Rui ; Qinming He (eds.)
Conference: 7th Pacific-Rim Conference on Multimedia (PCM). Hangzhou, China. November 2, 2006 - November 4, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Computer Applications; Multimedia Information Systems; Information Storage and Retrieval; Computer Communication Networks; Information Systems Applications (incl. Internet); Image Processing and Computer Vision
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-48766-1
Electronic ISBN
978-3-540-48769-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11922162_26
Varying Microphone Patterns for Meeting Speech Segmentation Using Spatial Audio Cues
Eva Cheng; Ian Burnett; Christian Ritz
Meetings, common to many business environments, generally involve stationary participants. Thus, participant location information can be used to segment meeting speech recordings into each speaker’s ‘turn’. The authors’ previous work proposed the use of spatial audio cues to represent the speaker locations. This paper studies the validity of using spatial audio cues for meeting speech segmentation by investigating the effect of varying microphone pattern on the spatial cues. Experiments conducted on recordings of a real acoustic environment indicate that the relationship between speaker location and spatial audio cues strongly depends on the microphone pattern.
Pp. 221-228
doi: 10.1007/11922162_27
Region-Based Sub-pixel Motion Estimation from Noisy, Blurred, and Down-Sampled Sequences
Osama A. Omer; Toshihisa Tanaka
Motion estimation is one of the most important steps in super-resolution algorithms for video sequences, which must estimate motion from a noisy, blurred, and down-sampled sequence; the motion estimation therefore has to be robust. In this paper, we propose a robust sub-pixel motion estimation algorithm based on region matching. Non-rectangular regions are first extracted using the watershed transform. For each region, the best-matching region in a previous frame is found to obtain the integer-pixel motion vector. Then, to refine the accuracy of the estimated motion vector, we search the eight sub-pixel positions around it for a sub-pixel motion vector. The performance of the proposed algorithm is compared with the well-known full search at both integer-pixel and sub-pixel accuracy. It is also compared with the integer-pixel region matching algorithm on several noisy video sequences with various noise variances. The results show that the proposed algorithm is the most suitable of these algorithms for noisy, blurred, and down-sampled sequences.
Pp. 229-236
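The two-stage search described in the abstract above (integer-pixel matching followed by a look at the eight surrounding sub-pixel positions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the sum-of-absolute-differences cost, the bilinear interpolation, the half-pixel step, and the search range are all assumptions, and the watershed segmentation is left out (a region is simply passed in as a list of pixel coordinates, which need not be rectangular).

```python
def sample(frame, y, x):
    """Bilinearly interpolate a frame (list of rows) at a fractional position,
    clamping coordinates to the frame borders."""
    h, w = len(frame), len(frame[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def region_cost(cur, prev, pixels, dy, dx):
    """Sum of absolute differences (assumed cost) between a region in the
    current frame and the same region displaced by (dy, dx) in the previous frame."""
    return sum(abs(cur[y][x] - sample(prev, y + dy, x + dx)) for y, x in pixels)


def estimate_motion(cur, prev, pixels, search=2, step=0.5):
    """Return a (dy, dx) motion vector for the region given by `pixels`."""
    # Stage 1: exhaustive integer-pixel search within +/- `search` pixels.
    candidates = [(dy, dx)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    best = min(candidates, key=lambda v: region_cost(cur, prev, pixels, *v))
    # Stage 2: evaluate the eight sub-pixel neighbours of the integer vector.
    # The integer vector itself stays in the list, so the refinement can
    # never return a worse match than stage 1.
    refined = [(best[0] + sy * step, best[1] + sx * step)
               for sy in (-1, 0, 1)
               for sx in (-1, 0, 1)]
    return min(refined, key=lambda v: region_cost(cur, prev, pixels, *v))


# Synthetic check: the current frame is the previous frame displaced by (1, -1).
prev = [[x * x + 3 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[prev[min(y + 1, 11)][max(x - 1, 0)] for x in range(12)] for y in range(12)]
pixels = [(y, x) for y in range(4, 8) for x in range(4, 8)]
vector = estimate_motion(cur, prev, pixels)
```

Passing the region as a coordinate list keeps the matching code independent of how regions are obtained, so any segmentation step could feed it.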
doi: 10.1007/11922162_28
Differential Operation Based Palmprint Authentication for Multimedia Security
Xiangqian Wu; Kuanquan Wang; David Zhang
This paper presents a novel approach to palmprint authentication for multimedia security using a differential operation. In this approach, a differential operation is first applied to the palmprint image in the horizontal direction. The palmprint is then encoded according to the sign of each pixel value in the differential image. This code is called the DiffCode of the palmprint. The DiffCode is 128 bytes in size, the smallest among existing palmprint features, which makes it suitable for multimedia security. The similarity of two DiffCodes is measured by their Hamming distance. The approach is tested on the public PolyU Palmprint Database, where it achieves an EER of 0.6%, comparable with existing palmprint recognition methods.
Pp. 237-244
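As a rough illustration of the encoding outlined in the DiffCode abstract above, the sketch below takes the horizontal difference of an image, keeps one sign bit per difference, and compares two codes by Hamming distance. Several details are assumptions rather than facts from the abstract: the function names, the choice of "right minus left" as the differential, the bit-packing order, the normalization of the distance, and the 32 x 33 input size (picked only because it yields 32 x 32 = 1024 bits, that is, 128 bytes). Any preprocessing the authors apply before encoding is not covered.

```python
import random


def diff_code(palm):
    """Encode a grayscale image (list of rows) as the sign bits of its
    horizontal difference, packed into bytes."""
    bits = []
    for row in palm:
        # Horizontal differential: each pixel minus its left neighbour.
        for left, right in zip(row, row[1:]):
            bits.append(1 if right - left > 0 else 0)
    # Pack eight bits per byte, most significant bit first. The sketch
    # assumes the number of bits is a multiple of eight.
    code = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        code.append(byte)
    return bytes(code)


def hamming_distance(code_a, code_b):
    """Fraction of differing bits between two codes of equal length
    (0.0 means identical codes)."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    differing = sum(bin(a ^ b).count("1") for a, b in zip(code_a, code_b))
    return differing / (8 * len(code_a))


# Two random stand-in images; real input would be palmprint images.
rng = random.Random(0)
palm_a = [[rng.randrange(256) for _ in range(33)] for _ in range(32)]
palm_b = [[rng.randrange(256) for _ in range(33)] for _ in range(32)]
code_a = diff_code(palm_a)
code_b = diff_code(palm_b)
score = hamming_distance(code_a, code_b)
```

A lower score means more similar codes; an authentication decision would compare the score against a threshold, which the abstract does not specify.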
doi: 10.1007/11922162_29
A Broadcast Model for Web Image Annotation
Jia Li; Ting Liu; Weiqiang Wang; Wen Gao
Automatic annotation of Web images has great potential to improve the performance of Web image retrieval. This paper presents a Broadcast Model (BM) for Web image annotation. In this model, pages are divided into blocks, and image annotation is realized through the interaction of information from blocks and relevant Web pages. Broadcast means that each block receives information (just like signals) from relevant Web pages and modifies its feature vector according to this information. Compared with most existing image annotation systems, the proposed algorithm uses associated information not only from the page where the images are located, but also from other related pages. Based on the generated annotations, a retrieval application is implemented to evaluate the proposed annotation algorithm. Preliminary experimental results show that the model is effective for Web image annotation and reduces both the number of result images and the time cost of retrieval.
Pp. 245-251
doi: 10.1007/11922162_30
An Approach to the Compression of Residual Data with GPCA in Video Coding
Lei Yao; Jian Liu; Jiangqin Wu
Generalized Principal Component Analysis (GPCA) is a global solution for identifying a mixture of linear models for signals. The method has proved efficient at compressing natural images. In this paper, we introduce GPCA into video coding. We focus on encoding residual frames with GPCA in place of the classical DCT, and also propose to use it in MCTF-based scalable video coding. Experiments show that GPCA achieves better PSNR than DCT with the same number of data components, and that the method is promising in our scalable video coding scheme.
Pp. 252-261
doi: 10.1007/11922162_31
A Robust Approach for Object Recognition
Yuanning Li; Weiqiang Wang; Wen Gao
In this paper, we present a robust and unsupervised approach for recognition of object categories, RTSI-pLSA, which overcomes the weakness of TSI-pLSA in recognizing rotated objects in images. Our approach uses a radial template to describe the spatial information (position, scale, and orientation) of an object. A bottom-up, heuristic, unsupervised scheme is also proposed to estimate the spatial parameters of an object. Experimental results show that RTSI-pLSA can effectively recognize object categories, especially rotated, translated, or scaled objects in images. It lowers the error rate by about 10% compared with TSI-pLSA. Thus, it is a more robust approach for unsupervised object recognition.
Pp. 262-269
doi: 10.1007/11922162_32
A Novel Method for Spoken Text Feature Extraction in Semantic Video Retrieval
Juan Cao; Jintao Li; Yongdong Zhang; Sheng Tang
We propose a novel method for extracting text features from automatic speech recognition (ASR) results in semantic video retrieval. We combine HowNet-rule-based knowledge with statistical information to build special concept lexicons, which can rapidly narrow the vocabulary and improve retrieval precision. Furthermore, we use the term precision (TP) weighting method to analyze ASR texts. This weighting method is sensitive to sparse but important terms in the relevant documents. Experiments show that the proposed method is effective for semantic video retrieval.
Pp. 270-278
doi: 10.1007/11922162_33
A Semantic Image Category for Structuring TV Broadcast Video Streams
Jinqiao Wang; Lingyu Duan; Hanqing Lu; Jesse S. Jin
A TV broadcast video stream consists of various kinds of programs such as sitcoms, news, sports, commercials, weather, etc. In this paper, we propose a semantic image category, named Program Oriented Informative Images (POIM), to facilitate the segmentation, indexing, and retrieval of different programs. The assumption is that most stations tend to insert lead-in/-out video shots to explicitly introduce the current program and indicate the transitions between consecutive programs within TV streams. Such shots often overlap text, graphics, and storytelling images to create an image sequence of POIM as a visual representation of the current program. With the advance of post-editing effects, POIM is becoming an effective indicator for structuring TV streams, and is also a fairly common “prop” in program content production. We have attempted to develop a POIM recognizer involving a set of global/local visual features and supervised/unsupervised learning. Comparison experiments have been carried out. A promising result, F1 = 90.2%, has been achieved on a part of the TRECVID 2005 video corpus. The recognition of POIM, together with other audiovisual features, can be used to further determine program boundaries.
Pp. 279-286
doi: 10.1007/11922162_34
Markov Chain Monte Carlo Super-Resolution Image Reconstruction with Simultaneous Adaptation of the Prior Image Model
Jing Tian; Kai-Kuang Ma
In our recent work, the Markov chain Monte Carlo (MCMC) technique has been successfully exploited and shown to be an effective approach to super-resolution image reconstruction. However, one major challenge lies in the selection of the hyperparameter of the prior image model, which affects the degree of regularity imposed by the prior image model and, consequently, the quality of the estimated high-resolution image. To tackle this challenge, we propose a novel approach that automatically adapts the model’s hyperparameter during the MCMC process, rather than relying on an exhaustive, off-line search. Experimental results show that the proposed hyperparameter adaptation method yields performance extremely close to that of the optimal prior image model.
Pp. 287-294
doi: 10.1007/11922162_36
Robust Mandarin Speech Recognition for Car Navigation Interface
Pei Ding; Lei He; Xiang Yan; Rui Zhao; Jie Hao
This paper presents a robust automatic speech recognition (ASR) system as a multimedia interface for car navigation. In the front end, we use minimum mean-square error (MMSE) enhancement to suppress the background in-car noise and then use smoothing techniques to compensate for the spectral components distorted by noise over-reduction. In acoustic model training, an immunity learning scheme is adopted, in which pre-recorded car noises are artificially added to clean training utterances to imitate the in-car environment. The immunity scheme makes the system robust to both residual noise and speech enhancement distortion. In the context of Mandarin speech recognition, a special issue is the diversity of Chinese dialects: pronunciation differences among accents decrease recognition performance if the acoustic models are trained on a database with unmatched accents. We propose to train the models with multiple accented Mandarin databases to solve this problem. The efficiency of the proposed ASR system is confirmed in evaluations.
Pp. 302-309
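The data-preparation step behind the immunity learning scheme mentioned in the abstract above, adding pre-recorded noise to clean training utterances, might look like the following sketch. It is illustrative only: the abstract does not say how the noise level is chosen, so scaling the noise to a target signal-to-noise ratio (SNR) is an assumption, as are the function name `mix_at_snr`, the looping of short noise recordings, and the synthetic signals used in place of real audio.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean utterance (both lists of samples), scaling the
    noise so that the mixture has the requested SNR in dB."""
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop the noise recording if it is shorter than the utterance.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0:
        raise ValueError("noise recording is silent")
    # Choose the gain g so that clean_power / (g^2 * noise_power)
    # equals 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, tiled)]


# Synthetic stand-ins for a clean utterance and a car-noise recording.
clean = [math.sin(0.1 * i) for i in range(200)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(50)]
noisy = mix_at_snr(clean, noise, 10.0)
```

In a training pipeline, each clean utterance would typically be mixed with several noise recordings, and the resulting noisy copies used to train the acoustic models.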