Publications catalog – books

Intelligent Multimedia Processing with Soft Computing

Yap-Peng Tan; Kim-Hui Yap; Lipo Wang (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution: none detected
Year of publication: 2005
Available at: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-23053-3

Electronic ISBN

978-3-540-32367-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Human-Centered Computing for Image and Video Retrieval

L. Guan; P. Muneesawang; J. Lay; T. Amin; I. Lee

In this chapter, we present retrieval techniques using content-based and concept-based technologies for digital image and video database applications. We first cover state-of-the-art methods in a content-based framework, including: a Laplacian mixture model for content characterization, nonlinear relevance feedback, combining audio and visual features for video retrieval, and designing automatic relevance feedback in distributed digital libraries. We then take a broader view, reviewing the defining characteristics and usefulness of current content-based approaches and articulating the extensions required to support semantic queries.

Pp. 1-31
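
The relevance-feedback loop this chapter develops is nonlinear, but the basic idea of steering a query with user feedback can be illustrated with the classic linear Rocchio update; the function name, weights, and vectors below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of a relevance-feedback query update (Rocchio-style),
# a simpler linear cousin of the nonlinear feedback developed in the chapter.

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward relevant examples and away from non-relevant ones."""
    dim = len(query)
    new_q = [alpha * q for q in query]
    if relevant:
        for i in range(dim):
            new_q[i] += beta * sum(v[i] for v in relevant) / len(relevant)
    if nonrelevant:
        for i in range(dim):
            new_q[i] -= gamma * sum(v[i] for v in nonrelevant) / len(nonrelevant)
    return new_q

# One feedback round: the query drifts toward the relevant cluster.
q = rocchio_update([0.0, 0.0], relevant=[[1.0, 0.0], [1.0, 0.2]], nonrelevant=[[0.0, 1.0]])
```

Each feedback round re-weights the query so that subsequent retrievals favor the neighborhood of images the user marked relevant.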

Vector Color Image Indexing and Retrieval within A Small-World Framework

D. Androutsos; P. Androutsos; K. N. Plataniotis; A. N. Venetsanopoulos

In this chapter, we present a novel and robust scheme for extracting, indexing, and retrieving color image data. We use color segmentation to extract regions of prominent and perceptually relevant color, and use representative vectors from these extracted regions in the image indices. Our similarity measure for retrieval is based on the angular distance between query color vectors and the indexed representative vectors. Furthermore, we extend the theory and present an alternative to centralized image indices using a distributed rationale, where images are not restricted to reside locally but can be located anywhere on a network.

Pp. 33-54
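
The angular similarity measure described above can be sketched as the angle between color vectors; normalization and any channel weighting are assumptions here, not details from the chapter.

```python
import math

# Minimal sketch of the angular (vector-angle) distance between a query color
# vector and an indexed representative color vector.

def angular_distance(c1, c2):
    """Angle in radians between two RGB color vectors; 0 means identical direction."""
    dot = sum(a * b for a, b in zip(c1, c2))
    n1 = math.sqrt(sum(a * a for a in c1))
    n2 = math.sqrt(sum(b * b for b in c2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_theta)

# Two reds of different intensity point in the same direction: distance ~ 0.
d_same = angular_distance((200, 10, 10), (100, 5, 5))
# Red vs. green differ strongly in direction.
d_diff = angular_distance((200, 10, 10), (10, 200, 10))
```

A virtue of the angular measure is its insensitivity to intensity scaling: a dim and a bright version of the same hue compare as nearly identical.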

A Perceptual Subjectivity Notion in Interactive Content-Based Image Retrieval Systems

Kui Wu; Kim-Hui Yap

This chapter presents a new framework, fuzzy relevance feedback, for interactive content-based image retrieval (CBIR) systems. The conventional binary labeling scheme in relevance feedback requires a hard decision on the relevance of each retrieved image. This is inflexible, as user interpretation varies with different information needs and perceptual subjectivity. In addition, users tend to learn from the retrieval results to further refine their information priority. It is therefore inadequate to describe users' fuzzy perception of image similarity with crisp logic. In view of this, a fuzzy framework is introduced to integrate users' imprecise interpretation of visual contents into relevance feedback. An efficient learning approach is developed using a fuzzy radial basis function network (FRBFN). The network is constructed with a hierarchical clustering algorithm, and the underlying network parameters are optimized by a gradient-descent-based training strategy chosen for its computational efficiency. Experimental results on a database of 10,000 images demonstrate the effectiveness of the proposed method.

Pp. 55-73
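
The contrast between hard binary labels and fuzzy ones can be sketched with a toy radial basis function scorer whose centers carry soft relevance weights in [0, 1]; this is only an illustrative reading, and does not reproduce the chapter's FRBFN construction or its gradient training.

```python
import math

# Illustrative sketch: score an image against RBF centers weighted by soft
# (fuzzy) user relevance judgments instead of hard 0/1 labels.

def frbf_score(x, centers, fuzzy_weights, sigma=1.0):
    """Weighted sum of Gaussian kernels; a higher score means more relevant."""
    score = 0.0
    for c, w in zip(centers, fuzzy_weights):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, c))
        score += w * math.exp(-dist2 / (2 * sigma ** 2))
    return score

centers = [(0.0, 0.0), (3.0, 3.0)]
weights = [0.9, 0.2]  # first center judged "mostly relevant", second "barely relevant"
near_relevant = frbf_score((0.1, 0.1), centers, weights)
near_irrelevant = frbf_score((3.0, 3.0), centers, weights)
```

The soft weights let a "somewhat relevant" judgment contribute proportionally, rather than forcing it into one of two crisp classes.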

A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections

Tat-Seng Chua; Huamin Feng

Image annotation aims to assign semantic concepts to images based on their visual contents. It has received much attention recently as huge, dynamic collections of images and videos have become available on the Web. Most recent approaches employ supervised learning techniques, which are limited by their need for a large set of labeled training samples, tedious and time-consuming to obtain. This chapter explores a bootstrapping framework that tackles this problem with three complementary strategies. First, we train two "view-independent" classifiers based on probabilistic SVMs using two orthogonal sets of content features, and incorporate the classifiers into a co-training framework to annotate regions. Second, at the image level, we employ two different segmentation methods to divide each image into different sets of possibly overlapping regions, and devise a contextual model to disambiguate the concepts learned from different regions. Third, we incorporate active learning to ensure that the framework scales to large image collections. Our experiments on a mid-sized image collection demonstrate that the bootstrapping-cum-active-learning framework is effective. Compared to the traditional supervised learning approach, it improves annotation accuracy by over 4% in F measure without active learning, and by over 18% when active learning is incorporated. Most importantly, the bootstrapping framework requires only a small set of training samples to kick-start the learning process, making it suitable for practical applications.

Pp. 75-90
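
The active-learning step can be sketched as uncertainty sampling: ask a human to label the images whose classifier probability sits closest to the decision boundary. The probabilities below are made-up stand-ins for probabilistic SVM outputs, and the selection rule is a common generic choice, not necessarily the chapter's exact criterion.

```python
# Sketch of uncertainty sampling for active learning: pick the unlabeled
# samples whose predicted probability is nearest the 0.5 decision boundary.

def select_uncertain(probs, k):
    """Return indices of the k samples the classifier is least sure about."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

p = [0.95, 0.48, 0.10, 0.55, 0.99]
picked = select_uncertain(p, 2)  # indices of the two most ambiguous samples
```

Labeling only these ambiguous samples concentrates the human effort where it moves the classifier most, which is what keeps the framework scalable.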

Moderate Vocabulary Visual Concept Detection for the TRECVID 2002

Milind R. Naphade; John R. Smith

The explosion in multimodal content availability underlines the necessity of content management at a semantic level. We cast the problem of detecting semantics in multimedia content as a pattern classification problem, and the problem of building models of multimodal semantics as a learning problem. Recent trends show the increasing use of statistical machine learning, which provides a computational framework for mapping low-level media features to high-level semantic concepts. In this chapter we expose the challenges these techniques face. We show that if a lexicon of visual concepts is identified a priori, a statistical framework can be used to build visual feature models for the concepts in the lexicon. Using support vector machine (SVM) classification, we build models for 34 semantic concepts on the TREC 2002 benchmark corpus. We study the effect of the number of training examples on detection performance, and also examine low-level feature fusion as well as parameter sensitivity with SVM classifiers.

Pp. 91-107
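
The low-level feature fusion examined in the chapter can be illustrated in its simplest "early fusion" form: per-modality feature vectors are concatenated before classification. The hand-set linear scorer below is a stand-in for a trained SVM decision function; the features and weights are invented for illustration.

```python
# Toy sketch of early low-level feature fusion: color and texture feature
# vectors are concatenated into one input for an SVM-style linear scorer.

def fuse_features(color_feat, texture_feat):
    """Concatenate per-modality features into a single fused vector."""
    return list(color_feat) + list(texture_feat)

def linear_score(x, w, b):
    """Linear decision function w·x + b, the form a linear SVM ultimately evaluates."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

fused = fuse_features([0.2, 0.8], [0.5])
score = linear_score(fused, w=[1.0, -0.5, 2.0], b=-0.3)
```

In practice the weights come from SVM training on labeled examples for each lexicon concept; fusion simply decides at which point the modalities are joined.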

Automatic Visual Concept Training Using Imperfect Cross-Modality Information

Xiaodan Song; Ching-Yung Lin; Ming-Ting Sun

In this chapter, we present an autonomous learning scheme that automatically builds visual semantic concept models from video sequences or from the results of Internet search engines, without any manual labeling. First, system users specify the concept models to be learned automatically. Example videos or images can be obtained from large video databases via keyword search over automatic speech recognition transcripts, or alternatively gathered using Internet search engines. We then propose to model the search results as "quasi-positive bags" in Multiple-Instance Learning (MIL); we call this the generalized MIL (GMIL). In some scenarios there are also no negative bags in the GMIL. We propose an algorithm called "Bag K-Means" to find the maximum Diverse Density (DD) without negative bags; its cost function is that of K-Means with a special "bag distance". We also present "Uncertain Labeling Density" (ULD), which describes the target density distribution of instances in the case of quasi-positive bags, and a "Bag Fuzzy K-Means" to find the maximum of the ULD. Using this generalized MIL with the ULD framework, the model for a particular concept can then be learned through general supervised learning methods. Experiments show that our algorithm learns correct models for the concepts of interest.

Pp. 109-128
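
The "bag distance" at the heart of Bag K-Means can be sketched as the distance from a bag's closest instance to a centroid, with the centroid update averaging each bag's closest instance. This is a hedged, illustrative reading of the abstract, not the authors' exact algorithm.

```python
# Sketch of the bag-distance idea: a bag (set of instances) is as close to a
# centroid as its nearest instance, since a quasi-positive bag is assumed to
# contain at least one true positive instance.

def bag_distance(bag, centroid):
    """Minimum Euclidean distance from any instance in the bag to the centroid."""
    return min(
        sum((a - b) ** 2 for a, b in zip(inst, centroid)) ** 0.5
        for inst in bag
    )

def closest_instances(bags, centroid):
    """For each bag, the instance nearest the centroid (used in a K-Means-style update)."""
    return [min(bag, key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst, centroid)))
            for bag in bags]

bags = [[(0.0, 0.0), (9.0, 9.0)], [(1.0, 0.0), (8.0, 8.0)]]
d = bag_distance(bags[0], (0.5, 0.0))  # nearest instance is (0, 0)
```

Iterating "pick each bag's closest instance, then re-average the centroid" concentrates the centroid on the dense region shared across quasi-positive bags, which is the intuition behind maximizing Diverse Density without negative bags.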

Audio-Visual Event Recognition with Application in Sports Video

Ziyou Xiong; Regunathan Radhakrishnan; Ajay Divakaran; Thomas S. Huang

We summarize our recent work on the detection and recognition of "highlight" events in sports video. We have developed two joint audio-visual fusion frameworks for this task: "audio-visual coupled hidden Markov model" and "audio classification then visual hidden Markov model verification". Our comparative study shows that the second approach outperforms the first by a large margin. Our study also suggests the importance of modeling so-called middle-level features, such as audience reactions and camera patterns, in sports video.

Pp. 129-149

Fuzzy Logic Methods for Video Shot Boundary Detection and Classification

Ralph M. Ford

A fuzzy logic system for the detection and classification of shot boundaries in uncompressed video sequences is presented. It integrates multiple sources of information and knowledge of editing procedures to detect shot boundaries. Furthermore, the system classifies the editing process employed to create the shot boundary into one of the following categories: abrupt cut, fade-in, fade-out, or dissolve. This system was tested on a database containing a wide variety of video classes. It achieved combined recall and precision rates that significantly exceed those of existing threshold-based techniques, and it correctly classified a high percentage of the detected boundaries.

Pp. 151-169
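
The fuzzy-logic flavor of shot-boundary detection can be sketched by mapping a frame difference to a soft membership in a "large difference" set instead of comparing it to a hard threshold. The trapezoidal breakpoints and decision level below are illustrative assumptions, not parameters from the chapter.

```python
# Small sketch of fuzzy shot-boundary evidence: a normalized frame difference
# gets a soft membership in the fuzzy set "large frame difference".

def membership_high(diff, lo=0.2, hi=0.6):
    """Trapezoidal membership: 0 below lo, 1 above hi, linear ramp in between."""
    if diff <= lo:
        return 0.0
    if diff >= hi:
        return 1.0
    return (diff - lo) / (hi - lo)

def is_cut(diff, alpha=0.5):
    """Declare an abrupt cut when the 'large difference' membership dominates."""
    return membership_high(diff) >= alpha

gradual = is_cut(0.3)  # weak evidence: likely within a shot or a gradual transition
abrupt = is_cut(0.9)   # strong evidence: abrupt cut
```

A full fuzzy system would combine several such memberships (intensity, histogram, edge evidence) through rules before defuzzifying into a cut/fade/dissolve decision; the soft ramp is what buys robustness over a single hard threshold.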

Rate-Distortion Optimal Video Summarization and Coding

Zhu Li; Aggelos K. Katsaggelos; Guido M. Schuster

The demand for video summarization originates from viewing-time constraints as well as bit-budget constraints imposed by communication and storage limitations in security, military, and entertainment applications. In this chapter we formulate and solve video summarization as a rate-distortion optimization problem. An effective new summarization distortion metric is developed, and several optimal algorithms are presented along with effective heuristic solutions.

Pp. 171-204
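
The rate-distortion formulation can be illustrated on a toy scale: with a "rate" budget of k keyframes, choose the subset minimizing total distortion, where a dropped frame's distortion is its feature distance to the nearest kept frame. Exhaustive search here stands in for the chapter's optimal algorithms, and the scalar "features" are invented for the example.

```python
from itertools import combinations

# Toy rate-distortion summarization: pick k keyframes minimizing the summed
# distance of every frame to its nearest keyframe.

def total_distortion(frames, summary):
    """Distortion of a summary: each frame costs its distance to the nearest keyframe."""
    return sum(min(abs(f - s) for s in summary) for f in frames)

def best_summary(frames, k):
    """Rate constraint: at most k keyframes. Exhaustively minimize distortion."""
    return min(combinations(frames, k), key=lambda s: total_distortion(frames, s))

# 1-D stand-ins for frame feature values: three visually distinct segments.
frames = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
summary = best_summary(frames, 3)
```

Real formulations replace exhaustive search with dynamic programming or Lagrangian relaxation, but the trade-off is the same: a larger keyframe budget (rate) buys lower distortion.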

Video Compression by Neural Networks

Daniele Vigliano; Raffaele Parisi; Aurelio Uncini

In this chapter, a general overview of the most common approaches to video compression is first provided. Standardization issues are briefly discussed and the most recent neural compression techniques are reviewed. In addition, a particularly effective novel neural paradigm is introduced and described. The new approach is based on a proper quad-tree segmentation of video frames and is capable of yielding a considerable improvement over existing standards in high-quality video compression. Experimental tests demonstrate the efficacy of the proposed solution.

Pp. 205-234
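
The quad-tree segmentation underlying the chapter's approach can be sketched as a recursive splitting rule: a block splits into four quadrants whenever its pixel variance exceeds a threshold, so detailed areas end up as small blocks and flat areas as large ones. The threshold and the split criterion are generic assumptions; the neural compression stage itself is omitted.

```python
# Hedged sketch of variance-driven quad-tree segmentation of a square frame block.

def variance(block):
    """Pixel variance of a 2-D block given as a list of rows."""
    n = len(block) * len(block[0])
    mean = sum(sum(row) for row in block) / n
    return sum((p - mean) ** 2 for row in block for p in row) / n

def quadtree(block, threshold, min_size=1):
    """Return leaf blocks after recursive 4-way splitting of high-variance blocks."""
    size = len(block)
    if size <= min_size or variance(block) <= threshold:
        return [block]
    h = size // 2
    quads = [
        [row[:h] for row in block[:h]], [row[h:] for row in block[:h]],
        [row[:h] for row in block[h:]], [row[h:] for row in block[h:]],
    ]
    return [leaf for q in quads for leaf in quadtree(q, threshold, min_size)]

flat = [[10] * 4 for _ in range(4)]  # uniform block: kept whole
busy = [[(i * 7 + j * 13) % 50 for j in range(4)] for i in range(4)]  # detailed block
```

A compressor then spends few bits on the large flat leaves and concentrates capacity on the many small leaves in detailed regions, which is where the coding gain over fixed-block schemes comes from.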