Publications catalogue - books
Image and Video Retrieval: 5th International Conference, CIVR 2006, Tempe, AZ, USA, July 13-15, 2006, Proceedings
Hari Sundaram; Milind Naphade; John R. Smith; Yong Rui (eds.)
Conference: 5th International Conference on Image and Video Retrieval (CIVR), Tempe, AZ, USA, July 13-15, 2006
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Computer Graphics; Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; Image Processing and Computer Vision
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-36018-6
Electronic ISBN
978-3-540-36019-3
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11788034_11
Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation
Rui Shi; Tat-Seng Chua; Chin-Hui Lee; Sheng Gao
We propose a novel Bayesian learning framework for hierarchical mixture models by incorporating prior hierarchical knowledge into concept representations of multi-level concept structures in images. Characterizing image concepts by mixture models is one of the most effective techniques in automatic image annotation (AIA) for concept-based image retrieval. However, it also poses problems when large-scale models are needed to cover the wide variations in image samples. To alleviate the potential difficulties arising in estimating too many parameters with insufficient training images, we treat the mixture model parameters as random variables characterized by a joint conjugate prior density. This facilitates a statistical combination of the likelihood function of the available training data and the prior density of the concept parameters into a well-defined posterior density, whose parameters can now be estimated via a maximum a posteriori criterion. Experimental results on the Corel image dataset with a set of 371 concepts indicate that the proposed Bayesian approach achieves a maximum F_1 measure of 0.169, which outperforms many state-of-the-art AIA algorithms.
Keywords: Mixture Model; Prior Density; Bayesian Learning; Concept Hierarchy; Primitive Concept.
- Session O4: Learning and Classification | Pp. 102-112
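The MAP step this abstract describes can be illustrated with the simplest conjugate case: a single multinomial with a Dirichlet prior. This is a minimal sketch of the idea (prior counts regularize sparse data), not the paper's hierarchical model; the hyperparameter values are illustrative.

```python
import numpy as np

def map_multinomial(counts, alpha):
    """MAP estimate of multinomial parameters under a conjugate
    Dirichlet(alpha) prior: theta_k proportional to counts_k + alpha_k - 1.
    Assumes alpha_k > 1 so the posterior mode is well defined."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    unnorm = counts + alpha - 1.0
    return unnorm / unnorm.sum()

# With few observations the prior smooths the estimate towards uniform;
# with many observations the data dominates.
print(map_multinomial([2, 0, 1], alpha=[2.0, 2.0, 2.0]))
print(map_multinomial([200, 0, 100], alpha=[2.0, 2.0, 2.0]))
```

The combination of likelihood and conjugate prior is what keeps the estimate well behaved when, as the abstract notes, there are too many parameters for the available training images.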
doi: 10.1007/11788034_12
Efficient Margin-Based Rank Learning Algorithms for Information Retrieval
Rong Yan; Alexander G. Hauptmann
Learning a good ranking function plays a key role for many applications including the task of (multimedia) information retrieval. While there are a few rank learning methods available, most of them need to explicitly model the relations between every pair of relevant and irrelevant documents, and thus result in an expensive training process for large collections. The goal of this paper is to propose a general rank learning framework based on the margin-based risk minimization principle and develop a set of efficient rank learning approaches that can model the ranking relations with much less training time. Its flexibility allows a number of margin-based classifiers to be extended to their rank learning counterparts such as the ranking logistic regression developed in this paper. Experimental results show that this efficient learning algorithm can successfully learn a highly effective retrieval function for multimedia retrieval on the TRECVID’03-’05 collections.
Keywords: Information Retrieval; Mean Average Precision; Retrieval Task; Ranking Feature; Video Retrieval.
- Session O4: Learning and Classification | Pp. 113-122
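The pairwise margin-based objective the paper starts from can be written down directly. The sketch below implements the naive O(|relevant| x |irrelevant|) pairwise logistic ranking loss with gradient descent; the paper's contribution is precisely avoiding this pairwise cost, which is not reproduced here, and the data is synthetic.

```python
import numpy as np

def rank_logistic_train(X_rel, X_irr, lr=0.1, epochs=200):
    """Fit a linear scoring function w.x with the pairwise logistic
    ranking loss sum_{r,i} log(1 + exp(-(w.x_r - w.x_i)))."""
    w = np.zeros(X_rel.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for xr in X_rel:
            for xi in X_irr:
                diff = xr - xi
                grad -= diff / (1.0 + np.exp(w @ diff))  # logistic gradient
        w -= lr * grad / (len(X_rel) * len(X_irr))
    return w

rng = np.random.default_rng(0)
X_rel = rng.normal(1.0, 0.5, size=(20, 3))    # relevant documents
X_irr = rng.normal(-1.0, 0.5, size=(20, 3))   # irrelevant documents
w = rank_logistic_train(X_rel, X_irr)
print((X_rel @ w).mean() > (X_irr @ w).mean())  # True
```

Because every (relevant, irrelevant) pair appears in the loss, training time grows quadratically with collection size, which is the inefficiency the paper's reformulation targets.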
doi: 10.1007/11788034_13
Leveraging Active Learning for Relevance Feedback Using an Information Theoretic Diversity Measure
Charlie K. Dagli; Shyamsundar Rajaram; Thomas S. Huang
Interactively learning from a small sample of unlabeled examples is an enormously challenging task. Relevance feedback and, more recently, active learning are two standard techniques that have received much attention towards solving this interactive learning problem. How to best utilize the user's effort for labeling, however, remains unanswered. It has been shown in the past that labeling a diverse set of points is helpful; however, the notion of diversity has either been dependent on the learner used or been computationally expensive. In this paper, we address these issues by proposing a fundamentally motivated, information-theoretic view of diversity and its use in a fast, non-degenerate active learning-based relevance feedback setting. Comparative testing and results are reported and thoughts for future work are presented.
Keywords: Image Retrieval; Relevance Feedback; Query Point; Entropic Diversity; Query Concept.
- Session O5: Image and Video Retrieval Metrics | Pp. 123-132
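One way to make an information-theoretic notion of batch diversity concrete is to score a candidate batch by the differential entropy of a Gaussian fit to it, and grow the batch greedily. This is an illustrative proxy under a Gaussian assumption, not the authors' entropic measure; all data here is made up.

```python
import numpy as np

def gaussian_entropy(points, eps=1e-6):
    """Differential entropy (up to an additive constant) of a Gaussian
    fit to the points; larger means the batch is more spread out."""
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts.T) + eps * np.eye(pts.shape[1])  # regularize
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * logdet

def diverse_batch(candidates, k):
    """Greedily grow a batch of k indices maximizing the Gaussian
    entropy estimate of the selected points."""
    chosen = [0]                     # seed with the first candidate
    while len(chosen) < k:
        rest = [i for i in range(len(candidates)) if i not in chosen]
        best = max(rest,
                   key=lambda i: gaussian_entropy(candidates[chosen + [i]]))
        chosen.append(best)
    return chosen

# Three near-duplicate points and one outlier: the outlier is picked.
cands = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(diverse_batch(cands, 2))  # [0, 3]
```

A diversity criterion of this kind asks the user to label points that jointly cover the feature space, rather than redundant near-duplicates.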
doi: 10.1007/11788034_14
Video Clip Matching Using MPEG-7 Descriptors and Edit Distance
Marco Bertini; Alberto Del Bimbo; Walter Nunziati
Video databases require that clips are represented in a compact and discriminative way, in order to perform efficient matching and retrieval of documents of interest. We present a method to obtain a video representation suitable for this task, and show how to use this representation in a matching scheme. In contrast with existing works, the proposed approach is entirely based on features and descriptors taken from the well-established MPEG-7 standard. Different clips are compared using an edit distance, in order to obtain high similarity between videos that differ in some subsequences but are essentially related to the same content. Experimental validation is performed using a prototype application that retrieves TV commercials recorded from different TV sources in real time. Results show excellent performance both in terms of accuracy and in terms of computational cost.
Keywords: Edit Distance; News Video; Video Database; Prototype Application; Edge Histogram Descriptor.
- Session O5: Image and Video Retrieval Metrics | Pp. 133-142
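The edit-distance comparison between clip signatures is the standard dynamic program sketched below. The symbolic substitution cost is a placeholder: in the paper's setting each sequence element would be an MPEG-7 descriptor and `sub_cost` a real descriptor distance.

```python
def edit_distance(a, b, sub_cost=lambda x, y: 0 if x == y else 1):
    """Levenshtein distance between two clip signatures. Each element
    stands in for a per-shot descriptor symbol."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                              # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                              # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[m][n]

# A clip with one inserted shot stays close to the original.
print(edit_distance("ABCDEF", "ABXCDEF"))  # 1
```

This is why the abstract notes that clips differing only in some subsequences still come out highly similar: insertions and deletions cost one unit each rather than breaking the alignment.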
doi: 10.1007/11788034_15
Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting
Shi-Yong Neo; Jin Zhao; Min-Yen Kan; Tat-Seng Chua
Recent research in video retrieval has focused on automated, high-level feature indexing on shots or frames. One important application of such indexing is to support precise video retrieval. We report on extensions of this semantic indexing on news video retrieval. First, we utilize extensive query analysis to relate various high-level features and query terms by matching the textual description and context in a time-dependent manner. Second, we introduce a framework to effectively fuse the relation weights with the detectors’ confidence scores. This results in individual high level features that are weighted on a per-query basis. Tests on the TRECVID 2005 dataset show that the above two enhancements yield significant improvement in performance over a corresponding state-of-the-art video retrieval baseline.
Keywords: Automatic Speech Recognition; News Article; Query Term; Expanded Query; Mean Average Precision.
- Session O5: Image and Video Retrieval Metrics | Pp. 143-152
doi: 10.1007/11788034_16
Annotating News Video with Locations
Jun Yang; Alexander G. Hauptmann
The location of video scenes is an important semantic descriptor, especially for broadcast news video. In this paper, we propose a learning-based approach to annotate shots of news video with locations extracted from the video transcript, based on features from multiple video modalities including the syntactic structure of transcript sentences, speaker identity, temporal video structure, and so on. Machine learning algorithms are adopted to combine multi-modal features to solve two sub-problems: (1) whether the location of a video shot is mentioned in the transcript, and if so, (2) among the many locations in the transcript, which are the correct one(s) for this shot. Experiments on the TRECVID dataset demonstrate that our approach achieves approximately 85% accuracy in correctly labeling the location of any shot in news video.
Keywords: Support Vector Machine; Noun Phrase; True Location; Candidate Location; Parse Tree.
- Session O6: Machine Tagging | Pp. 153-162
doi: 10.1007/11788034_17
Automatic Person Annotation of Family Photo Album
Ming Zhao; Yong Wei Teo; Siliang Liu; Tat-Seng Chua; Ramesh Jain
Digital photographs are replacing traditional film in our daily life, and their quantity is exploding. This stimulates a strong need for efficient management tools, in which the annotation of "who" is in each photo is essential. In this paper, we propose an automated method to annotate family photos using evidence from face, body and context information. Face recognition is the first consideration. However, its performance is limited by the uncontrolled conditions of family photos. In a family album, the same groups of people tend to appear in similar events, in which they tend to wear the same clothes within a short time duration and in nearby places. We can make use of social context information and body information to estimate the probability of a person's presence and identify other examples of the same recognized persons. In our approach, we first use social context information to cluster photos into events. Within each event, the body information is clustered and then combined with face recognition results using a graphical model. Finally, the clusters with high face recognition confidence and context probabilities are identified as belonging to a specific person. Experiments on a photo album containing over 1500 photos demonstrate that our approach is effective.
- Session O6: Machine Tagging | Pp. 163-172
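The first step described above, clustering photos into events from temporal context, can be sketched as a simple gap-based split over sorted timestamps. The 6-hour threshold is an illustrative assumption, not a value from the paper.

```python
from datetime import datetime, timedelta

def cluster_events(timestamps, gap=timedelta(hours=6)):
    """Split a chronologically sorted list of photo timestamps into
    events wherever the gap between consecutive photos exceeds the
    threshold."""
    events, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > gap:
            events.append(current)  # close the current event
            current = []
        current.append(ts)
    events.append(current)
    return events

# Two photos on the same morning, one a week later: two events.
ts = [datetime(2006, 7, 13, 10), datetime(2006, 7, 13, 11),
      datetime(2006, 7, 20, 9)]
print([len(e) for e in cluster_events(ts)])  # [2, 1]
```

Within each such event, clothing (body) appearance can then be treated as stable evidence of identity, which is the assumption the paper's graphical model exploits.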
doi: 10.1007/11788034_18
Finding People Frequently Appearing in News
Derya Ozkan; Pınar Duygulu
We propose a graph-based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit our search space for a query name. Then, we construct a similarity graph with nodes corresponding to all of the faces in the search space, and edges corresponding to the similarity of the faces. With the assumption that the images of the query name will be more similar to each other than to other images, the problem is then transformed into finding the densest component in the graph corresponding to the images of the query name. The same graph algorithm is applied to detect and remove the faces of the anchorpeople in an unsupervised way. The experiments are conducted on 229 news videos provided by NIST for TRECVID 2004. The results show that the proposed method outperforms text-only methods and provides cues for recognition of faces at large scale.
Keywords: Face Recognition; Interest Point; Dense Component; News Video; Video Retrieval.
- Session O6: Machine Tagging | Pp. 173-182
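Finding a densest component can be approximated with the standard greedy peeling heuristic: repeatedly remove the minimum-degree node and keep the snapshot with the best average-degree density. This is a generic approximation sketch on a toy graph, not necessarily the authors' exact algorithm or similarity measure.

```python
def densest_component(adj):
    """Greedy densest-subgraph approximation. adj maps
    node -> set of neighbours (undirected, symmetric)."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    edges = sum(len(vs) for vs in adj.values()) // 2
    best, best_density = set(adj), edges / len(adj)
    while len(adj) > 1:
        u = min(adj, key=lambda x: len(adj[x]))   # minimum-degree node
        edges -= len(adj[u])
        for v in adj.pop(u):
            adj[v].discard(u)
        density = edges / len(adj)                # |E| / |V|
        if density > best_density:
            best, best_density = set(adj), density
    return best

# A 4-clique of faces of the queried person plus two stray faces.
graph = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 6}, 3: {1, 2, 4},
         4: {1, 2, 3}, 5: {1}, 6: {2}}
print(sorted(densest_component(graph)))  # [1, 2, 3, 4]
```

The mutually similar faces of the query person form the dense cluster, while unrelated faces hang off it with low degree and get peeled away.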
doi: 10.1007/11788034_19
A Novel Framework for Robust Annotation and Retrieval in Video Sequences
Arasanathan Anjulan; Nishan Canagarajah
This paper describes a method for automatic video annotation and scene retrieval based on local region descriptors. A novel framework is proposed for combined video segmentation, content extraction and retrieval. A similarity measure, previously proposed by the authors based on local region features, is used for video segmentation. The local regions are tracked throughout a shot and stable features are extracted. The conventional key frame method is replaced with these stable local features to characterise different shots. Compared to previous video annotation approaches, the proposed method is highly robust to camera and object motions and can withstand severe illumination changes and spatial editing. We apply the proposed framework to shot cut detection and scene retrieval applications and demonstrate superior performance compared to existing methods. Furthermore as segmentation and content extraction are performed within the same step, the overall computational complexity of the system is considerably reduced.
Keywords: Video Sequence; Video Segmentation; Video Annotation; Maximally Stable Extremal Region; Region Descriptor.
- Session P1: Poster I | Pp. 183-192
doi: 10.1007/11788034_20
Feature Re-weighting in Content-Based Image Retrieval
Gita Das; Sid Ray; Campbell Wilson
Relevance Feedback (RF) is a useful technique for reducing the semantic gap, which is a bottleneck in Content-Based Image Retrieval (CBIR). One of the classical approaches to implementing RF is feature re-weighting, where weights in the similarity measure are modified using feedback samples returned by the user. The main issues in RF are learning the system parameters from feedback samples and the high dimensionality of the feature space. We addressed the second problem in our previous work; here, we focus on the first. In this paper, we investigated different weight update schemes and compared the retrieval results. We propose a new feature re-weighting method, which we tested on three image databases with sizes varying between 2000 and 8365 and with between 10 and 98 categories. The experimental results with scope values of 20 and 100 demonstrate the superiority of our method in terms of retrieval accuracy.
- Session P1: Poster I | Pp. 193-200
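A classic instance of the feature re-weighting idea sets each feature's weight inversely proportional to its standard deviation over the positively judged images, so that features on which the relevant images agree dominate the next retrieval round. This is a generic textbook scheme given as an illustration, not the paper's proposed update rule.

```python
import numpy as np

def reweight(relevant_feats, eps=1e-6):
    """Weight feature i by 1/std_i computed over the relevant
    (positive-feedback) examples, normalized to sum to 1."""
    std = np.std(np.asarray(relevant_feats, dtype=float), axis=0)
    w = 1.0 / (std + eps)
    return w / w.sum()

def weighted_distance(x, y, w):
    """Weighted Euclidean distance used to rank database images."""
    return float(np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(y)) ** 2)))

# Relevant images agree on feature 0 but vary on feature 1,
# so feature 0 gets the larger weight.
rel = [[0.9, 0.1], [0.9, 0.8], [0.9, 0.5]]
w = reweight(rel)
print(w[0] > w[1])  # True
```

Each feedback round recomputes the weights from the newly returned relevant samples, which is the iterative loop the abstract's weight-update schemes refine.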