Publications catalog - books
Semantic Multimedia: Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings
Bianca Falcidieno ; Michela Spagnuolo ; Yannis Avrithis ; Ioannis Kompatsiaris ; Paul Buitelaar (eds.)
In conference: 2nd International Conference on Semantic and Digital Media Technologies (SAMT). Genoa, Italy. December 5, 2007 - December 7, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Popular Computer Science; Multimedia Information Systems; Computer Communication Networks; Information Systems Applications (incl. Internet); Data Mining and Knowledge Discovery; Document Preparation and Text Processing
Availability
Institution detected | Year of publication | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-77033-6
Electronic ISBN
978-3-540-77051-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
Improving the Accuracy of Global Feature Fusion Based Image Categorisation
Ville Viitaniemi; Jorma Laaksonen
In this paper we consider the task of categorising images of the Corel collection into semantic classes. In our earlier work, we demonstrated that state-of-the-art accuracy of supervised categorising of these images could be improved significantly by fusion of a large number of global image features. In this work, we preserve the general framework, but improve the components of the system: we modify the set of image features to include interest point histogram features, perform elementary feature classification with support vector machines (SVM) instead of self-organising map (SOM) based classifiers, and fuse the classification results with either an additive, multiplicative or SVM-based technique. As the main result of this paper, we are able to achieve a significant improvement of image categorisation accuracy by applying these generic state-of-the-art image content analysis techniques.
- Knowledge Based Content Processing | Pp. 1-14
Stopping Region-Based Image Segmentation at Meaningful Partitions
Tomasz Adamek; Noel E. O’Connor
This paper proposes a new stopping criterion for automatic image segmentation based on region merging. The criterion is dependent on image content itself and when combined with the recently proposed approaches to segmentation can produce results aligned with the most salient semantic regions/objects present in the scene across heterogeneous image collections. The method identifies a single iteration from the merging process as the stopping point, based on the evolution of an accumulated merging cost during the complete merging process. The approach is compared to three commonly used stopping criteria: (i) required number of regions, (ii) value of the least link cost, and (iii) Peak Signal to Noise Ratio (PSNR). For comparison, the stopping criterion is also evaluated for a segmentation approach that does not use syntactic extensions. All experiments use a manually generated segmentation ground truth and spatial accuracy measures. Results show that the proposed stopping criterion improves segmentation performance towards reflecting real-world scene content when integrated into a syntactic segmentation framework.
- Knowledge Based Content Processing | Pp. 15-27
Hierarchical Long-Term Learning for Automatic Image Annotation
Donn Morrison; Stéphane Marchand-Maillet; Eric Bruno
This paper introduces a hierarchical process for propagating image annotations throughout a partially labelled database. Long-term learning, where users’ query and browsing patterns are retained over multiple sessions, is used to guide the propagation of keywords onto image regions based on low-level feature distances. We demonstrate how singular value decomposition (SVD), normally used with latent semantic analysis (LSA), can be used to reconstruct a noisy image-session matrix and associate images with query concepts. These associations facilitate hierarchical filtering where image regions are matched based on shared parent concepts. A simple distance-based ranking algorithm is then used to determine keywords associated with regions.
- Semantic Multimedia Annotation I | Pp. 28-40
LSA-Based Automatic Acquisition of Semantic Image Descriptions
Roberto Basili; Riccardo Petitti; Dario Saracino
Web multimedia documents are characterized by visual and linguistic information expressed by structured pages of images and texts. The suitable combinations able to generalize semantic aspects of the overall multimedia information clearly depend on applications. In this paper, an unsupervised image classification technique combining features from different media levels is proposed. In particular linguistic descriptions derived through Information Extraction from Web pages are here integrated with visual features by means of Latent Semantic Analysis. Although the higher expressivity increases the complexity of the learning process, the dimensionality reduction implied by LSA makes it largely applicable. The evaluation over an image classification task confirms that the proposed model outperforms other methods acting on the individual levels. The resulting method is cost-effective and can be easily applied to semi-automatic image semantic labeling tasks as foreseen in collaborative annotation scenarios.
- Semantic Multimedia Annotation I | Pp. 41-55
Ontology-Driven Semantic Video Analysis Using Visual Information Objects
Georgios Th. Papadopoulos; Vasileios Mezaris; Ioannis Kompatsiaris; Michael G. Strintzis
In this paper, an ontology-driven approach for the semantic analysis of video is proposed. This approach builds on an ontology infrastructure and in particular a multimedia ontology that is based on the notions of Visual Information Object (VIO) and Multimedia Information Object (MMIO). The latter constitute extensions of the Information Object (IO) design pattern, previously proposed for refining and extending the DOLCE core ontology. This multimedia ontology, along with the more domain-specific parts of the developed knowledge infrastructure, supports the analysis of video material, models the content layer of video, and defines generic as well as domain-specific concepts whose detection is important for the analysis and description of video of the specified domain. The signal-level video processing that is necessary for linking the developed ontology infrastructure with the signal domain includes the combined use of a temporal and a spatial segmentation algorithm, a layered structure of Support Vector Machines (SVMs)-based classifiers and a classifier fusion mechanism. A Genetic Algorithm (GA) is introduced for optimizing the performed information fusion step. These processing methods support the decomposition of visual information, as specified by the multimedia ontology, and the detection of the defined domain-specific concepts that each piece of video signal, treated as a VIO, is related to. Experimental results in the domain of disaster news video demonstrate the efficiency of the proposed approach.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 56-69
On the Selection of MPEG-7 Visual Descriptors and Their Level of Detail for Nature Disaster Video Sequences Classification
Javier Molina; Evaggelos Spyrou; Natasa Sofou; José M. Martínez
In this paper, we present a study on the discrimination capabilities of colour, texture and shape MPEG-7 [1] visual descriptors, within the context of video sequences. The target is to facilitate the recognition of certain visual cues which would then allow the classification of natural disaster-related concepts. Low-level visual features are extracted using the MPEG-7 “eXperimentation Module” (XM) [2]. The extraction times associated with the levels of detail of the descriptors are measured. The pattern sets obtained as combinations of significant levels of detail of different descriptors are the input to a Support Vector Machine (SVM), which yields the classification accuracies. Preliminary results indicate that this approach could be useful for the implementation of real-time spatial region classifiers.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 70-73
A Region Thesaurus Approach for High-Level Concept Detection in the Natural Disaster Domain
Evaggelos Spyrou; Yannis Avrithis
This paper presents an approach to high-level feature detection using a region thesaurus. MPEG-7 features are locally extracted from segmented regions for a large set of images. A hierarchical clustering approach is applied and a relatively small number of region types is selected. This set of region types defines the region thesaurus. Using this thesaurus, low-level features are mapped to high-level concepts as model vectors. This representation is then used to train support vector machine-based feature detectors. As a next step, latent semantic analysis is applied to the model vectors to further improve the analysis performance. The high-level concepts detected derive from the natural disaster domain.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 74-77
Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition
Marijn Huijbregts; Roeland Ordelman; Franciska de Jong
This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared to figures for broadcast news test data.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 78-90
A Model-Based Iterative Method for Caption Extraction in Compressed MPEG Video
Daniel Márquez; Jesús Bescós
We describe here a method for caption extraction that works entirely in the MPEG compressed domain. Unlike other compressed-domain methods, it does not need to refine its results in the pixel domain. It consists of two phases: first, a selection of candidate frames with captions, based on a rigorous statistical design of an AC coefficients mask; second, an extraction of caption boxes from the pre-selected set of candidate frames. Caption extraction relies on a model-based approach to obtaining the caption mask, robust enough to avoid the use of any subsequent refinement.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 91-94
Automatic Recommendations for Machine-Assisted Multimedia Annotation: A Knowledge-Mining Approach
Mónica Díez; Paulo Villegas
Recommender systems apply knowledge discovery techniques to help in finding associated information. In this paper, we investigate the use of association rule mining as an underlying technology for a recommender system aimed at improving the annotation process of multimedia news documents. The accuracy of these systems is very sensitive to the number of already annotated news items (the “cold-start” problem); ontology-based semantic relations are used to alleviate this situation.
- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 95-98