Publications catalog - books

Semantic Multimedia: Second International Conference on Semantic and Digital Media Technologies, SAMT 2007, Genoa, Italy, December 5-7, 2007. Proceedings

Bianca Falcidieno ; Michela Spagnuolo ; Yannis Avrithis ; Ioannis Kompatsiaris ; Paul Buitelaar (eds.)

Conference: 2nd International Conference on Semantic and Digital Media Technologies (SAMT). Genoa, Italy. December 5, 2007 - December 7, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Popular Computer Science; Multimedia Information Systems; Computer Communication Networks; Information Systems Applications (incl. Internet); Data Mining and Knowledge Discovery; Document Preparation and Text Processing

Availability

Detected institution	Year of publication	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-77033-6

Electronic ISBN

978-3-540-77051-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Electrical, electronic and computer engineering

Table of contents

doi: 10.1007/978-3-540-77051-0_1

Improving the Accuracy of Global Feature Fusion Based Image Categorisation

Ville Viitaniemi; Jorma Laaksonen

In this paper we consider the task of categorising images of the Corel collection into semantic classes. In our earlier work, we demonstrated that state-of-the-art accuracy of supervised categorising of these images could be improved significantly by fusion of a large number of global image features. In this work, we preserve the general framework, but improve the components of the system: we modify the set of image features to include interest point histogram features, perform elementary feature classification with support vector machines (SVM) instead of self-organising map (SOM) based classifiers, and fuse the classification results with either an additive, multiplicative or SVM-based technique. As the main result of this paper, we are able to achieve a significant improvement of image categorisation accuracy by applying these generic state-of-the-art image content analysis techniques.

- Knowledge Based Content Processing | Pp. 1-14
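
The additive and multiplicative fusion of classifier outputs mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each feature-specific classifier returns a score in [0, 1] per class; the function names and the example scores are hypothetical and are not taken from the paper, and the SVM-based fusion variant is not shown.

```python
from math import prod


def fuse_additive(scores):
    """Sum rule: average the per-feature classifier scores."""
    return sum(scores) / len(scores)


def fuse_multiplicative(scores):
    """Product rule: geometric mean of the per-feature classifier scores."""
    return prod(scores) ** (1.0 / len(scores))


def categorise(per_class_scores, fuse):
    """Return the class label whose fused score is highest."""
    return max(per_class_scores, key=lambda label: fuse(per_class_scores[label]))


# Hypothetical scores from three feature-specific classifiers for two classes.
example = {
    "beach": [0.9, 0.8, 0.1],
    "forest": [0.5, 0.5, 0.5],
}

print(categorise(example, fuse_additive))
print(categorise(example, fuse_multiplicative))
```

With these made-up scores the two rules disagree: the product rule penalises the single low score for "beach" much more strongly than the sum rule does, which is the usual trade-off between the two.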

doi: 10.1007/978-3-540-77051-0_2

Stopping Region-Based Image Segmentation at Meaningful Partitions

Tomasz Adamek; Noel E. O’Connor

This paper proposes a new stopping criterion for automatic image segmentation based on region merging. The criterion is dependent on image content itself and when combined with the recently proposed approaches to segmentation can produce results aligned with the most salient semantic regions/objects present in the scene across heterogeneous image collections. The method identifies a single iteration from the merging process as the stopping point, based on the evolution of an accumulated merging cost during the complete merging process. The approach is compared to three commonly used stopping criteria: (i) required number of regions, (ii) value of the least link cost, and (iii) Peak Signal to Noise Ratio (PSNR). For comparison, the stopping criterion is also evaluated for a segmentation approach that does not use syntactic extensions. All experiments use a manually generated segmentation ground truth and spatial accuracy measures. Results show that the proposed stopping criterion improves segmentation performance towards reflecting real-world scene content when integrated into a syntactic segmentation framework.

- Knowledge Based Content Processing | Pp. 15-27

doi: 10.1007/978-3-540-77051-0_3

Hierarchical Long-Term Learning for Automatic Image Annotation

Donn Morrison; Stéphane Marchand-Maillet; Eric Bruno

This paper introduces a hierarchical process for propagating image annotations throughout a partially labelled database. Long-term learning, where users’ query and browsing patterns are retained over multiple sessions, is used to guide the propagation of keywords onto image regions based on low-level feature distances. We demonstrate how singular value decomposition (SVD), normally used with latent semantic analysis (LSA), can be used to reconstruct a noisy image-session matrix and associate images with query concepts. These associations facilitate hierarchical filtering where image regions are matched based on shared parent concepts. A simple distance-based ranking algorithm is then used to determine keywords associated with regions.

- Semantic Multimedia Annotation I | Pp. 28-40

doi: 10.1007/978-3-540-77051-0_4

LSA-Based Automatic Acquisition of Semantic Image Descriptions

Roberto Basili; Riccardo Petitti; Dario Saracino

Web multimedia documents are characterized by visual and linguistic information expressed by structured pages of images and texts. The suitable combinations able to generalize semantic aspects of the overall multimedia information clearly depend on applications. In this paper, an unsupervised image classification technique combining features from different media levels is proposed. In particular linguistic descriptions derived through Information Extraction from Web pages are here integrated with visual features by means of Latent Semantic Analysis. Although the higher expressivity increases the complexity of the learning process, the dimensionality reduction implied by LSA makes it largely applicable. The evaluation over an image classification task confirms that the proposed model outperforms other methods acting on the individual levels. The resulting method is cost-effective and can be easily applied to semi-automatic image semantic labeling tasks as foreseen in collaborative annotation scenarios.

- Semantic Multimedia Annotation I | Pp. 41-55

doi: 10.1007/978-3-540-77051-0_5

Ontology-Driven Semantic Video Analysis Using Visual Information Objects

Georgios Th. Papadopoulos; Vasileios Mezaris; Ioannis Kompatsiaris; Michael G. Strintzis

In this paper, an ontology-driven approach for the semantic analysis of video is proposed. This approach builds on an ontology infrastructure and in particular a multimedia ontology that is based on the notions of Visual Information Object (VIO) and Multimedia Information Object (MMIO). The latter constitute extensions of the Information Object (IO) design pattern, previously proposed for refining and extending the DOLCE core ontology. This multimedia ontology, along with the more domain-specific parts of the developed knowledge infrastructure, supports the analysis of video material, models the content layer of video, and defines generic as well as domain-specific concepts whose detection is important for the analysis and description of video of the specified domain. The signal-level video processing that is necessary for linking the developed ontology infrastructure with the signal domain includes the combined use of a temporal and a spatial segmentation algorithm, a layered structure of Support Vector Machines (SVMs)-based classifiers and a classifier fusion mechanism. A Genetic Algorithm (GA) is introduced for optimizing the performed information fusion step. These processing methods support the decomposition of visual information, as specified by the multimedia ontology, and the detection of the defined domain-specific concepts that each piece of video signal, treated as a VIO, is related to. Experimental results in the domain of disaster news video demonstrate the efficiency of the proposed approach.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 56-69

doi: 10.1007/978-3-540-77051-0_6

On the Selection of MPEG-7 Visual Descriptors and Their Level of Detail for Nature Disaster Video Sequences Classification

Javier Molina; Evaggelos Spyrou; Natasa Sofou; José M. Martínez

In this paper, we present a study on the discrimination capabilities of colour, texture and shape MPEG-7 [1] visual descriptors, within the context of video sequences. The aim is to facilitate the recognition of certain visual cues which would then allow the classification of natural disaster-related concepts. Low-level visual features are extracted using the MPEG-7 “eXperimentation Module” (XM) [2]. The extraction times associated with the levels of detail of the descriptors are measured. The pattern sets obtained by combining significant levels of detail of different descriptors are the input to a Support Vector Machine (SVM), which yields the classification accuracies. Preliminary results indicate that this approach could be useful for the implementation of real-time spatial region classifiers.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 70-73

doi: 10.1007/978-3-540-77051-0_7

A Region Thesaurus Approach for High-Level Concept Detection in the Natural Disaster Domain

Evaggelos Spyrou; Yannis Avrithis

This paper presents an approach to high-level feature detection using a region thesaurus. MPEG-7 features are extracted locally from segmented regions for a large set of images. A hierarchical clustering approach is applied and a relatively small number of region types is selected. This set of region types defines the region thesaurus. Using this thesaurus, low-level features are mapped to high-level concepts as model vectors. This representation is then used to train support vector machine-based feature detectors. As a next step, latent semantic analysis is applied to the model vectors to further improve the analysis performance. The detected high-level concepts come from the natural disaster domain.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 74-77

doi: 10.1007/978-3-540-77051-0_8

Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

Marijn Huijbregts; Roeland Ordelman; Franciska de Jong

This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared to figures for broadcast news test data.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 78-90

doi: 10.1007/978-3-540-77051-0_9

A Model-Based Iterative Method for Caption Extraction in Compressed MPEG Video

Daniel Márquez; Jesús Bescós

We describe a method for caption extraction that works entirely in the MPEG compressed domain. Unlike other compressed-domain methods, it does not need to refine its results in the pixel domain. It consists of two phases: first, a selection of candidate frames with captions, based on a rigorous statistical design of an AC coefficients mask; second, an extraction of caption boxes from the pre-selected set of candidate frames. Caption extraction relies on a model-based approach to obtaining the caption mask, robust enough to avoid the use of any subsequent refinement.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 91-94

doi: 10.1007/978-3-540-77051-0_10

Automatic Recommendations for Machine-Assisted Multimedia Annotation: A Knowledge-Mining Approach

Mónica Díez; Paulo Villegas

Recommender systems apply knowledge discovery techniques to help in finding associated information. In this paper, we investigate the use of association rule mining as an underlying technology for a recommender system aimed at improving the annotation process of multimedia news documents. The accuracy of these systems is very sensitive to the number of already annotated news items (the "cold-start" problem); ontology-based semantic relations are used to alleviate this situation.

- Domain-Restricted Generation of Semantic Metadata from Multimodal Sources | Pp. 95-98
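
To make the idea of association-rule-based annotation recommendations in the abstract above more concrete, here is a minimal sketch. It mines only single-keyword rules (a -> b) with the standard support and confidence measures; the function names, thresholds, and toy data are hypothetical and are not taken from the paper, and the ontology-based handling of the cold-start problem is not modelled.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(items, min_support=0.4, min_confidence=0.6):
    """Mine rules a -> b from a list of keyword sets (one set per news item)."""
    n = len(items)
    single = Counter()
    pair = Counter()
    for keywords in items:
        single.update(keywords)
        pair.update(combinations(sorted(keywords), 2))
    rules = {}
    for (a, b), count in pair.items():
        if count / n < min_support:  # pair is too rare to trust
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = count / single[ante]
            if confidence >= min_confidence:
                rules[(ante, cons)] = confidence
    return rules


def recommend(rules, current):
    """Suggest keywords implied by the current ones, highest confidence first."""
    best = {}
    for (ante, cons), conf in rules.items():
        if ante in current and cons not in current:
            best[cons] = max(conf, best.get(cons, 0.0))
    return sorted(best, key=lambda k: (-best[k], k))


# Hypothetical, already annotated news items.
annotated = [
    {"election", "politics", "parliament"},
    {"election", "politics"},
    {"politics", "parliament"},
    {"football", "sports"},
    {"football", "sports", "election"},
]

rules = mine_pair_rules(annotated)
print(recommend(rules, {"football"}))
```

With very few annotated items, almost no pair reaches the support threshold and the rule set stays nearly empty, which is one way to see why such a recommender is sensitive to the cold-start problem.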