Publications catalog - books


Advances in XML Information Retrieval: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004

Norbert Fuhr ; Mounia Lalmas ; Saadia Malik ; Zoltán Szlávik (eds.)

In conference: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX). Dagstuhl Castle, Germany. December 6, 2004 - December 8, 2004

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet)

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26166-7

Electronic ISBN

978-3-540-32053-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

DocBase – The INEX Evaluation Experience

Sriram Mohan; Arijit Sengupta

Can a system designed primarily for the purpose of database-type storage and retrieval be used for information-retrieval tasks? This was one of the questions that led us to participate in the INEX 2004 initiative. DocBase, a prototype database system developed initially for SGML, and adapted to work with XML, was used for the purpose of answering the queries. DocBase uses DSQL, an adaptation of SQL, to provide a mechanism for querying XML using existing database and indexing technologies. The INEX evaluation experience was encouraging: although it did show the limitations of database query languages for classic information retrieval tasks, it also demonstrated that several interesting results can be obtained by using database query languages for information retrieval, especially for queries involving both content and structure. Our results demonstrate the adaptability and scalability of a database system for processing IR queries.

- Ad Hoc Retrieval | Pp. 261-275

TIJAH at INEX 2004 Modeling Phrases and Relevance Feedback

Vojkan Mihajlović; Georgina Ramírez; Arjen P. de Vries; Djoerd Hiemstra; Henk Ernst Blok

This paper discusses our participation in INEX using the TIJAH XML-IR system. We have enriched the TIJAH system, which follows a standard layered database architecture, with several new features. An extensible conceptual level processing unit has been added to the system. The algebra on the logical level and the implementation on the physical level have been extended to support phrase search and relevance feedback. The conceptual processing unit is capable of rewriting NEXI content-only and content-and-structure queries into the internal form, based on the retrieval model parameter specification, that is either predefined or based on relevance feedback. Relevance feedback parameters are produced based on the data fusion of result element score values and sizes, and relevance assessments. The introduction of new operators supporting phrase search in score region algebra on the logical level is discussed in the paper, as well as their implementation on the physical level using the pre-post numbering scheme. The framework for structural relevance feedback is also explained in the paper. We conclude with a preliminary analysis of the system performance based on INEX 2004 runs.

- Ad Hoc Retrieval and Relevance Feedback | Pp. 276-291

Flexible Retrieval Based on the Vector Space Model

Carolyn J. Crouch; Aniruddha Mahajan; Archana Bellamkonda

This paper describes the current state of our system for structured retrieval. The system itself is based on an extension of the vector space model initially proposed by Fox [5]. The basic functions are performed using the Smart experimental retrieval system [11]. The major advance achieved this year is the inclusion of a flexible capability, which allows the system to retrieve at a desired level of granularity (i.e., at the element level). The quality of the resultant statistics is largely dependent on issues (in particular, ranking) which have yet to be resolved.

- Ad Hoc Retrieval and Relevance Feedback | Pp. 292-302

Relevance Feedback for XML Retrieval

Yosi Mass; Matan Mandelbrod

Relevance Feedback (RF) algorithms were studied in the context of traditional IR systems where the returned unit is an entire document. In this paper we describe a component ranking algorithm for XML retrieval and show how to apply known RF algorithms from traditional IR on top of it to achieve Relevance Feedback for XML. We then give two examples of known RF algorithms and show results of applying them to our XML retrieval system in the INEX’04 RF Track.

- Relevance Feedback | Pp. 303-310

A Universal Model for XML Information Retrieval

Maria Izabel M. Azevedo; Lucas Pantuza Amorim; Nívio Ziviani

This paper presents an approach for extending the vector space model (VSM) to perform XML retrieval. The model is extended to support important aspects of XML structural and semantic information such as element nesting level, matching tag names in the query and the collection and the relation between tag names and content of an element. Potential use of the model for heterogeneous as well as for the unstructured collection is also shown. We compared our model with the standard vector space model and obtained a gain for unstructured and structured queries. For unstructured collections the vector space model effectiveness is preserved.

- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 311-321

Cheshire II at INEX ’04: Fusion and Feedback for the Adhoc and Heterogeneous Tracks

Ray R. Larson

This paper describes the retrieval approach used by UC Berkeley in the adhoc and heterogeneous tracks for the 2004 INEX evaluation. As in previous INEX evaluations, the main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm in conjunction with Boolean constraints for some elements. We also describe some additional experiments, subsequent to INEX, that promise further improvements in results.

- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 322-336

Using a Relevance Propagation Method for Adhoc and Heterogeneous Tracks at INEX 2004

Karen Sauvagnat; Mohand Boughanem

This paper describes the evaluation of the XFIRM system in the INEX 2004 framework. The XFIRM system uses a relevance propagation method to answer queries composed of content conditions and/or structure conditions. Runs were submitted to the ad-hoc (for both CO and VCAS tasks) and heterogeneous tracks.

- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 337-348

Building and Experimenting with a Heterogeneous Collection

Zoltán Szlávik; Thomas Rölleke

Today’s integrated retrieval applications retrieve documents from disparate data sources. Therefore, as part of INEX 2004, we ran a heterogeneous track to explore experimentation with a heterogeneous collection of documents. We built a collection comprising various sub-collections, re-used topics (queries) from the sub-collections and created new topics, and participants submitted the results of retrieval runs. The assessment proved difficult, since pooling the results and browsing the collection posed new challenges and required more resources than were available. This report summarises the motivation, activities, results and findings of the track.

- Heterogeneous Document Collection | Pp. 349-357

A Test Platform for the INEX Heterogeneous Track

Serge Abiteboul; Ioana Manolescu; Benjamin Nguyen; Nicoleta Preda

This article presents our work within the INEX 2004 Heterogeneous Track. We focused on taming the structural diversity within the INEX heterogeneous bibliographic corpus.

We demonstrate how semantic models and associated inference techniques can be used to solve the problems raised by the structural diversity within a given XML corpus. The first step automatically extracts a set of concepts from each class of INEX heterogeneous documents. A unified set of concepts is then computed, which synthesizes the interesting concepts from the whole corpus. Individual corpora are connected to the unified set of concepts via . This approach is implemented as an application of the platform for peer-to-peer warehousing of XML documents. While this work caters to the structural aspects of XML information retrieval, the extensibility of the system makes it an interesting test platform in which components developed by several INEX participants could be plugged, exploiting the opportunities of peer-to-peer data and service distribution.

- Heterogeneous Document Collection | Pp. 358-371

EXTIRP 2004: Towards Heterogeneity

Miro Lehtonen

The effort around EXTIRP 2004 focused on the heterogeneity of XML document collections. The subcollections of the heterogeneous track (het-track) did not offer us a suitable testbed, but we successfully applied methods independent of any document type to the original INEX test collection. By closing our eyes to the element names defined in the DTD, we created comparable runs and discovered improvement in the results. This was anticipated evidence for our hypothesis that we do not need to know the element names when indexing the collection or when returning full-text answers to the Content-Only type queries. Some problematic areas were also identified. One of them is score combination, which enables us to combine elements of any size into one ranked list of results, given that we have the relevance scores of the leaf-level elements. However, finding a suitable score combination method remains part of our future work.

- Heterogeneous Document Collection | Pp. 372-381