Advances in XML Information Retrieval: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004

Norbert Fuhr ; Mounia Lalmas ; Saadia Malik ; Zoltán Szlávik (eds.)

Conference: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX). Dagstuhl Castle, Germany. December 6-8, 2004

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet)

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26166-7

Electronic ISBN

978-3-540-32053-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

TRIX 2004 – Struggling with the Overlap

Jaana Kekäläinen; Marko Junkkari; Paavo Arvola; Timo Aalto

In this paper, we present a new XML retrieval system prototype employing structural indices and a tf*idf weighting modification. We test retrieval methods that a) emphasize the tf part in weighting and b) allow overlap in run results to different degrees. It seems that increasing the overlap percentage leads to better performance. Emphasizing the tf part enables us to increase the exhaustiveness of the returned results.

- Ad Hoc Retrieval | Pp. 127-139

The Utrecht Blend: Basic Ingredients for an XML Retrieval System

Roelof van Zwol; Frans Wiering; Virginia Dignum

Exploiting the structure of a document allows for more powerful information retrieval techniques. In this article a basic approach is discussed for the retrieval of XML document fragments. Based on a vector-space model for text retrieval we aim at investigating various strategies that influence the retrieval performance of an XML-based IR system.

The first extension of the system uses a schema-based approach that assumes that authors tag their text to emphasise particular pieces of content that are important. Based on the schema used by the document collection, the system can easily derive the children of mixed content nodes. Our hypothesis is that those child nodes are more important than other nodes.

The second approach discussed here is based on a horizontal fragmentation of the inverse document frequencies used by the vector space model. The underlying assumption is that the distribution of terms is related to the semantic structure of the document. However, we observed that the IEEE collection is not a good example of semantic tagging.

The third approach investigates how the performance of the retrieval system can improve for the ‘Content Only’ task by using a set of a priori defined cut-off nodes that define ‘logical’ document fragments that are of interest to a user.

- Ad Hoc Retrieval | Pp. 140-152

Hybrid XML Retrieval Revisited

Jovan Pehcevski; James A. Thom; S. M. M. Tahaghoghi; Anne-Marie Vercoustre

The widespread adoption of XML necessitates structure-aware systems that can effectively retrieve information from XML document collections. This paper reports on the participation of the RMIT group in the INEX 2004 ad hoc track, where we investigate different aspects of the XML retrieval task. Our preliminary analysis of CO and VCAS relevance assessments identifies three XML retrieval scenarios: , and . Further analysis of the relevance assessments under the General retrieval scenario reveals two categories of CO and VCAS topics: and . We design runs that follow a hybrid XML approach and implement two retrieval heuristics with different levels of overlap among the answer elements. For the Original retrieval scenario we show that the overlap CO runs outperform the non-overlap CO runs, and that the VCAS run that uses queries with structural constraints and no explicitly specified target element performs best. In both the CO and VCAS cases, runs that implement the retrieval heuristic that favours less specific over more specific answer elements produce the most effective retrieval. Importantly, we present results which show that, for the General retrieval scenario where users prefer less specific and non-overlapping answers to their queries, a plain full-text search engine is a very effective choice for XML retrieval.

- Ad Hoc Retrieval | Pp. 153-167

Analyzing the Properties of XML Fragments Decomposed from the INEX Document Collection

Kenji Hatano; Hiroko Kinutani; Toshiyuki Amagasa; Yasuhiro Mori; Masatoshi Yoshikawa; Shunsuke Uemura

In current keyword-based XML fragment retrieval systems, XML fragments of various granularities are returned as retrieval results. The number of such fragments is huge, which adversely affects the index construction time and query processing time of these systems unless they can reliably extract only the answer XML fragments. In this paper, we propose a method for determining XML fragments that are appropriate in keyword-based XML fragment retrieval. This would help to improve the overall performance of XML fragment retrieval systems. The proposed method utilizes and analyzes statistical information of XML fragments based on a technique of the dynamics of terminology in quantitative linguistics. Moreover, our keyword-based XML fragment retrieval system runs on a relational database system. In this paper, we briefly explain the implementation of our system.

- Ad Hoc Retrieval | Pp. 168-182

A Voting Method for XML Retrieval

Gilles Hubert

This paper describes the retrieval approach proposed by the SIG/EVI group of the IRIT research centre in the INEX 2004 evaluation. The approach uses a voting method coupled with some processes to answer content-only and content-and-structure queries. This approach is based on previous work we conducted in the context of automatic text categorization.

- Ad Hoc Retrieval | Pp. 183-195

Mixture Models, Overlap, and Structural Hints in XML Element Retrieval

Börkur Sigurbjörnsson; Jaap Kamps; Maarten de Rijke

We describe the INEX 2004 participation of the Informatics Institute of the University of Amsterdam. We completely revamped our XML retrieval system, now implemented as a mixture language model on top of a standard search engine. To speed up structural reasoning, we indexed the collection’s structure in a separate database. Our main findings are as follows. First, we show that blind feedback improves retrieval effectiveness, but increases overlap. Second, we see that removing overlap from the result set decreases retrieval effectiveness for all metrics except the XML cumulative gain measure. Third, we show that ignoring the structural constraints gives good results if measured in terms of mean average precision; the structural constraints are, however, useful for achieving high initial precision. Finally, we provide a detailed analysis of the characteristics of one of our runs. Based on this analysis we argue that a more explicit definition of the INEX retrieval tasks is needed.

- Ad Hoc Retrieval | Pp. 196-210
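
To make the notion of overlap in the abstract above concrete, here is a minimal Python sketch of one common way to remove overlap from a ranked result list: walk down the ranking and drop any element that is an ancestor or descendant of an element already kept from the same document. The path notation, the tuple layout, and the function names are assumptions made for illustration; the abstract does not say how the authors filtered their runs.

```python
from typing import List, Tuple

# Hypothetical result format: (document id, element path, retrieval score).
Result = Tuple[str, str, float]


def is_ancestor(a: str, b: str) -> bool:
    """Return True if element path a is a proper ancestor of element path b."""
    return b.startswith(a.rstrip("/") + "/")


def remove_overlap(ranked: List[Result]) -> List[Result]:
    """Keep each result only if it does not overlap a higher-ranked kept result."""
    kept: List[Result] = []
    for doc, path, score in ranked:
        overlaps = any(
            doc == k_doc
            and (path == k_path or is_ancestor(path, k_path) or is_ancestor(k_path, path))
            for k_doc, k_path, _ in kept
        )
        if not overlaps:
            kept.append((doc, path, score))
    return kept


# Illustrative ranking, already sorted by descending score.
example_run = [
    ("d1", "/article[1]/bdy[1]/sec[2]", 0.9),
    ("d1", "/article[1]/bdy[1]/sec[2]/p[1]", 0.8),  # descendant of the first hit
    ("d1", "/article[1]", 0.7),                     # ancestor of the first hit
    ("d2", "/article[1]", 0.6),                     # different document
    ("d1", "/article[1]/bdy[1]/sec[21]", 0.5),      # sibling section, no overlap
]
filtered_run = remove_overlap(example_run)
```

Because the filter is greedy, the outcome depends on rank order: a highly ranked paragraph blocks its enclosing section, and the reverse also holds.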

GPX – Gardens Point XML Information Retrieval at INEX 2004

Shlomo Geva

Traditional information retrieval (IR) systems respond to user queries with ranked lists of relevant documents. The separation of content and structure in XML documents allows individual XML elements to be selected in isolation. Thus, users expect XML-IR systems to return highly relevant results that are more precise than entire documents. In this paper we describe the implementation of a search engine for XML document collections. The system is keyword based and is built upon an XML inverted file system. We describe the approach that was adopted to meet the requirements of Content Only (CO) and Vague Content and Structure (VCAS) queries in INEX 2004.

- Ad Hoc Retrieval | Pp. 211-223

Hierarchical Language Models for XML Component Retrieval

Paul Ogilvie; Jamie Callan

Experiments using hierarchical language models for XML component retrieval are presented in this paper. The role of context is investigated through incorporation of the parent’s model. We find that context can improve the effectiveness of finding relevant components slightly. Additionally, biasing the results toward long components through the use of component priors improves exhaustivity but harms specificity, so care must be taken to find an appropriate trade-off.

- Ad Hoc Retrieval | Pp. 224-237
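
The role of the parent's model mentioned in the abstract above can be illustrated with a small Python sketch that scores a component by linearly interpolating three unigram language models: the component itself, its parent, and the whole collection. The interpolation weights, the maximum-likelihood estimates, and all names below are assumptions chosen for illustration; they are not the authors' estimation procedure, and component priors are left out.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def mle(tokens: Sequence[str]) -> Callable[[str], float]:
    """Return a maximum-likelihood unigram model P(term) for a token sequence."""
    counts = Counter(tokens)
    total = len(tokens)
    return lambda term: counts[term] / total if total else 0.0


def score(
    query: List[str],
    component: Sequence[str],
    parent: Sequence[str],
    collection: Sequence[str],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
) -> float:
    """Log-likelihood of the query under a component/parent/collection mixture."""
    w_comp, w_parent, w_coll = weights
    p_comp, p_parent, p_coll = mle(component), mle(parent), mle(collection)
    total = 0.0
    for term in query:
        p = w_comp * p_comp(term) + w_parent * p_parent(term) + w_coll * p_coll(term)
        if p == 0.0:
            return float("-inf")  # term unseen in every model
        total += math.log(p)
    return total


collection_tokens = "xml retrieval language model query element".split()
parent_tokens = "xml retrieval model".split()
component_tokens = "xml retrieval".split()

with_context = score(["xml", "model"], component_tokens, parent_tokens, collection_tokens)
no_context = score(
    ["xml", "model"], component_tokens, parent_tokens, collection_tokens,
    weights=(0.8, 0.0, 0.2),
)
```

In this toy example the query term "model" does not occur in the component but does occur in its parent, so giving the parent model some weight raises the component's score.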

Ranked Retrieval of Structured Documents with the S-Term Vector Space Model

Felix Weigel; Klaus U. Schulz; Holger Meuss

This paper shows how the s-term ranking model [1] is extended and combined with index structures and algorithms for structured document retrieval to enhance both the effectiveness of the model and the retrieval efficiency. We explain in detail how previous work on ranked and exact retrieval can be integrated and optimized, and which adaptations are necessary. Our approach is evaluated experimentally at the INEX 2004 workshop [2]. The results are encouraging and give rise to a number of future enhancements.

- Ad Hoc Retrieval | Pp. 238-252

Merging XML Indices

Gianni Amati; Claudio Carpineto; Giovanni Romano

Using separate indices for each element and merging their results has proven to be a feasible way of performing XML element retrieval; however, there has been little work on evaluating how the main method parameters affect the results. We study the effect of using different weighting models for computing rankings at the single index level and using different merging techniques for combining such rankings. Our main findings are that (i) there are large variations in retrieval effectiveness when choosing different techniques for weighting and merging, with performance gains of up to 102%, and (ii) although there does not seem to be any best weighting model, some merging schemes clearly perform better than others.

- Ad Hoc Retrieval | Pp. 253-260
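
As a rough illustration of merging rankings from separate per-element indices, the Python sketch below rescales the scores of each index to [0, 1] with min-max normalization and then sorts all results together. Min-max normalization is just one simple merging scheme picked for this example; the abstract does not list the schemes the authors compared, and the index names, element identifiers, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical run format: a list of (element id, raw score) pairs per index.
Run = List[Tuple[str, float]]


def min_max(run: Run) -> Run:
    """Rescale the scores of one run to the range [0, 1]."""
    scores = [s for _, s in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(e, 1.0) for e, _ in run]  # all scores equal: treat as top score
    return [(e, (s - lo) / (hi - lo)) for e, s in run]


def merge_runs(runs: Dict[str, Run], k: int = 10) -> Run:
    """Merge per-index runs into one ranking by normalized score."""
    merged: Run = []
    for run in runs.values():
        if run:  # skip indices that returned nothing
            merged.extend(min_max(run))
    # sorted() is stable, so ties keep the order in which indices were added.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)[:k]


# Raw scores from two indices on incomparable scales.
per_index_runs = {
    "sec": [("s1", 10.0), ("s2", 5.0), ("s3", 0.0)],
    "p": [("p1", 0.8), ("p2", 0.2)],
}
top_three = merge_runs(per_index_runs, k=3)
```

Without the normalization step, the section index would dominate the merged list simply because its raw scores are on a larger scale.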