Publications catalog - books


Advances in XML Information Retrieval: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004

Norbert Fuhr ; Mounia Lalmas ; Saadia Malik ; Zoltán Szlávik (eds.)

Conference: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX). Dagstuhl Castle, Germany. December 6, 2004 - December 8, 2004

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet)

Availability
Detected institution: none detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26166-7

Electronic ISBN

978-3-540-32053-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Overview of INEX 2004

Saadia Malik; Mounia Lalmas; Norbert Fuhr

The widespread use of the eXtensible Markup Language (XML) in scientific data repositories, digital libraries and on the web has brought about an explosion in the development of XML retrieval systems. These systems exploit the logical structure of documents, which is explicitly represented by the XML markup: instead of whole documents, only components thereof (the so-called XML elements) are retrieved in response to a user query. This means that an XML retrieval system needs not only to find relevant information in the XML documents, but also to determine the appropriate level of granularity to return to the user, with respect to both content and structural conditions.

- Overview of INEX 2004 | Pp. 1-15

Narrowed Extended XPath I (NEXI)

Andrew Trotman; Börkur Sigurbjörnsson

INEX has through the years provided two types of queries: Content-Only queries (CO) and Content-And-Structure queries (CAS). The CO language has not changed much, but the CAS language has been more problematic. For the CAS queries, the INEX 02 query language proved insufficient for specifying problems for INEX 03. This was addressed by using an extended version of XPath, which, in turn, proved too complex to use correctly. Recently, an INEX working group identified the minimal set of requirements for a suitable query language for future workshops. From this analysis a new IR query language NEXI is introduced for upcoming workshops.
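The difference between the two query types can be illustrated with a short sketch. It assumes only the basic NEXI shape, in which a CAS query is a path expression starting with `//` and carrying `about()` filters, while a CO query is a plain list of terms. The example queries and the helper names `is_cas` and `about_clauses` are hypothetical, and the regular expression is a rough illustration rather than a NEXI parser.

```python
import re

# Illustrative queries only; they are not taken from the INEX topic set.
CO_QUERY = 'XML retrieval evaluation'
CAS_QUERY = '//article[about(., XML)]//sec[about(., retrieval evaluation)]'

# Matches about(<relative path>, <terms>) filters; a hypothetical helper pattern.
ABOUT_RE = re.compile(r'about\(\s*([^,]+?)\s*,\s*([^)]*?)\s*\)')


def is_cas(query: str) -> bool:
    """CAS queries are path expressions, so they start with '//'."""
    return query.lstrip().startswith('//')


def about_clauses(query: str) -> list[tuple[str, str]]:
    """Return (path, terms) pairs for each about() filter in the query."""
    return ABOUT_RE.findall(query)
```

Calling `about_clauses(CAS_QUERY)` yields one (path, terms) pair per filter, which shows how the structural part of the query and the content part are kept apart.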

- Methodology | Pp. 16-40

NEXI, Now and Next

Andrew Trotman; Börkur Sigurbjörnsson

NEXI was introduced in INEX 2004 as a query language for specifying structured and unstructured queries on XML documents. The aim was a language expressive enough for INEX yet simple enough for users to get right. These goals have been achieved. In particular, the error rate in CAS queries has dropped from 63% in 2003 to 12% in 2004. This drop is shown to be a consequence of not only the language but also the tools introduced with it: the source code for a parser was downloaded by 13 IP addresses, while a web implementation was accessed 635 times from 71 addresses.

Although NEXI is suitable for the track, it is not sufficiently expressive for the heterogeneous track, or for question answering. The syntax necessary to extend it to these purposes is proposed. This includes weighted terms and weighted paths. The new syntax is strictly an extension, so it does not invalidate any existing queries.

- Methodology | Pp. 41-53

If INEX Is the Answer, What Is the Question?

Richard A. O’Keefe

The INEX query languages allow the extraction of fragments from selected documents. This power is not much used in INEX queries. The paper suggests reasons why, and considers which kind of document collection this feature might be useful for.

- Methodology | Pp. 54-59

Reliability Tests for the XCG and inex-2002 Metrics

Gabriella Kazai; Mounia Lalmas; Arjen de Vries

In this paper we compare the effectiveness scores and system rankings obtained with the inex-2002 metric, the official measure of INEX 2004, and the XCG metrics proposed in [4] and further developed here. For the comparisons, we use simulated runs as we can easily derive the desired system rankings that a reliable measure should produce based on a predefined set of user preferences. The results indicate that the XCG metrics are better suited for comparing systems for the INEX content-only (CO) task, where systems aim to return the highest scoring elements according to the user preferences reflected in a quantisation function, while also aiming to avoid returning overlapping components.
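The general idea behind cumulated-gain measures can be sketched as follows: gain values are summed down the ranking and divided, rank by rank, by the cumulated gain of an ideal ranking. This is a simplified illustration that assumes the gain values have already been produced by some quantisation function. It does not model the overlap handling that the XCG metrics address, and the function names are hypothetical.

```python
from itertools import accumulate


def cumulated_gain(gains):
    """Running sum of gain values down the ranking."""
    return list(accumulate(gains))


def normalised_cumulated_gain(run_gains, ideal_gains):
    """Per-rank ratio of the run's cumulated gain to the ideal's.

    Both lists are assumed to have the same length, and the ideal list
    is assumed to be sorted by decreasing gain.
    """
    run = cumulated_gain(run_gains)
    ideal = cumulated_gain(ideal_gains)
    return [r / i if i > 0 else 0.0 for r, i in zip(run, ideal)]
```

A run that returns its best elements late scores below 1 at early ranks and catches up only once all the gain has been collected.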

- Methodology | Pp. 60-72

Component Ranking and Automatic Query Refinement for XML Retrieval

Yosi Mass; Matan Mandelbrod

Queries over XML documents challenge search engines to return the most relevant XML components that satisfy the query concepts. In a previous work we described a component ranking algorithm that performed relatively well in INEX’03. In this paper we show an improvement to that algorithm by introducing a document pivot that compensates for missing term statistics in small components. Using this new algorithm we achieved improvements of 30%-50% in the Mean Average Precision over the previous algorithm. We then describe a general mechanism to apply known Query Refinement algorithms from traditional IR on top of this component ranking algorithm and demonstrate an example of such an algorithm that achieved top results in INEX’04.
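One way to read the document pivot is as a linear interpolation between a component's own score and the score of its containing document, so that small components with sparse term statistics inherit some evidence from the document. The abstract does not give the formula, so the interpolation form, the function name `pivot_score` and the parameter `doc_pivot` are assumptions made for illustration.

```python
def pivot_score(component_score: float, document_score: float,
                doc_pivot: float = 0.5) -> float:
    """Blend a component score with its document's score.

    doc_pivot = 0 keeps the component score unchanged; doc_pivot = 1
    ranks the component purely by its document. The linear form is an
    assumption, not the paper's stated formula.
    """
    if not 0.0 <= doc_pivot <= 1.0:
        raise ValueError("doc_pivot must be in [0, 1]")
    return doc_pivot * document_score + (1.0 - doc_pivot) * component_score
```

With `doc_pivot` close to 0 the ranking is driven by the components themselves, while larger values pull short components toward the score of their document.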

- Ad Hoc Retrieval | Pp. 73-84

MultiText Experiments for INEX 2004

Charles L. A. Clarke; Philip L. Tilker

This is the first year that the MultiText Group participated in INEX, submitting three runs for the content-only adhoc retrieval task. To generate these runs, we combined our existing experience and tools with the advice and ideas found in recent INEX papers [4,1] to engineer a solid system capable of performing the basic task in a reasonable fashion.

- Ad Hoc Retrieval | Pp. 85-87

Logic-Based XML Information Retrieval for Determining the Best Element to Retrieve

Maryam Karimzadegan; Jafar Habibi; Farhad Oroumchian

This paper presents the UOWD-Sharif team’s approach to XML information retrieval. The approach extends PLIR, an experimental knowledge-based information retrieval system. Like PLIR, the system uses plausible inferences to first infer the relevance of sentences in XML documents and then propagate that relevance to the other textual units in the document tree. Two approaches have been used to propagate confidence. The first, labeled “propagate-DS”, first propagates the confidence from sentences to upper elements and then combines this evidence by applying the Dempster-Shafer theory of evidence to estimate the confidence in each element. The second, “DS-propagate”, first applies the Dempster-Shafer theory of evidence to combine the evidence and then propagates the combined confidence to the parent element. The second approach performs relatively better than the first.
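The evidence-combination step can be illustrated with Dempster's rule of combination, the standard operation of Dempster-Shafer theory. The sketch below is generic and is not the system described in the abstract: the two-hypothesis frame, the mass values and the name `combine` are hypothetical, and the propagation through the document tree is not modelled.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is discarded.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


REL = frozenset({"relevant"})
NOT_REL = frozenset({"not_relevant"})
EITHER = REL | NOT_REL  # mass left uncommitted

# Two hypothetical pieces of evidence about the same element.
m1 = {REL: 0.6, EITHER: 0.4}
m2 = {REL: 0.5, NOT_REL: 0.2, EITHER: 0.3}
belief = combine(m1, m2)
```

Here the two sources agree more than they conflict, so the combined mass on `REL` ends up higher than either source assigned on its own.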

- Ad Hoc Retrieval | Pp. 88-99

An Algebra for Structured Queries in Bayesian Networks

Jean-Noël Vittaut; Benjamin Piwowarski; Patrick Gallinari

We present a system based on a Bayesian Network formalism for structured document retrieval. The parameters of this model are learned from the document collection (documents, queries and assessments). The focus of the paper is on an algebra which has been designed for the interpretation of structured information queries and can be used within our Bayesian Network framework. With this algebra, the representation of the information demand is independent of the structured query language. It allows us to answer both vague and strict structured queries.

- Ad Hoc Retrieval | Pp. 100-112

IR of XML Documents – A Collective Ranking Strategy

Maha Salem; Alan Woodley; Shlomo Geva

Within the area of Information Retrieval (IR) the importance of appropriate ranking of results has increased markedly. The importance is magnified in the case of systems dedicated to XML retrieval, since users of these systems expect the retrieval of highly relevant and highly precise components, instead of the retrieval of entire documents. As an international, coordinated effort to evaluate the performance of Information Retrieval systems, the Initiative for the Evaluation of XML Retrieval (INEX) encourages participating organisations to run queries on their search engines and to submit their results for the annual INEX workshop. In previous INEX workshops the submitted results were manually assessed by participants and the search engines were ranked in terms of performance. This paper presents a Collective Ranking Strategy that outperforms all search engines it is based on. Moreover, it provides a system that aims to facilitate the ranking of participating search engines.

- Ad Hoc Retrieval | Pp. 113-126