Publications catalog - books
Advances in XML Information Retrieval: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004
Norbert Fuhr ; Mounia Lalmas ; Saadia Malik ; Zoltán Szlávik (eds.)
Conference: 3rd International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX). Dagstuhl Castle, Germany. December 6, 2004 - December 8, 2004
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-26166-7
Electronic ISBN
978-3-540-32053-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Subject coverage
Table of contents
doi: 10.1007/11424550_21
DocBase – The INEX Evaluation Experience
Sriram Mohan; Arijit Sengupta
Can a system designed primarily for the purpose of database-type storage and retrieval be used for information-retrieval tasks? This was one of the questions that led us to participate in the INEX 2004 initiative. DocBase, a prototype database system developed initially for SGML and adapted to work with XML, was used to answer the queries. DocBase uses DSQL, an adaptation of SQL, to provide a mechanism for querying XML using existing database and indexing technologies. The INEX evaluation experience was encouraging: although it did show the limitations of database query languages for classic information retrieval tasks, it also demonstrated that several interesting results can be obtained by using database query languages for information retrieval, especially for queries involving both content and structure. Our results demonstrate the adaptability and scalability of a database system for processing IR queries.
- Ad Hoc Retrieval | Pp. 261-275
doi: 10.1007/11424550_22
TIJAH at INEX 2004 Modeling Phrases and Relevance Feedback
Vojkan Mihajlović; Georgina Ramírez; Arjen P. de Vries; Djoerd Hiemstra; Henk Ernst Blok
This paper discusses our participation in INEX using the TIJAH XML-IR system. We have enriched the TIJAH system, which follows a standard layered database architecture, with several new features. An extensible conceptual level processing unit has been added to the system. The algebra on the logical level and the implementation on the physical level have been extended to support phrase search and relevance feedback. The conceptual processing unit is capable of rewriting NEXI content-only and content-and-structure queries into the internal form, based on the retrieval model parameter specification, which is either predefined or based on relevance feedback. Relevance feedback parameters are produced based on the data fusion of result element score values and sizes, and relevance assessments. The introduction of new operators supporting phrase search in score region algebra on the logical level is discussed in the paper, as well as their implementation on the physical level using the pre-post numbering scheme. The framework for structural relevance feedback is also explained in the paper. We conclude with a preliminary analysis of the system performance based on INEX 2004 runs.
- Ad Hoc Retrieval and Relevance Feedback | Pp. 276-291
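The abstract above names the pre-post numbering scheme as the physical-level basis for its operators. As a rough illustration only (this is not the TIJAH implementation, and the function names `number_nodes` and `is_ancestor` are hypothetical), the scheme can be sketched as follows: each element receives a preorder and a postorder rank, and containment between two elements reduces to two integer comparisons.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Assign a (pre, post) rank pair to every element under root."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        # The preorder rank is taken when the element is entered.
        pre = counters["pre"]
        counters["pre"] += 1
        for child in node:
            visit(child)
        # The postorder rank is taken once all children are done.
        ranks[node] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return ranks


def is_ancestor(ranks, a, d):
    """a contains d iff a is entered before d and left after d."""
    return ranks[a][0] < ranks[d][0] and ranks[a][1] > ranks[d][1]


# Toy document; the element names are illustrative assumptions.
doc = ET.fromstring(
    "<article><sec><p>xml</p><p>retrieval</p></sec><bib/></article>"
)
ranks = number_nodes(doc)
sec = doc.find("sec")
first_p = sec[0]
bib = doc.find("bib")

print(is_ancestor(ranks, sec, first_p))  # True
print(is_ancestor(ranks, bib, first_p))  # False
```

Because the test uses only the two ranks, a containment join between candidate regions can be evaluated without walking the tree again.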
doi: 10.1007/11424550_23
Flexible Retrieval Based on the Vector Space Model
Carolyn J. Crouch; Aniruddha Mahajan; Archana Bellamkonda
This paper describes the current state of our system for structured retrieval. The system itself is based on an extension of the vector space model initially proposed by Fox [5]. The basic functions are performed using the Smart experimental retrieval system [11]. The major advance achieved this year is the inclusion of a flexible capability, which allows the system to retrieve at a desired level of granularity (i.e., at the element level). The quality of the resultant statistics is largely dependent on issues (in particular, ranking) which have yet to be resolved.
- Ad Hoc Retrieval and Relevance Feedback | Pp. 292-302
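To make the idea of vector space retrieval at the element level concrete, here is a minimal sketch. It is not the Smart system's API or the authors' extended model: it assumes each element is identified by a path and carries plain text, and it uses a simple tf-idf weighting with cosine similarity. The names `tf_idf_vectors`, `cosine`, and `rank_elements` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(elements):
    """Build a tf-idf vector (term -> weight) for each element's text."""
    tokenized = {path: text.lower().split() for path, text in elements.items()}
    n = len(tokenized)
    # Document frequency is counted over elements, not whole documents.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    idf = {t: math.log(1 + n / count) for t, count in df.items()}
    vectors = {
        path: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for path, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_elements(query, elements):
    """Return (score, path) pairs, best match first."""
    vectors, idf = tf_idf_vectors(elements)
    q_tf = Counter(query.lower().split())
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, vec), path) for path, vec in vectors.items()]
    return sorted(scored, reverse=True)


# Toy collection: one article and two of its sections (illustrative data).
elements = {
    "/article[1]": "xml retrieval with the vector space model evaluation",
    "/article[1]/sec[1]": "vector space model",
    "/article[1]/sec[2]": "evaluation of xml retrieval",
}
ranked = rank_elements("vector space model", elements)
for score, path in ranked:
    print(f"{score:.3f}  {path}")
```

In this toy run the focused section outranks the enclosing article, which is the kind of granularity choice the abstract refers to; how overlapping elements should be ranked against each other is exactly the open issue it mentions.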
doi: 10.1007/11424550_24
Relevance Feedback for XML Retrieval
Yosi Mass; Matan Mandelbrod
Relevance Feedback (RF) algorithms were studied in the context of traditional IR systems where the returned unit is an entire document. In this paper we describe a component ranking algorithm for XML retrieval and show how to apply known RF algorithms from traditional IR on top of it to achieve Relevance Feedback for XML. We then give two examples of known RF algorithms and show results of applying them to our XML retrieval system in the INEX’04 RF Track.
- Relevance Feedback | Pp. 303-310
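The abstract does not say which relevance feedback algorithms were applied. As one classic example from traditional IR, Rocchio query reformulation can be sketched as below; in an XML setting the judged items would be components rather than whole documents. The coefficient defaults and the function name `rocchio` are assumptions chosen for illustration, not values taken from the paper.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a re-weighted query vector (term -> weight).

    query is a term -> weight dict; relevant and non_relevant are lists
    of term -> weight dicts for the judged components.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move towards the centroid of relevant items, away from the others.
    for judged, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not judged:
            continue
        for vector in judged:
            for term, weight in vector.items():
                new_query[term] += coeff * weight / len(judged)
    # Terms whose weight is not positive are dropped from the new query.
    return {t: w for t, w in new_query.items() if w > 0}


# Illustrative judgments for a one-term query.
expanded = rocchio(
    query={"xml": 1.0},
    relevant=[{"xml": 1.0, "retrieval": 1.0}],
    non_relevant=[{"database": 1.0}],
)
print(expanded)
```

The expanded query can then be submitted again to the component ranking step, which is what running a feedback algorithm on top of an existing ranker amounts to.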
doi: 10.1007/11424550_25
A Universal Model for XML Information Retrieval
Maria Izabel M. Azevedo; Lucas Pantuza Amorim; Nívio Ziviani
This paper presents an approach for extending the vector space model (VSM) to perform XML retrieval. The model is extended to support important aspects of XML structural and semantic information, such as element nesting level, matching tag names in the query and the collection, and the relation between tag names and the content of an element. Potential use of the model for heterogeneous as well as unstructured collections is also shown. We compared our model with the standard vector space model and obtained a gain for unstructured and structured queries. For unstructured collections the vector space model effectiveness is preserved.
- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 311-321
doi: 10.1007/11424550_26
Cheshire II at INEX ’04: Fusion and Feedback for the Adhoc and Heterogeneous Tracks
Ray R. Larson
This paper describes the retrieval approach used by UC Berkeley in the adhoc and heterogeneous tracks for the 2004 INEX evaluation. As in previous INEX evaluations, the main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm in conjunction with Boolean constraints for some elements. We also describe some additional experiments, subsequent to INEX, that promise further improvements in results.
- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 322-336
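As a hedged sketch of the two ingredients named in the abstract, the code below scores two component indexes with a textbook Okapi BM25 formula and merges the resulting runs. It is not the Cheshire II implementation: the fusion rule shown is plain CombSUM over min-max normalised scores, swapped in for illustration, and the parameter values, index contents, and function names are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized component against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed idf, kept non-negative by the "1 +" inside the log.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def comb_sum(*runs):
    """CombSUM: min-max normalise each run, then add scores per id."""
    fused = {}
    for run in runs:
        low, high = min(run.values()), max(run.values())
        span = high - low
        for doc_id, score in run.items():
            normalised = (score - low) / span if span else 0.0
            fused[doc_id] = fused.get(doc_id, 0.0) + normalised
    return fused


# Two hypothetical component indexes over the same two articles.
titles = {"a1": ["xml", "retrieval"], "a2": ["database", "systems"]}
bodies = {
    "a1": ["evaluation", "of", "xml", "retrieval", "systems"],
    "a2": ["xml", "storage", "in", "database", "systems"],
}
query = ["xml", "retrieval"]
fused = comb_sum(bm25_scores(query, titles), bm25_scores(query, bodies))
print(sorted(fused.items(), key=lambda item: item[1], reverse=True))
```

Normalising before summing keeps one component index from dominating simply because its raw scores happen to be on a larger scale.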
doi: 10.1007/11424550_27
Using a Relevance Propagation Method for Adhoc and Heterogeneous Tracks at INEX 2004
Karen Sauvagnat; Mohand Boughanem
This paper describes the evaluation of the XFIRM system in the INEX 2004 framework. The XFIRM system uses a relevance propagation method to answer queries composed of content conditions and/or structure conditions. Runs were submitted to the ad-hoc (for both CO and VCAS tasks) and heterogeneous tracks.
- Ad Hoc Retrieval and Heterogeneous Document Collection | Pp. 337-348
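The general idea of relevance propagation can be illustrated with a small sketch. It is not the XFIRM scoring function: it assumes that only leaf nodes are scored against the query and that each leaf contributes to an ancestor with a weight that decays geometrically with tree distance. The decay factor `alpha`, the tree encoding, and the function name `propagate` are all hypothetical.

```python
def propagate(tree, root, leaf_scores, alpha=0.6):
    """Propagate leaf relevance scores up to every inner node.

    tree maps a node id to the list of its children; leaf_scores maps a
    leaf id to its content score. Returns a node id -> score dict.
    """
    scores = {}

    def visit(node):
        children = tree.get(node, [])
        if not children:
            leaves = [(0, leaf_scores.get(node, 0.0))]
        else:
            # Every leaf below a child is one step further from this node.
            leaves = [
                (dist + 1, score)
                for child in children
                for dist, score in visit(child)
            ]
        scores[node] = sum(score * alpha ** dist for dist, score in leaves)
        return leaves

    visit(root)
    return scores


# Toy document tree with illustrative leaf scores.
tree = {"article": ["sec1", "sec2"], "sec1": ["p1", "p2"], "sec2": ["p3"]}
leaf_scores = {"p1": 1.0, "p2": 0.5, "p3": 0.0}
scores = propagate(tree, "article", leaf_scores, alpha=0.5)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.3f}  {node}")
```

With a decay below 1, a small element that holds the relevant text scores higher than the large element that merely contains it, while the container still receives some credit.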
doi: 10.1007/11424550_28
Building and Experimenting with a Heterogeneous Collection
Zoltán Szlávik; Thomas Rölleke
Today’s integrated retrieval applications retrieve documents from disparate data sources. Therefore, as part of INEX 2004, we ran a heterogeneous track to explore experimentation with a heterogeneous collection of documents. We built a collection comprising various sub-collections, re-used topics (queries) from the sub-collections and created new topics, and participants submitted the results of retrieval runs. The assessment proved difficult, since pooling the results and browsing the collection posed new challenges and required more resources than were available. This report summarises the motivation, activities, results and findings of the track.
- Heterogeneous Document Collection | Pp. 349-357
doi: 10.1007/11424550_29
A Test Platform for the INEX Heterogeneous Track
Serge Abiteboul; Ioana Manolescu; Benjamin Nguyen; Nicoleta Preda
This article presents our work within the INEX 2004 Heterogeneous Track. We focused on taming the structural diversity within the INEX heterogeneous bibliographic corpus.
We demonstrate how semantic models and associated inference techniques can be used to solve the problems raised by the structural diversity within a given XML corpus. The first step automatically extracts a set of concepts from each class of INEX heterogeneous documents. A unified set of concepts is then computed, which synthesizes the interesting concepts from the whole corpus. Individual corpora are connected to the unified set of concepts via […]. This approach is implemented as an application of the […] platform for peer-to-peer warehousing of XML documents. While this work caters to the structural aspects of XML information retrieval, the extensibility of the system makes it an interesting test platform into which components developed by several INEX participants could be plugged, exploiting the opportunities of peer-to-peer data and service distribution.
- Heterogeneous Document Collection | Pp. 358-371
doi: 10.1007/11424550_30
EXTIRP 2004: Towards Heterogeneity
Miro Lehtonen
The effort around EXTIRP 2004 focused on the heterogeneity of XML document collections. The subcollections of the heterogeneous track (het-track) did not offer us a suitable testbed, but we successfully applied methods independent of any document type to the original INEX test collection. By closing our eyes to the element names defined in the DTD, we created comparable runs and discovered an improvement in the results. This was anticipated evidence for our hypothesis that we do not need to know the element names when indexing the collection or when returning full-text answers to Content-Only type queries. Some problematic areas were also identified. One of them is score combination, which enables us to combine elements of any size into one ranked list of results, given that we have the relevance scores of the leaf-level elements. However, finding a suitable score combination method remains part of our future work.
- Heterogeneous Document Collection | Pp. 372-381