Publications catalog - books

Advances in Information Retrieval: 27th European Conference on IR Research, ECIR 2005, Santiago de Compostela, Spain, March 21-23, 2005, Proceedings

David E. Losada; Juan M. Fernández-Luna (eds.)

Conference: 27th European Conference on Information Retrieval (ECIR), Santiago de Compostela, Spain, March 21-23, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Artificial Intelligence (incl. Robotics); Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; Document Preparation and Text Processing

Availability
Detected institution: Not detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25295-5

Electronic ISBN

978-3-540-31865-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

A Probabilistic Logic for Information Retrieval

C. J. ‘Keith’ van Rijsbergen

One of the most important models for IR derives from the representation of documents and queries as vectors in a vector space. I will show how logic emerges from the geometry of such a vector space. As a consequence of looking at such a space in terms of states and observables, I will show how an appropriate probability measure can be constructed on this space, which may be the basis for a suitable probabilistic logic for information retrieval.

- Keynote Papers | Pp. 1-6
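As a rough illustration of the vector-space view in this keynote (not the construction from the paper itself), the sketch below represents toy documents and a query as term-frequency vectors and turns their squared cosine similarities into a normalised, probability-like score over documents; the data and helper names are invented for the example.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a piece of text (toy whitespace tokenizer)."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = ["logic in information retrieval",
        "vector space models for retrieval",
        "probability measures on vector spaces"]
query = "probabilistic logic for retrieval"

doc_vecs = [tf_vector(d) for d in docs]
q_vec = tf_vector(query)

# Squared cosines, renormalised to sum to one: a simple probability-like
# measure induced by the geometry of the space.
sims = [cosine(q_vec, d) ** 2 for d in doc_vecs]
total = sum(sims) or 1.0
for doc, s in zip(docs, sims):
    print(f"{s / total:.3f}  {doc}")
```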

Applications of Web Query Mining

Ricardo Baeza-Yates

Server logs of search engines store traces of the queries submitted by users, which include the queries themselves along with the Web pages selected in their answers. The same is true of Web site logs, where queries and subsequent actions are recorded from search engine referrers or from an internal search box. In this paper we present two applications based on analyzing and clustering queries. The first suggests changes to improve the text and structure of a Web site, and the second performs relevance ranking boosting and query recommendation in search engines.

- Keynote Papers | Pp. 7-22
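A minimal sketch of the query-clustering idea described above, under the assumption that queries sharing many clicked pages are related; the log data and the 0.2 threshold are invented for illustration, not taken from the paper.

```python
from itertools import combinations

# Hypothetical query log: query -> set of clicked result pages.
clicks = {
    "cheap flights": {"/deals", "/airlines"},
    "flight offers": {"/deals", "/booking"},
    "hotel booking": {"/booking", "/hotels"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Queries whose click sets overlap strongly become candidate
# recommendations for one another (and hints for site restructuring).
for (q1, u1), (q2, u2) in combinations(clicks.items(), 2):
    sim = jaccard(u1, u2)
    if sim > 0.2:
        print(f"related: {q1!r} <-> {q2!r} (Jaccard {sim:.2f})")
```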

BuddyNet: History-Based P2P Search

Yilei Shao; Randolph Wang

Peer-to-peer file sharing has become a very popular Internet application. P2P systems such as Gnutella and Kazaa work well when the number of peers is small, but their performance degrades significantly as the number of peers grows. In order to overcome this scalability problem, numerous research groups have experimented with different approaches. We conduct a novel evaluation study of Kazaa traffic focusing on interest-based locality. Our analysis shows that strong interest-based locality exists in P2P systems and can be exploited to improve performance. Based on our findings, we propose a history-based P2P search algorithm and topology adaptation mechanism. The resulting system naturally clusters peers with similar interests and greatly improves search efficiency. We test our design through simulations; the results show a significant reduction in total system load and a large speedup in search efficiency compared to random walk and interest-shortcut schemes. In addition, we show that our system is more robust under dynamic conditions.

- Peer-to-Peer | Pp. 23-37
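A toy sketch of a history-based search step in the spirit of the abstract above: each peer remembers which neighbours answered past queries and asks them first, falling back to forwarding when nobody answers directly. The class and field names are hypothetical; the real BuddyNet protocol and topology adaptation are certainly richer.

```python
import random
from collections import defaultdict

class Peer:
    """Toy peer that remembers which neighbours answered past queries."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.history = defaultdict(int)  # neighbour name -> past successful answers

    def search(self, item, ttl=3):
        if ttl == 0 or not self.neighbors:
            return None
        # Ask neighbours with the best history first (interest-based shortcut),
        # breaking ties randomly so repeated searches do not always follow one path.
        ranked = sorted(self.neighbors,
                        key=lambda p: (self.history[p.name], random.random()),
                        reverse=True)
        for peer in ranked:
            if item in peer.files:
                self.history[peer.name] += 1
                return peer.name
        return ranked[0].search(item, ttl - 1)  # forward to the most promising neighbour

a, b, c = Peer("a", []), Peer("b", ["song.mp3"]), Peer("c", [])
a.neighbors, c.neighbors = [c, b], [b]
print(a.search("song.mp3"))  # finds "b"; "a" will now prefer "b" for future queries
```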

A Suite of Testbeds for the Realistic Evaluation of Peer-to-Peer Information Retrieval Systems

Iraklis A. Klampanos; Victor Poznański; Joemon M. Jose; Peter Dickman

Peer-to-peer (P2P) networking continuously gains popularity among computing science researchers. The problem of information retrieval (IR) over P2P networks is being addressed by researchers attempting to provide valuable insight as well as solutions for its successful deployment. All published studies have, so far, been evaluated by means of simulation, using well-known document collections (usually acquired from TREC). Researchers test their systems using divided collections whose documents have been previously distributed to a number of simulated peers. This practice leads to two problems: first, there is little justification in favour of the document distributions used by relevant studies, and second, since different studies use different experimental testbeds, there is no common ground for comparing the solutions proposed. In this work, we contribute a number of different document testbeds for evaluating P2P IR systems. Each of these has been derived from TREC’s WT10g collection and corresponds to a different potential P2P IR application scenario. We analyse each methodology and testbed with respect to the document distributions achieved as well as to the location of relevant items within each setting. This work marks the beginning of an effort to provide more realistic evaluation environments for P2P IR systems as well as to create a common ground for comparisons of existing and future architectures.

- Peer-to-Peer | Pp. 38-51
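The sketch below illustrates, with invented data rather than the WT10g-derived testbeds from the paper, how the same collection can be split over simulated peers under two different distribution scenarios; it is exactly this kind of choice that the authors argue needs justification.

```python
import random
from collections import defaultdict

def distribute(docs, n_peers, by_topic=True):
    """Assign documents to peers either by a topic key or uniformly at random."""
    peers = defaultdict(list)
    for doc_id, topic in docs:
        if by_topic:
            peers[hash(topic) % n_peers].append(doc_id)      # content-based scenario
        else:
            peers[random.randrange(n_peers)].append(doc_id)  # uniform scenario
    return peers

docs = [("d1", "sports"), ("d2", "sports"), ("d3", "music"),
        ("d4", "music"), ("d5", "science")]
for scenario, flag in [("topical", True), ("uniform", False)]:
    sizes = [len(v) for v in distribute(docs, n_peers=3, by_topic=flag).values()]
    print(scenario, "peer sizes:", sorted(sizes))
```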

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Jie Lu; Jamie Callan

Peer-to-peer architectures are a potentially powerful model for developing large-scale networks of text-based digital libraries, but peer-to-peer networks have so far provided very limited support for text-based federated search of digital libraries using relevance-based ranking. This paper addresses the problems of resource representation, resource ranking and selection, and result merging for federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to text-based federated search are adapted and new methods are developed for resource representation and resource selection according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches offer a better combination of accuracy and efficiency than more common alternatives for federated search in peer-to-peer networks.

- Peer-to-Peer | Pp. 52-66
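A highly simplified sketch of resource selection as described above: each digital library is represented only by aggregated term statistics, and libraries are ranked against a query before it is forwarded. The statistics, scoring formula, and library names are assumptions for illustration, not the paper's adapted methods.

```python
from collections import Counter

# Hypothetical resource descriptions: per-library aggregated term counts,
# the only information a directory node is assumed to hold.
libraries = {
    "lib_medicine": Counter({"gene": 40, "protein": 35, "trial": 20}),
    "lib_networks": Counter({"peer": 50, "routing": 30, "gene": 2}),
}

def select_resources(query_terms, k=1):
    """Rank libraries by the relative frequency of the query terms they hold."""
    scores = {}
    for name, stats in libraries.items():
        total = sum(stats.values())
        scores[name] = sum(stats[t] / total for t in query_terms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

print(select_resources(["gene", "protein"]))  # the query is routed to lib_medicine
```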

‘Beauty’ of the World Wide Web—Cause, Goal, or Principle

Sándor Dominich; Júlia Góth; Mária Horváth; Tamás Kiezer

It is known that the degree distribution in the World Wide Web (WWW) obeys a power law whose degree exponent exhibits a fairly robust behaviour. The usual method, linear regression, used to construct the power law is not based on any, probably existing, intrinsic property of the WWW which it is assumed to reflect. In the present paper, statistical evidence is given to conjecture that at the heart of this robustness property lies the Golden Section. Applications of this conjecture are also presented and discussed.

- Information Retrieval Models (I) | Pp. 67-80
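As a small worked example of the "usual method" the abstract refers to, the snippet below fits a power-law degree exponent by linear regression in log-log coordinates; the toy histogram is invented, and the Golden Section conjecture itself is not reproduced here.

```python
import math

# Toy in-degree histogram: degree -> number of pages with that degree.
hist = {1: 1000, 2: 280, 3: 130, 4: 78, 5: 50, 8: 20, 10: 13}

# Fit P(k) ~ C * k^(-gamma) by least squares on log-log coordinates.
xs = [math.log(k) for k in hist]
ys = [math.log(c) for c in hist.values()]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
gamma = -slope
print(f"estimated degree exponent gamma = {gamma:.2f}")
```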

sPLMap: A Probabilistic Approach to Schema Matching

Henrik Nottelmann; Umberto Straccia

This paper introduces the first formal framework for learning mappings between heterogeneous schemas which is based on logics and probability theory. This task, also called “schema matching”, is a crucial step in integrating heterogeneous collections. As schemas may have different granularities, and as schema attributes do not always match precisely, a general-purpose schema mapping approach requires support for uncertain mappings, and mappings have to be learned automatically. The framework combines different classifiers for finding suitable mapping candidates (together with their weights), and selects that set of mapping rules which is the most likely one. Finally, the framework with different variants has been evaluated on two different data sets.

- Information Retrieval Models (I) | Pp. 81-95
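A minimal sketch of combining several weak matchers into a single score per attribute pair, in the spirit of (but much simpler than) the probabilistic framework described above; the attributes, sample values, and equal weights are assumptions for illustration.

```python
from difflib import SequenceMatcher

def name_sim(a, b):
    """String similarity between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_overlap(vals_a, vals_b):
    """Jaccard overlap between the observed values of two attributes."""
    a, b = set(vals_a), set(vals_b)
    return len(a & b) / len(a | b) if a | b else 0.0

source = {"author": ["smith", "jones"], "title": ["ir book"]}
target = {"creator": ["smith", "lee"], "name": ["ir book", "db book"]}

# Combine the two classifiers with fixed weights into a single score
# and keep the best target attribute for each source attribute.
weights = (0.5, 0.5)
for s_attr, s_vals in source.items():
    scored = [(t_attr,
               weights[0] * name_sim(s_attr, t_attr) +
               weights[1] * value_overlap(s_vals, t_vals))
              for t_attr, t_vals in target.items()]
    best = max(scored, key=lambda x: x[1])
    print(f"{s_attr} -> {best[0]}  (score {best[1]:.2f})")
```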

Encoding XML in Vector Spaces

Vinay Kakade; Prabhakar Raghavan

We develop a framework for representing XML documents and queries in vector spaces and build indexes for processing text-centric semi-structured queries that support a proximity measure between XML documents. The idea of using vector spaces for XML retrieval is not new. In this paper we (i) unify prior approaches into a single framework; (ii) develop techniques to eliminate special purpose auxiliary computations (outside the vector space) used previously; (iii) give experimental evidence on benchmark queries that our approach is competitive in its retrieval quality and (iv) as an immediate consequence of the framework, are able to classify and cluster XML documents.

- Information Retrieval Models (I) | Pp. 96-111
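One common way to put XML documents into a single vector space, illustrated below with standard-library tools, is to index (element path, term) pairs so that structure and content share the same dimensions; this is an illustrative encoding, not necessarily the exact one unified by the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def xml_to_vector(xml_text):
    """Vector over (element path, term) pairs for a small XML document."""
    root = ET.fromstring(xml_text)
    vec = Counter()

    def walk(node, path):
        here = f"{path}/{node.tag}"
        for term in (node.text or "").split():
            vec[(here, term.lower())] += 1
        for child in node:
            walk(child, here)

    walk(root, "")
    return vec

doc = ("<article><title>vector spaces</title>"
       "<body>xml retrieval in vector spaces</body></article>")
print(xml_to_vector(doc))
```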

Features Combination for Extracting Gene Functions from MEDLINE

Patrick Ruch; Laura Perret; Jacques Savoy

This paper describes and evaluates a summarization system that extracts the gene function textual descriptions (called GeneRIF) based on a MedLine record. Inputs for this task include both a locus (a gene in the LocusLink database), and a pointer to a MedLine record supporting the GeneRIF. In the suggested approach we merge two independent phrase extraction strategies. The first proposed strategy (LASt) uses argumentative, positional and structural features in order to suggest a GeneRIF. The second extraction scheme (LogReg) incorporates statistical properties to select the most appropriate sentence as the GeneRIF. Based on the TREC-2003 genomic collection, the basic extraction strategies are already competitive (52.78% for LASt and 52.28% for LogReg, respectively). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 55%.

- Text Summarization | Pp. 112-126
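For readers unfamiliar with the evaluation measure mentioned above, the following snippet computes a token-level Dice coefficient between a candidate sentence and a reference GeneRIF; the sentences are invented, and the official TREC scoring may differ in tokenisation details.

```python
def dice(a, b):
    """Dice coefficient between the token sets of two sentences."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

reference = "gene X regulates apoptosis in neurons"
candidate = "we show that gene X regulates apoptosis"
print(f"Dice = {dice(reference, candidate):.2f}")
```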

Filtering for Profile-Biased Multi-document Summarization

Sana Leila Châar; Olivier Ferret; Christian Fluhr

In this article, we present an information filtering method that selects from a set of documents their most significant excerpts in relation to a user profile. This method relies on both structured profiles and a topical analysis of documents. The topical analysis is also used to expand a profile in relation to a particular document by selecting the terms of the document that are closely linked to those of the profile. This expansion makes it possible to select more reliably the excerpts that are linked to the profile, and also to select excerpts that may bring new and interesting information about its topics. This method was implemented in the REDUIT system, which was successfully evaluated for document filtering and passage extraction.

- Text Summarization | Pp. 127-141
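A bare-bones sketch of profile-biased excerpt selection in the spirit of the description above: passages are scored by how many profile terms they contain, and only those above a threshold are kept. The profile terms, passages, and 0.5 threshold are made-up illustrations; the REDUIT system's topical analysis and profile expansion are not modelled here.

```python
import string

def score_passage(passage, profile_terms):
    """Fraction of profile terms that appear in the passage."""
    tokens = {w.strip(string.punctuation) for w in passage.lower().split()}
    return len(tokens & profile_terms) / len(profile_terms)

profile = {"climate", "emissions", "policy"}
passages = [
    "New emissions targets were announced as part of the climate policy.",
    "The museum opened a new exhibition on modern art.",
]
for p in passages:
    s = score_passage(p, profile)
    if s >= 0.5:  # keep only excerpts clearly linked to the profile
        print(f"{s:.2f}  {p}")
```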