Publications catalog - books
Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers
Carol Peters ; Fredric C. Gey ; Julio Gonzalo ; Henning Müller ; Gareth J. F. Jones ; Michael Kluck ; Bernardo Magnini ; Maarten de Rijke (eds.)
In conference: 6th Workshop of the Cross-Language Evaluation Forum for European Languages (CLEF). Vienna, Austria. September 21, 2005 - September 23, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Information Storage and Retrieval; Artificial Intelligence (incl. Robotics); Information Systems Applications (incl. Internet); Language Translation and Linguistics
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-45697-1
Electronic ISBN
978-3-540-45700-8
Responsible publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11878773_101
GeoCLEF: The CLEF 2005 Cross-Language Geographic Information Retrieval Track Overview
Fredric Gey; Ray Larson; Mark Sanderson; Hideo Joho; Paul Clough; Vivien Petras
GeoCLEF was a new pilot track in CLEF 2005. Its aim was to test and evaluate cross-language geographic information retrieval (GIR) of text. Geographic information retrieval is retrieval oriented toward the geographic specification in the description of the search topic; it returns documents that satisfy this geographic information need. For GeoCLEF 2005, twenty-five search topics were defined for searching against the English and German ad-hoc document collections of CLEF. Topic languages were English, German, Portuguese and Spanish. Eleven groups submitted runs, and about 25,000 documents (half English and half German) in the pooled runs were judged by the organizers. The groups used a variety of approaches, including geographic bounding boxes and external knowledge bases (geographic thesauri, ontologies and gazetteers). The results were encouraging but showed that additional work is needed to refine the task for GeoCLEF 2006.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 908-919
doi: 10.1007/11878773_102
MIRACLE at GeoCLEF 2005: First Experiments in Geographical IR
Sara Lana-Serrano; José M. Goñi-Menoyo; José C. González-Cristóbal
This paper presents the 2005 MIRACLE team’s approach to Cross-Language Geographical Retrieval (GeoCLEF). The main goal of the MIRACLE team’s participation in GeoCLEF was to test the effect that geographical information retrieval techniques have on information retrieval. The baseline approach is based on the development of named entity recognition and geospatial information retrieval tools and on their combination with linguistic techniques to carry out indexing and retrieval tasks.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 920-923
doi: 10.1007/11878773_103
University of Alicante at GeoCLEF 2005
Óscar Ferrández; Zornitsa Kozareva; Antonio Toral; Elisa Noguera; Andrés Montoyo; Rafael Muñoz; Fernando Llopis
For our participation in GeoCLEF 2005 we developed a system made up of three modules: an Information Retrieval module and two Named Entity Recognition modules, one based on machine learning and one based on knowledge. We carried out several runs with different combinations of these modules to address the proposed tasks. The system ranked second for the tasks against the German collections and third for the tasks against the English collections.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 924-927
doi: 10.1007/11878773_104
Evaluating Geographic Information Retrieval
András Kornai
Geographic information retrieval requires many processing steps that are common to all forms of information retrieval, e.g. stopword filtering, stemming, vocabulary enrichment, understanding Booleans, and fluff removal. Only a few steps, in particular the detection of geographic entities and the assignment of bounding boxes to these, are specific to geographic IR. The paper presents the results of experiments designed to evaluate the geography-specificity of the GeoCLEF 2005 task, and suggests some methods to increase the sensitivity of the evaluation.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 928-938
doi: 10.1007/11878773_105
Using the WordNet Ontology in the GeoCLEF Geographical Information Retrieval Task
Davide Buscaldi; Paolo Rosso; Emilio Sanchis Arnal
This paper describes how we used the WordNet ontology for the GeoCLEF 2005 English monolingual task. Two methods were tested: a query expansion method, which expands geographical terms by means of WordNet synonyms and meronyms, and an index term expansion method, which exploits WordNet synonyms and holonyms. The results show that the query expansion method was not suitable for the GeoCLEF track, while WordNet could be used more effectively during the indexing phase.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 939-946
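The two expansion strategies named in the abstract above can be illustrated with a minimal sketch. It assumes a tiny hand-built dictionary in place of WordNet; the relation tables, place names, and function names (`expand_query`, `expand_index_terms`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Toy stand-in for WordNet relations (illustrative entries only).
SYNONYMS = {"netherlands": {"holland"}, "holland": {"netherlands"}}
MERONYMS = {"netherlands": {"amsterdam", "rotterdam"}}  # whole -> its parts

# Holonyms are the inverse relation: part -> the wholes that contain it.
HOLONYMS = defaultdict(set)
for whole, parts in MERONYMS.items():
    for part in parts:
        HOLONYMS[part].add(whole)


def expand_query(terms):
    """Query-side expansion: add synonyms and meronyms (contained places)."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
        expanded |= MERONYMS.get(term, set())
    return expanded


def expand_index_terms(terms):
    """Index-side expansion: add synonyms and holonyms (containing places)."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
        expanded |= HOLONYMS.get(term, set())
    return expanded


if __name__ == "__main__":
    print(sorted(expand_query(["floods", "netherlands"])))
    print(sorted(expand_index_terms(["amsterdam", "canal"])))
```

In this sketch, query-side expansion makes the query longer (a country pulls in its cities), while index-side expansion lets a document that mentions only a city match a query that names the country.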
doi: 10.1007/11878773_106
The GeoTALP-IR System at GeoCLEF 2005: Experiments Using a QA-Based IR System, Linguistic Analysis, and a Geographical Thesaurus
Daniel Ferrés; Alicia Ageno; Horacio Rodríguez
This paper describes the GeoTALP-IR system, a Geographical Information Retrieval (GIR) system. The system is described and evaluated in the context of our participation in the CLEF 2005 GeoCLEF Monolingual English task.
The GIR system is based on and uses a modified version of the Passage Retrieval module of the TALP Question Answering (QA) system presented at CLEF 2004 and TREC 2004 QA evaluation tasks. We designed a Keyword Selection algorithm based on a Linguistic and Geographical Analysis of the topics. A Geographical Thesaurus (GT) has been built using a set of publicly available Geographical Gazetteers and a Geographical Ontology. Our experiments show that the use of a Geographical Thesaurus for Geographical Indexing and Retrieval has improved the performance of our GIR system.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 947-955
doi: 10.1007/11878773_107
CSUSM Experiments in GeoCLEF2005: Monolingual and Bilingual Tasks
Rocio Guillén
This paper presents the results of our initial experiments in the monolingual English task and the bilingual Spanish → English task. We used the Terrier Information Retrieval Platform to run experiments for both tasks using the Inverse Document Frequency model with Laplace after-effect and normalization 2. Additional experiments were run with Indri, a retrieval engine that combines inference networks with language modelling. For the bilingual task we developed a component to first translate the topics from Spanish into English. No spatial analysis was carried out for any of the tasks. One of our goals is to have a baseline against which to compare further experiments with term translation of georeferences and spatial analysis. Another goal is to use ontologies for Integrated Geographic Information Systems adapted to the IR task. Our initial results show that the geographic information as provided does not significantly improve retrieval performance. We included the geographical terms appearing in all the fields. Duplication of terms might have decreased the gain of information and affected the ranking.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 956-962
doi: 10.1007/11878773_108
Berkeley at GeoCLEF: Logistic Regression and Fusion for Geographic Information Retrieval
Ray R. Larson; Fredric C. Gey; Vivien Petras
In this paper we describe the Berkeley (groups 1 and 2 combined) submissions and approaches to the GeoCLEF task for CLEF 2005. The two Berkeley groups used different systems and approaches for GeoCLEF with some common themes. For Berkeley group 1 (Larson) the main technique used was fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. Berkeley group 2 (Gey and Petras) employed tested CLIR methods from previous CLEF evaluations using Logistic Regression with Blind Feedback. Both groups used multiple translations of queries for cross-language searching, and the primary geographically-based approaches taken by both involved query expansion with additional place names. The Berkeley1 group used GIR indexing techniques to georeference proper nouns in the text using a gazetteer derived from the World Gazetteer (with both English and German names for each place), and automatically expanded the names of regions or countries in the queries with the names of the countries or cities they contain. The Berkeley2 group used manual expansion of queries, adding additional place names.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 963-976
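As a rough illustration of scoring separate document components and fusing the results, the sketch below uses the textbook Okapi BM25 formula and a simple score sum (CombSUM). Both are assumptions: the abstract above says only that "a version" of BM-25 and a fusion of searches were used, so the parameters `k1` and `b`, the fusion rule, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Textbook BM25 over `docs`, a dict of doc id -> list of tokens."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def comb_sum(*runs):
    """Fuse several score dicts by summing each document's scores (CombSUM)."""
    fused = {}
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return fused


if __name__ == "__main__":
    # Hypothetical documents split into two components, title and body.
    titles = {"d1": ["floods", "in", "holland"], "d2": ["football", "results"]}
    bodies = {"d1": ["heavy", "floods", "hit", "holland"],
              "d2": ["holland", "won", "the", "match"]}
    query = ["floods", "holland"]
    fused = comb_sum(bm25_scores(query, titles), bm25_scores(query, bodies))
    for doc_id in sorted(fused, key=fused.get, reverse=True):
        print(doc_id, round(fused[doc_id], 3))
```

Summing raw scores is the simplest fusion rule. In practice the scores from different components or algorithms are often normalized first, since they may not share a scale.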
doi: 10.1007/11878773_109
Using Semantic Networks for Geographic Information Retrieval
Johannes Leveling; Sven Hartrumpf; Dirk Veiel
This paper describes our work for the participation in the GeoCLEF task of CLEF 2005. We employ multilayered extended semantic networks for the representation of background knowledge, queries, and documents for geographic information retrieval (GIR). In our approach, geographic concepts from the query network are expanded with concepts which are semantically connected via topological, directional, and proximity relations. We started with an existing geographic knowledge base represented as a semantic network and expanded it with concepts automatically extracted from the GEOnet Names Server.
Several experiments for GIR on German documents have been performed: a baseline corresponding to a traditional information retrieval approach; a variant expanding thematic, temporal, and geographic descriptors from the semantic network representation of the query; and an adaptation of a question answering (QA) algorithm based on semantic networks. The second experiment is based on a representation of the natural language description of a topic as a semantic network, which is achieved by a deep linguistic analysis. The semantic network is transformed into an intermediate representation of a database query explicitly representing thematic, temporal, and local restrictions. This experiment showed the best performance with respect to mean average precision: 10.53% using the topic title and description. The third experiment, adapting a QA algorithm, uses a modified version of the QA system InSicht. The system matches deep semantic representations of queries or their equivalent or similar variants to semantic networks for document sentences.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 977-986
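The abstract above reports its best result as mean average precision (MAP). The sketch below shows the standard definition of that measure. The topic identifiers, rankings, and relevance judgments are invented for illustration and have no connection to the GeoCLEF data.

```python
def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k at which a relevant doc appears,
    divided by the total number of relevant docs for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of per-topic average precision over all judged topics.

    runs:  topic id -> ranked list of doc ids
    qrels: topic id -> set of relevant doc ids
    """
    per_topic = [average_precision(runs.get(topic, []), relevant)
                 for topic, relevant in qrels.items()]
    return sum(per_topic) / len(per_topic)


if __name__ == "__main__":
    # Hypothetical topics and judgments.
    runs = {"T1": ["a", "b", "c", "d"], "T2": ["x", "y"]}
    qrels = {"T1": {"a", "c"}, "T2": {"y", "z"}}
    print(f"MAP = {mean_average_precision(runs, qrels):.4f}")
```

Relevant documents that a system never retrieves still count in the denominator, so missing them lowers the score.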
doi: 10.1007/11878773_110
Experiments with Geo-Filtering Predicates for IR
Jochen L. Leidner
This paper describes a set of experiments for monolingual English retrieval at GeoCLEF 2005, evaluating a technique for spatial retrieval based on named entity tagging, toponym resolution, and re-ranking by means of geographic filtering. To this end, a series of systematic experiments in the Vector Space paradigm are presented. Plain bag-of-words versus phrasal retrieval and the potential of meronymy query expansion as a recall-enhancing device are investigated, and three alternative geo-spatial filtering techniques based on spatial clipping are compared and evaluated on 25 monolingual English queries. Preliminary results show that always choosing toponym referents based on a simple “maximum population” heuristic to approximate the salience of a referent fails to outperform TF*IDF baselines with the 2005 dataset when combined with three geo-filtering predicates. Conservative geo-filtering outperforms more aggressive predicates. The evidence further seems to suggest that query expansion with WordNet meronyms is not effective in combination with the method described. A post-hoc analysis indicates that factors responsible for the low performance include sparseness of available population data, gaps in the gazetteer that associates Minimum Bounding Rectangles with geo-terms in the query, and the composition of the 2005 dataset itself.
- Part VIII. Cross-Language Geographical Retrieval (GeoCLEF) | Pp. 987-996
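The "maximum population" heuristic and bounding-rectangle filtering mentioned in the abstract above can be sketched as follows. The gazetteer entries, the rough coordinates and population figures, and the two predicates (`any` versus `all`) are illustrative assumptions; they are not the three predicates evaluated in the paper.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    name: str
    lat: float
    lon: float
    population: int


class Box(NamedTuple):
    """Minimum bounding rectangle; does not handle boxes crossing the antimeridian."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, place: Place) -> bool:
        return (self.min_lat <= place.lat <= self.max_lat
                and self.min_lon <= place.lon <= self.max_lon)


# Hypothetical gazetteer with rough, illustrative figures.
GAZETTEER = {
    "london": [Place("London, UK", 51.5, -0.1, 8_000_000),
               Place("London, Ontario", 43.0, -81.2, 400_000)],
    "paris": [Place("Paris, France", 48.9, 2.4, 2_100_000),
              Place("Paris, Texas", 33.7, -95.6, 25_000)],
    "ottawa": [Place("Ottawa, Canada", 45.4, -75.7, 1_000_000)],
}


def resolve(toponym: str) -> Optional[Place]:
    """Pick the most populous candidate referent, or None if the name is unknown."""
    candidates = GAZETTEER.get(toponym.lower(), [])
    return max(candidates, key=lambda p: p.population, default=None)


def geo_filter(ranked_docs, box, require_all=False):
    """Keep ranked documents whose resolved toponyms fall inside `box`.

    require_all=False: keep if any toponym is inside the box.
    require_all=True:  keep only if every resolved toponym is inside the box.
    """
    kept = []
    for doc_id, toponyms in ranked_docs:
        places = [p for p in map(resolve, toponyms) if p is not None]
        inside = [box.contains(p) for p in places]
        if inside and (all(inside) if require_all else any(inside)):
            kept.append(doc_id)
    return kept


if __name__ == "__main__":
    europe = Box(35.0, -10.0, 60.0, 30.0)  # coarse illustrative rectangle
    ranked = [("d1", ["London"]), ("d2", ["Paris", "London"]),
              ("d3", ["Springfield"]), ("d4", ["London", "Ottawa"])]
    print(geo_filter(ranked, europe))
    print(geo_filter(ranked, europe, require_all=True))
```

The sketch also shows how gazetteer gaps hurt: a document whose only toponym is missing from the gazetteer (here `d3`) is dropped by either predicate.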