Publications catalog - books



Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers

Carol Peters ; Fredric C. Gey ; Julio Gonzalo ; Henning Müller ; Gareth J. F. Jones ; Michael Kluck ; Bernardo Magnini ; Maarten de Rijke (eds.)

Conference: 6th Workshop of the Cross-Language Evaluation Forum for European Languages (CLEF). Vienna, Austria. September 21, 2005 - September 23, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Artificial Intelligence (incl. Robotics); Information Systems Applications (incl. Internet); Language Translation and Linguistics

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-45697-1

Electronic ISBN

978-3-540-45700-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Web Retrieval Experiments with the EuroGOV Corpus at the University of Hildesheim

Niels Jensen; René Hackl; Thomas Mandl; Robert Strötgen

This paper describes web retrieval experiments with the EuroGOV corpus carried out at the University of Hildesheim. For both the multi-lingual and the mixed mono-lingual task, several indexing strategies were tested, all of them based on one mixed language index. After stopword removal, word and n-gram based indexes were developed based on the full document content, part of the content and the document title. Boosting the original topic language with a higher weight in the query and penalizing the English translation led to better results for most settings. A title-only run gave the best results during post-submission runs for the multi-lingual task.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 837-845
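
The abstract above mentions stopword removal followed by word and n-gram based indexing but gives no parameters. The following is a minimal sketch of one way such index terms could be produced, assuming character n-grams of length 4 and a caller-supplied stopword set; the function names `char_ngrams` and `index_terms` are hypothetical and are not taken from the paper.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return overlapping character n-grams for each whitespace-separated token.

    Tokens no longer than n characters are kept whole (an assumption).
    """
    grams: list[str] = []
    for token in text.lower().split():
        if len(token) <= n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def index_terms(text: str, stopwords: set[str], n: int = 4) -> list[str]:
    """Drop stopwords first, then split the remaining tokens into n-grams."""
    kept = " ".join(t for t in text.lower().split() if t not in stopwords)
    return char_ngrams(kept, n)
```

Calling `index_terms("the web retrieval", {"the"})` keeps the short token `web` whole and splits `retrieval` into its six 4-grams.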

Danish and Greek Web Search Experiments with Hummingbird SearchServer at CLEF 2005

Stephen Tomlinson

Hummingbird participated in the WebCLEF mixed monolingual retrieval task of the Cross-Language Evaluation Forum (CLEF) 2005. In this task, the system was given 547 known-item queries from 11 languages (134 Spanish, 121 English, 59 Dutch, 59 Portuguese, 57 German, 35 Hungarian, 30 Danish, 30 Russian, 16 Greek, 5 Icelandic and 1 French). The goal was to find the desired page in the 82GB EuroGOV collection (3.4 million pages crawled from government sites of 27 European domains). Our experiments found that stopword processing was more important than anticipated, perhaps because words common in one language may tend to be overweighted by inverse document frequency in a mixed language collection. Extra weight on the document title helped significantly, and extra weight on less deep URLs significantly helped home page queries. Stemming was of neutral impact on average, but it made a substantial difference for some individual queries. We analyze several Danish and Greek queries in detail.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 846-855
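
The abstract above reports that extra weight on less deep URLs helped home page queries, without stating how depth was measured or weighted. As a rough sketch only, depth can be taken as the number of non-empty path segments; the boost formula and the `alpha` parameter below are hypothetical illustrations, not the SearchServer weighting.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the non-empty path segments of a URL (0 for a site root)."""
    path = urlparse(url).path
    return sum(1 for segment in path.split("/") if segment)


def depth_boosted(score: float, url: str, alpha: float = 0.5) -> float:
    """Hypothetical prior: scale the score by 1 + alpha / (1 + depth),
    so shallower URLs receive a larger boost."""
    return score * (1.0 + alpha / (1.0 + url_depth(url)))
```

With equal base scores, a site root then ranks above a page three directories down.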

Combination Methods for Crosslingual Web Retrieval

Jaap Kamps; Maarten de Rijke; Börkur Sigurbjörnsson

We investigate a range of crosslingual web retrieval tasks using the test suite of the CLEF 2005 WebCLEF track, which features a stream of known-item topics in various languages. Our main findings are: (i) straightforward indexing and retrieval is effective for mixed monolingual web retrieval; (ii) standard machine translation methods are effective for bilingual web retrieval; but (iii) standard combination methods are ineffective for multilingual web retrieval; we analyze the failure and suggest an alternative Z-score normalization that leads to effective multilingual retrieval results.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 856-864
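
The abstract above suggests Z-score normalization as a way to make multilingual combination effective. The sketch below shows the general idea under stated assumptions: each run is a mapping from document id to raw score, scores are standardized per run with the population standard deviation, and standardized scores are summed across runs. The exact variant used in the paper may differ, and the names `z_normalize` and `combine` are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(run: dict[str, float]) -> dict[str, float]:
    """Map each score s in a run to (s - mean) / standard deviation."""
    if not run:
        return {}
    mu = mean(run.values())
    sigma = pstdev(run.values())
    if sigma == 0:
        # All scores identical: no document stands out in this run.
        return {doc: 0.0 for doc in run}
    return {doc: (score - mu) / sigma for doc, score in run.items()}


def combine(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum the z-scores of each document across runs and sort descending.

    A document missing from a run contributes nothing for that run.
    """
    merged: dict[str, float] = {}
    for run in runs:
        for doc, z in z_normalize(run).items():
            merged[doc] = merged.get(doc, 0.0) + z
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Standardizing per run puts scores from differently scaled retrieval runs on a comparable footing before merging, which is the motivation for this kind of normalization.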

University of Alicante at the CLEF 2005 WebCLEF Track

Trinitario Martínez; Elisa Noguera; Rafael Muñoz; Fernando Llopis

This paper presents the first experiment done for the CLEF2005 WebCLEF Track. In the present work, we have focused our main efforts on the Spanish part of the Mixed Monolingual task, but we have also participated in the tasks for several other languages and in the Bilingual English-Spanish task. A passage-based IR system is applied in the retrieval phase. A language identifier has also been created in order to build a fully automatic system that does not need to know the topic language.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 865-868

MIRACLE at WebCLEF 2005: Combining Web Specific and Linguistic Information

Ángel Martínez-González; José Luis Martínez-Fernández; César de Pablo-Sánchez; Julio Villena-Román

This paper describes the MIRACLE approach to WebCLEF. A set of independent indexes was constructed for each top level domain of the EuroGOV collection. Each index contains information extracted from the document, such as URL, title, keywords, detected named entities or HTML headers. These indexes are queried to obtain partial document rankings, which are combined with various relative weights to test the value of each index. The final aim is to identify which index (or combination of them) is most relevant for a retrieval task, avoiding the construction of a full-text index.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 869-872
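
The abstract above describes combining partial document rankings from several indexes with relative weights, but does not give the combination formula. One plausible reading, shown here purely as an assumption, is a weighted sum of per-index scores; the function name `combine_weighted` and the index names in the usage note are hypothetical.

```python
def combine_weighted(
    partial: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[str]:
    """Merge per-index rankings into one list of document ids.

    partial maps an index name to {document id: score}; weights maps an
    index name to its relative weight (indexes without a weight count as 0).
    """
    combined: dict[str, float] = {}
    for index_name, ranking in partial.items():
        weight = weights.get(index_name, 0.0)
        for doc, score in ranking.items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return sorted(combined, key=lambda doc: combined[doc], reverse=True)
```

Varying the weights, for example `{"title": 2.0, "url": 1.0}` versus `{"title": 0.5, "url": 1.0}`, changes the final order and so gives a simple way to probe how much each index contributes.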

BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

David Pinto; Héctor Jiménez-Salazar; Paolo Rosso; Emilio Sanchis

In this paper we present the results of the BUAP/UPV universities in WebCLEF, a task of CLEF 2005. Specifically, we evaluate our information retrieval system on the bilingual “English to Spanish” task. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the performance of our system. We evaluate different percentages of reduction over a subset of EuroGOV, in order to determine the best one. We observed that after reducing the corpus by 82.55%, a Mean Reciprocal Rank of 0.0844 was obtained, compared with 0.0465 for the same evaluation with full documents.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 873-879
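
The abstract above reports results as Mean Reciprocal Rank (MRR), the usual measure for known-item search: for each query, take 1 divided by the rank of the target page (0 if it is not retrieved), then average over queries. A minimal sketch follows, with hypothetical function names and input shapes.

```python
def reciprocal_rank(ranked: list[str], target: str) -> float:
    """Return 1 / rank of the target in the ranked list, or 0.0 if absent."""
    for rank, doc in enumerate(ranked, start=1):
        if doc == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    results: dict[str, list[str]],
    targets: dict[str, str],
) -> float:
    """Average the reciprocal rank over all queries that have a known target.

    results maps a query id to its ranked document ids; targets maps a
    query id to the single correct document id.
    """
    if not targets:
        return 0.0
    total = sum(
        reciprocal_rank(results.get(query, []), target)
        for query, target in targets.items()
    )
    return total / len(targets)
```

For three queries whose targets appear at rank 1, at rank 3 and not at all, the value is (1 + 1/3 + 0) / 3.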

Web Page Retrieval by Combining Evidence

Carlos G. Figuerola; José L. Alonso Berrocal; Angel F. Zazo; Emilio Rodríguez Vázquez de Aldana

The participation of the REINA Research Group in WebCLEF 2005 focused on the mixed monolingual task. Queries or topics are of two types: [...] and [...]. For both, we first perform a search by thematic content; for the same query, we also search several information elements of every page (title, some meta tags, anchor text) and then combine the results. For queries about [...], we try to detect them using a method based on some keywords and their patterns of use. Afterwards, the results of the thematic content retrieval are re-ranked based on PageRank and centrality coefficients.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 880-887

UNED at WebCLEF 2005

Javier Artiles; Víctor Peinado; Anselmo Peñas; Julio Gonzalo; Felisa Verdejo

This paper describes the experiments submitted by UNED’s NLP Group to the WebCLEF 2005 track in the bilingual English to Spanish task. We present two different runs: i) a simple search over the whole content of the documents; ii) a series of restricted searches over given fields according to their descriptiveness. Our newly developed approach for searching ordered fields performs 80% better than the baseline. We also describe an unsupervised approach to translating out-of-vocabulary words.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 888-891

Using the Web Information Structure for Retrieving Web Pages

Mirna Adriani; Rama Pandugita

We present a report on our participation in the mixed monolingual web task of the 2005 Cross-Language Evaluation Forum (CLEF). We compared the results of web page retrieval based on the page content, the page title, and a combination of page content and page title. The results show that combining page content and page title gave the best retrieval performance, compared to using only page content or only page title. Taking into account the number of links referring to a web page and the depth of the directory path in its URL did not result in any significant improvement in retrieval performance.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 892-897

University of Glasgow at WebCLEF 2005: Experiments in Per-Field Normalisation and Language Specific Stemming

Craig Macdonald; Vassilis Plachouras; Ben He; Christina Lioma; Iadh Ounis

We participated in the WebCLEF 2005 monolingual task. In this task, a search system aims to retrieve relevant documents from a multilingual corpus of Web documents from Web sites of European governments. Both the documents and the queries are written in a wide range of European languages. A challenge in this setting is to detect the language of documents and topics, and to process them appropriately. We develop a language specific technique for applying the correct stemming approach, as well as for removing the correct stopwords from the queries. We represent documents using three fields, namely content, title, and anchor text of incoming hyperlinks. We use a technique called per-field normalisation, which extends the Divergence From Randomness (DFR) framework, to normalise the term frequencies, and to combine them across the three fields. We also employ the length of the URL path of Web documents. The ranking is based on combinations of both the language specific stemming, if applied, and the per-field normalisation. We use our Terrier platform for all our experiments. The overall performance of our techniques is outstanding, achieving the overall top four performing runs, as well as the top performing run without metadata in the monolingual task. The best run only uses per-field normalisation, without applying stemming.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 898-907