Catálogo de publicaciones - libros
Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers
Carol Peters ; Fredric C. Gey ; Julio Gonzalo ; Henning Müller ; Gareth J. F. Jones ; Michael Kluck ; Bernardo Magnini ; Maarten de Rijke (eds.)
En conferencia: 6º Workshop of the Cross-Language Evaluation Forum for European Languages (CLEF) . Vienna, Austria . September 21, 2005 - September 23, 2005
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
Information Storage and Retrieval; Artificial Intelligence (incl. Robotics); Information Systems Applications (incl. Internet); Language Translation and Linguistics
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2006 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-45697-1
ISBN electrónico
978-3-540-45700-8
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2006
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2006
Tabla de contenidos
doi: 10.1007/11878773_11
Conceptual Indexing for Multilingual Information Retrieval
Jacques Guyot; Saïd Radhouani; Gilles Falquet
We present a translation-free technique for multilingual information retrieval. This technique is based on an ontological representation of documents and queries. For each language, we use a dictionary (set of lexical reference for concepts) to map a term to its corresponding concept. The same mapping is applied to each document and each query. Then, we use a classic vector space model based on concept for indexing and querying the document corpus. The main advantages of our approach are: no merging phase is required; no dependency on automatic translators between all pairs of languages; and adding a new language only requires a new mapping dictionary to be added into the multilingual ontology. Experimental results on the CLEF 2005 multi8 collection show that this approach is efficient, even with relatively small and low fidelity dictionaries and without word sense disambiguation.
- Cross-Language and More | Pp. 102-112
doi: 10.1007/11878773_12
SINAI at CLEF 2005: Multi-8 Two-Years-on and Multi-8 Merging-Only Tasks
Fernando Martínez-Santiago; Miguel A. García-Cumbreras; L. A. Ureña-López
This year, we participated in and CLEF tasks. Our main interest has been to test several standard CLIR techniques and investigate how they affect the final performance of the multilingual system. Specifically, we have evaluated the information retrieval (IR) model used to obtain each monolingual result, the merging algorithm, the translation approach and the application of query expansion techniques. The obtained results show that by means of improving merging algorithms and translation resources we reach better results than improving other CLIR modules such as IR engines or the expansion of queries.
- Cross-Language and More | Pp. 113-120
doi: 10.1007/11878773_13
CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
Luo Si; Jamie Callan
We participated in two tasks: Multi-8 two-years-on retrieval and Multi-8 results merging. For the multi-8 two-years-on retrieval work, algorithms are proposed to combine simple multilingual ranked lists into a more accurate ranked list. Empirical study shows that the approach of combining multilingual retrieval results can substantially improve the accuracies over single multilingual ranked lists. The Multi-8 results merging task is viewed as similar to the results merging task of federated search. Query-specific and language-specific models are proposed to calculate comparable document scores for a small amount of documents and estimate logistic models by using information of these documents. The logistic models are used to estimate comparable scores for all documents and thus the documents can be sorted into a final ranked list. Experimental results demonstrate the advantage of the query-specific and language-specific models against several other alternatives.
- Cross-Language and More | Pp. 121-130
doi: 10.1007/11878773_14
Monolingual, Bilingual, and GIRT Information Retrieval at CLEF-2005
Jacques Savoy; Pierre-Yves Berger
For our fifth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list as well as a light stemming procedure for the Hungarian, Bulgarian and Portuguese (Brazilian) languages. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in those languages. To do so we evaluated our scheme using two probabilistic models and five vector-processing approaches. In the bilingual track, we evaluated both the machine translation and bilingual dictionary approaches applied to automatically translate a query submitted in English into various target languages. Finally, using the GIRT corpora (available in English, German and Russian), we investigated the variations in retrieval effectiveness that resulted when we included or excluded manually assigned keywords attached to the bibliographic records (mainly comprising a title and an abstract).
- Cross-Language and More | Pp. 131-140
doi: 10.1007/11878773_15
Socio-Political Thesaurus in Concept-Based Information Retrieval
Mikhail Ageev; Boris Dobrov; Natalia Loukachevitch
In CLEF 2005 experiments we used a bilingual Russian-English Socio-Political Thesaurus that we developed over more than 10 years as a tool for automatic text processing in information retrieval tasks. The same resource and the same algorithms were used for the ad-hoc and domain–specific task.
- Cross-Language and More | Pp. 141-150
doi: 10.1007/11878773_16
The Performance of a Machine Translation-Based English-Indonesian CLIR System
Mirna Adriani; Ihsan Wahyu
We describe our participation in the Indonesian-English bilingual task of the 2005 Cross-Language Evaluation Forum (CLEF). We translated an Indonesian query set into English using a commercial machine translation tool called and attempted to improve retrieval effectiveness using a query expansion technique. However, since our initial retrieval effectiveness was low, the query expansion technique had a negative impact on performance.
- Cross-Language and More | Pp. 151-154
doi: 10.1007/11878773_17
Exploring New Languages with HAIRCUT at CLEF 2005
Paul McNamee
JHU/APL has long espoused the use of language-neutral methods for cross-language information retrieval. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. We undertook our first investigations in the Bulgarian and Hungarian languages. In our bilingual experiments we used several non-traditional CLEF query languages such as Greek, Hungarian, and Indonesian, in addition to several western European languages. We found that character n-grams remain an attractive option for representing documents and queries in these new languages. In our monolingual tests n-grams were more effective than unnormalized words for retrieval in Bulgarian (+30%) and Hungarian (+63%). Our bilingual runs made use of , statistical translation of character n-grams using aligned corpora, when parallel data were available, and web-based machine translation, when no suitable data could be found.
- Cross-Language and More | Pp. 155-164
doi: 10.1007/11878773_18
Dublin City University at CLEF 2005: Multi-8 Two-Years-On Merging Experiments
Adenike M. Lam-Adesina; Gareth J. F. Jones
This year Dublin City University participated in the CLEF 2005 Mulit-8 Two-Years-On multilingual merging task. The objective of our experiments was to test a range of standard techniques for merging ranked lists of retrieved documents to see if consistent trends emerge for lists generated using different information retrieval systems. Our results show that the success of merging techniques can be dependent on the retrieval system used, and in consequence the best merging techniques to adopt cannot be recommended independent of knowing the retrieval system to be used.
- Cross-Language and More | Pp. 165-169
doi: 10.1007/11878773_19
Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval
Christina Lioma; Craig Macdonald; Ben He; Vassilis Plachouras; Iadh Ounis
In the CLEF 2005 Ad-Hoc Track we addressed the problem of retrieving information in morphologically rich languages, by experimenting with language-specific morphosyntactic processing and light Natural Language Processing (NLP). The diversity of the languages processed, namely Bulgarian, French, Italian, English, and Greek, allowed us to measure the effect of system-specific features upon the retrieval of these languages, and to juxtapose that effect to the role of language resources in Cross Language Information Retrieval (CLIR) in general.
- Cross-Language and More | Pp. 170-178
doi: 10.1007/11878773_20
Four Stemmers and a Funeral: Stemming in Hungarian at CLEF 2005
Anna Tordai; Maarten de Rijke
We developed algorithmic stemmers for Hungarian and used them for the ad-hoc monolingual task for CLEF 2005. Our goal was to determine what degree of stemming is the most effective. Although on average the stemmers did not perform as well as the the best -gram, we found that stemming over a broad range of suffixes especially on nouns is highly useful.
- Monolingual Experiments | Pp. 179-186