Publications catalog - books


Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, Revised Selected Papers

Carol Peters ; Fredric C. Gey ; Julio Gonzalo ; Henning Müller ; Gareth J. F. Jones ; Michael Kluck ; Bernardo Magnini ; Maarten de Rijke (eds.)

In conference: 6th Workshop of the Cross-Language Evaluation Forum for European Languages (CLEF). Vienna, Austria. September 21, 2005 - September 23, 2005

Summary/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Information Storage and Retrieval; Artificial Intelligence (incl. Robotics); Information Systems Applications (incl. Internet); Language Translation and Linguistics

Availability

Detected institution: not detected
Year of publication: 2006
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-45697-1

Electronic ISBN

978-3-540-45700-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

UB at CLEF 2005: Bilingual CLIR and Medical Image Retrieval Tasks

Miguel E. Ruiz; Silvia B. Southwick

This paper presents the results of the State University of New York at Buffalo in the Cross Language Evaluation Forum 2005 (CLEF 2005). We participated in the monolingual Portuguese, bilingual English-Portuguese, and medical image retrieval tasks. We used the SMART retrieval system for text retrieval in the monolingual and bilingual tasks on Portuguese documents. The main goal of this part was to formally test the Portuguese support that had been added to our system. Our results show an acceptable level of performance in the monolingual task. For the retrieval of medical images with multilingual annotations, our main goal was to explore combining Content-Based Image Retrieval (CBIR) and text retrieval to retrieve medical images that have clinical annotations in English, French and German. We used a system that combines the content-based image retrieval system GIFT with the well-known SMART system for text retrieval. English topics were translated to French by mapping the English text to UMLS concepts using MetaMap and the UMLS Metathesaurus. Our results on this task confirm that combining CBIR and text retrieval improves results significantly compared with using either image or text retrieval alone.

- Part V. Cross-Language Retrieval In Image Collections (ImageCLEF) | Pp. 737-743
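The abstract does not say how the GIFT and SMART scores were combined. One common way to merge an image run with a text run is weighted linear late fusion of normalized scores. The sketch below assumes min-max normalization and an equal 0.5 weight; the function names are hypothetical and are not part of GIFT or SMART.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's scores to [0, 1]; a run with identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_image_text(image_scores: dict[str, float],
                    text_scores: dict[str, float],
                    text_weight: float = 0.5) -> list[tuple[str, float]]:
    """Hypothetical late fusion: weighted sum of normalized CBIR and text scores.

    An image missing from one run contributes 0 from that run.
    """
    img = min_max_normalize(image_scores)
    txt = min_max_normalize(text_scores)
    fused = {
        doc: (1.0 - text_weight) * img.get(doc, 0.0) + text_weight * txt.get(doc, 0.0)
        for doc in img.keys() | txt.keys()
    }
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores: CBIR similarities and text-retrieval scores for three images
ranking = fuse_image_text({"img1": 0.9, "img2": 0.5, "img3": 0.1},
                          {"img2": 12.0, "img3": 4.0})
print([doc for doc, _ in ranking])  # ['img2', 'img1', 'img3']
```

In this sketch, an image returned by only one system keeps the credit from that system. The text weight is a free parameter that would normally be tuned on training topics.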

Overview of the CLEF-2005 Cross-Language Speech Retrieval Track

Ryen W. White; Douglas W. Oard; Gareth J. F. Jones; Dagobert Soergel; Xiaoli Huang

The task for the CLEF-2005 cross-language speech retrieval track was to identify topically coherent segments of English interviews in a known-boundary condition. Seven teams participated, performing both monolingual and cross-language searches of ASR transcripts, automatically generated metadata, and manually generated metadata. Results indicate that monolingual search technology is sufficiently accurate to be useful for some purposes (the best mean average precision was 0.13), and that cross-language searching yielded results typical of those seen in other applications (with the best systems approximating monolingual mean average precision).

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 744-759
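Mean average precision (MAP), the measure behind the 0.13 figure above, averages per-topic average precision over all judged topics. The following is a minimal sketch that assumes each run is a ranked list of segment IDs and that the relevance judgments are sets of IDs. Official CLEF scores come from standard evaluation tools such as trec_eval, not from this code.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant item, divided by the
    total number of relevant items (relevant items never retrieved add 0)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: dict[str, list[str]],
                           qrels: dict[str, set[str]]) -> float:
    """Average AP over every judged topic; a topic with no run scores 0."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(topic, []), rel)
               for topic, rel in qrels.items()) / len(qrels)


# Topic t1: relevant segments at ranks 1 and 3, one relevant segment missed
runs = {"t1": ["s1", "s2", "s3", "s4"], "t2": ["s9", "s8"]}
qrels = {"t1": {"s1", "s3", "s5"}, "t2": {"s8"}}
print(mean_average_precision(runs, qrels))  # about 0.528
```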

Using Various Indexing Schemes and Multiple Translations in the CL-SR Task at CLEF 2005

Diana Inkpen; Muath Alzghool; Aminul Islam

We present the University of Ottawa's participation in the Cross-Language Spoken Document Retrieval task at CLEF 2005. To translate the queries, we combined the results of several online machine translation tools. For the information retrieval component, we used the SMART system [1] with several weighting schemes for indexing the documents and the queries. One scheme in particular led to better results than the other combinations. We present the results of the submitted runs and of many unofficial runs. We compare the effect of several translations from each language. We present results on phonetic transcripts of the collection and queries, and on the combination of text and phonetic transcripts. We also include the results obtained when the manual summaries and keywords are indexed.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 760-768
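SMART weighting schemes are named with a document triple and a query triple, for example lnc.ltc. In each triple, the letters select the term-frequency, inverse-document-frequency, and normalization components. The abstract does not say which schemes were used. The sketch below therefore implements only the letters needed for lnc.ltc, as an illustration. The helper names are hypothetical.

```python
import math
from collections import Counter


def smart_weights(tf: Counter, code: str, df: dict[str, int], n_docs: int) -> dict[str, float]:
    """Weight a term-frequency vector with a 3-letter SMART code.

    Only a subset of SMART letters is supported in this sketch:
      tf:   n = raw tf, l = 1 + ln(tf)
      idf:  n = none,   t = ln(N / df)
      norm: n = none,   c = cosine (unit length)
    """
    tf_code, idf_code, norm_code = code
    if tf_code not in "nl" or idf_code not in "nt" or norm_code not in "nc":
        raise ValueError(f"unsupported SMART code: {code}")
    weights = {}
    for term, f in tf.items():
        w = float(f) if tf_code == "n" else 1.0 + math.log(f)
        if idf_code == "t":
            # Terms unseen in the collection get weight 0
            w *= math.log(n_docs / df[term]) if df.get(term) else 0.0
        weights[term] = w
    if norm_code == "c":
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    return weights


def smart_score(doc_tf: Counter, query_tf: Counter, df: dict[str, int], n_docs: int,
                doc_code: str = "lnc", query_code: str = "ltc") -> float:
    """Inner product of the weighted document and query vectors (lnc.ltc by default)."""
    d = smart_weights(doc_tf, doc_code, df, n_docs)
    q = smart_weights(query_tf, query_code, df, n_docs)
    return sum(w * d.get(term, 0.0) for term, w in q.items())


doc = Counter("survivor testimony camp camp camp".split())
query = Counter("camp survivor".split())
df = {"survivor": 20, "testimony": 40, "camp": 5}
print(smart_score(doc, query, df, n_docs=100))
```

Because both vectors are cosine-normalized, the score is a cosine similarity between 0 and 1. Swapping in other codes shows how the choice of weighting changes the ranking.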

The University of Alicante at CL-SR Track

Rafael M. Terol; Manuel Palomar; Patricio Martinez-Barco; Fernando Llopis; Rafael Muñoz; Elisa Noguera

This paper describes the new features that the IR-n system applies to topic processing for CL-SR. These features are based on applying logic forms to topics in order to increase the weight of topic terms according to a set of syntactic rules.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 769-772

Pitt at CLEF05: Data Fusion for Spoken Document Retrieval

Daqing He; Jae-Wook Ahn

This paper describes an investigation of data fusion techniques for spoken document retrieval. The effectiveness of retrieval based solely on the outputs of automatic speech recognition (ASR) is limited by the recognition errors introduced by the ASR process. This is especially true for retrieval on the Malach test collection, whose ASR outputs have an average word error rate (WER) of 35%. To overcome this problem, in this year's CLEF experiments we explored data fusion techniques for integrating the manually generated metadata, which is provided for every Malach document, with the ASR outputs. We concentrated our effort on post-search data fusion techniques, in which multiple retrieval results obtained from automatically generated outputs or human metadata are combined. Our initial studies indicated that a simple unweighted combination method (CombMNZ), which has proven useful in written text retrieval, produced a significant 38% relative decrease in retrieval effectiveness (measured by mean average precision) on our task, compared with a simple retrieval baseline in which all manual metadata and ASR outputs are put together. This motivated us to explore a more elaborate weighted data fusion model, in which a weight is associated with each retrieval result and can be specified by the user in advance. We also explored multiple iterations of data fusion in our weighted fusion model and obtained a further improvement at the second iteration. Overall, our best data fusion run obtained a significant 31% relative improvement over the simple fusion baseline and a 4% relative improvement over the manual-only baseline, which is a significant difference.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 773-782
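CombMNZ sums a document's normalized scores across runs and then multiplies that sum by the number of runs that retrieved the document. A weighted variant instead multiplies each run's scores by a user-chosen weight before summing. The sketch below contrasts the two, assuming min-max normalization. The weighted function is a generic weighted CombSUM and not necessarily the exact model in the paper. The iterative fusion step is not shown.

```python
def normalize(run: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one run's scores to [0, 1]."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs: list[dict[str, float]]) -> dict[str, float]:
    """CombMNZ: (sum of normalized scores) x (number of runs that retrieved the doc)."""
    total: dict[str, float] = {}
    hits: dict[str, int] = {}
    for run in runs:
        for doc, s in normalize(run).items():
            total[doc] = total.get(doc, 0.0) + s
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: total[doc] * hits[doc] for doc in total}


def weighted_comb_sum(runs: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """Weighted linear fusion: each run's normalized scores scaled by its weight."""
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    fused: dict[str, float] = {}
    for run, w in zip(runs, weights):
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return fused


asr_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}   # retrieval on ASR transcripts
meta_run = {"d2": 5.0, "d4": 1.0}             # retrieval on manual metadata
print(comb_mnz([asr_run, meta_run]))  # {'d1': 1.0, 'd2': 3.0, 'd3': 0.0, 'd4': 0.0}
print(weighted_comb_sum([asr_run, meta_run], [0.2, 0.8]))
```

The weights stand in for the user-specified per-run weights described in the abstract. How the paper chose its weights is not stated here.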

UNED@CL-SR CLEF 2005: Mixing Different Strategies to Retrieve Automatic Speech Transcriptions

Fernando López-Ostenero; Víctor Peinado; Valentín Sama; Felisa Verdejo

In this paper we describe UNED's participation in the CLEF 2005 CL-SR track. First, we explain the strategies we tried for cleaning up the automatic transcriptions. Then, we describe how we performed 84 different runs combining these strategies with named entity recognition and different pseudo-relevance feedback approaches, in order to study the influence of each method on the retrieval process in both monolingual and cross-lingual settings. We found that named entity recognition had a greater influence in the cross-lingual setting, where MAP scores doubled when an entity recognizer was used. The best pseudo-relevance feedback approach was the one using manual keywords. The effects of the different cleaning strategies were very similar, except for character 3-grams, which obtained poor scores compared with the other approaches.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 783-791
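Character n-gram indexing replaces each word with its overlapping substrings of length n. A common motivation for using it on noisy ASR output is that a misrecognized word may still share some n-grams with the correct one. The following is a minimal sketch of 3-gram tokenization. Whitespace tokenization, lowercasing, and `_` as a word-boundary marker are illustrative choices here, not UNED's documented setup.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split text into overlapping character n-grams per word, with '_' marking word edges."""
    grams: list[str] = []
    for word in text.lower().split():
        padded = f"_{word}_"
        if len(padded) < n:
            grams.append(padded)  # word too short to slide a window over
            continue
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


print(char_ngrams("Auschwitz camp"))
# ['_au', 'aus', 'usc', 'sch', 'chw', 'hwi', 'wit', 'itz', 'tz_', '_ca', 'cam', 'amp', 'mp_']
```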

Dublin City University at CLEF 2005: Cross-Language Speech Retrieval (CL-SR) Experiments

Adenike M. Lam-Adesina; Gareth J. F. Jones

Dublin City University's participation in the CLEF 2005 CL-SR task concentrated on applying our existing information retrieval methods, based on the Okapi model, to the conversational speech data set. This required a method for determining approximate sentence boundaries within the free-flowing automatic transcriptions provided, so that we could apply our summary-based pseudo-relevance feedback (PRF). We also performed exploratory experiments on using the metadata provided with the document transcriptions for indexing and relevance feedback. Topics were translated into English using Systran V3.0 machine translation. In most cases, Title-only topic statements performed better than combined Title and Description topics. PRF using our adapted method is shown to be effective, and absolute performance improves when the automatic document transcriptions are combined with additional metadata fields.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 792-799
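The Okapi model is usually instantiated as the BM25 ranking function. The sketch below scores one document for a bag-of-words query using the classic Robertson/Sparck Jones idf. The parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, not values reported by DCU. The summary-based PRF step is not shown.

```python
import math


def bm25(query_terms: list[str], doc_tf: dict[str, int], doc_len: int,
         avg_doc_len: float, df: dict[str, int], n_docs: int,
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one document for a bag-of-words query."""
    score = 0.0
    # Length normalization: longer-than-average documents are penalized
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    for term in query_terms:
        f = doc_tf.get(term, 0)
        if f == 0 or term not in df:
            continue
        # Robertson/Sparck Jones idf; negative for terms in over half the documents
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Saturating tf component: grows with f but stays below k1 + 1
        score += idf * f * (k1 + 1.0) / (f + length_norm)
    return score


df = {"liberation": 5, "camp": 30}
one = bm25(["liberation", "camp"], {"liberation": 1, "camp": 2}, 120, 100.0, df, 1000)
two = bm25(["liberation", "camp"], {"liberation": 3, "camp": 2}, 120, 100.0, df, 1000)
print(one < two)  # True
```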

CLEF-2005 CL-SR at Maryland: Document and Query Expansion Using Side Collections and Thesauri

Jianqiang Wang; Douglas W. Oard

This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) cross-language speech retrieval using translation knowledge obtained from the statistics of a large parallel corpus. The results show that document expansion and query expansion using blind relevance feedback were effective, although optimal parameter choices differed somewhat between the training and evaluation sets. Document expansion in which manually assigned keywords were augmented with thesaurus synonyms yielded marginal gains on the training set, but no improvement on the evaluation set. Cross-language retrieval with French queries yielded 79% of monolingual mean average precision when searching manually assigned metadata despite a substantial domain mismatch between the parallel corpus and the retrieval task. Detailed failure analysis indicates that speech recognition errors for named entities were an important factor that substantially degraded retrieval effectiveness.

- Part VI. Cross-Language Speech Retrieval (CL-SR) | Pp. 800-809
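Blind (pseudo-) relevance feedback assumes that the top-ranked documents are relevant and adds terms from them to the query. The sketch below uses simple frequency-based term selection as an assumed illustration. The Maryland runs may have weighted and selected terms differently, and the parameter names and defaults here are hypothetical.

```python
from collections import Counter


def expand_query(query_terms: list[str], ranked_docs: list[list[str]],
                 top_docs: int = 10, top_terms: int = 5,
                 stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Blind relevance feedback: treat the top documents as relevant and append
    the new terms that occur in the largest number of them."""
    counts: Counter = Counter()
    for doc_terms in ranked_docs[:top_docs]:
        # dict.fromkeys dedupes while keeping first-seen order, so ties break deterministically
        counts.update(t for t in dict.fromkeys(doc_terms) if t not in stopwords)
    original = set(query_terms)
    new_terms = [t for t, _ in counts.most_common() if t not in original][:top_terms]
    return list(query_terms) + new_terms


docs = [["survivor", "camp", "liberation"],
        ["camp", "survivor", "ghetto"],
        ["ghetto", "camp"]]
print(expand_query(["survivor"], docs, top_docs=2, top_terms=2))
# ['survivor', 'camp', 'liberation']
```

As the abstract notes, the best values for settings like `top_docs` and `top_terms` can differ between training and evaluation topics.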

Overview of WebCLEF 2005

Börkur Sigurbjörnsson; Jaap Kamps; Maarten de Rijke

We describe WebCLEF, the multilingual web track, which was introduced at CLEF 2005. We provide details of the tasks, the topics, and the results of WebCLEF participants. The mixed monolingual task proved an interesting addition to the range of tasks in cross-language information retrieval. Although it may be too early to speak of a solved problem, effective web retrieval techniques seem to carry over to the mixed monolingual setting. The multilingual task, in contrast, is still very far from being solved. Remarkably, using non-translated English queries proved more successful than using translations of the English queries.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 810-824

EuroGOV: Engineering a Multilingual Web Corpus

Börkur Sigurbjörnsson; Jaap Kamps; Maarten de Rijke

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian governmental web sites. The corpus contains over 3 million documents written in more than 20 different European languages. In this paper we provide a detailed description of the collection.

- Part VII. Multilingual Web Track (WebCLEF) | Pp. 825-836