Publications catalog - books



Research and Advanced Technology for Digital Libraries: 11th European Conference, ECDL 2007, Budapest, Hungary, September 16-21, 2007. Proceedings

László Kovács ; Norbert Fuhr ; Carlo Meghini (eds.)

Conference: 11th International Conference on Theory and Practice of Digital Libraries (ECDL). Budapest, Hungary. September 16, 2007 - September 21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Theory of Computation; Library Science; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74850-2

Electronic ISBN

978-3-540-74851-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Thesaurus-Based Feedback to Support Mixed Search and Browsing Environments

Edgar Meij; Maarten de Rijke

We propose and evaluate a query expansion mechanism that supports searching and browsing in collections of annotated documents. Based on generative language models, our feedback mechanism uses document-level annotations to bias the generation of expansion terms and to generate browsing suggestions in the form of concepts selected from a controlled vocabulary (as typically used in digital library settings). We provide a detailed formalization of our feedback mechanism and evaluate its effectiveness using the TREC 2006 Genomics track test set. As to the retrieval effectiveness, we find a 20% improvement in mean average precision over a query-likelihood baseline, whilst increasing precision at 10. When we base the parameter estimation and feedback generation of our algorithm on a large corpus, we also find an improvement over state-of-the-art relevance models. The browsing suggestions are assessed along two dimensions: relevancy and specificity. We present an account of per-topic results, which helps clarify for which types of queries our feedback mechanism is particularly helpful.

- User Interfaces | Pp. 247-258

Named Entity Identification and Cyberinfrastructure

Alison Babeu; David Bamman; Gregory Crane; Robert Kummer; Gabriel Weaver

Well-established instruments such as authority files and a growing set of data structures such as CIDOC CRM, FRBRoo, and MODS provide the foundation for emerging, new digital services. While solid, these instruments alone neither capture the essential data on which traditional scholarship depends nor enable the services which we can already identify as fundamental to any eResearch, cyberinfrastructure or virtual research environment for intellectual discourse. This paper describes a general model for primary sources, entities and thematic topics, the gap between this model and emerging infrastructure, and the tasks necessary to bridge it.

- Document Linking | Pp. 259-270

Finding Related Papers in Literature Digital Libraries

Nattakarn Ratprasartporn; Gultekin Ozsoyoglu

This paper is about searching literature digital libraries to find “related” publications of a given publication. Existing approaches do not take into account publication topics in the relatedness computation, allowing topic diffusion across query output publications. In this paper, we propose a new way to measure “relatedness” by incorporating “contexts” (representing topics) of publications. We utilize existing ontology terms as contexts for publications, i.e., publications are assigned to their relevant contexts, where a context characterizes one or more publication topics. We define three ways of context-based relatedness, namely, (a) relatedness between two contexts (context-to-context relatedness) by using publications that are assigned to the contexts and the context structures in the context hierarchy, (b) relatedness between a context and a paper (paper-to-context relatedness), which is used to rank the relatedness of contexts with respect to a paper, and (c) relatedness between two papers (paper-to-paper relatedness) by using both paper-to-context and context-to-context relatedness measurements.

Using existing biomedical ontology terms as contexts for genomics-oriented publications, our experiments indicate that the context-based approach is accurate, and solves the topic diffusion problem by effectively classifying and ranking related papers of a given paper based on the selected contexts of the paper.

- Document Linking | Pp. 271-284

Extending Semantic Matching Towards Digital Library Contexts

László Kovács; András Micsik

Matching users’ goals with available offers is a traditional research topic for electronic marketplaces and service-oriented architectures. The new area of Semantic Web Services introduced the possibility of semantic matching between user goals and services. The authors show in the paper what benefits semantic matching may provide for digital libraries. Various practical examples are given for the usefulness of semantic matching, and a novel algorithm is introduced for computing semantic matches. The implementation and operation of matching are explained using a digital document search scenario.

- Document Linking | Pp. 285-296

Towards a Unified Approach Based on Affinity Graph to Various Multi-document Summarizations

Xiaojun Wan; Jianguo Xiao

This paper proposes a unified extractive approach, based on an affinity graph, to both generic and topic-focused multi-document summarization. By using an asymmetric similarity measure, the relationships between sentences are reflected in a directed affinity graph for generic summarization. For topic-focused summarization, the topic information is incorporated into the affinity graph using a topic-sensitive affinity measure. Based on the affinity graph, the information richness of sentences is computed by a graph-ranking algorithm on differentiated intra-document links and inter-document links between sentences. Lastly, a greedy algorithm is employed to impose a diversity penalty on sentences, and the sentences with both high information richness and high information novelty are selected for the summary. Experimental results on the tasks of DUC 2002-2005 demonstrate the excellent performance of the proposed approaches on both generic and topic-focused multi-document summarization tasks.

- Information Retrieval | Pp. 297-308
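The pipeline outlined in the Wan and Xiao abstract above (asymmetric sentence affinity, graph ranking, greedy diversity penalty) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The word-overlap affinity, the PageRank-style iteration, the penalty formula, and all function names and parameter values are assumptions chosen for illustration. The distinction between intra-document and inter-document links and the topic-sensitive variant are omitted.

```python
def tokenize(sentence):
    """Lowercased word set of a sentence (assumed, very crude tokenizer)."""
    return {w.strip(".,;:!?").lower() for w in sentence.split()} - {""}


def affinity_matrix(sentences):
    """Directed affinity: overlap normalised by the source sentence only,
    so aff[i][j] generally differs from aff[j][i] (illustrative assumption)."""
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i]:
                aff[i][j] = len(words[i] & words[j]) / len(words[i])
    return aff


def row_normalize(matrix):
    """Scale each row to sum to 1; rows without links stay all zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else [0.0] * len(row))
    return out


def rank(norm, damping=0.85, iterations=50):
    """PageRank-style iteration standing in for the graph-ranking step."""
    n = len(norm)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * norm[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


def summarize(sentences, size=2, penalty=1.0):
    """Greedily pick high-scoring sentences, penalising sentences that
    link strongly to ones already chosen (assumed penalty formula)."""
    if not sentences:
        return []
    norm = row_normalize(affinity_matrix(sentences))
    richness = rank(norm)
    scores = list(richness)
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < size:
        best = max(remaining, key=lambda i: scores[i])
        chosen.append(best)
        remaining.remove(best)
        for j in remaining:
            scores[j] -= penalty * norm[j][best] * richness[best]
    return [sentences[i] for i in chosen]


sentences = [
    "Digital libraries store annotated documents.",
    "Annotated documents are stored in digital libraries.",
    "Graph ranking scores each sentence.",
    "The weather was pleasant in Budapest.",
]
summary = summarize(sentences, size=2)
```

The diversity penalty lowers the score of any sentence whose outgoing affinity points at an already selected sentence, which is what keeps near-duplicates out of the summary in this sketch.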

Large-Scale Clustering and Complete Facet and Tag Calculation

Bolette Ammitzbøll Madsen

The State and University Library of Denmark is developing an integrated search system called Summa and, as part of the Summa project, a clustering module and a facet module. Simple clusters have been created for a collection of more than six and a half million library metadata records using a linear clustering algorithm. The created clusters are used to enrich the metadata records, and search results are presented to the user using a faceted browsing interface alongside a ranked result list. The most frequent tags in the different facets in the search result can be calculated and presented at a rate of approximately three million records per second per machine.

- Information Retrieval | Pp. 309-320
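The facet step described in the Madsen abstract above, computing the most frequent tags per facet over a search result, can be illustrated with a small Python sketch. It is not the Summa implementation and makes no attempt at the reported throughput. The record layout (a dict mapping facet names to lists of tags), the function name, and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_facet_tags(records, facets, k=3):
    """Count tag occurrences per facet across a result set and return
    the k most frequent tags for each requested facet."""
    counts = defaultdict(Counter)
    for record in records:
        for facet in facets:
            # A record may lack a facet entirely; treat that as no tags.
            for tag in record.get(facet, ()):
                counts[facet][tag] += 1
    return {facet: counts[facet].most_common(k) for facet in facets}


# Hypothetical search result: each record lists its tags per facet.
results = [
    {"author": ["Madsen"], "subject": ["clustering", "search"]},
    {"author": ["Madsen"], "subject": ["facets", "search"]},
    {"author": ["Hansen"], "subject": ["search"]},
]
top = top_facet_tags(results, ["author", "subject"], k=2)
```

A single pass over the result set keeps the cost linear in the number of records, which is the property a faceted interface needs when results are large.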

Annotation-Based Document Retrieval with Probabilistic Logics

Ingo Frommholz

Annotations are an important part of today’s digital libraries and Web information systems as an instrument for interactive knowledge creation. Annotation-based document retrieval aims at exploiting annotations as a rich source of evidence for document search. The POLAR framework supports annotation-based document search by translating POLAR programs into four-valued probabilistic datalog and applying a retrieval strategy called knowledge augmentation, where the content of a document is augmented with the content of its attached annotations. In order to evaluate this approach and POLAR’s performance in document search, we set up a test collection based on a snapshot of ZDNet News, containing IT-related articles and attached discussion threads. Our evaluation shows that knowledge augmentation has the potential to increase retrieval effectiveness when applied in a moderate way.

- Information Retrieval | Pp. 321-332

Evaluation of Visual Aid Suite for Desktop Searching

Schubert Foo; Douglas Hendry

The task of searching for documents is becoming more challenging as the volume of stored data continues to increase and retrieval systems produce longer result lists. Graphical visualisations can help users understand large volumes of information more efficiently and effectively. This work investigates the use of multiple visualisations in a desktop search tool. These visualisations include a List View, Tree View, Map View, Bubble View, Tile View and Cloud View. A preliminary evaluation was undertaken by 94 participants to gauge the tool's potential usefulness and to detect usability issues with its interface and graphical presentations. The evaluation results show that these visualisations made it easier and quicker for participants to find relevant documents. All of the evaluators found at least one of the visualisations useful, and over half of them found at least three of the visualisations useful. The evaluation results support the research premise that a combination of integrated visualisations will result in a more effective search tool. The next stage of work is to improve the current views in light of the evaluation findings, in preparation for scalability and longitudinal tests on a series of increasingly large result sets.

- Personal Information Management | Pp. 333-344

Personal Environment Management

Anna Zacchi; Frank Shipman

We report on a study of the practices people employ to organize resources for their activities on their computers. Today the computer is the main working environment for many people. People use computers to do an increasing number of tasks. We observed different patterns of organization of resources across the desktop and the folder structure. We describe several strategies that people employ to customize the environment in order to easily perform their activities, access their resources, and overview their current tasks.

- Personal Information Management | Pp. 345-356

Empirical Evaluation of Semi-automated XML Annotation of Text Documents with the GoldenGATE Editor

Guido Sautter; Klemens Böhm; Frank Padberg; Walter Tichy

Digitized scientific documents should be marked up according to domain-specific XML schemas, to make maximum use of their content. Such markup allows for advanced, semantics-based access to the document collection. Many NLP applications have been developed to support automated annotation. But NLP results are often not accurate enough, and manual corrections are indispensable. We have therefore developed the GoldenGATE editor, a tool that integrates NLP applications and assistance features for manual XML editing. Plain XML editors do not feature such a tight integration: users have to create the markup manually or move the documents back and forth between the editor and (mostly command-line) NLP tools. This paper features the first empirical evaluation of how users benefit from such a tight integration when creating semantically rich digital libraries. We have conducted experiments with humans who had to perform markup tasks on a document collection from a generic domain. The results show clearly that markup editing assistance in tight combination with NLP functionality significantly reduces the user effort in annotating documents.

- Personal Information Management | Pp. 357-367