Publications catalog - books

Research and Advanced Technology for Digital Libraries: 9th European Conference, ECDL 2005, Vienna, Austria, September 18-23, 2005, Proceedings

Andreas Rauber ; Stavros Christodoulakis ; A Min Tjoa (eds.)

In conference: 9th International Conference on Theory and Practice of Digital Libraries (ECDL). Vienna, Austria. September 18, 2005 - September 23, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction; Document Preparation and Text Processing

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28767-4

Electronic ISBN

978-3-540-31931-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Evaluation of the NSDL and Google for Obtaining Pedagogical Resources

Frank McCown; Johan Bollen; Michael L. Nelson

We describe an experiment that measures the pedagogical usefulness of the results returned by the National Science Digital Library (NSDL) and Google. Eleven public school teachers from the state of Virginia (USA) were used to evaluate a set of 38 search terms and search results based on the Standards of Learning (SOL) for Virginia Public Schools. Evaluations of search results were obtained from the NSDL (572 evaluations) and Google (650 evaluations). In our experiments, teachers ranked the links returned by Google as more relevant to the SOL than the links returned by the NSDL. Furthermore, Google’s ranking of educational material also showed some correlation with expert judgments.

Keywords: Search Term; Search Result; Digital Library; Educational Content; Public School Teacher.

- Digital Libraries and e-Learning | Pp. 344-355

Policy Model for University Digital Collections

Alexandros Koulouris; Sarantos Kapidakis

The access and reproduction policies of the digital collections of ten leading university digital libraries worldwide are classified according to factors such as the creation type of the material, acquisition method, and copyright ownership. The relationship between these factors is analyzed, showing how acquisition methods and copyright ownership affect the access and reproduction policies of digital collections. We conclude with rules about which factors lead to specific policies. For example, when the library holds the copyright of the material, reproduction for private use is usually provided free with a credit to the source, or otherwise mostly under fair use provisions, but commercial reproduction needs written permission and fees are charged. The extracted rules, which show the common practice on access and reproduction policies, constitute the policy model. Finally, conventional policies are mapped onto digital policies.

- Digital Libraries and e-Learning | Pp. 356-367

Importance of HTML Structural Elements and Metadata in Automated Subject Classification

Koraljka Golub; Anders Ardö

The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection that was used comprised 1000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance and multiple regression. It was shown that for best results all the elements have to be included in the classification process. The exact way of combining the significance indicators turned out not to be overly important: using the F1 measure, the best combination of significance indicators yielded no more than 3% higher performance results than the baseline.

- Text Classification in Digital Libraries | Pp. 368-378
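
For reference, the F1 measure mentioned in the abstract above is conventionally defined as the harmonic mean of precision and recall. The abstract itself does not spell out the formula; the standard definition is given here, with P and R denoting precision and recall:

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```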

DL Meets P2P – Distributed Document Retrieval Based on Classification and Content

Wolf-Tilo Balke; Wolfgang Nejdl; Wolf Siberski; Uwe Thaden

Peer-to-peer architectures are a potentially powerful paradigm for retrieving documents over networks of digital libraries, avoiding single points of failure through massive federation of (independent) information sources. Today, sharing files over P2P infrastructures is already immensely successful, but restricted to simple metadata matching. When it comes to the retrieval of complex documents, however, capabilities such as those provided by digital libraries are needed. Digital libraries have to cope with compound documents. Though some document parts (like embedded images) can be retrieved efficiently using metadata matching, text-based information needs different methods such as full-text search. For effective querying of texts, information such as inverse document frequencies is also essential. Due to the distributed nature of P2P networks, such ‘collection-wide’ information poses severe problems; for example, central updates whenever any document collection changes use up valuable bandwidth. We present a novel indexing technique that allows querying with collection-wide information with respect to different classifications, and show the effectiveness of our scheme for practical applications. We discuss our findings in detail and present simulations of the scheme’s efficiency and scalability.

Keywords: Digital Library; Document Collection; Query Term; Category Index; Query Index.

- Text Classification in Digital Libraries | Pp. 379-390

Automatic Detection of Survey Articles

Hidetsugu Nanba; Manabu Okumura

We propose a method for detecting survey articles in a multilingual database. Generally, a survey article cites many important papers in a research domain. Using this feature, it is possible to detect survey articles. We applied HITS, which was devised to retrieve Web pages using the notions of authority and hub. Important papers and survey articles can be considered to correspond to authorities and hubs, respectively. It is therefore possible to detect survey articles by applying HITS to databases and selecting papers with outstanding hub scores. However, HITS does not take into account the contents of each paper, so the algorithm may mistake a paper that cites many principal papers for a survey article. We therefore improve HITS by analysing the contents of each paper. We conducted an experiment and found that HITS was useful for the detection of survey articles, and that our method could improve HITS.

Keywords: Automatic Detection; Authority Score; Test Collection; Survey Article; Bibliographic Information.

- Text Classification in Digital Libraries | Pp. 391-401
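
The hub/authority idea in the abstract above can be illustrated with a minimal sketch of plain HITS on a citation graph. This is not the authors' implementation and does not include their content-based improvement; the function names (`hits`, `normalize`), the iteration count, and the toy paper identifiers are assumptions made for illustration only.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {paper: v / norm for paper, v in scores.items()}


def hits(citations, iterations=50):
    """Plain HITS. `citations` maps each paper to the list of papers it cites."""
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)

    hub = {p: 1.0 for p in papers}
    auth = {p: 1.0 for p in papers}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the papers citing p.
        new_auth = {p: 0.0 for p in papers}
        for citing, cited in citations.items():
            for p in cited:
                new_auth[p] += hub[citing]
        # Hub score: sum of the authority scores of the papers p cites.
        new_hub = {
            p: sum(new_auth[c] for c in citations.get(p, []))
            for p in papers
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: one paper cites three others, two papers cite one each.
citations = {
    "survey": ["a", "b", "c"],
    "p1": ["a"],
    "p2": ["b"],
}
hub, auth = hits(citations)
best_hub = max(hub, key=hub.get)  # candidate survey article
```

In this toy graph the paper that cites the most well-cited papers ends up with the largest hub score, which is the signal the abstract proposes using to flag survey articles.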

Focused Crawling Using Latent Semantic Indexing – An Application for Vertical Search Engines

George Almpanidis; Constantine Kotropoulos; Ioannis Pitas

Vertical search engines and web portals are gaining ground over general-purpose engines due to their limited size and their high precision for the domain they cover. The number of vertical portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. We compare its efficiency with other well-known web information retrieval techniques. Our implementation presents a different approach to focused crawling and aims to overcome the size limitations of the initial training data while maintaining a high recall/precision ratio.

- Searching | Pp. 402-413

Active Support for Query Formulation in Virtual Digital Libraries: A Case Study with DAFFODIL

André Schaefer; Matthias Jordan; Claus-Peter Klas; Norbert Fuhr

Daffodil is a front-end to federated, heterogeneous digital libraries aimed at strategic support of users during the information seeking process. This is done by offering a variety of functions for searching, exploring and managing digital library objects. However, the distributed search increases response time and the conceptual model of the underlying search processes is inherently weaker. This makes query formulation harder and the resulting waiting times can be frustrating. In this paper, we investigate the concept of proactive support during the user’s query formulation. To improve user efficiency and satisfaction, we implemented annotations, proactive support and error markers on the query form itself. These functions decrease the probability of syntactic or semantic errors in queries. Furthermore, the user is able to make better tactical decisions and feels more confident that the system handles the query properly. Evaluations with 30 subjects showed that user satisfaction is improved, whereas no conclusive results were obtained for efficiency.

Keywords: Digital Library; Query Formulation; Search Interface; Heuristic Evaluation; Query Reformulation.

- Searching | Pp. 414-425

Expression of Z39.50 Supported Search Capabilities by Applying Formal Descriptions

Michalis Sfakakis; Sarantos Kapidakis

The wide adoption of the Z39.50 protocol by libraries exposes their ability to participate in a distributed environment. Although the protocol specifies a unified global access mechanism, query failures and/or inconsistent answers remain pending issues when searching many sources, due to variant or poor implementations. The elimination of these issues heavily depends on the ability of the client to make decisions prior to initiating search requests, utilizing the knowledge of the supported search capabilities of each source. To effectively reformulate such requests, we propose a Datalog-based description for capturing the knowledge about the supported search capabilities of a Z39.50 source. We assume that the accessible sources can answer some but possibly not all queries over their data, and we describe a model for their supported search capabilities using a set of parameterized queries, according to the Relational Query Description Language (RQDL) specification.

Keywords: Access Point; Attribute Type; Formal Description; Query Language; Global Schema.

- Searching | Pp. 426-437

A Comparison of On-Line Computer Science Citation Databases

Vaclav Petricek; Ingemar J. Cox; Hui Han; Isaac G. Councill; C. Lee Giles

This paper examines the differences and similarities between the two on-line computer science citation databases DBLP and CiteSeer. The database entries in DBLP are inserted manually while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers. Despite their differences, the two databases exhibit citation distributions that are similar to each other but significantly different from those found in previous analyses of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time.

- Text Digital Libraries | Pp. 438-449

A Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation

Shiyan Ou; Christopher S. G. Khoo; Dion H. Goh

The design, implementation and evaluation of a multi-document summarization system for sociology dissertation abstracts are described. The system focuses on extracting variables and their relationships from different documents, integrating the extracted information, and presenting the integrated information using a variable-based framework. Two important summarization steps, information extraction and information integration, were evaluated by comparing system-generated output against human-generated output. Results indicate that the system-generated output achieves good precision and recall while extracting important concepts from each document, as well as good clusters of similar concepts from the set of documents.

Keywords: Important Concept; Information Extraction; Information Integration; Contextual Relation; Dissertation Abstract.

- Text Digital Libraries | Pp. 450-461