Publications catalog - books

Digital Libraries: Achievements, Challenges and Opportunities: 9th International Conference on Asian Digital Libraries, ICADL 2006, Kyoto, Japan, November 27-30, 2006, Proceedings

Shigeo Sugimoto ; Jane Hunter ; Andreas Rauber ; Atsuyuki Morishima (eds.)

Conference: 9th International Conference on Asian Digital Libraries (ICADL). Kyoto, Japan. November 27-30, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction; Document Preparation and Text Processing

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-49375-4

Electronic ISBN

978-3-540-49377-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Parallelising Harvesting

Hussein Suleman

Metadata harvesting has become a common technique to transfer a stream of data from one metadata repository or digital library system to another. As collections of metadata, and their associated digital objects, grow in size, the ingest of these items at the destination archive can take a significant amount of time, depending on the type of indexing or post-processing that is required. This paper discusses an approach to parallelise the post-processing of data in a small cluster of machines or a multi-processor environment, while not increasing the burden on the source data provider. Performance tests have been carried out on varying architectures and the results indicate that this technique is indeed promising for some scenarios and can be extended to more computationally-intensive ingest procedures. In general, the technique presents a new approach for the construction of harvest-based distributed or component-based digital libraries, with better scalability than before.

Keywords: Digital Library; Data Provider; Disk Access; Beowulf Cluster; High Computational Load.

- Distributed Repositories | Pp. 81-90

Sibling Page Search by Page Examples

Hiroaki Ohshima; Satoshi Oyama; Katsumi Tanaka

We propose methods of searching Web pages that are “semantically” regarded as “siblings” with respect to given page examples. That is, our approach aims to find pages that are similar in theme but have different content from the given sample pages. We call this “sibling page search”. The proposed search methods are different from conventional content-based similarity search for Web pages. Our approach recommends Web pages whose “conceptual” classification category is the same as that of the given sample pages, but whose content is different from the sample pages. In this sense, our approach will be useful for supporting a user’s opportunistic search, meaning a search in which the user’s interest and intention are not fixed. The proposed methods were implemented by computing the “common” and “unique” feature vectors of the given sample pages, and by comparing those feature vectors with each retrieved page. We evaluated our method for sibling page search by applying it to test sets consisting of page collections from the Open Directory Project (ODP).

Keywords: Feature Vector; Term Frequency; Cosine Similarity; Part Vector; Relevant Page.

- Information Extraction | Pp. 91-100
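
The abstract above mentions computing "common" and "unique" feature vectors from the sample pages and comparing them with each retrieved page. The following is only a minimal sketch of that idea, not the authors' method: it assumes whitespace-tokenized term-frequency vectors, takes the common vector to be the terms shared by every sample, and uses a hypothetical score (similarity to the common vector minus the highest similarity to any sample's unique vector). The function names `common_and_unique` and `sibling_score` are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def common_and_unique(samples):
    """Split sample pages into one shared vector and per-page unique vectors.

    Assumption: "common" means terms that occur in every sample page.
    """
    if not samples:
        raise ValueError("at least one sample page is required")
    vectors = [Counter(s.lower().split()) for s in samples]
    shared = set(vectors[0]).intersection(*vectors[1:])
    common = Counter({t: min(v[t] for v in vectors) for t in shared})
    unique = [
        Counter({t: c for t, c in v.items() if t not in shared})
        for v in vectors
    ]
    return common, unique


def sibling_score(candidate, common, unique):
    """Hypothetical score: reward the shared theme, penalize copied content."""
    vec = Counter(candidate.lower().split())
    theme = cosine(vec, common)
    overlap = max((cosine(vec, u) for u in unique), default=0.0)
    return theme - overlap


if __name__ == "__main__":
    samples = ["kyoto travel guide temples", "osaka travel guide food"]
    common, unique = common_and_unique(samples)
    for page in ["nara travel guide deer",
                 "kyoto travel guide temples",
                 "stock market report"]:
        print(page, round(sibling_score(page, common, unique), 3))
```

Under these assumptions, a page that shares the samples' theme but not their specific content scores higher than both a near-duplicate of a sample and an unrelated page.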

Contextualization of a RDF Knowledge Base in the VIKEF Project

Heiko Stoermer; Ignazio Palmisano; Domenico Redavid; Luigi Iannone; Paolo Bouquet; Giovanni Semeraro

Due to the simplicity of the RDF data model and semantics, complex application scenarios in which RDF is used to represent the application data model raise important design issues. Modelling e.g. the temporal evolution, relevance, trust and provenance in Knowledge Bases requires more than just a set of universally true statements, without any reference to a situation, a point in time, or generally a context. Our proposed solution is to use the notion of context to separate statements that refer to different contextual information, which could so far not explicitly be tied to the statements. In this paper we describe a practical solution to this problem, which has been implemented in the VIKEF project, which deals with making explicit and intelligently useable information contained in vast collections of documents, databases and metadata repositories.

Keywords: Resource Description Framework; SPARQL Query; Metadata Repository; Important Design Issue; SPARQL Engine.

- Information Extraction | Pp. 101-110

Visualizing User Communities and Usage Trends of Digital Libraries Based on User Tracking Information

Seonho Kim; Subodh Lele; Sreeram Ramalingam; Edward A. Fox

We describe VUDM, our Visual User-model Data Mining tool, and its application to data logged regarding interactions of 1,200 users of the Networked Digital Library of Theses and Dissertations (NDLTD). The goals of VUDM are to visualize social networks, patrons’ distributions, and usage trends of NDLTD. The distinctive approach of this research is that we focus on analysis and visualization of users’ implicit rating data, which was generated based on user tracking information, such as sending queries and browsing result sets – rather than focusing on explicit data obtained from a user survey, such as major, specialties, years of experience, and demographics. The VUDM interface uses spirals to portray virtual interest groups, positioned based on inter-group relationships. VUDM facilitates identifying trends related to changes in interest, as well as concept drift. A formative evaluation found that VUDM is perceived to be effective for five types of tasks. Future work will aim to improve the understandability and utility of VUDM.

Keywords: Digital Library; Collaborative Filter; Concept Drift; Information Visualization; Explicit Data.

- Information Extraction | Pp. 111-120

Extracting Mnemonic Names of People from the Web

Tomoko Hokama; Hiroyuki Kitagawa

The web has gained much attention as a new medium reflecting real-time interest in the world. This attention is driven by the proliferation of tools like bulletin boards and weblogs. The web is a source from which we can collect and summarize information about a particular object (e.g., business organization, product, person, etc.). For example, the extraction of reputation information is a major research topic in information extraction and knowledge extraction from the web. The ability to collect web pages about a particular object is essential in obtaining such information and extracting knowledge from it. A big problem in the web page collection process is that the same objects are referred to in different ways in different web documents. For example, a person may be referred to by full name, first name, affiliation and title, or nicknames. This paper proposes a method for extracting these mnemonic names of people from the web and shows experimental results using real web data.

Keywords: knowledge extraction; object identification; web mining.

- Information Extraction | Pp. 121-130

Automatic Task Detection in the Web Logs and Analysis of Multitasking

Nikolai Buzikashvili

In this paper, we describe the conceptual basis and results of a Web search task detection study, with emphasis on multitasking. The basis includes the logical structure of a search process, a space of physical realizations, and a mapping of the logical structure into the space of realizations. Questions on the users’ manners of search realization are formulated, with emphasis on multiple-task execution. An automatic analysis of the Web logs shows that multitasking is rare; it usually includes only two task sessions and takes the form of a temporal inclusion of an interrupting task session into the interrupted one. Searchers follow the principle of least effort and select the cheapest tactics: sequential task execution as a rule or, in the rare case of multitasking, the least expensive form of it. Quantitative characteristics of search behavior in 3 classes of temporal sessions (1-task session, several tasks executed one-by-one, and multitasking session) were compared, and significant differences were revealed.

Keywords: Task Detection; Physical Realization; Parallel Task; Sequential Execution; Logical Search.

- Information Extraction | Pp. 131-140

Extracting Structured Subject Information from Digital Document Archives

Jyi-Shane Liu; Ching-Ying Lee

Information extraction (IE) techniques are capable of decoding targeted subject information in documents, and reducing text data into a set of structured core information. The implication for digital libraries is that IE potentially serves as an enabling tool to extend the value of digital document archives. We present an approach, called sandwich extraction pattern, to address the closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This allows users (e.g. librarians) to have direct control on the design of value-added content products and the performance of IE tools. We conducted empirical validation by implementing an IE system, called SEP, and field testing it in a practical document archive. Encouraged by successful test runs, NCCU library has formally initiated a project to develop a value-added content product of government personnel gazettes, including document images, electronic texts, and personnel changes database.

Keywords: information extraction; digital document archives; value-added services.

- Information Extraction | Pp. 141-150

Topic Structure Mining Using PageRank Without Hyperlinks

Hiroyuki Toda; Ko Fujimura; Ryoji Kataoka; Hiroyuki Kitagawa

This paper proposes a novel text mining method for any given document set. It is based on PageRank-based centrality scores within the graph structure generated from the similarity of all document pairs. Evaluations using a newspaper collection show that the proposed approach yields much better performance in terms of main topic identification and topical clustering than the baseline method. Furthermore, we show an example of document set visualization that offers novel document browsing through the topic structure. Experiments show that our topic structure mining method is useful for user-oriented document selection.

- Information Extraction | Pp. 151-162
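
The abstract above describes PageRank-based centrality scores on a graph generated from the similarity of all document pairs. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes whitespace-tokenized term-frequency vectors, cosine similarity as the edge weight, a damping factor of 0.85, and a fixed number of power iterations. None of these choices comes from the abstract, and the name `similarity_pagerank` is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_pagerank(docs, damping=0.85, iterations=50):
    """Score documents by weighted PageRank over a similarity graph.

    Edges link every pair of documents, weighted by cosine similarity,
    so no hyperlinks are needed. Documents with no similar neighbour are
    treated as dangling nodes whose score is spread evenly.
    """
    n = len(docs)
    if n == 0:
        return []
    vectors = [Counter(d.lower().split()) for d in docs]
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(scores[j] for j in range(n) if out_sums[j] == 0)
        updated = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * (incoming + dangling / n))
        scores = updated
    return scores


if __name__ == "__main__":
    docs = [
        "election vote president",
        "election vote campaign",
        "president campaign vote election",
        "football match goal",
    ]
    for doc, score in zip(docs, similarity_pagerank(docs)):
        print(f"{score:.3f}  {doc}")
```

In this sketch, a document that shares no terms with the rest of the set receives the lowest score, while documents close to many others gain centrality; this is one way to read the "main topic identification" mentioned in the abstract.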

Personalized Information Delivering Service in Blog-Like Digital Libraries

Jason J. Jung

With increasing interest in personalized digital libraries (e.g., blogs), people need to share relevant information and knowledge with other like-minded users. In this paper, we aim at building a grid environment for information recommendation, in order to support users’ information searching tasks. By thoroughly analyzing the social linkage and social interaction patterns, we want to extract the meaningful relationships between unknown users by co-occurrence analysis. A social grid environment can therefore be constructed by aggregating a set of virtual hubs discovered from the hidden connections. For implementation and evaluation, we apply the proposed method to the blogosphere. The BlogGrid framework is proposed to provide an efficient information pushing service to bloggers without requesting any user intervention.

- Personalization for Digital Libraries | Pp. 163-172

A Personal Ontology Model for Library Recommendation System

I-En Liao; Shu-Chuan Liao; Kuo-Fong Kao; Ine-Fei Harn

With the advent of information technology, library services are facing tremendous changes in the form of digitalization. In addition to the digitalization of library resources, personalized systems and recommendation systems are two highly desirable services among library patrons. This study proposes a novel recommendation system based on analysis of loan records. In our system, we use a traditional cataloging scheme, such as the Library of Congress Classification (LCC), as the reference ontology and build a personal ontology by mining subjects of interest and relationships among subjects from each patron’s borrowing records. The proposed scheme can meet the diversified demands of individual patrons and provide patrons with a user-friendly interface to help them access needed information.

Keywords: personalized service; personal ontology; information filtering; recommendation system.

- Personalization for Digital Libraries | Pp. 173-182