Publications catalog - books
Digital Libraries: Achievements, Challenges and Opportunities: 9th International Conference on Asian Digital Libraries, ICADL 2006, Kyoto, Japan, November 27-30, 2006, Proceedings
Shigeo Sugimoto ; Jane Hunter ; Andreas Rauber ; Atsuyuki Morishima (eds.)
Conference: 9th International Conference on Asian Digital Libraries (ICADL). Kyoto, Japan. November 27, 2006 - November 30, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Information Storage and Retrieval; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-49375-4
Electronic ISBN
978-3-540-49377-8
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11931584_11
Parallelising Harvesting
Hussein Suleman
Metadata harvesting has become a common technique to transfer a stream of data from one metadata repository or digital library system to another. As collections of metadata, and their associated digital objects, grow in size, the ingest of these items at the destination archive can take a significant amount of time, depending on the type of indexing or post-processing that is required. This paper discusses an approach to parallelise the post-processing of data in a small cluster of machines or a multi-processor environment, while not increasing the burden on the source data provider. Performance tests have been carried out on varying architectures and the results indicate that this technique is indeed promising for some scenarios and can be extended to more computationally-intensive ingest procedures. In general, the technique presents a new approach for the construction of harvest-based distributed or component-based digital libraries, with better scalability than before.
Keywords: Digital Library; Data Provider; Disk Access; Beowulf Cluster; High Computational Load.
- Distributed Repositories | Pp. 81-90
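The idea in the abstract above, one harvest stream with parallel post-processing, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `fetch_records` and `post_process` are hypothetical stand-ins, the records are made up, and a thread pool replaces the cluster or multi-processor setup the paper describes. For CPU-bound work in Python, a process pool or separate machines would be needed to see a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_records():
    """Hypothetical stand-in for one sequential harvest stream."""
    for i in range(8):
        yield {"id": i, "title": f"Record {i}"}


def post_process(record):
    """Hypothetical stand-in for indexing or other per-record ingest work."""
    return record["id"], record["title"].upper()


def harvest(workers=4):
    # The source is read by exactly one consumer, so its load is unchanged;
    # only the post-processing is spread across the workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(post_process, r) for r in fetch_records()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(harvest())
```

The design point is that the number of requests sent to the data provider does not depend on `workers`.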
doi: 10.1007/11931584_12
Sibling Page Search by Page Examples
Hiroaki Ohshima; Satoshi Oyama; Katsumi Tanaka
We propose methods of searching Web pages that are “semantically” regarded as “siblings” with respect to given page examples. That is, our approach aims to find pages that are similar in theme but have different content from the given sample pages. We call this “sibling page search”. The proposed search methods are different from conventional content-based similarity search for Web pages. Our approach recommends Web pages whose “conceptual” classification category is the same as that of the given sample pages, but whose content is different from the sample pages. In this sense, our approach will be useful for supporting a user’s opportunistic search, meaning a search in which the user’s interest and intention are not fixed. The proposed methods were implemented by computing the “common” and “unique” feature vectors of the given sample pages, and by comparing those feature vectors with each retrieved page. We evaluated our method on test sets consisting of page collections from the Open Directory Project (ODP).
Keywords: Feature Vector; Term Frequency; Cosine Similarity; Part Vector; Relevant Page.
- Information Extraction | Pp. 91-100
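The abstract above does not define the “common” and “unique” feature vectors, so the following is only one possible reading, sketched under stated assumptions: the common vector is taken as the term-wise minimum of the sample pages' term counts, the unique vector as what remains of each sample after removing the common part, and a candidate is scored by its cosine similarity to the common vector minus its similarity to the unique vector. The function names and the scoring rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Map lower-cased whitespace tokens to raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def common_and_unique(samples):
    """Assumed definitions: common = term-wise minimum, unique = leftovers."""
    vectors = [term_frequencies(s) for s in samples]
    common = vectors[0].copy()
    for v in vectors[1:]:
        common &= v  # Counter intersection keeps the minimum count per term
    unique = Counter()
    for v in vectors:
        unique += v - common  # terms specific to individual sample pages
    return common, unique


def sibling_score(candidate, common, unique):
    """Reward the shared theme, penalise overlap with sample-specific content."""
    c = term_frequencies(candidate)
    return cosine(c, common) - cosine(c, unique)


if __name__ == "__main__":
    samples = ["kyoto temple guide kinkakuji", "kyoto temple guide ryoanji"]
    common, unique = common_and_unique(samples)
    for page in ["kyoto temple guide ginkakuji", samples[0], "tokyo ramen"]:
        print(page, round(sibling_score(page, common, unique), 3))
```

Under these assumptions, a page on the same theme with new content scores higher than a near-copy of a sample page, which in turn scores higher than an unrelated page.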
doi: 10.1007/11931584_13
Contextualization of a RDF Knowledge Base in the VIKEF Project
Heiko Stoermer; Ignazio Palmisano; Domenico Redavid; Luigi Iannone; Paolo Bouquet; Giovanni Semeraro
Because the RDF data model and its semantics are so simple, complex application scenarios in which RDF is used to represent the application data model raise important design issues. Modelling, for example, temporal evolution, relevance, trust and provenance in knowledge bases requires more than a set of universally true statements that make no reference to a situation, a point in time, or a context in general. Our proposed solution is to use the notion of context to separate statements that refer to different contextual information, which so far could not be tied explicitly to the statements. In this paper we describe a practical solution to this problem as implemented in the VIKEF project, which aims to make the information contained in vast collections of documents, databases and metadata repositories explicit and intelligently usable.
Keywords: Resource Description Framework; SPARQL Query; Metadata Repository; Important Design Issue; SPARQL Engine.
- Information Extraction | Pp. 101-110
doi: 10.1007/11931584_14
Visualizing User Communities and Usage Trends of Digital Libraries Based on User Tracking Information
Seonho Kim; Subodh Lele; Sreeram Ramalingam; Edward A. Fox
We describe VUDM, our Visual User-model Data Mining tool, and its application to data logged regarding interactions of 1,200 users of the Networked Digital Library of Theses and Dissertations (NDLTD). The goals of VUDM are to visualize social networks, patrons’ distributions, and usage trends of NDLTD. The distinctive approach of this research is that we focus on analysis and visualization of users’ implicit rating data, which was generated based on user tracking information, such as sending queries and browsing result sets – rather than focusing on explicit data obtained from a user survey, such as major, specialties, years of experience, and demographics. The VUDM interface uses spirals to portray virtual interest groups, positioned based on inter-group relationships. VUDM facilitates identifying trends related to changes in interest, as well as concept drift. A formative evaluation found that VUDM is perceived to be effective for five types of tasks. Future work will aim to improve the understandability and utility of VUDM.
Keywords: Digital Library; Collaborative Filter; Concept Drift; Information Visualization; Explicit Data.
- Information Extraction | Pp. 111-120
doi: 10.1007/11931584_15
Extracting Mnemonic Names of People from the Web
Tomoko Hokama; Hiroyuki Kitagawa
The web has gained much attention as a new medium reflecting real-time interest in the world. This attention is driven by the proliferation of tools like bulletin boards and weblogs. The web is a source from which we can collect and summarize information about a particular object (e.g., business organization, product, person, etc.). For example, the extraction of reputation information is a major research topic in information extraction and knowledge extraction from the web. The ability to collect web pages about a particular object is essential in obtaining such information and extracting knowledge from it. A big problem in the web page collection process is that the same objects are referred to in different ways in different web documents. For example, a person may be referred to by full name, first name, affiliation and title, or nicknames. This paper proposes a method for extracting these mnemonic names of people from the web and shows experimental results using real web data.
Keywords: knowledge extraction; object identification; web mining.
- Information Extraction | Pp. 121-130
doi: 10.1007/11931584_16
Automatic Task Detection in the Web Logs and Analysis of Multitasking
Nikolai Buzikashvili
In this paper, we describe the conceptual basis and results of a Web search task detection study, with emphasis on multitasking. The basis includes the logical structure of a search process, a space of physical realizations, and a mapping of the logical structure into the space of realizations. Questions about how users carry out searches are formulated, with emphasis on the execution of multiple tasks. An automatic analysis of Web logs shows that multitasking is rare; it usually includes only two task sessions and takes the form of a temporal inclusion of an interrupting task session in the interrupted one. Searchers follow the principle of least effort and select the cheapest tactics: sequential task execution as a rule or, in the rare case of multitasking, the least expensive form of it. Quantitative characteristics of search behavior in 3 classes of temporal sessions (1-task session, several tasks executed one-by-one, and multitasking session) were compared, and significant differences were revealed.
Keywords: Task Detection; Physical Realization; Parallel Task; Sequential Execution; Logical Search.
- Information Extraction | Pp. 131-140
doi: 10.1007/11931584_17
Extracting Structured Subject Information from Digital Document Archives
Jyi-Shane Liu; Ching-Ying Lee
Information extraction (IE) techniques are capable of decoding targeted subject information in documents, and reducing text data into a set of structured core information. The implication for digital libraries is that IE potentially serves as an enabling tool to extend the value of digital document archives. We present an approach, called sandwich extraction pattern, to address the closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This allows users (e.g. librarians) to have direct control over the design of value-added content products and the performance of IE tools. We conducted empirical validation by implementing an IE system, called SEP, and field testing it in a practical document archive. Encouraged by successful test runs, the NCCU library has formally initiated a project to develop a value-added content product of government personnel gazettes, including document images, electronic texts, and a personnel changes database.
Keywords: information extraction; digital document archives; value-added services.
- Information Extraction | Pp. 141-150
doi: 10.1007/11931584_18
Topic Structure Mining Using PageRank Without Hyperlinks
Hiroyuki Toda; Ko Fujimura; Ryoji Kataoka; Hiroyuki Kitagawa
This paper proposes a novel text mining method for any given document set. It is based on PageRank-based centrality scores within the graph structure generated from the similarity of all document pairs. Evaluations using a newspaper collection show that the proposed approach yields much better performance in terms of main topic identification and topical clustering than the baseline method. Furthermore, we show an example of document set visualization that offers novel document browsing through the topic structure. Experiments show that our topic structure mining method is useful for user-oriented document selection.
- Information Extraction | Pp. 151-162
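As a rough illustration of PageRank-style centrality on a similarity graph, as described in the abstract above, here is a minimal sketch. The choices below are assumptions and are not taken from the paper: edge weights are cosine similarities of raw term counts, every document pair is linked (no similarity threshold), the damping factor is 0.85, and the function name `similarity_pagerank` is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_pagerank(docs, damping=0.85, iterations=50):
    """Power iteration on a weighted graph built from pairwise similarity."""
    vectors = [Counter(d.lower().split()) for d in docs]
    n = len(vectors)
    # weights[i][j] is the similarity of documents i and j; no self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by documents with no similar neighbour is spread evenly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0)
        updated = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            updated.append((1 - damping) / n + damping * (incoming + dangling / n))
        scores = updated
    return scores


if __name__ == "__main__":
    docs = [
        "election vote parliament",
        "election vote candidate",
        "election parliament debate",
        "football match goal",
    ]
    for doc, score in zip(docs, similarity_pagerank(docs)):
        print(f"{score:.3f}  {doc}")
```

In this toy input the first document shares the most vocabulary with the others and receives the highest score, while the unrelated sports document receives the lowest.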
doi: 10.1007/11931584_19
Personalized Information Delivering Service in Blog-Like Digital Libraries
Jason J. Jung
With growing interest in personalized digital libraries (e.g., blogs), people need to share relevant information and knowledge with other like-minded users. In this paper, we aim to build a grid environment for information recommendation in order to support users’ information searching tasks. By thoroughly analyzing social linkage and social interaction patterns, we extract meaningful relationships between unknown users through co-occurrence analysis. A social grid environment can then be constructed by aggregating a set of virtual hubs discovered from the hidden connections. For implementation and evaluation, we apply the proposed method to the blogosphere. The BlogGrid framework is proposed to provide an efficient information pushing service to bloggers without requiring any user intervention.
- Personalization for Digital Libraries | Pp. 163-172
doi: 10.1007/11931584_20
A Personal Ontology Model for Library Recommendation System
I-En Liao; Shu-Chuan Liao; Kuo-Fong Kao; Ine-Fei Harn
With the advent of information technology, library services are facing tremendous changes in the form of digitalization. In addition to the digitalization of library resources, personalized systems and recommendation systems are two highly desirable services among library patrons. This study proposes a novel recommendation system based on analysis of loan records. In our system, we use a traditional cataloging scheme, such as the Library of Congress Classification (LCC), as the reference ontology and build a personal ontology by mining subjects of interest and relationships among subjects from patrons’ borrowing records. The proposed scheme can meet the diversified demands of individual patrons and provide patrons with a user-friendly interface to help them access needed information.
Keywords: personalized service; personal ontology; information filtering; recommendation system.
- Personalization for Digital Libraries | Pp. 173-182