Publications catalog - books


Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers: 10th International Conference on Asian Digital Libraries, ICADL 2007, Hanoi, Vietnam, December 10-13, 2007. Proceedings

Dion Hoe-Lian Goh ; Tru Hoang Cao ; Ingeborg Torvik Sølvberg ; Edie Rasmussen (eds.)

In conference: 10th International Conference on Asian Digital Libraries (ICADL). Hanoi, Vietnam. December 10, 2007 - December 13, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Data Mining and Knowledge Discovery; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction; Document Preparation and Text Processing

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-77093-0

Electronic ISBN

978-3-540-77094-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

On Building a Full-Text Digital Library of Historical Documents

Szu-Pei Chen; Jieh Hsiang; Hsieh-Chang Tu; Micha Wu

The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese historical materials about Taiwan before 1895, and in that all of it is available in searchable full text, in addition to metadata. To make these materials more accessible to the research community, we have developed, in addition to full-text search and retrieval, a concept of regarding the set of documents retrieved by a query as a sub-collection, and have designed post-query classification methods to help users find the inter-relationships among documents and the collective meaning of a sub-collection. We have also developed techniques for term extraction for old Chinese and a data format for representing governmental structures. We hope that our system will help advance research in Taiwanese history and will serve as a model for other similar endeavors.

- Digital Archives | Pp. 49-60

Towards a Digital Archive for Handwritten Paper Slips with Ethnological Contents

A. C. Schering; I. Bruder; C. Schmitt; H. Meyer; A. Heuer

Contemporary digital libraries and archives of ethnological information focus mainly on document-based storage and access methods for their data. However, our archive is designed to manage the smallest pieces of information and can enable ethnologists not only to easily store and access their material, but also to derive new knowledge by combining existing data. In this paper, we present the first steps in building a digital archive for paper slips with ethnological contents from the 19th and the beginning of the 20th century. Along with the architectural and accessibility aspects of the system, we describe enhancements for efficient retrieval and for supporting modifications to access structures.

- Digital Archives | Pp. 61-64

Automatic Classification of Web Search Results: Product Review vs. Non-review Documents

Tun Thura Thet; Jin-Cheon Na; Christopher S. G. Khoo

This study seeks to develop an automatic method to identify product review documents on the Web using the snippets (summary information that includes the URL, title, and summary text) returned by the Web search engine. The aim is to allow the user to extend topical search with genre-based filtering or categorization. Firstly we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of the snippets are useful for classification. The best results were obtained using just the title and URL (domain and folder names) of the snippets as phrase terms (n-grams). Then we developed a heuristic approach that utilizes domain knowledge constructed semi-automatically, and found that it performs comparatively well, with only a small drop in accuracy rates. A hybrid approach which combines both the machine learning and heuristic approaches performs slightly better than the machine learning approach alone.

- Information Retrieval Techniques I | Pp. 65-74
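
The abstract above reports that the best classification results came from using only the title and the URL (domain and folder names) of each snippet as n-gram terms. A minimal sketch of what such feature extraction might look like follows. The function names, the `title:`/`domain:`/`folder:` prefixes, the tokenization rule, and the example URL are all assumptions made for illustration and are not taken from the paper; the SVM training step is not shown.

```python
import re
from urllib.parse import urlparse


def word_ngrams(tokens, max_n=2):
    """Return all contiguous word n-grams of length 1..max_n."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def snippet_features(title, url, max_n=2):
    """Build prefixed features from a snippet's title and URL.

    Hypothetical helper: the prefixes and tokenization are illustrative
    assumptions, not the authors' feature scheme.
    """
    # Lowercased alphanumeric tokens from the title, expanded to n-grams.
    title_tokens = re.findall(r"[a-z0-9]+", title.lower())

    parsed = urlparse(url)
    # Domain parts, e.g. "www.example.com" -> ["www", "example", "com"].
    domain_tokens = re.findall(r"[a-z0-9]+", parsed.netloc.lower())

    # Folder names: path segments, dropping a trailing file name if present.
    segments = [s for s in parsed.path.split("/") if s]
    if segments and "." in segments[-1]:
        segments = segments[:-1]

    features = ["title:" + g for g in word_ngrams(title_tokens, max_n)]
    features += ["domain:" + t for t in domain_tokens]
    features += ["folder:" + s.lower() for s in segments]
    return features


if __name__ == "__main__":
    for feature in snippet_features(
        "Canon G9 Review",
        "http://www.example.com/reviews/cameras/g9.html",
    ):
        print(feature)
```

Feature lists of this kind could then be turned into term vectors and passed to any standard classifier; the choice of prefixes simply keeps title terms and URL terms from colliding.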

An Effective Algorithm for Dimensional Reduction in Collaborative Filtering

Fengrong Gao; Chunxiao Xing; Yong Zhao

It is necessary to provide personalized information services that help users navigate the enormous volume of information on the web. Collaborative filtering is the most successful recommender system technology to date and is used in many domains. Unfortunately, collaborative filtering is limited by the high dimensionality and sparsity of the user-item rating matrix. In this paper, we propose a new method for applying semantic classification to collaborative filtering. Experimental results show the high efficiency and performance of our approach, compared with the traditional collaborative filtering algorithm and collaborative filtering using the K-means clustering algorithm.

- Information Retrieval Techniques I | Pp. 75-84

Modeling and Learning User Profiles for Personalized Content Service

Heung-Nam Kim; Inay Ha; Seung-Hoon Lee; Geun-Sik Jo

With the spread of the digital library and the web, users can obtain a wide variety of information, and can also access novel content. In this environment, finding useful information from a huge amount of available content becomes a time-consuming process. In this paper, we focus on user modeling for personalization to recommend content relevant to user interests. We exploit data mining techniques for identifying useful and meaningful patterns of users. Each user model, collectively called PTP (Personalized Term Pattern), is represented as both interest patterns and disinterest patterns. We present empirical experiments using datasets to demonstrate our approach and evaluate performance compared with existing methods.

- Information Retrieval Techniques I | Pp. 85-94

Ontology-Based Fuzzy Retrieval for Digital Library

Tho Thanh Quan; Siu Cheung Hui; Tru Hoang Cao

With the recent advancement of the Semantic Web, researchers are now considering developing ontology-based digital libraries for the sake of efficient information sharing, exchange and retrieval. In addition, fuzzy queries have also been introduced to help readers specify their queries more precisely when searching for information in digital libraries. In this paper, we first propose an architecture that enables multiple digital libraries to collaborate in the Semantic Web environment. Then we discuss using fuzzy ontology to represent uncertain information in digital libraries and fuzzy queries for retrieving information from fuzzy ontology. An illustrative system is then developed for experimental purposes. The performance of our system is also evaluated and analyzed.

- Information Retrieval Techniques I | Pp. 95-98

Feature Reinforcement Approach to Poly-lingual Text Categorization

Chih-Ping Wei; Huihua Shi; Christopher C. Yang

With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous amount of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of a text categorization model(s) from a set of preclassified training documents written in different languages and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced text categorization model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to utilize the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation results show that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique does in both English and Chinese corpora.

- Multilingual Techniques | Pp. 99-108

Development of Prototype Morphological Analyzer for the South Indian Language of Kannada

T. N. Vikram; Shalini R. Urs

A prototype morphological analyzer for the South Indian language of Kannada is presented in this work. The analyzer is based on finite-state machines and can handle 500 distinct noun and verb stems of Kannada. The morphological analyzer can simultaneously serve as a stemmer, part-of-speech tagger and spell checker, and hence it becomes a very efficient tool for content management.

- Multilingual Techniques | Pp. 109-116

Semantic Similarity Measures for Malay Sentences

Shahrul Azman Noah; Amru Yusrin Amruddin; Nazlia Omar

The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of resources in the form of semantic lexicons such as WordNet for English and GermaNet for German. However, for languages such as Malay, text similarity has proved to be difficult due to the unavailability of similar resources. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate the word-to-word semantic similarity. The word-to-word semantic similarity measure is then used to identify the semantic sentence similarity using a modified approach for the English language. Results of the experiments are very encouraging, and indicate the potential of semantic similarity measures for Malay sentences.

- Multilingual Techniques | Pp. 117-126

Enabling Resource Selection Based on Written English and Intellectual Competencies

Ayako Morozumi; Liddy Nevile; Shigeo Sugimoto

A growing number of people are using the Web to access English-language resources, among other things. In Asian countries, for example, many people want access to English texts. Many Asians are not as competent in reading English as they are in the intellectual content of their domain. The problem of accessibility to English texts is significant simply because of the number of people involved. The problems for second-language English readers are similar to those for many dyslexic first-language readers. We propose a descriptive model, based on FRBR and AccessForAll standards, that supports adaptability of texts for the benefit of such people.

- Multilingual Techniques | Pp. 127-130