Publications catalog - books
Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers: 10th International Conference on Asian Digital Libraries, ICADL 2007, Hanoi, Vietnam, December 10-13, 2007. Proceedings
Dion Hoe-Lian Goh ; Tru Hoang Cao ; Ingeborg Torvik Sølvberg ; Edie Rasmussen (eds.)
Conference: 10th International Conference on Asian Digital Libraries (ICADL). Hanoi, Vietnam. December 10-13, 2007
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Data Mining and Knowledge Discovery; Database Management; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-77093-0
Electronic ISBN
978-3-540-77094-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
On Building a Full-Text Digital Library of Historical Documents
Szu-Pei Chen; Jieh Hsiang; Hsieh-Chang Tu; Micha Wu
The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese historical materials about Taiwan before 1895, and that they are all available in searchable full text, in addition to metadata. To make these materials more accessible to the research community, we have developed, in addition to full-text search and retrieval, a concept of regarding the set of documents retrieved by a query as a sub-collection, and have designed post-query classification methods to help users find the inter-relationships among documents and the collective meaning of a sub-collection. We have also developed techniques for term extraction for old Chinese and a data format for representing governmental structures. We hope that our system will help advance research in Taiwanese history and will serve as a model for other similar endeavors.
- Digital Archives | Pp. 49-60
Towards a Digital Archive for Handwritten Paper Slips with Ethnological Contents
A. C. Schering; I. Bruder; C. Schmitt; H. Meyer; A. Heuer
Contemporary digital libraries and archives of ethnological information focus mainly on document-based storage and access methods for their data. However, our archive is designed to manage the smallest pieces of information and can enable ethnologists not only to easily store and access their material, but also to derive new knowledge by combining existing data. In this paper, we present the first steps in building a digital archive for paper slips with ethnological contents from the 19th and the beginning of the 20th century. Along with the architectural and accessibility aspects of the system, we describe enhancements for efficient retrieval and for supporting modifications to access structures.
- Digital Archives | Pp. 61-64
Automatic Classification of Web Search Results: Product Review vs. Non-review Documents
Tun Thura Thet; Jin-Cheon Na; Christopher S. G. Khoo
This study seeks to develop an automatic method to identify product review documents on the Web using the snippets (summary information that includes the URL, title, and summary text) returned by the Web search engine. The aim is to allow the user to extend topical search with genre-based filtering or categorization. First, we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of the snippets are useful for classification. The best results were obtained using just the title and URL (domain and folder names) of the snippets as phrase terms (n-grams). Then we developed a heuristic approach that utilizes domain knowledge constructed semi-automatically, and found that it performs comparably well, with only a small drop in accuracy rates. A hybrid approach which combines both the machine learning and heuristic approaches performs slightly better than the machine learning approach alone.
- Information Retrieval Techniques I | Pp. 65-74
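The abstract above reports that the title and the URL's domain and folder names, taken as n-gram phrase terms, were the most useful snippet features. A minimal sketch of what such feature extraction might look like follows; the function names, the `title:`/`domain:`/`folder:` prefixes, and the bigram limit are illustrative assumptions, not the authors' implementation.

```python
import re
from urllib.parse import urlparse

def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(toks, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)]

def snippet_features(title, url, n_max=2):
    """Build a set of prefixed n-gram features from a snippet's title and URL.

    Hypothetical helper: only the domain and the folder part of the path
    are used; the final path segment (the file name) is ignored.
    """
    parsed = urlparse(url)
    domain_toks = tokens(parsed.netloc)
    folder_toks = [t for part in parsed.path.split("/")[:-1] for t in tokens(part)]
    feats = set()
    feats.update("title:" + g for g in ngrams(tokens(title), n_max))
    feats.update("domain:" + g for g in ngrams(domain_toks, n_max))
    feats.update("folder:" + g for g in ngrams(folder_toks, n_max))
    return feats
```

A feature set of this kind could then be fed to any classifier that accepts sparse binary features, such as a linear SVM.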
An Effective Algorithm for Dimensional Reduction in Collaborative Filtering
Fengrong Gao; Chunxiao Xing; Yong Zhao
It is necessary to provide personalized information services for users, given the enormous volume of information on the web. Collaborative filtering is the most successful recommender system technology to date and is used in many domains. Unfortunately, collaborative filtering is limited by the high dimensionality and sparsity of the user-item rating matrix. In this paper, we propose a new method for applying semantic classification to collaborative filtering. Experimental results show the high efficiency and performance of our approach, compared with the traditional collaborative filtering algorithm and collaborative filtering using the K-means clustering algorithm.
- Information Retrieval Techniques I | Pp. 75-84
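To make the sparsity problem mentioned in the abstract above concrete, here is a minimal sketch of traditional user-based collaborative filtering with cosine similarity. It shows the baseline, not the semantic-classification method the paper proposes; the function names and the dict-of-dicts rating layout are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0  # no co-rated items: similarity cannot be estimated
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no user with non-zero similarity has rated the item,
    which happens often when the rating matrix is sparse.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None
```

With sparse data, many user pairs share no rated items, so `predict` frequently has nothing to work with; this is the limitation that dimensionality-reduction approaches aim to ease.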
Modeling and Learning User Profiles for Personalized Content Service
Heung-Nam Kim; Inay Ha; Seung-Hoon Lee; Geun-Sik Jo
With the spread of the digital library and the web, users can obtain a wide variety of information, and also can access novel content. In this environment, finding useful information from a huge amount of available content becomes a time-consuming process. In this paper, we focus on user modeling for personalization to recommend content relevant to user interests. We exploit data mining techniques for identifying useful and meaningful patterns of users. Each user model, collectively called PTP (Personalized Term Pattern), is represented as both interest patterns and disinterest patterns. We present empirical experiments using datasets to demonstrate our approach and evaluate performance compared with existing methods.
- Information Retrieval Techniques I | Pp. 85-94
Ontology-Based Fuzzy Retrieval for Digital Library
Tho Thanh Quan; Siu Cheung Hui; Tru Hoang Cao
With the recent advancement of the Semantic Web, researchers are now considering developing ontology-based digital libraries for the sake of efficient information sharing, exchange and retrieval. In addition, fuzzy queries have also been introduced to help readers specify their queries more precisely when searching for information in digital libraries. In this paper, we first propose an architecture that enables multiple digital libraries to collaborate in the Semantic Web environment. Then we discuss using fuzzy ontology to represent uncertain information in digital libraries and fuzzy queries for retrieving information from fuzzy ontology. An illustrative system is then developed for experimental purposes. The performance of our system is also evaluated and analyzed.
- Information Retrieval Techniques I | Pp. 95-98
Feature Reinforcement Approach to Poly-lingual Text Categorization
Chih-Ping Wei; Huihua Shi; Christopher C. Yang
With the rapid emergence and proliferation of the Internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of a text categorization model(s) from a set of preclassified training documents written in different languages and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced text categorization model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to utilize the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation results show that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique does in both English and Chinese corpora.
- Multilingual Techniques | Pp. 99-108
Development of Prototype Morphological Analyzer for the South Indian Language of Kannada
T. N. Vikram; Shalini R. Urs
A prototype morphological analyzer for the south Indian language of Kannada is presented in this work. The analyzer is based on finite-state machines and can handle 500 distinct noun and verb stems of Kannada. The morphological analyzer can simultaneously serve as a stemmer, part-of-speech tagger and spell checker, and hence it becomes a very efficient tool for content management.
- Multilingual Techniques | Pp. 109-116
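The abstract above notes that one finite-state analyzer can act as a stemmer, part-of-speech tagger and spell checker at once. The sketch below illustrates that idea with a trie-shaped deterministic automaton. It uses a tiny, hypothetical English lexicon as a stand-in (no Kannada data is assumed), and the names `STEMS`, `SUFFIXES`, `build_fsm` and `analyze` are invented for this example.

```python
# Toy lexicon (hypothetical, English placeholders): stem -> part of speech.
STEMS = {"walk": "VERB", "talk": "VERB", "book": "NOUN"}
# Suffixes allowed for each part of speech -> morphological feature.
SUFFIXES = {
    "VERB": {"": "base", "s": "3sg", "ed": "past", "ing": "progressive"},
    "NOUN": {"": "singular", "s": "plural"},
}

def build_fsm(stems, suffixes):
    """Compile every stem+suffix combination into a trie-shaped DFA.

    transitions[state][char] gives the next state; outputs[state] lists
    the (stem, pos, feature) analyses attached to an accepting state.
    """
    transitions = [{}]  # state 0 is the start state
    outputs = {}
    for stem, pos in stems.items():
        for suffix, feature in suffixes[pos].items():
            state = 0
            for ch in stem + suffix:
                nxt = transitions[state].get(ch)
                if nxt is None:
                    nxt = len(transitions)
                    transitions.append({})
                    transitions[state][ch] = nxt
                state = nxt
            outputs.setdefault(state, []).append((stem, pos, feature))
    return transitions, outputs

def analyze(word, fsm):
    """Run the automaton over `word`; an empty list means the form is rejected."""
    transitions, outputs = fsm
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return []  # no transition: unknown or misspelled form
    return outputs.get(state, [])
```

Each returned analysis carries the stem (stemming) and the part of speech (tagging), while an empty result flags a form the lexicon does not license (spell checking).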
Semantic Similarity Measures for Malay Sentences
Shahrul Azman Noah; Amru Yusrin Amruddin; Nazlia Omar
The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of resources in the form of semantic lexicons such as WordNet for English and GermaNet for German. However, for languages such as Malay, text similarity has proved difficult due to the unavailability of similar resources. This paper describes our approach to text similarity for the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate the word-to-word semantic similarity. The word-to-word semantic similarity measure is then used to identify the semantic sentence similarity using a modified version of an approach developed for the English language. Results of the experiments are very encouraging, and indicate the potential of the semantic similarity measure for Malay sentences.
- Multilingual Techniques | Pp. 117-126
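The abstract above describes a two-stage scheme: a dictionary-derived word-to-word similarity, then a sentence-level score built from it. The sketch below shows one common way to structure such a pipeline. It is not the paper's method: Jaccard overlap of definition words is swapped in for the overlap edge counting measure, the sentence score is a symmetric average of best word matches, and the English `glosses` data in the usage note is a hypothetical placeholder.

```python
def word_sim(a, b, glosses):
    """Word-to-word similarity as Jaccard overlap of definition words.

    `glosses` maps a word to the set of words in its dictionary definition.
    Stand-in measure, assumed here for illustration only.
    """
    if a == b:
        return 1.0
    ga, gb = glosses.get(a, set()), glosses.get(b, set())
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def sentence_sim(s1, s2, glosses):
    """Symmetric sentence similarity over two token lists.

    Each word is matched to its most similar word in the other sentence;
    the two directed averages are then averaged.
    """
    if not s1 or not s2:
        return 0.0

    def directed(src, dst):
        return sum(max(word_sim(w, v, glosses) for v in dst) for w in src) / len(src)

    return (directed(s1, s2) + directed(s2, s1)) / 2
```

For example, with `glosses = {"car": {"vehicle", "road", "wheels"}, "bus": {"vehicle", "road", "passengers"}}`, the call `sentence_sim(["red", "car"], ["red", "bus"], glosses)` combines an exact match on "red" with a partial match between "car" and "bus".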
Enabling Resource Selection Based on Written English and Intellectual Competencies
Ayako Morozumi; Liddy Nevile; Shigeo Sugimoto
A growing number of people are using the Web to access English-language resources, among other things. In Asian countries, for example, many people want access to English texts. Many Asians are not as competent at reading English as they are in the intellectual content of their domain. The problem of accessibility of English texts is significant simply because of the number of people involved. The problems for second-language English readers are similar to those for many dyslexic first-language readers. We propose a descriptive model that supports adaptability of texts for the benefit of such people based on FRBR and AccessForAll standards.
- Multilingual Techniques | Pp. 127-130