Publications catalog - books

Information Retrieval Technology: Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings

Gary Geunbae Lee ; Akio Yamada ; Helen Meng ; Sung Hyon Myaeng (eds.)

In conference: 2nd Asia Information Retrieval Symposium (AIRS). Jeju Island, South Korea. October 13, 2005 - October 15, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Library Science; Theory of Computation; Information Systems Applications (incl. Internet); Algorithm Analysis and Problem Complexity; Data Structures

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29186-2

Electronic ISBN

978-3-540-32001-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Query Transitive Translation Using IR Score for Indonesian-Japanese CLIR

Ayu Purwarianti; Masatoshi Tsuchiya; Seiichi Nakagawa

We combined the mutual information score and the TF × IDF score (IR score) to select the best keyword translation in our transitive translation. The transitive translation used bilingual dictionaries to translate an Indonesian query into Japanese keywords. The Japanese keywords are then used as the input to retrieve Japanese documents. The keyword selection is done in two steps. The first step is to sort translation candidates according to their mutual information scores calculated from a monolingual target-language corpus. The second step is to select the best candidate set among the top 5 by mutual information score, based on their TF × IDF scores. The experiment on NTCIR-3 Web Retrieval Task data shows that keyword selection based on this combination achieved a higher IR score than a direct translation method using the original Indonesian-Japanese dictionary, and also higher than the machine translation result using the Kataku (Indonesian-English) and Babelfish (English-Japanese) engines.

- Poster and Demo Session 1 | Pp. 565-570
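
The two-step keyword selection described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of document-level positive pointwise mutual information, the "best single document" TF × IDF score, and the toy romanized tokens are all hypothetical choices made for the example.

```python
import math
from itertools import combinations, product


def mutual_information(a, b, docs):
    """Positive pointwise mutual information of two terms, estimated from
    document-level co-occurrence (an assumed estimator, not the paper's)."""
    n = len(docs)
    count_a = sum(a in d for d in docs)
    count_b = sum(b in d for d in docs)
    count_ab = sum(a in d and b in d for d in docs)
    if count_ab == 0:
        return 0.0
    return max(0.0, math.log(n * count_ab / (count_a * count_b)))


def set_mutual_information(candidate, docs):
    """Sum of pairwise mutual information over all terms in a candidate set."""
    return sum(mutual_information(a, b, docs) for a, b in combinations(candidate, 2))


def tfidf_score(candidate, docs):
    """Highest TF x IDF score that any single document gets for the candidate set
    (an assumed stand-in for the abstract's IR score)."""
    n = len(docs)
    best = 0.0
    for doc in docs:
        score = 0.0
        for term in candidate:
            df = sum(term in other for other in docs)
            if df:
                score += doc.count(term) * math.log(n / df)
        best = max(best, score)
    return best


def select_translation(candidates_per_word, docs, top_k=5):
    """Step 1: rank candidate sets by mutual information.
    Step 2: among the top_k sets, keep the one with the best TF x IDF score."""
    candidate_sets = list(product(*candidates_per_word))
    ranked = sorted(candidate_sets,
                    key=lambda c: set_mutual_information(c, docs),
                    reverse=True)
    return max(ranked[:top_k], key=lambda c: tfidf_score(c, docs))


# Toy target-language corpus of tokenized documents (placeholder tokens).
corpus = [
    ["ginko", "kinri", "kinri"],
    ["ginko", "kinri"],
    ["dote", "kawa"],
    ["dote", "kawa", "kawa"],
    ["kyomi", "hon"],
]
# One list of dictionary translation candidates per source query word.
candidates = [["ginko", "dote"], ["kinri", "kyomi"]]
print(select_translation(candidates, corpus))
```

Restricting the second step to the top 5 sets by mutual information keeps the more expensive TF × IDF scoring to a handful of candidates.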

Development of a Meta Product Search Engine with Web Services

Wooju Kim; Daewoo Choi; Jongmyong Kim; Bonggyun Jin

The research goal of this paper is to develop an advanced product search agent framework in which personalized agents can meet consumers’ information needs more effectively and accurately, based on Web Services, Semantic Web technologies, and AI techniques. These days, one of the major bottlenecks in E-commerce is that it is not easy for consumers to find relevant information about the products they want. This situation is caused mainly by inaccurate representation of the consumer’s search intent and the absence of an appropriate product information filtering and retrieval mechanism. To resolve these problems, we developed an ontology-based personalized product search query representation methodology, an information extraction methodology specialized for Semantic Web-based product information, and a multi-attribute-based product scoring methodology. Furthermore, we implemented the proposed methodologies as a prototype system and validated its performance by connecting our system to the well-known Amazon.com and Buy.com.

- Poster and Demo Session 1 | Pp. 571-576

An Automatic Code Classification System by Using Memory-Based Learning and Information Retrieval Technique

Heui Seok Lim; Won Kyu Hoon Lee; Hyeon Chul Kim; Soon Young Jeong; Heon Chang Yu

This paper proposes an automatic code classification system for Korean census data using information retrieval and memory-based learning techniques. The purpose of the proposed system is to convert natural language responses on survey questionnaires into the corresponding numeric codes according to the standard code book from the Census Bureau. The system was trained by memory-based learning and tested with 46,762 industry records and 36,286 occupation records. It was evaluated using 10-fold cross-validation. In the experiments, the proposed system showed 99.10% and 92.88% production rates for level 2 and level 5 codes, respectively.

- Poster and Demo Session 1 | Pp. 577-582

Finding New News: Novelty Detection in Broadcast News

Georgina Gaughan; Alan F. Smeaton

The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher’s experience by presenting “documents” in order of how much extra information they add to what is already known instead of how similar they are to a user’s query. In this paper we present a novelty detection system evaluated on the AQUAINT text collection as part of our TREC 2004 Novelty Track experiments. Subsequent to participation in TREC, the algorithm has been evaluated on another collection with its parameters optimized and we present those results here. We also discuss how we are extending the text-only approach to novelty detection to also include input from video analysis.

- Poster and Demo Session 1 | Pp. 583-588

Named Entity Tagging for Korean Using DL-CoTrain Algorithm

Byung-Kwan Kwak; Jeong-Won Cha

Our approach to the problem of Korean named entity classification adopts a co-training method called DL-CoTrain. We use only a part-of-speech tagger and a simple noun phrase chunker instead of a full parser to extract the contextual features of a named entity. We discuss the linguistic features in Korean that are valuable for named entity classification and show experimentally how large a labeled corpus and which unlabeled corpus are necessary for better performance and portability of a named entity classifier. With only about a quarter of the labeled corpus, our method can compete with its supervised counterpart.

- Poster and Demo Session 1 | Pp. 589-594

Semantic Categorization of Contextual Features Based on Wordnet for G-to-P Conversion of Arabic Numerals Combined with Homographic Classifiers

Youngim Jung; Aesun Yoon; Hyuk-Chul Kwon

Arabic numerals show a high occurrence frequency and deliver significant senses, especially in scientific or informative texts. The problem of how to convert Arabic numerals to phonemes with ambiguous classifiers in Korean is not easily resolved. In this paper, the ambiguities of Arabic numerals combined with homographic classifiers are analyzed, and resolutions for their sense disambiguation based on KorLex (Korean Lexico-Semantic Network) are proposed. Words preceding or following the Arabic numerals are categorized into 54 semantic classes based on the lexical hierarchy in KorLex 1.0. The semantic classes are used to train a decision tree that classifies the meaning and the reading of Arabic numerals. The proposed model shows 87.3% accuracy, which is 14.1% higher than the baseline.

- Poster and Demo Session 1 | Pp. 595-600

Robust Matching Method for Scale and Rotation Invariant Local Descriptors and Its Application to Image Indexing

Kengo Terasawa; Takeshi Nagasaki; Toshio Kawashima

Interest point matching is widely used for image indexing. In this paper we introduce a new distance measure between two local descriptors, in place of the conventional Mahalanobis distance, to improve matching accuracy. From experiments with synthetic images we show that the error distribution of the local jet is Gaussian but the distribution of the descriptors derived from the local jet is not Gaussian. Based on this observation, we design a new distance measure between two local descriptors and improve the accuracy of point matching. We also reduce the number of candidate points and the computational cost by taking into account the characteristic scale ratio. Experimental results confirm the validity of our method.

- Poster and Demo Session 2 | Pp. 601-615

Image Feedback Retrieval Based on Vector Space Model Transformation

Luo Xin; Shiro Ajioka; Masami Shishibori; Kenji Kita

In recent years, the use of user feedback information to improve image retrieval precision has become a hot research topic. In traditional relevance feedback methods, however, the retrieval system requires the user to mark both relevant and irrelevant items. For the sake of practicality and convenience, this paper proposes that users only need to select the images they are looking for, which generate a new index vector as relevant information. Through a transformation of the feature vector space, the index is moved towards the user’s query intention. Meanwhile, analysis of the user’s query intention, together with relevance prediction for index targets in the database, makes it possible for less similar vectors to move closer to the desired vectors, thus increasing index precision. We introduce a prototype image database system and report experiments on 51,138 image files. Compared with the traditional relevance feedback technique, the proposed method is shown to clearly improve retrieval performance.

- Poster and Demo Session 2 | Pp. 616-625

Indexing Structures for Content-Based Retrieval of Large Image Databases: A Review

He Ling; Wu Lingda; Cai Yichao; Liu Yuchi

Content-based image retrieval (CBIR) is a problem of active focus in the current multimedia domain. To obtain better search results more efficiently in some applications, a proper indexing structure is indispensable. This paper first reviews the typical indexing structures in content-based image retrieval. Then, based on a comparison of their performance, the paper identifies the problems in those structures and points out directions for improving the performance of CBIR in the future.

- Poster and Demo Session 2 | Pp. 626-634

Document Similarity Search Based on Generic Summaries

Xiaojun Wan; Jianwu Yang

Document similarity search finds documents similar to a query document in a text corpus and returns a ranked list of documents to users; it is widely used in recommender systems in library or web applications. The popular approach to similarity search is to calculate the similarities between the query document and the documents in the corpus and then rank the documents. In this paper, we investigate the use of document summarization techniques to improve the effectiveness of document similarity search. In the proposed summary-based approach, the query document is summarized, and similarity searches are performed with the produced summary as the new query instead of the original document. Different retrieval models and different summarization methods are investigated in the experiments. Experimental results demonstrate the higher effectiveness of the summary-based similarity search.

- Poster and Demo Session 2 | Pp. 635-640
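
The summary-based approach in the abstract above (summarize the query document, then search with the summary) can be illustrated with a minimal sketch. The abstract does not say which summarizer or retrieval model performed best, so everything here is an assumption chosen for brevity: a frequency-based extractive summarizer, TF × IDF weighting with a smoothed IDF, cosine similarity, and all function names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(document, num_sentences=2):
    """Assumed extractive summarizer: keep the sentences whose words are,
    on average, most frequent in the document, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return " ".join(s for s in sentences if s in top)


def tfidf_vector(tokens, idf):
    """Sparse TF x IDF vector; terms unseen in the corpus get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_search(query_document, corpus, num_sentences=2):
    """Rank corpus indices by similarity to the summary of the query document."""
    tokenized = [tokenize(d) for d in corpus]
    n = len(corpus)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    summary = summarize(query_document, num_sentences)
    query = tfidf_vector(tokenize(summary), idf)
    scored = [(cosine(query, tfidf_vector(tokens, idf)), i)
              for i, tokens in enumerate(tokenized)]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


corpus = [
    "Search engines rank documents by relevance to a query.",
    "Cats sleep most of the day.",
    "Stock markets fell sharply today.",
]
query_document = ("Search engines rank documents. Ranking documents uses "
                  "relevance scores. By the way, lunch was good.")
print(similarity_search(query_document, corpus))
```

Searching with the summary drops off-topic sentences from the query document before matching, which is the intuition the abstract gives for the improved effectiveness.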