Publications catalog - books


Information Retrieval Technology: Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings

Gary Geunbae Lee ; Akio Yamada ; Helen Meng ; Sung Hyon Myaeng (eds.)

Conference: 2nd Asia Information Retrieval Symposium (AIRS). Jeju Island, South Korea. October 13-15, 2005

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Information Storage and Retrieval; Library Science; Theory of Computation; Information Systems Applications (incl. Internet); Algorithm Analysis and Problem Complexity; Data Structures

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink | |

Information

Resource type:

books

Print ISBN

978-3-540-29186-2

Electronic ISBN

978-3-540-32001-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

A Rough Set-Based Fuzzy Clustering

Zhao Yaqin; Zhou Xianzhong; Tang Guizhong

This paper presents a rough set-based fuzzy clustering algorithm in which the objects of fuzzy clustering are initial clusters obtained from equivalence relations. Initial clustering is performed directly by judging whether equivalence relations are equal, rather than by computing the intersection of equivalence classes as is usual, and the correctness of this approach is proved using rough set theory. Excessive generation of small classes is suppressed by a secondary clustering step based on a fuzzy similarity defined between two initial clusters; consequently, the dimension of the fuzzy similarity matrix is reduced. An integrated approximation precision is defined to evaluate clustering validity. The algorithm can dynamically adjust its parameter to obtain the optimal result. Experiments were performed to validate the method, and the results showed that the algorithm handles clustering problems for both numerical and nominal data well.

- Session 5A: TDT/Clustering | Pp. 401-409
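
The initial clustering step described in the abstract above can be illustrated with a minimal sketch. It assumes one plausible reading, not confirmed by the abstract: two objects fall into the same initial cluster when they agree on every chosen attribute (the indiscernibility relation of rough set theory). The function name `initial_clusters` and the toy data are hypothetical, and the fuzzy secondary clustering is not shown.

```python
from collections import defaultdict

def initial_clusters(objects, attributes):
    """Group objects that are indiscernible on the given attributes.

    Assumption: objects are compared by their full attribute tuple, so no
    per-attribute equivalence classes are built and intersected.
    """
    groups = defaultdict(list)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)  # one key per equivalence class
        groups[key].append(name)
    return list(groups.values())

# Hypothetical toy data with nominal attributes.
objects = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "small"},
    "o3": {"color": "blue", "size": "small"},
    "o4": {"color": "blue", "size": "large"},
}

by_both = initial_clusters(objects, ["color", "size"])
by_color = initial_clusters(objects, ["color"])
```

Using fewer attributes gives a coarser partition: `by_color` has two clusters, while `by_both` has three.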

Effective Use of Place Information for Event Tracking

Yun Jin; Sung Hyon Myaeng; Mann-Ho Lee; Hyo-Jung Oh; Myung-Gil Jang

The main purpose of topic detection and tracking (TDT) is to detect, group, and organize newspaper articles reporting on the same event. Since an event is a reported occurrence at a specific time and place, together with its unavoidable consequences, it is conceivable that place information in a news article plays an important role in TDT. We analyzed the characteristics of place information in news articles and devised a new topic tracking method that incorporates the results of this analysis. Experiments show that appropriate use of place information indeed helps identify news articles reporting on the same events.

- Session 5A: TDT/Clustering | Pp. 410-422

A Classifier Design Based on Combining Multiple Components by Maximum Entropy Principle

Akinori Fujino; Naonori Ueda; Kazumi Saito

Designing high performance classifiers for structured data consisting of multiple components is an important and challenging research issue in the field of machine learning. Although the main component of structured data plays an important role when designing classifiers, additional components may contain beneficial information for classification. This paper focuses on a probabilistic classifier design for multiclass classification based on the combination of main and additional components. Our formulation separately considers component generative models and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for text and link components so that we can apply our classifier design to document and web page classification problems. Our experimental results for three test collections confirmed that the proposed method effectively combined the main and additional components to improve classification performance.

- Session 5B: Multimedia/Classification | Pp. 423-438
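
As a rough illustration of combining component models as described in the abstract above, the sketch below uses a generic log-linear (maximum-entropy-style) combination of per-component class log-likelihoods. This is an assumption: the abstract does not give the exact model or how the combination weights are trained, so the weights here are hand-picked, and the name `combine` and the numbers are hypothetical.

```python
import math

def combine(component_log_probs, weights, biases):
    """Return class posteriors from a log-linear combination.

    component_log_probs[j][y] is log P_j(x_j | y) for component j
    (for example, a text model and a link model).
    Assumed score: bias_y + sum_j weight_j * log P_j(x_j | y).
    """
    scores = {
        y: biases[y] + sum(w * lp[y] for w, lp in zip(weights, component_log_probs))
        for y in biases
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return {y: math.exp(s - top) / norm for y, s in scores.items()}

# Hypothetical per-class likelihoods from two component models.
text_lp = {"A": math.log(0.6), "B": math.log(0.4)}
link_lp = {"A": math.log(0.2), "B": math.log(0.8)}
no_bias = {"A": 0.0, "B": 0.0}

posterior = combine([text_lp, link_lp], weights=[1.0, 1.0], biases=no_bias)
text_only = combine([text_lp, link_lp], weights=[1.0, 0.0], biases=no_bias)
```

Setting a component's weight to zero removes its influence, which is why `text_only` reduces to the normalized text likelihoods.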

A Query-by-Singing Technique for Retrieving Polyphonic Objects of Popular Music

Hung-Ming Yu; Wei-Ho Tsai; Hsin-Min Wang

This paper investigates the problem of retrieving popular music by singing. In contrast to the retrieval of MIDI music, where the main melody is easy to acquire by selecting the symbolic tracks, retrieving polyphonic objects in CD or MP3 format requires extracting the main melody directly from the accompanied singing signals, which is difficult to do well using conventional pitch estimation alone. To reduce the interference of background accompaniment during main melody extraction, methods are proposed to estimate the underlying sung notes in a music recording by taking into account the characteristic structure of popular songs. In addition, to accommodate users' unprofessional or personal singing styles, methods are proposed to handle the inaccuracies in tempo, pauses, transposition, and pitch (off-key singing) that inevitably occur in queries. The proposed system has been evaluated on a music database consisting of 2613 phrases extracted manually from 100 Mandarin pop songs. The experimental results indicate the feasibility of retrieving pop songs by singing.

- Session 5B: Multimedia/Classification | Pp. 439-453

Integrating Textual and Visual Information for Cross-Language Image Retrieval

Wen-Cheng Lin; Yih-Chen Chang; Hsin-Hsi Chen

This paper explores the integration of textual and visual information for cross-language image retrieval. An approach that automatically transforms textual queries into visual representations is proposed. The relationships between text and images are mined, and the mined relationships are used to construct visual queries from textual ones. The retrieval results of the textual and visual queries are then combined. We conduct English monolingual and Chinese-English cross-language retrieval experiments to evaluate the proposed approach. The major concern is the selection of suitable textual query terms for constructing visual queries. Experimental results show that the proposed approach improves retrieval performance and that nouns are appropriate for generating visual queries.

- Session 5B: Multimedia/Classification | Pp. 454-466

Practical Application of Associative Classifier for Document Classification

Yongwook Yoon; Gary Geunbae Lee

In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify accurately. The associative classifier has favorable characteristics: rapid training, good classification accuracy, and excellent interpretability. However, it has some obstacles to overcome when applied to text classification. First, the training process of the associative classifier produces a huge number of classification rules, which makes prediction for a new document ineffective. We resolve this by pruning the rules according to their contribution to correct classifications. In addition, since the target text collection generally has a high dimension, the training process might take a very long time. We propose mutual information between the word and class variables as a feature selection measure to reduce the dimension of the feature space. Experimental classification results on the 20-newsgroups dataset show many benefits of associative classification in both training and prediction.

- Session 5B: Multimedia/Classification | Pp. 467-478
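
The feature selection measure named in the abstract above, mutual information between a word variable and the class variable, can be sketched as follows. This assumes the word variable is binary (present or absent in a document) and uses the standard definition of mutual information; the paper's exact estimator is not given in the abstract, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, word):
    """Mutual information (in bits) between presence of `word` and the class.

    docs is a list of token sets; labels is the parallel list of class names.
    """
    n = len(docs)
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    word_counts = Counter(word in doc for doc in docs)
    class_counts = Counter(labels)
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_word = word_counts[present] / n
        p_class = class_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_word * p_class))
    return mi

# Hypothetical toy collection: two classes, four documents.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "law"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

mi_goal = mutual_information(docs, labels, "goal")  # word that separates the classes
mi_team = mutual_information(docs, labels, "team")  # word spread evenly over classes
```

Words would then be ranked by this score and only the top-scoring ones kept as features; the cutoff is a tuning choice that the abstract does not specify.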

A Method for Query Expansion Using a Hierarchy of Clusters

Masaki Aono; Hironori Doi

We present a new algorithm for improving retrieval performance using query expansion, based on a hierarchy of clusters. To create this hierarchical data structure, a clustering algorithm is executed multiple times with different initial conditions. With the aid of this hierarchical data structure, we have achieved significant improvement in retrieval performance over previously known methods in terms of both recall and precision. In our experiments with Japanese patent data, we employed a co-clustering algorithm as the clustering method.

- Poster and Demo Session 1 | Pp. 479-484

Chinese Question Classification from Approach and Semantic Views

Youzheng Wu; Jun Zhao; Bo Xu

This paper presents a new Chinese question taxonomy from both approach and semantic viewpoints, and an SVM classification algorithm based on multiple features and hybrid feature weighting. The experimental results show that: (1) lexical semantic features and structural features are essential for high question classification performance; (2) the contribution of the dependency relations extracted by our current parser is no better than that of bigrams; and (3) our proposed feature weighting is effective for question classification.

- Poster and Demo Session 1 | Pp. 485-490

GJM-2: A Special Case of General Jelinek-Mercer Smoothing Method for Language Modeling Approach to Ad Hoc IR

Guodong Ding; Bin Wang

The language modeling approach to IR is attractive and promising because it connects the problem of retrieval with that of language model estimation. A core technique for language model estimation is smoothing, which adjusts the maximum likelihood estimator to correct the inaccuracy caused by data sparseness. In this paper we propose a General Jelinek-Mercer method (GJM), which uses a document-dependent mixture coefficient to control the influence of the maximum likelihood model and the collection model. Using the number of unique terms in the document to improve the accuracy of language model estimation, we further develop the GJM-2 smoothing method as a special case of GJM. Experimental results show that using GJM-2 for the language modeling approach achieves better retrieval performance than three existing popular methods on both short and long queries.

- Poster and Demo Session 1 | Pp. 491-496
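
To make the smoothing idea in the abstract above concrete, here is a minimal sketch of Jelinek-Mercer interpolation with a pluggable, document-dependent mixture coefficient. The interpolation formula is the standard one. The `unique_term_coeff` function is only a hypothetical stand-in showing how the number of unique terms could enter the coefficient; it is not the GJM-2 formula, which the abstract does not state.

```python
from collections import Counter

def smoothed_prob(word, doc_tokens, collection_counts, collection_len, coeff):
    """p(w|d) = (1 - lam_d) * p_ml(w|d) + lam_d * p(w|C), with lam_d = coeff(d)."""
    lam = coeff(doc_tokens)
    p_ml = Counter(doc_tokens)[word] / len(doc_tokens)  # maximum likelihood model
    p_coll = collection_counts[word] / collection_len   # collection model
    return (1 - lam) * p_ml + lam * p_coll

def fixed_coeff(doc_tokens, lam=0.5):
    """Classic Jelinek-Mercer: the same coefficient for every document."""
    return lam

def unique_term_coeff(doc_tokens):
    """Hypothetical document-dependent coefficient (NOT the paper's formula)."""
    unique = len(set(doc_tokens))
    return unique / (unique + len(doc_tokens))

# Hypothetical two-document collection.
doc = ["a", "b", "a", "c"]
other = ["b", "c", "c", "d"]
collection_counts = Counter(doc + other)
collection_len = len(doc) + len(other)

p_a_fixed = smoothed_prob("a", doc, collection_counts, collection_len, fixed_coeff)
p_d_unique = smoothed_prob("d", doc, collection_counts, collection_len, unique_term_coeff)
```

Because the result is a convex mixture of two probability distributions, the smoothed probabilities still sum to one over the vocabulary, and a word that is absent from the document (such as "d" here) receives a nonzero probability from the collection model.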

The Empirical Impact of the Nature of Novelty Detection

Le Zhao; Min Zhang; Shaoping Ma

Novelty detection systems aim to remove redundant documents or sentences from a chronologically ordered list of documents. In this task, sentences that appear later in the list and carry no new meaning are eliminated. An accompanying paper revealed the nature of novelty detection: novelty is a combination of the PO (partial overlap) and CO (complete overlap) relations, which can be treated as two classification tasks, and the theoretical impact of this view was given there. This paper shows what the nature of the task means empirically. A new method, the selected pool, which implements this view of the task, achieved improvements on the TREC Novelty datasets. New evaluation criteria are also given that follow naturally from the nature of novelty detection.

- Poster and Demo Session 1 | Pp. 497-502