Publications catalog - books


Information Retrieval Technology: Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings

Gary Geunbae Lee ; Akio Yamada ; Helen Meng ; Sung Hyon Myaeng (eds.)

Conference: 2nd Asia Information Retrieval Symposium (AIRS). Jeju Island, South Korea. October 13, 2005 - October 15, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Library Science; Theory of Computation; Information Systems Applications (incl. Internet); Algorithm Analysis and Problem Complexity; Data Structures

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29186-2

Electronic ISBN

978-3-540-32001-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

The Reliability of Metrics Based on Graded Relevance

Tetsuya Sakai

This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CLIR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley as well as Kendall’s rank correlation. Our results show that AnDCG and nDCG ((Average) Normalised Discounted Cumulative Gain at a given document cut-off) are good metrics, provided that the cut-off is large. However, if one wants to avoid the cut-off parameter altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.
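
As a rough illustration of the cut-off-based metric named in this abstract, the sketch below computes nDCG at a document cut-off. It assumes the common log2 rank discount and hypothetical integer gain values; the paper's own definitions of nDCG and AnDCG may differ in the discount base and gain settings, and the function names are chosen here for illustration only.

```python
import math


def dcg(gains, cutoff):
    """Discounted cumulative gain of the first `cutoff` gains (log2 discount, an assumption)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:cutoff], start=1))


def ndcg(ranked_gains, judged_gains, cutoff):
    """DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked_gains: gain of each retrieved document, in ranked order.
    judged_gains: gains of all judged documents for the topic (any order).
    """
    ideal = dcg(sorted(judged_gains, reverse=True), cutoff)
    return dcg(ranked_gains, cutoff) / ideal if ideal > 0 else 0.0


# Hypothetical gains: 3 = highly relevant, 0 = non-relevant.
perfect = ndcg([3, 2, 1], [3, 2, 1], cutoff=3)
reversed_run = ndcg([1, 2, 3], [3, 2, 1], cutoff=3)
```

A ranking that places the most relevant documents first scores 1.0 under this formulation, while the reversed ranking scores strictly less.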

- Session 1A: Relevance/Retrieval Models | Pp. 1-16

Improving Weak Ad-Hoc Retrieval by Web Assistance and Data Fusion

Kui-Lam Kwok; Laszlo Grunfeld; Peter Deng

Users experience frustration when their reasonable queries retrieve no relevant documents. We call these weak queries and retrievals. Improving their effectiveness is an important issue in ad-hoc retrieval and will be most rewarding for these users. We offer an explanation (with experimental support) why data fusion of sufficiently different retrieval lists can improve weak query results. This approach requires sufficiently different retrieval lists for an ad-hoc query. We propose various ways of selecting salient terms from longer queries to probe the web, and define alternate queries from web results. Target retrievals by the original and alternate queries are combined. When compared with normal ad-hoc retrieval, web assistance and data fusion can improve weak query effectiveness by over 100%. Another benefit of this approach is that other queries also improve along with weak ones, unlike pseudo-relevance feedback which works mostly for non-weak queries.
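
The abstract does not say which fusion rule the authors use. As a minimal sketch of combining retrieval lists from an original and an alternate query, the example below applies CombSUM with min-max score normalization, a standard data fusion technique swapped in here for illustration. The document ids and scores are made up.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs):
    """Fuse several runs by summing normalized scores (CombSUM).

    Returns document ids ordered by fused score, ties broken by id.
    """
    fused = {}
    for run in runs:
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical retrieval scores for the original and a web-derived alternate query.
original_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
alternate_run = {"d2": 5.0, "d4": 4.0, "d1": 3.0}
fused_ranking = comb_sum(original_run, alternate_run)
```

Documents retrieved by both lists accumulate score from each, which is one way that sufficiently different lists can promote relevant documents a single run ranks low.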

- Session 1A: Relevance/Retrieval Models | Pp. 17-30

Query Expansion with the Minimum Relevance Judgments

Masayuki Okabe; Kyoji Umemura; Seiji Yamada

Query expansion techniques generally select new query terms from a set of top-ranked documents. Although a user’s manual judgment of those documents would greatly help in selecting good expansion terms, it is difficult to get enough feedback from users in practical situations. In this paper we propose a query expansion technique which performs well even if a user identifies just one relevant document and one non-relevant document. To handle this specific condition, we introduce two refinements to a well-known query expansion technique. One is to increase the number of possibly relevant documents by a transductive learning method, because more relevant documents produce better performance. The other is a modified term scoring scheme based on the results of the learning method and a simple function. Experimental results show that our technique outperforms some traditional methods in standard precision and recall criteria.

- Session 1A: Relevance/Retrieval Models | Pp. 31-42

Improved Concurrency Control Technique with Lock-Free Querying for Multi-dimensional Index Structure

Myung-Keun Kim; Hae-Young Bae

This paper proposes an improved concurrency control technique with lock-free querying for multi-dimensional index structures. In highly concurrent workloads with frequent updates that store the locations of moving objects, variants of the R-tree structure cannot provide real-time response, because query processing is frequently blocked by node splits or region propagation as the locations of objects change. This paper improves query performance by using a new versioning technique. It does not physically modify data, but creates a new version to preserve data intactness. Search operations can access data without any locking or latching by reading an old version. The performance evaluation shows that the search operation of the proposed tree is at least two times faster than that of previous work.

- Session 1A: Relevance/Retrieval Models | Pp. 43-55

A Color-Based Image Retrieval Method Using Color Distribution and Common Bitmap

Chin-Chen Chang; Tzu-Chuen Lu

Image retrieval has emerged as an important problem in multimedia database management. This paper uses the color distribution, the mean value and the standard deviation, of an image as global information for image retrieval. Furthermore, this paper uses the common bitmap to represent the local characteristics of the image. The performance of the method is tested on three different image databases consisting of 410, 235, and 10,235 images. The third database has been partitioned into 10 categories for exploring the category retrieval ability. According to the experimental results, we find that the proposed method can effectively retrieve more similar images than other methods and the category ability is also higher than others. In addition, the total memory space for saving the image features of the proposed method is less than other methods.
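
To make the global feature in this abstract concrete, the following sketch computes the per-channel mean and standard deviation of an image and compares two feature vectors with Euclidean distance. The pixel representation (a list of RGB tuples), the distance measure, and the function names are assumptions for illustration; the common-bitmap local feature is not reproduced because the abstract does not describe it.

```python
import math
from statistics import fmean, pstdev


def color_moments(pixels):
    """Return [mean_R, std_R, mean_G, std_G, mean_B, std_B] for (r, g, b) pixels."""
    channels = list(zip(*pixels))
    return [stat(channel) for channel in channels for stat in (fmean, pstdev)]


def feature_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed similarity measure)."""
    return math.dist(a, b)


# Two tiny hypothetical "images" of two pixels each.
query_features = color_moments([(0, 0, 0), (2, 4, 6)])
candidate_features = color_moments([(0, 0, 0), (2, 4, 6)])
distance = feature_distance(query_features, candidate_features)
```

Each image is reduced to six numbers regardless of its size, which is consistent with the abstract's point about the small memory footprint of the features.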

- Session 1B: Multimedia IR | Pp. 56-71

A Probabilistic Model for Music Recommendation Considering Audio Features

Qing Li; Sung Hyon Myaeng; Dong Hai Guan; Byeong Man Kim

In order to make personalized recommendations, many collaborative music recommender systems (CMRS) have focused on capturing precise similarities among users or items based on historical user ratings. Despite the valuable information available from the audio features of the music itself, however, few studies have investigated how to directly extract and utilize information from music for personalized recommendation in CMRS. In this paper, we describe a CMRS based on our proposed item-based probabilistic model, where items are classified into groups and predictions are made for users considering the Gaussian distribution of user ratings. By utilizing audio features, this model provides a way to alleviate three well-known challenges in collaborative recommender systems: user bias, non-association, and cold start problems in capturing accurate similarities among items. Experiments on a real-world data set illustrate that the audio information of music is quite useful and that our system can feasibly integrate it for better personalized recommendation.

- Session 1B: Multimedia IR | Pp. 72-83

VisMed: A Visual Vocabulary Approach for Medical Image Indexing and Retrieval

Joo-Hwee Lim; Jean-Pierre Chevallet

Voluminous medical images are generated daily. They are critical assets for medical diagnosis, research, and teaching. To facilitate automatic indexing and retrieval of large medical image databases, we propose a structured framework for designing and learning vocabularies of meaningful medical terms associated with visual appearance from image samples. These VisMed terms span a new feature space to represent medical image contents. After a multi-scale detection process, a medical image is indexed as compact spatial distributions of VisMed terms. A flexible tiling (FlexiTile) matching scheme is proposed to compare the similarity between two medical images of arbitrary aspect ratios.

We evaluate the VisMed approach on the medical retrieval task of the ImageCLEF 2004 benchmark. Based on 2% of the 8725 CasImage collection, we cropped 1170 image regions to train and validate 40 VisMed terms using support vector machines. The Mean Average Precision (MAP) over 26 query topics is 0.4156, an improvement over all the automatic runs in ImageCLEF 2004.

- Session 1B: Multimedia IR | Pp. 84-96

Object Identification and Retrieval from Efficient Image Matching: Snap2Tell with the STOIC Dataset

Jean-Pierre Chevallet; Joo-Hwee Lim; Mun-Kew Leong

Traditional content-based image retrieval attempts to retrieve images using syntactic features for a query image. Annotated image banks and Google allow the use of text to retrieve images. In this paper, we study the task of using the content of an image to retrieve information in general. We describe the significance of object identification in an information retrieval paradigm that uses an image set as an intermediate means in indexing and matching. We also describe a unique Singapore Tourist Object Identification Collection with associated queries and relevance judgments for evaluating the new task and the need for efficient image matching using simple image features. We present a comprehensive experimental evaluation of the effects of feature dimensions, context, spatial weightings, coverage of image indexes, and query devices on task performance. Lastly, we describe the current system developed to support mobile image-based tourist information retrieval.

- Session 1B: Multimedia IR | Pp. 97-112

Extracting the Significant Terms from a Sentence-Term Matrix by Removal of the Noise in Term Usage

Changbeom Lee; Hoseop Choe; Hyukro Park; Cheolyoung Ock

In this paper, we propose an approach to extracting the significant terms in a document using two quantification methods: singular value decomposition (SVD) and principal component analysis (PCA). The SVD can remove the noise of variability in term usage from an original sentence-term matrix by using the singular values acquired after computing the SVD. This adjusted sentence-term matrix, from which the noisy usage of terms has been removed, can be used to perform the PCA, since the dimensionality of the revised matrix is the same as that of the original. Since the PCA can be used to extract the significant terms on the basis of the eigenvalue-eigenvector pairs for the sentence-term matrix, the terms extracted from the revised matrix instead of the original can be regarded as more effective or appropriate. Experimental results on Korean newspaper articles in automatic summarization show that the proposed method is superior to using PCA alone.

- Session 2A: Natural Language Processing in IR | Pp. 113-120

Cross Document Event Clustering Using Knowledge Mining from Co-reference Chains

June-Jei Kuo; Hsin-Hsi Chen

Unifying terminology usage in a way that captures more term semantics is useful for event clustering. This paper proposes a metric of normalized chain edit distance to mine controlled vocabulary from cross-document co-reference chains incrementally. A novel threshold model that incorporates a time decay function and a spanning window uses the controlled vocabulary for event clustering on streaming news. The experimental results show that the proposed system achieves a 16% performance increase over the baseline system and a 6% performance increase over the system without the controlled vocabulary.
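
The abstract does not define its normalized chain edit distance. As a loose sketch of the general idea, the example below computes a standard Levenshtein distance over two sequences (for instance, the mentions in two co-reference chains) and divides by the longer length. The normalization choice, the chain representation, and the function names are assumptions, not the authors' metric.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical co-reference chains represented as lists of mention strings.
chain_a = ["president", "bush"]
chain_b = ["bush"]
chain_score = normalized_edit_distance(chain_a, chain_b)
```

A score near 0 suggests that two chains use nearly the same mentions, so a threshold on such a score could decide whether they map to one controlled-vocabulary entry.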

- Session 2A: Natural Language Processing in IR | Pp. 121-134