Publications catalog - books


Information Retrieval Technology: Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings

Gary Geunbae Lee ; Akio Yamada ; Helen Meng ; Sung Hyon Myaeng (eds.)

Conference: 2nd Asia Information Retrieval Symposium (AIRS). Jeju Island, South Korea. October 13-15, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Library Science; Theory of Computation; Information Systems Applications (incl. Internet); Algorithm Analysis and Problem Complexity; Data Structures

Availability
Detected institution   Year of publication   Browse   Download   Request
Not detected   2005   SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29186-2

Electronic ISBN

978-3-540-32001-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

An Empirical Study of Query Expansion and Cluster-Based Retrieval in Language Modeling Approach

Seung-Hoon Na; In-Su Kang; Ji-Eun Roh; Jong-Hyeok Lee

In information retrieval, the word mismatch problem is a critical issue. To resolve the problem, several techniques have been developed, such as query expansion, cluster-based retrieval, and dimensionality reduction. Of these techniques, this paper performs an empirical study of query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance. From experiments on seven test collections of NTCIR and TREC, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval by agglomerative clustering is better than that by partitioning clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word mismatch problem, and 4) their combinations are effective when each method significantly improves baseline performance.

- Session 4A: Document/Query Models | Pp. 274-287

Effective Query Model Estimation Using Parsimonious Translation Model in Language Modeling Approach

Seung-Hoon Na; In-Su Kang; Ji-Eun Roh; Jong-Hyeok Lee

The KL divergence framework, an extended language modeling approach, has a critical problem with the estimation of the query model, which is the probabilistic model that encodes the user’s information need. For initial retrieval, estimation of the query model by a translation model, which involves term co-occurrence statistics, has been proposed. However, the translation model is difficult to apply, because term co-occurrence statistics must be constructed offline. Especially for a large collection, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. More seriously, because the translation model includes noisy non-topical terms in documents, reliable retrieval performance cannot be guaranteed. This paper proposes an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of the translation model and drastically reduces the number of terms with non-zero probabilities by eliminating non-topical terms in documents. Our experiments show that the query model estimated from the parsimonious translation model significantly outperforms not only baseline language modeling but also the non-parsimonious model.

- Session 4A: Document/Query Models | Pp. 288-298
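
The KL divergence framework named in the abstract above ranks documents by how closely a smoothed document language model matches the query model. The following is a minimal sketch of that ranking step only, not the authors' parsimonious translation model. The function names, the Jelinek-Mercer smoothing, and the toy probabilities are all assumptions made for illustration.

```python
from math import log

def smoothed_doc_model(doc_tf, coll_prob, lam=0.5):
    """Return p(w|D) with Jelinek-Mercer smoothing (an assumed smoothing choice).

    doc_tf: term -> count in the document (must be non-empty).
    coll_prob: term -> collection probability (assumed to cover every query term).
    """
    total = sum(doc_tf.values())

    def prob(w):
        return (1 - lam) * doc_tf.get(w, 0) / total + lam * coll_prob[w]

    return prob

def neg_kl(query_model, doc_prob):
    """Negative KL divergence -KL(Q || D); a higher value means a closer match."""
    return -sum(p * log(p / doc_prob(w)) for w, p in query_model.items() if p > 0)

def rank(query_model, docs, coll_prob):
    """Order document ids by decreasing -KL(Q || D)."""
    scored = {
        doc_id: neg_kl(query_model, smoothed_doc_model(tf, coll_prob))
        for doc_id, tf in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical toy data: a two-term query model and two tiny documents.
query_model = {"retrieval": 0.5, "model": 0.5}
coll_prob = {"retrieval": 0.25, "model": 0.25, "photo": 0.5}
docs = {"d1": {"retrieval": 2, "model": 2}, "d2": {"photo": 4}}
ranking = rank(query_model, docs, coll_prob)
```

In this framing, a richer query model (for example one expanded with co-occurring terms) changes only the `query_model` dictionary; the scoring function stays the same.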

Chinese Document Re-ranking Based on Term Distribution and Maximal Marginal Relevance

Lingpeng Yang; Donghong Ji; Munkew Leong

In this paper, we propose a document re-ranking method for Chinese information retrieval where a query is a short natural language description. The method is based on term distribution, where each term is weighted by its local and global distribution, including document frequency, document position and term length. The weighting scheme alleviates the concern that very few relevant documents appear among the top retrieved documents, and allows a larger portion of the retrieved documents to be used as relevance feedback. It also helps to improve the performance of the MMR model in document re-ranking. The experiments show that our method achieves significant improvement over standard baselines, and consistently outperforms related methods.

- Session 4A: Document/Query Models | Pp. 299-311
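
The MMR (maximal marginal relevance) model mentioned in the abstract above re-ranks documents by trading off relevance to the query against redundancy with documents already selected. The sketch below shows the generic greedy MMR procedure with cosine similarity over sparse term-weight dictionaries. It does not reproduce the paper's term-distribution weighting; the function names, the `lam` parameter, and the toy vectors are assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_rerank(query, docs, lam=0.7):
    """Greedy MMR: repeatedly pick the document that maximizes
    lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected)."""
    remaining = dict(docs)
    selected = []
    while remaining:
        def score(doc_id):
            relevance = cosine(remaining[doc_id], query)
            redundancy = max(
                (cosine(remaining[doc_id], docs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical toy data: d2 duplicates d1, d3 is less relevant but novel.
query = {"a": 1.0, "b": 1.0}
docs = {
    "d1": {"a": 1.0, "b": 1.0},
    "d2": {"a": 1.0, "b": 1.0},
    "d3": {"a": 1.0, "c": 1.0},
}
diverse = mmr_rerank(query, docs, lam=0.3)
by_relevance = mmr_rerank(query, docs, lam=1.0)
```

With a low `lam` the duplicate `d2` is pushed below the novel `d3`; with `lam=1.0` the procedure reduces to ranking by relevance alone.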

On Effectiveness Measures and Relevance Functions in Ranking INEX Systems

Huyen-Trang Vu; Patrick Gallinari

This paper investigates the effect of performance measures and relevance functions in comparing retrieval systems in INEX, an evaluation forum dedicated to XML retrieval. We focus on two interdependent challenges which arise when evaluating XML retrieval systems, namely the weak ordering of retrieved lists and multivalued relevance scales. Our analysis provides empirical evidence about the reasonableness of popular assumptions in information retrieval (IR) evaluation which state that ties can be ignored and binary relevance is sufficient. We also shed light on the impact of a parameter in Q-measure [18] on the sensitivity of the metric.

- Session 4A: Document/Query Models | Pp. 312-327

Home Photo Categorization Based on Photographic Region Templates

Sang-Kyun Kim; Seungji Yang; Kyong Sok Seo; Yong Man Ro; Ji-Yeon Kim; Yang Suk Seo

In this paper, we propose a new photo categorization method suitable for a home photo album. To enhance the categorization, both local and global concepts of the photos are modeled, and a combined concept learning method for photo categorization is proposed. The local and global concepts are trained by individual support vector machines. Region templates for the local concepts of generic home photos are proposed. Further, local concepts are merged with confidence to form the global concept and achieve reliable categorization. Experimental results show that the proposed method is useful for detecting multi-category concepts in the home photo album.

- Session 4B: Special Session: Digital PhotoAlbum | Pp. 328-338

MPEG-7 Visual-Temporal Clustering for Digital Image Collections

Robert O’Callaghan; Mirosław Bober

We present a novel, yet simple algorithm for clustering large collections of digital images. The method is applicable to consumer digital photo libraries, where it can be used to organise a photo-album, enhancing the search/browse capability and simplifying the interface in the process. The method is based on standard MPEG-7 visual content descriptors, which, when combined with date and time metadata, provide powerful cues to the semantic structure of the photo collection. Experiments are presented showing how the proposed method closely matches consensus human judgements of cluster structure.

- Session 4B: Special Session: Digital PhotoAlbum | Pp. 339-350

A Structured Learning Approach to Semantic Photo Indexing and Query

Joo-Hwee Lim; Jesse S. Jin; Suhuai Luo

Consumer photos exhibit highly varied contents, diverse resolutions and inconsistent quality. The objects are usually ill-posed, occluded, and cluttered with poor lighting, focus, and exposure. Existing image retrieval approaches face many obstacles such as robust object segmentation, the small sampling problem during relevance feedback, the semantic gap between low-level features and high-level semantics, etc.

We propose a structured learning approach to design domain-relevant visual semantics, known as semantic support regions, to support semantic indexing and visual query for consumer photos. Semantic support regions are segmentation-free image regions that exhibit semantic meanings and that can be learned statistically to span a new indexing space. They are detected from image content, reconciled across multiple resolutions, and aggregated spatially to form local semantic histograms.

Query by Spatial Icons (QBSI) is a unique visual query language to specify semantic icons and spatial extents in a Boolean expression. Based on 2400 heterogeneous consumer photos and 26 semantic support regions learned from a small training set, we demonstrate the usefulness of the visual query language with 15 QBSI queries that have attained high precision values at top retrieved images.

- Session 4B: Special Session: Digital PhotoAlbum | Pp. 351-365

Image Retrieval Using Sub-image Matching in Photos Using MPEG-7 Descriptors

Min-Sung Ryu; Soo-Jun Park; Chee Sun Won

Regions of interest in photos are important clues for content-based image retrieval. However, automatically segmenting semantically meaningful objects in a photo for query and similarity matching is known to be an unsolved problem. As an alternative, in this paper, we propose a scheme to form a query region in the image space in terms of 4×4 sub-images. The set of query sub-images, which includes the region of interest in the image space, is used as the basic unit for similarity matching. Specifically, the edge histogram descriptor and the color layout descriptor in MPEG-7 are used to extract image features in the chosen sub-images, which are compared to those extracted from the test images in the database. Experimental results show that the proposed method can retrieve images with similar regions, even if the background regions look quite different from each other.

- Session 4B: Special Session: Digital PhotoAlbum | Pp. 366-373

An Incremental Document Clustering for the Large Document Database

Kil Hong Joo; Won Suk Lee

With the development of the internet and computers, the amount of information available through the internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for managing a large number of documents effectively is necessary. Document clustering groups documents by subject by classifying a set of documents according to the similarity among them. Accordingly, document clustering can be used in exploring and searching documents, and it can increase the accuracy of search. This paper proposes an efficient incremental clustering algorithm for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters which have been identified in advance. In addition, to improve the correctness of the clustering, stop-word removal is proposed, and the weight of each word is calculated by the proposed TF × NIDF function. The performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.

- Session 5A: TDT/Clustering | Pp. 374-387
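
The abstract above describes assigning newly arriving documents to clusters identified in advance. A minimal sketch of that general idea follows: each new document joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. The paper's TF × NIDF weighting is not defined in the abstract, so plain term weights with cosine similarity are used here as a stand-in; the function names, the threshold value, and the cluster representation are assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_incrementally(clusters, new_docs, threshold=0.5):
    """Assign each (doc_id, vector) to the best existing cluster or open a new one.

    clusters: list of {"centroid": term -> summed weight, "members": [doc ids]}.
    The centroid is kept as a running sum; cosine similarity ignores its scale.
    """
    for doc_id, vec in new_docs:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            for term, weight in vec.items():
                best["centroid"][term] = best["centroid"].get(term, 0.0) + weight
        else:
            clusters.append({"centroid": dict(vec), "members": [doc_id]})
    return clusters

# Hypothetical toy data: one legacy cluster and two new documents.
legacy = [{"centroid": {"ir": 1.0, "query": 1.0}, "members": ["old"]}]
incoming = [
    ("n1", {"ir": 1.0, "query": 2.0}),
    ("n2", {"photo": 1.0, "album": 1.0}),
]
result = assign_incrementally(legacy, incoming)
```

Here `n1` shares vocabulary with the legacy cluster and joins it, while `n2` has no terms in common and opens a second cluster.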

Subsite Retrieval: A Novel Concept for Topic Distillation

Tao Qin; Tie-Yan Liu; Xu-Dong Zhang; Guang Feng; Wei-Ying Ma

Topic distillation is one of the main information needs when users search the Web. In previous approaches to topic distillation, the single page was treated as the basic searching unit. This strategy is inherited from general information retrieval and does not fully utilize the structural information of the Web. In this paper, we propose a novel concept for topic distillation, named subsite retrieval, in which the basic searching unit is the subsite instead of the single page. As indicated by the name, a subsite is a subset of a website, consisting of a structural collection of pages. The key to subsite retrieval is to extract effective features to represent a subsite by utilizing both the content of each page and the structural information in the subsite. Specifically, we propose a so-called PI algorithm for this purpose, which is based on the modeling of website growth. In tests on the topic distillation task of TREC 2003 and TREC 2004, subsite retrieval achieves a significant improvement in retrieval performance over previous single-page-based methods.

- Session 5A: TDT/Clustering | Pp. 388-400