Publications catalog - books

Information Retrieval Technology: Second Asia Information Retrieval Symposium, AIRS 2005, Jeju Island, Korea, October 13-15, 2005, Proceedings

Gary Geunbae Lee ; Akio Yamada ; Helen Meng ; Sung Hyon Myaeng (eds.)

In conference: 2nd Asia Information Retrieval Symposium (AIRS). Jeju Island, South Korea. October 13, 2005 - October 15, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Library Science; Theory of Computation; Information Systems Applications (incl. Internet); Algorithm Analysis and Problem Complexity; Data Structures

Availability
Detected institution Publication year Browse Download Request
None detected 2005 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29186-2

Electronic ISBN

978-3-540-32001-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Song Wave Retrieval Based on Frame-Wise Phoneme Recognition

Yuuichi Yaguchi; Ryuichi Oka

We propose a song wave retrieval method. Both the song wave data and a query wave are transformed into phoneme sequences by frame-wise labeling of each frame feature. By applying a search algorithm called Continuous Dynamic Programming (CDP) to these phoneme sequences, we can detect the set of parts in a song database that are similar to a query song wave. Song retrieval rates hit 78% in four clauses from whole databases. Differences between queries from song wave data and speech wave data are investigated.

- Poster and Demo Session 1 | Pp. 503-509
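
As a rough illustration of spotting a query phoneme sequence inside a longer song phoneme sequence, the sketch below uses a standard subsequence edit-distance dynamic program. It is a simplified stand-in, not the authors' Continuous Dynamic Programming. The phoneme labels, the unit costs, and the function name `spot_query` are assumptions made for the example.

```python
def spot_query(query, song):
    """Return (best_cost, end_index) of the song segment closest to query.

    Subsequence edit distance: a match may start anywhere in `song`
    (the row for the empty query is all zeros) and may end anywhere
    (minimum over the final row). Unit costs are an assumption.
    """
    m = len(song)
    prev = [0] * (m + 1)  # empty query matches at zero cost at any position
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * m  # i query symbols unmatched before the song starts
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (query[i - 1] != song[j - 1])
            skip_query = prev[j] + 1      # query symbol has no counterpart
            skip_song = curr[j - 1] + 1   # extra symbol inside the song segment
            curr[j] = min(substitute, skip_query, skip_song)
        prev = curr
    best_end = min(range(m + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


# Hypothetical phoneme labels, one per frame group.
cost, end = spot_query(list("aiu"), list("kkaiuee"))
```

Here `end` is the position in `song` just after the best-matching segment. A real system would work on frame-level labels and report every segment below a cost threshold rather than only the best one.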

Exploring a Large-Scale News Video Archive by Tracking Human Relations

Ichiro Ide; Tomoyoshi Kinoshita; Hiroshi Mo; Norio Katayama; Shin’ichi Satoh

We propose a novel retrieval method for a very large-scale news video archive based on human relations extracted from the archive itself. This paper presents the idea and the implementation of the method, and also introduces an interface that enables retrieval while tracking down the relations. Although detailed evaluations are yet to be done, we have found interesting relations by exploring the archive with the proposed interface.

- Poster and Demo Session 1 | Pp. 510-515

Topic-Independent Web High-Quality Page Selection Based on K-Means Clustering

Canhui Wang; Yiqun Liu; Min Zhang; Shaoping Ma

One of the challenges for web search engines is to identify the quality of web pages independently of a given user request. High-quality web pages give readers proper entry points to more concentrated information on the web. This paper focuses on topic-independent selection of high-quality web pages to reduce redundancy and clean noise in web information. Different non-content features and their effects on high-quality page selection are studied. K-means clustering with these features is then performed to separate high-quality pages from common ones. Experiments were conducted on the 19GB (document size) TREC web data set (.GOV data). With the proposed approach, less than 50% of the web pages are selected as high-quality ones, yet they cover about 90% of the key information in the whole set. Information retrieval on this high-quality page set achieves more than 40% improvement compared with retrieval on the whole data collection.

- Poster and Demo Session 1 | Pp. 516-521
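
The clustering step in the page selection abstract above can be sketched with plain K-means (Lloyd's algorithm). This is a minimal sketch under stated assumptions: the two features (a scaled in-link count and a scaled URL depth), the sample values, and the choice of initial centroids are all hypothetical, and the abstract does not say which non-content features the authors used.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` are the initial centers."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centers did not move
        centroids = new_centroids
    return labels, centroids


# Hypothetical (scaled in-link count, scaled URL depth) per page.
pages = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
         (0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]
labels, centers = kmeans(pages, [pages[0], pages[3]])
```

Which cluster counts as "high quality" still has to be decided afterwards, for example by inspecting the cluster centers.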

Improving Text Similarity Measurement by Critical Sentence Vector Model

Wei Li; Kam-Fai Wong; Chunfa Yuan; Wenjie Li; Yunqing Xia

We propose the Critical Sentence Vector Model (CSVM), a novel model to measure text similarity. The CSVM accounts for the structural and semantic information of the document. In contrast to existing methods based on keyword vectors, e.g. the Vector Space Model (VSM), CSVM measures document similarity by measuring the similarity between critical sentence vectors extracted from the documents. Experiments show that CSVM outperforms VSM in the calculation of text similarity.

- Poster and Demo Session 1 | Pp. 522-527

Audio Fingerprinting Scheme by Temporal Filtering for Audio Identification Immune to Channel-Distortion

Mansoo Park; Hoi-Rin Kim; Dong-Ho Shin; Seung Hyun Yang

Channel distortion in real environments is an issue for music information retrieval systems that rely on content-based audio identification. An audio signal recorded under real conditions is commonly distorted by channel and background noise. Recently, Philips published a robust and efficient audio fingerprinting system for audio identification. To extract a robust and efficient audio fingerprint, Philips applied the first derivative (differential) to the frequency-time sequence of perceptual filter-bank energies. In practice, however, this is not sufficient to remove the undesired perturbations. This paper introduces an extension of the Philips audio fingerprint extraction scheme that is more immune to channel distortion. Channel-normalization techniques for temporal filtering are used to lessen the channel effects of real environments.

- Poster and Demo Session 1 | Pp. 528-533
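
The baseline step mentioned in the audio fingerprinting abstract above, a differential over the frequency-time sequence of filter-bank energies, can be sketched as follows. This is one common reading of that baseline: take the difference between adjacent bands, then the difference between consecutive frames, and keep only the sign. The function name and the toy energies are assumptions, and the temporal-filtering extension proposed in the paper is not shown.

```python
def fingerprint_bits(energies):
    """energies[n][m]: filter-bank energy of frame n in band m.

    Bit (n, m) is 1 when the band-to-band energy difference grows
    from frame n-1 to frame n, and 0 otherwise.
    """
    bits = []
    for n in range(1, len(energies)):
        prev, curr = energies[n - 1], energies[n]
        row = []
        for m in range(len(curr) - 1):
            diff = (curr[m] - curr[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if diff > 0 else 0)
        bits.append(row)
    return bits


# Toy energies: 2 frames, 3 bands.
toy = [[1.0, 2.0, 3.0],
       [3.0, 1.0, 4.0]]
bits = fingerprint_bits(toy)
```

Because only the sign is kept, a uniform positive gain applied to all energies leaves the bits unchanged. Frequency-dependent channel effects are not removed this way, which is the kind of perturbation the abstract refers to.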

On the Chinese Document Clustering Based on Dynamical Term Clustering

Chih-Ming Tseng; Kun-Hsiu Tsai; Chiun-Chieh Hsu; His-Cheng Chang

With the rapid development of global networking, more and more information is accessible online. This makes document clustering techniques increasingly indispensable, since clustering lets us browse large amounts of information efficiently. In this paper, we focus on a Chinese document clustering process that uses a data mining technique and a neural network model. There are two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose another Chinese sentence segmentation method, which is based on a data mining technique that uses a hash-based method. In the clustering phase, we adopt the dynamical SOM model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering results in a better precision rate, and that term clustering becomes more efficient as the number of documents gradually grows.

- Poster and Demo Session 1 | Pp. 534-539

Integrating Heterogeneous Multimedia Resources

Seong Joon Yoo; Chull Hwan Song

This paper proposes new multimedia metadata that supports integrating non-standard multimedia metadata as well as standard multimedia metadata. The metadata is defined by integrating MPEG-7 MDS and TV-AnyTime metadata. We also designed and implemented a framework for integrating multimedia databases. Retrieving multimedia data from heterogeneous resources described in MPEG-7 MDS and TV-AnyTime metadata is faster than retrieving multimedia data from homogeneous resources.

- Poster and Demo Session 1 | Pp. 540-545

Calculating Webpage Importance with Site Structure Constraints

Hui-Min Yan; Tao Qin; Tie-Yan Liu; Xu-Dong Zhang; Guang Feng; Wei-Ying Ma

PageRank is one of the most popular link analysis algorithms that have shown their effectiveness in web search. However, PageRank considers only hyperlink information. In this paper, we propose several novel ranking algorithms that use both hyperlink and site structure information to measure the importance of each web page. Specifically, two methodologies are adopted to refine the PageRank algorithm: one combines hyperlink information and website structure information by graph fusion, while the other re-ranks the pages within the same site by quadratic optimization based on the original PageRank values. Experiments show that both methodologies effectively improve retrieval performance.

- Poster and Demo Session 1 | Pp. 546-551
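
For reference, the hyperlink-only baseline that the webpage importance abstract above sets out to refine can be sketched as standard PageRank by power iteration. The damping factor of 0.85, the uniform handling of dangling pages, and the tiny example graph are assumptions. The graph fusion and quadratic optimization refinements from the paper are not shown.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values()
                                 for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the teleportation share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical three-page site.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph page `c` ends up ranked highest, because it collects links from both other pages.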

Gene Ontology Classification of Biomedical Literatures Using Context Association

Ki Chan; Wai Lam

The functional annotation of gene products from the biomedical literature has become a pressing issue due to the huge human effort involved and the evolving biomedical knowledge. In this paper, we propose an approach for facilitating this functional annotation to the Gene Ontology by focusing on a subtask of annotation, that is, determining which of the Gene Ontology hierarchies a document is associated with. This subtask can be formulated as a document classification problem. A feature engineering approach is proposed that uses the context association conveyed in the biomedical literature, in particular the proximity relationship between target gene(s) and term features. Our approach achieves an F-score of 60.24%, which outperforms the submission runs of the TREC Genomics 2004 annotation hierarchy subtask. We show that incorporating context association can enhance performance on the annotation hierarchy classification problem.

- Poster and Demo Session 1 | Pp. 552-557

An Examination of Feature Selection Frameworks in Text Categorization

Bong Chih How; Wong Ting Kiong

Feature selection, an important task in text categorization, is used for dimensionality reduction. Feature selection can be performed locally or globally. For local selection, distinct feature sets are derived from different classes, so the number of feature sets depends on the number of classes. In contrast, global feature selection uses only one universal feature set, which is assumed to preserve the characteristics of all classes. Furthermore, feature selection can be carried out based on the relevant feature set only (local dictionary) or on both relevant and irrelevant feature sets (universal dictionary). In this paper, we explore the different frameworks of feature selection for the task of text categorization on the Reuters(10) and Reuters(115) datasets (variants of the Reuters-21578 corpus). We then investigate the efficiency of 7 different local or global feature selection methods in combination with the local and universal dictionaries. Our experiments show that local feature selection with a local dictionary yields optimal categorization results.

- Poster and Demo Session 1 | Pp. 558-564
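
The difference between local and global selection described in the feature selection abstract above can be sketched as follows. The chi-square score, the maximum-over-classes rule for the global set, the function names, and the toy documents are assumptions. The abstract does not name the 7 selection methods that were compared.

```python
def chi_square(docs, term, cls):
    """Chi-square association between a term and a class.

    docs: list of (set_of_terms, class_label) pairs.
    """
    a = sum(1 for terms, c in docs if term in terms and c == cls)
    b = sum(1 for terms, c in docs if term in terms and c != cls)
    c_ = sum(1 for terms, c in docs if term not in terms and c == cls)
    d = sum(1 for terms, c in docs if term not in terms and c != cls)
    denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
    return len(docs) * (a * d - c_ * b) ** 2 / denom if denom else 0.0


def local_selection(docs, k):
    """Local selection: one top-k feature set per class."""
    vocab = sorted({t for terms, _ in docs for t in terms})
    classes = sorted({c for _, c in docs})
    return {cls: sorted(vocab, key=lambda t: -chi_square(docs, t, cls))[:k]
            for cls in classes}


def global_selection(docs, k):
    """Global selection: one top-k feature set shared by all classes."""
    vocab = sorted({t for terms, _ in docs for t in terms})
    classes = sorted({c for _, c in docs})
    return sorted(
        vocab,
        key=lambda t: -max(chi_square(docs, t, c) for c in classes))[:k]


# Toy corpus with hypothetical class labels.
docs = [({"wheat", "grain"}, "grain"), ({"wheat", "price"}, "grain"),
        ({"oil", "barrel"}, "crude"), ({"oil", "price"}, "crude"),
        ({"bank", "rate"}, "interest"), ({"rate", "price"}, "interest")]
per_class = local_selection(docs, 1)
shared = global_selection(docs, 3)
```

Local selection returns a separate list for each class, while global selection returns a single list. The term `price` appears equally often in every class, so it scores zero and is dropped by both.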