Publications catalog - books



Advanced Data Mining and Applications: 1st International Conference, ADMA 2005, Wuhan, China, July 22-24, 2005, Proceedings

Xue Li ; Shuliang Wang ; Zhao Yang Dong (eds.)

Conference: 1st International Conference on Advanced Data Mining and Applications (ADMA). Wuhan, China. July 22, 2005 - July 24, 2005

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Artificial Intelligence (incl. Robotics); Database Management; Software Engineering; Computer Appl. in Administrative Data Processing; Information Systems Applications (incl. Internet); Health Informatics

Availability
Detected institution: not detected
Publication year: 2005
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-27894-8

Electronic ISBN

978-3-540-31877-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

A Non-VSM kNN Algorithm for Text Classification

Zhi-Hong Deng; Shi-Wei Tang

The text classification problem, the task of assigning natural language texts to predefined categories based on their content, has been widely studied. Traditional text classification uses the VSM (Vector Space Model), which represents documents as vectors in a high-dimensional space. In this paper, we propose a non-VSM kNN algorithm for text classification. Based on correlations between categories and features, the algorithm first obtains k F-C tuples from an unlabeled document, namely the first k tuples in terms of correlation value. The algorithm then predicts the category of the unlabeled document from these tuples. We evaluated the algorithm on two document collections and compared it against traditional kNN. Experimental results show that our algorithm outperforms traditional kNN in both efficiency and effectiveness.

- Text Mining | Pp. 339-346
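A minimal sketch of the top-k tuple idea, assuming a precomputed feature-category correlation table `corr` (the correlation measure itself is not given in the abstract) and assuming the category is chosen by summing the correlation values of the selected tuples. The function names and the example table are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_fc_tuples(doc_terms, corr, k):
    """Return the k (feature, category, correlation) tuples with the highest
    correlation among the features that occur in the document."""
    candidates = [(f, c, s) for (f, c), s in corr.items() if f in doc_terms]
    return heapq.nlargest(k, candidates, key=lambda t: t[2])


def predict_category(doc_terms, corr, k):
    """Predict a category by summing the correlations of the top-k tuples
    (an assumed aggregation rule, not necessarily the paper's)."""
    votes = defaultdict(float)
    for _, category, score in top_k_fc_tuples(doc_terms, corr, k):
        votes[category] += score
    return max(votes, key=votes.get) if votes else None


# Hypothetical correlation table: (feature, category) -> correlation value.
corr = {
    ("goal", "sports"): 0.9,
    ("match", "sports"): 0.7,
    ("match", "politics"): 0.2,
    ("vote", "politics"): 0.8,
    ("market", "economy"): 0.6,
}
doc = {"goal", "match", "vote"}
print(predict_category(doc, corr, k=3))  # sports
```

Unlike VSM-based kNN, nothing here compares the document with training documents; only the document's own features and the correlation table are consulted.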

A Study on Text Clustering Algorithms Based on Frequent Term Sets

Xiangwei Liu; Pilian He

In this paper, a new text-clustering algorithm named Frequent Term Set-based Clustering (FTSC) is introduced. It uses frequent term sets to cluster texts. First, it extracts useful information from documents and inserts it into databases. Then, it uses the Apriori algorithm from association rule mining to discover the frequent item sets efficiently. Finally, it clusters the documents according to the frequent words in subsets of the frequent term sets. The algorithm can efficiently reduce the dimensionality of text data in very large databases, thereby improving the accuracy and speed of clustering. However, the clusters produced by FTSC cannot reflect the overlap between text classes. Building on FTSC, an improved algorithm, the Frequent Term Set-based Hierarchical Clustering algorithm (FTSHC), is given. This algorithm determines the overlap between text classes from the overlap of the frequent word sets, and provides an understandable description of the discovered clusters through the frequent term sets. The FTSC, FTSHC and K-Means algorithms are evaluated quantitatively by experiments. The experimental results show that FTSC and FTSHC outperform K-Means in clustering performance.

- Text Mining | Pp. 347-354
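The frequent term set step can be sketched with a plain Apriori pass that treats each document as a transaction of terms. The clustering step below is a simplified stand-in, not the paper's FTSC procedure: it assumes each document joins the largest (then most frequent) frequent term set it contains, which yields non-overlapping clusters. All function names and the toy documents are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(terms): support count} for every term set contained
    in at least min_support transactions (documents)."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count single terms
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join (k-1)-sets into k-sets, keeping only candidates whose every
        # (k-1)-subset is frequent (the Apriori property).
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent


def cluster_by_term_sets(docs, frequent):
    """Assign each document to the largest, then most frequent, term set it
    contains (hypothetical non-overlapping assignment rule)."""
    ranked = sorted(frequent.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0])))
    clusters = {}
    for idx, doc in enumerate(docs):
        terms = set(doc)
        for term_set, _ in ranked:
            if term_set <= terms:
                clusters.setdefault(term_set, []).append(idx)
                break
    return clusters


docs = [
    {"data", "mining", "rules"},
    {"data", "mining", "apriori"},
    {"image", "wavelet"},
    {"image", "wavelet", "jpeg"},
    {"data", "mining"},
]
frequent = apriori(docs, min_support=2)
clusters = cluster_by_term_sets(docs, frequent)
for term_set, members in clusters.items():
    print(sorted(term_set), members)
# ['data', 'mining'] [0, 1, 4]
# ['image', 'wavelet'] [2, 3]
```

Each cluster is labeled by its term set, which is what makes frequent-term-set clusters easy to describe.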

An Improvement of Text Association Classification Using Rules Weights

Xiao-Yun Chen; Yi Chen; Rong-Lu Li; Yun-Fa Hu

Recently, categorization methods based on association rules have received much attention. Association classification generally achieves high accuracy and good performance. However, its classification accuracy drops rapidly when feature words are unevenly distributed in the training set. This paper therefore proposes a text categorization algorithm, Weighted Association Rules Categorization (WARC). In this method, association rules are used to classify the training samples, and rule intensity is defined according to the number of misclassified training samples. Each strong rule is multiplied by a factor less than 1 to reduce its weight, while each weak rule is multiplied by a factor greater than 1 to increase its weight. The results show that adjusting rule weights in this way can remarkably improve the accuracy of association classification algorithms.

- Text Mining | Pp. 355-363
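The weight-adjustment idea can be illustrated with a small training loop. This sketch assumes that a rule counts as "strong" when it fired for the wrongly predicted category of a misclassified training sample and as "weak" when it fired for the true category; the factors 0.9 and 1.1, the rule format and the function names are hypothetical choices, not values from the paper.

```python
from collections import defaultdict


def classify(doc, rules, weights):
    """Sum the weights of fired rules per category; return the best one."""
    totals = defaultdict(float)
    for i, (terms, category) in enumerate(rules):
        if terms <= doc:  # rule fires if all its terms occur in the document
            totals[category] += weights[i]
    return max(totals, key=totals.get) if totals else None


def adjust_weights(samples, rules, shrink=0.9, grow=1.1, epochs=10):
    """Shrink rules that voted for a wrong prediction, grow rules that
    supported the true label, until no training sample is misclassified."""
    weights = [1.0] * len(rules)
    for _ in range(epochs):
        errors = 0
        for doc, label in samples:
            predicted = classify(doc, rules, weights)
            if predicted == label:
                continue
            errors += 1
            for i, (terms, category) in enumerate(rules):
                if not terms <= doc:
                    continue
                if category == predicted:
                    weights[i] *= shrink  # "strong" rule: caused the error
                elif category == label:
                    weights[i] *= grow    # "weak" rule: was outvoted
        if errors == 0:
            break
    return weights


rules = [
    (frozenset({"market"}), "economy"),
    (frozenset({"stock"}), "economy"),
    (frozenset({"team", "market"}), "sports"),
]
samples = [
    ({"team", "market", "stock"}, "sports"),
    ({"market", "stock"}, "economy"),
]
weights = adjust_weights(samples, rules)
print([round(w, 4) for w in weights])  # [0.6561, 0.6561, 1.4641]
print(classify({"team", "market", "stock"}, rules, weights))  # sports
```

With equal weights the two economy rules outvote the single sports rule; after four rounds of adjustment the first training sample is classified correctly while the second stays correct.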

Word Segmentation and POS Tagging for Chinese Keyphrase Extraction

Xiaochun Huang; Jian Chen; Puliu Yan; Xin Luo

Keyphrases are essential for many text mining applications. This paper proposes a system for automatically extracting keyphrases from Chinese text. To address word segmentation, a problem specific to Chinese information processing, a lexicon-based word segmentation approach is presented. For this purpose, a verb lexicon, a function word lexicon and a stop word lexicon are constructed. A predefined keyphrase lexicon is applied to improve extraction performance. The approach uses a small part-of-speech (POS) tagset to index phrases simply according to these lexicons, and it is especially effective at identifying phrases formed by combinations of nouns, adjectives and verbs. Keyphrases are selected by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values, and new keyphrases are added to the keyphrase lexicon.

- Text Mining | Pp. 364-369
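Ranking candidate phrases by TF-IDF can be sketched as follows, assuming the candidate phrases have already been produced by segmentation and POS filtering. The abstract does not say how the TF-IDF values are weighted, so `weights` is a hypothetical per-phrase multiplier, and tf × log(N / df) is one common TF-IDF variant rather than necessarily the paper's.

```python
import math
from collections import Counter


def tfidf_scores(doc_phrases, corpus, weights=None):
    """Score the candidate phrases of one document by weighted TF-IDF.

    doc_phrases: candidate phrases extracted from the document (with repeats)
    corpus: list of phrase sets, one per document, including this document
    weights: optional {phrase: multiplier} (hypothetical weighting scheme)
    """
    weights = weights or {}
    n_docs = len(corpus)
    tf = Counter(doc_phrases)
    total = len(doc_phrases)
    scores = {}
    for phrase, count in tf.items():
        df = sum(1 for d in corpus if phrase in d)
        idf = math.log(n_docs / df) if df else 0.0
        scores[phrase] = weights.get(phrase, 1.0) * (count / total) * idf
    return scores


def top_keyphrases(doc_phrases, corpus, n, weights=None):
    """Return the n highest-scoring phrases (ties broken alphabetically)."""
    scores = tfidf_scores(doc_phrases, corpus, weights)
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]


doc = ["data mining", "data mining", "keyphrase", "system"]
corpus = [set(doc), {"system", "network"}, {"system", "database"}]
print(top_keyphrases(doc, corpus, n=2))  # ['data mining', 'keyphrase']
```

A phrase that appears in every document ("system" here) gets an IDF of zero and drops to the bottom of the ranking.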

Learning User Profiles from Text in e-Commerce

M. Degemmis; P. Lops; S. Ferilli; N. Di Mauro; T. M. A. Basile; G. Semeraro

Exploring digital collections to find information relevant to a user's interests is a challenging task. Algorithms designed to solve this problem base their relevance computations on user profiles, in which representations of the users' interests are maintained. This paper presents a new method, based on the classical Rocchio algorithm for text categorization, that discovers user preferences by analyzing the textual descriptions of items in the online catalogues of e-commerce Web sites. Experiments were carried out on a dataset of real users, and the results were compared with those obtained using an Inductive Logic Programming (ILP) approach and a probabilistic one.

- Text Mining | Pp. 370-381
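The classical Rocchio prototype can be sketched as a weighted difference between the centroid of descriptions of items the user liked and the centroid of those the user disliked. This sketch assumes sparse term-weight dictionaries, hypothetical parameters `beta` and `gamma`, and clips negative weights to zero; it is not the paper's exact variant. New items are then ranked by cosine similarity to the profile.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse {term: weight} vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    n = len(vectors) or 1
    return {t: w / n for t, w in total.items()}


def rocchio_profile(liked, disliked, beta=1.0, gamma=0.25):
    """profile = beta * centroid(liked) - gamma * centroid(disliked),
    with negative weights clipped to zero."""
    pos, neg = centroid(liked), centroid(disliked)
    profile = {}
    for term in set(pos) | set(neg):
        w = beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
        if w > 0:
            profile[term] = w
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


liked = [{"fantasy": 1.0, "dragon": 1.0}, {"fantasy": 1.0, "magic": 1.0}]
disliked = [{"romance": 1.0, "fantasy": 0.2}]
profile = rocchio_profile(liked, disliked)
print({t: round(w, 3) for t, w in sorted(profile.items())})
# {'dragon': 0.5, 'fantasy': 0.95, 'magic': 0.5}
print(cosine(profile, {"dragon": 1.0, "fantasy": 1.0}) > cosine(profile, {"romance": 1.0}))
# True
```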

Data Mining Based on Objects in Video Flow with Dynamic Background

Cheng Zeng; JiaHeng Cao; Ying Fang; Pei Du

This paper presents a model, OMDB, for mining the region information of non-rigid foreground objects in video flow with a dynamic background. The model constructs the RDM algorithm and uses Q-learning to optimize the region-matching strategy, obtaining better motion information for regions. Moreover, OMDB uses the NEA algorithm to gradually detect and merge the regions of foreground objects, based on two characteristics: the foreground and background differ in motion, and the regions of an object remain intact while it moves. Experimental results on extracting the region information of foreground objects and on tracking objects demonstrate the efficacy of the proposed model.

- Multimedia Mining | Pp. 382-390

An Approach to Compressed Image Retrieval Based on JPEG2000 Framework

Jianguo Tang; Wenyin Zhang; Chao Li

As the latest effort by JPEG in the international standardization of still image compression, JPEG2000 offers a range of important functionalities superior to those of its earlier DCT-based versions. Anticipating that this standard will become an important digital format for many images and photographs, we present recent work on image indexing and retrieval directly in the wavelet domain, which suits JPEG2000 compressed image retrieval without full decompression. Our method mainly extracts histogram features from significant wavelet coefficients according to the EBCOT (Embedded Block Coding with Optimized Truncation) coding of JPEG2000. Besides eliminating decompression, the method achieved better retrieval accuracy than existing counterparts in our experiments.

- Multimedia Mining | Pp. 391-399
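As a rough illustration of histogram features built from significant wavelet coefficients, the sketch below applies a single-level 2D Haar transform to a small grayscale array instead of reading JPEG2000/EBCOT code-block data, which is a simplifying assumption. The threshold, bin count, helper names and the L1 distance used for matching are all hypothetical; images are assumed to have even width and height.

```python
def haar_1d(values):
    """One level of an (unnormalized) average/difference Haar transform."""
    avg = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    diff = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return avg + diff


def haar_2d(image):
    """Transform every row, then every column (one decomposition level)."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def significant_histogram(image, threshold, bins, max_abs):
    """Normalized histogram of |detail coefficients| >= threshold."""
    coeffs = haar_2d(image)
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    hist = [0] * bins
    for y, row in enumerate(coeffs):
        for x, c in enumerate(row):
            if y < h and x < w:
                continue  # skip the low-pass (LL) sub-band
            if abs(c) >= threshold:
                b = min(int(abs(c) / max_abs * bins), bins - 1)
                hist[b] += 1
    total = sum(hist)
    return [n / total for n in hist] if total else hist


def l1_distance(h1, h2):
    """Smaller distance means more similar images."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


flat = [[10] * 4 for _ in range(4)]
edge = [[0, 255, 255, 255] for _ in range(4)]
h_flat = significant_histogram(flat, threshold=1.0, bins=4, max_abs=255.0)
h_edge = significant_histogram(edge, threshold=1.0, bins=4, max_abs=255.0)
print(h_flat, h_edge)  # [0, 0, 0, 0] [0.0, 0.0, 1.0, 0.0]
print(l1_distance(h_flat, h_edge))  # 1.0
```

Only the thresholded detail sub-bands contribute, so a flat image yields an empty histogram while an edge produces a peak.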

Target Segmentation and Feature Extraction for Undersea Image Based on Function Transformation

Fuyuan Peng; Yan Tian; Xi Yu; Guohua Xu; Qian Xia

Because of the peculiarities of the undersea channel and the complexity of the undersea environment, many uncertain factors affect the quality of undersea images. Consequently, segmenting and identifying targets in undersea images is a difficult problem. In this paper, a novel target segmentation and feature extraction approach for undersea images based on function transformation is presented. The approach effectively overcomes the influence of the complex environment and uneven illumination. Experimental results demonstrate that the approach is valid for target segmentation and feature extraction in undersea hydrothermal vent images.

- Multimedia Mining | Pp. 400-406

ART in Image Reconstruction with Narrow Fan-Beam Based on Data Mining

Zhong Qu; Junhao Wen; Dan Yang; Ling Xu; Yu Wu

Image reconstruction is one of the key technologies of industrial computed tomography. Because of data mining, algebraic methods have an irreplaceable advantage when the data are incomplete or the noise level is high. However, their use has been highly limited by their low reconstruction speed. In this paper, a new iterative method, the algebraic reconstruction technique (ART), is introduced to accelerate the iteration process and increase reconstruction speed. Moreover, algebraic reconstruction methods will be used more widely as computer technology develops and computing speed increases. Experimental results demonstrate that ART can efficiently improve the quality of reconstructed images when processing incomplete or noisy projection data based on data mining.

- Multimedia Mining | Pp. 407-414
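The core ART update (Kaczmarz's method) projects the current estimate onto one ray equation a_i · x = b_i at a time: x ← x + λ (b_i − a_i · x) / ‖a_i‖² · a_i. The sketch below assumes a dense system matrix given as lists of floats, a hypothetical relaxation parameter `relax` for λ, and an optional non-negativity clip; it is a generic ART sketch, not the accelerated variant proposed in the paper.

```python
def art(A, b, sweeps=50, relax=1.0, x0=None, nonneg=False):
    """Algebraic reconstruction technique: cycle through the rows of A,
    projecting the estimate onto each hyperplane a_i . x = b_i."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm2 = sum(a * a for a in row)
            if norm2 == 0:
                continue  # a ray that crosses no pixels carries no information
            residual = bi - sum(a * xi for a, xi in zip(row, x))
            step = relax * residual / norm2
            x = [xi + step * a for xi, a in zip(x, row)]
            if nonneg:
                x = [max(0.0, xi) for xi in x]  # pixel densities are non-negative
    return x


# Three "rays", each summing two of three pixels; the true image is [1, 2, 3].
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
b = [3.0, 5.0, 4.0]
x = art(A, b, sweeps=200)
print([round(v, 6) for v in x])  # [1.0, 2.0, 3.0]
```

Each update touches one projection equation only, which is why ART copes with incomplete projection data; the price is many sweeps, which is the speed problem the abstract mentions.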

Digits Speech Recognition Based on Geometrical Learning

Wenming Cao; Xiaoxia Pan; Shoujue Wang; Jing Hu

We investigate the use of independent component analysis (ICA) for speech feature extraction in digit speech recognition systems. We observe that this may be true for recognition tasks based on geometrical learning with little training data. In contrast to image processing, phase information is not essential for digit speech recognition. We therefore propose a new scheme that shows how phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time-shift information. The digit speech recognition results show promising accuracy, and experiments show that the method based on ICA and geometrical learning outperforms HMMs across different numbers of training samples.

- Multimedia Mining | Pp. 415-422