Publications catalog - books
Advanced Data Mining and Applications: 1st International Conference, ADMA 2005, Wuhan, China, July 22-24, 2005, Proceedings
Xue Li ; Shuliang Wang ; Zhao Yang Dong (eds.)
Conference: 1st International Conference on Advanced Data Mining and Applications (ADMA). Wuhan, China. July 22, 2005 - July 24, 2005
Summary/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Artificial Intelligence (incl. Robotics); Database Management; Software Engineering; Computer Appl. in Administrative Data Processing; Information Systems Applications (incl. Internet); Health Informatics
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-27894-8
Electronic ISBN
978-3-540-31877-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11527503_41
A Non-VSM kNN Algorithm for Text Classification
Zhi-Hong Deng; Shi-Wei Tang
The text classification problem, which is the task of assigning natural language texts to predefined categories based on their content, has been widely studied. Traditional text classification uses the VSM (Vector Space Model), which represents documents as vectors in a high-dimensional space. In this paper, we propose a non-VSM kNN algorithm for text classification. Based on correlations between categories and features, the algorithm first obtains k F-C (feature-category) tuples from an unlabeled document, namely the k tuples with the highest correlation values. It then predicts the category of the unlabeled document from these tuples. We evaluated the algorithm on two document collections and compared it against traditional kNN. Experimental results show that our algorithm outperforms traditional kNN in both efficiency and effectiveness.
- Text Mining | Pp. 339-346
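The non-VSM kNN abstract above can be illustrated with a minimal sketch. The abstract does not define the correlation measure or the voting rule, so both are assumptions here: correlation between feature f and category c is estimated as P(c | f) from training document counts, and the prediction sums the correlation values of the top-k F-C tuples per category. The function names `train_correlations` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

def train_correlations(docs, labels):
    """Assumed correlation measure: corr(f, c) = P(c | f), the share of
    training documents containing feature f that are labeled c."""
    doc_freq = Counter()               # number of documents containing f
    joint = defaultdict(Counter)       # joint[f][c] = documents with f labeled c
    for words, cat in zip(docs, labels):
        for f in set(words):
            doc_freq[f] += 1
            joint[f][cat] += 1
    return {f: {c: n / doc_freq[f] for c, n in cats.items()}
            for f, cats in joint.items()}

def classify(words, corr, k=5):
    """Build (feature, category, correlation) tuples for the unlabeled
    document, keep the k most correlated ones, and sum them per category."""
    tuples = [(f, c, v) for f in set(words) if f in corr
              for c, v in corr[f].items()]
    tuples.sort(key=lambda t: t[2], reverse=True)
    scores = Counter()
    for _, c, v in tuples[:k]:
        scores[c] += v
    return scores.most_common(1)[0][0] if scores else None

docs = [["goal", "match", "team"], ["team", "season", "coach"],
        ["stock", "market", "bank"], ["bank", "loan", "market", "team"]]
labels = ["sport", "sport", "finance", "finance"]
corr = train_correlations(docs, labels)
print(classify(["market", "loan", "team"], corr, k=2))   # finance
print(classify(["goal", "team"], corr, k=2))             # sport
```

In this sketch k counts F-C tuples rather than neighboring documents, so no document-to-document similarity is ever computed.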
doi: 10.1007/11527503_42
A Study on Text Clustering Algorithms Based on Frequent Term Sets
Xiangwei Liu; Pilian He
In this paper, a new text-clustering algorithm named Frequent Term Set-based Clustering (FTSC) is introduced. It uses frequent term sets to cluster texts. First, it extracts useful information from documents and inserts it into databases. Then, it efficiently discovers the frequent term sets using the Apriori algorithm from association rule mining. Finally, it clusters the documents according to the frequent words in subsets of the frequent term sets. This algorithm efficiently reduces the dimensionality of text data for very large databases, and thus can improve both the accuracy and the speed of clustering. However, the clusters produced by FTSC cannot reflect overlap between text classes. Building on FTSC, an improved algorithm, Frequent Term Set-based Hierarchical Clustering (FTSHC), is given. This algorithm determines the overlap of text classes from the overlap of the frequent word sets, and provides an understandable description of the discovered clusters through the frequent term sets. The FTSC, FTSHC and K-Means algorithms are evaluated quantitatively by experiments. The experimental results show that FTSC and FTSHC cluster more efficiently than K-Means.
- Text Mining | Pp. 347-354
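The first two steps of FTSC described above (mining frequent term sets with Apriori, then grouping documents by those sets) can be sketched as follows. The Apriori part is the classic level-wise algorithm, treating each document as a transaction of terms. The cluster assignment rule is an assumption, since the abstract does not specify it: each document joins the cluster labeled by the largest frequent term set it contains. `cluster_by_term_sets` is a hypothetical name.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(terms): support} for every term
    set contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-sets that have exactly k terms.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
        k += 1
    return result

def cluster_by_term_sets(docs, min_support):
    """Hypothetical assignment rule: put each document in the cluster labeled
    by the largest frequent term set it contains (then highest support, then a
    deterministic tie-break on the sorted terms)."""
    freq = apriori(docs, min_support)
    clusters = {}
    for i, doc in enumerate(docs):
        doc = frozenset(doc)
        covering = [s for s in freq if s <= doc]
        label = (max(covering, key=lambda s: (len(s), freq[s], sorted(s)))
                 if covering else frozenset())     # empty label = unclustered
        clusters.setdefault(label, []).append(i)
    return clusters

docs = [["data", "mining", "cluster"], ["data", "mining", "rule"],
        ["data", "mining", "text"], ["image", "retrieval", "wavelet"],
        ["image", "retrieval", "jpeg"]]
for label, members in cluster_by_term_sets(docs, min_support=2).items():
    print(sorted(label), members)
```

Because each cluster is labeled by a term set, the label itself doubles as a readable cluster description, which matches the motivation given for FTSHC.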
doi: 10.1007/11527503_43
An Improvement of Text Association Classification Using Rules Weights
Xiao-Yun Chen; Yi Chen; Rong-Lu Li; Yun-Fa Hu
Recently, categorization methods based on association rules have received much attention. Association classification generally achieves high accuracy and good performance. However, its classification accuracy drops rapidly when feature words are unevenly distributed in the training set. This paper therefore proposes a text categorization algorithm, Weighted Association Rules Categorization (WARC). In this method, association rules are used to classify the training samples, and rule intensity is defined according to the number of misclassified training samples. The weight of each strong rule is multiplied by a factor less than 1 to reduce it, while the weight of each weak rule is multiplied by a factor greater than 1 to increase it. The results show that regulating rule weights in this way can markedly improve the accuracy of association classification algorithms.
- Text Mining | Pp. 355-363
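The WARC weighting idea described above can be sketched roughly. The abstract does not say exactly how "strong" and "weak" rules are identified, so this sketch assumes one plausible reading. For each misclassified training sample, matching rules that voted for the wrong predicted class are treated as too strong and multiplied by `alpha < 1`, and matching rules for the true class are treated as too weak and multiplied by `beta > 1`. The factor values and the names `predict` and `adjust_weights` are hypothetical.

```python
from collections import defaultdict

def predict(doc, rules, weights):
    """Score each class by the summed weight of rules whose antecedent terms
    all occur in the document; return the best class, or None if none fire."""
    scores = defaultdict(float)
    for i, (antecedent, cls) in enumerate(rules):
        if antecedent <= doc:
            scores[cls] += weights[i]
    return max(scores, key=scores.get) if scores else None

def adjust_weights(train, rules, alpha=0.8, beta=1.2, rounds=10):
    """Assumed WARC-style loop: shrink rules that push training samples to a
    wrong class and boost rules for the true class, until no errors remain."""
    weights = [1.0] * len(rules)
    for _ in range(rounds):
        errors = 0
        for doc, true_cls in train:
            pred = predict(doc, rules, weights)
            if pred is None or pred == true_cls:
                continue
            errors += 1
            for i, (antecedent, cls) in enumerate(rules):
                if antecedent <= doc:
                    if cls == pred:
                        weights[i] *= alpha    # too strong: factor < 1
                    elif cls == true_cls:
                        weights[i] *= beta     # too weak: factor > 1
        if errors == 0:
            break
    return weights

rules = [(frozenset({"team"}), "sport"),
         (frozenset({"stock"}), "finance"),
         (frozenset({"goal"}), "sport")]
train = [(frozenset({"team", "stock"}), "finance"),
         (frozenset({"team", "goal"}), "sport"),
         (frozenset({"stock", "bank"}), "finance")]
weights = adjust_weights(train, rules)
print(weights)
print(predict(frozenset({"team", "stock"}), rules, weights))
```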
doi: 10.1007/11527503_44
Word Segmentation and POS Tagging for Chinese Keyphrase Extraction
Xiaochun Huang; Jian Chen; Puliu Yan; Xin Luo
Keyphrases are essential for many text mining applications. To extract keyphrases from Chinese text automatically, this paper proposes an extraction system. To address a problem particular to Chinese information processing, a lexicon-based word segmentation approach is presented. For this purpose, a verb lexicon, a function-word lexicon and a stop-word lexicon are constructed. A predefined keyphrase lexicon is used to improve extraction performance. The approach uses a small Part-Of-Speech (POS) tagset to index phrases simply according to these lexicons, and it is especially effective at identifying phrases formed from combinations of nouns, adjectives and verbs. Keyphrases are filtered by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values, and new keyphrases are added to the keyphrase lexicon.
- Text Mining | Pp. 364-369
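The pipeline described above (lexicon-based segmentation followed by TF-IDF filtering) can be sketched as follows. The abstract does not name the matching strategy, so this uses forward maximum matching, a common lexicon-based method, as a stand-in. TF-IDF is computed here as tf × log(N / df) without the paper's unspecified weighting. The lexicon, stop words and corpus are toy placeholders, and `fmm_segment` and `tfidf_rank` are hypothetical names.

```python
import math
from collections import Counter

def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon
    word (up to max_len characters); fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words

def tfidf_rank(doc_words, corpus_docs, stop_words=frozenset()):
    """Score each non-stop word in doc_words by tf * log(N / df), where df
    counts the corpus documents that contain the word."""
    n_docs = len(corpus_docs)
    df = Counter(w for d in corpus_docs for w in set(d))
    tf = Counter(w for w in doc_words if w not in stop_words)
    scores = {w: c * math.log(n_docs / df[w]) for w, c in tf.items() if df[w]}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

lexicon = {"数据", "挖掘", "数据挖掘", "技术", "的"}
seg = fmm_segment("数据挖掘的技术", lexicon)
corpus = [seg, ["技术", "发展"], ["图像", "技术"]]
ranked = tfidf_rank(seg, corpus, stop_words={"的"})
print(seg)
print(ranked)
```

Forward maximum matching prefers the longer entry "数据挖掘" over "数据" + "挖掘", which is why a keyphrase lexicon helps keep multi-word phrases intact.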
doi: 10.1007/11527503_45
Learning User Profiles from Text in e-Commerce
M. Degemmis; P. Lops; S. Ferilli; N. Di Mauro; T. M. A. Basile; G. Semeraro
Exploring digital collections to find information relevant to a user's interests is a challenging task. Algorithms designed to solve this task base their relevance computations on user profiles, in which representations of the users' interests are maintained. This paper presents a new method, based on the classical Rocchio algorithm for text categorization, that discovers user preferences by analyzing the textual descriptions of items in the online catalogues of e-commerce Web sites. Experiments were carried out on a dataset of real users, and the results were compared with those obtained using an Inductive Logic Programming (ILP) approach and a probabilistic one.
- Text Mining | Pp. 370-381
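The classical Rocchio algorithm mentioned above builds a prototype vector from positive and negative examples. A minimal sketch of a profile learned that way, assuming sparse term-weight dictionaries for item descriptions, illustrative coefficients `beta=1.0` and `gamma=0.5`, and clipping of negative weights (a common variant). The paper's actual weighting scheme is not given in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter

def centroid(vectors):
    """Average a list of sparse {term: weight} vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = max(len(vectors), 1)
    return {t: w / n for t, w in total.items()}

def rocchio_profile(liked, disliked, beta=1.0, gamma=0.5):
    """Profile = beta * centroid(liked) - gamma * centroid(disliked),
    keeping only terms with positive weight."""
    pos, neg = centroid(liked), centroid(disliked)
    profile = {}
    for t in set(pos) | set(neg):
        w = beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        if w > 0:
            profile[t] = w
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

liked = [{"fantasy": 1, "dragon": 1}, {"fantasy": 1, "magic": 1}]
disliked = [{"cooking": 1, "recipe": 1}]
profile = rocchio_profile(liked, disliked)
item_a = {"fantasy": 1, "magic": 1}
item_b = {"cooking": 1, "recipe": 1}
print(profile)
print(cosine(profile, item_a), cosine(profile, item_b))
```

Catalog items can then be ranked for a user by their cosine similarity to the learned profile.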
doi: 10.1007/11527503_46
Data Mining Based on Objects in Video Flow with Dynamic Background
Cheng Zeng; JiaHeng Cao; Ying Fang; Pei Du
This paper presents a model, OMDB, for mining region information of non-rigid foreground objects in video flow with a dynamic background. The model builds an RDM algorithm and optimizes the region-matching strategy using Q-learning to obtain better motion information for regions. Moreover, OMDB uses the NEA algorithm to gradually detect and merge the regions of foreground objects, based on two characteristics: foreground and background differ in motion, and the regions of an object remain integral while it moves. Experimental results on extracting region information of foreground objects and on tracking the objects are presented to demonstrate the efficacy of the proposed model.
- Multimedia Mining | Pp. 382-390
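The two characteristics that OMDB relies on (motion difference between foreground and background, and object regions staying together) can be illustrated with a small sketch. This does not reproduce RDM, NEA or the Q-learning matcher, which the abstract does not detail. Instead it assumes each region already has a motion vector, estimates background motion as the component-wise median (assuming background regions dominate), marks regions that move differently as foreground, and merges adjacent foreground regions with union-find. `foreground_objects` and the threshold value are hypothetical.

```python
import math
from statistics import median

def foreground_objects(motion, adjacency, threshold=1.0):
    """motion: {region_id: (dx, dy)}; adjacency: iterable of (a, b) pairs.
    Returns a list of sets, one per merged foreground object."""
    bg = (median(v[0] for v in motion.values()),
          median(v[1] for v in motion.values()))
    fg = {r for r, (dx, dy) in motion.items()
          if math.hypot(dx - bg[0], dy - bg[1]) > threshold}
    parent = {r: r for r in fg}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for a, b in adjacency:
        if a in fg and b in fg:             # merge only foreground neighbours
            parent[find(a)] = find(b)
    groups = {}
    for r in fg:
        groups.setdefault(find(r), set()).add(r)
    return sorted(groups.values(), key=min)

motion = {1: (0, 0), 2: (0.1, 0), 3: (0, 0.1), 4: (3, 0),
          5: (3.2, 0.1), 6: (0, 0), 7: (-2, 2)}
adjacency = [(4, 5), (1, 2), (2, 4)]
print(foreground_objects(motion, adjacency))
```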
doi: 10.1007/11527503_47
An Approach to Compressed Image Retrieval Based on JPEG2000 Framework
Jianguo Tang; Wenyin Zhang; Chao Li
As the latest effort by JPEG in the international standardization of still image compression, JPEG2000 offers a range of important functionalities superior to those of the earlier DCT-based JPEG standard. Expecting that this compression standard will become an important digital format for many images and photographs, we present our recent work on image indexing and retrieval directly in the wavelet domain, which suits JPEG2000 compressed image retrieval without full decompression. Our method mainly extracts histogram features from the significant wavelet coefficients identified by the EBCOT (Embedded Block Coding with Optimized Truncation) stage of JPEG2000. Besides eliminating decompression, the method achieves better retrieval accuracy than existing counterparts in our experiments.
- Multimedia Mining | Pp. 391-399
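The retrieval approach described above can be sketched in simplified form, under several assumptions. It uses a one-level Haar transform for simplicity, whereas JPEG2000 itself uses the 5/3 or 9/7 wavelets. It computes coefficients from pixels, whereas a real system would read them from the codestream. Its feature is a histogram of the bit-plane at which each detail coefficient becomes significant (|c| >= 2^p), loosely mirroring EBCOT's bit-plane coding rather than reproducing the authors' exact features. Images are compared by histogram intersection, and all names are hypothetical.

```python
def haar2d(img):
    """One level of a 2D Haar transform (pairwise averages and differences on
    rows, then columns) for an image with even width and height."""
    def rows(m):
        out = []
        for r in m:
            pairs = range(0, len(r), 2)
            out.append([(r[i] + r[i + 1]) / 2 for i in pairs] +
                       [(r[i] - r[i + 1]) / 2 for i in pairs])
        return out

    def transpose(m):
        return [list(c) for c in zip(*m)]

    return transpose(rows(transpose(rows(img))))

def detail_coefficients(img):
    """All coefficients outside the low-pass (top-left) quadrant."""
    t = haar2d(img)
    h, w = len(img) // 2, len(img[0]) // 2
    return [t[i][j] for i in range(len(t)) for j in range(len(t[0]))
            if i >= h or j >= w]

def bitplane_histogram(coeffs, n_planes=8):
    """Normalized count of coefficients by the highest bit-plane p with
    |c| >= 2**p; coefficients with |c| < 1 are treated as insignificant."""
    hist = [0] * n_planes
    for c in coeffs:
        m = abs(c)
        if m >= 1:
            p = min(int(m).bit_length() - 1, n_planes - 1)
            hist[p] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def features(img):
    return bitplane_histogram(detail_coefficients(img))

def histogram_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

query = [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
database = {"spot": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 16, 0], [0, 0, 0, 0]],
            "flat": [[7] * 4 for _ in range(4)]}
q = features(query)
ranking = sorted(database, key=lambda k: histogram_intersection(q, features(database[k])),
                 reverse=True)
print(ranking)
```

Because the feature is a histogram, it ignores where the significant coefficients lie, so the shifted "spot" image matches the query exactly while the flat image has no significant detail coefficients at all.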
doi: 10.1007/11527503_48
Target Segmentation and Feature Extraction for Undersea Image Based on Function Transformation
Fuyuan Peng; Yan Tian; Xi Yu; Guohua Xu; Qian Xia
Because of the peculiarities of the undersea channel and the complexity of the undersea environment, many uncertain factors affect the quality of undersea images. Consequently, segmenting and identifying targets in undersea images is a difficult problem. In this paper, a novel target segmentation and feature extraction approach for undersea images based on function transformation is presented. The approach effectively overcomes the influence of the complex environment and of uneven illumination. Experimental results demonstrate that the approach is valid for target segmentation and feature extraction in undersea hydrothermal vent images.
- Multimedia Mining | Pp. 400-406
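The abstract above does not say which function transformation is used. The sketch below therefore swaps in two standard techniques to show the general shape of such a pipeline: a logarithmic gray-level mapping s = c·log(1 + r) as a stand-in transformation that compresses the brightness range, followed by Otsu thresholding for segmentation. It then extracts two simple target features, area and centroid. All function names are hypothetical, and this is not the authors' method.

```python
import math

def log_transform(img, max_val=255):
    """Map gray levels through s = c * log(1 + r), scaled back to [0, max_val]."""
    c = max_val / math.log(1 + max_val)
    return [[round(c * math.log(1 + p)) for p in row] for row in img]

def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance; pixels with
    value > t form the foreground class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def segment_features(img):
    """Transform, segment, and return (threshold, target area, centroid)."""
    t_img = log_transform(img)
    t = otsu_threshold([p for row in t_img for p in row])
    target = [(i, j) for i, row in enumerate(t_img)
              for j, p in enumerate(row) if p > t]
    if not target:
        return t, 0, None
    area = len(target)
    cy = sum(i for i, _ in target) / area
    cx = sum(j for _, j in target) / area
    return t, area, (cy, cx)

img = [[200, 200, 10, 20],
       [200, 200, 20, 10],
       [10, 20, 10, 20],
       [20, 10, 20, 10]]
print(segment_features(img))
```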
doi: 10.1007/11527503_49
ART in Image Reconstruction with Narrow Fan-Beam Based on Data Mining
Zhong Qu; Junhao Wen; Dan Yang; Ling Xu; Yu Wu
Image reconstruction is one of the key technologies of industrial computed tomography. Because of data mining, algebraic methods have an irreplaceable advantage when the data are incomplete or strongly affected by noise. However, their use has been highly limited by slow reconstruction. In this paper, a new iterative method (algebraic reconstruction technique) is introduced to accelerate the iteration process and increase the reconstruction speed. Moreover, algebraic reconstruction methods will be used more widely as computer technology develops and computers become faster. Experimental results demonstrate that the algebraic reconstruction technique can efficiently improve the quality of reconstructed images when processing incomplete or noisy projection data based on data mining.
- Multimedia Mining | Pp. 407-414
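The core iteration behind ART in the paper above is the Kaczmarz projection. Each ray-sum equation a_i · x = b_i is enforced in turn by moving x onto that hyperplane, scaled by a relaxation factor. The sketch below shows plain ART only. The paper's acceleration scheme and its narrow fan-beam geometry are not described in the abstract, so the system matrix here is a toy 2×2-pixel example with row, column and diagonal rays. `art` is a hypothetical name.

```python
def art(A, b, n_iter=50, relax=1.0, x0=None):
    """Kaczmarz-style ART: sweep the projection equations a_i . x = b_i and
    project x onto each hyperplane in turn, scaled by a relaxation factor
    (0 < relax < 2)."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(n_iter):
        for a_i, b_i in zip(A, b):
            norm2 = sum(a * a for a in a_i)
            if norm2 == 0:
                continue                      # ray misses every pixel
            residual = b_i - sum(a * xj for a, xj in zip(a_i, x))
            step = relax * residual / norm2
            x = [xj + step * a for xj, a in zip(x, a_i)]
    return x

# Pixels ordered (p00, p01, p10, p11); each row of A is one ray.
A = [[1, 1, 0, 0], [0, 0, 1, 1],    # row sums
     [1, 0, 1, 0], [0, 1, 0, 1],    # column sums
     [1, 0, 0, 1], [0, 1, 1, 0]]    # diagonal sums
true_image = [1, 2, 3, 4]
b = [sum(a * p for a, p in zip(row, true_image)) for row in A]
x = art(A, b, n_iter=200)
print([round(v, 6) for v in x])
```

With incomplete projection data (for example, dropping the diagonal rays), the system becomes underdetermined. Started from zero, ART then converges to the minimum-norm solution consistent with the remaining rays.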
doi: 10.1007/11527503_50
Digits Speech Recognition Based on Geometrical Learning
Wenming Cao; Xiaoxia Pan; Shoujue Wang; Jing Hu
We investigate the use of independent component analysis (ICA) for speech feature extraction in digit speech recognition systems. We observe that this may hold for recognition tasks based on geometrical learning with little training data. In contrast to image processing, phase information is not essential for digit speech recognition. We therefore propose a new scheme that removes the phase sensitivity by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time-shift information. The digit speech recognition results show promising accuracy, and experiments show that the method based on ICA and geometrical learning outperforms HMM across different numbers of training samples.
- Multimedia Mining | Pp. 415-422
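One way to read the phase-removal step described above: the magnitude of the analytic signal x + i·H(x), where H is the Hilbert transform, gives an envelope that does not change when the phase of a narrowband component shifts. The sketch builds the analytic signal with the usual DFT construction (keep DC and Nyquist, double positive frequencies, zero negative ones). It then shows that a cosine and a phase-shifted copy have the same flat envelope. It omits the ICA and geometrical-learning stages, uses an O(N²) DFT for clarity, and `analytic_signal` and `envelope` are hypothetical names.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via the DFT: keep DC (and Nyquist for even length),
    double positive frequencies, zero negative ones, then invert."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
         for k in range(n)]
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    Z = [Xk * hk for Xk, hk in zip(X, h)]
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def envelope(x):
    """Phase-insensitive magnitude |x + i H(x)|."""
    return [abs(z) for z in analytic_signal(x)]

n, k = 32, 3
c = [math.cos(2 * math.pi * k * t / n) for t in range(n)]
s = [math.cos(2 * math.pi * k * t / n + 0.7) for t in range(n)]  # phase-shifted copy
env_c, env_s = envelope(c), envelope(s)
print(max(abs(a - b) for a, b in zip(env_c, env_s)))
```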