Publications catalog - books
Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings
Wee-Keong Ng; Masaru Kitsuregawa; Jianzhong Li; Kuiyu Chang (eds.)
In conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-33206-0
Electronic ISBN
978-3-540-33207-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11731139_21
Clustering Multi-represented Objects Using Combination Trees
Elke Achtert; Hans-Peter Kriegel; Alexey Pryakhin; Matthias Schubert
When clustering complex objects, there often exist various feature transformations and thus multiple object representations. To cluster multi-represented objects, dedicated data mining algorithms have been shown to achieve improved results. In this paper, we introduce combination trees for describing arbitrary semantic relationships, which can be used to extend the hierarchical clustering algorithm OPTICS to handle multi-represented data objects. To demonstrate the usability of our proposed method, we present encouraging results on real-world data sets.
- Ensemble Learning | Pp. 174-178
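A minimal sketch of the combination-tree idea from the abstract above, in Python with scikit-learn. The two toy representations, the `CombinationNode` class, and the min/max reading of union/intersection semantics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import OPTICS

class CombinationNode:
    """Inner node of a combination tree: 'union' combines child distances
    with a minimum (similar in any view), 'intersection' with a maximum
    (similar in every view) -- one plausible reading of the semantics."""
    def __init__(self, op, children):
        self.op = np.minimum if op == "union" else np.maximum
        self.children = children  # nested nodes or precomputed distance matrices

    def distance_matrix(self):
        mats = [c.distance_matrix() if isinstance(c, CombinationNode) else c
                for c in self.children]
        out = mats[0]
        for m in mats[1:]:
            out = self.op(out, m)
        return out

rng = np.random.default_rng(0)
X_text = rng.normal(size=(60, 10))    # representation 1 (hypothetical features)
X_shape = rng.normal(size=(60, 25))   # representation 2 (hypothetical features)

tree = CombinationNode("union", [cdist(X_text, X_text), cdist(X_shape, X_shape)])
labels = OPTICS(metric="precomputed", min_samples=5).fit_predict(tree.distance_matrix())
print(labels[:10])
```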
doi: 10.1007/11731139_22
Parallel Density-Based Clustering of Complex Objects
Stefan Brecheisen; Hans-Peter Kriegel; Martin Pfeifle
In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithm, the clustering on the slaves can be carried out very efficiently. Finally, we show that the different result sets computed by the various slaves can be effectively and efficiently merged into a global result by means of cluster connectivity graphs. In an experimental evaluation on real-world test data sets, we demonstrate the benefits of our approach.
- Ensemble Learning | Pp. 179-188
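A sketch of the filter/refinement step that the abstract's multi-step query processing relies on; the cheap lower-bounding distance (here, a one-dimensional projection) and the expensive exact distance are placeholder choices.

```python
import numpy as np

def lower_bound_dist(a, b):
    # Cheap lower bound: the distance on the first coordinate never exceeds
    # the full Euclidean distance, so it can safely prune candidates.
    return abs(a[0] - b[0])

def exact_dist(a, b):
    return np.linalg.norm(a - b)   # stand-in for an expensive distance function

def eps_range_query(data, q, eps):
    """Multi-step eps-range query as used inside DBSCAN: candidates whose
    lower-bound distance already exceeds eps are discarded without ever
    evaluating the expensive exact distance."""
    return [x for x in data
            if lower_bound_dist(q, x) <= eps and exact_dist(q, x) <= eps]
```

Because the lower bound never overestimates the true distance, the pruning is conservative: the query returns exactly the same neighbors as a naive scan, which is why the clustering on each slave remains exact.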
doi: 10.1007/11731139_23
Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering
Yunming Ye; Joshua Zhexue Huang; Xiaojun Chen; Shuigeng Zhou; Graham Williams; Xiaofei Xu
This paper presents a new method for effectively selecting initial cluster centers in k-means clustering. This method first identifies high-density neighborhoods in the data and then selects the central points of the neighborhoods as initial centers. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for high-density neighborhoods. The new clustering algorithm integrates NBC into the k-means clustering process to improve the performance of the k-means algorithm while preserving k-means efficiency. NBC is enhanced with a new cell-based neighborhood search method to accelerate the search for initial cluster centers. A merging method is employed to filter out insignificant initial centers, to avoid too many clusters being generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy in comparison with the random k-means and the refinement k-means algorithms.
- Ensemble Learning | Pp. 189-198
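A simplified Python sketch of density-based center seeding in the spirit of the abstract; the radius-based density estimate and the minimum-separation rule are crude stand-ins for NBC's neighborhood search and the merging step, which the paper defines precisely.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def density_seeds(X, k, radius):
    """Pick the k densest points that are pairwise more than `radius` apart,
    approximating the central points of high-density neighborhoods."""
    nn = NearestNeighbors(radius=radius).fit(X)
    neigh = nn.radius_neighbors(X, return_distance=False)
    density = np.array([len(ix) for ix in neigh])
    seeds = []
    for i in np.argsort(-density):               # visit points densest-first
        if all(np.linalg.norm(X[i] - X[j]) > radius for j in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return X[np.array(seeds)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
km = KMeans(n_clusters=3, init=density_seeds(X, 3, 0.5), n_init=1).fit(X)
print(km.cluster_centers_)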
doi: 10.1007/11731139_24
Uncertain Data Mining: An Example in Clustering Location Data
Michael Chau; Reynold Cheng; Ben Kao; Jackey Ng
Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data, their uncertainty has to be considered to obtain high quality results. We present UK-means clustering, an algorithm that enhances the K-means algorithm to handle data uncertainty. We apply UK-means to the particular pattern of moving-object uncertainty. Experimental results show that by considering uncertainty, a clustering algorithm can produce more accurate results.
- Ensemble Learning | Pp. 199-204
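A minimal sketch of the UK-means idea: assign each uncertain object to the center that minimizes its expected distance. Representing each object's uncertainty region by Monte Carlo samples is an assumption made here for illustration; the paper works with uncertainty pdfs of moving objects.

```python
import numpy as np

def expected_sq_dist(samples, center):
    """Expected squared distance from an uncertain object, represented by
    samples of its uncertainty region, to a candidate cluster center."""
    return np.mean(np.sum((samples - center) ** 2, axis=1))

def uk_means(objects, k, iters=20, rng=np.random.default_rng(0)):
    """objects: list of (n_samples, d) arrays, one per uncertain object."""
    means = np.array([s.mean(axis=0) for s in objects])
    centers = means[rng.choice(len(objects), size=k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([expected_sq_dist(s, c) for c in centers])
                           for s in objects])
        centers = np.array([means[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers
```

For squared Euclidean distance the expectation decomposes as E||x - c||^2 = ||E[x] - c||^2 + trace(Cov[x]), so updating each center to the mean of its members' expected positions is the exact minimizer.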
doi: 10.1007/11731139_25
Parallel Randomized Support Vector Machine
Yumao Lu; Vwani Roychowdhury
A parallel support vector machine based on a randomized sampling technique is proposed in this paper. We model a new LP-type problem so that it works for general linearly non-separable SVM training problems, unlike the previous work [2]. A unique priority-based sampling mechanism is used, which allows us to prove an average convergence rate that is, to the best of our knowledge, the fastest bounded convergence rate to date. Numerical results on synthesized data and a real geometric database show that our algorithm has good scalability.
- Support Vector Machines | Pp. 205-214
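The abstract gives no pseudo-code, so the following is only a generic Clarkson-style randomized-sampling loop with priority re-weighting, written with scikit-learn's LinearSVC; the sample size, doubling rule, and stopping test are illustrative assumptions rather than the authors' scheme.

```python
import numpy as np
from sklearn.svm import LinearSVC

def randomized_svm(X, y, sample_size=200, rounds=10, rng=np.random.default_rng(0)):
    """Train on a small priority-weighted sample; points that violate the
    margin of the resulting classifier get a higher priority, so they are
    more likely to be drawn in the next round. Labels y must be +/-1."""
    w = np.ones(len(X))
    clf = None
    for _ in range(rounds):
        idx = rng.choice(len(X), size=min(sample_size, len(X)),
                         replace=False, p=w / w.sum())
        clf = LinearSVC(C=1.0, dual=False).fit(X[idx], y[idx])
        violators = y * clf.decision_function(X) < 1.0   # margin violators
        if not violators.any():
            break
        w[violators] *= 2.0                              # priority boost
    return clf
```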
doi: 10.1007/11731139_26
ε-Tube Based Pattern Selection for Support Vector Machines
Dongil Kim; Sungzoon Cho
The training time complexity of Support Vector Regression (SVR) is O(N³), so training a large dataset takes a long time. In this paper, we propose a pattern selection method to reduce the training time of SVR. With multiple bootstrap samples, we estimate the ε-tube. For each pattern, the probability of falling inside the ε-tube is computed, and patterns with higher probabilities are selected stochastically. To evaluate the new method, experiments on four datasets were conducted. The proposed method achieved the best performance among all methods, and its performance was also found to be stable.
- Support Vector Machines | Pp. 215-224
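A sketch of the bootstrap tube-probability estimate described in the abstract, using scikit-learn's SVR; the number of bootstrap replicates, the ε value, and sampling in proportion to the estimated probability are assumptions filled in for illustration.

```python
import numpy as np
from sklearn.svm import SVR

def tube_probabilities(X, y, eps=0.1, n_boot=10, rng=np.random.default_rng(0)):
    """Fraction of bootstrap-fitted SVRs whose eps-tube contains each pattern."""
    inside = np.zeros(len(X))
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))       # bootstrap resample
        f = SVR(epsilon=eps).fit(X[idx], y[idx])
        inside += np.abs(y - f.predict(X)) <= eps
    return inside / n_boot

def select_patterns(X, y, n_keep, eps=0.1, rng=np.random.default_rng(0)):
    """Stochastically select patterns with probability proportional to their
    estimated chance of falling inside the tube, as the abstract describes."""
    p = tube_probabilities(X, y, eps, rng=rng) + 1e-9    # avoid zero weights
    keep = rng.choice(len(X), size=n_keep, replace=False, p=p / p.sum())
    return X[keep], y[keep]
```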
doi: 10.1007/11731139_27
Self-adaptive Two-Phase Support Vector Clustering for Multi-Relational Data Mining
Ping Ling; Yan Wang; Chun-Guang Zhou
This paper proposes a novel Self-Adaptive Two-Phase Support Vector Clustering algorithm (STPSVC) to cluster multi-relational data. The algorithm produces an appropriate description of cluster contours and then extracts cluster-center information by iteratively performing a classification procedure. An adaptive kernel function is designed to find a suitable width parameter for diverse dispersions. Experimental results indicate that the designed kernel captures multi-relational features well and that STPSVC performs well.
- Support Vector Machines | Pp. 225-229
doi: 10.1007/11731139_28
One-Class Support Vector Machines for Recommendation Tasks
Yasutoshi Yajima
The present paper proposes new approaches for recommendation tasks based on one-class support vector machines (1-SVMs) with graph kernels generated from a Laplacian matrix. We introduce new formulations for the 1-SVM that can manipulate graph kernels quite efficiently. We demonstrate that the proposed formulations fully utilize the sparse structure of the Laplacian matrix, which enables the proposed approaches to be applied to recommendation tasks having a large number of customers and products in practical computational times. Results of various numerical experiments demonstrating the high performance of the proposed approaches are presented.
- Support Vector Machines | Pp. 230-239
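A small sketch of the kernel construction the abstract refers to: a regularized Laplacian kernel over an item graph, fed to scikit-learn's OneClassSVM with a precomputed kernel. The toy graph, the β value, and the explicit matrix inverse are assumptions; the paper's formulations specifically avoid forming this dense inverse and instead exploit the sparse structure of the Laplacian.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
A = (rng.random((40, 40)) < 0.1).astype(float)     # hypothetical item graph
A = np.triu(A, 1); A = A + A.T                     # symmetric, no self-loops

L = laplacian(A)                                   # graph Laplacian
beta = 0.5
K = np.linalg.inv(np.eye(len(A)) + beta * L)       # regularized Laplacian kernel

# Fit the 1-SVM on the items one customer has interacted with; decision
# values over all items then rank the remaining candidates.
liked = np.array([0, 3, 7, 12])                    # hypothetical purchase history
oc = OneClassSVM(kernel="precomputed", nu=0.5).fit(K[np.ix_(liked, liked)])
scores = oc.decision_function(K[:, liked])
print(np.argsort(-scores)[:5])                     # top-5 ranked items
```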
doi: 10.1007/11731139_29
Heterogeneous Information Integration in Hierarchical Text Classification
Huai-Yuan Yang; Tie-Yan Liu; Li Gao; Wei-Ying Ma
Previous work has shown that considering the category distance in the taxonomy tree can improve the performance of text classifiers. In this paper, we propose a new approach to further integrate more categorical information in the text corpus using the principle of multi-objective programming (MOP). That is, we not only consider the distance between categories defined by the branching of the taxonomy tree, but also consider the similarity between categories defined by the document/term distributions in the feature space. Consequently, we get a refined category distance by using MOP to leverage these two kinds of information. Experiments on both synthetic and real-world datasets demonstrated the effectiveness of the proposed algorithm in hierarchical text classification.
- Text and Document Mining | Pp. 240-249
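A sketch of combining the two category distances the abstract mentions. The weighted scalarization with a fixed α is a simple stand-in for the paper's multi-objective-programming combination; the ancestor-path representation of categories is likewise an assumption.

```python
import numpy as np

def tree_distance(path_a, path_b):
    """Taxonomy distance: number of tree edges between two categories, each
    given as its list of node ids from the root down to the category."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def refined_distance(path_a, path_b, centroid_a, centroid_b, alpha=0.5):
    """Blend the taxonomy distance with a distributional (cosine) distance
    between the categories' document centroids in feature space."""
    cos = centroid_a @ centroid_b / (
        np.linalg.norm(centroid_a) * np.linalg.norm(centroid_b))
    return alpha * tree_distance(path_a, path_b) + (1 - alpha) * (1.0 - cos)
```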
doi: 10.1007/11731139_30
FISA: Feature-Based Instance Selection for Imbalanced Text Classification
Aixin Sun; Ee-Peng Lim; Boualem Benatallah; Mahbub Hassan
Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% of the negative training examples and 60% of the learning time, methods based on FISA delivered much better classification accuracy than methods using all negative training documents.
- Text and Document Mining | Pp. 250-254
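A sketch of feature-based negative selection in the spirit of FISA; the exact scoring function below (TF-IDF overlap with the average positive profile) is a guess at a plausible criterion, not the paper's definition.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def select_negatives(pos_docs, neg_docs, keep_frac=0.35):
    """Keep the fraction of negative documents that share the most vocabulary
    with the positive class: negatives near the decision boundary are the
    informative ones for an SVM, so the rest can be dropped to shrink the
    training set."""
    vec = TfidfVectorizer().fit(pos_docs + neg_docs)
    pos_profile = np.asarray(vec.transform(pos_docs).mean(axis=0)).ravel()
    scores = vec.transform(neg_docs) @ pos_profile
    keep = np.argsort(-scores)[: int(keep_frac * len(neg_docs))]
    return [neg_docs[i] for i in keep]
```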