Publications catalog - books
Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings
Wee-Keong Ng; Masaru Kitsuregawa; Jianzhong Li; Kuiyu Chang (eds.)
In conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-33206-0
Electronic ISBN
978-3-540-33207-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11731139_21
Clustering Multi-represented Objects Using Combination Trees
Elke Achtert; Hans-Peter Kriegel; Alexey Pryakhin; Matthias Schubert
When clustering complex objects, there often exist various feature transformations and thus multiple object representations. To cluster multi-represented objects, dedicated data mining algorithms have been shown to achieve improved results. In this paper, we introduce combination trees for describing arbitrary semantic relationships, which can be used to extend the hierarchical clustering algorithm OPTICS to handle multi-represented data objects. To demonstrate the usability of our proposed method, we present encouraging results on real-world data sets.
- Ensemble Learning | Pp. 174-178
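A minimal sketch of the combination-tree idea from the abstract above, in Python with scikit-learn. The two toy representations, the `CombinationNode` class, and the min/max reading of union/intersection semantics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import OPTICS

class CombinationNode:
    """Inner node of a combination tree: 'union' combines child distances
    with a minimum (similar in any view), 'intersection' with a maximum
    (similar in every view) -- one plausible reading of the semantics."""
    def __init__(self, op, children):
        self.op = np.minimum if op == "union" else np.maximum
        self.children = children  # nested nodes or precomputed distance matrices

    def distance_matrix(self):
        mats = [c.distance_matrix() if isinstance(c, CombinationNode) else c
                for c in self.children]
        out = mats[0]
        for m in mats[1:]:
            out = self.op(out, m)
        return out

rng = np.random.default_rng(0)
X_text = rng.normal(size=(60, 10))    # representation 1 (hypothetical features)
X_shape = rng.normal(size=(60, 25))   # representation 2 (hypothetical features)

tree = CombinationNode("union", [cdist(X_text, X_text), cdist(X_shape, X_shape)])
labels = OPTICS(metric="precomputed", min_samples=5).fit_predict(tree.distance_matrix())
print(labels[:10])
```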
doi: 10.1007/11731139_22
Parallel Density-Based Clustering of Complex Objects
Stefan Brecheisen; Hans-Peter Kriegel; Martin Pfeifle
In many scientific, engineering or multimedia applications, complex distance functions are used to measure similarity accurately. Furthermore, there often exist simpler lower-bounding distance functions, which can be computed much more efficiently. In this paper, we show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithm, the clustering on the slaves can be carried out very efficiently. Finally, we show that the different result sets computed by the various slaves can be effectively and efficiently merged into a global result by means of cluster connectivity graphs. In an experimental evaluation on real-world test data sets, we demonstrate the benefits of our approach.
- Ensemble Learning | Pp. 179-188
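A sketch of the filter/refinement step that the abstract's multi-step query processing relies on; the cheap lower-bounding distance (here, a one-dimensional projection) and the expensive exact distance are placeholder choices.

```python
import numpy as np

def lower_bound_dist(a, b):
    # Cheap lower bound: the distance on the first coordinate never exceeds
    # the full Euclidean distance, so it can safely prune candidates.
    return abs(a[0] - b[0])

def exact_dist(a, b):
    return np.linalg.norm(a - b)   # stand-in for an expensive distance function

def eps_range_query(data, q, eps):
    """Multi-step eps-range query as used inside DBSCAN: candidates whose
    lower-bound distance already exceeds eps are discarded without ever
    evaluating the expensive exact distance."""
    return [x for x in data
            if lower_bound_dist(q, x) <= eps and exact_dist(q, x) <= eps]
```

Because the lower bound never overestimates the true distance, the pruning is conservative: the query returns exactly the same neighbors as a naive scan, which is why the clustering on each slave remains exact.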
doi: 10.1007/11731139_23
Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering
Yunming Ye; Joshua Zhexue Huang; Xiaojun Chen; Shuigeng Zhou; Graham Williams; Xiaofei Xu
This paper presents a new method for effectively selecting initial cluster centers in k-means clustering. This method first identifies high-density neighborhoods in the data and then selects the central points of the neighborhoods as initial centers. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for high-density neighborhoods. The new clustering algorithm integrates NBC into the k-means clustering process to improve the performance of the k-means algorithm while preserving k-means efficiency. NBC is enhanced with a new cell-based neighborhood search method to accelerate the search for initial cluster centers. A merging method is employed to filter out insignificant initial centers, to avoid too many clusters being generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy in comparison with the random k-means and the refinement k-means algorithms.
- Ensemble Learning | Pp. 189-198
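A simplified Python sketch of density-based center seeding in the spirit of the abstract; the radius-based density estimate and the minimum-separation rule are crude stand-ins for NBC's neighborhood search and the merging step, which the paper defines precisely.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def density_seeds(X, k, radius):
    """Pick the k densest points that are pairwise more than `radius` apart,
    approximating the central points of high-density neighborhoods."""
    nn = NearestNeighbors(radius=radius).fit(X)
    neigh = nn.radius_neighbors(X, return_distance=False)
    density = np.array([len(ix) for ix in neigh])
    seeds = []
    for i in np.argsort(-density):               # visit points densest-first
        if all(np.linalg.norm(X[i] - X[j]) > radius for j in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return X[np.array(seeds)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
km = KMeans(n_clusters=3, init=density_seeds(X, 3, 0.5), n_init=1).fit(X)
print(km.cluster_centers_)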
doi: 10.1007/11731139_24
Uncertain Data Mining: An Example in Clustering Location Data
Michael Chau; Reynold Cheng; Ben Kao; Jackey Ng
Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data, their uncertainty has to be considered to obtain high quality results. We present UK-means clustering, an algorithm that enhances the K-means algorithm to handle data uncertainty. We apply UK-means to the particular pattern of moving-object uncertainty. Experimental results show that by considering uncertainty, a clustering algorithm can produce more accurate results.
- Ensemble Learning | Pp. 199-204
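A minimal sketch of the UK-means idea: assign each uncertain object to the center that minimizes its expected distance. Representing each object's uncertainty region by Monte Carlo samples is an assumption made here for illustration; the paper works with uncertainty pdfs of moving objects.

```python
import numpy as np

def expected_sq_dist(samples, center):
    """Expected squared distance from an uncertain object, represented by
    samples of its uncertainty region, to a candidate cluster center."""
    return np.mean(np.sum((samples - center) ** 2, axis=1))

def uk_means(objects, k, iters=20, rng=np.random.default_rng(0)):
    """objects: list of (n_samples, d) arrays, one per uncertain object."""
    means = np.array([s.mean(axis=0) for s in objects])
    centers = means[rng.choice(len(objects), size=k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([expected_sq_dist(s, c) for c in centers])
                           for s in objects])
        centers = np.array([means[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers
```

For squared Euclidean distance the expectation decomposes as E||x - c||^2 = ||E[x] - c||^2 + trace(Cov[x]), so updating each center to the mean of its members' expected positions is the exact minimizer.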
doi: 10.1007/11731139_25
Parallel Randomized Support Vector Machine
Yumao Lu; Vwani Roychowdhury
A parallel support vector machine based on a randomized sampling technique is proposed in this paper. We model a new LP-type problem so that it works for general linearly non-separable SVM training problems, unlike the previous work [2]. A unique priority-based sampling mechanism is used, which allows us to prove an average convergence rate that is, to the best of our knowledge, the fastest bounded convergence rate to date. Numerical results on synthesized data and a real geometric database show that our algorithm has good scalability.
- Support Vector Machines | Pp. 205-214
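The abstract gives no pseudo-code, so the following is only a generic Clarkson-style randomized-sampling loop with priority re-weighting, written with scikit-learn's LinearSVC; the sample size, doubling rule, and stopping test are illustrative assumptions rather than the authors' scheme.

```python
import numpy as np
from sklearn.svm import LinearSVC

def randomized_svm(X, y, sample_size=200, rounds=10, rng=np.random.default_rng(0)):
    """Train on a small priority-weighted sample; points that violate the
    margin of the resulting classifier get a higher priority, so they are
    more likely to be drawn in the next round. Labels y must be +/-1."""
    w = np.ones(len(X))
    clf = None
    for _ in range(rounds):
        idx = rng.choice(len(X), size=min(sample_size, len(X)),
                         replace=False, p=w / w.sum())
        clf = LinearSVC(C=1.0, dual=False).fit(X[idx], y[idx])
        violators = y * clf.decision_function(X) < 1.0   # margin violators
        if not violators.any():
            break
        w[violators] *= 2.0                              # priority boost
    return clf
```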
doi: 10.1007/11731139_26
ε-Tube Based Pattern Selection for Support Vector Machines
Dongil Kim; Sungzoon Cho
The training time complexity of Support Vector Regression (SVR) is O(N³), so training a large dataset takes a long time. In this paper, we propose a pattern selection method to reduce the training time of SVR. With multiple bootstrap samples, we estimate the ε-tube. For each pattern, the probability of falling inside the ε-tube is computed, and patterns with higher probabilities are selected stochastically. To evaluate the new method, experiments on four datasets were conducted. The proposed method achieved the best performance among all methods, and its performance was also found to be stable.
- Support Vector Machines | Pp. 215-224
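A sketch of the bootstrap tube-probability estimate described in the abstract, using scikit-learn's SVR; the number of bootstrap replicates, the ε value, and sampling in proportion to the estimated probability are assumptions filled in for illustration.

```python
import numpy as np
from sklearn.svm import SVR

def tube_probabilities(X, y, eps=0.1, n_boot=10, rng=np.random.default_rng(0)):
    """Fraction of bootstrap-fitted SVRs whose eps-tube contains each pattern."""
    inside = np.zeros(len(X))
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))       # bootstrap resample
        f = SVR(epsilon=eps).fit(X[idx], y[idx])
        inside += np.abs(y - f.predict(X)) <= eps
    return inside / n_boot

def select_patterns(X, y, n_keep, eps=0.1, rng=np.random.default_rng(0)):
    """Stochastically select patterns with probability proportional to their
    estimated chance of falling inside the tube, as the abstract describes."""
    p = tube_probabilities(X, y, eps, rng=rng) + 1e-9    # avoid zero weights
    keep = rng.choice(len(X), size=n_keep, replace=False, p=p / p.sum())
    return X[keep], y[keep]
```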
doi: 10.1007/11731139_27
Self-adaptive Two-Phase Support Vector Clustering for Multi-Relational Data Mining
Ping Ling; Yan Wang; Chun-Guang Zhou
This paper proposes a novel Self-Adaptive Two-Phase Support Vector Clustering algorithm (STPSVC) to cluster multi-relational data. The algorithm produces an appropriate description of cluster contours and then extracts cluster-center information by iteratively performing a classification procedure. An adaptive kernel function is designed to find a suitable width parameter for diverse dispersions. Experimental results indicate that the designed kernel captures multi-relational features well and that STPSVC performs well.
- Support Vector Machines | Pp. 225-229
doi: 10.1007/11731139_28
One-Class Support Vector Machines for Recommendation Tasks
Yasutoshi Yajima
The present paper proposes new approaches for recommendation tasks based on one-class support vector machines (1-SVMs) with graph kernels generated from a Laplacian matrix. We introduce new formulations for the 1-SVM that can manipulate graph kernels quite efficiently. We demonstrate that the proposed formulations fully utilize the sparse structure of the Laplacian matrix, which enables the proposed approaches to be applied to recommendation tasks having a large number of customers and products in practical computational times. Results of various numerical experiments demonstrating the high performance of the proposed approaches are presented.
- Support Vector Machines | Pp. 230-239
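A small sketch of the kernel construction the abstract refers to: a regularized Laplacian kernel over an item graph, fed to scikit-learn's OneClassSVM with a precomputed kernel. The toy graph, the β value, and the explicit matrix inverse are assumptions; the paper's formulations specifically avoid forming this dense inverse and instead exploit the sparse structure of the Laplacian.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
A = (rng.random((40, 40)) < 0.1).astype(float)     # hypothetical item graph
A = np.triu(A, 1); A = A + A.T                     # symmetric, no self-loops

L = laplacian(A)                                   # graph Laplacian
beta = 0.5
K = np.linalg.inv(np.eye(len(A)) + beta * L)       # regularized Laplacian kernel

# Fit the 1-SVM on the items one customer has interacted with; decision
# values over all items then rank the remaining candidates.
liked = np.array([0, 3, 7, 12])                    # hypothetical purchase history
oc = OneClassSVM(kernel="precomputed", nu=0.5).fit(K[np.ix_(liked, liked)])
scores = oc.decision_function(K[:, liked])
print(np.argsort(-scores)[:5])                     # top-5 ranked items
```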
doi: 10.1007/11731139_29
Heterogeneous Information Integration in Hierarchical Text Classification
Huai-Yuan Yang; Tie-Yan Liu; Li Gao; Wei-Ying Ma
Previous work has shown that considering the category distance in the taxonomy tree can improve the performance of text classifiers. In this paper, we propose a new approach to further integrate more categorical information in the text corpus using the principle of multi-objective programming (MOP). That is, we not only consider the distance between categories defined by the branching of the taxonomy tree, but also consider the similarity between categories defined by the document/term distributions in the feature space. Consequently, we get a refined category distance by using MOP to leverage these two kinds of information. Experiments on both synthetic and real-world datasets demonstrated the effectiveness of the proposed algorithm in hierarchical text classification.
- Text and Document Mining | Pp. 240-249
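A sketch of combining the two category distances the abstract mentions. The weighted scalarization with a fixed α is a simple stand-in for the paper's multi-objective-programming combination; the ancestor-path representation of categories is likewise an assumption.

```python
import numpy as np

def tree_distance(path_a, path_b):
    """Taxonomy distance: number of tree edges between two categories, each
    given as its list of node ids from the root down to the category."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def refined_distance(path_a, path_b, centroid_a, centroid_b, alpha=0.5):
    """Blend the taxonomy distance with a distributional (cosine) distance
    between the categories' document centroids in feature space."""
    cos = centroid_a @ centroid_b / (
        np.linalg.norm(centroid_a) * np.linalg.norm(centroid_b))
    return alpha * tree_distance(path_a, path_b) + (1 - alpha) * (1.0 - cos)
```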
doi: 10.1007/11731139_30
FISA: Feature-Based Instance Selection for Imbalanced Text Classification
Aixin Sun; Ee-Peng Lim; Boualem Benatallah; Mahbub Hassan
Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% of the negative training examples and 60% of the learning time, methods based on FISA delivered much better classification accuracy than methods using all negative training documents.
- Text and Document Mining | Pp. 250-254
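A sketch of feature-based negative selection in the spirit of FISA; the exact scoring function below (TF-IDF overlap with the average positive profile) is a guess at a plausible criterion, not the paper's definition.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def select_negatives(pos_docs, neg_docs, keep_frac=0.35):
    """Keep the fraction of negative documents that share the most vocabulary
    with the positive class: negatives near the decision boundary are the
    informative ones for an SVM, so the rest can be dropped to shrink the
    training set."""
    vec = TfidfVectorizer().fit(pos_docs + neg_docs)
    pos_profile = np.asarray(vec.transform(pos_docs).mean(axis=0)).ravel()
    scores = vec.transform(neg_docs) @ pos_profile
    keep = np.argsort(-scores)[: int(keep_frac * len(neg_docs))]
    return [neg_docs[i] for i in keep]
```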