Publications catalog - books
Machine Learning and Data Mining in Pattern Recognition: 5th International Conference, MLDM 2007, Leipzig, Germany, July 18-20, 2007. Proceedings
Petra Perner (ed.)
Conference: 5th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM), Leipzig, Germany, July 18-20, 2007
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Database Management; Data Mining and Knowledge Discovery; Pattern Recognition; Image Processing and Computer Vision
Availability
| Institution detected | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2007 | SpringerLink | | |
Information
Resource type:
book
Print ISBN
978-3-540-73498-7
Electronic ISBN
978-3-540-73499-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
On Applying Dimension Reduction for Multi-labeled Problems
Moonhwi Lee; Cheong Hee Park
The traditional classification problem assumes that a data sample belongs to exactly one of a set of predefined classes. In a multi-labeled problem such as text categorization, by contrast, data samples can belong to multiple classes, and the task is to output the set of class labels associated with a new, unseen data sample. As is common in text categorization, learning a classifier in a high-dimensional space can be difficult, a phenomenon known as the curse of dimensionality. It has been shown that performing dimension reduction as a preprocessing step can improve classification performance greatly. In particular, linear discriminant analysis (LDA) is one of the most popular dimension reduction methods and is optimized for classification tasks. However, applying LDA to a multi-labeled problem raises some ambiguities and difficulties. In this paper, we study the application of LDA to multi-labeled problems and analyze how the objective function of LDA can be interpreted in the multi-labeled setting. We also propose an LDA algorithm that is effective for multi-labeled problems. Experimental results demonstrate that by taking the multi-labeled structure into account, LDA achieves computational efficiency and also improves classification performance greatly.
- Feature Selection, Extraction and Dimensionality Reduction | Pp. 131-143
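As background for this entry, here is a minimal NumPy sketch of the classical single-label LDA that the paper extends; the function name and toy data are our own illustration, not the authors' multi-label method:

```python
import numpy as np

def lda_projection(X, y, n_components=1):
    """Classical LDA: find directions maximizing between-class scatter
    relative to within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # top eigenvectors of Sw^{-1} Sb are the discriminant directions
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
W = lda_projection(X, y)   # (5, 1) projection matrix
Z = X @ W                  # 1-D discriminant embedding
```

In the multi-labeled case a sample contributes to several class means at once, which is exactly the ambiguity in the scatter matrices that the paper analyzes.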
Nonlinear Feature Selection by Relevance Feature Vector Machine
Haibin Cheng; Haifeng Chen; Guofei Jiang; Kenji Yoshihira
The support vector machine (SVM) has received much attention in feature selection recently because of its ability to incorporate kernels that discover nonlinear dependencies between features. However, it is known that the number of support vectors required by an SVM typically grows linearly with the size of the training data set. This limitation of the SVM becomes more critical when we need to select a small subset of relevant features from a very large number of candidates. To solve this issue, this paper proposes a novel algorithm for nonlinear feature selection, called the ‘relevance feature vector machine’ (RFVM). The RFVM algorithm builds on a highly sparse learning algorithm, the relevance vector machine (RVM), and incorporates kernels to extract important features with both linear and nonlinear relationships. As a result, our proposed approach reduces many false alarms, e.g. the inclusion of irrelevant features, while still maintaining good selection performance. In our experiments we compare the performance of RFVM with other state-of-the-art nonlinear feature selection algorithms. The results confirm our conclusions.
- Feature Selection, Extraction and Dimensionality Reduction | Pp. 144-159
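The sparsity mechanism behind the RVM is automatic relevance determination (ARD). A minimal linear ARD sketch, our own simplification with a fixed noise level and no kernel, not the RFVM algorithm itself:

```python
import numpy as np

def ard_regression(Phi, y, noise_var=1.0, n_iter=50, prune_at=1e6):
    """ARD / sparse Bayesian regression: each weight gets its own prior
    precision alpha_i; for irrelevant features alpha_i diverges and the
    feature is pruned. noise_var is deliberately conservative here:
    assuming too little noise keeps spurious features."""
    n, d = Phi.shape
    alpha = np.ones(d)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + np.diag(alpha))
        mu = Sigma @ Phi.T @ y / noise_var
        gamma = 1.0 - alpha * np.diag(Sigma)          # well-determinedness
        alpha = gamma / np.maximum(mu ** 2, 1e-12)    # MacKay update
        alpha = np.minimum(alpha, prune_at)
    return mu, alpha < prune_at

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # only feature 0 matters
mu, relevant = ard_regression(X, y)
```

Only the weight of feature 0 survives with a finite precision; the other four precisions diverge and their features are marked irrelevant.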
Affine Feature Extraction: A Generalization of the Fukunaga-Koontz Transformation
Wenbo Cao; Robert Haralick
Dimension reduction methods are often applied in machine learning and data mining problems. Linear subspace methods are the most commonly used, such as principal component analysis (PCA) and Fisher’s linear discriminant analysis (FDA). In this paper, we describe a novel feature extraction method for binary classification problems. Instead of finding linear subspaces, our method finds lower-dimensional affine subspaces for the data observations. It can be understood as a generalization of the Fukunaga-Koontz Transformation. We show that the proposed method has a closed-form solution and can therefore be solved very efficiently. We also investigate the information-theoretical properties of the new method and study its relationship to other methods. The experimental results show that our method, like PCA and FDA, can be used as another preliminary data-exploration tool to help solve machine learning and data mining problems.
- Feature Selection, Extraction and Dimensionality Reduction | Pp. 160-173
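The Fukunaga-Koontz Transformation itself has a compact closed form, which may help situate the paper's generalization. A sketch with our own toy data:

```python
import numpy as np

def fukunaga_koontz(X1, X2):
    """Fukunaga-Koontz transform: after whitening S1 + S2, the two class
    covariances share eigenvectors with eigenvalues lam and 1 - lam, so
    the best directions for one class are the worst for the other."""
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    evals, P = np.linalg.eigh(S1 + S2)
    W = P / np.sqrt(evals)             # whitening: W.T @ (S1+S2) @ W = I
    lam, V = np.linalg.eigh(W.T @ S1 @ W)
    return W @ V, lam

rng = np.random.default_rng(1)
X1 = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 1.0, 1.0])
X2 = rng.normal(size=(500, 4)) * np.array([1.0, 1.0, 1.0, 3.0])
T, lam = fukunaga_koontz(X1, X2)
# T.T @ S1 @ T == diag(lam)  and  T.T @ S2 @ T == I - diag(lam)
```

The closed-form character noted in the abstract is visible here: two symmetric eigendecompositions, no iteration.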
A Bounded Index for Cluster Validity
Sandro Saitta; Benny Raphael; Ian F. C. Smith
Clustering is one of the best known types of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data is an important issue. Most current validity indices only cover a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, testing the score function against four existing validity indices. The proposed index is found to be always as good as or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multi-dimensional data sets and to accommodate the single-cluster and sub-cluster cases.
- Clustering | Pp. 174-187
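The shape of such a bounded index can be sketched from standard cluster properties. The exact weighting of the distance terms below is our reading of the construction and may differ from the paper in detail:

```python
import numpy as np

def score_function(X, labels):
    """Bounded validity index in (0, 1); higher is better."""
    clusters = np.unique(labels)
    k, n = len(clusters), len(X)
    z_tot = X.mean(axis=0)
    bcd = 0.0   # between-class distance, size-weighted
    wcd = 0.0   # within-class distance
    for c in clusters:
        Xc = X[labels == c]
        z_c = Xc.mean(axis=0)
        bcd += len(Xc) * np.linalg.norm(z_c - z_tot) / (n * k)
        wcd += np.linalg.norm(Xc - z_c, axis=1).mean()
    # double exponential keeps the index bounded in (0, 1)
    return 1.0 - 1.0 / np.exp(np.exp(bcd - wcd))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(0, 0.3, (50, 2)) + [5.0, 0.0]])
good = np.array([0] * 50 + [1] * 50)   # true partition
bad = np.tile([0, 1], 50)              # arbitrary split
sf_good, sf_bad = score_function(X, good), score_function(X, bad)
```

Boundedness comes from the double exponential: whatever the data scale, the index stays strictly between 0 and 1, unlike ratio-type indices.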
Varying Density Spatial Clustering Based on a Hierarchical Tree
Xuegang Hu; Dongbo Wang; Xindong Wu
As data scales leap upward, clustering that handles high-dimensional data with high efficiency and quality is strongly needed. Density-based clustering is an effective approach, and its representative algorithm, DBSCAN, has advantages such as finding clusters of arbitrary shape and handling noise. However, it also has disadvantages: high time expense, difficult parameter tuning, and an inability to handle varying densities. In this paper, a new clustering algorithm called VDSCHT (Varying Density Spatial Clustering Based on a Hierarchical Tree) is presented, which constructs a hierarchical tree to describe subclusters and tunes the local parameter dynamically. Density-based clustering is adopted to cluster by detecting adjacent spaces of the tree. Both theoretical analysis and experimental results indicate that VDSCHT not only retains the advantages of density-based clustering but can also tune the local parameter dynamically to deal with varying densities. In addition, it requires only one scan of the database, making it suitable for mining large-scale databases.
- Clustering | Pp. 188-202
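For reference, the classical DBSCAN that VDSCHT builds on can be sketched in a few lines; this is a naive O(n²) version with a single global eps, i.e. without the paper's varying-density mechanism:

```python
import numpy as np
from collections import deque

def dbscan(X, eps=0.5, min_pts=5):
    """Label points with cluster ids; -1 marks noise."""
    n = len(X)
    labels = np.full(n, -1)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    cid = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cid
        queue = deque([i])
        while queue:                      # expand only through core points
            j = queue.popleft()
            for q in neighbors[j]:
                if labels[q] == -1:
                    labels[q] = cid
                    if core[q]:
                        queue.append(q)
        cid += 1
    return labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (30, 2)),
               rng.normal(0, 0.1, (30, 2)) + [5.0, 5.0],
               [[20.0, 20.0]]])           # two dense blobs and an outlier
labels = dbscan(X)
```

The fixed eps is exactly the weakness the abstract points at: one global density threshold cannot fit regions of different density.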
Kernel MDL to Determine the Number of Clusters
Ivan O Kyrgyzov; Olexiy O Kyrgyzov; Henri Maître; Marine Campedel
In this paper we propose a new criterion, based on the Minimum Description Length (MDL) principle, to estimate an optimal number of clusters. This criterion, called Kernel MDL (KMDL), is particularly adapted to the kernel K-means clustering algorithm. Its formulation is based on the definition of MDL derived for the Gaussian Mixture Model (GMM). We demonstrate the efficiency of our approach on both synthetic data and real data such as SPOT5 satellite images.
- Clustering | Pp. 203-217
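The MDL principle behind such a criterion can be illustrated with a BIC-style two-part code for plain k-means; this is our simplification (no kernel, spherical Gaussian code), not the paper's KMDL formula:

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    C = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[np.argmax(d2)])
    C = np.array(C)
    for _ in range(n_iter):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == m].mean(0) if (lab == m).any() else C[m]
                      for m in range(k)])
    lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    return C, lab

def description_length(X, C, lab):
    """Two-part code: model cost + data cost under spherical Gaussians
    with hard assignments."""
    n, d = X.shape
    k = len(C)
    sse = ((X - C[lab]) ** 2).sum()
    sigma2 = max(sse / (n * d), 1e-12)
    counts = np.bincount(lab, minlength=k)
    loglik = (counts * np.log(np.maximum(counts, 1) / n)).sum() \
             - 0.5 * n * d * (np.log(2 * np.pi * sigma2) + 1)
    n_params = k * d + (k - 1) + 1   # centroids, weights, variance
    return -loglik + 0.5 * n_params * np.log(n)

rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(0, 0.5, (60, 2)) for c in centers])
dls = [description_length(X, *kmeans(X, k)) for k in range(1, 7)]
best_k = 1 + int(np.argmin(dls))   # shortest description wins
```

Adding clusters always lowers the data cost, but past the true k the per-cluster parameter penalty dominates, so the total code length bottoms out at the right number.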
Critical Scale for Unsupervised Cluster Discovery
Tomoya Sakai; Atsushi Imiya; Takuto Komazaki; Shiomu Hama
This paper addresses scale-space clustering and a scheme for validating it. Scale-space clustering is an unsupervised method for grouping spatial data points based on the estimation of a probability density function (PDF) using a Gaussian kernel with a variable scale parameter. It has been suggested that a detected cluster, represented as a mode of the PDF, can be validated by observing the lifetime of the mode in scale space. The statistical properties of this lifetime, however, are unclear. In this paper, we propose the concept of a ‘critical scale’ and explore perspectives on using it for cluster validation.
- Clustering | Pp. 218-232
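The scale-space idea can be illustrated in one dimension: modes of a Gaussian KDE appear and merge as the bandwidth (scale) grows, and the scale at which a mode dies is its lifetime. A small sketch of our own, 1-D only:

```python
import numpy as np

def count_modes(data, bandwidth, grid_size=2000):
    """Count local maxima of a Gaussian KDE at one scale."""
    grid = np.linspace(data.min() - 3 * bandwidth,
                       data.max() + 3 * bandwidth, grid_size)
    z = (grid[:, None] - data[None, :]) / bandwidth
    pdf = np.exp(-0.5 * z ** 2).sum(axis=1)      # unnormalized KDE
    mid = pdf[1:-1]
    return int(((mid > pdf[:-2]) & (mid > pdf[2:])).sum())

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(6, 0.5, 200)])
# two modes at a fine scale; they merge into one at a coarse scale
```

Sweeping the bandwidth and recording where the mode count drops gives the lifetime of each mode; the paper's question is what statistical behavior of that quantity makes it a valid cluster test.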
Minimum Information Loss Cluster Analysis for Categorical Data
Jiří Grim; Jan Hora
The EM algorithm has been used repeatedly to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, the underlying mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters depend on the starting point. For this reason we use the latent class model only to define a set of “elementary” classes, by estimating a mixture with a large number of components. We propose a hierarchical “bottom-up” cluster analysis based on sequentially merging the elementary latent classes, with the clustering procedure controlled by a minimum information loss criterion.
- Clustering | Pp. 233-247
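A minimal EM for a mixture of product-Bernoulli components, i.e. the latent class model on binary data; this is our own simplified sketch of the standard estimator, not the paper's merging procedure:

```python
import numpy as np

def latent_class_em(X, k, n_iter=100, seed=0):
    """EM for a mixture of product-Bernoulli components."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, d))  # P(x_j = 1 | class)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        log_p = (X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T
                 + np.log(w))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reestimate weights and Bernoulli parameters
        nk = r.sum(axis=0)
        w = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1.0 - 1e-6)
    return w, theta, r

rng = np.random.default_rng(6)
theta_true = np.array([[0.9] * 4 + [0.1] * 4,
                       [0.1] * 4 + [0.9] * 4])
true = rng.integers(0, 2, 200)
X = (rng.random((200, 8)) < theta_true[true]).astype(float)
w, theta, r = latent_class_em(X, 2)
```

Rerunning with a different seed can return the components in a different order or at a different local optimum, which is the identifiability and starting-point dependence the abstract refers to.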
A Clustering Algorithm Based on Generalized Stars
Airel Pérez Suárez; José E. Medina Pagola
In this paper we present a new algorithm for document clustering called Generalized Star (GStar). This algorithm is a generalization of the Star algorithm proposed by Aslam et al. and recently improved by them and other researchers. In this method we introduce a new concept of star, allowing a different star-shaped form with better-overlapping clusters. Evaluation experiments on standard document collections show that the proposed algorithm outperforms previously defined methods and obtains a smaller number of clusters. Since the GStar algorithm is relatively simple to implement and is also efficient, we advocate its use for tasks that require clustering, such as information organization, browsing, topic tracking, and new topic detection.
- Clustering | Pp. 248-262
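For context, the original Star algorithm that GStar generalizes is short: threshold the similarity graph at sigma, then greedily cover it with highest-degree "star" centers plus their neighbors. Our sketch:

```python
import numpy as np

def star_clustering(S, sigma=0.5):
    """Original Star algorithm: greedy cover of the sigma-similarity
    graph by star centers of highest degree; clusters may overlap."""
    n = len(S)
    adj = (S >= sigma) & ~np.eye(n, dtype=bool)   # thresholded graph
    degree = adj.sum(axis=1)
    covered = np.zeros(n, dtype=bool)
    clusters = []
    while not covered.all():
        # highest-degree uncovered vertex becomes the next star center
        candidates = np.flatnonzero(~covered)
        center = candidates[np.argmax(degree[candidates])]
        members = np.flatnonzero(adj[center])
        cluster = np.concatenate(([center], members))
        covered[cluster] = True
        clusters.append(set(cluster.tolist()))
    return clusters

# two groups of documents with high within-group similarity
S = np.full((6, 6), 0.1)
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            S[i, j] = 0.9
clusters = star_clustering(S, sigma=0.5)
```

GStar's contribution, per the abstract, is a generalized notion of the star itself, changing which vertices a center pulls in and hence the overlap structure.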
Evolving Committees of Support Vector Machines
D. Valincius; A. Verikas; M. Bacauskiene; A. Gelzinis
The main emphasis of the technique developed in this work for evolving committees of support vector machines (SVMs) is a two-phase procedure for selecting salient features. In the first phase, clearly redundant features are eliminated based on a paired t-test comparing the SVM output-sensitivity-based saliency of each candidate feature with that of a noise feature. In the second phase, a genetic search is employed that integrates the steps of training, aggregation of committee members, and hyper-parameter as well as feature selection into the same learning process. A characteristic of the genetic search procedure developed is the small number of genetic iterations needed to find a solution. Experimental tests performed on five real-world problems have shown that significant improvements in correct classification rate can be obtained in a small number of iterations compared to using all the available features.
- Support Vector Machine | Pp. 263-275
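The first phase, comparing each feature's output-sensitivity saliency against an injected noise feature, can be sketched with a simple logistic model standing in for the SVM. Everything here, including the paired-t construction over bootstrap resamples, is our simplified illustration:

```python
import numpy as np

def saliency(X, y, epochs=300, lr=0.1):
    """Train logistic regression by gradient descent; return
    per-feature output-sensitivity saliency |w_j| * std(x_j)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / n
    return np.abs(w) * X.std(axis=0)

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 4))            # features 0,1 relevant; 2,3 not
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(float)
probe = rng.normal(size=(n, 1))        # injected pure-noise feature
Xp = np.hstack([X, probe])

B = 20
diffs = np.zeros((B, 4))
for b in range(B):
    idx = rng.integers(0, n, n)        # bootstrap resample
    s = saliency(Xp[idx], y[idx])
    diffs[b] = s[:4] - s[4]            # saliency margin over the probe
# paired t statistic of each feature's margin over the noise probe
t = diffs.mean(0) / (diffs.std(0, ddof=1) / np.sqrt(B))
```

Features whose saliency is not significantly above the noise probe's are the "clearly redundant" ones eliminated before the genetic search begins.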