Publications catalogue - books



From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V. University of Magdeburg, March 9-11, 2005

Myra Spiliopoulou ; Rudolf Kruse ; Christian Borgelt ; Andreas Nürnberger ; Wolfgang Gaul (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Publication year: 2006. Available online via SpringerLink.

Information

Resource type:

books

Print ISBN

978-3-540-31313-7

Electronic ISBN

978-3-540-31314-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer Berlin · Heidelberg 2006

Table of contents

An Indicator for the Number of Clusters: Using a Linear Map to Simplex Structure

Marcus Weber; Wasinee Rungsarityotin; Alexander Schliep

The problem of clustering data can be formulated as a graph partitioning problem. In this setting, spectral methods for obtaining optimal solutions have recently received much attention. We describe Perron Cluster Cluster Analysis (PCCA) and establish a connection to spectral graph partitioning. We show that in our approach a clustering can be computed efficiently by mapping the eigenvector data onto a simplex. To deal with the prevalent problem of noisy and possibly overlapping data, we introduce the Min-chi indicator, which helps to confirm the existence of a partition of the data and to select the number of clusters, with quite favorable performance. Furthermore, if no hard partition exists in the data, Min-chi can guide the selection of the number of modes in a mixture model. We close by showing results on simulated data generated by a mixture of Gaussians.

- Clustering | Pp. 103-110
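The spectral embedding step described in the abstract can be sketched in a few lines. This is an illustrative numpy sketch, not the authors' PCCA implementation: it row-normalizes a similarity matrix into a random-walk transition matrix and keeps the dominant (Perron) eigenvectors, whose rows cluster near the vertices of a simplex. The toy matrix `W` is hypothetical.

```python
import numpy as np

def spectral_embedding(W, k):
    """Embed graph nodes using the top-k eigenvectors of the
    row-normalized similarity (random-walk transition) matrix."""
    D_inv = np.diag(1.0 / W.sum(axis=1))
    P = D_inv @ W                      # transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)     # dominant (Perron) eigenvalues first
    return vecs[:, order[:k]].real     # rows lie near a k-vertex simplex

# two obvious blocks on the diagonal -> 2 clusters
W = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
Y = spectral_embedding(W, 2)
# points of the same block map to (almost) the same simplex vertex
same = np.allclose(Y[0], Y[1], atol=1e-6) and np.allclose(Y[2], Y[3], atol=1e-6)
```

For perfectly block-diagonal similarities the embedded points coincide exactly with the simplex vertices; noise and overlap smear them out, which is what the Min-chi indicator is designed to quantify.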

On the Use of Some Classification Quality Measure to Construct Mean Value Estimates Under Nonresponse

Wojciech Gamrot

Several procedures have been developed for estimating the mean value of a population characteristic under nonresponse. Usually the estimators use available auxiliary information as a basis for the nonresponse correction. Some of them rely on classification procedures that divide the population under study into subsets of units similar to sample respondents or to sample nonrespondents. This makes it possible to approximate the proportions of the respondent and nonrespondent strata in the population. Nonrespondents are then subsampled and estimates of population parameters are constructed. Such estimators are more accurate than the standard estimator for a two-phase sample when the distributions of the auxiliary variables in the respondent and nonrespondent strata differ significantly. However, when these distributions are similar the improvement disappears, and the classification-based estimator may be less accurate than the standard one. In this paper another mean value estimator is proposed in order to eliminate this disadvantage. It is constructed as a combination of a standard (unbiased) two-phase estimator and a classification-based estimator, with weights that are functions of some classification quality measure. The proposed mean value estimator should behave like a classification-based estimator when the auxiliary characteristics seem useful for classification and like a standard estimator otherwise. Results of Monte Carlo simulation experiments aimed at assessing the properties of the proposed combined estimator are presented.

- Discriminant Analysis | Pp. 111-118
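The combination idea can be made concrete with a minimal sketch. The weighting function below is hypothetical (the chapter derives its weights from a specific classification quality measure); the sketch only shows the convex-combination structure.

```python
import numpy as np

def combined_mean(y_std, y_cls, quality):
    """Convex combination of a standard two-phase mean estimate and a
    classification-based estimate; the weight is a (hypothetical)
    function of a classification quality measure in [0, 1]."""
    w = np.clip(quality, 0.0, 1.0)     # high quality -> trust the classifier
    return w * y_cls + (1.0 - w) * y_std

# quality 0 falls back to the standard estimator, quality 1 to the
# classification-based one
est_lo = combined_mean(10.0, 14.0, 0.0)   # -> 10.0
est_hi = combined_mean(10.0, 14.0, 1.0)   # -> 14.0
est_mid = combined_mean(10.0, 14.0, 0.5)  # -> 12.0
```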

A Wrapper Feature Selection Method for Combined Tree-based Classifiers

Eugeniusz Gatnar

The aim of feature selection is to find the subset of features that maximizes classifier performance. Recently, we proposed a correlation-based feature selection method for classifier ensembles based on the Hellwig heuristic (CFSH).

In this paper we show that a further improvement of the ensemble accuracy can be achieved by combining the CFSH method with the wrapper approach.

- Discriminant Analysis | Pp. 119-125
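A wrapper evaluates candidate feature subsets by the held-out accuracy of the classifier itself. The greedy forward sketch below is illustrative only: it uses a nearest-centroid classifier as a stand-in for the tree-based ensembles of the chapter, and the toy data are hypothetical.

```python
import numpy as np

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Held-out accuracy of a nearest-centroid classifier (a simple
    stand-in for the combined tree-based classifiers of the chapter)."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return (classes[d.argmin(axis=1)] == yte).mean()

def wrapper_forward_select(Xtr, ytr, Xte, yte):
    """Greedy forward wrapper: add the feature that most improves
    held-out accuracy; stop when no feature helps."""
    chosen, best = [], -1.0
    remaining = list(range(Xtr.shape[1]))
    while remaining:
        scores = [(nearest_centroid_acc(Xtr[:, chosen + [j]], ytr,
                                        Xte[:, chosen + [j]], yte), j)
                  for j in remaining]
        s, j = max(scores)
        if s <= best:
            break
        best, chosen = s, chosen + [j]
        remaining.remove(j)
    return chosen, best

# toy data: feature 0 carries the class signal, feature 1 is noise
rng = np.random.default_rng(0)
n = 40
y = np.repeat([0, 1], n // 2)
signal = np.where(y == 0, 0.0, 3.0) + 0.3 * rng.standard_normal(n)
X = np.column_stack([signal, rng.standard_normal(n)])
chosen, acc = wrapper_forward_select(X[::2], y[::2], X[1::2], y[1::2])
```

In practice the wrapper is usually run on top of a filter such as CFSH, so that only a pre-screened pool of features enters the expensive search loop.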

Input Variable Selection in Kernel Fisher Discriminant Analysis

Nelmarie Louw; Sarel J. Steel

Variable selection serves a dual purpose in statistical classification problems: it enables one to identify the input variables which separate the groups well, and a classification rule based on these variables frequently has a lower error rate than the rule based on all the input variables. Kernel Fisher discriminant analysis (KFDA) is a recently proposed powerful classification procedure, frequently applied in cases characterized by large numbers of input variables. The important problem of eliminating redundant input variables before implementing KFDA is addressed in this paper. A backward elimination approach is employed, and a criterion which can be used for recursive elimination of input variables is proposed. The merit of the proposal is evaluated in a simulation study and in terms of its performance when applied to two benchmark data sets.

- Discriminant Analysis | Pp. 126-133
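The backward elimination loop can be sketched with a plain (linear) Fisher discriminant ratio as the selection criterion. This is a simplified stand-in for the kernel criterion proposed in the chapter; the data and threshold choices are hypothetical.

```python
import numpy as np

def fisher_ratio(X, y):
    """Two-class Fisher criterion: between-class over within-class
    scatter along the optimal linear discriminant (a linear stand-in
    for the kernel criterion used in the chapter)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw = np.atleast_2d(Sw) + 1e-8 * np.eye(X.shape[1])  # regularize
    diff = m1 - m0
    return float(diff @ np.linalg.solve(Sw, diff))

def backward_eliminate(X, y, n_keep):
    """Recursively drop the variable whose removal hurts the
    criterion least, until n_keep variables remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        drop = max(keep, key=lambda j: fisher_ratio(
            X[:, [k for k in keep if k != j]], y))
        keep.remove(drop)
    return keep

# toy data: only feature 0 separates the classes
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 20)
X = np.column_stack([4.0 * y + 0.3 * rng.standard_normal(40),
                     rng.standard_normal(40),
                     rng.standard_normal(40)])
keep = backward_eliminate(X, y, n_keep=1)
```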

The Wavelet Packet Based Cepstral Features for Open Set Speaker Classification in Marathi

Hemant A. Patil; P. K. Dutta; T. K. Basu

In this paper, a new method of feature extraction based on a perceptually meaningful subband decomposition of the speech signal is described. Dialect-based speaker classification in the Marathi language has been attempted in the open-set mode using a polynomial classifier. The method consists of dividing the speech signal into nonuniform subbands on an approximate Mel scale using an admissible wavelet packet filterbank and modeling each dialectal zone with 2nd and 3rd order polynomial expansions of the feature vector.

- Discriminant Analysis | Pp. 134-141
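A wavelet packet filterbank splits the signal recursively into subbands, and log subband energies give a crude cepstral-style feature vector. The sketch below is illustrative only: it uses Haar filters and a uniform full tree, whereas the chapter prunes the tree nonuniformly to approximate the Mel scale; the test signal is hypothetical.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: half-band low-pass and high-pass outputs."""
    e, o = x[0::2], x[1::2]
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)

def wavelet_packet_energies(x, levels):
    """Full (uniform) Haar wavelet packet tree; returns the log energy
    of each leaf subband as a simple feature vector."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nxt = []
        for b in bands:
            lo, hi = haar_split(b)
            nxt += [lo, hi]
        bands = nxt
    return np.array([np.log(np.sum(b * b) + 1e-12) for b in bands])

# a constant (purely low-frequency) signal concentrates all its
# energy in the first subband
x = np.ones(64)
feats = wavelet_packet_energies(x, levels=3)
```

Because the Haar analysis steps are orthonormal, total energy is preserved across the tree; only its distribution over the leaves changes, which is what the features capture.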

A New Effective Algorithm for Stepwise Principal Components Selection in Discriminant Analysis

Ekaterina Serikova; Eugene Zhuk

The problem of reducing the dimensionality of multivariate Gaussian observations is considered. The efficiency of a discriminant analysis procedure based on the well-known method of principal components selection is investigated analytically. The average decrease of the squared interclass distances is presented as a new feature selection criterion, directly connected with the classification error probability. A new stepwise discriminant analysis procedure in the space of principal components based on this criterion is proposed, and its efficiency is investigated experimentally and analytically.

- Discriminant Analysis | Pp. 142-149
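The key point of the abstract is that components should be ranked by class separation rather than by explained variance. The sketch below is a simplified two-class stand-in for the chapter's criterion: it scores each principal component by the squared interclass mean distance along it. The toy data are hypothetical, constructed so that the separating direction has low variance.

```python
import numpy as np

def pc_class_separation(X, y):
    """Project onto principal components and score each component by
    the squared interclass mean distance along it (a simplified,
    two-class stand-in for the chapter's criterion)."""
    Xc = X - X.mean(axis=0)
    # principal axes = right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T                          # component scores
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    score = (means[0] - means[1]) ** 2     # two-class case
    order = np.argsort(-score)             # best-separating PCs first
    return order, score

# class separation lies along a LOW-variance direction, so the
# best-separating component is not the first principal component
rng = np.random.default_rng(1)
n = 400
y = np.repeat([0, 1], n // 2)
X = np.column_stack([5.0 * rng.standard_normal(n),
                     3.0 * y + 0.1 * rng.standard_normal(n)])
order, score = pc_class_separation(X, y)
```

Here variance-based selection would pick the useless high-variance component first, while the separation criterion correctly ranks the discriminating component on top.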

A Comparison of Validation Methods for Learning Vector Quantization and for Support Vector Machines on Two Biomedical Data Sets

David Sommer; Martin Golz

We compare two comprehensive classification algorithms, support vector machines (SVM) and several variants of learning vector quantization (LVQ), with respect to different validation methods. The generalization ability is estimated by the “multiple-hold-out” (MHO) and the “leave-one-out” (LOO) cross-validation methods. The ξα-method, a further estimation method which is only applicable to SVM and is computationally more efficient, is also used.

Calculations on two different biomedical data sets generated from experimental data measured in our own laboratory are presented. The first data set contains 748 feature vectors extracted from posturographic signals obtained in investigations of balance control during upright standing of 48 young adults. The two classes are labelled “without alcoholic impairment” and “with alcoholic impairment”. This classification task aims at the detection of small unknown changes in a relatively complex signal with high inter-individual variability.

The second data set contains 6432 feature vectors extracted from electroencephalographic and electrooculographic signals recorded during overnight driving simulations of 22 young adults. Short intrusions of sleep during driving, so-called microsleep events, were observed; they form the examples of the first class. The second class contains examples of fatigue states in which driving is still possible. If microsleep events happen in typical states of brain activity, the recorded signals should contain typical alterations, and discrimination from signals of the second class, which do not refer to such states, should therefore be possible.

Optimal kernel parameters of the SVM are found by searching for minimal test errors with all three validation methods. Results obtained on the two biomedical data sets show different optimal kernel parameters depending on the validation method. It is shown that the ξα-method appears to be biased, and therefore the LOO or MHO method should be preferred.

A comparison of eight different variants of LVQ and six other classification methods using MHO validation shows that SVM performs best on the second and more complex data set, while SVM, GRLVQ and OLVQ1 show nearly the same performance on the first data set.

- Discriminant Analysis | Pp. 150-157
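The two resampling-based validation schemes compared in the chapter are easy to state in code. This is a generic sketch with a nearest-mean classifier standing in for SVM/LVQ; the data, split counts and seed are hypothetical.

```python
import numpy as np

def nearest_mean_predict(Xtr, ytr, Xte):
    """Minimal classifier so the two validation schemes can be compared."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def loo_error(X, y):
    """Leave-one-out: train on all but one point, test on that point."""
    errs = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        errs += nearest_mean_predict(X[mask], y[mask], X[i:i + 1])[0] != y[i]
    return errs / len(y)

def mho_error(X, y, n_splits=20, test_frac=0.3, seed=0):
    """Multiple hold-out: average the test error over random splits."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        n_te = int(test_frac * len(y))
        te, tr = idx[:n_te], idx[n_te:]
        errs.append(np.mean(nearest_mean_predict(X[tr], y[tr], X[te]) != y[te]))
    return float(np.mean(errs))

# well-separated toy classes: both estimates should be near zero
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 20)
X = np.column_stack([5.0 * y + 0.3 * rng.standard_normal(40),
                     0.3 * rng.standard_normal(40)])
e_loo = loo_error(X, y)
e_mho = mho_error(X, y)
```

LOO is nearly unbiased but costs one training run per sample; MHO trades a small pessimistic bias for far fewer training runs, which is why the chapter compares both against the cheaper ξα estimate.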

Discriminant Analysis of Polythetically Described Older Palaeolithic Stone Flakes: Possibilities and Questions

Thomas Weber

Archaeological inventories of flaked stone artefacts are the most important sources for the reconstruction of mankind’s earliest history. It is necessary also to evaluate the blanks of tool production (“waste”), the most numerous artefact category, using statistical methods that include features such as the absolute measurements and form quotients of the pieces and their striking platforms, the flaking angles, and the dorsal degradation data. In Central Europe, three major chrono-technological groups of finds can be distinguished: from the Middle Pleistocene interglacial(s) about 250,000 or 300,000 years ago, from the Early Saalian glacial perhaps 200,000 years ago, and from the Early Weichselian glacial 100,000-60,000 years ago, represented by the inventories from Wallendorf, Markkleeberg, and Königsaue B, respectively. In this study, an attempt has been made to separate these flake inventories using linear discriminant analysis and to use the results for comparison with other artefact complexes whose chrono-technological position is rather unclear.

- Discriminant Analysis | Pp. 158-165
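Linear discriminant analysis of several inventories amounts to finding the directions that maximize between-group relative to within-group scatter. The sketch below is generic Fisher LDA on synthetic three-group data (stand-ins for the three inventories; the measurements are hypothetical, not real flake data).

```python
import numpy as np

def lda_directions(X, y, n_dirs=2):
    """Fisher LDA: directions maximizing between-class scatter
    relative to within-class scatter."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(d), Sb))
    order = np.argsort(-vals.real)
    return vecs.real[:, order[:n_dirs]]

# three synthetic "inventories" in a 4-feature space
rng = np.random.default_rng(4)
y = np.repeat([0, 1, 2], 20)
centers = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]], dtype=float)
X = centers[y] + 0.5 * rng.standard_normal((60, 4))
W = lda_directions(X, y, n_dirs=2)
proj_means = np.array([(X[y == c] @ W).mean(axis=0) for c in range(3)])
dists = [np.linalg.norm(proj_means[i] - proj_means[j])
         for i in range(3) for j in range(i + 1, 3)]
```

With three groups at most two discriminant directions are informative, so the inventories can be displayed and compared in a single 2-D scatter plot of the projected data.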

Model-based Density Estimation by Independent Factor Analysis

Daniela G. Calò; Angela Montanari; Cinzia Viroli

In this paper we propose a model-based density estimation method rooted in Independent Factor Analysis (IFA). IFA is a generative latent variable model whose structure closely resembles that of an ordinary factor model, but which assumes that the latent variables are mutually independent and distributed according to Gaussian mixtures. From these assumptions it follows that the observed data density can also be modelled as a mixture of Gaussian distributions. The number of free parameters is controlled through the dimension of the latent factor space. The model is shown to be a special case of a mixture of factor analyzers which is less parameterized than the original proposal by McLachlan and Peel (2000). We illustrate the use of IFA density estimation for supervised classification on both real and simulated data.

- Classification with Latent Variable Models | Pp. 166-173
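The claim that the IFA assumptions induce a Gaussian mixture on the observed data can be verified mechanically: with x = Λz + ε, independent mixture-of-Gaussian factors z, and Gaussian noise ε, each combination of per-factor components yields one observed-space Gaussian component. The numbers below are hypothetical; the enumeration is exponential in the number of factors, which is why the dimension of the factor space controls the effective number of parameters.

```python
import numpy as np
from itertools import product

def ifa_induced_mixture(Lambda, Psi, factor_mixes):
    """Each latent factor is an independent 1-D Gaussian mixture given
    as (weights, means, variances). The observed x = Lambda z + noise
    is then itself a Gaussian mixture with one component per
    combination of factor components."""
    comps = []
    for combo in product(*[range(len(m[0])) for m in factor_mixes]):
        w = 1.0
        mu_z = np.zeros(len(factor_mixes))
        var_z = np.zeros(len(factor_mixes))
        for j, k in enumerate(combo):
            wts, mus, vrs = factor_mixes[j]
            w *= wts[k]
            mu_z[j], var_z[j] = mus[k], vrs[k]
        mean = Lambda @ mu_z
        cov = Lambda @ np.diag(var_z) @ Lambda.T + Psi
        comps.append((w, mean, cov))
    return comps

# two factors with two components each -> 4 observed-space components
Lambda = np.array([[1.0, 0.5],
                   [0.0, 1.0],
                   [0.3, 0.2]])
Psi = 0.1 * np.eye(3)
fm = [([0.4, 0.6], [-2.0, 2.0], [1.0, 1.0]),
      ([0.5, 0.5], [0.0, 3.0], [0.5, 0.5])]
comps = ifa_induced_mixture(Lambda, Psi, fm)
total_w = sum(w for w, _, _ in comps)   # component weights sum to 1
```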

Identifying Multiple Cluster Structures Through Latent Class Models

Giuliano Galimberti; Gabriele Soffritti

Many studies addressing the problem of selecting or weighting variables for cluster analysis assume that all the variables define a unique classification of the units. However, it is also possible that different classifications of the units can be obtained from different subsets of variables. In this paper this problem is considered from a model-based perspective. Limitations and drawbacks of standard latent class cluster analysis are highlighted, and a new procedure able to overcome these difficulties is proposed. The results obtained by applying this procedure to simulated and real data sets are presented and discussed.

- Classification with Latent Variable Models | Pp. 174-181