Publications catalog - books

From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V. University of Magdeburg, March 9-11, 2005

Myra Spiliopoulou ; Rudolf Kruse ; Christian Borgelt ; Andreas Nürnberger ; Wolfgang Gaul (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Year of publication Browse Download Request
Not detected 2006 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-31313-7

Electronic ISBN

978-3-540-31314-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer Berlin · Heidelberg 2006

Table of contents

Gene Selection in Classification Problems via Projections onto a Latent Space

Marilena Pillati; Cinzia Viroli

The analysis of gene expression data involves the observation of a very large number of variables (genes) on a few units (tissues). In such a context, applying conventional classification methods may be difficult for both analytical and interpretative reasons. In this work a gene selection procedure for classification problems is proposed. The dimensionality reduction is based on the projections of genes along suitable directions obtained by Independent Factor Analysis (IFA). The performance of the proposed procedure is evaluated in the context of both supervised and unsupervised classification problems for different real data sets.

- Classification with Latent Variable Models | Pp. 182-189

The Recovery Performance of Two-mode Clustering Methods: Monte Carlo Experiment

Sabine Krolak-Schwerdt; Michael Wiedenbeck

In this paper, a Monte Carlo study on the performance of two-mode cluster methods is presented. The synthetic data sets were generated to correspond to two types of data consisting of overlapping as well as disjoint clusters. Furthermore, the data sets differed in the number of clusters, the degrees of within-group homogeneity and between-group heterogeneity, and the degree of cluster overlap. We found that the methods performed very differently depending on the type of data, number of clusters, homogeneity and cluster overlap.

- Multiway Classification and Data Analysis | Pp. 190-197

On the Comparability of Relialibility Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings

Thomas Ostermann; Reinhard Schuster

The problem of analysing interrater agreement and reliability is known both in human decision making and in machine interaction. Several measures have been developed in the last 100 years for this purpose, with Cohen’s kappa coefficient being the most popular one. Due to methodological considerations, the validity of kappa-type measures for interrater agreement has been discussed in a variety of papers. However, a global comparison of the properties of these measures is still lacking. In our approach, we constructed an integral measure to evaluate the differences between two reliability measures for dichotomous ratings. Additionally, we studied bifurcation properties of the difference of these measures to quantify areas of minimal differences. From the methodological point of view, our integral measure can also be used to construct other measures for interrater agreement.

- Multiway Classification and Data Analysis | Pp. 198-205
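As an illustration of the kind of measure discussed in the abstract above, the following is a minimal sketch of Cohen's kappa for two raters and a dichotomous rating. The function name and the cell names `a`, `b`, `c`, `d` are assumptions made for this example; the abstract does not say which two reliability measures the authors compare, and this sketch does not reproduce their integral measure.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for two raters and a yes/no rating (illustrative sketch).

    Cells of the 2x2 agreement table (hypothetical names):
      a: both raters say "yes"      b: rater 1 "yes", rater 2 "no"
      c: rater 1 "no", rater 2 "yes" d: both raters say "no"
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    # Observed agreement: proportion of items rated identically.
    p_obs = (a + d) / n
    # Marginal "yes" rates of each rater.
    p1_yes = (a + b) / n
    p2_yes = (a + c) / n
    # Agreement expected by chance if the raters were independent.
    p_exp = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    print(cohen_kappa_2x2(20, 5, 10, 15))
```

Kappa rescales the observed agreement so that 0 corresponds to chance-level agreement and 1 to perfect agreement, which is why it depends on the raters' marginal rates as well as on the raw agreement.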

On Active Learning in Multi-label Classification

Klaus Brinker

In conventional multiclass classification learning, we seek to induce a prediction function from the domain of input patterns to a mutually exclusive set of class labels. As a straightforward generalization of this category of learning problems, so-called multi-label classification allows for input patterns to be associated with multiple class labels simultaneously. Text categorization is a domain of particular relevance which can be viewed as an instance of this setting. While the process of labeling input patterns for generating training sets already constitutes a major issue in conventional classification learning, it becomes an even more substantial matter of relevance in the more complex multi-label classification setting. We propose a novel active learning strategy for reducing the labeling effort and conduct an experimental study on the well-known Reuters-21578 text categorization benchmark dataset to demonstrate the efficiency of our approach.

- Ranking, Multi-label Classification, Preferences | Pp. 206-213

From Ranking to Classification: A Statistical View

Stéphan Clémençon; Gábor Lugosi; Nicolas Vayatis

In applications related to , the goal is not only to build a classifier for deciding whether a document among a list is relevant or not, but to learn a scoring function : → ℝ for ranking all possible documents with respect to their relevancy. Here we show how the boils down to binary classification with dependent data when accuracy is measured by the . The natural estimate of the risk being of the form of a U-statistic, consistency of methods based on empirical risk minimization is studied using the theory of U-processes. Taking advantage of this specific form, we prove that fast rates of convergence may be achieved under general noise assumptions.

- Ranking, Multi-label Classification, Preferences | Pp. 214-221
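To make the link between ranking and binary classification on dependent data more concrete, here is a minimal sketch of an empirical pairwise ranking risk for a scoring function. It assumes binary relevance labels (1 = relevant, 0 = not relevant) and counts ties as half an error; the function name and these conventions are assumptions for illustration, not the authors' definitions.

```python
def pairwise_ranking_risk(scores, labels):
    """Fraction of (relevant, irrelevant) pairs that the scores order wrongly.

    scores: one real-valued score per document.
    labels: 1 for relevant documents, 0 for irrelevant ones (assumed encoding).
    Ties between a relevant and an irrelevant score count as half an error.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    errors = 0.0
    # Each pair acts like one binary classification example; the pairs
    # share documents, so the examples are not independent.
    for sp in pos:
        for sn in neg:
            if sp < sn:
                errors += 1.0
            elif sp == sn:
                errors += 0.5
    return errors / (len(pos) * len(neg))


if __name__ == "__main__":
    print(pairwise_ranking_risk([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

Because the average runs over pairs of observations rather than single observations, the estimate is not a plain sum of independent terms. One minus this quantity is the empirical area under the ROC curve of the scores.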

Assessing Unidimensionality within PLS Path Modeling Framework

Karin Sahmer; Mohamed Hanafi; Mostafa El Qannari

In many applications and, in particular, in PLS path modeling, it is of paramount importance to assess whether a set of variables is unidimensional. For this purpose, different methods are discussed. In addition to methods generally used in PLS path modeling, methods for the determination of the number of components in principal components analysis are considered. Two original methods based on permutation procedures are also proposed. The methods are compared with each other by means of a simulation study.

- PLS Path Modeling, PLS Regression and Classification | Pp. 222-229

The Partial Robust M-approach

Sven Serneels; Christophe Croux; Peter Filzmoser; Pierre J. Van Espen

The PLS approach is a widely used technique to estimate path models relating various blocks of variables measured from the same population. It is frequently applied in the social sciences and in economics. In this type of application, deviations from normality and outliers may occur, leading to an efficiency loss or even biased results. In the current paper, a robust path model estimation technique is proposed, the partial robust M (PRM) approach. Its benefits are illustrated in an example.

- PLS Path Modeling, PLS Regression and Classification | Pp. 230-237

Classification in PLS Path Models and Local Model Optimisation

Silvia Squillacciotti

In this paper, a methodology is proposed for identifying classes of units showing homogeneous behavioural models estimated through PLS Path Modelling. The proposed methodology aims at discovering or validating the existence of classes of units in PLS Path models in a prediction-oriented logic, as has been proposed, in the framework of PLS Regression, with PLS Typological Regression. An application to a study on customer satisfaction and loyalty is shown.

- PLS Path Modeling, PLS Regression and Classification | Pp. 238-245

Hierarchical Clustering by Means of Model Grouping

Claudio Agostinelli; Paolo Pellizzari

In many applications we are interested in finding clusters of data that share the same properties, like linear shape. We propose a hierarchical clustering procedure that merges groups if they are fitted well by the same linear model. The representative orthogonal model of each cluster is estimated robustly using iterated LQS regressions. We apply the method to two artificial datasets, providing a comparison of results against other non-hierarchical methods that can estimate linear clusters.

- Robust Methods in Multivariate Statistics | Pp. 246-253

Deepest Points and Least Deep Points: Robustness and Outliers with MZE

Claudia Becker; Sebastian Paris Scholz

Multivariate outlier identification is often based on robust location and scatter estimates and usually performed relative to an elliptically shaped distribution. On the other hand, the idea of outlying observations is closely related to the notion of data depth, where observations with minimum depth are potential outliers. Here, we are not generally bound to the idea of an elliptical shape of the underlying distribution. Koshevoy and Mosler (1997) introduced zonoid trimmed regions which define a data depth. Recently, Paris Scholz (2002) and Becker and Paris Scholz (2004) investigated a new approach for robust estimation of convex bodies resulting from zonoids. We follow their approach and explore how the minimum volume zonoid (MZE) estimators can be used for multivariate outlier identification in the case of non-elliptically shaped null distributions.

- Robust Methods in Multivariate Statistics | Pp. 254-261