Publications catalog - books



Selected Contributions in Data Analysis and Classification

Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)

Summary/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition

Availability

Detected institution: not detected
Year of publication: 2007
Available on: SpringerLink

Information

Resource type

books

Print ISBN

978-3-540-73558-8

Electronic ISBN

978-3-540-73560-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Quality Issues in Symbolic Data Analysis

Haralambos Papageorgiou; Maria Vardaki

Symbolic Data Analysis is an extension of Classical Data Analysis to more complex data types and tables, obtained through the application of certain conditions, where the underlying concepts are vital for further processing. Therefore, the assessment of Symbolic Data quality depends extensively on the quality of the collected classical data. However, even though various criteria and indicators have been established to assess quality in classical statistics, the specificities of Symbolic Data construction challenge the efficacy of the classical quality assessment components. In this paper we first review the quality dimensions that can be considered for classical data and then focus on the extent to which these can be applied to symbolic data, taking into account the peculiarities of the symbolic approach.

Part I - Analysis of Symbolic Data | Pp. 113-122

Dynamic Clustering of Histogram Data: Using the Right Metric

Rosanna Verde; Antonio Irpino

In this paper we review some metrics that can be proposed as allocation functions in the Dynamic Clustering Algorithm (DCA) when the data are distributions or histograms of values. The choice of the most suitable distance plays a central role in the DCA because it is related to the criterion function being optimized. Moreover, it has to be consistent with the prototype that represents each cluster. Thus, for each proposed metric, we identify the corresponding prototype according to the minimization of the criterion function, and hence to the best fit between the partition and the representation of the clusters. Finally, we focus on a Wasserstein-based distance and show its optimality for partitioning a set of histogram data when the clusters are represented by their barycenters, expressed as distributions.

Part I - Analysis of Symbolic Data | Pp. 123-134
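The Verde and Irpino abstract above pairs a distance (allocation) with a prototype (representation). The sketch below shows one such pairing as a dynamic clustering loop on histogram data. It rests on several assumptions. It uses the plain L2 Wasserstein distance between one-dimensional distributions, and the paper's exact Wasserstein-based variant may differ. Values are assumed to be spread uniformly inside each bin. The first k histograms serve as a naive initialization. The sketch also relies on a standard 1D fact: the L2 Wasserstein distance equals the L2 distance between quantile functions, and the barycenter's quantile function is the average of the members' quantile functions. All function names are hypothetical.

```python
import math

def quantile(hist, t):
    """Inverse CDF of a histogram given as (lower, upper, weight) bins, weights summing to 1.

    Assumes values are uniformly spread inside each bin."""
    cum = 0.0
    for lo, hi, w in hist:
        if w > 0 and t <= cum + w:
            return lo + (t - cum) / w * (hi - lo)
        cum += w
    return hist[-1][1]  # guard against rounding error when t is close to 1

def quantile_vector(hist, n=1000):
    """Sample the quantile function at the midpoints of n equal slices of (0, 1)."""
    return [quantile(hist, (i + 0.5) / n) for i in range(n)]

def wasserstein2(q1, q2):
    """L2 Wasserstein distance, approximated from two sampled quantile functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))

def barycenter(qs):
    """1D Wasserstein barycenter: average the quantile functions point by point."""
    return [sum(vals) / len(vals) for vals in zip(*qs)]

def dynamic_clustering(hists, k, iters=20):
    qs = [quantile_vector(h) for h in hists]
    prototypes = [qs[i] for i in range(k)]  # naive initialization (assumption)
    labels = [0] * len(qs)
    for _ in range(iters):
        # Allocation step: assign each histogram to its nearest prototype.
        labels = [min(range(k), key=lambda c: wasserstein2(q, prototypes[c])) for q in qs]
        # Representation step: each prototype becomes the barycenter of its cluster.
        for c in range(k):
            members = [q for q, lab in zip(qs, labels) if lab == c]
            if members:
                prototypes[c] = barycenter(members)
    return labels, prototypes
```

For example, `dynamic_clustering([[(0, 1, 1.0)], [(0.1, 1.1, 1.0)], [(5, 6, 1.0)], [(5.2, 6.2, 1.0)]], 2)` groups the first two and the last two histograms. The two steps are consistent by construction: in 1D, the quantile-average barycenter minimizes the sum of squared L2 Wasserstein distances to the cluster members. The fixed iteration count could be replaced by a stop once the labels no longer change.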

Beyond the Pyramids: Rigid Clustering Systems

Jean-Pierre Barthélemy; Gentian Gusho; Christophe Osswald

This paper is devoted to some more or less new extensions of the notion of pyramid introduced by Diday (1984, 1986) and Fichet (1984, 1986). It is related to the notion of a “rigid clustering system” or “rigid hypergraph” (topics from combinatorial theory). Pyramids are representations of clustering systems whose classes are connected subgraphs of a path (in other words, intervals of some linear order). More generally, we consider clustering systems whose classes are connected components of some graph. After reviewing some classical results, we emphasize the relations between rigidity and minimal spanning trees.

Part II - Clustering Methods | Pp. 137-150
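The class condition described in the Barthélemy, Gusho and Osswald abstract above can be checked mechanically. This minimal sketch assumes a class counts as "connected" when it induces a connected subgraph of the given graph. With the path graph of a linear order, the test reduces to "every class is an interval of the order". It checks only this one condition, not the other properties a pyramid or rigid system must satisfy. Treating an empty class as not connected is also an assumption, and the function names are hypothetical.

```python
from collections import deque

def is_connected_in(cls, adj):
    """True if the vertex set cls induces a connected subgraph of the graph adj."""
    cls = set(cls)
    if not cls:
        return False  # assumption: an empty class is not a valid class
    start = next(iter(cls))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search restricted to vertices of the class
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w in cls and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == cls

def path_graph(order):
    """Adjacency sets of the path that links consecutive elements of a linear order."""
    adj = {v: set() for v in order}
    for a, b in zip(order, order[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj

def classes_connected(classes, adj):
    """True if every class of the clustering system is connected in adj."""
    return all(is_connected_in(c, adj) for c in classes)
```

On the path a-b-c-d, the classes {a, b}, {b, c, d} and {a, b, c, d} pass, while {a, c} fails because it is not an interval. Passing a graph other than a path gives the more general setting the abstract mentions.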

Indirect Blockmodeling of 3-Way Networks

Vladimir Batagelj; Anuška Ferligoj; Patrick Doreian

An approach to the indirect blockmodeling of 3-way network data under structural equivalence is presented. This equivalence type is defined formally and expressed in terms of an interchangeability condition that is used to construct a compatible dissimilarity. Using Ward’s method, the three-dimensional partition is obtained via hierarchical clustering and represented diagrammatically. Artificial and real data are used to illustrate these methods.

Part II - Clustering Methods | Pp. 151-159
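The Batagelj, Ferligoj and Doreian abstract above does not spell out the compatible dissimilarity for 3-way structural equivalence. The sketch below therefore assumes one has already been computed as a symmetric matrix. It shows only the generic hierarchical step: agglomerative clustering with Ward's criterion via the Lance-Williams update formula. Treating the input as squared Euclidean distances is an assumption, and under it Ward's update is exact. The function name is hypothetical.

```python
def ward(D):
    """Agglomerative clustering with Ward's criterion via Lance-Williams updates.

    D is a symmetric matrix (list of lists) of dissimilarities, treated as squared
    distances. Returns the merges as (members, height) in the order they happen."""
    n = len(D)
    size = {i: 1 for i in range(n)}          # active cluster id -> number of units
    members = {i: (i,) for i in range(n)}    # active cluster id -> tuple of units
    dist = {i: {j: D[i][j] for j in range(n) if j != i} for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(((i, j) for i in size for j in size if i < j),
                   key=lambda p: dist[p[0]][p[1]])
        h = dist[a][b]
        na, nb = size.pop(a), size.pop(b)
        new = next_id
        next_id += 1
        dist[new] = {}
        for k in size:  # Lance-Williams update for Ward's method
            nk = size[k]
            d = ((na + nk) * dist[a][k] + (nb + nk) * dist[b][k] - nk * h) / (na + nb + nk)
            dist[new][k] = d
            dist[k][new] = d
            del dist[k][a], dist[k][b]
        del dist[a], dist[b]
        size[new] = na + nb
        members[new] = members.pop(a) + members.pop(b)
        merges.append((members[new], h))
    return merges
```

For four units at positions 0, 1, 5 and 6 on a line, the matrix of squared distances first yields the merges (0, 1) and (2, 3) at height 1, then a final merge at height 50. That final height is Ward's merging cost 2·(2·2/4)·5², consistent with the update formula.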

Clustering Methods: A History of k-Means Algorithms

Hans-Hermann Bock

This paper surveys some historical issues related to the well-known k-means algorithm in cluster analysis. It shows to which authors the different versions of this algorithm can be traced back, and what the underlying applications were. We sketch various generalizations (with references also to Diday’s work) and thereby underline the usefulness of the k-means approach in data analysis.

Part II - Clustering Methods | Pp. 161-172
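As a reference point for the Bock survey above, here is a minimal sketch of the batch version of k-means commonly called Lloyd's algorithm. It alternates between two steps until the assignment stops changing: each point is assigned to its nearest center, and each center moves to the mean of its points. Other variants, such as MacQueen's sequential updating, differ in when the centers are recomputed. Initializing with randomly chosen data points is an assumption here.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k data points chosen at random
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers
```

Usage: `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` separates the two nearby pairs. Each step can only decrease the within-cluster sum of squares, which is why the loop terminates, though possibly at a local optimum that depends on the initialization.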

Overlapping Clustering in a Graph Using k-Means and Application to Protein Interactions Networks

Irène Charon; Lucile Denoeud; Olivier Hudry

In this article, we design an overlapping clustering method for graphs in order to address a biological issue: protein annotation. Given an unweighted, undirected graph, we search for subgraphs that are dense in edges. The method consists of three steps. First, we determine initial kernels of the classes by means of a local density function; then we improve these kernels using a k-means process; finally, the kernels are extended to overlapping classes. The method is tested on random graphs and then applied to a protein interaction network.

Part II - Clustering Methods | Pp. 173-182
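The first step in the Charon, Denoeud and Hudry abstract above, finding kernels with a local density function, might look like the sketch below. Two choices are assumptions for illustration, not the authors' definitions. The local density used here is the edge density of a vertex's closed neighborhood. The seed rule keeps vertices whose density no neighbor exceeds. The k-means refinement and the extension to overlapping classes are not shown, and the function names are hypothetical.

```python
from itertools import combinations

def local_density(adj, v):
    """Edge density of the closed neighborhood N[v] = {v} plus the neighbors of v."""
    nbhd = {v} | adj[v]
    if len(nbhd) < 2:
        return 0.0
    inner = sum(1 for a, b in combinations(nbhd, 2) if b in adj[a])
    return inner / (len(nbhd) * (len(nbhd) - 1) / 2)

def kernel_seeds(adj):
    """Vertices whose local density is not exceeded by any neighbor (hypothetical rule)."""
    dens = {v: local_density(adj, v) for v in adj}
    return sorted(v for v in adj if all(dens[v] >= dens[w] for w in adj[v]))
```

Consider two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3. The bridging vertices 2 and 3 have density 4/6 and are not seeds, while 0, 1, 4 and 5 (density 1) are. Adjacent seeds from the same triangle would presumably have to be merged into one kernel before refinement.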

Species Clustering via Classical and Interval Data Representation

Marie Chavent

Consider a data table where objects are described by numerical variables and a qualitative variable with m categories. Interval data representation and interval data clustering methods are useful for clustering the categories. In this paper we study a data set of fish contaminated with mercury. We show how classical or interval data representation can be used to cluster the fish species rather than the individual fish. We compare the results obtained with the two approaches (classical and interval) in the particular case of this ecotoxicology application.

Part II - Clustering Methods | Pp. 183-191
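The Chavent abstract above contrasts two representations. To make them concrete, the minimal sketch below summarizes each category (here, a species) either by the mean of each variable, one possible classical representation, or by the [min, max] interval of each variable. It then compares two interval descriptions with a sum of per-variable Hausdorff distances. The mean, the min/max bounds and the Hausdorff-based distance are all assumptions, and the paper may use other representations and distances. Field names and function names are hypothetical.

```python
def group_by(rows, key):
    """Collect the rows (dicts) of each category."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups

def to_means(rows, key, variables):
    """Classical representation: per-variable means for each category."""
    return {c: {v: sum(r[v] for r in rs) / len(rs) for v in variables}
            for c, rs in group_by(rows, key).items()}

def to_intervals(rows, key, variables):
    """Interval representation: the [min, max] range of each variable within each category."""
    return {c: {v: (min(r[v] for r in rs), max(r[v] for r in rs)) for v in variables}
            for c, rs in group_by(rows, key).items()}

def hausdorff(i1, i2):
    """Hausdorff distance between intervals [a1, b1] and [a2, b2]."""
    return max(abs(i1[0] - i2[0]), abs(i1[1] - i2[1]))

def interval_distance(box1, box2):
    """Sum of per-variable Hausdorff distances between two interval descriptions."""
    return sum(hausdorff(box1[v], box2[v]) for v in box1)
```

Either representation yields one description per species, so any clustering method can then be applied to the m categories instead of to the individual fish. The interval form keeps the within-species spread that the means discard.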

Looking for High Density Zones in a Graph

Tristan Colombo; Alain Guénoche

The aim of this paper is to introduce new methods for building dense classes of vertices in a graph. These classes correspond to connected parts having a proportion of inner edges that is higher than the average over the whole graph. They are built progressively: a kernel of each class is first established, then the kernels are extended to connected elements, and finally to a partition. Several density functions are compared. A Monte Carlo validation of this method is performed on random graphs satisfying certain density conditions.

Part II - Clustering Methods | Pp. 193-201
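The Colombo and Guénoche abstract above speaks of "a proportion of inner edges higher than the average on the whole graph". One reading is that a class's edge density (edges present among its vertices divided by the number of possible pairs) exceeds the edge density of the whole graph. A minimal sketch under that assumption follows. The paper compares several density functions, and the connectivity requirement on classes is not checked here. Function names are hypothetical.

```python
def edge_density(adj, vertices):
    """Proportion of possible edges that are present inside the vertex set."""
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        return 0.0
    inner = sum(1 for v in vs for w in adj[v] if w in vs) // 2  # each edge is seen twice
    return inner / (n * (n - 1) / 2)

def is_dense_class(adj, vertices):
    """True if the class is denser than the whole graph (one possible criterion)."""
    return edge_density(adj, vertices) > edge_density(adj, adj.keys())
```

In two triangles joined by one edge (6 vertices, 7 edges, overall density 7/15), either triangle qualifies as a dense class, while a set such as {0, 3, 4} with a single inner edge does not.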

Block Bernoulli Parsimonious Clustering Models

Gérard Govaert; Mohamed Nadif

When the data consist of a set of objects described by a set of binary variables, we have embedded the block clustering problem for binary tables in the mixture approach. Using a Bernoulli model and adopting the classification maximum likelihood principle, we use an adapted version of the block CEM algorithm. In this paper, we propose different parsimonious models obtained by imposing constraints on the Bernoulli parameters.

Part II - Clustering Methods | Pp. 203-212
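The Govaert and Nadif abstract above refers to a block CEM algorithm for binary tables. The minimal sketch below shows the quantities such an algorithm works with, assuming a row partition z and a column partition w are given:

- per-block counts;
- the Bernoulli classification log-likelihood, ignoring the class-proportion terms;
- two M-step estimates of the block parameters.

The unconstrained estimate is each block's proportion of ones. The "shared error" variant gives every block probability eps or 1 - eps with a single eps. It is only a hypothetical example of a parsimonious constraint, not necessarily one of the paper's models. The alternating row and column reassignment steps of CEM are omitted, and the function names are hypothetical.

```python
import math

def block_counts(x, z, w, g, m):
    """Number of ones and of cells in each (row class k, column class l) block."""
    ones = [[0] * m for _ in range(g)]
    cells = [[0] * m for _ in range(g)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            ones[z[i]][w[j]] += v
            cells[z[i]][w[j]] += 1
    return ones, cells

def log_likelihood(ones, cells, alpha):
    """Bernoulli classification log-likelihood of the table, without proportion terms."""
    tiny = 1e-12  # keep log() finite for blocks that are all 0s or all 1s
    ll = 0.0
    for k, (row_ones, row_cells) in enumerate(zip(ones, cells)):
        for l, (s, n) in enumerate(zip(row_ones, row_cells)):
            a = min(max(alpha[k][l], tiny), 1 - tiny)
            ll += s * math.log(a) + (n - s) * math.log(1 - a)
    return ll

def free_alpha(ones, cells):
    """Unconstrained M-step: alpha_kl is the proportion of ones in block (k, l)."""
    return [[s / n if n else 0.5 for s, n in zip(rs, rn)] for rs, rn in zip(ones, cells)]

def shared_eps_alpha(ones, cells):
    """Hypothetical parsimonious M-step: alpha_kl is eps or 1 - eps, one eps for all blocks."""
    modes = [[1 if n and s / n > 0.5 else 0 for s, n in zip(rs, rn)]
             for rs, rn in zip(ones, cells)]
    # Cells that disagree with their block's majority value count as errors.
    errors = sum(n - s if mode else s
                 for rs, rn, rm in zip(ones, cells, modes)
                 for s, n, mode in zip(rs, rn, rm))
    eps = errors / sum(map(sum, cells))
    return [[1 - eps if mode else eps for mode in rm] for rm in modes]
```

The constrained model has fewer free parameters, so its log-likelihood can never exceed that of the unconstrained block proportions on the same partitions. That trade-off is what a parsimonious model accepts in exchange for simplicity.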

Cluster Analysis Based on Posets

Melvin F. Janowitz

When dissimilarities are measured in a space other than the reals, it is argued that previous models for cluster analysis are not adequate. Possible new models will be explored. It is also shown that formal concept analysis may be viewed as a special case of a Boolean dissimilarity coefficient. A persistent underlying theme involves generalized notions of adjoints of order preserving mappings between posets.

Part II - Clustering Methods | Pp. 213-223
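The Janowitz abstract above mentions formal concept analysis, which rests on two derivation maps between sets of objects and sets of attributes. Together these maps form a Galois connection, the order-reversing counterpart of an adjoint pair of order-preserving maps. The sketch below implements the two maps and enumerates all formal concepts of a small context by brute force. It illustrates standard formal concept analysis only, not the paper's Boolean dissimilarity construction. The names are hypothetical.

```python
from itertools import combinations

def common_attributes(objs, context, attributes):
    """Derivation A -> A': the attributes shared by every object in objs."""
    return frozenset(a for a in attributes if all(a in context[o] for o in objs))

def common_objects(attrs, context):
    """Derivation B -> B': the objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def formal_concepts(context, attributes):
    """All (extent, intent) pairs, found by closing every subset of objects.

    Exponential in the number of objects, so only suitable for tiny contexts."""
    objects = list(context)
    found = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, context, attributes)
            found.add((common_objects(intent, context), intent))
    return found
```

Consider a context where object o1 has attributes {a, b} and o2 has {b, c}. It has four concepts: ({}, {a, b, c}), ({o1}, {a, b}), ({o2}, {b, c}) and ({o1, o2}, {b}).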