Publications catalog - books

Selected Contributions in Data Analysis and Classification

Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition

Availability
Detected institution Publication year Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73558-8

Electronic ISBN

978-3-540-73560-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Hybrid k-Means: Combining Regression-Wise and Centroid-Based Criteria for QSAR

Robert Stanforth; Evgueni Kolossov; Boris Mirkin

This paper further extends the ‘kernel’-based approach to clustering proposed by E. Diday in the early 1970s. According to this approach, a cluster’s centroid can be represented by the parameters of any analytical model, such as a linear regression equation, built over the cluster. We address the problem of producing regression-wise clusters that are separated in the input variable space by building a hybrid clustering criterion that combines the regression-wise clustering criterion with the conventional centroid-based one.

Part II - Clustering Methods | Pp. 225-233
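
The idea of a hybrid criterion in the Stanforth, Kolossov and Mirkin abstract can be illustrated with a minimal sketch. The abstract does not state how the two criteria are combined, so everything below is an assumption made for illustration: a single input variable, a least-squares line per cluster, and a hypothetical weight `alpha` that mixes the regression residuals with the squared distances to the cluster centroid in the input space.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All inputs coincide: fall back to a constant model.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def hybrid_criterion(xs, ys, labels, alpha=0.5):
    """Illustrative hybrid criterion (an assumption, not the paper's formula).

    For each cluster, adds (1 - alpha) times the squared regression
    residuals and alpha times the squared distances of the inputs to the
    cluster centroid. Lower values are better.
    """
    total = 0.0
    for c in set(labels):
        cx = [x for x, l in zip(xs, labels) if l == c]
        cy = [y for y, l in zip(ys, labels) if l == c]
        a, b = fit_line(cx, cy)
        regression_term = sum((y - (a + b * x)) ** 2 for x, y in zip(cx, cy))
        centre = mean(cx)
        centroid_term = sum((x - centre) ** 2 for x in cx)
        total += (1 - alpha) * regression_term + alpha * centroid_term
    return total
```

With `alpha=0` only the regression fit matters, so clusters may overlap in the input space; with `alpha=1` the criterion reduces to the usual centroid-based sum of squares on the inputs.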

Partitioning by Particle Swarm Optimization

Javier Trejos-Zelaya; Mario Villalobos-Arias

We propose a clustering algorithm that uses particle swarm optimization (PSO) to partition a set of objects into clusters. It defines a family of agent-partitions: each agent is defined by centroids in a multidimensional space, and each centroid has an associated cluster, defined by allocating the objects to their nearest centroid. The agents move in the space according to PSO principles; that is, they move with random intensity in the direction of a vector called velocity, which results from the random sum of the best past position of this agent, the best overall agent, and the last direction. We compare the performance of the method with other heuristics also proposed by the authors, and with two classical methods.

Part II - Clustering Methods | Pp. 235-244
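
The mechanism described in the Trejos-Zelaya and Villalobos-Arias abstract can be sketched as follows. This is a generic PSO sketch under stated assumptions, not the authors' implementation: the fitness (within-cluster sum of squared distances), the inertia weight `w`, the coefficients `c1` and `c2`, the swarm size, and the initialization from sampled data points are all illustrative choices that the abstract does not specify.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def within_variance(data, centroids):
    """Fitness of an agent: sum of squared distances to the nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids)
               for x in data)


def pso_partition(data, k, n_agents=10, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Partition data (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    dim = len(data[0])

    def unflatten(pos):
        # An agent is stored as one flat list holding k centroids.
        return [pos[j * dim:(j + 1) * dim] for j in range(k)]

    # Assumed initialization: each agent starts from k sampled objects.
    agents = [[float(v) for x in rng.sample(data, k) for v in x]
              for _ in range(n_agents)]
    velocities = [[0.0] * (k * dim) for _ in range(n_agents)]
    best_pos = [a[:] for a in agents]
    best_fit = [within_variance(data, unflatten(a)) for a in agents]
    g = min(range(n_agents), key=lambda i: best_fit[i])
    global_pos, global_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iter):
        for i in range(n_agents):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Random sum of: last direction, pull towards this agent's
                # best past position, pull towards the best overall agent.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (best_pos[i][d] - agents[i][d])
                                    + c2 * r2 * (global_pos[d] - agents[i][d]))
                agents[i][d] += velocities[i][d]
            fit = within_variance(data, unflatten(agents[i]))
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = agents[i][:], fit
                if fit < global_fit:
                    global_pos, global_fit = agents[i][:], fit

    centroids = unflatten(global_pos)
    labels = [nearest(x, centroids) for x in data]
    return labels, centroids, global_fit
```

A call such as `pso_partition(points, k=2)` returns the cluster label of each object, the centroids of the best agent found, and its fitness. The fixed `seed` only makes runs repeatable.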

Concepts of a Discrete Random Variable

Richard Emilion

A formal concept is defined in the literature as a pair (extent, intent) with respect to a context which is usually empirical, for example a sample of transactions. This is somewhat unsatisfying since concepts, though born from experiences, should not depend on them. In this paper we consider the above concepts as ‘empirical concepts’ and we define the notion of concept, in a context-free framework, as a limit intent. We do so by proving, through the law of large numbers, that given a random variable χ taking its values in a countable σ-semilattice, the random intents of empirical concepts, with respect to a sample of χ, converge almost everywhere to a fixed deterministic limit, called a concept, whose identification shows that it depends only on the distribution P of χ. Moreover, the set of such concepts is the σ-semilattice generated by the support of χ and even has the structure of a σ-lattice: the concept lattice of a random variable.

We also compute the mean number of concepts and frequent itemsets for a hierarchical Bernoulli mixture model. Finally, we propose an algorithm to find maximal frequent itemsets by using minimal winning coalitions of P.

Part III - Conceptual Analysis of Data | Pp. 247-258

Mining Description Logics Concepts with Relational Concept Analysis

Marianne Huchard; Amedeo Napoli; Mohamed Rouane Hacene; Petko Valtchev

Symbolic objects were originally intended to bring both more structure in data and more intelligibility in final results to statistical data analysis. We present here a framework of similar motivation, i.e., combining a data analysis method, formal concept analysis (FCA), with a knowledge description language inspired by the description logic (DL) formalism. The focus is hence on the proper handling of relations between individuals in the construction of formal concepts. We illustrate the relational concept analysis (RCA) framework, which complements standard FCA with a dedicated data format, a set of scaling operators, an iterative process for lattice construction, and translations to and from a DL language.

Part III - Conceptual Analysis of Data | Pp. 259-270

Representation of Concept Description by Multivalued Taxonomic Preordonance Variables

Israël-César Lerman; Philippe Peter

Mathematical representation of complex data knowledge is one of the most important problems in Classification and Data Mining. In this contribution we present an original and very general formalization of various types of knowledge. The data considered are biological descriptions of phlebotomine sandfly species. Relative to a descriptive categorical variable, subsets of category values have to be distinguished. On the other hand, hierarchical dependencies between the descriptive variables, associated with the mother → daughter relation, have to be taken into account. Additionally, an ordinal similarity function must be defined on the modality set of each categorical variable. The knowledge description is formalized by means of a new type of descriptor that we call “Taxonomic preordonance variable with multiple choice”. A probabilistic similarity index between concepts described by such variables can then be built.

Part III - Conceptual Analysis of Data | Pp. 271-284

Recent Advances in Conceptual Clustering: CLUSTER3

Ryszard S. Michalski; William D. Seeman

Conceptual clustering is a form of unsupervised learning that seeks clusters in data that represent simple and understandable concepts, rather than groupings of entities with high intra-cluster and low inter-cluster similarity, as in conventional clustering. Another difference from conventional clustering is that conceptual clustering produces not only clusters but also their generalized descriptions, and that the descriptions are used for cluster evaluation, interpretation, and classification of new, previously unseen entities. The basic methodology of conceptual clustering and the program CLUSTER3, which implements recent advances, are briefly described. One important novelty in CLUSTER3 is the ability to generate clusters according to the viewpoint from which clustering is to be performed. This is achieved through the view-relevant attribute subsetting (VAS) method. CLUSTER3’s performance is illustrated by its application to clustering a database of fatal automobile accidents.

Part III - Conceptual Analysis of Data | Pp. 285-297

Symbolic Dynamics in Text: Application to Automated Construction of Concept Hierarchies

Fionn Murtagh

Following a symbolic encoding of selected terms used in text, we determine symmetries that are furnished by local hierarchical structure. We develop this study so that hierarchical fragments can be used in a concept hierarchy, or ontology. By “letting the data speak” in this way, we avoid the arbitrariness of approximately fitting a model to the data.

Part III - Conceptual Analysis of Data | Pp. 299-306

Average Consensus and Infinite Norm Consensus: Two Methods for Ultrametric Trees

Guy Cucumel

Consensus methods are widely used to combine hierarchies defined on a common set of n objects, and many such methods have been proposed during the last decade. One of these, the average consensus method, yields a consensus solution that is representative of the initial profile of trees by minimizing the sum of the squared distances between this profile and the consensus solution. This problem is known to be NP-complete, so one has to rely on heuristics to obtain a consensus result. As a consequence, the uniqueness and optimality of the solution are not guaranteed. The L∞-consensus, which yields a universal solution in a bounded number of steps, is an alternative to the average consensus procedure. The two methods will be presented and compared on a numerical example.

Part IV - Consensus Methods | Pp. 309-315

Consensus from Frequent Groupings

Bruno Leclerc

Consider a profile of classifications of a given set. We aim to aggregate this profile into a unique consensus classification. The classifications considered here are sets of classes, none of which is included in another. To any integer p between 1 and the number of classifications in the profile (both included), one associates a consensus rule which returns the maximal subsets of the given set that are included in elements of at least p of the classifications. We give some properties and three characterizations of such consensus rules.

Part IV - Consensus Methods | Pp. 317-324
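
The consensus rule stated in the Leclerc abstract can be sketched by brute force. The function name and the data representation (each classification as a list of `frozenset` classes) are assumptions made for illustration, and the enumeration is exponential in p, so this is a sketch for small examples only. It relies on the observation that any subset contained in one class from each of p classifications is contained in the intersection of those classes, so the maximal subsets are found among such intersections.

```python
from itertools import combinations, product


def frequent_groupings(profile, p):
    """Maximal non-empty subsets included in a class of at least p classifications.

    profile: list of classifications, each a list of frozensets (its classes).
    p: integer between 1 and len(profile), both included.
    """
    candidates = set()
    # Choose p distinct classifications, then one class from each of them.
    for chosen in combinations(profile, p):
        for classes in product(*chosen):
            common = frozenset.intersection(*classes)
            if common:
                candidates.add(common)
    # Keep only the candidates that are not strictly included in another one.
    return {a for a in candidates if not any(a < b for b in candidates)}
```

For example, with the three classifications {1,2,3}|{4}, {1,2}|{3,4} and {1,2,3,4} of the set {1,2,3,4}, taking p = 2 keeps {1,2,3} and {3,4}, while p = 3 keeps {1,2}, {3} and {4}.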

Consensus of Star Tree Hypergraphs

Fred R. McMorris; Robert C. Powers

Popular methods for forming the consensus of several hypergraphs of a given type (e.g., hierarchies, weak hierarchies) place a cluster in the output if it appears sufficiently often among the input hypergraphs. The simplest type of tree hypergraph is one whose clusters are subtrees of a star. This paper considers the possibility of forming consensus by simply counting the frequency of occurrence of clusters for star hypergraphs.

Part IV - Consensus Methods | Pp. 325-329
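
The counting approach mentioned in the McMorris and Powers abstract can be sketched as a simple frequency filter. The function name, the representation of a hypergraph as a collection of `frozenset` clusters, and the `threshold` parameter are assumptions for illustration. The sketch does not check whether the output is itself a star tree hypergraph, which is the question the paper examines.

```python
from collections import Counter


def frequency_consensus(hypergraphs, threshold=0.5):
    """Clusters that appear in more than `threshold` of the input hypergraphs.

    hypergraphs: list of collections of frozensets (the clusters).
    threshold: fraction of inputs a cluster must strictly exceed; the
        default 0.5 gives a strict majority rule (an assumed choice).
    """
    # set(h) ensures a cluster is counted at most once per hypergraph.
    counts = Counter(cluster for h in hypergraphs for cluster in set(h))
    needed = threshold * len(hypergraphs)
    return {cluster for cluster, n in counts.items() if n > needed}
```

Raising `threshold` towards 1 keeps only clusters shared by nearly all inputs; lowering it admits less frequent clusters.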