Publications catalog - books
Selected Contributions in Data Analysis and Classification
Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-73558-8
Electronic ISBN
978-3-540-73560-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
Hybrid k-Means: Combining Regression-Wise and Centroid-Based Criteria for QSAR
Robert Stanforth; Evgueni Kolossov; Boris Mirkin
This paper further extends the ‘kernel’-based approach to clustering proposed by E. Diday in the early 1970s. According to this approach, a cluster’s centroid can be represented by the parameters of any analytical model, such as a linear regression equation, built over the cluster. We address the problem of producing regression-wise clusters that are separated in the input variable space by building a hybrid clustering criterion that combines the regression-wise clustering criterion with the conventional centroid-based one.
Part II - Clustering Methods | Pp. 225-233
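As a rough illustration of the hybrid idea in the abstract above, the following sketch alternates between fitting one regression line per cluster and reassigning points by a combined cost. The weighted-sum form of the cost, the weight `lam`, the single input variable, and all function names are assumptions made for this example; the chapter's actual criterion may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; None if the fit is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def hybrid_cost(x, y, model, centre, lam):
    """Squared regression residual plus lam times squared distance to the centre.

    The weighted sum is an assumed form of a hybrid criterion.
    """
    a, b = model
    return (y - (a * x + b)) ** 2 + lam * (x - centre) ** 2


def hybrid_kmeans(xs, ys, labels, lam=1.0, n_iter=20):
    """Alternate model fitting and reassignment, starting from given labels."""
    k = max(labels) + 1
    models = [(0.0, 0.0)] * k
    centres = [0.0] * k
    for _ in range(n_iter):
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            if idx:
                centres[c] = sum(xs[i] for i in idx) / len(idx)
            fit = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
            if fit is not None:  # keep the previous model otherwise
                models[c] = fit
        new_labels = [
            min(range(k),
                key=lambda c: hybrid_cost(x, y, models[c], centres[c], lam))
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models, centres
```

With `lam = 0` the reassignment step uses regression residuals only, and as `lam` grows it behaves more like a conventional centroid-based assignment in the input space.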
Partitioning by Particle Swarm Optimization
Javier Trejos-Zelaya; Mario Villalobos-Arias
We propose a clustering algorithm that uses particle swarm optimization (PSO) to partition a set of objects into clusters by defining a family of agents-partitions. Each agent is defined by centroids in a multidimensional space; each centroid has an associated cluster, defined by allocating the objects to the nearest centroid. The agents move in the space according to PSO principles: they move with random intensity in the direction of a vector called the velocity, which results from a random sum of the agent’s best past position, the best overall agent, and the last direction. We compare the performance of the method with other heuristics also proposed by the authors, and with two classical methods.
Part II - Clustering Methods | Pp. 235-244
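The PSO scheme summarized in the abstract above can be sketched roughly as follows. This is a generic textbook PSO update applied to centroid positions, not the authors' implementation; the inertia weight `w`, the coefficients `c1` and `c2`, the within-cluster sum of squares as fitness, and all function names are assumptions for this example.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centroids):
    """Allocate each point to its nearest centroid."""
    labels = []
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def within_ss(points, centroids):
    """Within-cluster sum of squares of the partition induced by the centroids."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def pso_partition(points, k, n_agents=10, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(x):
        # An agent is a flat list holding the coordinates of its k centroids.
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialise every agent with k centroids drawn from the data points.
    agents = [[coord for p in rng.sample(points, k) for coord in p]
              for _ in range(n_agents)]
    velocities = [[0.0] * (k * dim) for _ in range(n_agents)]
    best_pos = [a[:] for a in agents]
    best_val = [within_ss(points, unflatten(a)) for a in agents]
    g = best_val.index(min(best_val))
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_agents):
            for j in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Random sum of the last direction, the pull towards the
                # agent's own best position, and the pull towards the best agent.
                velocities[i][j] = (w * velocities[i][j]
                                    + c1 * r1 * (best_pos[i][j] - agents[i][j])
                                    + c2 * r2 * (g_pos[j] - agents[i][j]))
                agents[i][j] += velocities[i][j]
            val = within_ss(points, unflatten(agents[i]))
            if val < best_val[i]:
                best_pos[i], best_val[i] = agents[i][:], val
                if val < g_val:
                    g_pos, g_val = agents[i][:], val

    centroids = unflatten(g_pos)
    return assign(points, centroids), centroids, g_val
```

The returned cost never exceeds the best initial agent's cost, because the global best is only replaced when a strictly better position is found.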
Concepts of a Discrete Random Variable
Richard Emilion
A formal concept is defined in the literature as a pair (extent, intent) with respect to a context which is usually empirical, for example a sample of transactions. This is somewhat unsatisfying since concepts, though born from experiences, should not depend on them. In this paper we consider the above concepts as ‘empirical concepts’ and define the notion of concept, in a context-free framework, as a limit intent. Applying the law of large numbers, we prove the following: given a random variable χ taking its values in a countable σ-semilattice, the random intents of empirical concepts, with respect to a sample of χ, converge almost everywhere to a fixed deterministic limit, called a concept, whose identification shows that it depends only on the distribution P of χ. Moreover, the set of such concepts is the σ-semilattice generated by the support of χ and even has the structure of a σ-lattice: the lattice concept of a random variable.
We also compute the mean number of concepts and frequent itemsets for a hierarchical Bernoulli mixture model. Finally, we propose an algorithm to find maximal frequent itemsets by using minimal winning coalitions of P.
Part III - Conceptual Analysis of Data | Pp. 247-258
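To make the (extent, intent) pairs mentioned in the abstract above concrete, here is a minimal sketch that enumerates the formal concepts of a small sample of transactions. It covers only the classical empirical setting, not the limit concepts defined in the chapter. The function names and the brute-force enumeration over object subsets are choices made for this example, and the enumeration is exponential in the number of objects.

```python
from itertools import combinations


def intent(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    result = set().union(*context.values())
    for g in objects:
        result &= context[g]
    return result


def extent(attrs, context):
    """Objects whose attribute set contains every attribute in attrs."""
    return {g for g, owned in context.items() if attrs <= owned}


def concepts(context):
    """All (extent, intent) pairs of the context, found by closing each subset."""
    found = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset, context)
            found.add((frozenset(extent(b, context)), frozenset(b)))
    return found


# Hypothetical sample of three transactions over the items a, b, c.
sample = {"t1": {"a", "b"}, "t2": {"b", "c"}, "t3": {"a", "b", "c"}}
for ext, itt in sorted(concepts(sample), key=lambda pair: len(pair[0])):
    print(sorted(ext), sorted(itt))
```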
Mining Description Logics Concepts with Relational Concept Analysis
Marianne Huchard; Amedeo Napoli; Mohamed Rouane Hacene; Petko Valtchev
were originally intended to bring both more structure in data and more intelligibility in final results to statistical data analysis. We present here a framework of similar motivation, i.e., combining a data analysis method, formal concept analysis (FCA), with a knowledge description language inspired by the description logic (DL) formalism. The focus is hence on the proper handling of relations between individuals in the construction of formal concepts. We illustrate the relational concept analysis (RCA) framework, which complements standard FCA with a dedicated data format, a set of scaling operators, an iterative process for lattice construction, and translations to and from a DL language.
Part III - Conceptual Analysis of Data | Pp. 259-270
Representation of Concept Description by Multivalued Taxonomic Preordonance Variables
Israël-César Lerman; Philippe Peter
Mathematical representation of complex data knowledge is one of the most important problems in Classification and Data Mining. In this contribution we present an original and very general formalization of various types of knowledge. The specific data are endowed with biological descriptions of phlebotomine sandfly species. Relative to a descriptive categorical variable, subsets of category values have to be distinguished. On the other hand, hierarchical dependencies between the descriptive variables, associated with the mother → daughter relation, have to be taken into account. Additionally, an ordinal similarity function on the modality set of each categorical variable has to be considered. The knowledge description is formalized by means of a new type of descriptor that we call “Taxonomic preordonance variable with multiple choice”. A probabilistic similarity index between concepts described by such variables can be built.
Part III - Conceptual Analysis of Data | Pp. 271-284
Recent Advances in Conceptual Clustering: CLUSTER3
Ryszard S. Michalski; William D. Seeman
Conceptual clustering is a form of unsupervised learning that seeks clusters in data that represent simple and understandable concepts, rather than groupings of entities with high intra-cluster and low inter-cluster similarity, as in conventional clustering. Another difference from conventional clustering is that conceptual clustering produces not only clusters but also their generalized descriptions, and that the descriptions are used for cluster evaluation, interpretation, and classification of new, previously unseen entities. The basic methodology of conceptual clustering and the program CLUSTER3, which implements recent advances, are briefly described. One important novelty in CLUSTER3 is the ability to generate clusters according to the from which clustering is to be performed. This is achieved through the (VAS) method. CLUSTER3’s performance is illustrated by its application to clustering a database of automobile fatality accidents.
Part III - Conceptual Analysis of Data | Pp. 285-297
Symbolic Dynamics in Text: Application to Automated Construction of Concept Hierarchies
Fionn Murtagh
Following a symbolic encoding of selected terms used in text, we determine symmetries that are furnished by local hierarchical structure. We develop this study so that hierarchical fragments can be used in a concept hierarchy, or ontology. By “letting the data speak” in this way, we avoid the arbitrariness of approximately fitting a model to the data.
Part III - Conceptual Analysis of Data | Pp. 299-306
Average Consensus and Infinite Norm Consensus: Two Methods for Ultrametric Trees
Guy Cucumel
Consensus methods are widely used to combine hierarchies defined on a common set of n objects. Many methods have been proposed during the last decade to combine hierarchies. One of these, the average consensus method, yields a consensus solution that is representative of the initial profile of trees by minimizing the sum of the squared distances between this profile and the consensus solution. This problem is known to be NP-complete, so one has to rely on heuristics to obtain a consensus result. As a consequence, the uniqueness and optimality of the solution are not guaranteed. The L∞-consensus, which yields a universal solution in a maximum of steps, is an alternative to the average consensus procedure. The two methods are presented and compared on a numerical example.
Part IV - Consensus Methods | Pp. 309-315
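The least-squares objective behind the average consensus described above can be illustrated with a small sketch. It only evaluates the sum of squared differences for a candidate and tests the ultrametric inequality; it is not a consensus algorithm. The dictionary representation keyed by unordered pairs, the helper names, and the toy values are assumptions for this example.

```python
from itertools import combinations, permutations


def key(x, y):
    """Unordered pair used as a dictionary key."""
    return frozenset((x, y))


def is_ultrametric(d, items):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triple of items."""
    return all(d[key(x, z)] <= max(d[key(x, y)], d[key(y, z)])
               for x, y, z in permutations(items, 3))


def average_consensus_cost(profile, candidate, items):
    """Sum over the profile of squared differences to the candidate."""
    return sum((u[key(x, y)] - candidate[key(x, y)]) ** 2
               for u in profile for x, y in combinations(items, 2))


items = ["a", "b", "c"]
u1 = {key("a", "b"): 1.0, key("a", "c"): 2.0, key("b", "c"): 2.0}
u2 = {key("a", "b"): 2.0, key("a", "c"): 2.0, key("b", "c"): 1.0}
# The element-wise mean minimises the unconstrained least-squares cost,
# but it is not an ultrametric here, so it is not an admissible consensus.
mean = {pair: (u1[pair] + u2[pair]) / 2 for pair in u1}
print(is_ultrametric(mean, items), average_consensus_cost([u1, u2], mean, items))
print(is_ultrametric(u1, items), average_consensus_cost([u1, u2], u1, items))
```

The constraint that the solution be an ultrametric is what makes the problem hard: in this toy profile the unconstrained minimizer violates it, so a search over ultrametrics is needed.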
Consensus from Frequent Groupings
Bruno Leclerc
Let be a profile of classifications of a given set . We aim to aggregate into a unique consensus classification. Classifications considered here are sets of classes which are not included in each other. To any integer p comprised between 1 and (both included), one makes correspond a which returns the maximal subsets of included in elements of at least of the . We give some properties and three characterizations of such consensus rules.
Part IV - Consensus Methods | Pp. 317-324
Consensus of Star Tree Hypergraphs
Fred R. McMorris; Robert C. Powers
Popular methods for forming the consensus of several hypergraphs of a given type (e.g., hierarchies, weak hierarchies) place a cluster in the output if it appears sufficiently often among the input hypergraphs. The simplest type of tree hypergraph is one whose clusters are subtrees of a star. This paper considers the possibility of forming consensus by simply counting the frequency of occurrences of clusters for star hypergraphs.
Part IV - Consensus Methods | Pp. 325-329
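The counting rule mentioned in the abstract above (keep a cluster if it appears sufficiently often among the inputs) can be sketched as follows. The representation of a hypergraph as a set of frozensets, the function name, and the `min_count` threshold are assumptions for this example. The sketch does not check whether the output is itself a star tree hypergraph, which is the question the chapter studies.

```python
from collections import Counter


def threshold_consensus(profile, min_count):
    """Clusters that occur in at least min_count of the input hypergraphs."""
    counts = Counter(cluster
                     for hypergraph in profile
                     for cluster in set(hypergraph))
    return {cluster for cluster, n in counts.items() if n >= min_count}


# Hypothetical profile of three hypergraphs on the vertex set {a, b, c}.
h1 = {frozenset("ab"), frozenset("abc")}
h2 = {frozenset("ab"), frozenset("bc")}
h3 = {frozenset("ab"), frozenset("abc")}
profile = [h1, h2, h3]

# Majority rule: a cluster must appear in more than half of the inputs.
majority = threshold_consensus(profile, len(profile) // 2 + 1)
print(sorted(sorted(c) for c in majority))
```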