Publications catalog - books
Selected Contributions in Data Analysis and Classification
Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)
Abstract/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition
Availability
Detected institution | Year of publication | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-73558-8
Electronic ISBN
978-3-540-73560-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
Knowledge Management in Environmental Sciences with : Application to Systematics of Corals of the Mascarene Archipelago
Noel Conruyt; David Grosser
Systematics, the scientific discipline that deals with listing, describing, naming, classifying and identifying living organisms, is central to the environmental sciences. Expertise is becoming rare, and for future biodiversity studies relying on species identification, environmental technicians will be left with only monographic descriptions and museum collections.
With the emergence of knowledge management, it is possible to make better use of systematists' expertise by providing them with collaborative tools to manage, share and transmit their knowledge widely. Knowledge engineering in Systematics means revising taxa and specimen descriptions. We have designed an Iterative Knowledge Base System to achieve these goals. It applies the scientific method of biology (conjecture and test) together with a natural process of knowledge management. The product of such a tool is a collaborative domain knowledge base that can evolve (as the knowledge is updated) and can be connected to distributed databases (bibliographic, photographic, geographic, taxonomic, etc.) that will yield information on species once a new specimen has been identified.
This paper presents an overview of the methodology, the methods (identification tree and case-based reasoning) and the validation process used to build knowledge bases in Systematics. An application to corals of the Mascarene Archipelago is given as a case study.
Part V - Data Analysis, Data Mining, and KDD | Pp. 333-343
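The case-based reasoning step mentioned in the abstract retrieves stored specimen descriptions that resemble a new specimen. The sketch below illustrates only that retrieval idea, under stated assumptions: cases are hypothetical dictionaries of categorical descriptors, similarity is a plain agreement rate over the descriptors both cases share, and the function names are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Assumption: a specimen description maps descriptor names to categorical values.
Case = Dict[str, str]


def similarity(query: Case, case: Case) -> float:
    """Share of descriptors present in both cases whose values agree.

    Descriptors missing from either case are ignored, so partially
    described specimens can still be compared (illustrative choice).
    """
    shared = [name for name in query if name in case]
    if not shared:
        return 0.0
    agreeing = sum(1 for name in shared if query[name] == case[name])
    return agreeing / len(shared)


def retrieve(query: Case, base: List[Tuple[str, Case]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k (taxon, score) pairs whose stored cases best match the query."""
    scored = [(taxon, similarity(query, case)) for taxon, case in base]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical knowledge base with made-up taxa and descriptors.
    base = [
        ("taxon_A", {"growth_form": "branching", "corallite": "tubular"}),
        ("taxon_B", {"growth_form": "massive", "corallite": "small"}),
    ]
    print(retrieve({"growth_form": "branching"}, base))
```

Running the script prints `[('taxon_A', 1.0)]`: the query describes only the growth form, and comparison is restricted to that shared descriptor.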
Unsupervised Learning Informational Limit in Case of Sparsely Described Examples
Jean-Gabriel Ganascia; Julien Velcin
This paper presents a model that characterizes unsupervised learning from an information-theoretic point of view. Under certain hypotheses, it defines a theoretical quality criterion corresponding to the informational limit that bounds the learning ability of any clustering algorithm. This criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e. when most of the descriptors are missing. This theoretical limit is then compared with the actual learning quality of several clustering algorithms (EM, COBWEB and PRESS). The empirical comparison is based on artificial data sets that are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper bound.
Part V - Data Analysis, Data Mining, and KDD | Pp. 345-355
Data Analysis and Operations Research
Wolfgang Gaul
Data Analysis and Operations Research are overlapping sciences: there are, e.g., many data problems in which optimization techniques from Operations Research must be applied to detect the best-fitting structures (under suitable constraints) in the underlying data. Conversely, Operations Research is often based on model formulations in which some model parameters may be unknown or even unobservable. In such cases, Operations Research problems consist of a data collection and analysis part and an optimization part, in which solutions that depend on model parameters (derived from the available information via Data Analysis techniques) are calculated.
We give typical examples of research directions in which Data Analysis and Operations Research overlap, starting with pyramidal clustering, one of Edwin Diday's fields of interest, and present methodology showing how selected problems can be tackled by combining techniques from both scientific areas.
Part V - Data Analysis, Data Mining, and KDD | Pp. 357-366
Reduction of Redundant Rules in Statistical Implicative Analysis
Régis Gras; Pascale Kuntz
Quasi-implications, also called association rules in data mining, have become the major concept for representing implicative trends between itemset patterns. To make their interpretation easier, two problems have become crucial: filtering the most interesting rules and structuring them to highlight their relationships. In this paper, we work within the Statistical Implicative Analysis framework and propose a new methodology for reducing rule sets by detecting redundant rules. We define two new measures based on Shannon entropy and the Gini coefficient.
Part V - Data Analysis, Data Mining, and KDD | Pp. 367-376
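The abstract does not give the two redundancy measures themselves, so the sketch below shows only the two standard quantities they are said to build on. It applies them to a hypothetical view of a rule A → B as the conditional distribution of the consequent given the antecedent, and it reads "Gini coefficient" as the Gini impurity 1 - Σp², a common usage in data mining. The rule-to-distribution mapping and all function names are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini_impurity(probs: Sequence[float]) -> float:
    """Gini impurity 1 - sum(p^2): 0 for a certain outcome, maximal when uniform."""
    return 1.0 - sum(p * p for p in probs)


def consequent_distribution(n_a: int, n_ab: int) -> Tuple[float, float]:
    """(P(B | A), P(not B | A)) for a rule A -> B, from transaction counts.

    n_a: transactions containing A; n_ab: transactions containing both A and B.
    """
    if n_a <= 0 or not 0 <= n_ab <= n_a:
        raise ValueError("need n_a > 0 and 0 <= n_ab <= n_a")
    p = n_ab / n_a
    return (p, 1.0 - p)


if __name__ == "__main__":
    # Hypothetical counts: A occurs in 100 transactions, 90 of which also contain B.
    dist = consequent_distribution(100, 90)
    print(round(shannon_entropy(dist), 3), round(gini_impurity(dist), 3))
```

A nearly exact implication (90 of 100) yields low entropy (about 0.47 bits) and low impurity (0.18); a rule that holds half the time reaches the maxima of 1 bit and 0.5.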
Mining Personal Banking Data to Detect Fraud
David J. Hand
Fraud detection in the retail banking sector poses some novel and challenging statistical problems. For example, the data sets are large, yet each transaction must be examined and decisions must be made in real time; the transactions are often heterogeneous, differing substantially even within an individual account; and the data sets are typically very unbalanced, with only a tiny proportion of transactions belonging to the fraud class. We review the problem, its magnitude, and the various kinds of statistical tools that have been developed for this application. The area is particularly unusual because the patterns to be detected change in response to the detection strategies that are developed: the very success of the statistical models creates the need to develop new ones.
Part V - Data Analysis, Data Mining, and KDD | Pp. 377-386
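The strong class imbalance described in the abstract is one reason plain accuracy is a poor yardstick for fraud detectors. The sketch below uses hypothetical labels, not data from the chapter, to show a detector that never flags fraud still scoring high accuracy while catching nothing; precision and recall on the fraud class expose the problem. The function name and the zero-denominator convention are assumptions.

```python
from typing import Dict, Sequence


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall with fraud (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        # Convention (assumption): report 0.0 when a denominator is empty.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical stream: 2 fraudulent transactions among 1,000.
    y_true = [1, 1] + [0] * 998
    never_flag = [0] * 1000
    print(fraud_metrics(y_true, never_flag))
```

The trivial detector reaches 0.998 accuracy with zero recall, which is why fraud-class measures matter more than overall accuracy here.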
Finding Rules in Data
Tu-Bao Ho
In the first year of my doctoral thesis work at INRIA in Edwin's group, I worked on building an inference engine and a knowledge base, in consultation with various group members, for an expert system to guide the group's data analysis package SICLA. One day, Edwin asked me whether rules for expert systems could be generated automatically from data, and I started a new research direction. Since then, my main work has been in machine learning, especially finding rules in data. This paper briefly presents some of the learning methods we have developed.
Part V - Data Analysis, Data Mining, and KDD | Pp. 387-396
Mining Biological Data Using Pyramids
Géraldine Polaillon; Laure Vescovo; Magali Michaut; Jean-Christophe Aude
This paper reviews promising applications of pyramidal classification to biological data. We show that overlapping and ordering properties can give new insights that cannot be obtained with more classical methods. We illustrate our point with three applications: (i) a genome-scale sequence analysis, (ii) a new progressive multiple sequence alignment method, and (iii) a cluster analysis of transcriptomic data.
Part V - Data Analysis, Data Mining, and KDD | Pp. 397-408
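The ordering property of pyramids is closely linked to Robinsonian dissimilarities: matrices that admit an order of the objects in which values never decrease when moving away from the diagonal. The sketch below only tests that property for a given order; it is not the authors' pipeline, and finding a compatible order (the seriation problem) is left out. The function name and the example matrices are hypothetical.

```python
from typing import Sequence


def is_robinsonian(d: Sequence[Sequence[float]]) -> bool:
    """Check the Robinson property of a symmetric dissimilarity matrix in its given order.

    The property holds when, for every i < j < k,
    d[i][k] >= max(d[i][j], d[j][k]): objects that are close in the order
    are at least as similar as objects further apart.
    """
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if d[i][k] < d[i][j] or d[i][k] < d[j][k]:
                    return False
    return True


if __name__ == "__main__":
    # Hypothetical 3-object matrices.
    print(is_robinsonian([[0, 1, 3], [1, 0, 2], [3, 2, 0]]))  # order is compatible
    print(is_robinsonian([[0, 3, 1], [3, 0, 2], [1, 2, 0]]))  # order is not compatible
```

The triple loop is O(n³), which is fine for illustration; a check of row monotonicity away from the diagonal gives the same answer more cheaply.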
Association Rules for Categorical and Tree Data
Henri Ralambondrainy; Jean Diatta
Association rule mining is among the most popular data mining techniques. Association rules, whose significance is measured via quality indices, have been studied intensively for binary data. In this paper, we deal with association rules in the framework of categorical or tree-valued attributes.
Part V - Data Analysis, Data Mining, and KDD | Pp. 409-417
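Support and confidence are the usual starting point among the quality indices mentioned in the abstract. The sketch below extends them from binary items to attribute = value items and, for a tree-valued attribute, lets an item match any value below it in a hypothetical taxonomy. The taxonomy, the data and the matching rule are illustrative assumptions, not the chapter's definitions.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Item = Tuple[str, str]  # (attribute, value)

# Hypothetical taxonomy for a tree-valued attribute: child value -> parent value.
PARENT: Dict[str, str] = {"red": "warm", "orange": "warm", "blue": "cool",
                          "warm": "any_colour", "cool": "any_colour"}


def is_a(value: Optional[str], target: str) -> bool:
    """True if value equals target or lies below it in the taxonomy."""
    while value is not None:
        if value == target:
            return True
        value = PARENT.get(value)  # plain categorical values have no parent
    return False


def support(records: List[Record], items: List[Item]) -> float:
    """Fraction of records matching every (attribute, value) item."""
    hits = sum(1 for r in records if all(is_a(r.get(a), v) for a, v in items))
    return hits / len(records)


def confidence(records: List[Record], antecedent: List[Item], consequent: List[Item]) -> float:
    """support(antecedent + consequent) / support(antecedent); 0.0 if the antecedent never occurs."""
    s_a = support(records, antecedent)
    return support(records, antecedent + consequent) / s_a if s_a else 0.0


if __name__ == "__main__":
    records = [
        {"colour": "red", "size": "small"},
        {"colour": "orange", "size": "small"},
        {"colour": "blue", "size": "large"},
        {"colour": "red", "size": "large"},
    ]
    # Rule: colour is warm -> size = small
    print(support(records, [("colour", "warm")]))
    print(confidence(records, [("colour", "warm")], [("size", "small")]))
```

Here the rule colour = warm → size = small has support 0.5 and confidence 2/3, because "warm" covers both the red and the orange records.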
Induction Graphs for Data Mining
Djamel Abdelkader Zighed
Induction graphs, which generalize decision trees, have a special place among Data Mining methods. Indeed, they generate lattice graphs instead of trees. They perform well, can handle large volumes of data, are relatively easy for a non-specialist to interpret, and apply without restriction to data of any type (qualitative or quantitative). The proliferation of software based on the paradigm of decision trees, and more generally of induction graphs, is rather strong evidence of their success. In this article, we present a complete induction graph method: SIPINA.
Part V - Data Analysis, Data Mining, and KDD | Pp. 419-430
Clustering of Molecules: Influence of the Similarity Measures
Samia Aci; Gilles Bisson; Sylvaine Roy; Samuel Wieczorek
In this paper, we present the results of an experimental study analyzing the effect of various similarity (or distance) measures on the clustering quality of a set of molecules. We focused mainly on clustering approaches able to deal directly with the 2D representation of molecules (i.e., graphs). In this context, we found that an approach based on asymmetrical similarity measures appears relevant. Our experiments are carried out on a data set from the High Throughput Screening (HTS) domain.
Part VI - Dissimilarities: Structures and Indices | Pp. 433-444
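A simple way to see what an asymmetrical similarity measure means is the Tversky index on sets of molecular features. The sketch below assumes each molecule has already been reduced to a set of substructure labels (hypothetical here); the chapter itself works on graph representations, and the Tversky index is a stand-in chosen for illustration, not necessarily one of the measures studied there.

```python
from typing import Set


def tversky(a: Set[str], b: Set[str], alpha: float = 0.9, beta: float = 0.1) -> float:
    """Tversky similarity of a to b; asymmetric whenever alpha != beta.

    alpha weights features of a missing from b, beta the features of b
    missing from a. With alpha=1, beta=0 it measures how much of a is
    included in b.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    if denom == 0:
        # Nothing shared and no weighted difference (e.g. both sets empty):
        # return 1.0 by convention.
        return 1.0
    return common / denom


if __name__ == "__main__":
    # Hypothetical fragment sets: every feature of `small` also occurs in `large`.
    small = {"carbonyl", "hydroxyl", "benzene_ring"}
    large = {"carbonyl", "hydroxyl", "benzene_ring", "amine", "chloro"}
    print(tversky(small, large), tversky(large, small))
```

With the default weights, the small molecule is about 0.94 similar to the large one, while the reverse direction gives about 0.63: inclusion counts for more than extra features, which is the kind of asymmetry the abstract refers to.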