Publications catalog - books

Selected Contributions in Data Analysis and Classification

Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73558-8

Electronic ISBN

978-3-540-73560-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/978-3-540-73560-1_31

Knowledge Management in Environmental Sciences with IKBS: Application to Systematics of Corals of the Mascarene Archipelago

Noel Conruyt; David Grosser

Systematics, the scientific discipline that deals with listing, describing, naming, classifying and identifying living organisms, is a central point in environmental sciences. Expertise is becoming rare, and for future biodiversity studies relying on species identification, environmental technicians will only be left with monographic descriptions and collections in museums.

With the emergence of knowledge management, it is possible to enhance the use of systematicians' expertise by providing them with collaborative tools to manage, share and transmit their knowledge widely. Knowledge engineering in Systematics means revising taxa and descriptions of specimens. We have designed an Iterative Knowledge Base System (IKBS) for achieving these goals. It applies the scientific method in biology (conjecture and test) with a natural process of knowledge management. The product of such a tool is a collaborative knowledge base of a domain that can evolve (by updating the knowledge) and be connected to distributed databases (bibliographic, photographic, geographic, taxonomic, etc.) that will yield information on species after the identification process of a new specimen.

This paper presents an overview of the methodology, the methods (identification tree and case-based reasoning) and the validation process used to build knowledge bases in Systematics. An application on corals of the Mascarene Archipelago is given as a case study.

Part V - Data Analysis, Data Mining, and KDD | Pp. 333-343

doi: 10.1007/978-3-540-73560-1_32

Unsupervised Learning Informational Limit in Case of Sparsely Described Examples

Jean-Gabriel Ganascia; Julien Velcin

This paper presents a model characterizing unsupervised learning from an information-theoretic point of view. Under some hypotheses, it defines a theoretical quality criterion, which corresponds to the informational limit that bounds the learning ability of any clustering algorithm. This quality criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e. when most of the descriptors are missing. This theoretical limit of any unsupervised learning algorithm is then compared to the actual learning quality of different clustering algorithms (EM, COBWEB and PRESS). This empirical comparison is based on the use of artificial data sets, which are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper bound on quality.

Part V - Data Analysis, Data Mining, and KDD | Pp. 345-355

doi: 10.1007/978-3-540-73560-1_33

Data Analysis and Operations Research

Wolfgang Gaul

Data Analysis and Operations Research are two overlapping sciences as there are, e.g., many data problems in which optimization techniques from Operations Research have to be applied to detect best fitting structures (under suitable constraints) in the underlying data. On the other hand, Operations Research is often based on model formulations for which some model parameters might be unknown or even unobservable. In such cases Operations Research problems consist of a data collection and analysis part and an optimization part in which solutions dependent on model parameters (derived from available information via Data Analysis techniques) are calculated.

We give typical examples of research directions where Data Analysis and Operations Research overlap, starting with the topic of pyramidal clustering as one of the fields of interest of Edwin Diday, and present methodology showing how selected problems can be tackled via a combination of techniques from both scientific areas.

Part V - Data Analysis, Data Mining, and KDD | Pp. 357-366

doi: 10.1007/978-3-540-73560-1_34

Reduction of Redundant Rules in Statistical Implicative Analysis

Régis Gras; Pascale Kuntz

Quasi-implications, also called association rules in data mining, have become the major concept to represent implicative trends between itemset patterns. To make their interpretation easier, two problems have become crucial: filtering the most interesting rules and structuring them to highlight their relationships. In this paper, we put ourselves in the Statistical Implicative Analysis framework, and we propose a new methodology for reducing rule sets by detecting redundant rules. We define two new measures based on Shannon's entropy and Gini's coefficient.

Part V - Data Analysis, Data Mining, and KDD | Pp. 367-376
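As a rough illustration of the two quantities named in the abstract above, the sketch below computes Shannon's entropy and a Gini index for a discrete distribution. It is not the authors' redundancy measures, which the abstract does not define. The function names are hypothetical, and reading "Gini's coefficient" as the Gini diversity index (1 minus the sum of squared probabilities) is an assumption.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing (limit of p * log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini_index(probs):
    """Gini diversity index: 1 - sum(p^2). Assumed reading of 'Gini's coefficient'."""
    return 1.0 - sum(p * p for p in probs)


if __name__ == "__main__":
    # Hypothetical distribution of a rule's conclusion given its premise.
    uniform = [0.5, 0.5]
    certain = [1.0, 0.0]
    print(shannon_entropy(uniform), gini_index(uniform))
    print(shannon_entropy(certain), gini_index(certain))
```

Both quantities are zero when the outcome is certain and largest for a uniform distribution, which is why they are commonly used to score how much uncertainty remains in a conclusion.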

doi: 10.1007/978-3-540-73560-1_35

Mining Personal Banking Data to Detect Fraud

David J. Hand

Fraud detection in the retail banking sector poses some novel and challenging statistical problems. For example, the data sets are large, yet each transaction must be examined and a decision made in real time; the transactions are often heterogeneous, differing substantially even within an individual account; and the data sets are typically very unbalanced, with only a tiny proportion of transactions belonging to the fraud class. We review the problem, its magnitude, and the various kinds of statistical tools that have been developed for this application. The area is particularly unusual because the patterns to be detected change in response to the detection strategies which are developed: the very success of the statistical models leads to the need for new ones to be developed.

Part V - Data Analysis, Data Mining, and KDD | Pp. 377-386

doi: 10.1007/978-3-540-73560-1_36

Finding Rules in Data

Tu-Bao Ho

In the first year of preparing my doctoral thesis at INRIA in Edwin's group, I worked on the construction of an inference engine and a knowledge base, consulting various group members, to build an expert system guiding the group's data analysis package SICLA. One day, Edwin asked me whether one can automatically generate rules for expert systems from data, and I started my new research direction. Since that time, my main work has been machine learning, especially finding rules in data. This paper briefly presents some learning methods we have developed.

Part V - Data Analysis, Data Mining, and KDD | Pp. 387-396

doi: 10.1007/978-3-540-73560-1_37

Mining Biological Data Using Pyramids

Géraldine Polaillon; Laure Vescovo; Magali Michaut; Jean-Christophe Aude

This paper is a review of promising applications of pyramidal classification to biological data. We show that overlapping and ordering properties can give new insights that cannot be achieved using more classical methods. We exemplify our point using three applications: (i) a genome-scale sequence analysis, (ii) a new progressive multiple sequence alignment method, (iii) a cluster analysis of transcriptomic data.

Part V - Data Analysis, Data Mining, and KDD | Pp. 397-408

doi: 10.1007/978-3-540-73560-1_38

Association Rules for Categorical and Tree Data

Henri Ralambondrainy; Jean Diatta

Association rule mining is among the most popular data mining techniques. Association rules, whose significance is measured via quality indices, have been intensively studied for binary data. In this paper, we deal with association rules in the framework of categorical or tree-like-valued attributes.

Part V - Data Analysis, Data Mining, and KDD | Pp. 409-417
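To make the notion of a rule's "quality indices" concrete, here is a minimal sketch of the two most common ones, support and confidence, on categorical records. The abstract does not say which indices the chapter uses, so this choice is an assumption. The record layout (a dict per record) and the function names are hypothetical, and tree-valued attributes are not covered.

```python
def support(records, items):
    """Fraction of records matching every (attribute, value) pair in `items`."""
    matches = sum(
        all(record.get(attr) == value for attr, value in items.items())
        for record in records
    )
    return matches / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    ante = support(records, antecedent)
    if ante == 0:
        return 0.0
    return support(records, {**antecedent, **consequent}) / ante


if __name__ == "__main__":
    # Hypothetical toy data set with two categorical attributes.
    records = [
        {"color": "red", "size": "small"},
        {"color": "red", "size": "large"},
        {"color": "blue", "size": "small"},
        {"color": "red", "size": "small"},
    ]
    print(support(records, {"color": "red"}))
    print(confidence(records, {"color": "red"}, {"size": "small"}))
```

On binary data each item is simply present or absent; with categorical attributes an item becomes an attribute-value pair, which is the only change this sketch makes to the usual definitions.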

doi: 10.1007/978-3-540-73560-1_39

Induction Graphs for Data Mining

Djamel Abdelkader Zighed

Induction graphs, which are a generalization of decision trees, have a special place among the methods of Data Mining. Indeed, they generate lattice graphs instead of trees. They perform well, are capable of handling data in large volumes, are relatively easy for a non-specialist to interpret, and are applicable without restriction to data of any type (qualitative, quantitative). The explosion of software based on the paradigm of decision trees, and more generally induction graphs, is rather strong evidence of their success. In this article, we present a complete induction graph method: SIPINA.

Part V - Data Analysis, Data Mining, and KDD | Pp. 419-430

doi: 10.1007/978-3-540-73560-1_40

Clustering of Molecules: Influence of the Similarity Measures

Samia Aci; Gilles Bisson; Sylvaine Roy; Samuel Wieczorek

In this paper, we present the results of an experimental study analyzing the effect of various similarity (or distance) measures on the clustering quality of a set of molecules. We mainly focused on clustering approaches able to deal directly with the 2D representation of the molecules (i.e., graphs). In such a context, we found that it seems relevant to use an approach based on asymmetrical measures of similarity. Our experiments are carried out on a dataset coming from the High Throughput Screening (HTS) domain.

Part VI - Dissimilarities: Structures and Indices | Pp. 433-444
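As an illustration of what an asymmetrical similarity measure looks like, the sketch below implements the Tversky index on feature sets. This is a swapped-in, well-known example, not the measure studied in the chapter: the abstract works on molecular graphs and does not name its measures. Representing each molecule as a set of substructure identifiers, and the function name, are assumptions.

```python
def tversky(a, b, alpha=1.0, beta=0.0):
    """Tversky index between feature sets a and b.

    S(a, b) = |a & b| / (|a & b| + alpha * |a - b| + beta * |b - a|)
    The measure is asymmetric whenever alpha != beta.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical substructure identifiers for two molecules.
    mol_a = {1, 2, 3}
    mol_b = {2, 3, 4, 5}
    print(tversky(mol_a, mol_b))  # how much of mol_a is found in mol_b
    print(tversky(mol_b, mol_a))  # how much of mol_b is found in mol_a
    print(tversky(mol_a, mol_b, alpha=1.0, beta=1.0))  # symmetric (Jaccard) case
```

With alpha = 1 and beta = 0 the score asks how much of the first molecule is contained in the second, so swapping the arguments generally changes the result; setting alpha = beta = 1 recovers the symmetric Jaccard index.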