Publications catalogue - books
Foundations and Novel Approaches in Data Mining
Tsau Young Lin ; Setsuo Ohsuga ; Churn-Jung Liau ; Xiaohua Hu (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Appl. Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics)
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28315-7
Electronic ISBN
978-3-540-31229-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin/Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11539827_1
Commonsense Causal Modeling in the Data Mining Context
Lawrence J. Mazlack
Commonsense causal reasoning is important to human reasoning. Causality itself, as well as human understanding of causality, is imprecise, sometimes necessarily so. Causal reasoning plays an essential role in commonsense human decision-making. A difficulty is striking a good balance between precise formalism and commonsense imprecise reality. Today, data mining holds the promise of extracting unsuspected information from very large databases. The most common methods build rules. In many ways, the interest in rules is that they offer the promise (or illusion) of causal, or at least predictive, relationships. However, the most common rule form (association rules) only calculates a joint occurrence frequency; it does not express a causal relationship. Without understanding the underlying causality in rules, a naïve use of association rules can lead to undesirable actions. This paper explores the commonsense representation of causality in large data sets.
Pp. 1-22
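As a minimal illustration of the point above (not taken from the chapter), the following sketch computes the support and confidence of an association rule over a toy transaction set; both numbers measure co-occurrence only, so they cannot distinguish causation from correlation.

```python
# Support and confidence of an association rule A -> B over a toy
# transaction set. Both are pure co-occurrence statistics: they say
# nothing about whether A causes B.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent), estimated from joint frequencies."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# {bread} -> {butter}: high support and confidence, yet the numbers
# alone cannot tell causation from mere co-occurrence.
print(support({"bread", "butter"}, transactions))      # 0.75
print(confidence({"bread"}, {"butter"}, transactions))  # 1.0
```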
doi: 10.1007/11539827_2
Definability of Association Rules in Predicate Calculus
Jan Rauch
Observational calculi are special logical calculi in which statements concerning observed data can be formulated. A special case is the predicate observational calculus, which can be obtained by modifying classical predicate calculus: only finite models are allowed and generalised quantifiers are added. Association rules can be understood as special formulas of predicate observational calculi. Such association rules correspond to general relations of two Boolean attributes. The possibility of expressing association rules by means of classical predicate calculus is investigated, and a reasonable criterion for the classical definability of association rules is presented.
Pp. 23-40
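For a concrete flavour of an association rule evaluated as a generalised quantifier over a finite model, the sketch below uses the four-fold table of two Boolean attributes together with the "founded implication" quantifier known from Rauch's earlier work on observational calculi; the quantifier choice, thresholds, and counts are illustrative assumptions, not taken from this chapter.

```python
# An association rule between Boolean attributes phi and psi is evaluated
# on the four-fold table (a, b, c, d) of a finite model:
#   a = #(phi & psi), b = #(phi & ~psi), c = #(~phi & psi), d = #(~phi & ~psi).
# The "founded implication" quantifier holds iff
#   a >= base  and  a / (a + b) >= p.

def founded_implication(a, b, c, d, p=0.9, base=20):
    return a >= base and a / (a + b) >= p

print(founded_implication(a=45, b=3, c=10, d=42))  # True: 45/48 ~ 0.94
print(founded_implication(a=15, b=1, c=10, d=74))  # False: fails a >= base
```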
doi: 10.1007/11539827_3
A Measurement-Theoretic Foundation of Rule Interestingness Evaluation
Yiyu Yao; Yaohua Chen; Xuedong Yang
Many measures have been proposed and studied extensively in data mining for evaluating the interestingness (or usefulness) of discovered rules. They are usually defined based on structural characteristics of, or statistical information about, the rules. The meaningfulness of each measure has been interpreted based either on intuitive arguments or on mathematical properties. There has been no framework in which one can represent the user's judgment explicitly, precisely, and formally. Since the usefulness of discovered rules must eventually be judged by users, a framework that takes user preference or judgment into consideration would be very valuable. The objective of this paper is to propose such a framework based on the notion of user preference. The results are useful in establishing a measurement-theoretic foundation of rule interestingness evaluation.
Pp. 41-59
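The measurement-theoretic condition at the heart of such a framework is that a numeric measure should preserve the user's preference order on rules. A small hypothetical sketch (rule names, scores, and preference pairs are all assumptions for illustration):

```python
# A real-valued measure m represents a user preference relation > on
# rules when r1 > r2 implies m(r1) > m(r2). Here we check that
# condition on a few observed preference pairs.

def represents(preferred_pairs, m):
    """True if m is order-preserving on the observed preference pairs."""
    return all(m[r1] > m[r2] for r1, r2 in preferred_pairs)

user_prefers = [("r1", "r2"), ("r2", "r3")]           # user: r1 > r2 > r3
confidence_scores = {"r1": 0.95, "r2": 0.80, "r3": 0.60}
lift_scores       = {"r1": 1.10, "r2": 1.40, "r3": 0.90}

print(represents(user_prefers, confidence_scores))  # True: order-preserving
print(represents(user_prefers, lift_scores))        # False: disagrees on (r1, r2)
```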
doi: 10.1007/11539827_4
Statistical Independence as Linear Dependence in a Contingency Table
Shusaku Tsumoto
A contingency table summarizes the conditional frequencies of two attributes and shows how the two attributes depend on each other. It is thus a fundamental tool for pattern discovery with conditional probabilities, such as rule discovery. In this paper, a contingency table is interpreted from the viewpoint of granular computing. The first important observation is that a contingency table compares two attributes with respect to the number of their equivalence classes. The second is that matrix algebra is the key to analysing this table; in particular, the degree of independence, the rank, plays a very important role in extracting a probabilistic model from a given contingency table.
Pp. 61-73
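The rank observation is easy to reproduce: for a two-attribute contingency table, statistical independence means every row is proportional to every other, i.e. the matrix has rank 1. A minimal sketch with assumed counts:

```python
import numpy as np

# Statistical independence as linear dependence: for a contingency table
# N with N[i, j] = count(A = a_i, B = b_j), the attributes are
# statistically independent exactly when all rows are proportional,
# i.e. when rank(N) == 1.

independent = np.array([[20, 40],    # rows proportional: 20/10 == 40/20
                        [10, 20]])
dependent   = np.array([[30,  5],
                        [ 5, 30]])

print(np.linalg.matrix_rank(independent))  # 1 -> independent
print(np.linalg.matrix_rank(dependent))    # 2 -> dependent
```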
doi: 10.1007/11539827_5
Foundations of Classification
J. T. Yao; Y. Y. Yao; Y. Zhao
Classification is one of the main tasks in machine learning, data mining, and pattern recognition. A granular computing model is suggested for two basic issues: concept formation and concept relationship identification. A classification problem can be considered as a search for suitable granules organized under a partial order. The structure of the search space, solutions to a consistent classification problem, and the structure of the solution space are discussed. A classification rule induction method is proposed: instead of searching for a suitable partition, we concentrate on the search for a suitable covering of the given universe. This method is more general than partition-based methods. For the design of covering-granule selection heuristics, several measures on granules are suggested.
Pp. 75-97
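A greedy sketch in the spirit of covering-based rule induction (the data, attribute names, and selection heuristic are illustrative assumptions, not the authors' algorithm): granules are attribute-value pairs, and consistent granules are selected until they cover the universe.

```python
# Toy universe: (object description, class label) pairs.
universe = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True},  "stay"),
    ({"outlook": "rain",  "windy": False}, "play"),
    ({"outlook": "rain",  "windy": True},  "stay"),
]

def granules(universe):
    """All (attribute, value) granules with the set of objects each covers."""
    out = {}
    for i, (obj, _) in enumerate(universe):
        for av in obj.items():
            out.setdefault(av, set()).add(i)
    return out

def consistent(covered, universe):
    """A granule yields a rule candidate if all covered objects share one label."""
    return len({universe[i][1] for i in covered}) == 1

uncovered = set(range(len(universe)))
rules = []
while uncovered:
    # Greedy heuristic: the consistent granule covering most uncovered objects.
    av, covered = max(
        ((av, c) for av, c in granules(universe).items() if consistent(c, universe)),
        key=lambda item: len(item[1] & uncovered),
    )
    rules.append((av, universe[next(iter(covered))][1]))
    uncovered -= covered

print(rules)  # [(('windy', False), 'play'), (('windy', True), 'stay')]
```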
doi: 10.1007/11539827_6
Data Mining as Generalization: A Formal Model
Ernestina Menasalvas; Anita Wasilewska
The model we present here formalizes the definition of Data Mining as a process of information generalization. In the model, Data Mining algorithms are defined as generalization operators. We show that only three generalization operators, a classification operator, a clustering operator, and an association operator, are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. The framework of the model makes it possible to formally describe hybrid systems: the combination of classifiers into multi-classifiers, and the combination of clustering with classification.
Pp. 99-126
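A minimal typing sketch of this view (an assumption for illustration, not the authors' formalism): each operator maps a set of records to a more general description, and a hybrid system composes such operators.

```python
from collections import Counter
from typing import Callable, Iterable

Record = dict[str, str]
Model = object

# A generalization operator maps concrete records to a coarser description.
GeneralizationOp = Callable[[Iterable[Record]], Model]

def classification_op(records: Iterable[Record]) -> Model:
    """Trivial classification operator: generalize to the majority class."""
    return Counter(r["class"] for r in records).most_common(1)[0][0]

def multi_classifier(ops: list[GeneralizationOp], records: list[Record]) -> list[Model]:
    """Hybrid system from the abstract: combine classifiers into a multi-classifier."""
    return [op(records) for op in ops]

data = [{"class": "+"}, {"class": "+"}, {"class": "-"}]
print(multi_classifier([classification_op], data))  # ['+']
```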
doi: 10.1007/11539827_7
SVM-OD: SVM Method to Detect Outliers
Jiaqi Wang; Chengqi Zhang; Xindong Wu; Hongwei Qi; Jue Wang
Outlier detection is an important task in data mining because outliers can be either useful knowledge or noise. Many statistical methods have been applied to detect outliers, but they usually assume a given distribution of the data and have difficulty with high-dimensional data. The Statistical Learning Theory (SLT) established by Vapnik et al. provides a new way to overcome these drawbacks. Based on SLT, Schölkopf et al. proposed the ν-Support Vector Machine (ν-SVM) and applied it to detect outliers. However, it is still difficult for data mining users to decide on one key parameter in ν-SVM. This paper proposes a new SVM method to detect outliers, SVM-OD, which avoids this parameter. We provide a theoretical analysis based on SLT as well as experiments to verify the effectiveness of our method. Moreover, an experiment on synthetic data shows that SVM-OD can detect some local outliers near a cluster with a given distribution, while ν-SVM cannot do that.
Pp. 129-141
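The ν-SVM baseline the paper starts from can be tried with scikit-learn's OneClassSVM (the standard ν-SVM for novelty detection, not the authors' SVM-OD code); the nu parameter, which upper-bounds the fraction of training points treated as outliers, is precisely the parameter SVM-OD is designed to avoid. Data below is synthetic and assumed.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(95, 2)),    # one Gaussian cluster
               rng.uniform(-6, 6, size=(5, 2))])  # a few scattered points

# nu upper-bounds the outlier fraction; choosing it is the hard part.
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X)
labels = clf.predict(X)  # +1 inlier, -1 outlier
print(int((labels == -1).sum()), "points flagged as outliers")
```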
doi: 10.1007/11539827_8
Extracting Rules from Incomplete Decision Systems: System ERID
Agnieszka Dardzińska; Zbigniew W. Raś
We present a new method for extracting rules from incomplete Information Systems (IS), which are generalizations of the information systems introduced by Pawlak [7]: we allow a set of weighted attribute values, instead of a single value, to describe an object in an IS. The proposed strategy has some similarities with the system LERS [3]. It is a bottom-up strategy, guided by two threshold values (minimum support and minimum confidence), that generates sets of weighted objects with descriptions of minimal length. The algorithm starts by identifying sets of objects with descriptions of length one (single attribute values). Sets that satisfy both thresholds are used for constructing rules and are marked as successful. Sets whose number of supporting objects is below the support threshold are marked as unsuccessful. Pairs of descriptions of all remaining (unmarked) sets are used to construct new sets of weighted objects with descriptions of length two, and the process continues recursively with descriptions of length k. In [10], [1], ERID is used as a null-value imputation tool for a knowledge-discovery-based Chase algorithm.
Pp. 143-153
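A crisp, unweighted sketch of the bottom-up loop just described (ERID itself works with sets of weighted attribute values; data, thresholds, and the target class are assumptions for illustration):

```python
from itertools import combinations

rows = [
    {"color": "red",  "size": "big",   "class": "+"},
    {"color": "red",  "size": "small", "class": "+"},
    {"color": "blue", "size": "big",   "class": "-"},
    {"color": "red",  "size": "big",   "class": "+"},
]
MIN_SUP, MIN_CONF, TARGET = 2, 0.9, "+"

def stats(desc):
    """Support and confidence of the rule `desc => class = TARGET`."""
    covered = [r for r in rows if all(r[a] == v for a, v in desc)]
    sup = len(covered)
    conf = sum(r["class"] == TARGET for r in covered) / sup if sup else 0.0
    return sup, conf

# Length-1 descriptions: single (attribute, value) pairs.
seeds = [((a, v),) for a in ("color", "size") for v in {r[a] for r in rows}]
rules, pending = [], []
for desc in seeds:
    sup, conf = stats(desc)
    if sup < MIN_SUP:
        continue                 # unsuccessful: below support, dropped
    (rules if conf >= MIN_CONF else pending).append(desc)

# Unmarked length-1 descriptions are paired into length-2 candidates.
for d1, d2 in combinations(pending, 2):
    if d1[0][0] == d2[0][0]:
        continue                 # skip pairing the same attribute twice
    sup, conf = stats(d1 + d2)
    if sup >= MIN_SUP and conf >= MIN_CONF:
        rules.append(d1 + d2)

print(rules)  # [(('color', 'red'),)] plus any length-2 rules
```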
doi: 10.1007/11539827_9
Mining for Patterns Based on Contingency Tables by KL-Miner – First Experience
Jan Rauch; Milan Šimůnek; Václav Lín
A new data mining procedure called KL-Miner is presented. The procedure mines for various patterns based on the evaluation of two-dimensional contingency tables, including patterns of a statistical or information-theoretic nature. The procedure is a result of the continued development of the academic KDD system LISp-Miner.
Pp. 155-167
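Two measures of the kind such a procedure can evaluate on a two-dimensional contingency table, one statistical and one information-theoretic, computed on assumed counts (the procedure's actual pattern syntax is richer):

```python
import numpy as np
from scipy.stats import chi2_contingency

# A two-dimensional contingency table of two categorical attributes.
table = np.array([[30, 10],
                  [ 5, 55]])

# Statistical measure: chi-square test of independence.
chi2, p, dof, expected = chi2_contingency(table)

# Information-theoretic measure: mutual information of the attributes, in bits.
joint = table / table.sum()
px = joint.sum(axis=1, keepdims=True)
py = joint.sum(axis=0, keepdims=True)
mi = np.nansum(joint * np.log2(joint / (px * py)))

print(f"chi2 = {chi2:.2f}, p = {p:.2g}, MI = {mi:.3f} bits")
```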
doi: 10.1007/11539827_10
Knowledge Discovery in Fuzzy Databases Using Attribute-Oriented Induction
Rafal A. Angryk; Frederick E. Petry
In this paper we analyze an attribute-oriented induction technique for the discovery of generalized knowledge from large data repositories. We employ a fuzzy relational database as the medium carrying the original information, where the lack of precise information about an entity can be reflected via multiple attribute values, and the classical equivalence relation is replaced with a fuzzy proximity relation. Following a well-known approach to exact data generalization in ordinary databases [1], we propose three ways in which the original methodology can be implemented in the environment of fuzzy databases. During our investigation we point out both the advantages and the disadvantages of the developed tactics when applied to mining knowledge from fuzzy tuples.
Pp. 169-196
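For reference, a sketch of the classical crisp attribute-oriented induction step that the chapter generalizes to fuzzy databases: attribute values are replaced by higher-level concepts from a hierarchy, and identical generalized tuples are merged with a count. The hierarchy and tuples are hypothetical.

```python
from collections import Counter

# Concept hierarchy: city -> region (one generalization level).
hierarchy = {"Paris": "Europe", "Lyon": "Europe",
             "Tokyo": "Asia", "Osaka": "Asia"}

tuples = [("Paris", "PhD"), ("Lyon", "PhD"), ("Tokyo", "MSc"),
          ("Osaka", "MSc"), ("Paris", "PhD")]

# Generalize each tuple, then merge duplicates while keeping a count.
generalized = Counter((hierarchy[city], degree) for city, degree in tuples)
for (region, degree), votes in generalized.items():
    print(region, degree, "count =", votes)
# Europe PhD count = 3
# Asia MSc count = 2
```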