Publications catalogue - books
Foundations and Novel Approaches in Data Mining
Tsau Young Lin ; Setsuo Ohsuga ; Churn-Jung Liau ; Xiaohua Hu (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Appl. Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics)
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28315-7
Electronic ISBN
978-3-540-31229-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin/Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11539827_1
Commonsense Causal Modeling in the Data Mining Context
Lawrence J. Mazlack
Commonsense causal reasoning is important to human reasoning. Causality itself, as well as human understanding of causality, is imprecise, sometimes necessarily so. Causal reasoning plays an essential role in commonsense human decision-making. A difficulty is striking a good balance between precise formalism and commonsense imprecise reality. Today, data mining holds the promise of extracting unsuspected information from very large databases. The most common methods build rules. In many ways, the interest in rules is that they offer the promise (or illusion) of causal, or at least predictive, relationships. However, the most common rule form (association rules) only calculates a joint occurrence frequency; it does not express a causal relationship. Without understanding the underlying causality in rules, a naïve use of association rules can lead to undesirable actions. This paper explores the commonsense representation of causality in large data sets.
Pp. 1-22
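As a minimal illustration of the point above (not taken from the chapter), the following sketch computes the support and confidence of an association rule over a toy transaction set; both numbers measure co-occurrence only, so they cannot distinguish causation from correlation.

```python
# Support and confidence of an association rule A -> B over a toy
# transaction set. Both are pure co-occurrence statistics: they say
# nothing about whether A causes B.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent), estimated from joint frequencies."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# {bread} -> {butter}: high support and confidence, yet the numbers
# alone cannot tell causation from mere co-occurrence.
print(support({"bread", "butter"}, transactions))      # 0.75
print(confidence({"bread"}, {"butter"}, transactions))  # 1.0
```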
doi: 10.1007/11539827_2
Definability of Association Rules in Predicate Calculus
Jan Rauch
Observational calculi are special logical calculi in which statements concerning observed data can be formulated. A special case is the predicate observational calculus, which can be obtained by modifying classical predicate calculus: only finite models are allowed and generalised quantifiers are added. Association rules can be understood as special formulas of predicate observational calculi. Such association rules correspond to general relations of two Boolean attributes. The possibility of expressing association rules by means of classical predicate calculus is investigated, and a reasonable criterion for the classical definability of association rules is presented.
Pp. 23-40
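For a concrete flavour of an association rule evaluated as a generalised quantifier over a finite model, the sketch below uses the four-fold table of two Boolean attributes together with the "founded implication" quantifier known from Rauch's earlier work on observational calculi; the quantifier choice, thresholds, and counts are illustrative assumptions, not taken from this chapter.

```python
# An association rule between Boolean attributes phi and psi is evaluated
# on the four-fold table (a, b, c, d) of a finite model:
#   a = #(phi & psi), b = #(phi & ~psi), c = #(~phi & psi), d = #(~phi & ~psi).
# The "founded implication" quantifier holds iff
#   a >= base  and  a / (a + b) >= p.

def founded_implication(a, b, c, d, p=0.9, base=20):
    return a >= base and a / (a + b) >= p

print(founded_implication(a=45, b=3, c=10, d=42))  # True: 45/48 ~ 0.94
print(founded_implication(a=15, b=1, c=10, d=74))  # False: fails a >= base
```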
doi: 10.1007/11539827_3
A Measurement-Theoretic Foundation of Rule Interestingness Evaluation
Yiyu Yao; Yaohua Chen; Xuedong Yang
Many measures have been proposed and studied extensively in data mining for evaluating the interestingness (or usefulness) of discovered rules. They are usually defined based on structural characteristics of, or statistical information about, the rules. The meaningfulness of each measure has been interpreted based either on intuitive arguments or on mathematical properties. There has been no framework in which one can represent the user's judgment explicitly, precisely, and formally. Since the usefulness of discovered rules must eventually be judged by users, a framework that takes user preference or judgment into consideration would be very valuable. The objective of this paper is to propose such a framework based on the notion of user preference. The results are useful in establishing a measurement-theoretic foundation of rule interestingness evaluation.
Pp. 41-59
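The measurement-theoretic condition at the heart of such a framework is that a numeric measure should preserve the user's preference order on rules. A small hypothetical sketch (rule names, scores, and preference pairs are all assumptions for illustration):

```python
# A real-valued measure m represents a user preference relation > on
# rules when r1 > r2 implies m(r1) > m(r2). Here we check that
# condition on a few observed preference pairs.

def represents(preferred_pairs, m):
    """True if m is order-preserving on the observed preference pairs."""
    return all(m[r1] > m[r2] for r1, r2 in preferred_pairs)

user_prefers = [("r1", "r2"), ("r2", "r3")]           # user: r1 > r2 > r3
confidence_scores = {"r1": 0.95, "r2": 0.80, "r3": 0.60}
lift_scores       = {"r1": 1.10, "r2": 1.40, "r3": 0.90}

print(represents(user_prefers, confidence_scores))  # True: order-preserving
print(represents(user_prefers, lift_scores))        # False: disagrees on (r1, r2)
```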
doi: 10.1007/11539827_4
Statistical Independence as Linear Dependence in a Contingency Table
Shusaku Tsumoto
A contingency table summarizes the conditional frequencies of two attributes and shows how the two attributes depend on each other. It is thus a fundamental tool for pattern discovery with conditional probabilities, such as rule discovery. In this paper, a contingency table is interpreted from the viewpoint of granular computing. The first important observation is that a contingency table compares two attributes with respect to the number of their equivalence classes. The second is that matrix algebra is the key to analysing this table; in particular, the degree of independence, the rank, plays a very important role in extracting a probabilistic model from a given contingency table.
Pp. 61-73
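The rank observation is easy to reproduce: for a two-attribute contingency table, statistical independence means every row is proportional to every other, i.e. the matrix has rank 1. A minimal sketch with assumed counts:

```python
import numpy as np

# Statistical independence as linear dependence: for a contingency table
# N with N[i, j] = count(A = a_i, B = b_j), the attributes are
# statistically independent exactly when all rows are proportional,
# i.e. when rank(N) == 1.

independent = np.array([[20, 40],    # rows proportional: 20/10 == 40/20
                        [10, 20]])
dependent   = np.array([[30,  5],
                        [ 5, 30]])

print(np.linalg.matrix_rank(independent))  # 1 -> independent
print(np.linalg.matrix_rank(dependent))    # 2 -> dependent
```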
doi: 10.1007/11539827_5
Foundations of Classification
J. T. Yao; Y. Y. Yao; Y. Zhao
Classification is one of the main tasks in machine learning, data mining, and pattern recognition. A granular computing model is suggested for two basic issues: concept formation and concept relationship identification. A classification problem can be considered as a search for suitable granules organized under a partial order. The structure of the search space, solutions to a consistent classification problem, and the structure of the solution space are discussed. A classification rule induction method is proposed: instead of searching for a suitable partition, we concentrate on the search for a suitable covering of the given universe. This method is more general than partition-based methods. For the design of covering-granule selection heuristics, several measures on granules are suggested.
Pp. 75-97
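A greedy sketch in the spirit of covering-based rule induction (the data, attribute names, and selection heuristic are illustrative assumptions, not the authors' algorithm): granules are attribute-value pairs, and consistent granules are selected until they cover the universe.

```python
# Toy universe: (object description, class label) pairs.
universe = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True},  "stay"),
    ({"outlook": "rain",  "windy": False}, "play"),
    ({"outlook": "rain",  "windy": True},  "stay"),
]

def granules(universe):
    """All (attribute, value) granules with the set of objects each covers."""
    out = {}
    for i, (obj, _) in enumerate(universe):
        for av in obj.items():
            out.setdefault(av, set()).add(i)
    return out

def consistent(covered, universe):
    """A granule yields a rule candidate if all covered objects share one label."""
    return len({universe[i][1] for i in covered}) == 1

uncovered = set(range(len(universe)))
rules = []
while uncovered:
    # Greedy heuristic: the consistent granule covering most uncovered objects.
    av, covered = max(
        ((av, c) for av, c in granules(universe).items() if consistent(c, universe)),
        key=lambda item: len(item[1] & uncovered),
    )
    rules.append((av, universe[next(iter(covered))][1]))
    uncovered -= covered

print(rules)  # [(('windy', False), 'play'), (('windy', True), 'stay')]
```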
doi: 10.1007/11539827_6
Data Mining as Generalization: A Formal Model
Ernestina Menasalvas; Anita Wasilewska
The model we present here formalizes the definition of Data Mining as a process of information generalization. In the model, Data Mining algorithms are defined as generalization operators. We show that only three generalization operators, a classification operator, a clustering operator, and an association operator, are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. The framework of the model makes it possible to formally describe hybrid systems: the combination of classifiers into multi-classifiers, and the combination of clustering with classification.
Pp. 99-126
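A minimal typing sketch of this view (an assumption for illustration, not the authors' formalism): each operator maps a set of records to a more general description, and a hybrid system composes such operators.

```python
from collections import Counter
from typing import Callable, Iterable

Record = dict[str, str]
Model = object

# A generalization operator maps concrete records to a coarser description.
GeneralizationOp = Callable[[Iterable[Record]], Model]

def classification_op(records: Iterable[Record]) -> Model:
    """Trivial classification operator: generalize to the majority class."""
    return Counter(r["class"] for r in records).most_common(1)[0][0]

def multi_classifier(ops: list[GeneralizationOp], records: list[Record]) -> list[Model]:
    """Hybrid system from the abstract: combine classifiers into a multi-classifier."""
    return [op(records) for op in ops]

data = [{"class": "+"}, {"class": "+"}, {"class": "-"}]
print(multi_classifier([classification_op], data))  # ['+']
```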
doi: 10.1007/11539827_7
SVM-OD: SVM Method to Detect Outliers
Jiaqi Wang; Chengqi Zhang; Xindong Wu; Hongwei Qi; Jue Wang
Outlier detection is an important task in data mining because outliers can be either useful knowledge or noise. Many statistical methods have been applied to detect outliers, but they usually assume a given distribution of the data and have difficulty with high-dimensional data. The Statistical Learning Theory (SLT) established by Vapnik et al. provides a new way to overcome these drawbacks. Based on SLT, Schölkopf et al. proposed the ν-Support Vector Machine (ν-SVM) and applied it to detect outliers. However, it is still difficult for data mining users to decide on one key parameter in ν-SVM. This paper proposes a new SVM method to detect outliers, SVM-OD, which avoids this parameter. We provide a theoretical analysis based on SLT as well as experiments to verify the effectiveness of our method. Moreover, an experiment on synthetic data shows that SVM-OD can detect some local outliers near a cluster with a given distribution, while ν-SVM cannot do that.
Pp. 129-141
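The ν-SVM baseline the paper starts from can be tried with scikit-learn's OneClassSVM (the standard ν-SVM for novelty detection, not the authors' SVM-OD code); the nu parameter, which upper-bounds the fraction of training points treated as outliers, is precisely the parameter SVM-OD is designed to avoid. Data below is synthetic and assumed.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(95, 2)),    # one Gaussian cluster
               rng.uniform(-6, 6, size=(5, 2))])  # a few scattered points

# nu upper-bounds the outlier fraction; choosing it is the hard part.
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X)
labels = clf.predict(X)  # +1 inlier, -1 outlier
print(int((labels == -1).sum()), "points flagged as outliers")
```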
doi: 10.1007/11539827_8
Extracting Rules from Incomplete Decision Systems: System ERID
Agnieszka Dardzińska; Zbigniew W. Raś
We present a new method for extracting rules from incomplete Information Systems (IS), which are generalizations of the information systems introduced by Pawlak [7]: we allow a set of weighted attribute values, instead of a single value, to describe an object in an IS. The proposed strategy has some similarities with the system LERS [3]. It is a bottom-up strategy, guided by two threshold values (minimum support and minimum confidence), that generates sets of weighted objects with descriptions of minimal length. The algorithm starts by identifying sets of objects with descriptions of length one (single attribute values). Sets that satisfy both thresholds are used for constructing rules and are marked as successful. Sets whose number of supporting objects is below the support threshold are marked as unsuccessful. Pairs of descriptions of all remaining (unmarked) sets are used to construct new sets of weighted objects with descriptions of length two, and the process continues recursively with descriptions of length k. In [10], [1], ERID is used as a null-value imputation tool for a knowledge-discovery-based Chase algorithm.
Pp. 143-153
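A crisp, unweighted sketch of the bottom-up loop just described (ERID itself works with sets of weighted attribute values; data, thresholds, and the target class are assumptions for illustration):

```python
from itertools import combinations

rows = [
    {"color": "red",  "size": "big",   "class": "+"},
    {"color": "red",  "size": "small", "class": "+"},
    {"color": "blue", "size": "big",   "class": "-"},
    {"color": "red",  "size": "big",   "class": "+"},
]
MIN_SUP, MIN_CONF, TARGET = 2, 0.9, "+"

def stats(desc):
    """Support and confidence of the rule `desc => class = TARGET`."""
    covered = [r for r in rows if all(r[a] == v for a, v in desc)]
    sup = len(covered)
    conf = sum(r["class"] == TARGET for r in covered) / sup if sup else 0.0
    return sup, conf

# Length-1 descriptions: single (attribute, value) pairs.
seeds = [((a, v),) for a in ("color", "size") for v in {r[a] for r in rows}]
rules, pending = [], []
for desc in seeds:
    sup, conf = stats(desc)
    if sup < MIN_SUP:
        continue                 # unsuccessful: below support, dropped
    (rules if conf >= MIN_CONF else pending).append(desc)

# Unmarked length-1 descriptions are paired into length-2 candidates.
for d1, d2 in combinations(pending, 2):
    if d1[0][0] == d2[0][0]:
        continue                 # skip pairing the same attribute twice
    sup, conf = stats(d1 + d2)
    if sup >= MIN_SUP and conf >= MIN_CONF:
        rules.append(d1 + d2)

print(rules)  # [(('color', 'red'),)] plus any length-2 rules
```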
doi: 10.1007/11539827_9
Mining for Patterns Based on Contingency Tables by KL-Miner – First Experience
Jan Rauch; Milan Šimůnek; Václav Lín
A new data mining procedure called KL-Miner is presented. The procedure mines for various patterns based on the evaluation of two-dimensional contingency tables, including patterns of a statistical or information-theoretic nature. The procedure is a result of the continued development of the academic KDD system LISp-Miner.
Pp. 155-167
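Two measures of the kind such a procedure can evaluate on a two-dimensional contingency table, one statistical and one information-theoretic, computed on assumed counts (the procedure's actual pattern syntax is richer):

```python
import numpy as np
from scipy.stats import chi2_contingency

# A two-dimensional contingency table of two categorical attributes.
table = np.array([[30, 10],
                  [ 5, 55]])

# Statistical measure: chi-square test of independence.
chi2, p, dof, expected = chi2_contingency(table)

# Information-theoretic measure: mutual information of the attributes, in bits.
joint = table / table.sum()
px = joint.sum(axis=1, keepdims=True)
py = joint.sum(axis=0, keepdims=True)
mi = np.nansum(joint * np.log2(joint / (px * py)))

print(f"chi2 = {chi2:.2f}, p = {p:.2g}, MI = {mi:.3f} bits")
```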
doi: 10.1007/11539827_10
Knowledge Discovery in Fuzzy Databases Using Attribute-Oriented Induction
Rafal A. Angryk; Frederick E. Petry
In this paper we analyze an attribute-oriented induction technique for the discovery of generalized knowledge from large data repositories. We employ a fuzzy relational database as the medium carrying the original information, where the lack of precise information about an entity can be reflected via multiple attribute values, and the classical equivalence relation is replaced with a fuzzy proximity relation. Following a well-known approach to exact data generalization in ordinary databases [1], we propose three ways in which the original methodology can be implemented in the environment of fuzzy databases. During our investigation we point out both the advantages and the disadvantages of the developed tactics when applied to mining knowledge from fuzzy tuples.
Pp. 169-196
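For reference, a sketch of the classical crisp attribute-oriented induction step that the chapter generalizes to fuzzy databases: attribute values are replaced by higher-level concepts from a hierarchy, and identical generalized tuples are merged with a count. The hierarchy and tuples are hypothetical.

```python
from collections import Counter

# Concept hierarchy: city -> region (one generalization level).
hierarchy = {"Paris": "Europe", "Lyon": "Europe",
             "Tokyo": "Asia", "Osaka": "Asia"}

tuples = [("Paris", "PhD"), ("Lyon", "PhD"), ("Tokyo", "MSc"),
          ("Osaka", "MSc"), ("Paris", "PhD")]

# Generalize each tuple, then merge duplicates while keeping a count.
generalized = Counter((hierarchy[city], degree) for city, degree in tuples)
for (region, degree), votes in generalized.items():
    print(region, degree, "count =", votes)
# Europe PhD count = 3
# Asia MSc count = 2
```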