Publications catalog - books
Foundations of Data Mining and Knowledge Discovery
Tsau Young Lin ; Setsuo Ohsuga ; Churn-Jung Liau ; Xiaohua Hu ; Shusaku Tsumoto (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Theory of Computation; Appl.Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-26257-2
Electronic ISBN
978-3-540-32408-9
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin/Heidelberg 2005
Table of contents
doi: 10.1007/11498186_11
Decision Making Based on Hybrid of Multi-Knowledge and Naïve Bayes Classifier
QingXiang Wu; David Bell; Martin McGinnity; Gongde Guo
In general, knowledge can be represented by a mapping from a hypothesis space to a decision space. Usually, multiple mappings can be obtained from an instance information system. A set of mappings, created from multiple reducts in the instance information system by means of rough set theory, is defined as multi-knowledge in this paper. Uncertain rules are introduced to represent multi-knowledge. A hybrid approach combining multi-knowledge and the Naïve Bayes Classifier is proposed to make decisions for unseen instances or for instances with missing attribute values. Data sets from the UCI Machine Learning Repository are used to test this decision-making algorithm. The experimental results show that the decision accuracies for unseen instances are higher than those obtained by other approaches that rely on a single body of knowledge.
Pp. 171-184
doi: 10.1007/11498186_12
First-Order Logic Based Formalism for Temporal Data Mining
Paul Cotofrei; Kilian Stoffel
In this article we define a formalism for a methodology whose purpose is the discovery of knowledge, represented in the form of general Horn clauses, inferred from databases with a temporal dimension. To obtain what we call temporal rules, a discretisation phase that extracts events from raw data is applied first, followed by an induction phase, which constructs classification trees from these events. The theoretical framework we propose, based on first-order temporal logic, permits us to define the main notions (event, temporal rule, constraint) in a formal way. The concept of a consistent linear time structure allows us to introduce the notions of general interpretation and of confidence. These notions open the possibility of using statistical approaches in the design of algorithms for inferring higher-order temporal rules, denoted temporal meta-rules.
Pp. 185-210
doi: 10.1007/11498186_13
An Alternative Approach to Mining Association Rules
Jan Rauch; Milan Šimůnek
An alternative approach to mining association rules is presented. It is based on representing the analysed data by suitable strings of bits. This approach was developed for the GUHA method of mechanising hypothesis formation more than 30 years ago. The 4ft-Miner procedure, a contemporary application of this approach, is described. It mines for various types of association rules, including conditional association rules. The 4ft-Miner procedure is part of the academic system LISp-Miner for KDD research and teaching.
Pp. 211-231
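The bit-string representation mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the 4ft-Miner implementation: the toy data, the function names, and the thresholds are assumptions made for illustration. Each condition such as `smoker == "yes"` becomes an integer whose i-th bit is set when row i satisfies it, so the four frequencies of a 2x2 contingency table (a, b, c, d) come from bitwise operations and bit counts. The final check follows the usual form of a founded implication (confidence a/(a+b) at least p, support a at least Base), stated here as an assumption about how such a rule would be tested.

```python
# Hypothetical toy data: one dict per object; attribute names are illustrative only.
rows = [
    {"smoker": "yes", "disease": "yes"},
    {"smoker": "yes", "disease": "yes"},
    {"smoker": "yes", "disease": "no"},
    {"smoker": "no", "disease": "no"},
    {"smoker": "no", "disease": "yes"},
    {"smoker": "yes", "disease": "yes"},
]


def bitstring(rows, attr, value):
    """Encode the condition attr == value as an int; bit i is set if row i matches."""
    bits = 0
    for i, row in enumerate(rows):
        if row[attr] == value:
            bits |= 1 << i
    return bits


def popcount(bits):
    """Number of set bits in a non-negative int."""
    return bin(bits).count("1")


def four_fold_table(ante, succ, n):
    """Return (a, b, c, d) for antecedent/succedent bit strings over n rows."""
    mask = (1 << n) - 1          # keeps negations within the n row positions
    not_ante = ~ante & mask
    not_succ = ~succ & mask
    a = popcount(ante & succ)          # antecedent and succedent both hold
    b = popcount(ante & not_succ)      # antecedent holds, succedent does not
    c = popcount(not_ante & succ)      # succedent holds, antecedent does not
    d = popcount(not_ante & not_succ)  # neither holds
    return a, b, c, d


def founded_implication(a, b, p, base):
    """Assumed rule test: support a >= base and confidence a/(a+b) >= p."""
    return a >= base and (a + b) > 0 and a / (a + b) >= p


ante = bitstring(rows, "smoker", "yes")
succ = bitstring(rows, "disease", "yes")
a, b, c, d = four_fold_table(ante, succ, len(rows))
print(a, b, c, d, founded_implication(a, b, p=0.7, base=2))
```

Because every condition is computed once as a bit string, conjunctions of conditions reduce to a bitwise AND of precomputed integers, which is the practical appeal of this representation.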
doi: 10.1007/11498186_14
Direct Mining of Rules from Data with Missing Values
Vladimir Gorodetsky; Oleg Karsaev; Vladimir Samoilov
The paper presents an approach to, and a technique for, direct mining of binary data with missing values, aimed at extracting classification rules whose premises are represented in conjunctive form. This approach does not assume imputation of missing values. The idea is (1) to generate two sets of rules serving as the upper and lower bounds for any other sets of rules corresponding to arbitrary assignments of the missing values, and then (2) to select a subset of rules to be used for classification, based on these upper and lower bounds, a testing procedure, and a classification criterion. The approach is primarily oriented to application domains where imputation either cannot be theoretically justified or is impossible altogether. Examples are domains where the information used for classification is composed of asynchronous data streams of various frequencies, and thus with different “lifetimes”, or where such information is missing due to peculiarities of the information collection system. Instead of imputing missing values, the proposed approach uses the training dataset to cut down the set of potential rules by forming its lower and upper bounds, then tests the rules of the upper bound against the new dataset with missing values and selects the most appropriate rules. The approach was applied to learning intrusion detection in a computer network based on asynchronous data streams arriving from multiple data sources. Experimental results confirm that the proposed approach to direct mining of data with missing values can yield good results.
Pp. 233-264
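The idea of bounding a rule over all possible assignments of missing values, as described in the abstract above, can be illustrated with a small Python sketch. This is not the authors' algorithm. It only shows, for one conjunctive premise over binary attributes, how a pessimistic and an optimistic coverage count bracket every possible completion of the data. The data, the attribute names `x1` and `x2`, and the function name are hypothetical.

```python
# Hypothetical binary data; None marks a missing value.
rows = [
    {"x1": 1, "x2": 1},
    {"x1": 1, "x2": None},
    {"x1": 0, "x2": 1},
    {"x1": None, "x2": None},
    {"x1": 1, "x2": 0},
]


def coverage_bounds(rows, premise):
    """Return (lower, upper) counts of rows satisfying a conjunctive premise.

    premise maps attribute name -> required 0/1 value.
    lower counts rows that satisfy the premise whatever the missing values are;
    upper counts rows that satisfy it under at least one assignment.
    """
    lower = 0
    upper = 0
    for row in rows:
        definitely = all(row[attr] == val for attr, val in premise.items())
        possibly = all(
            row[attr] is None or row[attr] == val for attr, val in premise.items()
        )
        lower += definitely
        upper += possibly
    return lower, upper


print(coverage_bounds(rows, {"x1": 1, "x2": 1}))
```

Any concrete filling-in of the `None` entries yields a coverage between the two returned counts, which is the sense in which they act as bounds in this sketch.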
doi: 10.1007/11498186_15
Cluster Identification Using Maximum Configuration Entropy
C.H. Li
Clustering is an important task in data mining and machine learning. In this paper, we adopt a normalized graph sampling algorithm for clustering that improves the clustering solution by incorporating an a priori constraint into a stochastic graph sampling procedure. The important question of how many clusters exist in the dataset, and when to terminate the clustering algorithm, is solved by computing the ensemble average change in entropy. Experimental results show the feasibility of the suggested approach.
Pp. 265-276
doi: 10.1007/11498186_16
Mining Small Objects in Large Images Using Neural Networks
Mengjie Zhang
Since the late 1980s, neural networks have been widely applied to data mining. However, they are often criticised and regarded as a “black box” due to the lack of interpretation ability. This chapter describes a domain independent approach to the use of neural networks for mining multiple class, small objects in large images. In this approach, the networks are trained by the back propagation algorithm on examples which have been cut out from the large images. The trained networks are then applied, in a moving window fashion, over the large images to mine the objects of interest. During the mining process, both the classes and locations of the objects are determined. The network behaviour is interpreted by analysing the weights in learned networks. Visualisation of these weights not only gives an intuitive way of representing hidden patterns encoded in learned neural networks for object mining problems, but also shows that neural networks are not just a black box but an expression or a model of hidden patterns discovered in the data mining process.
Pp. 277-303
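The moving-window step described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's system: `toy_classifier` is a hypothetical stand-in for a trained network, the image is a small list of lists, and the window advances one pixel at a time. Each hit records the top-left corner of the window together with the predicted class, so both class and location come out of the sweep.

```python
def sweep(image, size, classify):
    """Slide a size x size window over image; return (row, col, label) hits.

    row and col are the top-left corner of the window; classify returns a
    class label, or None when the window contains only background.
    """
    hits = []
    height = len(image)
    width = len(image[0])
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            window = [line[c:c + size] for line in image[r:r + size]]
            label = classify(window)
            if label is not None:
                hits.append((r, c, label))
    return hits


def toy_classifier(window):
    """Hypothetical stand-in for a trained network: flags all-ones windows."""
    total = sum(sum(line) for line in window)
    return "bright" if total == len(window) ** 2 else None


# Hypothetical 5x5 image with one 2x2 object at rows 1-2, columns 2-3.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(sweep(image, 2, toy_classifier))
```

In a real setting the classifier would be the network trained on cut-out examples, and the window size would match the size of those cut-outs.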
doi: 10.1007/11498186_17
Improved Knowledge Mining with the Multimethod Approach
Mitja Lenič; Peter Kokol; Milan Zorman; Petra Povalej; Bruno Stiglic; Ryuichi Yamamoto
Automatic induction from examples has a long tradition and represents an important technique used in data mining. Through induction, a method builds a hypothesis to explain observed facts. Many knowledge extraction methods have been developed; unfortunately, each has advantages and limitations, and in general no single method outperforms all others on all problems. One possible approach to overcoming this problem is to combine different methods in one hybrid method. Recent research has mainly focused on specific combinations of methods. In contrast, the multimethod approach combines different induction methods in a unique manner: it applies different methods to the same knowledge model in no predefined order, where each method may have inherent limitations, with the expectation that the combined methods may produce better results. In this paper we present an overview of the idea, a concrete integration, and possible improvements.
Pp. 305-318
doi: 10.1007/11498186_18
Posting Act Tagging Using Transformation-Based Learning
Tianhao Wu; Faisal M. Khan; Todd A. Fisher; Lori A. Shuler; William M. Pottenger
In this article we present the application of transformation-based learning (TBL) [1] to the task of assigning tags to postings in online chat conversations. We define a list of posting tags that have proven useful in chat-conversation analysis. We describe the templates used for posting act tagging in the context of template selection. We extend traditional approaches used in part-of-speech tagging and dialogue act tagging by incorporating regular expressions into our templates. We close with a presentation of results that compare favorably with the application of TBL in dialogue act tagging.
Pp. 319-331
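The application phase of transformation-based learning with regular-expression conditions, as outlined in the abstract above, might look like the following Python sketch. It covers only how an ordered list of already-learned rules is applied, not how the rules are learned. The tag names (`Statement`, `Question`, `Greet`), the two rules, and the sample postings are hypothetical and are not taken from the chapter.

```python
import re

# Each hypothetical rule: (compiled pattern, tag to change from, tag to change to).
RULES = [
    (re.compile(r"\?\s*$"), "Statement", "Question"),
    (re.compile(r"^(hi|hello|hey)\b", re.IGNORECASE), "Statement", "Greet"),
]


def tag_postings(postings, rules, initial_tag="Statement"):
    """Assign an initial tag to every posting, then apply the rules in order.

    A rule fires on a posting when the posting currently carries the rule's
    from-tag and its text matches the rule's regular expression.
    """
    tags = [initial_tag] * len(postings)
    for pattern, from_tag, to_tag in rules:
        for i, text in enumerate(postings):
            if tags[i] == from_tag and pattern.search(text):
                tags[i] = to_tag
    return tags


postings = ["hello all", "anyone seen the update?", "it shipped yesterday"]
print(tag_postings(postings, RULES))
```

Rule order matters: a posting retagged by an earlier rule no longer carries the from-tag that a later rule expects, which is the characteristic behaviour of an ordered transformation list.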
doi: 10.1007/11498186_19
Identification of Critical Values in Latent Semantic Indexing
April Kontostathis; William M. Pottenger; Brian D. Davison
In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance, and can thus be removed (changed to zero). This allows us to convert the dense term by dimension and document by dimension matrices into sparse matrices by identifying and removing those entries. We empirically show that these entries are unimportant by presenting retrieval and runtime performance results, using seven collections, which show that removal of up to 70% of the values in the term by dimension matrix results in similar or improved retrieval performance (as compared to LSI). Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach additionally has the computational benefit of reducing memory requirements and query response time.
Pp. 333-346
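The sparsification step described in the abstract above (setting a fraction of the matrix values to zero and then storing the result sparsely) can be sketched in plain Python. The abstract does not say how the removable entries are identified, so this sketch assumes the simplest criterion, smallest absolute value first. The matrix values and function names are hypothetical.

```python
def sparsify(matrix, fraction):
    """Zero out about `fraction` of the entries, smallest magnitudes first.

    Assumed criterion: entries whose absolute value is at or below the
    k-th smallest magnitude are removed, so ties can remove slightly more.
    """
    magnitudes = sorted(abs(v) for row in matrix for v in row)
    k = int(len(magnitudes) * fraction)
    if k == 0:
        return [row[:] for row in matrix]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(v) <= threshold else v for v in row] for row in matrix]


def to_sparse(matrix):
    """Store only the non-zero entries as {(row, col): value}."""
    return {
        (i, j): v
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0.0
    }


# Hypothetical dense term-by-dimension matrix (3 terms, 3 dimensions).
term_by_dim = [
    [0.9, -0.05, 0.3],
    [0.02, 0.7, -0.1],
    [-0.6, 0.01, 0.08],
]

pruned = sparsify(term_by_dim, 0.7)
print(to_sparse(pruned))
```

Storing only the surviving entries is what yields the memory and query-time savings mentioned in the abstract, since the zeroed entries no longer need to be stored or multiplied.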
doi: 10.1007/11498186_20
Reporting Data Mining Results in a Natural Language
Petr Strossa; Zdeněk Černý; Jan Rauch
An attempt to report results of data mining in automatically generated natural language sentences is described. Several types of association rules are introduced. The presented attempt concerns implicational rules – one of the presented types. Formulation patterns that serve as a generative language model for formulating implicational rules in a natural language are described. An experimental software system AR2NL that can convert implicational rules both into English and Czech is presented. Possibilities of application of the presented principles to other types of association rules are also mentioned.
Pp. 347-361