Publications catalog - books

Foundations of Data Mining and Knowledge Discovery

Tsau Young Lin ; Setsuo Ohsuga ; Churn-Jung Liau ; Xiaohua Hu ; Shusaku Tsumoto (eds.)

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Theory of Computation; Appl.Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics)

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2005	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26257-2

Electronic ISBN

978-3-540-32408-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic, and information engineering

Table of contents

Check that your institution gives you access to download or request the full book or any of its chapters.

doi: 10.1007/11498186_11

Decision Making Based on Hybrid of Multi-Knowledge and Naïve Bayes Classifier

QingXiang Wu; David Bell; Martin McGinnity; Gongde Guo

In general, knowledge can be represented by a mapping from a hypothesis space to a decision space. Usually, multiple mappings can be obtained from an instance information system. A set of mappings, which are created based on multiple reducts in the instance information system by means of rough set theory, is defined as multi-knowledge in this paper. Uncertain rules are introduced to represent multi-knowledge. A hybrid approach of multi-knowledge and the Naïve Bayes Classifier is proposed to make decisions for unseen instances or for instances with missing attribute values. Data sets from the UCI Machine Learning Repository are used to test this decision-making algorithm. The experimental results show that the decision accuracies for unseen instances are higher than those of other approaches that rely on a single body of knowledge.

Pp. 171-184
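
The abstract above mentions decisions for instances with missing attribute values. The following is a minimal sketch of the Naïve Bayes half only, assuming categorical attributes, Laplace smoothing, and the common convention of skipping attributes whose value is missing (marked None). The rough-set multi-knowledge part is not modelled, and the names train_naive_bayes and predict are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies.

    rows: list of tuples of categorical values; None marks a missing value.
    """
    class_counts = Counter(labels)
    # value_counts[(attribute_index, label)][value] -> count
    value_counts = defaultdict(Counter)
    # distinct observed values per attribute, used for Laplace smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            if value is None:  # missing values contribute no counts
                continue
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the most probable class, ignoring missing attributes in row."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            if value is None:  # skip attributes with no observed value
                continue
            counts = value_counts[(i, label)]
            seen = sum(counts.values())
            # domain size includes the queried value so the denominator is > 0
            size = len(domains[i] | {value})
            # Laplace-smoothed estimate of P(value | label)
            score += math.log((counts[value] + 1) / (seen + size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Skipping a missing attribute amounts to marginalising it out under the independence assumption, which is why no imputation step appears in the sketch.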

doi: 10.1007/11498186_12

First-Order Logic Based Formalism for Temporal Data Mining

Paul Cotofrei; Kilian Stoffel

In this article we define a formalism for a methodology whose purpose is the discovery of knowledge, represented in the form of general Horn clauses, inferred from databases with a temporal dimension. To obtain what we call temporal rules, a discretisation phase that extracts events from raw data is applied first, followed by an induction phase, which constructs classification trees from these events. The theoretical framework we propose, based on first-order temporal logic, permits us to define the main notions (event, temporal rule, constraint) in a formal way. The concept of a consistent linear time structure allows us to introduce the notions of general interpretation and of confidence. These notions open the possibility of using statistical approaches in the design of algorithms for inferring higher-order temporal rules, denoted temporal meta-rules.

Pp. 185-210

doi: 10.1007/11498186_13

An Alternative Approach to Mining Association Rules

Jan Rauch; Milan Šimůnek

An alternative approach to mining association rules is presented. It is based on representing the analysed data by suitable strings of bits. This approach was developed for the GUHA method of mechanising hypothesis formation more than 30 years ago. The 4ft-Miner procedure, a contemporary application of this approach, is described. It mines for various types of association rules, including conditional association rules. The 4ft-Miner procedure is part of the academic system LISp-Miner for KDD research and teaching.

Pp. 211-231
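
To illustrate the bit-string representation mentioned in the abstract above, here is a minimal sketch that packs Boolean columns into Python integers and derives a four-fold contingency table (a, b, c, d) with bitwise operations. It is an assumption-laden illustration, not the 4ft-Miner implementation; the function names are hypothetical.

```python
def to_bits(column):
    """Pack a column of booleans into an int; bit i is set if row i is true."""
    bits = 0
    for i, value in enumerate(column):
        if value:
            bits |= 1 << i
    return bits


def popcount(bits):
    """Number of set bits, i.e. number of rows where the condition holds."""
    return bin(bits).count("1")


def four_fold_table(antecedent, succedent, n_rows):
    """Return (a, b, c, d) for a pair of bit strings over n_rows rows."""
    mask = (1 << n_rows) - 1  # keeps negation within the data rows
    a = popcount(antecedent & succedent)          # both hold
    b = popcount(antecedent & ~succedent & mask)  # antecedent only
    c = popcount(~antecedent & succedent & mask)  # succedent only
    d = popcount(~antecedent & ~succedent & mask) # neither holds
    return a, b, c, d


def confidence(antecedent, succedent):
    """Fraction a / (a + b) of antecedent rows that also satisfy the succedent."""
    covered = popcount(antecedent)
    return popcount(antecedent & succedent) / covered if covered else 0.0
```

A conjunction of several attributes is simply the bitwise AND of their bit strings, so candidate antecedents can be combined without rescanning the rows.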

doi: 10.1007/11498186_14

Direct Mining of Rules from Data with Missing Values

Vladimir Gorodetsky; Oleg Karsaev; Vladimir Samoilov

The paper presents an approach to, and a technique for, direct mining of binary data with missing values, aimed at extracting classification rules whose premises are represented in conjunctive form. This approach does not assume imputation of missing values. The idea is (1) to generate two sets of rules serving as the upper and lower bounds for any other sets of rules corresponding to all arbitrary assignments of missing values, and then (2) to select a subset of rules to be used for classification, based on these upper and lower bounds, on a testing procedure, and on a classification criterion. The approach is primarily oriented to application domains where imputation either cannot be theoretically justified or is impossible. Examples are domains where the information used for classification is composed of asynchronous data streams of various frequencies, and thus with different “lifetimes”, or where such information is missing due to peculiarities of the information collection system. Instead of imputing missing values, the proposed approach uses the training data set to cut down the set of potential rules by forming its lower and upper bounds, then tests the rules of the upper bound against the new data set with missing values and selects the most appropriate rules. The approach was applied to learning intrusion detection in a computer network based on asynchronous data streams arriving from multiple data sources. Experimental results confirm that the proposed approach to direct mining of data with missing values can yield good results.

Pp. 233-264
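
The intuition behind bounding rules over all possible assignments of missing values, as described in the abstract above, can be illustrated with a three-valued match of a conjunctive premise. This sketch only bounds how many instances a single premise covers; it is not the authors' rule-generation procedure, and the names match and coverage_bounds are hypothetical.

```python
def match(premise, instance):
    """Three-valued match of a conjunctive premise against a binary instance.

    premise: dict {attribute_index: required_bit}.
    instance: sequence of 0, 1, or None (missing).
    Returns "certain" if every literal is known and satisfied, "impossible"
    if some known value violates a literal, otherwise "possible".
    """
    unknown = False
    for index, required in premise.items():
        value = instance[index]
        if value is None:
            unknown = True
        elif value != required:
            return "impossible"
    return "possible" if unknown else "certain"


def coverage_bounds(premise, instances):
    """Lower and upper bounds on how many instances satisfy the premise."""
    outcomes = [match(premise, inst) for inst in instances]
    lower = outcomes.count("certain")           # holds under every assignment
    upper = lower + outcomes.count("possible")  # holds under some assignment
    return lower, upper
```

Whatever values the missing entries actually take, the true coverage of the premise lies between the two returned counts.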

doi: 10.1007/11498186_15

Cluster Identification Using Maximum Configuration Entropy

C.H. Li

Clustering is an important task in data mining and machine learning. In this paper, a normalized graph sampling algorithm for clustering is adopted; it improves the clustering solution by incorporating an a priori constraint into a stochastic graph sampling procedure. The important questions of how many clusters exist in the data set and when to terminate the clustering algorithm are addressed by computing the ensemble average change in entropy. Experimental results show the feasibility of the suggested approach.

Pp. 265-276

doi: 10.1007/11498186_16

Mining Small Objects in Large Images Using Neural Networks

Mengjie Zhang

Since the late 1980s, neural networks have been widely applied to data mining. However, they are often criticised and regarded as a “black box” due to the lack of interpretation ability. This chapter describes a domain independent approach to the use of neural networks for mining multiple class, small objects in large images. In this approach, the networks are trained by the back propagation algorithm on examples which have been cut out from the large images. The trained networks are then applied, in a moving window fashion, over the large images to mine the objects of interest. During the mining process, both the classes and locations of the objects are determined. The network behaviour is interpreted by analysing the weights in learned networks. Visualisation of these weights not only gives an intuitive way of representing hidden patterns encoded in learned neural networks for object mining problems, but also shows that neural networks are not just a black box but an expression or a model of hidden patterns discovered in the data mining process.

Pp. 277-303
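
The moving-window step described in the abstract above can be sketched as follows. The scorer here is a toy stand-in rather than a trained neural network, and the function names, the "background" class label, and the threshold are all assumptions made for illustration.

```python
def sweep(image, window, score_fn, threshold, stride=1, background="background"):
    """Slide a square window over a 2-D image and collect detections.

    image: list of equal-length rows of numbers.
    score_fn: stand-in for a trained network; maps a window (list of rows)
              to a dict {class_name: score}.
    Returns a list of (row, col, class_name, score) for windows whose best
    non-background score reaches the threshold; (row, col) is the window's
    top-left corner.
    """
    height, width = len(image), len(image[0])
    detections = []
    for r in range(0, height - window + 1, stride):
        for c in range(0, width - window + 1, stride):
            patch = [row[c:c + window] for row in image[r:r + window]]
            scores = score_fn(patch)
            best = max(scores, key=scores.get)
            if best != background and scores[best] >= threshold:
                detections.append((r, c, best, scores[best]))
    return detections


def mean_brightness_scorer(patch):
    """Toy scorer (not a neural network): score is the mean pixel value."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return {"bright_object": mean, "background": 1.0 - mean}
```

Each detection carries both a class and a location, matching the abstract's point that the two are determined in the same pass.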

doi: 10.1007/11498186_17

Improved Knowledge Mining with the Multimethod Approach

Mitja Lenič; Peter Kokol; Milan Zorman; Petra Povalej; Bruno Stiglic; Ryuichi Yamamoto

Automatic induction from examples has a long tradition and is an important technique in data mining. Through induction, a method builds a hypothesis to explain observed facts. Many knowledge extraction methods have been developed; unfortunately, each has advantages and limitations, and in general no single method outperforms all others on all problems. One possible way to overcome this problem is to combine different methods into one hybrid method. Recent research has mainly focused on specific combinations of methods. In contrast, the multimethod approach combines different induction methods in a unique manner: it applies different methods to the same knowledge model in no predefined order. Each method may have inherent limitations, but the expectation is that the combined methods produce better results. In this paper we present an overview of the idea, a concrete integration, and possible improvements.

Pp. 305-318

doi: 10.1007/11498186_18

Posting Act Tagging Using Transformation-Based Learning

Tianhao Wu; Faisal M. Khan; Todd A. Fisher; Lori A. Shuler; William M. Pottenger

In this article we present the application of transformation-based learning (TBL) [1] to the task of assigning tags to postings in online chat conversations. We define a list of posting tags that have proven useful in chat-conversation analysis. We describe the templates used for posting act tagging in the context of template selection. We extend traditional approaches used in part-of-speech tagging and dialogue act tagging by incorporating regular expressions into our templates. We close with a presentation of results that compare favorably with the application of TBL in dialogue act tagging.

Pp. 319-331
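
As a rough illustration of how regular expressions can serve as triggers in transformation rules, as the abstract above describes, the sketch below applies an ordered list of already-learned rules to chat postings. The learning step that selects and orders rules is omitted, and the tag names, patterns, and the Rule class are hypothetical rather than taken from the chapter.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str   # regular expression tested against the posting text
    from_tag: str  # tag the posting must currently have
    to_tag: str    # tag assigned when the rule fires


def apply_rules(postings, initial_tag, rules):
    """Tag every posting with initial_tag, then apply the rules in order."""
    tags = [initial_tag] * len(postings)
    for rule in rules:  # rule order matters in transformation-based learning
        regex = re.compile(rule.pattern, re.IGNORECASE)
        for i, text in enumerate(postings):
            if tags[i] == rule.from_tag and regex.search(text):
                tags[i] = rule.to_tag
    return tags
```

A full learner would repeatedly pick the rule that most reduces tagging errors on annotated data and append it to the list; only the application phase is shown here.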

doi: 10.1007/11498186_19

Identification of Critical Values in Latent Semantic Indexing

April Kontostathis; William M. Pottenger; Brian D. Davison

In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance, and can thus be removed (changed to zero). This allows us to convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices by identifying and removing those entries. We show empirically that these entries are unimportant by presenting retrieval and runtime performance results on seven collections: removing up to 70% of the values in the term-by-dimension matrix results in similar or improved retrieval performance (as compared to LSI). Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach additionally has the computational benefit of reducing memory requirements and query response time.

Pp. 333-346
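
The idea of zeroing low-impact entries in a dense term-by-dimension matrix, described in the abstract above, can be sketched in a few lines. The abstract does not say how entries are chosen, so this sketch assumes a simple global ranking by absolute value; the matrix is taken as given (the SVD itself is computed elsewhere), and the function names are hypothetical.

```python
def sparsify(matrix, fraction):
    """Zero the given fraction of entries with the smallest absolute value.

    matrix: dense list of rows, e.g. a term-by-dimension matrix from a
            truncated SVD (computed elsewhere).
    Returns a new matrix; the input is left unchanged.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # rank every cell by magnitude, smallest first
    cells = sorted(
        (abs(v), r, c) for r, row in enumerate(matrix) for c, v in enumerate(row)
    )
    to_zero = {(r, c) for _, r, c in cells[: int(len(cells) * fraction)]}
    return [
        [0.0 if (r, c) in to_zero else v for c, v in enumerate(row)]
        for r, row in enumerate(matrix)
    ]


def to_sparse(matrix):
    """Store only the non-zero entries as {(row, col): value}."""
    return {
        (r, c): v
        for r, row in enumerate(matrix)
        for c, v in enumerate(row)
        if v != 0.0
    }
```

Storing only the surviving entries is what yields the memory and query-time savings the abstract refers to.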

doi: 10.1007/11498186_20

Reporting Data Mining Results in a Natural Language

Petr Strossa; Zdeněk Černý; Jan Rauch

An attempt to report the results of data mining in automatically generated natural-language sentences is described. Several types of association rules are introduced. The attempt concerns implicational rules, one of the types presented. Formulation patterns that serve as a generative language model for formulating implicational rules in a natural language are described. AR2NL, an experimental software system that can convert implicational rules into both English and Czech, is presented. Possible applications of the presented principles to other types of association rules are also mentioned.

Pp. 347-361