Knowledge Discovery in Inductive Databases: 5th International Workshop, KDID 2006 Berlin, Germany, September 18, 2006 Revised Selected and Invited Papers

Sašo Džeroski; Jan Struyf (eds.)

Conference: 5th International Workshop on Knowledge Discovery in Inductive Databases (KDID). Berlin, Germany. September 18, 2006

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability

Detected institution: Not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

Books

Print ISBN

978-3-540-75548-7

Electronic ISBN

978-3-540-75549-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Value, Cost, and Sharing: Open Issues in Constrained Clustering

Kiri L. Wagstaff

Clustering is an important tool for data mining, since it can identify major patterns or trends without any supervision (i.e., without labeled data). Over the past five years, semi-supervised (constrained) clustering methods have become very popular. These methods began with incorporating pairwise constraints and have developed into more general methods that can learn appropriate distance metrics. However, several important open questions have arisen about which constraints are most useful, how they can be actively acquired, and when and how they should be propagated to neighboring points. This position paper describes these open questions and suggests future directions for constrained clustering research.

- Invited Talk | Pp. 1-10
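
The pairwise constraints mentioned above are usually of two kinds: must-link (two points belong to the same cluster) and cannot-link (two points belong to different clusters). As an illustration only, the sketch below shows a COP-KMeans-style assignment step that sends each point to the nearest centroid whose cluster does not violate a constraint with an already-assigned point. The function names are hypothetical, and the position paper itself does not prescribe this procedure.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
Pairs = Set[Tuple[int, int]]

def violates(i: int, c: int, assign: Dict[int, int], must: Pairs, cannot: Pairs) -> bool:
    """True if putting point i into cluster c breaks a constraint with an already-assigned point."""
    for a, b in must:
        j = b if a == i else a if b == i else None
        if j is not None and j in assign and assign[j] != c:
            return True
    for a, b in cannot:
        j = b if a == i else a if b == i else None
        if j is not None and j in assign and assign[j] == c:
            return True
    return False

def sq_dist(p: Point, q: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))

def constrained_assign(points: Sequence[Point], centroids: Sequence[Point],
                       must: Pairs, cannot: Pairs) -> Optional[List[int]]:
    """Greedy assignment step; returns None when some point has no feasible cluster."""
    assign: Dict[int, int] = {}
    for i, p in enumerate(points):
        # Try clusters from nearest to farthest centroid.
        order = sorted(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for c in order:
            if not violates(i, c, assign, must, cannot):
                assign[i] = c
                break
        else:
            return None
    return [assign[i] for i in range(len(points))]
```

With points `(0, 0), (0.1, 0), (5, 5), (5.1, 5)` and centroids `(0, 0), (5, 5)`, a cannot-link between points 0 and 1 pushes point 1 into the second cluster even though it lies next to point 0.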

Mining Bi-sets in Numerical Data

Jérémy Besson; Céline Robardet; Luc De Raedt; Jean-François Boulicaut

Thanks to an important research effort over the last few years, inductive queries on set patterns, and complete solvers that can evaluate them on large 0/1 data sets, have proved extremely useful. However, in many application domains the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Applying efficient 0/1 mining techniques therefore requires tedious Boolean property-encoding phases. This is the case, e.g., in microarray data mining and its impact on knowledge discovery in molecular biology. We consider mining numerical data directly to extract collections of relevant bi-sets, i.e., pairs of associated sets of objects and attributes that satisfy some user-defined constraints. We not only propose a new pattern domain but also introduce a complete solver for computing the so-called numerical bi-sets. A preliminary experimental validation is given.

- Contributed Papers | Pp. 11-23
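
A brute-force sketch of the bi-set idea follows. It assumes (our assumption, not necessarily the constraint used in the paper) that a bi-set (X, Y) is acceptable when all values in the sub-matrix X × Y lie within a range of width at most `eps`, and it keeps only maximal bi-sets. Because it enumerates every subset of rows and columns, it is only usable on tiny matrices; the paper's complete solver is not reproduced here.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
BiSet = Tuple[FrozenSet[int], FrozenSet[int]]

def spread(m: Matrix, objs: FrozenSet[int], attrs: FrozenSet[int]) -> float:
    vals = [m[o][a] for o in objs for a in attrs]
    return max(vals) - min(vals)

def is_valid(m: Matrix, objs: FrozenSet[int], attrs: FrozenSet[int], eps: float) -> bool:
    # Hypothetical constraint: non-empty, and all values within a band of width eps.
    return bool(objs) and bool(attrs) and spread(m, objs, attrs) <= eps

def nonempty_subsets(n: int) -> Iterator[FrozenSet[int]]:
    for k in range(1, n + 1):
        for s in combinations(range(n), k):
            yield frozenset(s)

def maximal_bisets(m: Matrix, eps: float) -> List[BiSet]:
    """Every valid bi-set that is not strictly contained in another valid bi-set."""
    rows, cols = len(m), len(m[0])
    valid = [(X, Y) for X in nonempty_subsets(rows) for Y in nonempty_subsets(cols)
             if is_valid(m, X, Y, eps)]
    return [(X, Y) for X, Y in valid
            if not any(X <= X2 and Y <= Y2 and (X, Y) != (X2, Y2) for X2, Y2 in valid)]
```

On the matrix `[[1.0, 1.1, 5.0], [0.9, 1.0, 5.2], [4.0, 4.1, 4.0]]` with `eps = 0.3`, this yields three maximal bi-sets: rows {0, 1} with columns {0, 1}, rows {0, 1} with column {2}, and row {2} with all three columns.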

Extending the Soft Constraint Based Mining Paradigm

Stefano Bistarelli; Francesco Bonchi

The paradigm of pattern discovery based on constraints has been recognized as a core technique in inductive querying: constraints give the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of a more efficient computation. So far, research on this paradigm has mainly focused on the latter aspect: the development of efficient algorithms for the evaluation of constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems that limit its practical relevance. In our previous work [5], we analyzed such limitations and showed that they flow from the same source: in classical constraint-based mining, a constraint is a rigid Boolean function that returns either true or false. To overcome such limitations we introduced the new paradigm of pattern discovery based on soft constraints, and instantiated our idea with fuzzy soft constraints. In this paper we extend the framework to deal with probabilistic and weighted soft constraints: we provide a theoretical basis and a detailed experimental analysis. We also discuss a straightforward solution for dealing with top-k queries. Finally, we show how the ideas presented in this paper have been implemented in a real Inductive Database system.

- Contributed Papers | Pp. 24-41
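
Assuming the standard c-semiring formulation of soft constraints (fuzzy: levels in [0, 1], choose with max, combine with min; probabilistic: levels in [0, 1], choose with max, combine with product; weighted: non-negative costs, choose with min, combine with sum), a pattern's satisfaction levels for several constraints can be combined with the semiring's combination operator and compared with a threshold through the semiring order. The sketch is illustrative; the class and function names are hypothetical, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List

@dataclass(frozen=True)
class Semiring:
    name: str
    plus: Callable[[float, float], float]   # selects the better of two levels
    times: Callable[[float, float], float]  # combines levels of several constraints
    zero: float                             # worst level
    one: float                              # best level

FUZZY = Semiring("fuzzy", max, min, 0.0, 1.0)
PROBABILISTIC = Semiring("probabilistic", max, lambda a, b: a * b, 0.0, 1.0)
WEIGHTED = Semiring("weighted", min, lambda a, b: a + b, float("inf"), 0.0)

def combine(s: Semiring, levels: List[float]) -> float:
    """Overall satisfaction level of a pattern w.r.t. all its soft constraints."""
    return reduce(s.times, levels, s.one)

def at_least_as_good(s: Semiring, a: float, b: float) -> bool:
    # Semiring order: a is at least as good as b iff plus(a, b) == a.
    return s.plus(a, b) == a

def accept(s: Semiring, levels: List[float], threshold: float) -> bool:
    """Keep a pattern if its combined level reaches the (hypothetical) threshold."""
    return at_least_as_good(s, combine(s, levels), threshold)
```

For example, a pattern that satisfies two constraints to degrees 0.8 and 0.6 gets level 0.6 under the fuzzy semiring, while costs 3 and 5 add up to 8 under the weighted one, where a lower cost is better.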

On Interactive Pattern Mining from Relational Databases

Francesco Bonchi; Fosca Giannotti; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Roberto Trasarti

In this paper we present a constraint-based querying system devised to support the intrinsically exploratory (i.e., human-guided, interactive, iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language that allows the discovery process to be effectively driven toward potentially interesting patterns. Such constraints are also exploited to reduce the cost of the pattern mining computation. We implemented a comprehensive mining system that can access real-world relational databases from which to extract data. After a preprocessing step, mining queries are answered by an efficient pattern mining engine that employs several data and search space reduction techniques. The resulting patterns are then presented to the user and possibly stored in the database. New user-defined constraints can easily be added to the system to target the particular application at hand.

- Contributed Papers | Pp. 42-62
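
Using constraints to reduce the cost of mining can be illustrated with a levelwise (Apriori-style) miner that prunes candidates violating anti-monotone constraints. The sketch assumes a minimum-support constraint and a hypothetical budget constraint on the total price of an itemset, which is anti-monotone when prices are non-negative. It is not the mining engine of the system described in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def mine(transactions: List[Set[str]], minsup: int,
         price: Dict[str, float], budget: float) -> Dict[FrozenSet[str], int]:
    """Itemsets with support >= minsup and total price <= budget, found level by level."""
    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    def within_budget(itemset: FrozenSet[str]) -> bool:
        # Anti-monotone for non-negative prices: if an itemset exceeds the budget,
        # so do all its supersets, so it can be pruned before counting support.
        return sum(price[i] for i in itemset) <= budget

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if within_budget(frozenset([i]))]
    result: Dict[FrozenSet[str], int] = {}
    while level:
        frequent = []
        for cand in level:
            s = support(cand)
            if s >= minsup:
                result[cand] = s
                frequent.append(cand)
        freq_set = set(frequent)
        k = len(level[0]) + 1
        candidates = set()
        for a, b in combinations(frequent, 2):
            u = a | b
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if len(u) == k and within_budget(u) and all(u - {x} in freq_set for x in u):
                candidates.add(u)
        level = list(candidates)
    return result
```

With prices a = 1, b = 2, c = 5 and a budget of 6, the pair {b, c} is pruned by the budget constraint without ever having its support counted.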

Analysis of Time Series Data with Predictive Clustering Trees

Sašo Džeroski; Valentin Gjorgjioski; Ivica Slavkov; Jan Struyf

Predictive clustering is a general framework that unifies clustering and prediction. This paper investigates how to apply this framework to cluster time series data. The resulting system, Clus-TS, constructs predictive clustering trees (PCTs) that partition a given set of time series into homogeneous clusters. In addition, PCTs provide a symbolic description of the clusters. We evaluate Clus-TS on time series data from microarray experiments. Each data set records the change over time in the expression level of yeast genes as a response to a change in environmental conditions. Our evaluation shows that Clus-TS is able to cluster genes with similar responses, and to predict the time series based on the description of a gene. Clus-TS is part of a larger project where the goal is to investigate how global models can be combined with inductive databases.

- Contributed Papers | Pp. 63-80
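
Predictive clustering trees are typically induced top-down by choosing, at each node, the test that most reduces within-cluster variance. A minimal sketch of that heuristic for time series follows. It assumes Euclidean distance to the point-wise mean series (Clus-TS may use other distance measures) and hypothetical boolean gene descriptors named `f1` and `f2`.

```python
from typing import Dict, List, Sequence, Tuple

Series = Sequence[float]

def prototype(cluster: List[Series]) -> List[float]:
    """Point-wise mean of the time series in a cluster."""
    n = len(cluster)
    return [sum(s[t] for s in cluster) / n for t in range(len(cluster[0]))]

def sq_euclid(a: Series, b: Series) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance(cluster: List[Series]) -> float:
    """Mean squared Euclidean distance of the series to the cluster prototype."""
    if not cluster:
        return 0.0
    p = prototype(cluster)
    return sum(sq_euclid(s, p) for s in cluster) / len(cluster)

def best_split(series: List[Series], features: List[Dict[str, bool]]) -> Tuple[str, float]:
    """Boolean descriptor whose yes/no partition minimises the size-weighted variance."""
    n = len(series)
    best = ("", float("inf"))
    for name in features[0]:
        yes = [s for s, f in zip(series, features) if f[name]]
        no = [s for s, f in zip(series, features) if not f[name]]
        if not yes or not no:
            continue  # a test that does not split the data is useless
        score = (len(yes) * variance(yes) + len(no) * variance(no)) / n
        if score < best[1]:
            best = (name, score)
    return best
```

Applying `best_split` recursively to each side until no test reduces the variance enough gives a tree whose leaves are clusters of series, and whose paths describe those clusters symbolically.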

Integrating Decision Tree Learning into Inductive Databases

Élisa Fromont; Hendrik Blockeel; Jan Struyf

In inductive databases, there is no conceptual difference between data and the models describing the data: both can be stored and queried using some query language. The approach that adheres most strictly to this philosophy is probably the one proposed by Calders et al. (2006): in this approach, models are stored in relational tables and queried using standard SQL. The approach has been described in detail for association rule discovery. In this work, we study how decision tree induction can be integrated into this approach. We propose a representation format for decision trees, similar to the format proposed earlier for association rules and queryable using standard SQL, and we present a prototype system in which part of the needed functionality is implemented. In particular, we have developed an exhaustive tree learning algorithm able to answer a wide range of constrained queries.

- Contributed Papers | Pp. 81-96
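
To illustrate storing models in relational tables and querying them with standard SQL, the sketch below uses Python's built-in `sqlite3` module with a hypothetical schema: each leaf is a row in `leaves`, and the tests on the path to that leaf are stored in `conditions`. This is not the representation format proposed in the paper, and the toy trees have depth one.

```python
import sqlite3
from typing import Dict, List

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE leaves (tree_id INTEGER, leaf_id INTEGER, predicted_class TEXT);
        CREATE TABLE conditions (tree_id INTEGER, leaf_id INTEGER, attr TEXT, val TEXT);
        CREATE TABLE example (attr TEXT, val TEXT);
    """)
    # Tree 1 tests 'outlook' (3 leaves); tree 2 tests 'humidity' (2 leaves).
    conn.executemany("INSERT INTO leaves VALUES (?, ?, ?)", [
        (1, 1, "no"), (1, 2, "yes"), (1, 3, "yes"), (2, 1, "no"), (2, 2, "yes")])
    conn.executemany("INSERT INTO conditions VALUES (?, ?, ?, ?)", [
        (1, 1, "outlook", "sunny"), (1, 2, "outlook", "overcast"),
        (1, 3, "outlook", "rain"),
        (2, 1, "humidity", "high"), (2, 2, "humidity", "normal")])
    return conn

def small_trees(conn: sqlite3.Connection, max_leaves: int) -> List[int]:
    """A structural query over models: trees with at most max_leaves leaves."""
    rows = conn.execute(
        "SELECT tree_id FROM leaves GROUP BY tree_id HAVING COUNT(*) <= ? ORDER BY tree_id",
        (max_leaves,)).fetchall()
    return [r[0] for r in rows]

def predict(conn: sqlite3.Connection, tree_id: int, example: Dict[str, str]) -> List[str]:
    """Class of the leaf whose every path condition is satisfied by the example."""
    conn.execute("DELETE FROM example")
    conn.executemany("INSERT INTO example VALUES (?, ?)", list(example.items()))
    rows = conn.execute("""
        SELECT l.predicted_class FROM leaves l
        WHERE l.tree_id = ? AND NOT EXISTS (
            SELECT 1 FROM conditions c
            WHERE c.tree_id = l.tree_id AND c.leaf_id = l.leaf_id
              AND NOT EXISTS (SELECT 1 FROM example e
                              WHERE e.attr = c.attr AND e.val = c.val))
    """, (tree_id,)).fetchall()
    return [r[0] for r in rows]
```

The double `NOT EXISTS` expresses "no condition of this leaf is unmatched by the example", which is how a conjunction stored as rows can be checked in plain SQL.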

Using a Reinforced Concept Lattice to Incrementally Mine Association Rules from Closed Itemsets

Arianna Gallo; Rosa Meo

In the data mining area, discovering association rules is one of the most important tasks. It is well known that the number of these rules rapidly becomes unwieldy as the frequency requirements become less strict, especially when the collected data is highly correlated or dense. Since a large number of the frequent itemsets turn out to be redundant, it is sufficient to consider only the rules among closed itemsets. To generate them efficiently, it is often essential to know the Concept Lattice, which also allows the user to better understand the relationships between the closed itemsets. We propose an algorithm that mines all the closed itemsets, reading the data only once. The Concept Lattice is incrementally updated using a simple but essential structure directly connected to it. This structure speeds up execution and makes the algorithm applicable to both static and dynamic stream data and to very dense datasets.

- Contributed Papers | Pp. 97-115
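
The notion of a closed itemset can be made concrete with a brute-force sketch: an itemset is closed when it equals the intersection of all transactions that contain it (its closure). This is only a definition-level illustration, not the incremental Concept Lattice algorithm proposed in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def closure(itemset: FrozenSet[str], transactions: List[Set[str]]) -> FrozenSet[str]:
    """Intersection of all transactions that contain the itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # By convention, an itemset contained in no transaction closes to all items.
        return frozenset(set().union(*transactions))
    return frozenset(set.intersection(*covering))

def closed_frequent(transactions: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """All closed itemsets with support >= minsup (exponential; for tiny examples only)."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            sup = sum(1 for t in transactions if x <= t)
            if sup >= minsup and closure(x, transactions) == x:
                result[x] = sup
    return result
```

In the transactions {a, b, c}, {a, b}, {a, c}, {a}, the itemset {b} is not closed because every transaction containing b also contains a; its closure is {a, b}, which has the same support and therefore makes rules built on {b} redundant.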

An Integrated Multi-task Inductive Database VINLEN: Initial Implementation and Early Results

Kenneth A. Kaufman; Ryszard S. Michalski; Jarosław Pietrzykowski; Janusz Wojtusiak

A brief review of current research on the development of VINLEN, a multitask inductive database and decision support system, is presented. The aim of this research is to integrate a wide range of knowledge generation operators in one system that, given input data and relevant domain knowledge, generates new knowledge according to the user's goal. The central VINLEN operator generates hypotheses from data in the form of attributional rules that resemble natural language expressions and are easy to understand and interpret. This operator is illustrated by an application to discovering relationships between lifestyles and diseases of men aged 50-65 in a large database created by the American Medical Association. The conclusion outlines plans for future research.

- Contributed Papers | Pp. 116-133
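
Attributional rules combine conditions such as value ranges and internal disjunctions of values, for example [age = 50..65] & [exercise = low v none]. The sketch below represents such a rule, renders it in that bracketed form, and counts the positive and negative records it covers. The attributes (age, exercise, hypertension) and records are hypothetical; they are not taken from the American Medical Association database or from VINLEN itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

Value = Union[int, str]

@dataclass(frozen=True)
class Range:
    low: int
    high: int
    def holds(self, v: Value) -> bool:
        return self.low <= v <= self.high

@dataclass(frozen=True)
class OneOf:
    values: FrozenSet[str]  # internal disjunction of values
    def holds(self, v: Value) -> bool:
        return v in self.values

@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Tuple[str, Union[Range, OneOf]], ...]  # conjunction of conditions
    consequent: Tuple[str, str]                              # (attribute, value)

    def covers(self, record: Dict[str, Value]) -> bool:
        return all(a in record and c.holds(record[a]) for a, c in self.conditions)

def render(rule: Rule) -> str:
    parts = []
    for attr, cond in rule.conditions:
        if isinstance(cond, Range):
            parts.append(f"[{attr} = {cond.low}..{cond.high}]")
        else:
            parts.append(f"[{attr} = {' v '.join(sorted(cond.values))}]")
    attr, val = rule.consequent
    return " & ".join(parts) + f" => [{attr} = {val}]"

def coverage(rule: Rule, records: List[Dict[str, Value]]) -> Tuple[int, int]:
    """(positives, negatives) among the records the rule's condition part covers."""
    covered = [r for r in records if rule.covers(r)]
    attr, val = rule.consequent
    pos = sum(1 for r in covered if r.get(attr) == val)
    return pos, len(covered) - pos
```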

Beam Search Induction and Similarity Constraints for Predictive Clustering Trees

Dragi Kocev; Jan Struyf; Sašo Džeroski

Much research on inductive databases (IDBs) focuses on local models, such as item sets and association rules. In this work, we investigate how IDBs can support global models, such as decision trees. Our focus is on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm, similar to that of C4.5. We propose a new induction algorithm for PCTs based on beam search. This has three advantages over the regular method: (a) it returns a set of PCTs satisfying the user constraints instead of just one PCT; (b) it better allows for pushing of user constraints into the induction algorithm; and (c) it is less susceptible to myopia. In addition, we propose similarity constraints for PCTs, which improve the diversity of the resulting PCT set.

- Contributed Papers | Pp. 134-151
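
A simplified sketch of beam search over trees follows. Each candidate tree is refined by splitting one leaf on a boolean attribute; trees are scored by total within-leaf squared error (a variance heuristic, lower is better); only the `width` best trees are kept; and a user constraint on the maximum number of leaves is enforced during the search rather than afterwards. All names are hypothetical, and the paper's heuristic and its similarity constraints are not reproduced.

```python
from typing import Dict, FrozenSet, Iterator, List, Sequence, Tuple

Leaf = Tuple[Tuple[str, bool], ...]  # attribute tests on the path from the root to a leaf
Tree = FrozenSet[Leaf]               # a tree is represented by the set of its leaves
Row = Dict[str, bool]

def members(leaf: Leaf, rows: Sequence[Row]) -> List[int]:
    return [i for i, r in enumerate(rows) if all(r[a] == v for a, v in leaf)]

def sse(idx: List[int], y: Sequence[float]) -> float:
    if not idx:
        return 0.0
    mean = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - mean) ** 2 for i in idx)

def score(tree: Tree, rows: Sequence[Row], y: Sequence[float]) -> float:
    return sum(sse(members(leaf, rows), y) for leaf in tree)

def refinements(tree: Tree, rows: Sequence[Row], attrs: List[str]) -> Iterator[Tree]:
    """Trees obtained by splitting one leaf on an attribute not yet tested on its path."""
    for leaf in tree:
        used = {a for a, _ in leaf}
        for a in attrs:
            if a in used:
                continue
            left, right = leaf + ((a, True),), leaf + ((a, False),)
            if members(left, rows) and members(right, rows):  # skip empty leaves
                yield (tree - {leaf}) | {left, right}

def beam_search(rows: Sequence[Row], y: Sequence[float], attrs: List[str],
                width: int, max_leaves: int) -> List[Tuple[float, Tree]]:
    key = lambda t: (score(t, rows, y), len(t), sorted(t))  # deterministic tie-breaking
    beam = {frozenset({()})}  # start from the single-leaf tree
    while True:
        # The size constraint is pushed into the search: oversized trees are never kept.
        cands = {t for tree in beam for t in refinements(tree, rows, attrs)
                 if len(t) <= max_leaves}
        new_beam = set(sorted(beam | cands, key=key)[:width])
        if new_beam == beam:
            break
        beam = new_beam
    return [(score(t, rows, y), t) for t in sorted(beam, key=key)]
```

Unlike a single greedy run, the search returns up to `width` trees, and keeping several partial trees alive can recover splits that only pay off one level deeper.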

Frequent Pattern Mining and Knowledge Indexing Based on Zero-Suppressed BDDs

Shin-ichi Minato; Hiroki Arimura

Frequent pattern mining is one of the fundamental techniques for knowledge discovery and data mining. During the last decade, several efficient algorithms for frequent pattern mining have been presented, but most of them have focused on enumerating the patterns that satisfy the given conditions, treating the storage and indexing of the pattern results for efficient inductive analysis as a separate issue. In this paper, we propose a fast algorithm for extracting all/maximal frequent patterns from transaction databases while simultaneously indexing a huge number of patterns using Zero-suppressed Binary Decision Diagrams (ZBDDs). Our method is comparable in speed to existing state-of-the-art algorithms, and it not only enumerates/lists the patterns but also compactly indexes the output data in main memory. After mining, the pattern results can be analyzed efficiently by using algebraic operations. BDD-based data structures have previously been used successfully in VLSI logic design, but our method is the first practical application of BDD-based techniques in the data mining area.

- Contributed Papers | Pp. 152-169
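
The compactness of ZBDDs comes from two rules: a node whose 1-edge leads to the empty family is removed (zero suppression), and identical sub-graphs are stored only once. The minimal sketch below stores a family of itemsets with union and counting, assuming items are integers and smaller items sit nearer the root. It only shows the data structure, not the paper's mining algorithm; the class and method names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

class ZDD:
    """Minimal zero-suppressed decision diagram over integer items."""
    EMPTY, BASE = 0, 1  # terminals: the empty family {} and the family {∅}

    def __init__(self) -> None:
        self.unique: Dict[Tuple[int, int, int], int] = {}  # (item, lo, hi) -> node id
        self.nodes: Dict[int, Tuple[int, int, int]] = {}
        self.memo: Dict[Tuple[int, int], int] = {}
        self.counts: Dict[int, int] = {}

    def node(self, item: int, lo: int, hi: int) -> int:
        if hi == self.EMPTY:  # zero-suppression rule
            return lo
        key = (item, lo, hi)
        if key not in self.unique:  # share identical sub-graphs
            nid = len(self.nodes) + 2
            self.unique[key] = nid
            self.nodes[nid] = key
        return self.unique[key]

    def singleton(self, itemset: Iterable[int]) -> int:
        """ZDD for the family containing exactly one set."""
        r = self.BASE
        for item in sorted(set(itemset), reverse=True):
            r = self.node(item, self.EMPTY, r)
        return r

    def top(self, n: int) -> float:
        return self.nodes[n][0] if n > self.BASE else float("inf")

    def union(self, a: int, b: int) -> int:
        if a == self.EMPTY or a == b:
            return b
        if b == self.EMPTY:
            return a
        if (a, b) not in self.memo:
            ta, tb = self.top(a), self.top(b)
            if ta < tb:
                i, lo, hi = self.nodes[a]
                r = self.node(i, self.union(lo, b), hi)
            elif tb < ta:
                i, lo, hi = self.nodes[b]
                r = self.node(i, self.union(a, lo), hi)
            else:
                i, lo_a, hi_a = self.nodes[a]
                _, lo_b, hi_b = self.nodes[b]
                r = self.node(i, self.union(lo_a, lo_b), self.union(hi_a, hi_b))
            self.memo[(a, b)] = r
        return self.memo[(a, b)]

    def count(self, n: int) -> int:
        """Number of sets in the family rooted at n."""
        if n <= self.BASE:
            return n
        if n not in self.counts:
            _, lo, hi = self.nodes[n]
            self.counts[n] = self.count(lo) + self.count(hi)
        return self.counts[n]

    def sets(self, n: int) -> List[FrozenSet[int]]:
        if n == self.EMPTY:
            return []
        if n == self.BASE:
            return [frozenset()]
        i, lo, hi = self.nodes[n]
        return self.sets(lo) + [s | {i} for s in self.sets(hi)]
```

Adding the itemsets {1, 2}, {1, 3}, {2, 3} and {1, 2} again with `union` yields a family of three distinct sets, and re-inserting an existing set creates no new nodes.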