Catálogo de publicaciones - libros
Knowledge Discovery in Inductive Databases: 4th International Workshop, KDID 2005, Porto, Portugal, October 3, 2005, Revised Selected and Invited Papers
Francesco Bonchi ; Jean-François Boulicaut (eds.)
En conferencia: 4º International Workshop on Knowledge Discovery in Inductive Databases (KDID) . Porto, Portugal . October 3, 2005 - October 3, 2005
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
No disponibles.
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2006 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-33292-3
ISBN electrónico
978-3-540-33293-0
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2006
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2006
Tabla de contenidos
doi: 10.1007/11733492_1
Data Mining in Inductive Databases
Arno Siebes
Ever since the seminal paper by Imielinski and Mannila [11], inductive databases have been a constant theme in the data mining literature. Operationally, such an inductive database is a database in which models and patterns are first class citizens.
In the extensive literature on inductive databases there is at least one consequence of this operational definition that is conspicuously missing. That is the question: if we have models and patterns in our inductive database, how does this help to discover other models and patterns? This question is the topic of this paper.
- Invited Papers | Pp. 1-23
doi: 10.1007/11733492_2
Mining Databases and Data Streams with Query Languages and Rules
Carlo Zaniolo
Among data-intensive applications that are beyond the reach of traditional Data Base Management Systems (DBMS), data mining stands out because of practical importance and the complexity of the research problems that must be solved before the vision of Inductive DBMS can become a reality. In this paper, we first discuss technical developments that have occurred since the very notion of Inductive DBMS emerged as a result of the seminal papers authored by Imielinski and Mannila a decade ago. The research progress achieved since then can be subdivided into three main problem subareas as follows: (i) language (ii) optimization, and (iii) representation. We discuss the problems in these three areas and the different approaches to Inductive DBMS that are made possible by recent technical advances. Then, we pursue a language-centric solution, and introduce simple SQL extensions that have proven very effective at supporting data mining. Finally, we turn our attention to the related problem of supporting data stream mining using Data Stream Management Systems (DSMS) and introduce the notion of Inductive DSMS. In addition to continuous query languages, DSMS provide support for synopses, sampling, load shedding, and other built-in functions that are needed for data stream mining. Moreover, we show that Inductive DSMS can be achieved by generalizing DSMS to assure that their continuous query languages support efficiently data stream mining applications. Thus, DSMS extended with inductive capabilities will provide a uniquely supportive environment for data stream mining applications.
- Invited Papers | Pp. 24-37
doi: 10.1007/11733492_3
Memory-Aware Frequent -Itemset Mining
Maurizio Atzori; Paolo Mancarella; Franco Turini
In this paper we show that the well known problem of computing frequent -itemsets (i.e. itemsets of cardinality ) in a given dataset can be reduced to the problem of finding iceberg queries from a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent -itemsets can be obtained by adapting algorithms for computing iceberg queries. In the paper we show that, for sparse datasets, this can be done directly, i.e. without generating frequent -itemsets, for each < , as done in the most common algorithms based on a level-wise approach. We exploit a recent algorithm for finding iceberg queries and define an algorithm which requires only three sequential passes over the dataset to compute the frequent -itemsets (even for > 3). An important feature of the algorithm is that the amount of main memory required can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that for very large datasets with millions of small transactions our proposal outperforms the state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
- Contributed Papers | Pp. 38-54
doi: 10.1007/11733492_4
Constraint-Based Mining of Fault-Tolerant Patterns from Boolean Data
Jérémy Besson; Ruggero G. Pensa; Céline Robardet; Jean-François Boulicaut
Thanks to an important research effort during the last few years, inductive queries on local patterns (e.g., set patterns) and their associated complete solvers have been proved extremely useful to support knowledge discovery. The more we use such queries on real-life data, e.g., biological data, the more we are convinced that inductive queries should return fault-tolerant patterns. This is obviously the case when considering formal concept discovery from noisy datasets. Therefore, we study various extensions of this kind of bi-set towards fault-tolerance. We compare three declarative specifications of fault-tolerant bi-sets by means of a constraint-based mining approach. Our framework enables a better understanding of the needed trade-off between extraction feasibility, completeness, relevance, and ease of interpretation of these fault-tolerant patterns. An original empirical evaluation on both synthetic and real-life medical data is given. It enables a comparison of the various proposals and it motivates further directions of research.
- Contributed Papers | Pp. 55-71
doi: 10.1007/11733492_5
Experiment Databases: A Novel Methodology for Experimental Research
Hendrik Blockeel
Data mining and machine learning are experimental sciences: a lot of insight in the behaviour of algorithms is obtained by implementing them and studying how they behave when run on datasets. However, such experiments are often not as extensive and systematic as they ideally would be, and therefore the experimental results must be interpreted with caution. In this paper we present a new experimental methodology that is based on the concept of “experiment databases”. An experiment database can be seen as a special kind of inductive database, and the experimental methodology consists of filling and then querying this database. We show that the novel methodology has numerous advantages over the existing one. As such, this paper presents a novel and interesting application of inductive databases that may have a significant impact on experimental research in machine learning and data mining.
- Contributed Papers | Pp. 72-85
doi: 10.1007/11733492_6
Quick Inclusion-Exclusion
Toon Calders; Bart Goethals
Many data mining algorithms make use of the well-known Inclusion-Exclusion principle. As a consequence, using this principle efficiently is crucial for the success of all these algorithms. Especially in the context of condensed representations, such as NDI, and in computing interesting measures, a quick inclusion-exclusion algorithm can be crucial for the performance. In this paper, we give an overview of several algorithms that depend on the inclusion-exclusion principle and propose an efficient algorithm to use it and evaluate its complexity. The theoretically obtained results are supported by experimental evaluation of the quick IE technique in isolation, and of an example application.
- Contributed Papers | Pp. 86-103
doi: 10.1007/11733492_7
Towards Mining Frequent Queries in Star Schemes
Tao-Yuan Jen; Dominique Laurent; Nicolas Spyratos; Oumar Sy
The problem of mining frequent queries in a database is intractable, even if we consider conjunctive queries only. In this paper, we study this problem under reasonable restrictions on the database, namely: () the database scheme is a star scheme; () the data in the database satisfies a set of functional dependencies and a set of referential constraints.
We note that star schemes are considered to be the most appropriate schemes for data warehouses, while functional dependencies and referential constraints are the most common constraints that one encounters in real databases. Our approach is based on the weak instance semantics of databases and considers the class of selection-projection queries over weak instances. In such a context, we show that frequent queries can be mined using level-wise algorithms such as Apriori.
- Contributed Papers | Pp. 104-123
doi: 10.1007/11733492_8
Inductive Databases in the Relational Model: The Data as the Bridge
Stefan Kramer; Volker Aufschild; Andreas Hapfelmeier; Alexander Jarasch; Kristina Kessler; Stefan Reckow; Jörg Wicker; Lothar Richter
We present a new and comprehensive approach to inductive databases in the relational model. The main contribution is a new inductive query language extending SQL, with the goal of supporting the whole knowledge discovery process, from pre-processing via data mining to post-processing. A prototype system supporting the query language was developed in the SINDBAD (structured inductive database development) project. Setting aside models and focusing on distance-based and instance-based methods, closure can easily be achieved. An example scenario from the area of gene expression data analysis demonstrates the power and simplicity of the concept. We hope that this preliminary work will help to bring the fundamental issues, such as the integration of various pattern domains and data mining techniques, to the attention of the inductive database community.
- Contributed Papers | Pp. 124-138
doi: 10.1007/11733492_9
Transaction Databases, Frequent Itemsets, and Their Condensed Representations
Taneli Mielikäinen
Mining frequent itemsets is a fundamental task in data mining. Unfortunately the number of frequent itemsets describing the data is often too large to comprehend. This problem has been attacked by condensed representations of frequent itemsets that are subcollections of frequent itemsets containing only the frequent itemsets that cannot be deduced from other frequent itemsets in the subcollection, using some deduction rules. In this paper we review the most popular condensed representations of frequent itemsets, study their relationship to transaction databases and each other, examine their combinatorial and computational complexity, and describe their relationship to other important concepts in combinatorial data analysis, such as Vapnik-Chervonenkis dimension and hypergraph transversals.
- Contributed Papers | Pp. 139-164
doi: 10.1007/11733492_10
Multi-class Correlated Pattern Mining
Siegfried Nijssen; Joost N. Kok
To mine databases in which examples are tagged with class labels, the minimum correlation constraint has been studied as an alternative to the minimum frequency constraint. We reformulate previous approaches and show that a minimum correlation constraint can be transformed into a disjunction of minimum frequency constraints. We prove that this observation extends to the multi-class correlation measure, and thus obtain an efficient new () prune test. We illustrate how the relation between correlation measures and minimum support thresholds allows for the reuse of previously discovered pattern sets, thus avoiding unneccessary database evaluations. We conclude with experimental results to assess the effectivity of algorithms based on our observations.
- Contributed Papers | Pp. 165-187