Publications catalog - books


Data Warehousing and Knowledge Discovery: 7th International Conference, DaWaK 2005, Copenhagen, Denmark, August 22-26, 2005, Proceedings

A Min Tjoa; Juan Trujillo (eds.)

In conference: 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Copenhagen, Denmark. August 22, 2005 - August 26, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28558-8

Electronic ISBN

978-3-540-31732-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Discovering Richer Temporal Association Rules from Interval-Based Data

Edi Winarko; John F. Roddick

Temporal association rule mining promises the ability to discover time-dependent correlations or patterns between events in large volumes of data. To date, most temporal data mining research has focused on events existing at a point in time rather than over a temporal interval. In comparison to static rules, mining with respect to time points provides semantically richer rules. However, accommodating temporal intervals offers rules that are richer still. In this paper we outline a new algorithm to discover frequent temporal patterns and to generate richer interval-based temporal association rules.

- Association Rules | Pp. 315-325

Semantic Query Expansion Combining Association Rules with Ontologies and Information Retrieval Techniques

Min Song; Il-Yeol Song; Xiaohua Hu; Robert Allen

Query expansion techniques are used to find the desired set of query terms to improve retrieval performance. One limitation of query expansion techniques is that a query is often expanded only by the linguistic features of terms. This paper presents a novel semantic query expansion technique that combines association rules with ontologies and information retrieval techniques. We propose to use association rule discovery to find good candidate terms to improve retrieval performance. These candidate terms are automatically derived from collections and added to the original query. Our method differs from others in that 1) it utilizes the semantic as well as linguistic properties of an unstructured text corpus and 2) it makes use of contextual properties of important terms discovered by association rules. Experiments conducted on a subset of TREC collections give quite encouraging results. We achieve from 15.49% to 20.98% improvement in terms of P@20 with TREC5 ad hoc queries.

- Association Rules | Pp. 326-335
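
The P@20 figure quoted in the abstract above is precision at rank 20: the fraction of the top 20 retrieved documents that are relevant. The following is a minimal sketch of how such a metric and a percentage improvement could be computed. It assumes the reported improvement is relative to a baseline run, which the abstract does not state; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes runs that return fewer than k documents.
    return hits / k


def relative_improvement(baseline: float, expanded: float) -> float:
    """Relative gain of the expanded query over the baseline, in percent.

    Assumption: the improvement is measured relative to the baseline score.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (expanded - baseline) / baseline


# Hypothetical toy data (k=5 for brevity), not results from the paper.
relevant = {"d1", "d3", "d5", "d8"}
baseline_run = ["d1", "d2", "d4", "d6", "d7"]
expanded_run = ["d1", "d3", "d2", "d5", "d7"]

p_base = precision_at_k(baseline_run, relevant, k=5)
p_exp = precision_at_k(expanded_run, relevant, k=5)
gain = relative_improvement(p_base, p_exp)
```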

Maintenance of Generalized Association Rules Under Transaction Update and Taxonomy Evolution

Ming-Cheng Tseng; Wen-Yang Lin; Rong Jeng

Mining generalized association rules among items in the presence of taxonomies has been recognized as an important model in data mining. Earlier work on mining generalized association rules ignores the fact that the taxonomies of items cannot be kept static while new transactions are continuously added to the original database. How to effectively update the discovered generalized association rules to reflect database changes under taxonomy evolution and transaction update is a crucial task. In this paper, we examine this problem and propose a novel algorithm, called IDTE, which can incrementally update the discovered generalized association rules when the taxonomy of items evolves as new transactions are inserted into the database. Empirical evaluations show that our algorithm can maintain its performance even with large numbers of incremental transactions and a high degree of taxonomy evolution, and is more than an order of magnitude faster than applying the best generalized association mining algorithms to the whole updated database.

- Association Rules | Pp. 336-345

Prince: An Algorithm for Generating Rule Bases Without Closure Computations

T. Hamrouni; S. Ben Yahia; Y. Slimani

The problem of the relevance and usefulness of extracted association rules is becoming of primary importance, since an overwhelming number of association rules may be derived, even from reasonably sized databases. To overcome this drawback, the extraction of reduced-size generic bases of association rules seems promising. Using the concept of minimal generator, we propose an algorithm, called Prince, allowing a shrewd extraction of generic bases of rules. To this end, Prince builds the partial order. Its originality is that this partial order is maintained between minimal generators and no longer between closed itemsets. A structure is then built, from which the derivation of the generic association rules becomes straightforward. An intensive experimental evaluation, carried out on benchmark sparse and dense datasets, showed that Prince largely outperforms the pioneer level-wise algorithms.

- Association Rules | Pp. 346-355

Efficient Compression of Text Attributes of Data Warehouse Dimensions

Jorge Vieira; Jorge Bernardino; Henrique Madeira

This paper proposes the compression of data in Relational Database Management Systems (RDBMS) using existing text compression algorithms. Although the technique proposed is general, we believe it is particularly advantageous for the compression of medium-size and large dimension tables in data warehouses. In fact, dimensions usually have a high number of text attributes, and a reduction in their size has a big impact on the execution time of queries that join dimensions with fact tables. In general, the high complexity and long execution time of most data warehouse queries make the compression of dimension text attributes (and any text attributes that may exist in the fact table, such as false facts) an effective approach to speed up query response time. The proposed approach has been evaluated using the well-known TPC-H benchmark, and the results show that speed improvements greater than 40% can be achieved for most of the queries.

- Text Processing and Classification | Pp. 356-367

Effectiveness of Document Representation for Classification

Ding-Yi Chen; Xue Li; Zhao Yang Dong; Xia Chen

Conventionally, document classification research focuses on improving the learning capabilities of classifiers. Nevertheless, according to our observation, the effectiveness of classification is limited by the suitability of the document representation. Intuitively, the more features that are used in a representation, the more comprehensively documents are represented. However, if a representation contains too many irrelevant features, the classifier suffers not only from the curse of high dimensionality but also from overfitting. To address this problem of the suitability of document representations, we present a classifier-independent approach to measure the effectiveness of document representations. Our approach utilises a labelled document corpus to estimate the distribution of documents in the feature space. By looking at documents in this way, we can clearly identify the contributions made by different features toward document classification. Some experiments have been performed to show how the effectiveness is evaluated. Our approach can be used as a tool to assist feature selection, dimensionality reduction and document classification.

- Text Processing and Classification | Pp. 368-377

2-PS Based Associative Text Classification

Tieyun Qian; Yuanzhen Wang; Hao Long; Jianlin Feng

Recent studies reveal that associative classification can achieve higher accuracy than traditional approaches. The main drawback of this approach is that it generates a huge number of rules, which makes it difficult to select a subset of rules for accurate classification. In this study, we propose a novel association-based approach especially suitable for text classification. The approach first builds a classifier through a 2-PS (Two-Phase) method. The first phase prunes rules locally, i.e., rules mined within every category are pruned by a sentence-level constraint, which makes the rules more semantically correlated and less redundant. In the second phase, all the remaining rules are compared and selected with a global view, i.e., training examples from different categories are merged together to evaluate these rules. Moreover, when labeling a new document, the multiple sentence-level appearances of a rule are taken into account. Experimental results on well-known text corpora show that our method can achieve higher accuracy than many well-known methods. In addition, the performance study shows that our method is quite efficient in comparison with other classification methods.

- Text Processing and Classification | Pp. 378-387

Intrusion Detection via Analysis and Modelling of User Commands

Matthew Gebski; Raymond K. Wong

Since computers have become a mainstay of everyday life, techniques and methods for detecting intrusions as well as protecting systems and data from unwanted parties have received significant attention recently. We focus on detecting improper use of computer systems through the analysis of user command data. Our approach looks at the structure of the commands used and generates a model which can be used to test new commands. This is accompanied by an analysis of the performance of the proposed approach. Although we focus on commands, the techniques presented in this paper can be extended to allow analysis of other data, such as system calls.

- Miscellaneous Applications | Pp. 388-397

Dynamic Schema Navigation Using Formal Concept Analysis

Jon Ducrou; Bastian Wormuth; Peter Eklund

This paper introduces a framework for relational schema navigation via a Web-based browser application that uses Formal Concept Analysis as the metaphor for analysis and interaction. Formal Concept Analysis is a rich framework for data analysis based on applied lattice and order theory. The application we develop is intended to provide users untrained in Formal Concept Analysis with practical and intuitive access to the core functionality of Formal Concept Analysis for the purpose of exploring relational database schema. It is a Web-based information systems architecture that supports natural search processes over a preexisting database schema and its content.

- Miscellaneous Applications | Pp. 398-407
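
As background to the abstract above, Formal Concept Analysis pairs a set of objects (the extent) with the set of attributes they all share (the intent), such that each determines the other. The sketch below enumerates the formal concepts of a tiny context by brute force. It is only an illustration of the general technique, not the paper's application: the context (table names mapped to column names) and all function names are hypothetical, and the enumeration is exponential in the number of objects, so it suits toy inputs only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical formal context: table name -> set of column names.
context: Dict[str, Set[str]] = {
    "customer": {"id", "name", "email"},
    "employee": {"id", "name", "salary"},
    "product": {"id", "price"},
}


def shared_attributes(objects: Set[str]) -> FrozenSet[str]:
    """Attributes common to every object in the set (all attributes if empty)."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return frozenset(attrs)


def objects_having(attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Objects that possess every attribute in the set."""
    return frozenset(obj for obj, own in context.items() if attrs <= own)


def formal_concepts() -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = shared_attributes(set(subset))
            extent = objects_having(intent)  # closure of the chosen subset
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts()
```

Ordering these concepts by inclusion of their extents yields the concept lattice that a navigation tool could let users browse.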

FMC: An Approach for Privacy Preserving OLAP

Ming Hua; Shouzhi Zhang; Wei Wang; Haofeng Zhou; Baile Shi

Preserving private information while providing thorough analysis is one of the significant issues in OLAP systems. One of the challenges is to prevent sensitive values from being inferred from more aggregated non-sensitive data. This paper presents a novel algorithm, FMC, that eliminates the inference problem by hiding additional data besides the sensitive information itself, and proves that this additional information is both necessary and sufficient. Thus, this approach can provide as much information as possible to users while preserving security. The strategy does not affect the online performance of the OLAP system. Systematic analysis and experimental comparison are provided to show the effectiveness and feasibility of FMC.

- Security and Privacy Issues | Pp. 408-417