Data Warehousing and Knowledge Discovery: 7th International Conference, DaWaK 2005, Copenhagen, Denmark, August 22-26, 2005, Proceedings

A Min Tjoa ; Juan Trujillo (eds.)

7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Copenhagen, Denmark. August 22, 2005 - August 26, 2005

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

A Tree Comparison Approach to Detect Changes in Data Warehouse Structures

Johann Eder; Christian Koncilia; Karl Wiggisser

We present a technique for discovering and representing changes between versions of data warehouse structures. We select a tree comparison algorithm, adapt it to the particularities of multidimensional data structures, and extend it with a module for detecting node renamings. The result of these algorithms is a set of so-called edit scripts consisting of transformation operations which, when executed in sequence, transform the earlier version into the later one and thus show the relationships between the elements of different versions of data warehouse structures. This procedure helps data warehouse administrators register changes. We describe a prototypical implementation of the concept which imports multidimensional structures from Hyperion Essbase data warehouses, compares these versions, and generates a list of differences.
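The idea of an edit script can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each dimension member has a unique name, represents a version as a hypothetical child-to-parent map, and does not attempt rename detection or sibling ordering. The names `edit_script` and `apply_script` are illustrative only.

```python
def edit_script(old, new):
    """Naive diff of two child -> parent maps (not the paper's algorithm)."""
    ops = []
    # Members that exist only in the later version.
    for node in sorted(new.keys() - old.keys()):
        ops.append(("insert", node, new[node]))
    # Members present in both versions but attached to a different parent.
    for node in sorted(old.keys() & new.keys()):
        if old[node] != new[node]:
            ops.append(("move", node, new[node]))
    # Members that exist only in the earlier version.
    for node in sorted(old.keys() - new.keys()):
        ops.append(("delete", node))
    return ops


def apply_script(old, ops):
    """Replay the operations in sequence on a copy of the earlier version."""
    result = dict(old)
    for op in ops:
        if op[0] == "delete":
            del result[op[1]]
        else:  # "insert" and "move" both set the member's parent
            result[op[1]] = op[2]
    return result


# Hypothetical geography dimension in two versions.
old_version = {"All": None, "Europe": "All", "Denmark": "Europe", "Vienna": "Europe"}
new_version = {"All": None, "Europe": "All", "Denmark": "Europe",
               "Austria": "Europe", "Vienna": "Austria"}

script = edit_script(old_version, new_version)
```

Replaying `script` on `old_version` with `apply_script` yields `new_version`, which is the property the abstract describes for edit scripts.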

Data Warehouse I

Extending the UML for Designing Association Rule Mining Models for Data Warehouses

José Jacobo Zubcoff; Juan Trujillo

Association rules (AR) are one of the most popular data mining techniques for searching databases for frequently occurring patterns. In this paper, we present a novel approach to accomplish the conceptual design of data warehouses together with data mining association rules, allowing us to implement the association rules defined in the conceptual modeling phase. The great advantage of our approach is that the association rules are specified from the early stages of a data warehouse project and are based on the main end-user requirements and data warehouse goals, instead of being specified on the final database implementation structures such as tables, rows or columns. Finally, to show the benefit of our approach, we implement the specified association rules on a commercial data warehouse management server.

Data Warehouse I

Event-Feeded Dimension Solution

Tho Manh Nguyen; Jaromir Nemec; Martin Windisch

From the point of view of a data warehouse system, the part that collects and receives information from other systems is crucial for all subsequent business intelligence applications. The incoming information can generally be classified into two types: state-snapshot data, and state-change or event data (usually called transactional data), which contains information about the change processes applied to the instances of information objects. On the way towards active data warehouses, it becomes more important to provide complete data with minimal latency. In this paper, we focus on dimensional data provided by any data-master application. The information is transferred via messages containing the change information of the dimension instances. The receiving data warehouse system is able to validate the event messages, reconstruct the complete history of the dimension, and provide a readily applicable “comprehensive slowly changing dimension” (cSCD) interface for well-performing queries on the historical and current state of the dimension. A prototype implementation of “active integration” of a data warehouse is proposed.
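Reconstructing a dimension's history from change events can be sketched as follows. This is an illustrative sketch only, not the cSCD interface from the paper: the event layout `(effective_date, key, attributes)` and the function names `build_history` and `state_as_of` are assumptions, and no message validation is performed.

```python
from datetime import date


def build_history(events):
    """Turn change events into validity intervals per dimension key.

    events: iterable of (effective_date, key, attributes) tuples (assumed layout).
    Returns rows with valid_from / valid_to, where valid_to None means current.
    """
    history = []
    open_rows = {}  # key -> row that is currently valid for that key
    for effective, key, attrs in sorted(events, key=lambda e: (e[0], e[1])):
        previous = open_rows.get(key)
        if previous is not None:
            if previous["attrs"] == attrs:
                continue  # event carries no actual change; skip it
            previous["valid_to"] = effective  # close the previous interval
        row = {"key": key, "attrs": attrs, "valid_from": effective, "valid_to": None}
        history.append(row)
        open_rows[key] = row
    return history


def state_as_of(history, key, when):
    """Return the attributes valid for `key` on date `when`, or None."""
    for row in history:
        if (row["key"] == key and row["valid_from"] <= when
                and (row["valid_to"] is None or when < row["valid_to"])):
            return row["attrs"]
    return None


# Hypothetical change events for one customer dimension instance.
events = [
    (date(2005, 6, 1), "C1", {"city": "Copenhagen"}),
    (date(2005, 1, 1), "C1", {"city": "Vienna"}),
]
history = build_history(events)
```

With the history in this form, queries on both the historical and the current state reduce to an interval lookup, as `state_as_of` shows.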

Data Warehouse I

XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses

Byung-Kwon Park; Hyoil Han; Il-Yeol Song

Recently, a large number of XML documents have become available on the Internet. This trend has motivated many researchers to analyze them multi-dimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses where all fact data as well as dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating text constituting the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse. We use XML-MDX queries, which demonstrate that XML-OLAP is effective for multi-dimensionally analyzing the U.S. patents.

Data Warehouse I

Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates

Alkis Simitsis; Panos Vassiliadis; Manolis Terrovitis; Spiros Skiadopoulos

Extract-Transform-Load (ETL) workflows are data-centric workflows responsible for transferring, cleaning, and loading data from their respective sources to the warehouse. In this paper, we build upon existing graph-based modeling techniques that treat ETL workflows as graphs by (a) extending the activity semantics to incorporate negation, aggregation and self-joins, (b) complementing querying semantics with insertions, deletions and updates, and (c) transforming the graph to allow zoom-in/out at multiple levels of abstraction (i.e., passing from the detailed description of the graph at the attribute level to more compact variants involving programs, relations and queries, and vice versa).

Data Warehouse II

Extending UML 2 Activity Diagrams with Business Intelligence Objects

Veronika Stefanov; Beate List; Birgit Korherr

Data Warehouse (DWH) information is accessed by business processes. Today, no conceptual models exist that make the relationship between the DWH and the business processes transparent. In this paper, we extend a business process modeling diagram, namely the UML 2 activity diagram, with a UML profile that makes this relationship explicit. The model is tested with example business processes.

Data Warehouse II

Automatic Selection of Bitmap Join Indexes in Data Warehouses

Kamel Aouiche; Jérôme Darmont; Omar Boussaïd; Fadila Bentayeb

The queries defined on data warehouses are complex and use several join operations that incur a high computational cost. This cost becomes even more prohibitive when queries access very large volumes of data. To improve response time, data warehouse administrators generally use indexing techniques such as star join indexes or bitmap join indexes. This task is nevertheless complex and tedious. Our solution lies in the field of data warehouse auto-administration. In this framework, we propose an automatic index selection strategy. We exploit a data mining technique, more precisely frequent itemset mining, in order to determine a set of candidate indexes from a given workload. Then, we propose several cost models that allow us to create an index configuration composed of the indexes providing the best profit. These models evaluate the cost of accessing data using bitmap join indexes, and the cost of updating and storing these indexes.
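How frequent itemset mining can yield candidate indexes from a workload can be sketched as below. This is a brute-force illustration under stated assumptions, not the authors' mining algorithm or cost models: each query is reduced to a hypothetical set of indexable attributes, and `frequent_itemsets` and the attribute names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(workload, min_support):
    """Return {attribute set: support} for sets used by enough queries.

    workload: list of sets, one per query, holding the attributes that
    query restricts or joins on (assumed to be extracted beforehand).
    min_support: minimum fraction of queries that must contain the set.
    """
    counts = Counter()
    for attrs in workload:
        items = sorted(attrs)
        # Count every non-empty subset of the query's attributes once.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(workload)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


# Hypothetical workload of four queries.
workload = [
    {"customer.city", "time.year"},
    {"customer.city", "product.category"},
    {"customer.city", "time.year"},
    {"product.category"},
]
candidates = frequent_itemsets(workload, min_support=0.5)
```

Each attribute set in `candidates` is a candidate index; a cost model would then decide which candidates to keep. Enumerating all subsets is exponential in the number of attributes per query, so a real workload would call for a proper mining algorithm.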

Data Warehouse II

A Survey of Open Source Tools for Business Intelligence

Christian Thomsen; Torben Bach Pedersen

The industrial use of open source Business Intelligence (BI) tools is not yet common. It is therefore of interest to explore which possibilities are available for open source BI and compare the tools.

In this survey paper, we consider the capabilities of a number of open source tools for BI. In the paper, we consider three Extract-Transform-Load (ETL) tools, three On-Line Analytical Processing (OLAP) servers, two OLAP clients, and four database management systems (DBMSs). Further, we describe the licenses that the products are released under.

It is argued that the ETL tools are still not very mature for use in industry while the DBMSs are mature and applicable to real-world projects. The OLAP servers and clients are not as powerful as commercial solutions but may be useful in less demanding projects.

Evaluating Data Warehouses and Tools

DWEB: A Data Warehouse Engineering Benchmark

Jérôme Darmont; Fadila Bentayeb; Omar Boussaïd

Data warehouse architectural choices and optimization techniques are critical to decision support query performance. To facilitate these choices, the performance of the designed data warehouse must be assessed. This is usually done with the help of benchmarks, which can either help system users compare the performance of different systems, or help system engineers test the effect of various design choices. While the TPC standard decision support benchmarks address the first point, they are not tunable enough to address the second one and fail to model different data warehouse schemas. By contrast, our Data Warehouse Engineering Benchmark (DWEB) makes it possible to generate various ad hoc synthetic data warehouses and workloads. DWEB is fully parameterized to fulfill data warehouse design needs. However, two levels of parameterization keep it relatively easy to tune. Finally, DWEB is implemented as free Java software that can be interfaced with most existing relational database management systems. A sample usage of DWEB is also provided in this paper.

Evaluating Data Warehouses and Tools

A Set of Quality Indicators and Their Corresponding Metrics for Conceptual Models of Data Warehouses

Gema Berenguer; Rafael Romero; Juan Trujillo; Manuel Serrano; Mario Piattini

The quality of data warehouses is highly relevant for organizations in the decision-making process. The earlier we can deal with quality metrics (i.e., at the conceptual modelling stage), the more likely we are to achieve a high-quality data warehouse (DW). From our point of view, there is a lack of more objective indicators (metrics) to guide the designer in accomplishing an outstanding model that allows us to guarantee the quality of these data warehouses. However, in some cases, the goals and purposes of the proposed metrics are not very clear on their own. Lately, quality indicators have been proposed to properly define the goals of a measurement process and group quality measures in a coherent way. In this paper, we present a framework to design metrics in which each metric is part of a quality indicator we wish to measure. In this way, our method allows us to define theoretically validated metrics that measure our goals, as they are defined together with a set of well-defined quality indicators.

Evaluating Data Warehouses and Tools