Publications catalog - books

Data Warehousing and Knowledge Discovery: 7th International Conference, DaWaK 2005, Copenhagen, Denmark, August 22-26, 2005, Proceedings

A Min Tjoa; Juan Trujillo (eds.)

Conference: 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Copenhagen, Denmark. August 22-26, 2005

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability
Detected institution: Not detected
Year of publication: 2005
Available via: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28558-8

Electronic ISBN

978-3-540-31732-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Design and Development of a Tool for Integrating Heterogeneous Data Warehouses

Riccardo Torlone; Ivan Panella

In this paper we describe the design of a tool supporting the integration of independently developed data warehouses, a problem that arises in several common scenarios. The basic facility of the tool is a test of the validity of a matching between heterogeneous dimensions, according to a number of desirable properties. Two strategies are then provided to perform the actual integration. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between sources and perform drill-across queries over them. The goal of the second approach is the derivation of a materialized view built by merging the sources, and refers to a scenario of tightly coupled integration in which queries are performed against the view. We illustrate the architecture and functionality of the tool and the underlying techniques that implement the two integration strategies.

- Evaluating Data Warehouses and Tools | Pp. 105-114

An Evolutionary Approach to Schema Partitioning Selection in a Data Warehouse

Ladjel Bellatreche; Kamel Boukhalfa

The problem of selecting an optimal fragmentation schema for a data warehouse is more challenging than in relational and object databases. This challenge is due to the several choices for partitioning star or snowflake schemas. Data partitioning is beneficial if and only if the fact table is fragmented based on the partitioning schemas of dimension tables. This may increase the number of fragments of the fact table dramatically and make their maintenance very costly. Therefore, the right selection of fragmenting schemas is important for better performance of OLAP queries. In this paper, we present a genetic algorithm for the schema partitioning selection problem. The proposed algorithm gives better solutions since the search space is constrained by the schema partitioning. We conduct several experimental studies using the APB-1 release II benchmark to validate the proposed algorithm.

- Schema Transformations | Pp. 115-125

Using Schema Transformation Pathways for Incremental View Maintenance

Hao Fan

With the increasing amount and diversity of information available on the Internet, there has been a huge growth in information systems that need to integrate data from distributed, heterogeneous data sources. Incrementally maintaining the integrated data is one of the problems being addressed in data warehousing research. This paper presents an incremental view maintenance approach based on schema transformation pathways. Our approach is not limited to one specific data model or query language, and would be useful in any data transformation/integration framework based on sequences of primitive schema transformations.

- Schema Transformations | Pp. 126-135

Data Mapper: An Operator for Expressing One-to-Many Data Transformations

Paulo Carreira; Helena Galhardas; João Pereira; Antónia Lopes

Transforming data is a fundamental operation in many application scenarios. Data transformations are often implemented as relational queries that aim at leveraging the optimization capabilities of most RDBMSs. However, relational query languages like SQL are not expressive enough to specify an important class of data transformations that produce several output tuples for a single input tuple. This class of data transformations is required for solving the data heterogeneities that occur when source data represents an aggregation of target data.

In this paper, we propose and formally define the data mapper operator as an extension of the relational algebra to address one-to-many data transformations. We supply an algebraic rewriting technique that enables the optimization of data transformation expressions that combine filters expressed as standard relational operators with mappers. Furthermore, we identify the two main factors that influence the expected optimization gains.
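The idea of a one-to-many transformation, and of moving a filter ahead of it, can be illustrated with a minimal Python sketch. This is not the operator or the rewriting rules defined in the paper: the loan records, the `mapper` helper, and the `unfold_payments` function are all hypothetical names chosen for illustration. The sketch only shows one input tuple producing several output tuples, and a filter on an attribute that the mapper copies unchanged giving the same result whether it runs after or before the mapper.

```python
from typing import Callable, Iterable, Iterator

Row = dict

def mapper(rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply fn to each input row; fn may emit zero, one, or many output rows."""
    for row in rows:
        yield from fn(row)

def unfold_payments(loan: Row) -> Iterator[Row]:
    """Hypothetical mapper function: split an aggregated loan into equal instalments."""
    n = loan["instalments"]
    for i in range(1, n + 1):
        yield {"loan_id": loan["loan_id"], "seq": i, "amount": loan["total"] / n}

# Hypothetical source data: each row aggregates several target rows.
loans = [
    {"loan_id": "A", "total": 300.0, "instalments": 3},
    {"loan_id": "B", "total": 100.0, "instalments": 2},
]

# Filter after mapping: every loan is unfolded, then unwanted rows are discarded.
late = [r for r in mapper(loans, unfold_payments) if r["loan_id"] == "A"]

# Filter before mapping: equivalent here because loan_id is copied unchanged
# by the mapper function, so fewer rows have to be unfolded.
early = list(mapper((l for l in loans if l["loan_id"] == "A"), unfold_payments))
```

Both `late` and `early` contain the same three instalment rows for loan "A"; the second form simply avoids unfolding loan "B".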

- Schema Transformations | Pp. 136-145

Parallel Consistency Maintenance of Materialized Views Using Referential Integrity Constraints in Data Warehouses

Jinho Kim; Byung-Suk Lee; Yang-Sae Moon; Soo-Ho Ok; Wookey Lee

Data warehouses can be considered as materialized views which maintain the online analytical information extracted from distributed data sources. When data sources change, materialized views should be maintained correspondingly to keep the data sources and the materialized views consistent. If a view is defined by joining several source relations, an update in one source relation invokes a set of join subqueries, so view maintenance takes a long time to process. In this paper, we propose a view maintenance algorithm that processes these join subqueries in parallel by using referential integrity constraints over source relations. A relation which has several foreign keys can be joined with the referenced relations independently. The proposed algorithm processes these join operations in parallel and then merges their results. With parallel processing, the algorithm can maintain materialized views efficiently. We show the superiority of the proposed algorithm using an analytical cost model.
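The general pattern of joining a changed row with each referenced relation independently and then merging the partial results might be sketched as follows. This is an illustrative assumption, not the algorithm from the paper: the relations, the `FOREIGN_KEYS` mapping, and the `join_one` and `maintain_view` functions are hypothetical, and only insertions into the source relation are handled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical referenced relations, keyed by primary key.
customers = {1: {"customer_name": "Ada"}, 2: {"customer_name": "Bob"}}
products = {10: {"product_name": "Disk"}, 11: {"product_name": "Tape"}}

# Foreign key column in the source relation -> referenced relation.
FOREIGN_KEYS = {"customer_id": customers, "product_id": products}

def join_one(delta_row, fk_column, referenced):
    """Join one inserted row with a single referenced relation.

    Referential integrity is assumed, so the lookup always finds exactly one row.
    """
    return referenced[delta_row[fk_column]]

def maintain_view(view, delta_rows):
    """Append the view rows produced by newly inserted source rows."""
    with ThreadPoolExecutor() as pool:
        for row in delta_rows:
            # The lookups do not depend on each other, so they are submitted independently.
            futures = [
                pool.submit(join_one, row, col, rel)
                for col, rel in FOREIGN_KEYS.items()
            ]
            merged = dict(row)
            for future in futures:
                merged.update(future.result())  # merge the partial join results
            view.append(merged)
    return view

view = maintain_view([], [{"order_id": 100, "customer_id": 2, "product_id": 10}])
```

In this toy setting the lookups are in-memory dictionary accesses, so threads bring no real speed-up; the sketch is only meant to show the independence of the per-foreign-key joins.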

- Materialized Views | Pp. 146-156

Selective View Materialization in a Spatial Data Warehouse

Songmei Yu; Vijayalakshmi Atluri; Nabil Adam

A spatial data warehouse (SDW) consists of a set of materialized views defined over the source relations, either conventional, spatial, or both. Often, when compared to traditional data warehouses, the cost of view materialization is higher with respect to both computation and space. This is because spatial data is typically larger in size, which leads to high maintenance cost, and spatial operations are more expensive to process. In this paper, we address the issue of optimizing the view materialization cost in an SDW. We build a cost model to measure the on-the-fly computation cost versus the space cost for spatial queries. We show that a spatial query can be represented in the form of a query graph and propose three transformation rules, edge-elimination, query-splitting and query-joining, to selectively materialize spatial views. We present a greedy algorithm for materialized view selection so that local cost optimality can be achieved.

- Materialized Views | Pp. 157-167

PMC: Select Materialized Cells in Data Cubes

Hongsong Li; Houkuan Huang; Shijin Liu

QC-Trees is one of the most storage-efficient structures for data cubes in a MOLAP system. Although QC-Trees can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, we present an improved structure, PMC, which allows us to partially materialize cells in a QC-Tree. There is a sharp contrast between our partial materialization algorithm and other extensively studied materialized view selection algorithms. If a view is selected in a traditional algorithm, then all cells in this selected view are to be materialized. Our algorithm, however, selects and materializes data by cells. Experimental results show that PMC can further reduce the storage space occupied by the data cube, and can shorten the time needed to update the cube. Along with further reduced space and update cost, our algorithm can ensure stable query performance.

- Materialized Views | Pp. 168-178

Progressive Ranking of Range Aggregates

Hua-Gang Li; Hailing Yu; Divyakant Agrawal; Amr El Abbadi

Ranking-aware queries have been gaining much attention recently in many applications such as search engines and data streams. They are, however, not restricted to such applications but are also very useful in OLAP applications. In this paper, we introduce aggregation ranking queries in OLAP data cubes, motivated by an online advertisement tracking data warehouse application. These queries aggregate information over a specified range and then return the ranked order of the aggregated values. They differ from range aggregate queries in that range aggregate queries are mainly concerned with applying an aggregate operator over the selected ranges of all dimensions in the data cubes. Existing techniques for range aggregate queries are not able to process aggregation ranking queries efficiently. Hence, in this paper we propose new algorithms to handle this problem. The essence of the proposed algorithms is based on both ranking and cumulative information to progressively rank aggregation results. Furthermore, we empirically evaluate our techniques, and the experimental results show that the query cost is improved significantly.
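To make the query type concrete, here is a small Python sketch that aggregates over a range using cumulative (prefix) sums and then ranks the aggregated values. It is a plain, non-progressive baseline under assumed data, not the algorithms proposed in the paper: the `clicks` table and the `range_sum` and `top_k` functions are hypothetical.

```python
import heapq
from itertools import accumulate

# Hypothetical data: daily click counts per advertisement.
clicks = {
    "ad1": [5, 1, 0, 7],
    "ad2": [2, 2, 2, 2],
    "ad3": [0, 9, 1, 0],
}

# Cumulative sums with a leading 0, so sum(days[lo:hi]) == prefix[hi] - prefix[lo].
prefix = {ad: [0, *accumulate(days)] for ad, days in clicks.items()}

def range_sum(ad, lo, hi):
    """Total clicks for ad over the half-open day range [lo, hi)."""
    return prefix[ad][hi] - prefix[ad][lo]

def top_k(lo, hi, k):
    """Return the k (total, ad) pairs with the largest range sums, largest first."""
    return heapq.nlargest(k, ((range_sum(ad, lo, hi), ad) for ad in prefix))

result = top_k(1, 3, 2)  # rank ads by clicks on days 1 and 2
```

Each range sum costs two lookups regardless of the range length, but this baseline still computes the aggregate for every advertisement before ranking, which is the kind of cost a progressive method would aim to avoid.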

- Aggregates | Pp. 179-189

On Efficient Storing and Processing of Long Aggregate Lists

Marcin Gorawski; Rafal Malczok

In this paper we present a solution called Materialized Aggregate List designed for the efficient storing and processing of long aggregate lists. An aggregate list contains aggregates calculated from the data stored in the database. In our approach, once created, the aggregates are materialized for further use. The list structure contains a table divided into pages. We present three different page-filling algorithms used when the list is browsed. We present test results and use them to estimate the best combination of the configuration parameters: number of pages, size of a single page and number of available database connections. The Materialized Aggregate List can be applied on every aggregation level in various indexing structures, such as an aR-tree.

- Aggregates | Pp. 190-199

Ad Hoc Star Join Query Processing in Cluster Architectures

Josep Aguilar-Saborit; Victor Muntés-Mulero; Calisto Zuzarte; Josep-L. Larriba-Pey

Processing of large amounts of data in data warehouses is increasingly being done in cluster architectures to achieve scalability. In this paper we look into the problem of ad hoc star join query processing in cluster architectures. We propose a new technique, the Star Hash Join (SHJ), which exploits a combination of multiple bit filter strategies in such architectures. SHJ is a generalization of the Pushed Down Bit Filters for clusters. The objectives of the technique are to reduce (i) the amount of data communicated, (ii) the amount of data spilled to disk during the execution of intermediate joins in the query plan, and (iii) the amount of memory used by auxiliary data structures such as bit filters.
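The role a bit filter plays in a join can be sketched in a few lines of Python. This is a generic single-node illustration under assumed data, not the Star Hash Join itself: the filter size `NUM_BITS`, the `build_bit_filter` and `may_match` helpers, and the store and sales rows are hypothetical.

```python
NUM_BITS = 64  # hypothetical filter size

def build_bit_filter(keys):
    """Set one bit per join key; collisions cause false positives, never false negatives."""
    bits = 0
    for key in keys:
        bits |= 1 << (key % NUM_BITS)
    return bits

def may_match(bits, key):
    """Return False only when key is certainly absent from the filtered dimension."""
    return bool(bits & (1 << (key % NUM_BITS)))

# Hypothetical dimension rows that survive the query's selection predicate.
selected_stores = {3: "Oslo", 8: "Lyon"}

# Hypothetical fact rows: (store_id, amount).
sales = [(3, 10.0), (5, 4.0), (8, 2.5), (67, 1.0), (9, 7.0)]

bits = build_bit_filter(selected_stores)

# Cheap pre-filter applied close to the fact data: rows that cannot join are dropped early.
candidates = [row for row in sales if may_match(bits, row[0])]

# The actual join removes the false positives that the filter let through.
joined = [(selected_stores[sid], amt) for sid, amt in candidates if sid in selected_stores]
```

Here store 67 collides with store 3 in the filter and survives the pre-filter, but is removed by the join. In a cluster setting, rows rejected by the filter would not need to be communicated or spilled, which relates to objectives (i) and (ii) above.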

- Data Warehouse Queries and Database Processing Issues | Pp. 200-209