Publications catalog - books

Data Warehousing and Knowledge Discovery: 4th International Conference, DaWaK 2002 Aix-en-Provence, France, September 4-6, 2002. Proceedings

Yahiko Kambayashi ; Werner Winiwarter ; Masatoshi Arikawa (eds.)

In conference: 4th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Aix-en-Provence, France. September 4, 2002 - September 6, 2002

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2002 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-44123-6

Electronic ISBN

978-3-540-46145-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2002

Table of contents

Optimization of Association Word Knowledge Base through Genetic Algorithm

Su-Jeong Ko; Jung-Hyun Lee

Query expansion in a knowledge-based information retrieval system requires a knowledge base that takes semantic relations between words into account. Because the Apriori algorithm extracts association words without taking user preference into account, recall improves but accuracy drops. This paper shows how to build an optimized association word knowledge base with improved accuracy by including only the association words that users prefer among those reflecting semantic relations between words. To this end, computer-related web documents are classified into eight classes, and nouns are extracted from the web documents of each class. Association words are extracted from the nouns with the Apriori algorithm, and association words that users do not favor are excluded from the knowledge base with a genetic algorithm.

- Applications | Pp. 212-221

Mining Temporal Patterns from Health Care Data

Weiqiang Lin; Mehmet A. Orgun; Graham J. Williams

This paper describes temporal data mining techniques for extracting information from temporal health records consisting of a time series of elderly diabetic patients’ tests. We propose a data mining procedure to analyse these time sequences in three steps to identify patterns from any longitudinal data set. The first step is a structure-based search using wavelets to find pattern structures. The second step employs a value-based search over the discovered patterns using the statistical distribution of data values. The third step combines the results from the first two steps to form a hybrid model. The hybrid model has the expressive power of both wavelet analysis and the statistical distribution of the values. Global patterns are therefore identified.

- Applications | Pp. 222-231

Adding a Performance-Oriented Perspective to Data Warehouse Design

Pedro Bizarro; Henrique Madeira

Data warehouse design is clearly dominated by the business perspective. Quite often, data warehouse administrators are led to data models with little room for performance improvement. However, the increasing demands for interactive response time from the users make query performance one of the central problems of data warehousing today. In this paper we argue that data warehouse design must take into account both the business and the performance perspective from the beginning, and we propose extending typical design methodologies to include performance concerns in the early design steps. Specific analysis of the predicted data warehouse usage profile and meta-data analysis are proposed as new inputs for improving the transition from logical to physical schema. The proposed approach is illustrated and discussed using the TPC-H performance benchmark, and it is shown that significant performance improvement can be achieved without jeopardizing the business view required for data warehouse models.

- Data Warehouse Design | Pp. 232-244

Cost Modeling and Estimation for OLAP-XML Federations

Dennis Pedersen; Karsten Riis; Torben Bach Pedersen

The ever-changing data requirements of today’s dynamic businesses are not handled well by current OLAP systems. Physical integration of data into OLAP systems is a time-consuming process, making logical integration the better choice in many cases. The increasing use of XML suggests that the required data will often be available in XML format. Thus, federations of OLAP and XML databases will be very attractive in many situations. In an efficient implementation of OLAP-XML federations, cost-based optimization is a must, creating a need for an effective cost model for OLAP-XML federations.

In this paper we present a cost model for OLAP-XML federations, and outline techniques for estimating the cost model parameters in a federated OLAP-XML environment. The paper also outlines the cost models for the OLAP and XML components in the federation on which the federation cost model is built. The cost model has been used as the basis for effective cost-based query optimization in OLAP-XML federations. Experiments show that the cost model is precise enough to make a substantial difference in the query optimization process.

- Data Warehouse Design | Pp. 245-254

Constraint-Free Join Processing on Hyperlinked Web Data

Sourav S. Bhowmick; Wee Keong Ng; Sanjay Madria; Mukesh Mohania

In this paper, we introduce the concept of web join for hyperlinked Web data. A web join is one of the web algebraic operators in our system called Whoweda (WareHouse Of WEb DAta). It can be used to gather useful, composite information from two […]. Web join and […] (a derivative of web join) operations can be used to detect and represent changes to hyperlinked Web data. We discuss the syntax, semantics and algorithms of these operators. We also present how to detect and represent changes to hyperlinked Web data using these two operations.

- Data Warehouse Design | Pp. 255-264

Focusing on Data Distribution in the WebDW System

Cristina Dutra de Aguiar Ciferri; Fernando da Fonseca de Souza

The WebDW system is a distributed client-server data warehousing environment, which is aimed not only at data warehouse distribution, but also at distributed access to these data using Web technology as an infrastructure. In this paper, we introduce the WebDW system, focusing on one of its main objectives: data warehouse distribution. The system is presented in terms of its main components and their respective functionalities. The paper also describes the algorithm for horizontally fragmenting the warehouse data, which is used as a basis for the WebDW system.

- Data Warehouse Design | Pp. 265-274

A Decathlon in Multidimensional Modeling: Open Issues and Some Solutions

W. Hümmer; W. Lehner; A. Bauer; L. Schlesinger

The concept of multidimensional modeling has proven extremely successful in the area of Online Analytical Processing (OLAP) as one of many applications running on top of a data warehouse installation. Although many different modeling techniques expressed in extended multidimensional data models were proposed in the recent past, we feel that many hot issues are not properly reflected. In this paper we address ten common problems, ranging from defects within dimensional structures, through multidimensional structures, to new analytical requirements and more.

- OLAP | Pp. 275-285

Modeling and Imputation of Large Incomplete Multidimensional Datasets

Xintao Wu; Daniel Barbará

The presence of missing or incomplete data is commonplace in large real-world databases. In this paper, we study the problem of missing values that occur at the measure dimension of a data cube. We propose a two-part mixture model, which combines the logistic model and the loglinear model, to predict and impute the missing values. The logistic model is applied to predict missing positions, while the loglinear model is applied to compute the estimation. Experimental results on real datasets and synthetic datasets are presented.

- OLAP | Pp. 286-295

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

Ladjel Bellatreche; Michel Schneider; Mukesh Mohania; Bharat Bhargava

The performance of OLAP queries can be improved drastically if the warehouse data is properly selected and indexed. The problems of selecting and materializing views and indexing data have been studied extensively in the data warehousing environment. On the other hand, data partitioning can also greatly increase the performance of queries. Data partitioning has an advantage over data selection and indexing, since it requires no additional storage. In this paper, we show that it is beneficial to integrate the data partitioning and indexing (join indexes) techniques for improving the performance of data warehousing queries. We present a data warehouse strategy, called PartJoin, that decomposes the fact and dimension tables of a star schema and then selects join indexes. This solution takes advantage of both techniques, i.e., data partitioning and indexing. Finally, we present the results of an experimental evaluation that demonstrates the effectiveness of our strategy in reducing the query processing cost and providing an economical utilisation of the storage space.

- OLAP | Pp. 296-306

A Transactional Approach to Parallel Data Warehouse Maintenance

Bin Liu; Songting Chen; Elke A. Rundensteiner

Data Warehousing is becoming an increasingly important technology for information integration and data analysis. Given the dynamic nature of modern distributed environments, both source data and schema changes are likely to occur autonomously and even concurrently in different sources. We have thus developed a comprehensive solution approach, called TxnWrap, that successfully maintains the warehouse views under any type of concurrent source updates. In this work, we now overcome TxnWrap’s restriction that the maintenance is processed one by one for each source update, since that limits the performance. To overcome this limitation, we exploit the transactional approach of TxnWrap to achieve parallel data warehouse maintenance. For this, we first identify the read/write conflicts among the different warehouse maintenance processes. We then propose a parallel maintenance scheduler (PMS) that generates legal schedules that resolve these conflicts. PMS has been implemented and incorporated into our TxnWrap system. The experimental results confirm that our parallel maintenance scheduler significantly improves the performance of data warehouse maintenance.

- Data Warehouse Maintenance | Pp. 307-316