Publications catalog - books

Database Systems for Advanced Applications: 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings

Lizhu Zhou ; Beng Chin Ooi ; Xiaofeng Meng (eds.)

In conference: 10th International Conference on Database Systems for Advanced Applications (DASFAA). Beijing, China. April 17, 2005 - April 20, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Publication year Browse Download Request
Not detected 2005 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25334-1

Electronic ISBN

978-3-540-32005-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Zoned-RAID for Multimedia Database Servers

Ali E. Dashti; Seon Ho Kim; Roger Zimmermann

This paper proposes a novel fault-tolerant disk subsystem named Zoned-RAID (Z-RAID). Z-RAID improves the performance of traditional RAID systems by exploiting a property of modern disks: each disk has multiple zones with different data transfer rates. This study proposes to optimize the data transfer rate of a RAID system by constraining the placement of data blocks on multi-zone disks. We apply Z-RAID to multimedia database servers, such as video servers, that require a high data transfer rate as well as fault tolerance. Our analytical and experimental results demonstrate the superiority of Z-RAID over conventional RAID. Z-RAID provides a higher effective data transfer rate in normal mode with no disadvantage. In the presence of a disk failure, Z-RAID still performs as well as RAID.

- Database Performance Issues | Pp. 461-473

Randomized Data Allocation in Scalable Streaming Architectures

Kun Fu; Roger Zimmermann

IP-networked streaming media storage is increasingly used as a part of many applications. Random placement of data blocks has been proven to be an effective approach to balancing heterogeneous workloads in multi-disk streaming architectures. However, the main disadvantage of this technique is that statistical variation can still result in short-term load imbalances in disk utilization. We propose a technique (PLR) to address this problem. We quantify the exact performance trade-off between the PLR approach and the traditional technique (BLR) through both theoretical analysis and extensive simulation. Our results show that the PLR technique can achieve much better load balancing in scalable streaming architectures at the cost of more memory space.

- Database Performance Issues | Pp. 474-486
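
The short-term imbalance that the abstract above attributes to random block placement can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not the paper's PLR or BLR technique, and the function names, block counts, and disk count are hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def place_blocks_randomly(num_blocks, num_disks, seed=0):
    """Assign each block to a uniformly random disk (illustrative only)."""
    rng = random.Random(seed)
    return [rng.randrange(num_disks) for _ in range(num_blocks)]


def load_imbalance(placement, num_disks):
    """Ratio of the busiest disk's block count to the mean count per disk."""
    counts = Counter(placement)
    loads = [counts.get(disk, 0) for disk in range(num_disks)]
    mean_load = len(placement) / num_disks
    return max(loads) / mean_load


NUM_DISKS = 8  # hypothetical number of disks
# A short window of requests touches few blocks; a long run touches many.
few = load_imbalance(place_blocks_randomly(40, NUM_DISKS), NUM_DISKS)
many = load_imbalance(place_blocks_randomly(400_000, NUM_DISKS), NUM_DISKS)
print(f"imbalance over 40 blocks:      {few:.3f}")
print(f"imbalance over 400,000 blocks: {many:.3f}")
```

A ratio of 1.0 means perfectly even load. Over many blocks the ratio stays close to 1.0, while over a short window it is typically noticeably higher, which is the kind of statistical variation the abstract refers to.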

Trace System of iSCSI Storage Access and Performance Improvement

Saneyasu Yamaguchi; Masato Oguchi; Masaru Kitsuregawa

In this paper, an IP-SAN access trace method is proposed and its evaluation is presented. IP-SAN and iSCSI are expected to remedy the problems of Fibre Channel (FC)-based SAN. In an IP-SAN system, servers and storage work cooperatively by communicating over TCP/IP, so an integrated analysis of both sides is important for achieving better performance.

Our system can precisely point out the cause of performance degradation when IP-SAN is used for remote storage access. In an experiment with parallel iSCSI access over a high-latency network, the total performance is limited by a parameter in an implementation of the SCSI layer in the iSCSI protocol stack. Based on the result obtained with our IP-SAN access trace system, the parameter in that layer is modified. As a result, performance improves by more than a factor of 30 compared with the default value. Thus it is effective to monitor all the layers in the iSCSI protocol stack and perform an integrated analysis using our system.

- Database Performance Issues | Pp. 487-497

: Query Processing Based on Collaborative Caching in P2P Systems

Weining Qian; Linhao Xu; Shuigeng Zhou; Aoying Zhou

In this paper, we propose a P2P query processing architecture that enables sophisticated optimization techniques. It differs from existing P2P query processing systems in three ways. First, a coordinator overlay network maintaining a summary of the whole system is constructed by applying a DHT technique to query plan trees; its protocol ensures efficiency in dynamic environments. Second, a preliminary cost-based optimization technique for retrieving appropriate cached copies of data is studied. With the help of the coordinator overlay network, we show that fine-grained optimization is possible even in large-scale and dynamic environments. Third, a collaborative caching strategy is presented, with which even a small portion of cache storage on each peer may greatly improve query processing performance. Extensive experiments over real-world and synthetic settings show the effectiveness and efficiency of the architecture.

- Database Performance Issues | Pp. 498-510

Multi-represented NN-Classification for Large Class Sets

Hans-Peter Kriegel; Alexey Pryakhin; Matthias Schubert

The amount of stored information in modern database applications has increased tremendously in recent years. Besides their sheer number, the stored data objects are also more and more complex. Therefore, classification of these complex objects is an important data mining task that poses several new challenges. In many applications, the data objects provide multiple representations. For example, proteins can be described by text, amino acid sequences or 3D structures. Additionally, many real-world applications need to distinguish thousands of classes. Last but not least, many complex objects are not directly expressible by feature vectors. To cope with all these requirements, we introduce a novel approach to classification of multi-represented objects that is capable of distinguishing large numbers of classes. Our method is based on nearest neighbor classification and employs density-based clustering as a new approach to reduce the training instances for instance-based classification. To predict the most likely class, our classifier employs a new method that uses several object representations to make accurate class predictions. The introduced method is evaluated by classifying proteins according to the classes of Gene Ontology, one of the most established class systems for biomolecules, which comprises several thousand classes.

- Clustering, Classification and Data Warehouses | Pp. 511-522

Enhancing SNNB with Local Accuracy Estimation and Ensemble Techniques

Zhipeng Xie; Qing Zhang; Wynne Hsu; Mong Li Lee

Naïve Bayes, the simplest Bayesian classifier, has shown excellent performance despite its unrealistic independence assumption. This paper studies the selective neighborhood-based naïve Bayes (SNNB) for lazy classification, and develops three variant algorithms, SNNB-G, SNNB-L, and SNNB-LV, all with linear computational complexity. The SNNB algorithms use a local learning strategy to alleviate the independence assumption. The underlying idea is, for a test example, first to construct multiple classifiers on its multiple neighborhoods with different radii, and then to select the classifier with the highest estimated accuracy to make the decision. Empirical results show that both SNNB-L and SNNB-LV generate more accurate classifiers than naïve Bayes and several other state-of-the-art classification algorithms including C4.5, Naïve Bayes Tree, and Lazy Bayesian Rule. The SNNB-L and SNNB-LV algorithms are also computationally more efficient than the Lazy Bayesian Rule algorithm, especially on domains with high dimensionality.

- Clustering, Classification and Data Warehouses | Pp. 523-535

MMPClust: A Skew Prevention Algorithm for Model-Based Document Clustering

Xiaoguang Li; Ge Yu; Daling Wang

To support very high dimensionality, model-based clustering is an intuitive choice for document clustering. However, current model-based algorithms are prone to generating skewed clusters, which seriously degrade the quality of clustering. In this paper, the causes of skew are examined and identified as an inappropriate initial model, a poorly fitting cluster model, and the interaction between the decentralization of estimation samples and the over-generalized cluster model. This paper proposes a skew prevention document-clustering algorithm (MMPClust), which has two features: (1) a content-based cluster model is used to model the cluster better; (2) at the re-estimation step, a subset of the documents most relevant to the corresponding class is selected automatically for each cluster as the estimation samples to break this interaction. MMPClust has fewer restrictions and broader applicability in document clustering than previous methods.

- Clustering, Classification and Data Warehouses | Pp. 536-547

Designing and Using Views to Improve Performance of Aggregate Queries (Extended Abstract)

Foto Afrati; Rada Chirkova; Shalu Gupta; Charles Loftis

Data-intensive systems routinely use derived data (e.g., indexes or materialized views) to improve query-evaluation performance. We present a system architecture for Query-Performance Enhancement by Tuning (QPET), which combines design and use of derived data in an end-to-end approach to automated query-performance tuning. Our focus is on a tradeoff between (1) the amount of system resources spent on designing derived data and on keeping the data up to date, and (2) the degree of the resulting improvement in query performance. From the technical point of view, the novelty that we introduce is that we combine aggregate query rewriting techniques [1, 2] and view selection techniques [3] to achieve our goal.

- Clustering, Classification and Data Warehouses | Pp. 548-554

Large Relations in Node-Partitioned Data Warehouses

Pedro Furtado

A cheap shared-nothing context can be used to provide significant speedup on large data warehouses, but partitioning and placement decisions are important in such systems, as repartitioning requirements can result in much less-than-linear speedup. This problem can be minimized if query workload and schemas are inputs to placement decisions. In this paper we analyze, with the help of a cost model, the problem of handling large relations in a node-partitioned data warehouse (NPDW) with a basic placement strategy that partitions facts horizontally and replicates dimensions. Then we propose a strategy to improve performance and show both analytical and TPC-H results.

- Clustering, Classification and Data Warehouses | Pp. 555-560
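
The basic placement strategy named in the abstract above (facts partitioned horizontally, dimensions replicated on every node) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's cost model or its improved strategy: the table layout, the column names, and the choice of modulo partitioning on an integer key are all hypothetical.

```python
from collections import defaultdict


def place(fact_rows, dim_rows, num_nodes, key):
    """Partition fact rows by an integer key; copy every dimension row to each node."""
    nodes = [{"facts": [], "dims": list(dim_rows)} for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[row[key] % num_nodes]["facts"].append(row)
    return nodes


def local_totals(node):
    """Join local facts with the replicated dimension and sum amounts per category."""
    category_of = {d["product_id"]: d["category"] for d in node["dims"]}
    totals = defaultdict(int)
    for fact in node["facts"]:
        totals[category_of[fact["product_id"]]] += fact["amount"]
    return totals


def merge(partials):
    """Combine the per-node partial sums into the final answer."""
    merged = defaultdict(int)
    for partial in partials:
        for category, amount in partial.items():
            merged[category] += amount
    return dict(merged)


# Hypothetical toy data: six order facts and a three-row product dimension.
facts = [{"order_id": i, "product_id": i % 3, "amount": 10 * (i + 1)} for i in range(6)]
dims = [
    {"product_id": 0, "category": "books"},
    {"product_id": 1, "category": "music"},
    {"product_id": 2, "category": "books"},
]

nodes = place(facts, dims, num_nodes=2, key="order_id")
result = merge(local_totals(node) for node in nodes)
print(result)
```

Because each node holds a full copy of the dimension, the join runs locally without moving fact rows between nodes; only the small partial aggregates are merged. Joining two large partitioned relations on a key other than the partitioning key would not have this property, which is the repartitioning concern the abstract raises.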

Mining Frequent Tree-Like Patterns in Large Datasets

Tzung-Shi Chen; Shih-Chun Hsu

In this paper, we propose a novel data mining scheme to explore frequent hierarchical structure patterns, named tree-like patterns, which capture the relationships among the items in a sequence. Tree-like patterns make it possible to identify cause-and-effect relations between items. Finally, we discuss how the characteristics of our mined patterns differ from those of other patterns. As a result, the proposed tree-like patterns can be used in a wide variety of applications.

- Data Mining and Web Data Processing | Pp. 561-567