Publications catalog - books



Database Systems for Advanced Applications: 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings

Lizhu Zhou ; Beng Chin Ooi ; Xiaofeng Meng (eds.)

Conference: 10th International Conference on Database Systems for Advanced Applications (DASFAA). Beijing, China. April 17, 2005 - April 20, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Institution detected: Not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25334-1

Electronic ISBN

978-3-540-32005-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Database Design with Equality-Generating Dependencies

Junhu Wang

In relational database systems, traditional normalization techniques (e.g., BCNF, 4NF) remove data redundancies from a single relation, but cannot detect and remove redundancies across multiple relations. However, redundancies among multiple relations are abundant, especially in integrated databases. In this paper, we first propose to detect such data redundancies using equality-generating dependencies (EGDs) and propose an extended normal form (ENF) of relational database schemata with respect to EGDs. We show that a database has no potential data redundancies with respect to EGDs if and only if its schema is in ENF. For a common special class of EGDs, we provide a set of sound and complete inference rules. A normalization process is presented to losslessly transform a relational database schema into one in ENF. We then extend our EGDs and ENF to XML data, and show how similar data redundancy problems can be detected for data-centric XML documents.

- Extending XML | Pp. 335-346
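The cross-relation redundancy the abstract targets can be sketched with a toy schema; a functional dependency chained across relations is a simple equality-generating dependency. The relation and attribute names below (Emp, Dept, EmpMgr) are illustrative assumptions, not taken from the paper:

```python
def egd_check(emp, dept, emp_mgr):
    """Cross-relation EGD (illustrative, not the paper's formalism):
    if Emp(name, d), Dept(d, m) and EmpMgr(name, m2) all hold, the
    dependency generates the equality m = m2 -- so the stored m2 is
    derivable (redundant) whenever the EGD is satisfied."""
    dept_mgr = dict(dept)   # dept -> manager
    emp_dept = dict(emp)    # employee -> dept
    redundant, violations = [], []
    for name, m2 in emp_mgr:
        m = dept_mgr.get(emp_dept.get(name))
        if m is None:
            continue  # pattern does not match; EGD imposes nothing
        (redundant if m == m2 else violations).append((name, m2, m))
    return redundant, violations

emp = [("ann", "db"), ("bob", "ai")]
dept = [("db", "carol"), ("ai", "dan")]
emp_mgr = [("ann", "carol"), ("bob", "erin")]
print(egd_check(emp, dept, emp_mgr))
```

Here ann's manager entry duplicates what Emp and Dept already imply, while bob's entry contradicts the dependency.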

WDEE: Web Data Extraction by Example

Zhao Li; Wee Keong Ng

Web data extraction systems in use today transform semi-structured Web documents and deliver structured documents to end users. Some systems provide a visual interface for users to generate extraction rules. To end users, however, the visual effect of Web documents is lost during the transformation process. In this paper, we propose an approach that allows a user to query extracted documents without knowledge of a formal query language. We bridge the gap between the visual effect of Web documents and the extracted structured documents by providing a QBE-like (Query by Example) interface named WDEE. The principal component of our method is the notion of a document schema. Document schemata are patterns of structure embedded in documents. WDEE generates tree skeletons based on schema information, and a user may execute queries by inputting conditions in the skeletons. By maintaining the mapping relation between the schemata of Web documents and those of extracted documents, a visual example may be presented to end users. With the example, WDEE allows a user to construct tree skeletons in a manner that resembles the browsing of Web pages.

- Web Services | Pp. 347-358

Concept-Based Retrieval of Alternate Web Services

Dunlu Peng; Sheng Huang; Xiaoling Wang; Aoying Zhou

Web services have attracted much attention in recent years with the development of e-commerce technologies over the Internet. Although there are standards and protocols for Web service technologies, such as WSDL, UDDI and SOAP, the core technologies underlying Web services need further study in order to make them practical and flexible. Efficient service management is the main task in service execution and service composition, and no good solution has existed until now. In this paper, we present a concept-based method for service management that is efficient for service selection and for finding alternates. Our method retrieves the optimal alternates for a given Web service efficiently by employing formal concept analysis and concept lattices. Compared with previous methods, this method is more efficient and accurate because the underlying semantics of Web services and users’ requirements are exploited during retrieval. Experimental results also verify the efficiency and scalability of our approach.

- Web Services | Pp. 359-371

WSQuery: XQuery for Web Services Integration

Zhimao Guo; Xiaoling Wang; Aoying Zhou

Web services integration is one of the key issues in many B2B applications. Yet, current approaches to this problem tend to use a general-purpose programming language, thus incurring type mismatches between the type system of XML Schema and the object-oriented type system. In this paper, we present a uniform method for integrating Web services. We propose an extension of XQuery, referred to as WSQuery, which can contain Web service calls. We first present the conceptual evaluation strategy of WSQuery programs. Then, to speed up the evaluation, we propose to schedule Web service calls so as to exploit parallelism. We carry out dependency analysis to determine the dependency relations among Web service calls. Loops, branches and unknown parameters pose particular challenges for dependency analysis and scheduling; we use unfolding techniques to deal with them.

- Web Services | Pp. 372-384
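Once dependency relations among service calls are known, calls can be grouped into batches that may run in parallel. A minimal sketch of that scheduling idea follows; the call names and dependency map are hypothetical, not from the paper:

```python
def parallel_batches(deps):
    """Group calls into batches: every call in a batch depends only on
    calls in earlier batches, so a batch's calls can run concurrently.
    `deps` maps call name -> set of calls it depends on."""
    remaining = {call: set(d) for call, d in deps.items()}
    batches = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependencies")
        batches.append(ready)
        for call in ready:
            del remaining[call]
        for d in remaining.values():
            d.difference_update(ready)
    return batches

# Hypothetical calls: convert needs both quotes; report needs convert.
deps = {"getQuote": set(), "getRate": set(),
        "convert": {"getQuote", "getRate"}, "report": {"convert"}}
print(parallel_batches(deps))  # → [['getQuote', 'getRate'], ['convert'], ['report']]
```

The two independent calls land in the first batch and could be issued concurrently, which is the parallelism the abstract's scheduling aims to exploit.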

A New Indexing Method for High Dimensional Dataset

Jiyuan An; Yi-Ping Phoebe Chen; Qinying Xu; Xiaofang Zhou

Indexing high-dimensional datasets has attracted extensive attention from many researchers in the last decade. Since R-tree-style index structures are known to suffer from the “curse of dimensionality”, Pyramid-tree-style index structures, which are based on the B-tree, have been proposed to break it. However, for high-dimensional data the number of pyramids is often insufficient to discriminate data points, and their effectiveness degrades dramatically as dimensionality increases. In this paper, we focus on one particular aspect of the “curse of dimensionality”: the surface of a hypercube in a high-dimensional space approaches 100% of the total hypercube volume as the number of dimensions approaches infinity. We propose a new indexing method based on this surface property. We prove that the Pyramid-tree technique is a special case of our method. The results of our experiments demonstrate the clear superiority of our novel method.

- High-Dimensional Indexing | Pp. 385-397
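The volume concentration the abstract cites is easy to check numerically: the fraction of the unit hypercube lying within ε of its surface is 1 − (1 − 2ε)^d, which tends to 1 as the dimensionality d grows. A small sketch (ε = 0.05 is an arbitrary choice):

```python
def surface_shell_fraction(d, eps=0.05):
    """Fraction of the unit hypercube [0,1]^d within eps of its surface.

    The interior cube with every coordinate in [eps, 1-eps] has volume
    (1 - 2*eps)**d; the shell near the surface holds the rest."""
    return 1.0 - (1.0 - 2.0 * eps) ** d

# Nearly all volume sits near the surface as d grows.
for d in (2, 10, 100):
    print(d, round(surface_shell_fraction(d), 4))
```

At d = 2 the shell holds 19% of the volume; at d = 100 it holds essentially all of it, which is why a surface-oriented partitioning becomes attractive in high dimensions.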

BM-Tree: A Hyperplane-Based Index Method for High-Dimensional Metric Spaces

Xiangmin Zhou; Guoren Wang; Xiaofang Zhou; Ge Yu

In this paper, we propose a novel high-dimensional index method, the BM-tree, to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the M-tree. Compared with the key dimension concept in the M-tree, the binary hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Experimental results using two types of real data sets illustrate a significantly improved filtering efficiency.

- High-Dimensional Indexing | Pp. 398-409

Approaching the Efficient Frontier: Cooperative Database Retrieval Using High-Dimensional Skylines

Wolf-Tilo Balke; Jason Xin Zheng; Ulrich Güntzer

Cooperative database retrieval is a challenging problem: top-k retrieval delivers manageable results only when a suitable compensation function (e.g. a weighted mean) is explicitly given. Skyline queries, on the other hand, offer intuitive querying to users, but result set sizes grow exponentially and hence can easily exceed manageable levels. We show how to combine the advantages of skyline queries and top-k retrieval in an interactive query processing scheme that uses user feedback on a manageable, representative sample of the skyline set to derive the most adequate weightings for subsequent focused top-k retrieval. Hence, each user’s information needs are conveniently and intuitively obtained, and only a limited set of best-matching objects is returned. We demonstrate our scheme’s efficient performance, manageable result sizes, and the representativeness of the skyline sample. We also show how to effectively estimate users’ compensation functions from their feedback. Our approach thus paves the way to intuitive and efficient cooperative retrieval with vague query predicates.

- High-Dimensional Indexing | Pp. 410-421
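The skyline operator the abstract builds on keeps exactly the Pareto-optimal points. A naive O(n²) sketch of that dominance test (the hotel example is illustrative, not from the paper):

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every dimension and
    strictly better in at least one (here: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive skyline: keep the points dominated by no other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hotels as (price, distance); (70, 9) is dominated by (50, 8).
hotels = [(50, 8), (60, 5), (70, 9), (55, 6)]
print(skyline(hotels))  # → [(50, 8), (60, 5), (55, 6)]
```

With no compensation function given, all three surviving points are incomparable; the paper's feedback step then narrows this set, whose size can grow exponentially with dimensionality.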

False-Negative Frequent Items Mining from Data Streams with Bursting

Zhihong Chong; Jeffrey Xu Yu; Hongjun Lu; Zhengjie Zhang; Aoying Zhou

False-negative frequent items mining from a high-speed transactional data stream finds an approximate set of frequent items with respect to a minimum support threshold. It controls the possibility of missing frequent items using a reliability parameter. The importance of false-negative frequent items mining is that it can exclude false positives and therefore significantly reduce the memory consumption of frequent itemset mining. The key issue in false-negative frequent items mining is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursts in data streams. The new algorithm consumes the least memory in comparison with other false-negative and false-positive frequent items algorithms. We present a theoretical bound for the new algorithm, and analyze the possibility of missing frequent items in terms of two possibilities, namely, the in-possibility and the out-possibility. The former concerns how a frequent item can possibly pass the first pruning; the latter concerns how long a frequent item can stay in memory while no occurrence of the item arrives in the data stream for a certain period. The new algorithm is superior to existing false-negative frequent items mining algorithms in terms of these two possibilities. We demonstrate the effectiveness of the new algorithm in this paper.

- Sensor and Stream Data Processing | Pp. 422-434
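For flavor, the classic Misra-Gries summary (not the paper's Loss-Negative algorithm) shows how a small, fixed set of counters can track heavy hitters in a stream without false negatives among items of frequency above n/k:

```python
def misra_gries(stream, k):
    """Misra-Gries heavy-hitter summary with at most k-1 counters.
    Every item occurring more than n/k times in a stream of length n
    is guaranteed to survive -- the same false-negative-free spirit,
    in bounded memory, that the abstract describes."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

print(misra_gries(list("aababcabad"), k=3))
```

In this run 'a' occurs 5 times out of 10 (above n/k ≈ 3.3) and is retained, while infrequent items are evicted by the decrement step.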

Adaptively Detecting Aggregation Bursts in Data Streams

Aoying Zhou; Shouke Qin; Weining Qian

Finding bursts in data streams is attracting much attention in the research community due to its broad applications. Existing burst detection methods suffer from three problems: 1) the window size and absolute burst threshold, which are hard to determine a priori, must be given in advance; 2) only one-sided bursts, i.e. either increasing or decreasing bursts, can be detected; 3) bumps, which are changes in aggregation data caused by noise, are often reported as bursts, and this disturbance costs much effort in the subsequent exploration of mining results. In this paper, a general burst model is introduced to overcome the above three problems. We develop an efficient algorithm for detecting adaptive aggregation bursts in a data stream given a burst ratio. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that bursts over windows of any length can be detected accurately and efficiently on-line. Theoretical analysis shows that the space and time complexity bounds of this method are relatively good, while experimental results demonstrate the applicability and efficiency of our algorithm in different application settings.

- Sensor and Stream Data Processing | Pp. 435-446
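The ratio-based, two-sided burst criterion can be sketched with a toy fixed-window check; the real method instead supports windows of any length via the inverted histogram, so this is only an assumption-laden illustration:

```python
def detect_bursts(values, window, ratio):
    """Toy two-sided burst detector: flag the start of a window whose sum
    differs from the previous window's sum by more than `ratio` in either
    direction (so both increasing and decreasing bursts are caught)."""
    bursts = []
    for i in range(2 * window, len(values) + 1):
        prev = sum(values[i - 2 * window : i - window])
        curr = sum(values[i - window : i])
        if min(prev, curr) > 0 and max(prev, curr) / min(prev, curr) > ratio:
            bursts.append(i - window)  # start index of the bursting window
    return bursts

# A step from rate 1 to rate 5 triggers increasing bursts.
counts = [1] * 10 + [5] * 5
print(detect_bursts(counts, window=5, ratio=2))  # → [7, 8, 9, 10]
```

A short one-step bump of a single high value would barely move the window sum, which hints at why aggregating over windows suppresses the noise-induced bumps the abstract mentions.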

Communication-Efficient Implementation of Join in Sensor Networks

Vishal Chowdhary; Himanshu Gupta

A sensor network is a wireless ad hoc network of resource-constrained sensor nodes. In this article, we address the problem of communication-efficient implementation of the SQL “join” operator in sensor networks. We design an optimal join-implementation algorithm that provably incurs minimum communication cost under certain reasonable assumptions. In addition, we design a much faster suboptimal heuristic that empirically delivers a near-optimal solution. We evaluate the performance of our designed algorithms through extensive simulations.

- Sensor and Stream Data Processing | Pp. 447-460