Catálogo de publicaciones - libros
Database Systems for Advanced Applications: 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings
Lizhu Zhou ; Beng Chin Ooi ; Xiaofeng Meng (eds.)
En conferencia: 10º International Conference on Database Systems for Advanced Applications (DASFAA) . Beijing, China . April 17, 2005 - April 20, 2005
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
No disponibles.
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2005 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-25334-1
ISBN electrónico
978-3-540-32005-0
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2005
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2005
Cobertura temática
Tabla de contenidos
doi: 10.1007/11408079_31
Database Design with Equality-Generating Dependencies
Junhu Wang
In relational database systems, traditional normalization techniques (eg, BCNF, 4NF) remove data redundancies from a single relation, but can not detect and remove redundancies across multiple relations. However, redundancies among multiple relations are abundant especially in integrated databases. In this paper, we first propose to detect such data redundancies using equality-generating dependencies (EGDs) and propose an extended normal form (ENF) of relational database schema with respect to EGDs. We show that a database has no potential data redundancies with respect to EGDs if and only if the schema is in ENF. For a common special class of EGDs, we provide a set of sound and complete inference rules. A normalization process is presented to losslessly transform a relational database schema to one in ENF. We then extend our EGDs and ENF to XML data, and show how similar data redundancy problems can be detected for data-centric XML documents.
- Extending XML | Pp. 335-346
doi: 10.1007/11408079_32
WDEE: Web Data Extraction by Example
Zhao Li; Wee Keong Ng
Web data extraction systems in use today transform semi-structured Web documents and deliver structured documents to end users. Some systems provide a visual interface to users to generate the extraction rules. However, to end users, the visual effect of Web documents is lost during the transformation process. In this paper, we propose an approach that allows a user to query extracted documents without knowledge of formal query language. We bridge the gap between visual effect of Web documents and structured documents extracted by providing a QBE-like (Query by Example) interface named . The principle component of our method is the notion of a document schema. Document are patterns of structures embedded in documents. generates tree skeletons based on schema information and a user may execute queries by input condition in the skeltons. By maintaining the mapping relation among schemata of Web documents and extracted documents, a visual example may be presented to end users. With the example, allows a user to construct tree skeletons in a manner that resembles the browsing of Web pages.
- Web Services | Pp. 347-358
doi: 10.1007/11408079_33
Concept-Based Retrieval of Alternate Web Services
Dunlu Peng; Sheng Huang; Xiaoling Wang; Aoying Zhou
Web services have attracted much attention in recent years with the development of e-commercial technologies over the Internet. Although there are some standards and protocols for web service technologies, such as WSDL, UDDI and SOAP, the core technologies underlying Web services need further study in order to make these technologies practical and flexible. Efficient services management is the main task for services execution and services composition and there is no good solution until now. In this paper, we present a concept-based method for services management, which is efficient for services selection and alternative. Our method takes advantage of lattice and retrieves the optimal alternates for a given Web service efficiently by employing formal concept analysis and concept lattice. Compared with the former methods, this method is more efficient and accurate because the underlying semantics of Web services and users’ requirements are exploited during the processing of retrieval. Experimental results also verify the efficiency and scalability of our approach.
- Web Services | Pp. 359-371
doi: 10.1007/11408079_34
WSQuery: XQuery for Web Services Integration
Zhimao Guo; Xiaoling Wang; Aoying Zhou
Web services integration is one of key issues in many B2B applications. Yet, current approaches to this problem tend to use a general-purpose programming language, thus incur type mismatches between the type system of XML schema and the object-oriented type system. In this paper, we present a uniform method for integrating Web services. We propose an extension of XQuery, referred to as , which can contain Web service calls. We first present the conceptual evaluation strategy of WSQuery programs. Then, for speeding up the evaluation, we propose to schedule Web service calls to exploit parallelism. We carry out dependency analysis to determine dependency relations among Web service calls. The loops, branches and unknown parameters pose particular challenges for dependency analysis and scheduling. We use unfolding techniques to deal with them.
- Web Services | Pp. 372-384
doi: 10.1007/11408079_35
A New Indexing Method for High Dimensional Dataset
Jiyuan An; Yi-Ping Phoebe Chen; Qinying Xu; Xiaofang Zhou
Indexing high dimensional datasets has attracted extensive attention from many researchers in the last decade. Since R-tree type of index structures are known as suffering “curse of dimensionality” problems, Pyramid-tree type of index structures, which are based on the B-tree, have been proposed to break the curse of dimensionality. However, for high dimensional data, the number of pyramids is often insufficient to discriminate data points when the number of dimensions is high. Its effectiveness degrades dramatically with the increase of dimensionality. In this paper, we focus on one particular issue of “curse of dimensionality”; that is, the surface of a hypercube in a high dimensional space approaches 100% of the total hypercube volume when the number of dimensions approaches infinite. We propose a new indexing method based on the surface of dimensionality. We prove that the Pyramid tree technology is a special case of our method. The results of our experiments demonstrate clear priority of our novel method.
- High-Dimensional Indexing | Pp. 385-397
doi: 10.1007/11408079_36
BM-Tree: A Hyperplane-Based Index Method for High-Dimensional Metric Spaces
Xiangmin Zhou; Guoren Wang; Xiaofang Zhou; Ge Yu
In this paper, we propose a novel high-dimensional index method, the BM-tree, to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the M-tree. Compared with the key dimension concept in the M-tree, the binary hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Experimental results using two types of real data sets illustrate a significantly improved filtering efficiency.
- High-Dimensional Indexing | Pp. 398-409
doi: 10.1007/11408079_37
Approaching the Efficient Frontier: Cooperative Database Retrieval Using High-Dimensional Skylines
Wolf-Tilo Balke; Jason Xin Zheng; Ulrich Güntzer
Cooperative database retrieval is a challenging problem: top k retrieval delivers manageable results only when a suitable compensation function (e.g. a weighted mean) is explicitly given. On the other hand skyline queries offer intuitive querying to users, but result set sizes grow exponentially and hence can easily exceed manageable levels. We show how to combine the advantages of skyline queries and top k retrieval in an interactive query processing scheme using user feedback on a manageable, representative sample of the skyline set to derive most adequate weightings for subsequent focused top k retrieval. Hence, each user’s information needs are conveniently and intuitively obtained, and only a limited set of best matching objects is returned. We will demonstrate our scheme’s efficient performance, manageable result sizes, and representativeness of the skyline. We will also show how to effectively estimate users’ compensation functions using their feedback. Our approach thus paves the way to intuitive and efficient cooperative retrieval with vague query predicates.
- High-Dimensional Indexing | Pp. 410-421
doi: 10.1007/11408079_38
False-Negative Frequent Items Mining from Data Streams with Bursting
Zhihong Chong; Jeffrey Xu Yu; Hongjun Lu; Zhengjie Zhang; Aoying Zhou
False-negative frequent items mining from a high speed transactional data stream is to find an approximate set of frequent items with respect to a minimum support threshold, . It controls the possibility of missing frequent items using a reliability parameter . The importance of false-negative frequent items mining is that it can exclude false-positives and therefore significantly reduce the memory consumption for frequent itemsets mining. The key issue of false-negative frequent items mining is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursting in data streams. The new algorithm consumes the smallest memory in comparison with other false-negative and false-positive frequent items algorithms. We present theoretical bound of the new algorithm, and analyze the possibility of minimization of missing frequent items, in terms of two possibilities, namely, in-possibility and out-possibility. The former is about how a frequent item can possibly pass the first pruning. The latter is about how long a frequent item can stay in memory while no occurrences of the item comes in the following data stream for a certain period. The new proposed algorithm is superior to the existing false-negative frequent items mining algorithms in terms of the two possibilities. We demonstrate the effectiveness of the new algorithm in this paper.
- Sensor and Stream Data Processing | Pp. 422-434
doi: 10.1007/11408079_39
Adaptively Detecting Aggregation Bursts in Data Streams
Aoying Zhou; Shouke Qin; Weining Qian
Finding bursts in data streams is attracting much attention in research community due to its broad applications. Existing burst detection methods suffer the problems that 1) the parameters of window size and absolute burst threshold, which are hard to be determined a priori, should be given in advance. 2) Only one side bursts, i.e. either increasing or decreasing bursts, can be detected. 3) Bumps, which are changes of aggregation data caused by noises, are often reported as bursts. The disturbance of bumps causes much effort in subsequent exploration of mining results. In this paper, a general burst model is introduced for overcoming above three problems. We develop an efficient algorithm for detecting adaptive aggregation bursts in a data stream given a burst ratio. With the help of a novel inverted histogram, the statistical summary is compressed to be fit in limited main memory, so that bursts on windows of any length can be detected accurately and efficiently on-line. Theoretical analysis show the space and time complexity bound of this method is relatively good, while experimental results depict the applicability and efficiency of our algorithm in different application settings.
- Sensor and Stream Data Processing | Pp. 435-446
doi: 10.1007/11408079_40
Communication-Efficient Implementation of Join in Sensor Networks
Vishal Chowdhary; Himanshu Gupta
A sensor network is a wireless ad hoc network of resource-constrained sensor nodes. In this article, we address the problem of communication-efficient implementation of the SQL “join” operator in sensor networks. We design an optimal join-implementation algorithm that provably incurs minimum communication cost under certain reasonable assumptions. In addition, we design a much faster suboptimal heuristic that empirically delivers a near-optimal solution. We evaluate the performance of our designed algorithms through extensive simulations.
- Sensor and Stream Data Processing | Pp. 447-460