Publications catalog - books

Database and XML Technologies: 5th International XML Database Symposium, XSym 2007, Vienna, Austria, September 23-24, 2007. Proceedings

Denilson Barbosa ; Angela Bonifati ; Zohra Bellahsène ; Ela Hunt ; Rainer Unland (eds.)

Conference: 5th International XML Database Symposium (XSym). Vienna, Austria. September 23-24, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-75287-5

Electronic ISBN

978-3-540-75288-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Normalization Theory for XML

Leonid Libkin

Specifications of XML documents typically consist of typing information (e.g., a DTD) and integrity constraints. Just like relational schema specifications, not all are good; some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world, we still lack universally accepted query languages such as relational algebra, or update languages that let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to come up with notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon’s information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.

- Invited Talks | Pp. 1-13

Dynamic Fusion of Web Data

Erhard Rahm; Andreas Thor; David Aumueller

Mashups exemplify a workflow-like approach to dynamically integrate data and services from multiple web sources. Such integration workflows can build on existing services for web search, entity search, database querying, and information extraction and thus complement other data integration approaches. A key challenge is the efficient execution of integration workflows and their query and matching steps at runtime. We relate mashup data integration with other approaches, list major challenges, and outline features of a first prototype design.

- Invited Talks | Pp. 14-16

XPath Query Satisfiability is in PTIME for Real-World DTDs

Manizheh Montazerian; Peter T. Wood; Seyed R. Mousavi

The problem of XPath query satisfiability under DTDs (Document Type Definitions) is to decide, given an XPath query q and a DTD D, whether or not there is some document valid with respect to D on which q returns a nonempty result. Recent studies in the literature have shown the problem to be NP-hard or worse for most fragments of XPath. However, in this paper we show that the satisfiability problem is in PTIME for most DTDs used in real-world applications. Firstly, we report on the details of our investigation of real-world DTDs and define two properties that they typically satisfy: being duplicate-free and being covering. Then we concentrate on the satisfiability problem of XPath queries under such DTDs. We obtain a number of XPath fragments for which the complexity of the satisfiability problem reduces to PTIME when such real-world DTDs are used.

- XPath Query Answering | Pp. 17-30
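
The satisfiability question stated in the abstract above can be illustrated on a deliberately tiny case. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical toy DTD given as a map from each element name to the child names it allows, assumes every element type can derive a finite valid document, and handles only absolute paths made of child steps (no qualifiers, wildcards, or other axes). Under those assumptions a path is satisfiable exactly when each step is an allowed child of the previous one, which is checked in time linear in the path length.

```python
# Hypothetical toy DTD: element name -> set of element names allowed as children.
# Assumption: every element type can derive a finite valid document and each
# listed child may actually occur, so nesting alone decides satisfiability.
TOY_DTD = {
    "library": {"book", "journal"},
    "book": {"title", "author"},
    "journal": {"title", "issue"},
    "issue": {"article"},
    "article": {"title", "author"},
    "title": set(),
    "author": set(),
}


def is_satisfiable(path, dtd, root):
    """Return True if the absolute child-axis path can select a node
    in some document whose element nesting follows `dtd`."""
    steps = [s for s in path.split("/") if s]
    if not steps or steps[0] != root:
        return False
    # Each step must be an allowed child of the step before it.
    for parent, child in zip(steps, steps[1:]):
        if child not in dtd.get(parent, set()):
            return False
    return True
```

For example, `is_satisfiable("/library/journal/issue/article/author", TOY_DTD, "library")` holds, while `/library/book/issue` is rejected because `issue` is not an allowed child of `book`.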

Fast Answering of XPath Query Workloads on Web Collections

Mariano P. Consens; Flavio Rizzolo

Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the (highly variable) structure of such web collections poses additional challenges.

This paper introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, enabling the efficient evaluation of XPath workloads (supporting all the axes and language constructs in XPath). Experiments validate that DescribeX enables existing document-at-a-time XPath tools to scale up to multi-gigabyte XML collections.

- XPath Query Answering | Pp. 31-45
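
To give a feel for how a structural summary can let a document-at-a-time tool skip irrelevant documents, here is a minimal sketch of a plain label-path summary. It is not DescribeX and makes no claim about how DescribeX defines its summaries; the function names and the two-document collection are hypothetical, and only simple root-to-element child paths are covered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_paths(xml_text):
    """Return the set of root-to-element label paths in one document."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def build_summary(collection):
    """Map each label path to the ids of the documents that contain it."""
    summary = defaultdict(set)
    for doc_id, xml_text in collection.items():
        for path in label_paths(xml_text):
            summary[path].add(doc_id)
    return summary


# Hypothetical two-document collection.
FEEDS = {
    "feed1": "<rss><channel><item><title>a</title></item></channel></rss>",
    "feed2": "<rss><channel><title>b</title></channel></rss>",
}
SUMMARY = build_summary(FEEDS)
# Only these documents can contribute results to /rss/channel/item/title.
CANDIDATES = SUMMARY.get("/rss/channel/item/title", set())
```

A query over `/rss/channel/item/title` then only needs to be run against the documents in `CANDIDATES`, rather than against the whole collection.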

Let a Single FLWOR Bloom

Matthias Brantner; Carl-Christian Kanne; Guido Moerkotte

To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible.

Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules and on inner FLWORs occurring in the for, let, and return clauses in the outer FLWOR.

- XQuery Evaluation and Performance | Pp. 46-61

Efficient XQuery Evaluation of Grouping Conditions with Duplicate Removals

Norman May; Guido Moerkotte

Currently, grouping in XQuery must be expressed implicitly with nested FLWOR expressions. With XQuery 1.1, an explicit group by clause will be part of this query language. As users integrate this new construct into their applications, it becomes important to have efficient evaluation techniques available to process even complex grouping conditions. Among them, the removal of duplicate values or duplicate nodes in the partitions defined by the group by clause is not well-supported yet. The evaluation technique proposed in this paper is able to handle duplicate removal in the partitions efficiently. Experiments show the superiority of our solution compared to state-of-the-art query processing.

- XQuery Evaluation and Performance | Pp. 62-76
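
The combination of grouping and per-partition duplicate removal described in the abstract above can be sketched in a few lines. This is a generic hash-based, single-pass illustration and not the evaluation technique proposed in the paper; the function name, the row layout, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def group_distinct(rows, key, value):
    """Partition rows by `key` and keep each distinct `value` once per
    partition, preserving first-seen order, in a single pass."""
    seen = defaultdict(set)      # key -> values already recorded
    groups = defaultdict(list)   # key -> distinct values in arrival order
    for row in rows:
        k, v = row[key], row[value]
        if v not in seen[k]:
            seen[k].add(v)
            groups[k].append(v)
    return dict(groups)


# Hypothetical input: one row per (publisher, author) occurrence.
ROWS = [
    {"publisher": "P1", "author": "Ann"},
    {"publisher": "P1", "author": "Bob"},
    {"publisher": "P2", "author": "Ann"},
    {"publisher": "P1", "author": "Ann"},
]
RESULT = group_distinct(ROWS, "publisher", "author")
```

Here the second occurrence of `Ann` under `P1` is dropped while building the partition, so no separate pass over each partition is needed afterwards.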

On the Effectiveness of Flexible Querying Heuristics for XML Data

Zografoula Vagena; Latha Colby; Fatma Özcan; Andrey Balmin; Quanzhong Li

The ability to perform effective XML data retrieval in the absence of schema knowledge has recently received considerable attention. The majority of relevant proposals employ heuristics that identify groups of meaningfully related nodes using information extracted from the input data. These heuristics are employed to effectively prune the search space of all possible node combinations, and their popularity is evident from the large number of such heuristics and the systems that use them. However, a comprehensive study detailing the relative merits of these heuristics has not been performed thus far. One of the challenges in performing this study is the fact that these techniques have been proposed within different and not directly comparable contexts. In this paper, we attempt to fill this gap. In particular, we first abstract the common selection problem that is tackled by the relatedness heuristics and show how each heuristic addresses this problem. We then identify data categories where the assumptions made by each heuristic are valid and draw insights on their possible effectiveness. Our findings can help systems implementors understand the strengths and weaknesses of each heuristic and provide simple guidelines for the applicability of each one.

- XQuery Evaluation and Performance | Pp. 77-91

XML Schema Evolution: Incremental Validation and Efficient Document Adaptation

Giovanna Guerrini; Marco Mesiti; Matteo Alberto Sorrenti

XML Schemas describe the structure of valid documents and can be exploited to improve both the efficiency and the effectiveness of queries on valid documents. XML Schemas, however, may need to be updated to adhere to new requirements and to reflect changes in the application domain. Starting from a set of schema modification primitives, in this paper we devise an incremental validation approach that efficiently validates documents, known to be valid for the original schema, against an updated schema. Then, we enhance the approach to adapt the documents to the new schema. Experiments show that our approach improves on the performance of standard validation algorithms in this setting and that the cost of the adaptation process is limited.

- XML Updates, Temporal XML Data and Concurrency | Pp. 92-106
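
The idea behind incremental validation can be sketched as follows, under strong simplifying assumptions that are not taken from the paper: a schema is reduced to the set of child element names each element type allows, the only modification considered is a change to the allowed children of one type, and the document is known to be valid for the original schema. In that setting only elements of the changed type need to be rechecked. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def still_valid(root, changed_type, new_allowed_children):
    """Recheck only the elements whose type declaration changed.

    Assumes the document was valid before the change, so elements of
    every other type are skipped."""
    for elem in root.iter(changed_type):
        for child in elem:
            if child.tag not in new_allowed_children:
                return False
    return True


# Hypothetical document, valid for an original schema in which
# <order> allows both <item> and <note> children.
DOC = ET.fromstring("<order><item/><note/></order>")
```

If the updated schema no longer allows `note` under `order`, `still_valid(DOC, "order", {"item"})` reports the document as invalid without inspecting any other element type.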

Managing Branch Versioning in Versioned/Temporal XML Documents

Luis J. Arévalo Rosado; Antonio Polo Márquez; Jorge Martínez Gil

Due to the linear nature of time, timestamp-based XML solutions for managing XML versions have difficulty supporting non-linear versioning. Following up on our previous work, which introduced versionstamp, a new technique for managing non-linear versions of XML graph documents, we go a step further by adding temporal information to each version included in the document. This not only allows us to query the vDocuments at both the temporal and the version level, but also lets us manage branch versioning along the temporal axis. Moreover, to check its functionality, we have compared our technique to a timestamped XML solution, and a set of Web services has been developed. The easy management of multiple versioning, the large number of queries supported in different standard XML query languages, and its implementation using only XML technology are some of the advantages of the proposed technique.

- XML Updates, Temporal XML Data and Concurrency | Pp. 107-121

SXDGL: Snapshot Based Concurrency Control Protocol for XML Data

Peter Pleshachkov; Sergei Kuznetcov

Concurrency control for XML data is a major research problem. A number of researchers are working on it, but most of the proposed methods are based on the two-phase locking protocol, which can lead to high blocking rates in data-intensive XML applications. In this paper we present and evaluate SXDGL, a new snapshot-based concurrency control protocol for XML data. SXDGL completely eliminates data contention between read-only and update transactions. Moreover, SXDGL takes into account the hierarchical structure and semantics of the XML data model when determining conflicts between concurrent XML operations of update transactions. The evaluation shows significant benefits of SXDGL for processing concurrent transactions in data-intensive XML applications.

- XML Updates, Temporal XML Data and Concurrency | Pp. 122-136
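
Why snapshots remove contention between read-only and update transactions can be seen in a toy multiversion store. This is a generic sketch of snapshot reads, not the SXDGL protocol: it ignores XML structure, locking between update transactions, and garbage collection, and the class and method names are hypothetical.

```python
class VersionedStore:
    """Toy multiversion store: a reader sees a fixed snapshot, so it never
    waits for, or delays, a writer."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), ascending
        self._clock = 0       # timestamp of the latest committed update

    def begin_read(self):
        """Start a read-only transaction; returns its snapshot timestamp."""
        return self._clock

    def commit(self, updates):
        """Commit a dict of key -> value as one new version."""
        self._clock += 1
        for key, value in updates.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        result = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts > snapshot_ts:
                break
            result = value
        return result


store = VersionedStore()
store.commit({"/doc/title": "v1"})
snap = store.begin_read()           # reader starts here
store.commit({"/doc/title": "v2"})  # a later update does not disturb it
OLD = store.read("/doc/title", snap)
NEW = store.read("/doc/title", store.begin_read())
```

The reader that started before the second commit keeps seeing `v1`, while a reader that starts afterwards sees `v2`; neither had to take a lock.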