Publications catalog - books



Database Systems for Advanced Applications: 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings

Lizhu Zhou ; Beng Chin Ooi ; Xiaofeng Meng (eds.)

Conference: 10th International Conference on Database Systems for Advanced Applications (DASFAA). Beijing, China. April 17-20, 2005

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability
Detected institution: Not detected
Year of publication: 2005
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25334-1

Electronic ISBN

978-3-540-32005-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques

Young-Ho Park; Kyu-Young Whang; Byung Suk Lee; Wook-Shin Han

We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one having the descendant-or-self axis “//” in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR is to efficiently support this type of query for large-scale documents with heterogeneous schemas. XIR builds on conventional schema-level methods that use relational tables and significantly improves their efficiency with two techniques: an inverted index technique and a novel prefix match join. The former indexes the labels in label paths as keywords in texts, and finds the label paths matching a query more efficiently than the string matching used in conventional methods. The latter supports branching path expressions, and finds the result nodes more efficiently than the containment joins used in conventional methods. We compare the efficiency of XIR with that of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions and by a factor of several for branching path expressions.

- XML Query Processing | Pp. 95-112
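The label-path lookup described above can be sketched as follows. This is a simplified illustration, not XIR itself: it assumes each distinct label path is stored as a tuple of labels under a hypothetical integer ID, indexes every label as a keyword, and answers a linear query made only of `//` steps (e.g. `//person//name`) by intersecting posting lists and then checking label order. XIR's actual index layout and its prefix match join for branching queries are not modeled.

```python
from collections import defaultdict

def build_inverted_index(label_paths):
    """Map each label (treated as a keyword) to the IDs of the label paths containing it."""
    index = defaultdict(set)
    for pid, path in label_paths.items():
        for label in path:
            index[label].add(pid)
    return index

def in_order(query, path):
    """True if the query labels occur along path in the same order, gaps allowed ('//')."""
    remaining = iter(path)
    # 'label in remaining' advances the iterator past the match, which enforces order.
    return all(label in remaining for label in query)

def match_partial_query(query, label_paths, index):
    """Answer a linear query such as //person//name, given as ['person', 'name']."""
    # Step 1: a candidate path must contain every query label (posting-list intersection).
    candidates = set.intersection(*(index.get(label, set()) for label in query))
    # Step 2: keep only candidates whose labels appear in query order.
    return sorted(pid for pid in candidates if in_order(query, label_paths[pid]))

# Hypothetical label paths, keyed by made-up path IDs.
label_paths = {
    1: ("site", "people", "person", "name"),
    2: ("site", "regions", "item", "name"),
    3: ("site", "people", "person", "address", "city"),
}
index = build_inverted_index(label_paths)
print(match_partial_query(["person", "name"], label_paths, index))  # [1]
```

The posting-list intersection is the cheap keyword-style filter; the order check is only run on the surviving candidates, instead of string-matching every stored path.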

PathStack: A Holistic Path Join Algorithm for Path Query with Not-Predicates on XML Data

Enhua Jiao; Tok Wang Ling; Chee-Yong Chan

The evaluation of path queries forms the basis of complex XML query processing, which has attracted a lot of research attention. However, none of this work has examined the processing of more complex queries that contain not-predicates. In this paper, we present the first study on evaluating path queries with not-predicates. We propose an efficient holistic path join algorithm, PathStack, which has the following advantages: (1) it requires only one scan of the relevant data to evaluate path queries with not-predicates; (2) it does not generate any intermediate results; and (3) its memory space requirement is bounded by the longest path in the input XML document. We also present an improved variant of PathStack that further minimizes unnecessary computations.

- XML Query Processing | Pp. 113-124
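The three properties listed above (one scan, no intermediate results, memory bounded by document depth) can be illustrated on the simplest not-predicate query, `//a[not(.//b)]`. This is not the PathStack algorithm, only a minimal stack-based sketch under that assumption; the function name and the use of an `id` attribute to report matches are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def ids_without_descendant(xml_text, outer="a", excluded="b"):
    """Evaluate //outer[not(.//excluded)] in one document-order scan.

    Returns the 'id' attribute of each matching element, in the order the
    elements close. The stack holds one flag per open element, so memory
    grows with document depth, not document size.
    """
    stack = []    # flag per open element: has an <excluded> descendant been seen?
    matches = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(False)
            continue
        has_excluded = stack.pop()
        if elem.tag == outer and not has_excluded:
            matches.append(elem.get("id"))
        # Tell the parent whether its subtree contains an <excluded> element.
        if stack and (has_excluded or elem.tag == excluded):
            stack[-1] = True
        elem.clear()  # drop the finished element's children and attributes
    return matches

doc = """<r>
  <a id="1"><b/></a>
  <a id="2"><c/></a>
  <a id="3"><x><b/></x></a>
  <a id="4"><a id="5"/><b/></a>
</r>"""
print(ids_without_descendant(doc))  # ['2', '5']
```

Each element's flag is decided when it closes, so an answer is emitted as soon as it is known and no candidate list is ever materialized.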

An Improved Prefix Labeling Scheme: A Binary String Approach for Dynamic Ordered XML

Changqing Li; Tok Wang Ling

A number of labeling schemes have been designed to facilitate XML querying; with them, the ancestor-descendant relationship between any two nodes can be determined quickly. Another important feature of XML is that its elements are intrinsically ordered. However, existing labeling schemes incur a high label update cost: they have to re-label existing nodes or re-calculate some values when an order-sensitive element is inserted. It is therefore important to design a scheme that supports order-sensitive queries yet has a low label update cost. In this paper, we design a binary string prefix scheme that supports order-sensitive updates without any re-labeling or re-calculation. Theoretical analysis and experimental results also show that this scheme is compact compared with existing dynamic labeling schemes, and that it efficiently supports both ordered and unordered queries.

- XML Coding and Metadata Management | Pp. 125-137
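The core idea of inserting between ordered siblings without re-labeling can be sketched with binary-string self-labels that are compared lexicographically and always end in `1`. The insertion rules below are an assumption written in the spirit of the binary string approach; the paper's exact rules may differ, and the parent-prefix part of a full prefix label is not shown.

```python
import random

def label_between(left, right):
    """Return a binary-string label that sorts strictly between left and right.

    Labels are compared lexicographically and always end in '1'. None means
    there is no neighbour on that side. Existing labels are never changed.
    """
    if left is None and right is None:
        return "1"                   # first label ever
    if right is None:
        return left + "1"            # append after the last sibling
    if left is None or len(left) < len(right):
        return right[:-1] + "01"     # turn right's final '1' into '01'
    return left + "1"                # left is at least as long: extend it

# Insert 200 siblings at random positions without touching existing labels.
random.seed(0)
labels = [label_between(None, None)]
for _ in range(200):
    i = random.randint(0, len(labels))
    left = labels[i - 1] if i > 0 else None
    right = labels[i] if i < len(labels) else None
    labels.insert(i, label_between(left, right))
print(labels == sorted(labels), len(set(labels)) == len(labels))  # True True
```

Keeping every label ending in `1` is what makes the `'01'` rule always produce a label smaller than its right neighbour, and choosing the rule by comparing lengths guarantees the new label is also larger than its left neighbour.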

Efficiently Coding and Indexing XML Document

Zhongming Han; Congting Xi; Jiajin Le

In this paper, a novel and efficient numbering scheme is presented that combines label path information and data path information and can efficiently support all kinds of queries. A compact index structure, named HiD, is also proposed, and query algorithms based on this index structure are introduced. Finally, comprehensive experiments are conducted to evaluate all of the proposed techniques.

- XML Coding and Metadata Management | Pp. 138-150

XQuery-Based TV-Anytime Metadata Management

Jong-Hyun Park; Byung-Kyu Kim; Yong-Hee Lee; Min-Woo Lee; Min-Ok Jung; Ji-Hoon Kang

Digital broadcasting is a novel paradigm for next-generation broadcasting. It offers new opportunities for interactive services such as content-based browsing, non-linear navigation, and the use of user preferences and history. Because this new broadcasting environment is distributed, interoperability among providers and consumers is one of its most important requirements. A standard metadata format for digital broadcasting is therefore required, and TV-Anytime metadata is one such standard. It is defined using XML Schema, so its instances are XML data. To achieve interoperability, a standard query language is also required, and XQuery, a forthcoming standard query language for XML data, is a natural choice. In this paper we propose an efficient XML data management system that supports TV-Anytime metadata, using XQuery as its query language. Since the volume of metadata would be very large in real deployments, our system uses a relational database system as storage. We implement a prototype system and evaluate its performance on various typical queries, comparing it with other general-purpose systems.

- XML Coding and Metadata Management | Pp. 151-162
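Using a relational database as storage for XML metadata generally means shredding documents into tables and translating path queries into SQL. The sketch below uses a generic edge-table mapping in SQLite, which is an assumption and not necessarily the mapping used in the paper; the element names are hypothetical and heavily simplified (not schema-valid TV-Anytime), and only absolute child-axis paths are translated, far short of full XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, xml_text):
    """Store each element as one row (id, parent, tag, text) of a generic edge table."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        my_id = next_id  # ids follow document order
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)", (my_id, parent_id, elem.tag, text))
        for child in elem:
            visit(child, my_id)

    visit(ET.fromstring(xml_text), None)

def child_path_text(conn, tags):
    """Translate an absolute child path such as /Programs/Program/Title into self-joins."""
    tables = ["node n0"]
    conditions = ["n0.parent IS NULL", "n0.tag = ?"]
    for i in range(1, len(tags)):
        tables.append(f"JOIN node n{i} ON n{i}.parent = n{i - 1}.id")
        conditions.append(f"n{i}.tag = ?")
    last = len(tags) - 1
    sql = (f"SELECT n{last}.text FROM {' '.join(tables)} "
           f"WHERE {' AND '.join(conditions)} ORDER BY n{last}.id")
    return [row[0] for row in conn.execute(sql, tags)]

# Hypothetical, heavily simplified program metadata.
doc = ("<Programs>"
       "<Program><Title>Evening News</Title><Genre>News</Genre></Program>"
       "<Program><Title>Football Live</Title><Genre>Sports</Genre></Program>"
       "</Programs>")
conn = sqlite3.connect(":memory:")
shred(conn, doc)
print(child_path_text(conn, ["Programs", "Program", "Title"]))  # ['Evening News', 'Football Live']
```

Each path step becomes one self-join on the parent column, which is why deep paths get expensive in a naive edge table and why dedicated mappings and indexes matter for large metadata volumes.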

Effective Database Transformation and Efficient Support Computation for Mining Sequential Patterns

Chung-Wen Cho; Yi-Hung Wu; Arbee L. P. Chen

In this paper, we introduce a novel algorithm for mining sequential patterns from transaction databases. Since the FP-tree based approach is efficient in mining frequent itemsets, we adapt it to find frequent 1-sequences. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into symbolic form. We observe that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into one of the smallest possible size. To discover the frequent k-sequences, we design a tree structure to store the candidates. Each customer sequence is then scanned to decide whether the candidates are frequent k-sequences. To speed up this process, we propose a technique that avoids redundantly enumerating identical k-subsequences from a customer sequence. Moreover, the tree structure is designed so that the supports of the candidates can be incremented for a customer sequence by a single sequential traversal of the tree. The experimental results show that our approach outperforms previous work in various aspects, including scalability and execution time.

- Data Mining | Pp. 163-174
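The pipeline described above (find frequent 1-sequences, encode them as symbols, transform the database, then count candidate supports once per customer) can be sketched in a much-simplified form. Assumptions: each transaction holds a single item, frequent items are found by plain counting rather than an FP-tree, every frequent item is encoded (the paper notes this is unnecessary), and candidates live in a `Counter` instead of the paper's tree structure.

```python
from collections import Counter
from itertools import combinations

def frequent_items(db, min_sup):
    """Support of an item = number of customer sequences containing it (plain counting)."""
    counts = Counter(item for seq in db for item in set(seq))
    return {item for item, c in counts.items() if c >= min_sup}

def transform(db, frequent):
    """Encode each frequent item as a small integer symbol and drop infrequent items."""
    code = {item: i for i, item in enumerate(sorted(frequent))}
    return [[code[x] for x in seq if x in code] for seq in db], code

def frequent_2_sequences(encoded_db, min_sup):
    """Count each distinct ordered pair (x occurs before y) at most once per customer."""
    counts = Counter()
    for seq in encoded_db:
        counts.update(set(combinations(seq, 2)))  # the set removes duplicate pairs
    return {pair: c for pair, c in counts.items() if c >= min_sup}

# Hypothetical customer sequences: one item per transaction, in time order.
db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["d", "c"]]
freq = frequent_items(db, min_sup=2)   # frequent items: a, b, c
encoded, code = transform(db, freq)    # a -> 0, b -> 1, c -> 2; 'd' is dropped
print(sorted(frequent_2_sequences(encoded, min_sup=2).items()))  # [((0, 2), 3), ((1, 2), 2)]
```

Dropping infrequent items during the transformation shrinks every customer sequence before the more expensive k-sequence counting starts.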

Mining Succinct Systems of Minimal Generators of Formal Concepts

Guozhu Dong; Chunyu Jiang; Jian Pei; Jinyan Li; Limsoon Wong

Formal concept analysis has become an active field of study for data analysis and knowledge discovery. A formal concept is determined by its extent (the set of objects that fall under the concept) and its intent (the set of properties or attributes covered by the concept). The intent of a concept, also called a closed itemset, is the maximum set of attributes that characterize the concept. The minimal generators of a concept are the minimal subsets of its intent that characterize the concept equally well. This paper introduces the succinct system of minimal generators (SSMG) as a minimal representation of the minimal generators of all concepts, and gives an efficient algorithm for mining SSMGs. SSMGs are useful for revealing the equivalence relationship among minimal generators, which may be important for medical and other scientific discovery, and for revealing extent-based semantic equivalence among associations. SSMGs are also useful for losslessly reducing the size of the representation of all minimal generators, in the same way that closed itemsets losslessly reduce the size of the representation of all frequent itemsets. Removing these redundancies helps human users grasp the structure and information in the concepts.

- Data Mining | Pp. 175-187
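The definitions above (extent, intent as a closure, minimal generators) can be made concrete on a tiny context. This is a brute-force, exponential sketch for illustration only, not the paper's SSMG mining algorithm, and it does not compute the succinct system itself; the context is hypothetical.

```python
from itertools import combinations

def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def closure(context, attrs):
    """Intent of the concept generated by attrs: attributes shared by its whole extent."""
    objs = extent(context, attrs)
    if not objs:
        return frozenset().union(*context.values())  # empty extent: all attributes
    return frozenset.intersection(*(frozenset(context[o]) for o in objs))

def minimal_generators(context, intent):
    """All minimal subsets of intent whose closure is intent (brute force, smallest first)."""
    gens = []
    for size in range(len(intent) + 1):
        for combo in combinations(sorted(intent), size):
            g = frozenset(combo)
            # g is minimal if it generates the intent and contains no smaller generator.
            if closure(context, g) == intent and not any(h < g for h in gens):
                gens.append(g)
    return gens

# Hypothetical formal context: object -> attributes it has.
context = {
    "o1": {"a", "b", "c", "d"},
    "o2": {"a", "b"},
    "o3": {"c", "d"},
    "o4": {"a", "c"},
}
print(closure(context, frozenset({"b"})) == {"a", "b"})  # True: b alone implies a
# The concept with intent {a, b, c, d} has three equivalent minimal generators:
# {a, d}, {b, c} and {b, d}.
print(minimal_generators(context, frozenset("abcd")))
```

Several distinct minimal generators describing the same extent is exactly the kind of equivalence and redundancy that a succinct representation is meant to expose.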

A General Approach to Mining Quality Pattern-Based Clusters from Microarray Data

Daxin Jiang; Jian Pei; Aidong Zhang

Pattern-based clustering has broad applications in microarray data analysis, customer segmentation, e-business data analysis, and other areas. However, pattern-based clustering often returns a large number of highly overlapping clusters, which makes it hard for users to identify interesting patterns in the mining results. Moreover, there is no general model for pattern-based clustering: different kinds of patterns or different measures of pattern coherence may require different algorithms. In this paper, we address these two problems by proposing a general quality-driven approach to mining the top-k quality pattern-based clusters. We examine our quality-driven approach using real-world microarray data sets. The experimental results show that our method is general, effective, and efficient.

- Data Mining | Pp. 188-200
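To make "measures of pattern coherence" and "top-k quality clusters" concrete, the sketch below uses the pScore measure from the earlier pCluster model (a shifting-pattern coherence test over all 2x2 submatrices) and ranks coherent candidates with `heapq.nlargest`. The quality score (cluster area), the candidate list and the matrix are all hypothetical; the paper defines its own general quality model and also handles overlap, which this sketch does not.

```python
import heapq
from itertools import combinations

def max_pscore(matrix, rows, cols):
    """Largest pScore over all 2x2 submatrices; 0 means a perfect shifting pattern."""
    worst = 0
    for x, y in combinations(rows, 2):
        for a, b in combinations(cols, 2):
            p = abs((matrix[x][a] - matrix[x][b]) - (matrix[y][a] - matrix[y][b]))
            worst = max(worst, p)
    return worst

def top_k_clusters(matrix, candidates, k, delta):
    """Keep delta-coherent (rows, cols) candidates and return the k best by a quality score."""
    coherent = [c for c in candidates if max_pscore(matrix, *c) <= delta]
    # Hypothetical quality score: cluster area (rows x columns).
    return heapq.nlargest(k, coherent, key=lambda c: len(c[0]) * len(c[1]))

# Hypothetical expression matrix: rows are genes, columns are conditions.
matrix = [
    [1, 4, 2, 8],
    [3, 6, 4, 1],
    [10, 13, 11, 0],
    [5, 1, 9, 2],
]
candidates = [((0, 1, 2), (0, 1, 2)), ((0, 3), (0, 1)), ((0, 1), (1, 2))]
print(top_k_clusters(matrix, candidates, k=2, delta=1))
# [((0, 1, 2), (0, 1, 2)), ((0, 1), (1, 2))]
```

Swapping `max_pscore` or the `key` function for another coherence or quality measure leaves the selection step unchanged, which is the kind of separation a general quality-driven model aims for.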

Real Datasets for File-Sharing Peer-to-Peer Systems

Shen Tat Goh; Panos Kalnis; Spiridon Bakiras; Kian-Lee Tan

The fundamental drawback of unstructured peer-to-peer (P2P) networks is their flooding-based query processing protocol, which seriously limits their scalability. As a result, a significant amount of research has focused on designing efficient search protocols that reduce the overall communication cost. What is lacking, however, is real data on the exact content of users’ libraries and the queries that these users ask. Trace-driven simulations would clearly generate more meaningful results and better illustrate the efficiency of a generic query processing protocol under a real-life scenario.

Motivated by this fact, we developed a Gnutella-style probe and collected detailed data over a period of two months. The data involve around 4,500 users and contain the exact files shared by each user, together with any available metadata (e.g., artist for songs) and information about the nodes (e.g., connection speed). We also collected the queries initiated by these users. After filtering, the data were organized in XML format and are available to researchers. Here, we analyze this dataset and present its statistical characteristics. Additionally, as a case study, we use it to evaluate two recently proposed P2P search techniques.

- Data Generation and Understanding | Pp. 201-213
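The scalability problem of flooding mentioned above can be seen in a small simulation. This is a generic TTL-limited flooding sketch on a hypothetical five-peer overlay, not the probe or the search techniques from the paper; it assumes unit link delays, that each peer forwards a query once to every neighbour except the sender, and that duplicate copies still cost a message.

```python
from collections import deque

def flood(graph, source, ttl):
    """Simulate TTL-limited flooding; return (peers reached, messages sent).

    Each peer forwards a query once, to every neighbour except the sender;
    duplicate copies still cost a message but are not forwarded again.
    """
    seen = {source}
    messages = 0
    queue = deque([(source, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for nb in graph[peer]:
            if nb == sender:
                continue
            messages += 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, peer, hops_left - 1))
    return len(seen), messages

# Hypothetical overlay: peer -> neighbours (undirected).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
for ttl in (1, 2, 3):
    print(ttl, flood(graph, 0, ttl))
```

Even on this tiny graph, raising the TTL from 1 to 3 reaches only 2 more peers while the message count rises from 2 to 8, because duplicates grow with the number of cycles in the overlay. A trace-driven simulation would replace the toy graph and queries with real user libraries and query logs.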

SemEQUAL: Multilingual Semantic Matching in Relational Systems

A. Kumaran; Jayant R. Haritsa

In an increasingly multilingual world, it is critical that information management tools organically support the simultaneous use of multiple languages. A prerequisite for efficiently achieving this goal is that the underlying database engines provide seamless matching of text data across languages. We propose SemEQUAL, a new SQL functionality for semantic matching of multilingual attribute data. Our current implementation defines matches based on the standard WordNet linguistic ontologies. A performance evaluation of SemEQUAL, implemented using standard SQL:1999 features on a suite of commercial database systems, indicates unacceptably slow response times. However, by tuning the schema and index choices to match typical linguistic features, we show that performance can be improved to a level commensurate with online user interaction.

- Data Generation and Understanding | Pp. 214-225
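Semantic matching over a WordNet-style taxonomy can be expressed with standard SQL:1999 features such as recursive common table expressions. The sketch below is a guess at the general shape of such a predicate, not SemEQUAL's actual syntax or semantics: the table names, the tiny multilingual taxonomy and the catalog are hypothetical, and it assumes a value matches when its category word denotes the query concept or any hyponym of it, in any language.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hypernym (child INTEGER, parent INTEGER);        -- concept taxonomy edges
CREATE TABLE lexicon (word TEXT, lang TEXT, concept INTEGER);  -- word in a language -> concept
CREATE TABLE books (title TEXT, category TEXT, lang TEXT);
"""

def setup():
    """Build an in-memory database with a tiny hypothetical taxonomy and catalog."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Concepts: 1 = history, 2 = military history (a hyponym of 1), 3 = cooking.
    conn.executemany("INSERT INTO hypernym VALUES (?, ?)", [(2, 1)])
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", [
        ("History", "en", 1), ("Historia", "es", 1), ("Geschichte", "de", 1),
        ("Military History", "en", 2), ("Historia militar", "es", 2),
        ("Cooking", "en", 3), ("Cocina", "es", 3),
    ])
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
        ("Ancient Rome", "History", "en"),
        ("Guerras napoleónicas", "Historia militar", "es"),
        ("Cocina vasca", "Cocina", "es"),
        ("Preußen", "Geschichte", "de"),
    ])
    return conn

def semantic_match(conn, word, lang):
    """Titles whose category denotes the concept of (word, lang) or a hyponym, in any language."""
    sql = """
    WITH RECURSIVE sub(concept) AS (
        SELECT concept FROM lexicon WHERE word = ? AND lang = ?
        UNION
        SELECT h.child FROM hypernym h JOIN sub s ON h.parent = s.concept
    )
    SELECT DISTINCT b.title
    FROM books b JOIN lexicon l ON l.word = b.category AND l.lang = b.lang
    WHERE l.concept IN (SELECT concept FROM sub)
    ORDER BY b.title
    """
    return [row[0] for row in conn.execute(sql, (word, lang))]

conn = setup()
print(semantic_match(conn, "History", "en"))
# ['Ancient Rome', 'Guerras napoleónicas', 'Preußen']
```

The recursive expansion over the taxonomy runs on every query, which hints at why naive implementations can be slow and why schema and index tuning (for example, precomputing the hyponym closure) can help.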