Publications catalog - books



Advances in Web-Age Information Management: 7th International Conference, WAIM 2006, Hong Kong, China, June 17-19, 2006, Proceedings

Jeffrey Xu Yu ; Masaru Kitsuregawa ; Hong Va Leong (eds.)

Conference: 7th International Conference on Web-Age Information Management (WAIM). Hong Kong, China. June 17-19, 2006

Summary/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability

Detected institution: not detected. Year of publication: 2006. Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-35225-9

Electronic ISBN

978-3-540-35226-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Service Matchmaking Based on Semantics and Interface Dependencies

Shuiguang Deng; Jian Wu; Ying Li; Zhaohui Wu

Most current service matchmaking algorithms rest on one presupposition: that all inputs of a service are indispensable to each of its outputs. However, this presupposition does not always hold. This paper analyzes this presupposition and argues that it has a negative influence on the recall and precision of current matchmaking algorithms. A formal service model that extends the OWL-S service profile is then introduced, and a new service matchmaking algorithm based on this model and on semantics is proposed. Unlike other algorithms, the proposed one takes interface dependencies into account during matchmaking. The algorithm has been applied in a service composition framework called DartFlow. Our experimental data show that this novel matchmaking algorithm outperforms others in terms of recall and precision.

- Web Services | Pp. 240-251
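
The interface-dependency idea in "Service Matchmaking Based on Semantics and Interface Dependencies" can be illustrated with a small sketch. It assumes a hypothetical toy ontology (child concept mapped to parent), treats a concept as satisfying another when it is the same or a subclass, and records for each service output the subset of inputs it depends on. This is not the paper's algorithm or its OWL-S extension; it only contrasts the "every input is needed for every output" presupposition with a dependency-aware check.

```python
# Hypothetical toy ontology: child concept -> parent concept.
ONTOLOGY = {
    "City": "Location", "Location": "Thing", "Date": "Thing",
    "CityMap": "Map", "Map": "Thing", "Forecast": "Thing",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


class Service:
    def __init__(self, name, dependencies):
        self.name = name
        # output concept -> set of input concepts that this output really needs
        self.dependencies = dependencies

    @property
    def inputs(self):
        return set().union(*self.dependencies.values())


def satisfied(required_inputs, provided):
    """Every required input is covered by some provided concept."""
    return all(any(is_a(p, req) for p in provided) for req in required_inputs)


def match_all_inputs(service, provided, wanted):
    """Classic presupposition: every input is needed for every output."""
    return satisfied(service.inputs, provided) and all(
        any(is_a(out, w) for out in service.dependencies) for w in wanted
    )


def match_with_dependencies(service, provided, wanted):
    """Only the inputs that a wanted output depends on must be provided."""
    return all(
        any(is_a(out, w) and satisfied(deps, provided)
            for out, deps in service.dependencies.items())
        for w in wanted
    )


# Hypothetical service: the map depends only on a location, the forecast on both.
geo = Service("GeoWeather", {"Forecast": {"Location", "Date"},
                             "CityMap": {"Location"}})
print(match_all_inputs(geo, {"City"}, {"Map"}))         # False
print(match_with_dependencies(geo, {"City"}, {"Map"}))  # True
```

A requester that can only supply a city is rejected under the all-inputs presupposition even though the map output does not need a date, which is the kind of recall loss the abstract describes.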

Crawling Web Pages with Support for Client-Side Dynamism

Manuel Álvarez; Alberto Pan; Juan Raposo; Justo Hidalgo

There is a great amount of information on the Web that cannot be accessed by conventional crawler engines. This portion of the Web is usually known as the Hidden Web. Dealing with this problem requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing the information placed in web pages with support for client-side dynamism, dealing with aspects such as JavaScript technology, non-standard session maintenance mechanisms, client redirections, pop-up menus, etc. Our approach leverages current browser APIs and implements novel crawling models and algorithms.

- Web Searching | Pp. 252-262
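
Client-side dynamism, as described in "Crawling Web Pages with Support for Client-Side Dynamism", means some content only appears after JavaScript events fire. A minimal sketch of one way to model this assumes that a crawl unit is a URL plus the sequence of events replayed on it, and uses a hypothetical in-memory `SITE` table and `render()` function in place of a real browser API. It is not the paper's crawling model; it only shows breadth-first exploration over (URL, event sequence) states with de-duplication by rendered content.

```python
from collections import deque

# Hypothetical site model: (url, events replayed) -> (content, links, clickable events).
SITE = {
    ("/index", ()): ("home", ["/about"], ["showMenu"]),
    ("/index", ("showMenu",)): ("home+menu", [], ["openProducts"]),
    ("/index", ("showMenu", "openProducts")): ("products", ["/item/1"], []),
    ("/about", ()): ("about", ["/index"], []),
    ("/item/1", ()): ("item 1", [], []),
}


def render(url, events):
    """Stand-in for a browser that loads `url` and replays `events` in order."""
    return SITE[(url, events)]


def crawl(start, max_events=5):
    seen_states, seen_content = set(), set()
    queue = deque([(start, ())])
    found = []
    while queue:
        url, events = queue.popleft()
        if (url, events) in seen_states or len(events) > max_events:
            continue
        seen_states.add((url, events))
        content, links, clickable = render(url, events)
        if content in seen_content:  # same page reached by another route
            continue
        seen_content.add(content)
        found.append((url, events, content))
        for link in links:
            queue.append((link, ()))  # a plain link starts a fresh route
        for event in clickable:
            queue.append((url, events + (event,)))  # extend the current route
    return found


for url, events, content in crawl("/index"):
    print(url, events, content)
```

The "products" page is only reachable by replaying two events on `/index`, so a URL-only crawler would never see `/item/1`.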

Collecting Recipe Data from WWW Incrementally

Yu Li; Xiaofeng Meng; Liping Wang; Qing Li

The WWW has become the largest data repository ever available in the history of humankind. Using the Internet as a data source is natural, and many efforts have been made in this direction. In this paper we focus on building a robust system that collects structured recipe data from the Web incrementally, which we believe is a critical step towards practical, continuous, reliable web data extraction systems, and therefore towards using the WWW as a data source for various database applications. The reasons for advocating such an incremental approach are twofold: (1) it is impractical to crawl all the recipe pages from relevant web sites, as the Web is highly dynamic; (2) it is almost impossible to induce a general wrapper for future extraction from the initial batch of recipe web pages. In this paper, we describe such a system, which aims at incrementally collecting recipe data from the WWW. General issues in establishing an incremental data extraction system are considered, and techniques are applied to recipe data collection from the Web. Our system is actually used as the backend of a fully-fledged multimedia recipe database system being developed jointly by City University of Hong Kong and Renmin University of China.

- Web Searching | Pp. 263-274
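
The two motivations in "Collecting Recipe Data from WWW Incrementally" (pages change, and an initial wrapper will not generalize) suggest a loop that skips unchanged pages and flags pages the current wrapper cannot handle. The sketch below assumes content fingerprinting with SHA-256, a hypothetical regex `simple_wrapper`, and a hypothetical required-field list; it is not the paper's system, only an illustration of incremental collection with a re-induction queue.

```python
import hashlib
import re

REQUIRED = ("title", "ingredients")  # hypothetical schema fields


def fingerprint(html):
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def simple_wrapper(html):
    """Hypothetical wrapper: title in <h1>, ingredients in <li> items."""
    title = re.search(r"<h1>(.*?)</h1>", html)
    items = re.findall(r"<li>(.*?)</li>", html)
    if not title or not items:
        return None
    return {"title": title.group(1), "ingredients": items}


class IncrementalCollector:
    def __init__(self, wrapper):
        self.wrapper = wrapper
        self.seen = {}               # url -> fingerprint of last processed version
        self.records = {}            # url -> extracted record
        self.needs_reinduction = []  # pages the current wrapper failed on

    def process(self, url, html):
        fp = fingerprint(html)
        if self.seen.get(url) == fp:
            return "unchanged"       # skip pages we already extracted
        self.seen[url] = fp
        record = self.wrapper(html)
        if not record or any(field not in record for field in REQUIRED):
            self.needs_reinduction.append(url)
            return "wrapper-failed"
        self.records[url] = record
        return "extracted"


collector = IncrementalCollector(simple_wrapper)
page = "<h1>Soup</h1><ul><li>water</li><li>salt</li></ul>"
print(collector.process("/r/1", page))  # extracted
print(collector.process("/r/1", page))  # unchanged
```

Pages that land in `needs_reinduction` would be the input to a new wrapper-induction round, rather than forcing one wrapper to cover every layout up front.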

CCWrapper: Adaptive Predefined Schema Guided Web Extraction

Jun Gao; Dongqing Yang; Tengjiao Wang

In this paper, we propose a method called CCWrapper (Classification-Cluster) to extract target data items from web pages under the guidance of a predefined schema. CCWrapper extracts and combines different features of HTML nodes, including style, structure, thesaurus and data type attributes, into one unified model, and generates the extraction rules with Bayes classification in the training step. When a new HTML page is processed, CCWrapper computes the probability that each HTML node is a target element and clusters the HTML nodes for extraction based on the intra-document relationships in the HTML document tree. Preliminary experimental results on real-life web sites demonstrate that CCWrapper is a promising extraction method.

- Web Searching | Pp. 275-286
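
The classification step in "CCWrapper: Adaptive Predefined Schema Guided Web Extraction" can be approximated with a categorical naive Bayes over node features. The sketch assumes hypothetical feature names (`tag`, `class`, `type`), Laplace smoothing, and uses grouping by parent path as a crude stand-in for the paper's tree-based clustering; it is not CCWrapper itself.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (features dict, label). Assumes every example has every feature."""
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, feature) -> value -> count
    values_seen = defaultdict(set)       # feature -> distinct values observed
    for feats, label in examples:
        for name, value in feats.items():
            value_counts[(label, name)][value] += 1
            values_seen[name].add(value)
    return label_counts, value_counts, values_seen


def posterior(model, feats):
    """P(label | features) with Laplace smoothing (+1 slot for unseen values)."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)
        for name, value in feats.items():
            k = len(values_seen[name]) + 1
            score += math.log((value_counts[(label, name)][value] + 1) / (n + k))
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


def extract(model, nodes, target, threshold=0.5):
    """nodes: (parent_path, text, feats). Keep likely targets, grouped by parent."""
    groups = defaultdict(list)
    for parent, text, feats in nodes:
        if posterior(model, feats).get(target, 0.0) >= threshold:
            groups[parent].append(text)
    return dict(groups)


PRICE = {"tag": "span", "class": "price", "type": "number"}
DESC = {"tag": "div", "class": "desc", "type": "text"}
model = train([(PRICE, "price"), (PRICE, "price"), (DESC, "other"), (DESC, "other")])
page = [("/body/ul/li[1]", "$5", PRICE), ("/body/ul/li[1]", "nice", DESC)]
print(extract(model, page, "price"))  # {'/body/ul/li[1]': ['$5']}
```

Working in log space and normalizing afterwards avoids underflow when many features are multiplied together.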

MiniTasking: Improving Cache Performance for Multiple Query Workloads

Yan Zhang; Zhifeng Chen; Yuanyuan Zhou

This paper proposes a novel idea, called MiniTasking, to reduce the number of cache misses by improving data temporal locality for concurrent queries. Our idea is based on the observation that, in many workloads such as decision support systems (DSS), there is usually a significant amount of data sharing among different concurrent queries. MiniTasking exploits these data sharing characteristics to improve data temporal locality by scheduling query execution at three levels: (1) it batches queries based on their data sharing characteristics and the cache configuration; (2) it groups operators that share certain data; (3) it schedules mini-tasks, which are small pieces of computation in operator groups, according to their data locality without violating their execution dependencies.

Our experimental results show that MiniTasking can reduce execution time by up to 12% for joins. For the TPC-H throughput test workload, MiniTasking improves end performance by up to 20%. Even with the Partition Attributes Across (PAX) layout, MiniTasking further reduces cache misses by 65% and execution time by 9%.

- Caching and Moving Objects | Pp. 287-299
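
The temporal-locality argument behind "MiniTasking: Improving Cache Performance for Multiple Query Workloads" can be seen in a toy cache simulation. The sketch assumes an LRU cache measured in blocks, two queries that scan the same table, and a hypothetical `chunk` size for mini-tasks; it models only the interleaving idea, not MiniTasking's three-level batching, grouping and scheduling.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Simulate an LRU cache of `capacity` blocks and count misses."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


def query_at_a_time(queries):
    """Run each query's full scan before starting the next one."""
    return [block for q in queries for block in q]


def minitask_order(queries, chunk):
    """Split scans into mini-tasks of `chunk` blocks; run all queries on a chunk together."""
    order = []
    longest = max(len(q) for q in queries)
    for start in range(0, longest, chunk):
        for q in queries:
            order.extend(q[start:start + chunk])
    return order


scans = [list(range(100)), list(range(100))]  # two queries sharing one table
print(count_misses(query_at_a_time(scans), capacity=10))          # 200
print(count_misses(minitask_order(scans, chunk=4), capacity=10))  # 100
```

With `chunk` no larger than the cache, the second query finds each chunk still resident, so shared blocks are loaded once instead of twice.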

Cache Consistency in Mobile XML Databases

Stefan Böttcher

Whenever an XML database is used to provide transactional access to mobile clients in multi-hop networks, standard database technologies like query processing and concurrency control have to be adapted to fundamentally different requirements, including limited bandwidth and unforeseeable connection losses. We present a query processing approach that reduces XML data exchange to the exchange of difference XML fragments wherever possible. Additionally, within our approach transactions can even use cached results of outdated queries and of neighboring clients wherever doing so reduces data exchange. Furthermore, our approach supports a pipelined exchange of queries and partial answers. Finally, we present a timestamp-based approach to concurrency control that guarantees cache consistency and minimizes data exchange between the mobile clients and the XML database server.

- Caching and Moving Objects | Pp. 300-312
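
The timestamp-based consistency mentioned in "Cache Consistency in Mobile XML Databases" can be illustrated with generic optimistic validation: clients cache fragments together with the server timestamp they saw, and a commit is rejected if any of those timestamps is stale. The `Server`/`Client` classes and fragment paths below are hypothetical, and this is a textbook technique, not the paper's protocol (which also covers difference fragments, neighbor caches and pipelining).

```python
import itertools


class Server:
    """Holds XML fragments keyed by path, each with a last-write timestamp."""

    def __init__(self, fragments):
        self._clock = itertools.count(1)
        self.data = dict(fragments)
        start = next(self._clock)
        self.stamp = {key: start for key in self.data}

    def read(self, key):
        return self.data[key], self.stamp[key]

    def commit(self, read_set, writes):
        """read_set maps key -> timestamp the client read. Returns new stamp or None."""
        if any(self.stamp[k] != ts for k, ts in read_set.items()):
            return None  # some cached read is stale: abort
        ts = next(self._clock)
        for key, value in writes.items():
            self.data[key] = value
            self.stamp[key] = ts
        return ts


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # key -> (value, timestamp)

    def read(self, key):
        if key not in self.cache:  # contact the server only on a cache miss
            self.cache[key] = self.server.read(key)
        return self.cache[key][0]

    def commit(self, keys_read, writes):
        read_set = {k: self.cache[k][1] for k in keys_read}
        ts = self.server.commit(read_set, writes)
        if ts is None:
            for k in keys_read:
                self.cache.pop(k, None)  # drop stale entries; next read refetches
            return False
        for key, value in writes.items():
            self.cache[key] = (value, ts)
        return True


srv = Server({"/order/1": "<qty>1</qty>"})
a, b = Client(srv), Client(srv)
a.read("/order/1")
b.read("/order/1")
print(a.commit(["/order/1"], {"/order/1": "<qty>2</qty>"}))  # True
print(b.commit(["/order/1"], {"/order/1": "<qty>3</qty>"}))  # False: b read a stale version
```

Only timestamps travel with the commit request, which keeps the validation message small on a low-bandwidth link.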

Bulkloading Updates for Moving Objects

Xiaoyuan Wang; Weiwei Sun; Wei Wang

Supporting frequent updates is a key challenge in moving object indexing. Most existing work treats each object's update as an individual process, so update-intensive environments issue a large number of separate updates. In this paper, we propose bulkloading updates for moving objects (BLU). Based on a common framework, we propose three bulkloading schemes with different spatial biases. By grouping objects with nearby positions, BLU prefetches the nodes accessed on the shared update path and combines multiple disk accesses to the same node into one, which avoids redundant I/O for objects within the same group. We also propose a novel MBR-driven flushing algorithm, which exploits dynamic spatial correlation and improves the buffer hit ratio. Theoretical analysis and experimental evaluation demonstrate that BLU achieves good update performance without affecting query performance.

- Caching and Moving Objects | Pp. 313-324
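
The access-combining idea in "Bulkloading Updates for Moving Objects" can be sketched by grouping pending updates by the index node they would touch. Here a uniform grid cell (hypothetical `cell_of`, with an assumed `cell_size`) stands in for a leaf node of the moving-object index; the paper's three bulkloading schemes and MBR-driven flushing are not modeled.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Hypothetical stand-in for the index node an object position falls into."""
    return (int(x // cell_size), int(y // cell_size))


def bulk_group(updates, cell_size):
    """Group (obj_id, x, y) updates by target cell.

    Returns the groups and the number of simulated node accesses: one per
    touched cell instead of one per update.
    """
    groups = defaultdict(list)
    for obj_id, x, y in updates:
        groups[cell_of(x, y, cell_size)].append(obj_id)
    return dict(groups), len(groups)


updates = [(1, 1.0, 1.0), (2, 2.5, 3.0), (3, 12.0, 1.0), (4, 13.0, 2.0), (5, 1.5, 0.5)]
groups, accesses = bulk_group(updates, cell_size=10.0)
print(groups)                        # {(0, 0): [1, 2, 5], (1, 0): [3, 4]}
print(accesses, "vs", len(updates))  # 2 vs 5
```

The saving grows with how tightly objects cluster: many updates that land in the same node cost a single access.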

Finding the Plateau in an Aggregated Time Series

Min Wang; X. Sean Wang

Given a set of input time series, an aggregated series can be formed by aggregating the values at each time position. It is often useful to find the time positions whose aggregated values are the greatest. Instead of looking for individual top-k time positions, this paper gives two algorithms for finding the time interval (called the plateau) in which the aggregated values are close to each other (within a given threshold) and are all no smaller than the aggregated values outside the interval. The first algorithm is centralized and assumes that all data are available at a central location; the second is a distributed search algorithm that does not require such a central location. The centralized algorithm has linear time complexity with respect to the length of the time series, and the distributed algorithm employs the Threshold Algorithm of Fagin et al. and, as the experiments reported in the paper show, is quite efficient in reducing communication cost.

- Temporal Database | Pp. 325-336
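
The plateau definition in "Finding the Plateau in an Aggregated Time Series" can be checked with a linear-time sketch of the centralized case. It assumes that aggregation is a position-wise sum, that "within a given threshold" means every value in the interval is at least the maximum minus `delta`, and that the plateau contains the global maximum. The interval grows greedily toward the larger neighbor; prefix and suffix maxima give the largest value outside in O(1). With distinct values this returns the widest valid interval; ties are broken to the left, which may miss some intervals. This is an illustration, not the paper's algorithm, and it does not cover the distributed Threshold Algorithm variant.

```python
def aggregate(series):
    """Sum several equal-length time series position by position."""
    return [sum(values) for values in zip(*series)]


def find_plateau(values, delta):
    """Return (lo, hi) of the widest plateau around the global maximum."""
    n = len(values)
    if n == 0 or delta < 0:
        raise ValueError("need a non-empty series and delta >= 0")
    neg_inf = float("-inf")
    prefix_max = [neg_inf] * (n + 1)  # prefix_max[i] = max(values[:i])
    for i, v in enumerate(values):
        prefix_max[i + 1] = max(prefix_max[i], v)
    suffix_max = [neg_inf] * (n + 1)  # suffix_max[i] = max(values[i:])
    for i in range(n - 1, -1, -1):
        suffix_max[i] = max(suffix_max[i + 1], values[i])

    peak = max(range(n), key=values.__getitem__)
    top = values[peak]
    lo = hi = peak
    inner_min = top
    best = (peak, peak)  # the peak alone is always a valid plateau
    while lo > 0 or hi < n - 1:
        left = values[lo - 1] if lo > 0 else neg_inf
        right = values[hi + 1] if hi < n - 1 else neg_inf
        if left >= right:  # extend toward the larger neighbor
            lo -= 1
            inner_min = min(inner_min, left)
        else:
            hi += 1
            inner_min = min(inner_min, right)
        if top - inner_min > delta:
            break  # every wider greedy interval also violates the threshold
        if inner_min >= max(prefix_max[lo], suffix_max[hi + 1]):
            best = (lo, hi)
    return best


agg = aggregate([[2, 0, 5, 4, 1, 3], [3, 1, 4, 4, 1, 4]])  # [5, 1, 9, 8, 2, 7]
print(find_plateau(agg, delta=2))  # (2, 3)
```

For `delta=2`, the interval covering 9 and 8 is returned: both values are within 2 of the maximum and no smaller than 7, the largest value outside.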

Compressing Spatial and Temporal Correlated Data in Wireless Sensor Networks Based on Ring Topology

Siwang Zhou; Yaping Lin; Jiliang Wang; Jianming Zhang; Jingcheng Ouyang

In this paper, we propose an algorithm for wavelet-based spatio-temporal data compression in wireless sensor networks. By employing a ring topology, the algorithm can support a broad range of wavelets that simultaneously exploit the spatial and temporal correlations among sensory data. Furthermore, the ring-based topology is particularly effective in eliminating the “border effect” generally encountered by wavelet-based schemes. Since temporal compression is local and far cheaper than spatial compression in sensor networks, we propose a wavelet transform based on a “Hybrid” decomposition instead of the common dyadic decomposition. We show that the optimal level of the wavelet transform differs across sensor network conditions. Theoretical analysis and experiments show that the proposed algorithm can effectively exploit the spatial and temporal correlation in sensory data and provide a significant reduction in energy consumption and delay compared to other schemes.

- Temporal Database | Pp. 337-348
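
Why a ring removes the border effect, as claimed in "Compressing Spatial and Temporal Correlated Data in Wireless Sensor Networks Based on Ring Topology", can be seen with one level of the LeGall 5/3 integer lifting wavelet (the reversible wavelet of JPEG 2000) using circular indexing. The sketch assumes an even number of sensors numbered around the ring, one integer reading each, and a single spatial level; the paper's "Hybrid" decomposition and its temporal dimension are not modeled.

```python
def lift53_forward(x):
    """One level of 5/3 lifting on a ring of sensors (circular boundary)."""
    if len(x) % 2:
        raise ValueError("ring length must be even")
    m = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: each odd sensor from its two even ring neighbors (wraps at the end).
    d = [odd[i] - (even[i] + even[(i + 1) % m]) // 2 for i in range(m)]
    # Update: smooth the even samples with neighboring details (wraps at the start).
    s = [even[i] + (d[(i - 1) % m] + d[i] + 2) // 4 for i in range(m)]
    return s, d


def lift53_inverse(s, d):
    """Exact integer reconstruction: undo update, then undo predict."""
    m = len(s)
    even = [s[i] - (d[(i - 1) % m] + d[i] + 2) // 4 for i in range(m)]
    odd = [d[i] + (even[i] + even[(i + 1) % m]) // 2 for i in range(m)]
    x = [0] * (2 * m)
    x[0::2], x[1::2] = even, odd
    return x


readings = [20, 21, 22, 23, 24, 23, 22, 21]  # smooth field around the ring
s, d = lift53_forward(readings)
print(s, d)  # [20, 22, 24, 22] [0, 0, 0, 0]
```

Because the last sensor's right neighbor is sensor 0, no artificial edge is introduced, and a spatially smooth field produces all-zero detail coefficients that compress well.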

Discovery of Temporal Frequent Patterns Using TFP-Tree

Long Jin; Yongmi Lee; Sungbo Seo; Keun Ho Ryu

Mining temporal frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and long patterns. In this paper, we propose an efficient temporal frequent pattern mining method using the TFP-tree (Temporal Frequent Pattern tree). This approach has three advantages: (i) the transactions need to be scanned only once, which significantly reduces the I/O cost; (ii) all transactions are stored in leaf nodes while only the star calendar patterns are saved in internal nodes, which saves a large amount of memory; moreover, we divide the transactions into many partitions by maximum size domain, which significantly saves memory; (iii) each star calendar pattern in an internal node is discovered efficiently from the frequent calendar patterns of the leaf nodes, which significantly reduces computation time. Our performance study shows that the TFP-tree is efficient and scalable for mining, and is about an order of magnitude faster than the classical frequent pattern mining algorithms.

- Temporal Database | Pp. 349-361
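
Advantages (i) and (iii) in "Discovery of Temporal Frequent Patterns Using TFP-Tree" (one scan, then star patterns derived from leaf patterns) can be mimicked with flat dictionaries. The sketch assumes calendar patterns are (year, month, day) tuples with `"*"` as the wildcard, counts single items only, and defines an item as frequent in a pattern when it appears in at least `min_sup` of that pattern's transactions. It is not the TFP-tree structure; it only shows star-pattern counts being built from leaf counts without rescanning.

```python
from collections import Counter, defaultdict
from itertools import product

STAR = "*"


def scan_once(transactions):
    """Single pass over (date, items) pairs; date is a (year, month, day) tuple."""
    n_tx = Counter()                    # leaf pattern -> number of transactions
    item_counts = defaultdict(Counter)  # leaf pattern -> item -> support count
    for date, items in transactions:
        n_tx[date] += 1
        item_counts[date].update(set(items))  # count each item once per transaction
    return n_tx, item_counts


def generalize(date):
    """Star calendar patterns covering a concrete date (at least one '*')."""
    for mask in product((False, True), repeat=len(date)):
        if any(mask):
            yield tuple(STAR if starred else v for v, starred in zip(date, mask))


def derive_star_patterns(n_tx, item_counts):
    """Build star-pattern counts purely from leaf counts, with no second scan."""
    star_tx = Counter()
    star_items = defaultdict(Counter)
    for date, count in n_tx.items():
        for pattern in generalize(date):
            star_tx[pattern] += count
            star_items[pattern].update(item_counts[date])
    return star_tx, star_items


def frequent(n_tx, item_counts, min_sup):
    """Items whose support ratio within each calendar pattern reaches min_sup."""
    return {
        pattern: sorted(i for i, c in items.items() if c / n_tx[pattern] >= min_sup)
        for pattern, items in item_counts.items()
    }


txs = [((2005, 12, 25), ["gift", "tree"]), ((2005, 12, 25), ["gift"]),
       ((2006, 12, 25), ["gift", "card"]), ((2006, 1, 1), ["card"])]
leaf_tx, leaf_items = scan_once(txs)
star_tx, star_items = derive_star_patterns(leaf_tx, leaf_items)
result = frequent(star_tx, star_items, min_sup=0.9)
print(result[(STAR, 12, 25)])  # ['gift']
```

In this toy data, "gift" is frequent on December 25 of every year even though it is not frequent over the whole period.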