Publications catalog - books



Advances in Web-Age Information Management: 7th International Conference, WAIM 2006, Hong Kong, China, June 17-19, 2006, Proceedings

Jeffrey Xu Yu ; Masaru Kitsuregawa ; Hong Va Leong (eds.)

Conference: 7th International Conference on Web-Age Information Management (WAIM). Hong Kong, China. June 17, 2006 - June 19, 2006

Summary/description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability
Detected institution: Not detected
Year of publication: 2006
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-35225-9

Electronic ISBN

978-3-540-35226-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

On-Demand Index for Efficient Structural Joins

Kun-Lung Wu; Shyh-Kwei Chen; Philip S. Yu

A structural join finds all occurrences of a structural, or containment, relationship between two sets of XML node elements: ancestors and descendants. Prior approaches to structural joins mostly maintain offline indexes on disk or require the elements in both sets to be sorted. Either requirement can be expensive. More importantly, not all node elements are indexed or sorted beforehand. We present an on-demand indexing approach to performing structural joins that does not require sorting the elements. We observe that structural joins are similar to stabbing queries. However, previous work on stabbing queries, although efficient in search time, is not directly applicable to structural joins because of its high storage costs. We develop two storage reduction techniques to alleviate this problem. Simulations show that our new method outperforms prior approaches.

- Indexing | Pp. 1-12
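The containment test behind structural joins, and the storage cost the abstract attributes to stabbing-query structures, can be illustrated with a deliberately simple sketch. It assumes the common region encoding in which each element is a (start, end) interval and an ancestor strictly encloses its descendants. The bucketed index below, with hypothetical names `build_stab_index` and `structural_join` and a hypothetical bucket `width`, is not the authors' technique: ancestor intervals are copied into every bucket they overlap, which is the kind of replication that drives storage up, while descendants are probed in any order without sorting.

```python
from collections import defaultdict

def build_stab_index(ancestors, width):
    """Copy each (start, end) ancestor interval into every fixed-width bucket it overlaps."""
    index = defaultdict(list)
    for start, end in ancestors:
        for bucket in range(start // width, end // width + 1):
            index[bucket].append((start, end))
    return index

def structural_join(ancestors, descendants, width=4):
    """Return all (ancestor, descendant) pairs where the ancestor strictly encloses the descendant.

    Descendants may arrive in any order; each one probes only the bucket of its start position.
    """
    index = build_stab_index(ancestors, width)
    pairs = []
    for d_start, d_end in descendants:
        # Stabbing query: every ancestor containing d_start was copied into this bucket.
        for a_start, a_end in index.get(d_start // width, []):
            if a_start < d_start and d_end < a_end:
                pairs.append(((a_start, a_end), (d_start, d_end)))
    return pairs
```

A wider bucket reduces the number of copies per ancestor but returns more candidates that fail the containment check; the paper's two storage reduction techniques target this storage cost, though their details are not given in the abstract.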

An Efficient Indexing Scheme for Moving Objects’ Trajectories on Road Networks

Jae-Woo Chang; Jung-Ho Um

Even though moving objects usually move on spatial networks, there has been little research on trajectory indexing schemes for spatial networks such as road networks. In this paper, we propose an efficient indexing scheme for moving objects’ trajectories on road networks. To this end, we design a signature-based indexing scheme that efficiently handles the trajectories of current moving objects and also maintains those of past moving objects. In addition, we provide two insertion algorithms: one to store the initial information of moving objects’ trajectories and one to store their segment information. We also provide a retrieval algorithm that finds the set of moving objects whose trajectories match the segments of a query trajectory. Finally, we show that our indexing scheme achieves much better trajectory retrieval performance than leading trajectory indexing schemes such as the TB-tree and the FNR-tree.

- Indexing | Pp. 13-25
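The abstract does not describe how trajectory signatures are built. One common way to build signatures, shown here only as an illustration, is superimposed coding: each road segment sets a few hash-selected bits, a trajectory's signature is the OR of its segments' signatures, and a query first filters by bit containment and then verifies against the stored segments. The width `M_BITS`, the `K_HASHES` count, the SHA-256-based hashing and all function names are assumptions, and "match" is taken to mean that a trajectory contains every query segment.

```python
import hashlib

M_BITS = 64   # hypothetical signature width in bits
K_HASHES = 3  # hypothetical number of bits set per road segment

def segment_signature(segment_id):
    """Superimposed-coding signature of one road segment: up to K_HASHES bits out of M_BITS."""
    sig = 0
    for i in range(K_HASHES):
        digest = hashlib.sha256(f"{segment_id}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % M_BITS)
    return sig

def trajectory_signature(segments):
    """OR together the signatures of all segments a trajectory traverses."""
    sig = 0
    for seg in segments:
        sig |= segment_signature(seg)
    return sig

def build_index(trajectories):
    """Map object id -> (signature, segment list)."""
    return {obj: (trajectory_signature(segs), list(segs)) for obj, segs in trajectories.items()}

def match(index, query_segments):
    """Return ids of objects whose trajectories contain every query segment."""
    q = trajectory_signature(query_segments)
    hits = []
    for obj, (sig, segs) in index.items():
        if sig & q != q:
            continue  # filter: a missing bit proves the trajectory cannot match
        if set(query_segments) <= set(segs):
            hits.append(obj)  # refine: discard false drops caused by bit collisions
    return hits
```

The signature test never rejects a true match, so the refine step only has to remove false positives.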

Spatial Index Compression for Location-Based Services Based on a MBR Semi-approximation Scheme

Jongwan Kim; SeokJin Im; Sang-Won Kang; Chong-Sun Hwang

The increased need for spatial data in location-based services and geographical information systems (GISs) in mobile computing has led to more research on spatial indexing structures such as the R-tree. R-tree variants approximate spatial objects with minimum bounding rectangles (MBRs). Most studies add or change various options in the R-tree, while only a few have focused on increasing search performance through MBR compression. This study proposes a novel MBR compression scheme that uses semi-approximation (SA) MBRs and the SAR-tree. SA decreases the size of MBR keys, halves the enlargement of quantized MBRs (QMBRs), and increases node utilization, which together improve overall search performance. The scheme decreases the quantized space more than existing quantization schemes do and increases the utilization of each disk allocation unit. The study mathematically analyzes the number of node accesses and evaluates the performance of the SAR-tree using real location data. The results show that the proposed index performs better than existing MBR compression schemes.

- Indexing | Pp. 26-35
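For readers unfamiliar with MBR compression, the following sketch shows plain relative quantization of a child MBR inside its parent's MBR, the baseline that QMBR-style schemes build on; it does not reproduce the paper's semi-approximation. The 4-bit grid, the function names and the rule of storing the index of the grid cell that contains each corner are assumptions. Because lower corners expand down and upper corners expand up to cell boundaries, the decoded box always encloses the original, and that enlargement is the cost the abstract says SA halves.

```python
import math

def _grid(parent, bits):
    """Number of cells per axis and cell sizes of a 2**bits grid laid over the parent MBR."""
    cells = 2 ** bits
    pxlo, pylo, pxhi, pyhi = parent
    return cells, (pxhi - pxlo) / cells, (pyhi - pylo) / cells

def quantize_mbr(mbr, parent, bits=4):
    """Encode (xlo, ylo, xhi, yhi) as the grid cells holding each corner, each fitting in `bits` bits."""
    cells, sx, sy = _grid(parent, bits)
    pxlo, pylo = parent[0], parent[1]

    def cell(v, origin, size):
        return min(cells - 1, math.floor((v - origin) / size))

    xlo, ylo, xhi, yhi = mbr
    return (cell(xlo, pxlo, sx), cell(ylo, pylo, sy),
            cell(xhi, pxlo, sx), cell(yhi, pylo, sy))

def dequantize_mbr(q, parent, bits=4):
    """Decode to the enclosing box: low corners at a cell's start, high corners at a cell's end."""
    _, sx, sy = _grid(parent, bits)
    pxlo, pylo = parent[0], parent[1]
    qxlo, qylo, qxhi, qyhi = q
    return (pxlo + qxlo * sx, pylo + qylo * sy,
            pxlo + (qxhi + 1) * sx, pylo + (qyhi + 1) * sy)
```

With 4 bits per coordinate, the child key shrinks from four floating-point values to 16 bits, which is where the node-utilization gain of MBR compression comes from.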

KCAM: Concentrating on Structural Similarity for XML Fragments

Lingbo Kong; Shiwei Tang; Dongqing Yang; Tengjiao Wang; Jun Gao

This paper proposes a new method, KCAM, to measure the structural similarity of XML fragments that satisfy given keywords. Its name comes directly from the key structure of the method, the Keyword Common Ancestor Matrix. The KCAM of an XML fragment is a × upper triangular matrix. Each element stores the level of the SLCA (Smallest Lowest Common Ancestor) node corresponding to the keywords , . The matrix distance between KCAMs, denoted as (,), can be used as an approximate structural similarity. KCAM is independent of the label information in fragments and is effective at distinguishing structural differences between XML fragments.

- XML Query Processing | Pp. 36-48
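The abstract leaves the matrix dimensions and distance symbols blank, so the sketch below fills them with assumptions: for k keywords, each occurring at a single node of the fragment (so the SLCA of a pair reduces to their lowest common ancestor), entry (i, j) with i ≤ j holds the level (root = 1) of that ancestor, and the distance between two matrices is the sum of absolute differences over the upper triangle. The parent-pointer tree, the function names and the L1 distance are illustrative choices, not the paper's definitions. Node labels never enter the matrix, in line with the claim that KCAM is independent of label information.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root; parent[root] is None."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca_level(a, b, parent):
    """Level (root = 1) of the lowest common ancestor of nodes a and b."""
    ancestors_of_b = set(path_to_root(b, parent))
    for n in path_to_root(a, parent):
        if n in ancestors_of_b:
            return len(path_to_root(n, parent))
    raise ValueError("nodes are not in the same tree")

def kcam(keywords, keyword_node, parent):
    """Upper triangular matrix of LCA levels for every keyword pair (diagonal included)."""
    k = len(keywords)
    m = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            m[i][j] = lca_level(keyword_node[keywords[i]], keyword_node[keywords[j]], parent)
    return m

def kcam_distance(m1, m2):
    """Hypothetical matrix distance: L1 difference over the upper triangle."""
    k = len(m1)
    return sum(abs(m1[i][j] - m2[i][j]) for i in range(k) for j in range(i, k))
```

For example, a fragment where "xml" and "query" sit under a shared intermediate element differs from a flat fragment with all three keywords directly under the root, even if every element label is identical.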

A New Structure for Accelerating XPath Location Steps

Yaokai Feng; Akifumi Makinouchi

Multidimensional indices have been successfully applied to querying XML data. Using the R*-tree, T. Grust proposed an interesting method to support all XPath axes. In that method, each node of an XML document is labeled with a five-dimensional descriptor, so all the nodes of the document map to a point set in a five-dimensional space. Grust showed that each XPath axis can be implemented as a range query in this space, so an R*-tree can be used to improve query performance for XPath axes. However, our investigation shows that most of the range queries for the XPath axes are partially-dimensional: each constrains fewer than five dimensions, although the R*-tree is built in the five-dimensional space. If existing multidimensional indices are used for such range queries, a great deal of information irrelevant to the queries must also be read from disk. Based on this observation, this paper proposes a new multidimensional index structure, called the Adaptive R*-tree, to support the XPath axes more efficiently.

- XML Query Processing | Pp. 49-60
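The observation that XPath axes become range queries can be seen with the pre-order and post-order ranks used in Grust's XPath accelerator. This sketch keeps only three dimensions (pre rank, post rank and the parent's pre rank); the dictionary tree encoding and the function names are assumptions. Each of the four major axes below is a window over just two coordinates, which illustrates the paper's point that such queries leave most dimensions of a five-dimensional index unconstrained.

```python
def label(tree, root):
    """Assign (pre, post, parent_pre) to every node; `tree` maps a node to its list of children."""
    desc = {}
    pre = post = 0

    def visit(node, parent_pre):
        nonlocal pre, post
        my_pre = pre
        pre += 1
        for child in tree.get(node, []):
            visit(child, my_pre)
        desc[node] = (my_pre, post, parent_pre)
        post += 1

    visit(root, None)
    return desc

# Each axis is a predicate on (pre, post, parent_pre) of a candidate v relative to the context c.
AXES = {
    "descendant": lambda v, c: v[0] > c[0] and v[1] < c[1],
    "ancestor":   lambda v, c: v[0] < c[0] and v[1] > c[1],
    "following":  lambda v, c: v[0] > c[0] and v[1] > c[1],
    "preceding":  lambda v, c: v[0] < c[0] and v[1] < c[1],
    "child":      lambda v, c: v[2] == c[0],
}

def axis(desc, context, name):
    """Nodes on the given axis of `context`, in document (pre-order) order."""
    c = desc[context]
    hits = [n for n, v in desc.items() if AXES[name](v, c)]
    return sorted(hits, key=lambda n: desc[n][0])
```

An index could answer these windows directly; here a linear scan stands in for the range query.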

Efficient Evaluation of Multiple Queries on Streamed XML Fragments

Huan Huo; Rui Zhou; Guoren Wang; Xiaoyun Hui; Chuan Xiao; Yongqian Yu

With the prevalence of Web applications, expediting multiple queries over streaming XML has become a core challenge because of one-pass processing and limited resources. The recently proposed Hole-Filler model has low overhead for transmitting and evaluating XML fragments; however, existing work addressed the multiple-query problem over XML tuple streams rather than XML fragment streams. Taking advantage of XML schema information, this paper proposes a tid+ tree model to construct multiple queries over XML fragments and to prune duplicate and dependent operations. Based on the tid+ tree, it then proposes FQ-Index, the core of M-XFPro, which indexes both multiple queries and XML fragments for processing multiple XPath queries involving simple path and twig path patterns. A detailed set of experiments illustrates the effectiveness of the techniques developed.

- XML Query Processing | Pp. 61-72
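Sharing work among multiple queries can be illustrated with a generic prefix tree over simple child-axis paths, in which common leading steps are stored and evaluated only once. This is neither the tid+ tree nor FQ-Index, whose structures the abstract does not detail; it handles only absolute paths without wildcards, predicates or twigs, and every name in it is hypothetical.

```python
def build_query_trie(queries):
    """Merge simple paths such as '/a/b/c' into one prefix tree; nodes record which queries end there."""
    root = {"children": {}, "accept": []}
    for qid, path in queries.items():
        node = root
        for step in path.strip("/").split("/"):
            node = node["children"].setdefault(step, {"children": {}, "accept": []})
        node["accept"].append(qid)
    return root

def count_steps(node):
    """Number of distinct location steps kept after sharing common prefixes."""
    return sum(1 + count_steps(child) for child in node["children"].values())

def match_path(root, element_path):
    """Ids of queries whose path equals the given root-to-element path."""
    node = root
    for step in element_path.strip("/").split("/"):
        if step not in node["children"]:
            return []
        node = node["children"][step]
    return list(node["accept"])
```

For the three queries `/book/title`, `/book/author/name` and `/book/author`, evaluating them separately takes seven steps, while the shared tree keeps only four.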

Automated Extraction of Hit Numbers from Search Result Pages

Yanyan Ling; Xiaofeng Meng; Weiyi Meng

When a query is submitted to a search engine, the engine returns a dynamically generated result page that contains the number of hits (i.e., the number of matching results) for the query. The hit number is useful in many important applications, such as obtaining document frequencies of terms, estimating the sizes of search engines, and generating search engine summaries. In this paper, we propose a novel technique for automatically identifying the hit number for any search engine and any query. The technique has three steps: first, segment each result page into a set of blocks; next, identify the block(s) that contain the hit number using a machine learning approach; finally, extract the hit number from the identified block(s) by comparing the patterns in multiple blocks from the same search engine. Experimental results indicate that this technique is highly accurate.

- Information Retrieval I | Pp. 73-84
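The third step, comparing blocks from the same search engine, can be sketched under a strong simplifying assumption: the hit-number blocks of one engine, taken from result pages for different queries, tokenize into the same template. Numbers that repeat across pages (such as a result range) are then treated as template text, and the numeric position whose value changes is taken as the hit number. Page segmentation and the machine-learning block classifier are omitted, and the regular expression and function name are assumptions.

```python
import re

# Hypothetical number pattern: comma-grouped ("2,340") or plain digits ("10").
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")

def extract_hit_numbers(blocks):
    """Given one engine's hit-number blocks for different queries, return the hit number of each."""
    token_rows = [block.split() for block in blocks]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("blocks do not share one template")
    for column in zip(*token_rows):
        # A numeric position whose value varies across queries is the hit number;
        # numbers that never change (e.g. "1 - 10") belong to the template.
        if all(NUMBER.fullmatch(tok) for tok in column) and len(set(column)) > 1:
            return [int(tok.replace(",", "")) for tok in column]
    return None
```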

Keyword Extraction Using Support Vector Machine

Kuo Zhang; Hui Xu; Jie Tang; Juanzi Li

This paper is concerned with keyword extraction, that is, extracting a subset of words or phrases from a document that can describe the ‘meaning’ of the document. Keywords benefit many text mining applications. However, many documents do not have keywords, so keywords must be assigned before those benefits can be realized. Several research efforts have addressed keyword extraction, but these methods rely on ‘global context information’, which limits extraction performance. A thorough and systematic investigation of the issue is therefore needed. In this paper, we propose to use not only ‘global context information’ but also ‘local context information’ to extract keywords from documents. As far as we know, using both ‘global context information’ and ‘local context information’ in keyword extraction has not been sufficiently investigated before. We also propose methods based on Support Vector Machines for performing the task and define the features of the model. Experimental results indicate that the proposed SVM-based method can significantly outperform the baseline methods for keyword extraction. We also applied the method to document classification, a typical text mining task; experimental results show that using the extracted keywords significantly improves classification accuracy.

- Information Retrieval I | Pp. 85-96

LSM: Language Sense Model for Information Retrieval

Shenghua Bao; Lei Zhang; Erdong Chen; Min Long; Rui Li; Yong Yu

Much work has been done on incorporating word senses into retrieval to deal with the word sense ambiguity problem, but most of it has produced negative results. In this paper, we first implement a word sense disambiguation (WSD) system for nouns and verbs and then propose the language sense model (LSM) for information retrieval. The LSM seamlessly combines the terms and senses of a document through an EM algorithm. Retrieval experiments on TREC collections show that the LSM significantly outperforms both the vector space model (BM25) and the traditional language model for both medium and long queries (by 7.53%-16.90%). Based on the experiments, we also conclude empirically that fine-grained senses improve retrieval performance when they are properly used.

- Information Retrieval I | Pp. 97-108
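The abstract says only that the LSM combines terms and senses through an EM algorithm. As a generic illustration of that kind of estimation, not the LSM's actual formulation, the sketch below uses EM to fit the weight λ of a two-component mixture, P(w) = λ·P_term(w) + (1 - λ)·P_sense(w), where the probability of each observed token under both components is given. The function name, the fixed component probabilities and the iteration count are assumptions.

```python
def em_mixture_weight(p_term, p_sense, lam=0.5, iters=50):
    """Estimate lam in P(w) = lam * p_term(w) + (1 - lam) * p_sense(w) by EM.

    p_term[i] and p_sense[i] are the (positive) probabilities of the i-th observed token
    under the term component and the sense component; both components stay fixed.
    """
    n = len(p_term)
    for _ in range(iters):
        # E-step: posterior probability that each token came from the term component.
        resp = [lam * a / (lam * a + (1 - lam) * b) for a, b in zip(p_term, p_sense)]
        # M-step: the new mixture weight is the average responsibility.
        lam = sum(resp) / n
    return lam
```

When both components explain the tokens equally well, the weight stays where it started; when the term component explains them much better, the weight moves toward 1.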

Succinct and Informative Cluster Descriptions for Document Repositories

Lijun Chen; Guozhu Dong

Large document repositories need to be organized, summarized, and labeled to be used effectively. Previous clustering studies focused on organizing and paid little attention to producing cluster labels. Without informative labels, users must browse many documents to get a sense of what the clusters contain. Human labeling of clusters is not viable when clustering is performed on demand or for very few users. It is therefore desirable to generate informative cluster descriptions (CDs) automatically, both to give users a high-level sense of the clusters and to help repository managers produce the final cluster labels.

This paper studies CDs in the form of small term sets for document clusters and investigates how to measure the quality, or fidelity, of CDs and how to construct high-quality CDs. We propose using a CD-based classification to simulate how users interpret CDs, and using the F-score of that classification to measure CD quality. Because searching directly for good CDs using the F-score is too expensive, we consider a surrogate quality measure, the CDD measure, which combines three factors: , , and . We give a search strategy for constructing CDs, namely PagodaCD, a layer-based replacement method. Experimental results show that the algorithm is efficient and produces high-quality CDs. CDs produced by PagodaCD also exhibit monotone quality behavior.

- Information Retrieval II | Pp. 109-121
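The proposed quality measure, the F-score of a CD-based classification, can be sketched as follows. The assignment rule (a document goes to the cluster whose description shares the most terms with it, ties go to the first cluster listed, and a document that overlaps no description is left unassigned) and the use of macro-averaged F1 are assumptions made for illustration; the paper's simulation of how users interpret CDs may differ.

```python
def classify(doc_terms, cds):
    """Assign a document (set of terms) to the cluster whose CD shares the most terms, else None."""
    best, best_overlap = None, 0
    for cluster, cd in cds.items():
        overlap = len(doc_terms & cd)
        if overlap > best_overlap:
            best, best_overlap = cluster, overlap
    return best

def cd_fscore(docs, cds):
    """docs: list of (terms, true_cluster). Returns the macro-averaged F1 of the CD-based classification."""
    predicted = [classify(terms, cds) for terms, _ in docs]
    scores = []
    for c in cds:
        tp = sum(1 for (_, t), p in zip(docs, predicted) if t == c and p == c)
        fp = sum(1 for (_, t), p in zip(docs, predicted) if t != c and p == c)
        fn = sum(1 for (_, t), p in zip(docs, predicted) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Computing this score requires classifying every document for each candidate CD, which suggests why the paper turns to a cheaper surrogate measure during search.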