Publications catalog - books
Database Systems for Advanced Applications: 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings
Lizhu Zhou; Beng Chin Ooi; Xiaofeng Meng (eds.)
Conference: 10th International Conference on Database Systems for Advanced Applications (DASFAA). Beijing, China. April 17-20, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-25334-1
Electronic ISBN
978-3-540-32005-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Subject coverage
Table of contents
doi: 10.1007/11408079_1
Data Stream Mining and Resource Adaptive Computation
Philip S. Yu
The problem of data streams has gained importance in recent years because of advances in hardware technology. These advances have made it easy to store and record numerous transactions and activities in everyday life in an automated way. The ubiquitous presence of data streams in a number of practical domains has generated a great deal of research in this area. Example applications include trade surveillance for securities fraud and money laundering, network monitoring for intrusion detection, bio-surveillance for terrorist attacks, and others. Data is viewed as a continuous stream in this kind of application. Problems such as data mining, which have been widely studied for traditional data sets, cannot be easily solved in the data stream domain. This is because the large volume of data arriving in a stream renders most algorithms inefficient: most mining algorithms require multiple scans of the data, which is unrealistic for stream data. More importantly, the characteristics of the data stream can change over time, and the evolving patterns need to be captured. Furthermore, we need to consider the problem of resource allocation in mining data streams. Due to the large volume and high speed of streaming data, mining algorithms must cope with the effects of system overload. Thus, how to achieve optimal results under various resource constraints becomes a challenging task. In this talk, I’ll provide an overview, discuss the issues, and focus on how to mine evolving data streams and perform resource-adaptive computation.
- Keynotes | Pp. 1-1
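The abstract above names the core constraints (a single pass, bounded memory, system overload) but no specific algorithm. As a hedged illustration of mining under a hard resource budget, here is a minimal Misra-Gries frequent-items sketch in Python; the choice of algorithm is ours, not the speaker's:

```python
def misra_gries(stream, k):
    """One-pass frequent-items sketch: any item occurring more than
    len(stream)/k times is guaranteed to keep a counter, and at most
    k-1 counters are held no matter how long the stream is."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Memory budget exhausted: decrement all counters and
            # drop any that reach zero (the load-shedding step).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

print(misra_gries("abababacccd", k=3))
# -> {'a': 1, 'd': 1}; 'a', the only item with count > n/3, survives
```

The single scan and fixed memory footprint match the constraints the abstract highlights; capturing *evolving* patterns would additionally require windowing or decay, which this sketch omits.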
doi: 10.1007/11408079_2
Purpose Based Access Control for Privacy Protection in Database Systems
Elisa Bertino
The development of privacy-preserving data management techniques has been the focus of intense research in the last few years. This research has resulted in important notions and techniques, such as Hippocratic database systems, k-anonymity, and various privacy-preserving data mining techniques. However, much work still needs to be done to develop high-assurance privacy-preserving database management systems. An important requirement in the development of such systems is the need to provide comprehensive and accurate privacy-related metadata, such as data usage purposes. Such metadata represent the core of access control mechanisms specifically tailored towards privacy. In this talk we address this issue. We present a comprehensive approach to privacy-preserving access control based on the notion of purpose. Purpose information associated with a given data element specifies the intended use of that data element. Purpose information represents an important form of metadata, because data usage purposes are very often part of privacy policies, as in the case of policies expressed according to P3P. A key feature of our model is that it allows multiple purposes to be associated with each data element and also supports explicit prohibitions, thus allowing privacy officers to specify that some data should not be used for certain purposes. Another important issue to be addressed is the granularity of data labeling, that is, the units of data with which purposes can be associated. We address this issue in the context of relational databases and propose four different labeling schemes, each providing a different granularity. In the paper we also propose an approach to representing purpose information that results in very low storage overhead, and we exploit query modification techniques to support data access control based on purpose information. We conclude the talk by outlining future work, which includes applying our purpose management techniques to complex data and integrating them into RBAC.
- Keynotes | Pp. 2-2
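The purpose-labeling and query-modification mechanism described above can be given a toy rendering. The sketch below is our own simplification in Python, not the paper's SQL-based model: each row carries allowed and prohibited purpose labels, and a query is "modified" by filtering on the access purpose (names and purposes are invented):

```python
# Hypothetical rows with per-row purpose labels.
rows = [
    {"name": "alice", "email": "a@example.org",
     "allowed": {"billing"}, "prohibited": set()},
    {"name": "bob", "email": "b@example.org",
     "allowed": {"billing", "marketing"}, "prohibited": {"research"}},
]

def query_for_purpose(rows, access_purpose):
    """Return only rows whose labels permit the access purpose: it must
    be explicitly allowed and not explicitly prohibited, mirroring the
    model's support for multiple purposes and for prohibitions."""
    return [r for r in rows
            if access_purpose in r["allowed"]
            and access_purpose not in r["prohibited"]]

print(query_for_purpose(rows, "marketing"))  # -> bob's row only
```

In the paper the same effect is achieved by rewriting the SQL query itself, and labels can be attached at four different granularities rather than only per row.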
doi: 10.1007/11408079_3
Complex Networks and Network Data Mining
Deyi Li
We propose a new method for mapping important factors abstracted from a real complex network onto a topology of nodes and links. In this method, the effect of a node is denoted by a computable quality, such as the scale of a city in a traffic network, the throughput of a node in a communication network, the hit rate of a web site, or the prestige of an individual in a social network. Likewise, the interaction between nodes is denoted by the distance or length of links, such as the geographic distance between two cities in a traffic network, the bandwidth between two communication nodes, the number of hyperlinks between web pages, or the intensity of a friendship. That is, topologically, two-factor operations over nodes and links are expanded to four-factor operations over nodes, links, distances, and qualities. Using this four-factor method, we analyze network data and simulate the optimization of web mining to form a mining engine that excludes redundant and irrelevant nodes. The method can reduce a complicated, messy web site structure to a new, concise, informative graph. In a prototype system for mining informative structure, experiments on real network data sets have shown encouraging results in both the discovered knowledge and the knowledge discovery rate.
- Keynotes | Pp. 3-3
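A minimal rendering of the four-factor idea (node, link, node quality, link distance) in Python; the data and the pruning rule below are invented for illustration and are not taken from the talk:

```python
# Node quality and link distance for a toy network.
quality = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.05}
distance = {("A", "B"): 3.0, ("A", "C"): 1.0, ("C", "D"): 2.5}

def prune(quality, distance, min_quality):
    """Drop redundant/irrelevant nodes below the quality threshold,
    keeping only links whose endpoints both survive -- a crude stand-in
    for the abstract's reduction to a concise, informative graph."""
    kept = {n for n, q in quality.items() if q >= min_quality}
    links = {e: d for e, d in distance.items()
             if e[0] in kept and e[1] in kept}
    return kept, links

print(prune(quality, distance, min_quality=0.5))
# -> ({'A', 'C'}, {('A', 'C'): 1.0})
```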
doi: 10.1007/11408079_4
Indexing DNA Sequences Using q-Grams
Xia Cao; Shuai Cheng Li; Anthony K. H. Tung
Recent years have seen growing interest in similarity search over large collections of biological sequences. Contributing to this interest, this paper presents a method for efficiently indexing DNA sequences based on q-grams, to facilitate similarity search in a DNA database and sidestep the need for a linear scan of the entire database. A two-level index – a hash table and c-trees – is proposed, built over the q-grams of the DNA sequences. The proposed data structures allow the quick detection of sequences within a certain distance of the query sequence. Experimental results show that our method detects similar regions in a DNA sequence database efficiently and with high sensitivity.
- Bioinformatics | Pp. 4-16
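A sketch of the hash-table level of a q-gram index in Python (the paper's c-tree level and exact distance filter are not reproduced, and the one-shared-q-gram candidate rule below is our simplification):

```python
from collections import defaultdict

def build_qgram_index(sequences, q):
    """Hash table mapping each q-gram to its (sequence id, offset) pairs."""
    index = defaultdict(list)
    for sid, seq in sequences.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].append((sid, i))
    return index

def candidates(index, query, q):
    """Sequences sharing at least one q-gram with the query. A real
    filter would demand a match count derived from the distance bound."""
    hits = set()
    for i in range(len(query) - q + 1):
        hits.update(sid for sid, _ in index.get(query[i:i + q], []))
    return hits

db = {"s1": "ACGTACGT", "s2": "TTGACCA"}
idx = build_qgram_index(db, q=3)
print(candidates(idx, "GTACG", q=3))  # -> {'s1'}
```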
doi: 10.1007/11408079_5
PADS: Protein Structure Alignment Using Directional Shape Signatures
S. Alireza Aghili; Divyakant Agrawal; Amr El Abbadi
A novel data mining approach for similarity search and knowledge discovery in protein structure databases is proposed. PADS (Protein structure Alignment by Directional shape Signatures) incorporates the three-dimensional coordinates of the main atoms of each amino acid and extracts a geometric shape signature along with the direction of each amino acid. As a result, each protein structure is represented by a series of multidimensional feature vectors capturing the local geometry, shape, direction, and biological properties of its amino acid molecules. Furthermore, a distance matrix is calculated and incorporated into a local-alignment dynamic programming algorithm to find the similar portions of two given protein structures, followed by a sequence alignment step for more efficient filtering. The optimal superposition of the detected similar regions is used to assess the quality of the results. The proposed algorithm is fast and accurate, and hence can be used for analysis and knowledge discovery in large protein structure databases. The method has been compared with results from CE, DALI, and CTSS on a representative sample of PDB structures; several similar structures not detected by the other methods are found.
- Bioinformatics | Pp. 17-29
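The local-alignment dynamic program the abstract refers to can be sketched generically. Below is a standard Smith-Waterman scoring loop in Python; PADS would plug per-amino-acid shape-signature similarities into `sim`, which we stub here with exact matching:

```python
def smith_waterman_score(a, b, sim, gap=-2.0):
    """Best local-alignment score between sequences a and b under a
    pairwise similarity function sim and a linear gap penalty."""
    H = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0.0,                                   # restart
                          H[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,                     # gap in b
                          H[i][j - 1] + gap)                     # gap in a
            best = max(best, H[i][j])
    return best

# Toy similarity: +1 for identical symbols, -1 otherwise.
print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE",
                           lambda x, y: 1.0 if x == y else -1.0))  # -> 3.0
```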
doi: 10.1007/11408079_6
LinkageTracker: A Discriminative Pattern Tracking Approach to Linkage Disequilibrium Mapping
Li Lin; Limsoon Wong; Tzeyun Leong; Pohsan Lai
Linkage disequilibrium mapping is the process of inferring a disease gene's location from observed associations of marker alleles in affected patients and normal controls. In reality, the proportion of disease-associated chromosomes in the affected population is relatively low (usually 10% or less), so locating these disease genes on the chromosomes is a challenge. In this paper, we propose an algorithm known as LinkageTracker for linkage disequilibrium mapping. Compared with some existing work, LinkageTracker is more robust and does not require any population ancestry information. Furthermore, our algorithm finds disease locations more accurately than a closely related existing method, reducing the average sum-square error by more than half (from 80.71 to 30.83) over one hundred trials. LinkageTracker was also applied to a real dataset of patients affected with haemophilia, and the disease gene locations it found were consistent with several studies in genetic prediction.
- Bioinformatics | Pp. 30-42
doi: 10.1007/11408079_7
Query Optimization in Encrypted Database Systems
Hakan Hacıgümüş; Bala Iyer; Sharad Mehrotra
To ensure the privacy of data in relational databases, prior work has provided techniques to support data encryption and to execute SQL queries over the encrypted data. However, the problem of how to put these techniques together in an optimal manner was not addressed, which is equivalent to having an RDBMS without a query optimizer. This paper models and solves that optimization problem.
- Watermarking and Encryption | Pp. 43-55
doi: 10.1007/11408079_8
Watermarking Spatial Trajectory Database
Xiaoming Jin; Zhihao Zhang; Jianmin Wang; Deyi Li
Protecting digital assets from piracy has received increasing interest as sensitive, valuable data need to be released. This paper addresses the problem of watermarking spatial trajectory databases. A formal definition of the problem is given and the potential attacks are analyzed. A novel watermarking method is then proposed, which embeds the watermark information by introducing a small error into the trajectory shape rather than into particular data values. Experimental results justify the usefulness of the proposed method and yield some empirical conclusions on parameter settings.
- Watermarking and Encryption | Pp. 56-67
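A toy, non-blind version of the bit-embedding idea, in Python. Note the paper embeds the watermark in the trajectory's *shape*; perturbing raw coordinates as done here only illustrates the general idea of hiding bits in small, bounded errors:

```python
def embed_bits(points, bits, eps=1e-4):
    """Shift each point's x-coordinate by +/-eps to carry one bit."""
    return [(x + (eps if b else -eps), y)
            for (x, y), b in zip(points, bits)]

def extract_bits(marked, original):
    """Recover bits by comparison with the unmarked trajectory
    (assuming the detector holds the original, i.e. non-blind)."""
    return [1 if mx > ox else 0
            for (mx, _), (ox, _) in zip(marked, original)]

traj = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
marked = embed_bits(traj, [1, 0, 1])
print(extract_bits(marked, traj))  # -> [1, 0, 1]
```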
doi: 10.1007/11408079_9
Effective Approaches for Watermarking XML Data
Wilfred Ng; Ho-Lam Lau
Watermarking enables provable rights over content and has been successfully applied in multimedia applications. However, it is not trivial to apply known, effective watermarking schemes to XML data, since noise added to its structure and node extents may not be acceptable. In this paper, we present two different watermarking schemes for XML data: a selective approach and a compression approach. The former allows us to embed non-destructive hidden information in XML data; the latter takes into account verbosity and the real-life need to update XML data. We conduct experiments on the efficiency and robustness of both approaches against different forms of attack, which show that our proposed watermarking schemes are reasonably efficient and effective.
- Watermarking and Encryption | Pp. 68-80
doi: 10.1007/11408079_10
A Unifying Framework for Merging and Evaluating XML Information
Ho-Lam Lau; Wilfred Ng
With the ever-increasing connections between XML information systems over the Web, users are able to obtain integrated XML information from multiple sources in a cooperative manner, for example by developing an XML mediator schema or by using eXtensible Stylesheet Language Transformations (XSLT). However, it is not trivial to evaluate the quality of such merged XML data, even with knowledge of the XML data sources involved. Herein, we present a unifying framework for merging XML data and study the quality of the merged XML information. We capture the coverage of the object sources and the structural diversity of the XML data objects with two metrics: the Information Completeness (IC) and the Data Complexity (DC) of the merged data.
- XML Query Processing | Pp. 81-94
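The coverage side of the framework can be given a crude stand-in: treat Information Completeness as the fraction of distinct source objects that survive into the merged result. The paper's actual IC and DC definitions are more refined; this Python snippet only fixes intuition:

```python
def information_completeness(source_ids, merged_ids):
    """Fraction of source objects present in the merged data
    (a simplified reading of the IC metric)."""
    source, merged = set(source_ids), set(merged_ids)
    return len(source & merged) / len(source) if source else 1.0

print(information_completeness({"a", "b", "c", "d"}, {"a", "c"}))  # -> 0.5
```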