Publications catalog - books


Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings

Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)

Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33206-0

Electronic ISBN

978-3-540-33207-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Weighted Intra-transactional Rule Mining for Database Intrusion Detection

Abhinav Srivastava; Shamik Sural; A. K. Majumdar

Data mining is the non-trivial process of identifying novel, potentially useful and understandable patterns in data. With most organizations starting online operations, the threat of security breaches is increasing. Since a database stores a lot of valuable information, its security has become paramount. One mechanism to safeguard the information in these databases is to use an intrusion detection system (IDS). In every database, there are a few attributes or columns that are more important to track or sense for malicious modifications than the other attributes. In this paper, we propose an intrusion detection algorithm named weighted data dependency rule miner (WDDRM) for finding dependencies among the data items. The transactions that do not follow the extracted data dependency rules are marked as malicious. We show that WDDRM handles the modification of sensitive attributes quite accurately.
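
The rule check described in this abstract can be illustrated with a minimal sketch. It is not the WDDRM algorithm itself: the rule format (a written attribute mapped to the set of attributes that must be read in the same transaction), the function name and the attribute names are all assumptions made for illustration, and attribute weights are left out.

```python
def violates_rules(transaction, rules):
    """Return True if the transaction breaks any dependency rule.

    transaction: list of (op, attribute) pairs, op being "r" (read) or "w" (write).
    rules: hypothetical rule format, {written_attribute: set of attributes
           that must also be read somewhere in the same transaction}.
    The order of operations is ignored in this simplified check.
    """
    reads = {attr for op, attr in transaction if op == "r"}
    for op, attr in transaction:
        if op == "w" and not rules.get(attr, set()) <= reads:
            return True  # a required read is missing: flag as malicious
    return False


# Hypothetical rule: writing "balance" requires reading "account_id".
rules = {"balance": {"account_id"}}
normal = [("r", "account_id"), ("w", "balance")]
suspicious = [("w", "balance")]
```

Under these assumptions, `violates_rules(normal, rules)` is false and `violates_rules(suspicious, rules)` is true; mining the rules from a transaction log is the part the paper contributes and is not sketched here.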

- Outlier and Intrusion Detection | Pp. 611-620

On Robust and Effective K-Anonymity in Large Databases

Wen Jin; Rong Ge; Weining Qian

The challenge of privacy-preserving data mining lies in respecting privacy requirements while discovering the original interesting patterns or structures. Existing methods lose the correlations among attributes by transforming the different attributes independently, or cannot guarantee the minimum abstraction level required by legal policies. In this paper, we propose a novel privacy-preserving transformation framework for distance-based mining operations based on the concept of privacy-preserving MicroClusters that satisfy a privacy constraint as well as a significance constraint. Our framework extends the robustness of the state-of-the-art k-anonymity model by introducing a privacy constraint (minimum radius) while keeping its effectiveness through a significance constraint (minimum number of corresponding data records). The privacy-preserving MicroClusters are made public for data mining purposes, but the original data records are kept private. We present efficient methods for generating and maintaining privacy-preserving MicroClusters and show that data mining operations such as clustering can easily be adapted to the public data represented by MicroClusters instead of the private data records. The experiments demonstrate that the proposed methods achieve accurate clustering results while preserving privacy.
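
The two constraints named in this abstract (a minimum radius and a minimum number of records) can be sketched as a simple check on a group of records. This is only an illustration under stated assumptions: the abstract does not define the radius, so the root-mean-square distance to the centroid is assumed here, and the function names are hypothetical.

```python
import math


def summarize(points):
    """Return (centroid, radius, count) for a non-empty list of numeric tuples.

    Assumption: radius = root-mean-square distance of the points to the centroid.
    """
    n = len(points)
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / n for i in range(dim))
    radius = math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)
    return centroid, radius, n


def is_publishable(points, min_radius, min_records):
    """Check the privacy constraint (radius) and the significance constraint (count)."""
    if not points:
        return False
    _, radius, n = summarize(points)
    return n >= min_records and radius >= min_radius


# Four records at the corners of a square with side 2.
records = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
```

With these records the centroid is (1, 1) and the assumed radius is the square root of 2, so the group passes with `min_radius=1.0, min_records=3` and fails if either threshold is raised beyond those values. Only the summary (centroid, radius, count) would be published; the records themselves stay private.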

- Privacy | Pp. 621-636

Achieving Private Recommendations Using Randomized Response Techniques

Huseyin Polat; Wenliang Du

Collaborative filtering (CF) systems are receiving increasing attention. Data collected from users is needed for CF; however, many users do not feel comfortable disclosing data due to privacy risks. They sometimes refuse to provide information or might decide to give false data. Introducing privacy measures is likely to increase users’ confidence to contribute their data and to provide more truthful data. In this paper, we investigate achieving referrals using item-based algorithms on binary ratings without greatly exposing users’ privacy. We propose to use randomized response techniques (RRT) to perturb users’ data. We conduct experiments to evaluate the accuracy of our scheme and to show how different parameters affect our results using real data sets.
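
As an illustration of how randomized response can perturb binary ratings, the following is a minimal sketch of a Warner-style scheme. It is an assumption that this matches the variant used in the paper; the function names and the parameter `theta` (the probability of reporting truthfully) are chosen here for illustration.

```python
import random


def perturb(ratings, theta, rng):
    """Report the true binary ratings with probability theta, else their complement."""
    if rng.random() < theta:
        return list(ratings)
    return [1 - r for r in ratings]


def estimate_true_rate(observed_rate, theta):
    """Recover the true fraction p of 1-ratings for an item.

    Inverts observed = theta * p + (1 - theta) * (1 - p).
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the reports uninformative")
    return (observed_rate + theta - 1) / (2 * theta - 1)
```

The server never learns whether a given user reported true or flipped ratings, yet aggregate rates can still be estimated: with `theta = 0.8` and a true rate of 0.3, the expected observed rate is 0.8 * 0.3 + 0.2 * 0.7 = 0.38, and `estimate_true_rate(0.38, 0.8)` recovers 0.3 up to floating-point error. Values of `theta` closer to 0.5 give more privacy and noisier estimates.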

- Privacy | Pp. 637-646

Privacy-Preserving SVM Classification on Vertically Partitioned Data

Hwanjo Yu; Jaideep Vaidya; Xiaoqian Jiang

Classical data mining algorithms implicitly assume complete access to all data, either in centralized or federated form. However, privacy and security concerns often prevent sharing of data, thus derailing data mining projects. Recently, there has been growing focus on finding solutions to this problem. Several algorithms have been proposed that perform distributed knowledge discovery while providing guarantees on the non-disclosure of data. Classification is an important data mining problem applicable in many diverse domains. The goal of classification is to build a model which can predict an attribute (a binary attribute in this work) based on the rest of the attributes. We propose an efficient and secure privacy-preserving algorithm for support vector machine (SVM) classification over vertically partitioned data.

- Privacy | Pp. 647-656

Data Mining Using Relational Database Management Systems

Beibei Zou; Xuesong Ma; Bettina Kemme; Glen Newton; Doina Precup

Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as back-tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves a much higher scalability than Weka, while providing the same output and maintaining good computation time.

- Relational Database | Pp. 657-667

Bias-Free Hypothesis Evaluation in Multirelational Domains

Christine Körner; Stefan Wrobel

In propositional domains, using a separate test set obtained via random sampling or cross validation is generally considered to be an unbiased estimator of true error. In multirelational domains, previous work has already noted that linkage of objects may cause these procedures to be biased and has proposed corrected sampling procedures. However, as we show in this paper, the existing procedures only address one particular case of bias introduced by linkage. In this paper we therefore introduce a sampling procedure based on bin packing, which ensures that test sets are properly chosen to match the probability of reencountering previously seen objects and which includes previous approaches as a special case. Experiments with data from the Internet Movie Database illustrate the performance of our algorithm.

- Relational Database | Pp. 668-672

Enhanced DB-Subdue: Supporting Subtle Aspects of Graph Mining Using a Relational Approach

Ramanathan Balachandran; Srihari Padmanabhan; Sharma Chakravarthy

This paper addresses subtle aspects of graph mining using an SQL-based approach. The enhancements addressed in this paper include detection of cycles, effect of overlapping substructures on compression, and development of a minimum description length for the relational approach. Extensive performance evaluation has been conducted to evaluate the extensions.

- Relational Database | Pp. 673-678

Multimedia Semantics Integration Using Linguistic Model

Bo Yang; Ali R. Hurson

The integration of multimedia semantics is challenging due to the feature-based representation of multimedia data and the heterogeneity among data sources. From a human viewpoint, multimedia data objects are often considered as perceptions of the real world, and therefore can be represented at a semantic-entity level in the linguistic domain. This paper proposes a paradigm that facilitates the integration of multimedia semantics in heterogeneous distributed database environments with the help of linguistic analysis. Specifically, we derive a closed set of logic-based form expressions for the efficient computation of multimedia semantic contents, which take conceptual attributes and linguistic relationships into consideration. In the expression set, the logic terms give a convenient way to describe semantic contents concisely and precisely, providing a representation of multimedia data that is closer to human perception. The space utilization is also improved through the collective representation of similar semantic contents and feature values. In addition, optimization can be easily performed on logic expressions using mathematical analysis. By replacing long terms with equivalent terms of shorter lengths, the image representation can be automatically optimized. Using a heterogeneous database infrastructure, the proposed method has been simulated and analyzed.

- Multimedia Mining | Pp. 679-688

A Novel Indexing Approach for Efficient and Fast Similarity Search of Captured Motions

Chuanjun Li; B. Prabhakaran

Indexing of motion data is important for quickly searching for similar motions in sign language recognition, gait analysis and rehabilitation. This paper proposes a simple and efficient tree structure for indexing motion data with dozens of attributes. Feature vectors are extracted for indexing by using singular value decomposition (SVD) properties of motion data matrices. By having similar motions with large variations indexed together, searching for similar motions of a query needs only one node traversal at each tree level, and only one feature needs to be considered at each tree level. Experiments show that the majority of irrelevant motions can be pruned while retrieving all similar motions, and one traversal of the indexing tree takes only several microseconds in the presence of motion variations.

- Multimedia Mining | Pp. 689-698

Mining Frequent Spatial Patterns in Image Databases

Wei-Ta Chen; Yi-Ling Chen; Ming-Syan Chen

Mining useful patterns in image databases can not only reveal useful information to users but also help the task of data management. In this paper, we propose an image mining framework, Frequent Spatial Pattern mining in images (FSP), to mine frequent patterns located in a pair of spatial locations of images. A pattern in the FSP is associated with a pair of spatial locations and refers to the occurrence of the same image content in a set of images. This framework is designed to be general so as to accept different levels of representations of image content and different layout forms of spatial representations.

- Multimedia Mining | Pp. 699-703