Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings

Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)

Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, April 9-12, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution: Not detected
Publication year: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33206-0

Electronic ISBN

978-3-540-33207-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Protection or Privacy? Data Mining and Personal Data

David J. Hand

In order to run countries and economies effectively, governments and governmental institutions need to collect and analyse vast amounts of personal data. Similarly, health service providers, security services, transport planners, and education authorities need to know a great deal about their clients. And, of course, commercial operations run more efficiently, and can meet the needs of their customers more effectively, the more they know about them. In general, then, the more data these organisations have, the better. On the other hand, the more private data that is collated and disseminated, the more individuals are at risk of crimes such as identity theft and financial fraud, not to mention the simple invasion of privacy that such data collection represents. Most work in data mining has concentrated on the positive aspects of extracting useful information from large data sets. But as the technology and its use advance, more awareness of the potential downside is needed. In this paper I look at some of these issues. I examine how data mining tools and techniques are being used by governments and commercial operations to gain insight into individual behaviour, and I look at the concerns that such advances are bringing.

- Keynote Speech | Pp. 1-10

The Changing Face of Web Search

Prabhakar Raghavan

Dr. Prabhakar Raghavan is an invited keynote speaker for PAKDD 2006. Web search has come to dominate our consciousness as a convenience we take for granted, as a medium for connecting advertisers and buyers, and as a fast-growing revenue source for the companies that provide this service. Following a brief overview of the state of the art and how we got there, this talk covers a spectrum of technical challenges arising in web search – ranging from spam detection to auction mechanisms.

- Keynote Speech | Pp. 11-11

Data Mining for Surveillance Applications

Bhavani M. Thuraisingham

Dr. Bhavani M. Thuraisingham is an invited speaker for PAKDD 2006. She is a Professor at the Eric Jonsson School of Engineering and Computer Science, University of Texas at Dallas. She is also director of the Cyber Security Research Center and President of Bhavani Security Consulting.

- Invited Speech | Pp. 12-14

A Multiclass Classification Method Based on Output Design

Qi Qiang; Qinming He

Output coding is a general framework for solving multiclass categorization problems. Some researchers have presented the notion of continuous codes and methods for designing output codes; however, these methods are time-consuming and expensive. This paper describes a new framework, which we call Strong-to-Weak-to-Strong (SWS). We transform a “strong” learning algorithm into a “weak” one by reducing its number of optimization iterations while preserving its other characteristics, such as its geometric properties; we then use the kernel trick to let the “weak” algorithms work in high-dimensional spaces, and finally improve their performance. Encouraging experimental results show that this approach is competitive with other methods.

- Classification | Pp. 15-19
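The abstract leaves the SWS construction itself to the paper. As background, the decoding step of the generic output-coding framework it builds on can be sketched as follows (a minimal illustration with invented data, not the authors' code): each class is assigned a row of ±1 codewords, one bit per binary classifier, and a query is assigned to the class whose codeword is closest in Hamming distance to the classifiers' outputs.

```python
import numpy as np

def ecoc_decode(code_matrix, bit_predictions):
    """Output coding: each class has a row of +/-1 codewords; the
    predicted class is the row closest (in Hamming distance) to the
    vector of binary-classifier outputs."""
    dist = (code_matrix != bit_predictions).sum(axis=1)
    return int(np.argmin(dist))

# Toy code matrix: 3 classes, 3 binary classifiers (illustrative)
codes = np.array([[ 1,  1, -1],
                  [ 1, -1,  1],
                  [-1,  1,  1]])
```

For example, if the three binary classifiers output `[1, 1, -1]`, decoding against `codes` returns class 0.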

Regularized Semi-supervised Classification on Manifold

Lianwei Zhao; Siwei Luo; Yanchang Zhao; Lingzhi Liao; Zhihai Wang

Semi-supervised learning estimates the marginal distribution from a large number of unlabeled examples and then constrains the conditional probability p(y | x) with a few labeled examples. In this paper, we focus on a regularization approach for semi-supervised classification. A label information graph is first defined to keep the pairwise label relationships; it can be incorporated with the neighborhood graph, which reflects the intrinsic geometric structure of the marginal distribution. Then we propose a novel regularized semi-supervised classification algorithm, in which the regularization term is based on a modified graph Laplacian. By redefining the graph Laplacian, we can adjust and optimize the decision boundary using the labeled examples. The new algorithm combines the benefits of both unsupervised and supervised learning and can use unlabeled and labeled examples effectively. Encouraging experimental results are presented on both synthetic and real-world datasets.

- Classification | Pp. 20-29
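As an illustrative aside (not the authors' modified Laplacian), the standard unnormalized graph Laplacian that such regularizers build on can be sketched for a k-nearest-neighbor graph:

```python
import numpy as np

def knn_graph_laplacian(X, k=3):
    """Unnormalized graph Laplacian L = D - W of a symmetrized
    k-nearest-neighbor graph over the rows of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of point i, excluding itself
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)       # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))   # degree matrix
    return D - W
```

The regularization penalty f^T L f equals the sum over graph edges of W_ij (f_i - f_j)^2, so it is small exactly when neighboring points receive similar labels, which is the smoothness assumption the abstract exploits.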

Similarity-Based Sparse Feature Extraction Using Local Manifold Learning

Cheong Hee Park

Feature extraction is an important preprocessing step encountered in many areas such as data mining, pattern recognition and scientific visualization. In this paper, a new method for sparse feature extraction using local manifold learning is proposed. Similarities within a neighborhood are first computed to explore local geometric structures, producing a sparse feature representation. Based on the constructed similarity matrix, linear dimension reduction is applied to enhance similarities among elements of the same class and to extract optimal features for classification. Since similarities are only computed within a neighborhood, the sparsity of the similarity matrix gives computational efficiency and memory savings. Experimental results demonstrate the superior performance of the proposed method.

- Classification | Pp. 30-34
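A minimal sketch of the kind of neighborhood-restricted similarity matrix the abstract describes (illustrative only; the paper's exact construction may differ). Gaussian similarities are kept only within each point's k-neighborhood, so most entries stay zero:

```python
import numpy as np

def sparse_neighborhood_similarity(X, k=2, sigma=1.0):
    """Gaussian similarities restricted to each point's k nearest
    neighbors; entries outside the neighborhoods remain zero, so
    the resulting matrix is sparse."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip the point itself
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(S, S.T)               # symmetrize
```

Only O(nk) entries are nonzero, which is where the memory savings mentioned in the abstract come from.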

Generalized Conditional Entropy and a Metric Splitting Criterion for Decision Trees

Dan A. Simovici; Szymon Jaroszewicz

We examine a new approach to building decision trees by introducing a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users, and it yields decision trees that have smaller sizes and fewer leaves than trees built with standard methods, with comparable or better accuracy.

- Classification | Pp. 35-44
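For the Shannon special case, a metric on partitions of the kind the abstract refers to, d(A, B) = H(A|B) + H(B|A), can be sketched as follows (the paper works with a more general family of entropies, not reproduced here):

```python
import math
from collections import Counter

def cond_entropy(a, b):
    """Conditional Shannon entropy H(A|B), in bits, for two
    equal-length label sequences a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return sum(c / n * math.log2(b_counts[bb] / c)
               for (aa, bb), c in joint.items())

def partition_metric(a, b):
    """d(A, B) = H(A|B) + H(B|A): zero iff the two partitions
    coincide, symmetric, and satisfies the triangle inequality."""
    return cond_entropy(a, b) + cond_entropy(b, a)
```

A split can then be chosen to minimize the distance between the partition induced by a candidate test and the partition induced by the class labels.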

RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification

Dae-Ki Kang; Adrian Silvescu; Vasant Honavar

The Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a generative model. This assumption can be restrictive in many real-world classification tasks. We describe RNBL-MN, which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on a multinomial event model (one for each class at each node in the tree). In our experiments on protein sequence and text classification tasks, we observe that RNBL-MN substantially outperforms the NB classifier. Furthermore, our experiments show that RNBL-MN outperforms the C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies comparable to those of support vector machines (SVMs) using similar information.

- Classification | Pp. 45-54
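A minimal sketch of the multinomial event model underlying each node's NB classifier (standard multinomial Naive Bayes with add-alpha smoothing; the recursive tree construction of RNBL-MN is not reproduced):

```python
import math
from collections import Counter, defaultdict

def train_mn_nb(seqs, labels, alpha=1.0):
    """Multinomial event model: per-class symbol counts with
    Laplace (add-alpha) smoothing, plus class priors."""
    vocab = {s for seq in seqs for s in seq}
    counts = defaultdict(Counter)
    class_n = Counter(labels)
    for seq, y in zip(seqs, labels):
        counts[y].update(seq)
    model = {}
    for y in class_n:
        total = sum(counts[y].values()) + alpha * len(vocab)
        model[y] = {s: (counts[y][s] + alpha) / total for s in vocab}
    priors = {y: class_n[y] / len(labels) for y in class_n}
    return model, priors

def predict_mn_nb(model, priors, seq):
    """Return the class with the highest log posterior."""
    def score(y):
        return math.log(priors[y]) + sum(
            math.log(model[y].get(s, 1e-12)) for s in seq)
    return max(priors, key=score)
```

Each sequence is treated as a bag of symbols drawn from a per-class multinomial, which matches the "multinomial event model" named in the abstract.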

TRIPPER: Rule Learning Using Taxonomies

Flavian Vasile; Adrian Silvescu; Dae-Ki Kang; Vasant Honavar

In many application domains, there is a need for learning algorithms that generate accurate as well as comprehensible classifiers. In this paper, we present TRIPPER – a rule induction algorithm that extends RIPPER, a widely used rule-learning algorithm. TRIPPER exploits knowledge in the form of taxonomies over the values of the features used to describe data. We compare the performance of TRIPPER with that of RIPPER on benchmark datasets from the Reuters-21578 corpus, using WordNet (a human-generated taxonomy) to guide rule induction by TRIPPER. Our experiments show that the rules generated by TRIPPER are generally more comprehensible and compact and, in the large majority of cases, at least as accurate as those generated by RIPPER.

- Classification | Pp. 55-59
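The taxonomy-exploiting idea can be illustrated with a toy hypernym map (the map and function names are invented for illustration, not taken from the paper): lifting feature values to their taxonomy parents lets induced rule conditions test a more general concept instead of each raw value.

```python
def lift_features(values, taxonomy):
    """One abstraction step: replace each feature value by its
    parent in the taxonomy (if it has one), so a single rule
    condition can cover several related raw values."""
    return [taxonomy.get(v, v) for v in values]

# Toy WordNet-style hypernym map (illustrative only)
toy_taxonomy = {"dollar": "currency", "yen": "currency", "wheat": "grain"}
```

Here a rule testing for "currency" would cover documents mentioning either "dollar" or "yen", which is the kind of compactness gain the abstract reports.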

Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

Kurt Driessens; Peter Reutemann; Bernhard Pfahringer; Claire Leschi

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm to the combined example sets as a knowledge base. The examples from the unlabeled set are “pre-labeled” by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.

- Classification | Pp. 60-69
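A minimal sketch of the two-stage scheme under stated assumptions (1-NN as the initial classifier and a single global weight w for pre-labeled examples; the paper's actual choices may differ):

```python
import numpy as np

def two_stage_wknn(X_lab, y_lab, X_unlab, x_query, k=3, w=0.5):
    """Stage 1: a 1-NN classifier trained on the labeled set
    pre-labels the unlabeled pool.  Stage 2: a weighted k-NN vote
    over the combined set classifies the query, with weight w < 1
    discounting the less-trusted pre-labeled examples."""
    def nn_label(x):
        d = ((X_lab - x) ** 2).sum(axis=1)
        return y_lab[int(np.argmin(d))]
    # Stage 1: pre-label the unlabeled examples
    y_pre = np.array([nn_label(x) for x in X_unlab])
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pre])
    wts = np.concatenate([np.ones(len(y_lab)), np.full(len(y_pre), w)])
    # Stage 2: weighted k-NN vote on the combined knowledge base
    d = ((X_all - x_query) ** 2).sum(axis=1)
    votes = {}
    for i in np.argsort(d)[:k]:
        votes[y_all[i]] = votes.get(y_all[i], 0.0) + wts[i]
    return max(votes, key=votes.get)
```

With many pre-labeled neighbors near the query, the discounted votes can still outweigh a single distant labeled example, which is how the unlabeled data contributes.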