Publications catalog - books

Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings

Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)

Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2006	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33206-0

Electronic ISBN

978-3-540-33207-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic and information engineering

Table of contents

Check that your institution gives you access to download or request the full book or any of its chapters.

doi: 10.1007/11731139_61

Scoring Method for Tumor Prediction from Microarray Data Using an Evolutionary Fuzzy Classifier

Shinn-Ying Ho; Chih-Hung Hsieh; Kuan-Wei Chen; Hui-Ling Huang; Hung-Ming Chen; Shinn-Jang Ho

In this paper, we propose a novel scoring method for tumor prediction using an evolutionary fuzzy classifier, which can provide accurate and interpretable information. The merits of the proposed method are threefold. 1) The score, which ranges over [0, 100], can further illustrate the degree of tumor status, in contrast to a conventional tumor classifier. 2) The derived score system can be used as a tumor classifier with a system-suggested or human-specified threshold value. 3) The derived classifier, with its compact fuzzy rule base, can generate an interpretable and accurate prediction result. The effectiveness of the proposed method is evaluated on two well-known microarray datasets and compared with an existing tumor classifier. Computer simulation shows, by means of ROC curves of classification, that the proposed scoring method is effective.

- Bio-data Mining | Pp. 520-529
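
As a minimal sketch of merit 2 in the abstract above, a score in [0, 100] becomes a classifier once a threshold is fixed, and sweeping the threshold yields the points of an ROC curve. The function names, the default threshold of 50, and the label names are hypothetical and do not come from the paper; the evolutionary fuzzy classifier that produces the score is not modeled here.

```python
def classify_by_score(score: float, threshold: float = 50.0) -> str:
    """Turn a tumor score in [0, 100] into a class label.

    The threshold value and the label names are illustrative assumptions.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie in [0, 100]")
    return "tumor" if score >= threshold else "normal"


def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, true positive rate, false positive rate) triples.

    These triples are the points from which an ROC curve is drawn.
    Assumes labels contain at least one "tumor" and one "normal" sample.
    """
    positives = sum(1 for y in labels if y == "tumor")
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [classify_by_score(s, t) for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels)
                 if p == "tumor" and y == "tumor")
        fp = sum(1 for p, y in zip(predicted, labels)
                 if p == "tumor" and y == "normal")
        points.append((t, tp / positives, fp / negatives))
    return points
```

Lowering the threshold trades false positives for sensitivity, which is why a human-specified threshold can be useful when the cost of a missed tumor differs from the cost of a false alarm.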

doi: 10.1007/11731139_62

Efficient Discovery of Structural Motifs from Protein Sequences with Combination of Flexible Intra- and Inter-block Gap Constraints

Chen-Ming Hsu; Chien-Yu Chen; Ching-Chi Hsu; Baw-Jhiune Liu

Discovering protein structural signatures directly from their primary information is a challenging task, because the residues associated with a functional motif are not necessarily clustered in one region of the sequence. This work proposes an algorithm that aims to discover conserved sequential blocks interleaved by large irregular gaps from a set of unaligned biological sequences. Unlike previous works, which employ only one type of constraint on gap flexibility, we propose using a combination of intra- and inter-block gap constraints to discover longer patterns with larger irregular gaps. The smaller flexible intra-block gap constraint relaxes the restriction within local motif blocks while still keeping them compact, and the larger flexible inter-block gap constraint allows longer irregular gaps between compact motif blocks. Using two types of gap constraints for different purposes improves the efficiency of the mining process while keeping the accuracy of the mining results high. The efficiency of the algorithm also helps to identify functional motifs that are conserved in only a small subset of the input sequences.

- Bio-data Mining | Pp. 530-539
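
The two kinds of gap constraint described in the abstract above can be illustrated with a small matcher. This is only a sketch of what such a pattern looks like once found, not the mining algorithm itself; the function names, the parameter names, and the example blocks are hypothetical.

```python
import re


def build_motif_regex(blocks, intra_gap_max, inter_gap_min, inter_gap_max):
    """Compile a regex for compact blocks separated by flexible gaps.

    Inside a block, consecutive residues may be separated by
    0..intra_gap_max arbitrary residues. Consecutive blocks are separated
    by inter_gap_min..inter_gap_max arbitrary residues.
    """
    intra = ".{0,%d}" % intra_gap_max
    inter = ".{%d,%d}" % (inter_gap_min, inter_gap_max)
    block_patterns = [
        intra.join(re.escape(residue) for residue in block)
        for block in blocks
    ]
    return re.compile(inter.join(block_patterns))


def support(pattern, sequences):
    """Fraction of sequences that contain at least one match."""
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return hits / len(sequences)
```

For example, `build_motif_regex(["CH", "GK"], 1, 3, 10)` accepts `C` and `H` at most one residue apart, followed 3 to 10 residues later by `G` and `K` at most one residue apart.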

doi: 10.1007/11731139_63

Finding Consensus Patterns in Very Scarce Biosequence Samples from Their Minimal Multiple Generalizations

Yen Kaow Ng; Takeshi Shinohara

In this paper we examine the issues involved in finding consensus patterns from biosequence data of very small sample sizes, by searching for so-called minimal multiple generalizations (mmgs), that is, a set of patterns that accounts for all the samples. The data we use are the with more conserved consensus patterns for the bacteria . By comparing the mmgs found over different search spaces, we found that it is possible to derive patterns close to the known consensus patterns by simply making some reasonable requirements on the kinds of patterns to obtain. We also propose some simple measures to evaluate the patterns in an mmg.

- Bio-data Mining | Pp. 540-545

doi: 10.1007/11731139_64

Kernels on Lists and Sets over Relational Algebra: An Application to Classification of Protein Fingerprints

Adam Woźnica; Alexandros Kalousis; Melanie Hilario

In this paper we propose a new class of kernels defined over extended relational algebra structures. The “extension” was recently proposed in [1], and it overcomes one of the main limitations of the standard relational algebra, i.e. difficulties in modeling lists. These new kernels belong to the class of R-Convolution kernels in the sense that the computation of the similarity between two complex objects is based on the similarities of the objects’ parts, computed by means of subkernels. The complex objects (relational instances in our case) are tuples and sets and/or lists of relational instances, to which elementary kernels and kernels on sets and lists are applied. The performance of this class of kernels together with the Support Vector Machines (SVM) algorithm is evaluated on the problem of classification of protein fingerprints; by combining different data representations we were able to improve the best accuracy reported so far in the literature.

- Bio-data Mining | Pp. 546-551
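
The idea of building a kernel on complex objects from subkernels on their parts can be sketched with a cross-product kernel on sets, a well-known convolution-style kernel that sums an elementary kernel over all pairs of elements. This is an illustration only: the paper's own kernels on lists and relational instances are not reproduced here, and the function names and the choice of an RBF elementary kernel are assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Elementary kernel on numeric tuples (an assumed choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def set_kernel(xs, ys, elementary=rbf_kernel):
    """Cross-product kernel: sum the elementary kernel over all pairs."""
    return sum(elementary(x, y) for x in xs for y in ys)


def normalized_set_kernel(xs, ys, elementary=rbf_kernel):
    """Set kernel scaled so that every non-empty set has self-similarity 1."""
    denominator = math.sqrt(
        set_kernel(xs, xs, elementary) * set_kernel(ys, ys, elementary)
    )
    return set_kernel(xs, ys, elementary) / denominator
```

Normalization keeps large sets from dominating the similarity simply because they contribute more pairs to the sum.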

doi: 10.1007/11731139_65

Mining Quantitative Maximal Hyperclique Patterns: A Summary of Results

Yaochun Huang; Hui Xiong; Weili Wu; Sam Y. Sung

Hyperclique patterns are groups of objects which are strongly related to each other. Indeed, the objects in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another, as measured by the uncentered Pearson’s correlation coefficient. Recent literature has provided an approach to discovering hyperclique patterns over data sets with binary attributes. In this paper, we introduce algorithms for mining maximal hyperclique patterns in large data sets containing quantitative attributes. An intuitive and simple solution is to partition quantitative attributes into binary attributes. However, there is potential information loss due to partitioning. Instead, our approach is based on a normalization scheme and can work directly on quantitative attributes. In addition, we adopt the algorithm structures of three popular association pattern mining algorithms and add a critical clique pruning technique. Finally, we compare the performance of these algorithms for finding quantitative maximal hyperclique patterns using some real-world data sets.

- Bio-data Mining | Pp. 552-556
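
The similarity measure named in the abstract above, the uncentered Pearson correlation coefficient, is the dot product of two vectors divided by the product of their Euclidean norms (the same quantity as cosine similarity). The sketch below computes it and reports the weakest pairwise similarity inside a candidate group of attributes. The function names and the dictionary layout are hypothetical, and the pairwise check illustrates the guarantee rather than the paper's pattern definition or mining algorithms.

```python
import math
from itertools import combinations


def uncentered_pearson(x, y):
    """Uncentered Pearson correlation of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("correlation is undefined for an all-zero vector")
    return dot / norm


def min_pairwise_similarity(columns):
    """Smallest pairwise similarity among the named attribute vectors.

    columns maps an attribute name to its vector of values.
    """
    return min(
        uncentered_pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    )
```

A group whose minimum pairwise similarity stays above a chosen threshold has the kind of guaranteed pairwise similarity the abstract refers to.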

doi: 10.1007/11731139_66

A Nonparametric Outlier Detection for Effectively Discovering Top-N Outliers from Engineering Data

Hongqin Fan; Osmar R. Zaïane; Andrew Foss; Junfeng Wu

We present a novel resolution-based outlier notion and a nonparametric outlier-mining algorithm, which can efficiently identify top listed outliers from a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into consideration. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective.

- Outlier and Intrusion Detection | Pp. 557-566

doi: 10.1007/11731139_67

A Fast Greedy Algorithm for Outlier Mining

Zengyou He; Shengchun Deng; Xiaofei Xu; Joshua Zhexue Huang

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the large remainder of the data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, the LSA algorithm is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that: (1) our new algorithm has comparable performance to state-of-the-art outlier detection algorithms in identifying true outliers, and (2) our algorithm can be an order of magnitude faster than the LSA algorithm.

- Outlier and Intrusion Detection | Pp. 567-576
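
A greedy strategy of the kind described in the abstract above can be sketched as follows. The abstract does not state the objective, so this sketch assumes, for illustration only, that the goal is to remove k records so that the sum of per-attribute Shannon entropies of the remaining records is as small as possible. The function names are hypothetical, and this naive version recomputes the entropy for every candidate, so it shows the greedy idea rather than the paper's fast implementation.

```python
import math
from collections import Counter


def total_entropy(records):
    """Sum of per-attribute Shannon entropies of categorical tuples."""
    n = len(records)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*records):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def greedy_outliers(records, k):
    """Pick k record indices one at a time.

    Each step removes the record whose removal leaves the remaining data
    with the lowest total entropy (assumed objective, see lead-in).
    """
    remaining = list(range(len(records)))
    outliers = []
    for _ in range(k):
        best = min(
            remaining,
            key=lambda i: total_entropy(
                [records[j] for j in remaining if j != i]
            ),
        )
        remaining.remove(best)
        outliers.append(best)
    return outliers
```

Unlike a local search, the greedy loop never revisits a choice, which is where the speed advantage of such a strategy comes from.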

doi: 10.1007/11731139_68

Ranking Outliers Using Symmetric Neighborhood Relationship

Wen Jin; Anthony K. H. Tung; Jiawei Han; Wei Wang

Mining outliers in a database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., the outliers that have a density distribution significantly different from their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers are in a location where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, here we propose a simple but effective measure on local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.

- Outlier and Intrusion Detection | Pp. 577-593
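
The symmetric neighborhood idea in the abstract above can be sketched as follows. The abstract gives no formulas, so the details here are assumptions: density is taken as the inverse of the distance to the k-th nearest neighbor, and an object's score is the average density over its neighbors and reverse neighbors divided by its own density. The function names are hypothetical, the search is brute force, and the points are assumed to be distinct.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i."""
    others = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return others[:k]


def symmetric_outlier_scores(points, k):
    """Score each point using its neighbors and its reverse neighbors."""
    n = len(points)
    neighbors = [knn(points, i, k) for i in range(n)]
    # Assumed density estimate: inverse distance to the k-th nearest neighbor.
    density = [
        1.0 / math.dist(points[i], points[neighbors[i][-1]])
        for i in range(n)
    ]
    scores = []
    for i in range(n):
        # Reverse neighbors: points that count i among their own neighbors.
        reverse = [j for j in range(n) if i in neighbors[j]]
        influence = set(neighbors[i]) | set(reverse)
        average = sum(density[j] for j in influence) / len(influence)
        scores.append(average / density[i])
    return scores
```

Sorting the scores in descending order and keeping the first few gives a top-ranked list of outlier candidates.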

doi: 10.1007/11731139_69

Construction of Finite Automata for Intrusion Detection from System Call Sequences by Genetic Algorithms

Kyubum Wee; Sinjae Kim

Intrusion detection systems protect normal users and system resources from information security threats. Anomaly detection is an approach to intrusion detection that constructs models of the normal behavior of users or systems and detects behaviors that deviate from the model. Monitoring the sequences of system calls generated during the execution of privileged programs is known to be an effective means of anomaly detection. Finite automata have been recognized as an appropriate device to model the normal behavior of system call sequences. However, there have been several technical difficulties in constructing finite automata from sequences of system calls. We present our study on how to construct finite automata from system call sequences using genetic algorithms. Various experiments show that the resulting system is very effective in detecting intrusions.

- Outlier and Intrusion Detection | Pp. 594-602
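
Once a finite automaton of normal behavior exists, using it for anomaly detection is straightforward, as the sketch below shows. It covers only the detection side: the genetic-algorithm construction from the paper is not modeled, and the function name, the transition-table layout, and the example system calls are hypothetical.

```python
def count_deviations(transitions, start_state, calls):
    """Run a system call sequence through a deterministic automaton.

    transitions maps (state, system_call) to the next state. A call with
    no matching transition counts as one deviation, and the automaton
    then stays in its current state.
    """
    state = start_state
    deviations = 0
    for call in calls:
        next_state = transitions.get((state, call))
        if next_state is None:
            deviations += 1
        else:
            state = next_state
    return deviations
```

A trace could then be flagged as a possible intrusion when its deviation count exceeds a chosen limit; that limit is a tuning decision this sketch leaves open.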

doi: 10.1007/11731139_70

An Adaptive Intrusion Detection Algorithm Based on Clustering and Kernel-Method

Hansung Lee; Yongwha Chung; Daihee Park

An adaptive intrusion detection algorithm which combines the Adaptive Resonance Theory (ART) with the Concept Vector and the Mercer-Kernel is presented. Compared to supervised and clustering-based Intrusion Detection Systems (IDSs), our algorithm can detect unknown types of intrusions online by generating clusters incrementally.

- Outlier and Intrusion Detection | Pp. 603-610