Publications catalog - books



Knowledge Discovery in Databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings

Alípio Mário Jorge; Luís Torgo; Pavel Brazdil; Rui Camacho; João Gama (eds.)

In conference: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Porto, Portugal. October 3-7, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: not detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29244-9

Electronic ISBN

978-3-540-31665-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

STochFS: A Framework for Combining Feature Selection Outcomes Through a Stochastic Process

Jerffeson Teixeira de Souza; Nathalie Japkowicz; Stan Matwin

The Feature Selection problem involves discovering a subset of features such that a classifier built only with this subset has better predictive accuracy than a classifier built from the entire set of features. Ensemble methods, such as Bagging and Boosting, have been shown to raise classifier performance to remarkable levels but, surprisingly, have not been tried in other parts of the classification process. In this paper, we apply the ensemble approach to feature selection by proposing a systematic way of combining various outcomes of a feature selection algorithm. The proposed framework, named STochFS, has been shown empirically to improve the performance of well-known feature selection algorithms.

Keywords: Feature Selection; Feature Subset; Ensemble Method; Deterministic Algorithm; Feature Selection Algorithm.

- Short Papers | Pp. 667-674
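The combination idea the abstract describes can be sketched generically: run a randomized feature-selection pass many times and keep the features selected most often across runs. The sketch below is a minimal illustration under assumed inputs; the feature names, the score table, and the `stochastic_select`/`combine_runs` helpers are all hypothetical, and this is not the STochFS algorithm itself.

```python
import random
from collections import Counter

def stochastic_select(features, scores, k, rng):
    """One randomized selection pass: draw k features, weighted by score."""
    pool = list(features)
    chosen = []
    for _ in range(k):
        weights = [scores[f] for f in pool]
        f = rng.choices(pool, weights=weights, k=1)[0]  # weighted draw
        chosen.append(f)
        pool.remove(f)  # sample without replacement
    return chosen

def combine_runs(features, scores, k, runs=50, seed=0):
    """Combine many stochastic selection outcomes by vote counting."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        votes.update(stochastic_select(features, scores, k, rng))
    # Keep the k features selected most often across all runs.
    return [f for f, _ in votes.most_common(k)]

features = ["f1", "f2", "f3", "f4", "f5"]
scores = {"f1": 0.9, "f2": 0.1, "f3": 0.8, "f4": 0.05, "f5": 0.7}
print(combine_runs(features, scores, k=3))
```

Voting over repeated stochastic runs is one simple way to stabilize a nondeterministic selector; other combination rules (rank averaging, intersection) fit the same skeleton.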

Speeding Up Logistic Model Tree Induction

Marc Sumner; Eibe Frank; Mark Hall

Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. We address this issue by using the AIC criterion [1] instead of cross-validation to prevent overfitting these models. In addition, a weight trimming heuristic is used which produces a significant speedup. We compare the training time and accuracy of the new induction process with the original one on various datasets and show that the training time often decreases while the classification accuracy diminishes only slightly.

Keywords: Training Time; Training Instance; Linear Logistic Regression; Simple Linear Regression Model; Model Selection Method.

- Short Papers | Pp. 675-683
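The model-selection idea the abstract relies on, scoring candidate models with AIC instead of cross-validation, can be illustrated with a toy comparison. The candidate models and their log-likelihoods below are invented numbers for illustration, not results from the paper.

```python
import math  # not strictly needed here, but typical alongside log-likelihoods

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fit results for three nested logistic models: log-likelihood
# improves with more parameters, but AIC penalizes the parameter count.
candidates = {
    "intercept only": {"loglik": -69.3, "k": 1},
    "two features":   {"loglik": -41.0, "k": 3},
    "five features":  {"loglik": -39.8, "k": 6},
}

best = min(candidates,
           key=lambda m: aic(candidates[m]["loglik"], candidates[m]["k"]))
print(best)  # AIC prefers "two features": the extra parameters don't pay off
```

Unlike cross-validation, this requires a single fit per candidate, which is the source of the speedup the abstract claims.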

A Random Method for Quantifying Changing Distributions in Data Streams

Haixun Wang; Jian Pei

In applications such as fraud and intrusion detection, it is of great interest to measure the evolving trends in the data. We consider the problem of quantifying changes between two datasets with class labels. Traditionally, changes are often measured by first estimating the probability distributions of the given data, and then computing the distance, for instance, the K-L divergence, between the estimated distributions. However, this approach is computationally infeasible for large, high dimensional datasets. The problem becomes more challenging in the streaming data environment, as the high speed makes it difficult for the learning process to keep up with the concept drifts in the data. To tackle this problem, we propose a method to quantify concept drifts using a universal model that incurs minimal learning cost. In addition, our model also provides the ability of performing classification.

Keywords: Decision Tree; Random Forest; Data Stream; Leaf Node; Training Dataset.

- Short Papers | Pp. 684-691
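The traditional baseline the abstract contrasts against, estimating the two distributions and computing K-L divergence between them, looks roughly like this for discrete class labels. The fraud/normal data is invented for illustration, and the paper's universal-model method is not shown.

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) over a shared discrete support, smoothed to avoid log 0."""
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps))
               for k in keys)

def empirical(labels):
    """Empirical probability distribution of a label sequence."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}

old = ["normal"] * 90 + ["fraud"] * 10   # earlier window of the stream
new = ["normal"] * 60 + ["fraud"] * 40   # later window: fraud rate has risen
drift = kl_divergence(empirical(new), empirical(old))
print(round(drift, 3))  # positive, reflecting the shifted class distribution
```

For a handful of class labels this is cheap; the infeasibility the abstract mentions arises when the distributions must be estimated over large, high-dimensional feature spaces rather than a single label variable.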

Deriving Class Association Rules Based on Levelwise Subspace Clustering

Takashi Washio; Koutarou Nakanishi; Hiroshi Motoda

Most approaches to Class Association Rule (CAR)-based classification have not adequately addressed the classification of instances that include numeric attributes. In this paper, a levelwise subspace clustering method that derives hyper-rectangular clusters is proposed to efficiently provide quantitative, interpretable and accurate CARs.

Keywords: Association Rule; Numeric Attribute; Dense Cluster; Subspace Cluster; Categorical Item.

- Short Papers | Pp. 692-700
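A levelwise, grid-based notion of dense subspace cells (in the spirit of CLIQUE-style subspace clustering, not the paper's specific method) can be sketched as follows; the point data, grid width and density threshold are arbitrary illustrative choices.

```python
from collections import Counter

def dense_cells(points, dims, width, min_count):
    """Grid the chosen dimensions and keep cells holding >= min_count points:
    the k-dimensional 'dense unit' step of a levelwise subspace search."""
    cells = Counter(tuple(int(p[d] // width) for d in dims) for p in points)
    return {cell for cell, n in cells.items() if n >= min_count}

points = [(0.1, 5.0), (0.2, 5.1), (0.3, 5.2), (0.15, 9.0), (3.0, 5.05)]
# In the 1-dimensional subspace of dimension 1, most points share cell [5, 6):
print(dense_cells(points, dims=(1,), width=1.0, min_count=3))
```

Dense cells found in low-dimensional subspaces are then joined levelwise into higher-dimensional hyper-rectangles, which is what makes the clusters directly readable as quantitative rule conditions.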

An Incremental Algorithm for Mining Generators Representation

Lijun Xu; Kanglin Xie

This paper presents an efficient algorithm for maintaining the generators representation in dynamic datasets. The generators representation is a lossless, concise representation of the set of frequent itemsets. Furthermore, the algorithm uses a novel optimization, based on generator borders, for the first time in the literature. Generator borders mark the borderline between frequent generators and other itemsets; new frequent generators can be found by monitoring them. Experiments show that our algorithm is more efficient than previous solutions.

Keywords: Association Rule; Frequent Generator; Frequent Itemsets; Concise Representation; Support Threshold.

- Short Papers | Pp. 701-708
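The generator concept itself is easy to state in code: an itemset is a generator when no proper subset has the same support. Below is a naive check (the `transactions` data is invented for illustration; the paper's incremental, borders-based algorithm is not reproduced).

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t)

def is_generator(itemset, transactions):
    """An itemset is a generator if every immediate proper subset has strictly
    greater support; by anti-monotonicity this covers all smaller subsets."""
    sup = support(itemset, transactions)
    return all(support(sub, transactions) > sup
               for sub in combinations(itemset, len(itemset) - 1))

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(is_generator(("a", "b"), transactions))       # {a} and {b} both more frequent
print(is_generator(("a", "b", "c"), transactions))  # {b, c} has the same support
```

Checking only immediate subsets suffices because support can only grow as items are removed, which is also why a levelwise traversal can prune non-generators early.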

Hybrid Technique for Artificial Neural Network Architecture and Weight Optimization

Cleber Zanchettin; Teresa Bernarda Ludermir

This work presents a technique that integrates the heuristics tabu search, simulated annealing, genetic algorithms and backpropagation. This approach obtained promising results in the simultaneous optimization of the artificial neural network architecture and weights.

Keywords: Genetic Algorithm; Cost Function; Simulated Annealing; Tabu Search; Current Solution.

- Short Papers | Pp. 709-716