Publications catalog - books



Knowledge Discovery in Databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings

Alípio Mário Jorge; Luís Torgo; Pavel Brazdil; Rui Camacho; João Gama (eds.)

In conference: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Porto, Portugal. October 3-7, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: not detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29244-9

Electronic ISBN

978-3-540-31665-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

STochFS: A Framework for Combining Feature Selection Outcomes Through a Stochastic Process

Jerffeson Teixeira de Souza; Nathalie Japkowicz; Stan Matwin

The Feature Selection problem involves discovering a subset of features such that a classifier built only with this subset has better predictive accuracy than a classifier built from the entire set of features. Ensemble methods, such as Bagging and Boosting, have been shown to raise classifier performance to remarkable levels but, surprisingly, have not been tried in other parts of the classification process. In this paper, we apply the ensemble approach to feature selection by proposing a systematic way of combining various outcomes of a feature selection algorithm. The proposed framework, named STochFS, has been shown empirically to improve the performance of well-known feature selection algorithms.

Keywords: Feature Selection; Feature Subset; Ensemble Method; Deterministic Algorithm; Feature Selection Algorithm.

- Short Papers | Pp. 667-674
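The combination idea the abstract describes can be sketched generically: run a randomized feature-selection pass many times and keep the features selected most often across runs. The sketch below is a minimal illustration under assumed inputs; the feature names, the score table, and the `stochastic_select`/`combine_runs` helpers are all hypothetical, and this is not the STochFS algorithm itself.

```python
import random
from collections import Counter

def stochastic_select(features, scores, k, rng):
    """One randomized selection pass: draw k features, weighted by score."""
    pool = list(features)
    chosen = []
    for _ in range(k):
        weights = [scores[f] for f in pool]
        f = rng.choices(pool, weights=weights, k=1)[0]  # weighted draw
        chosen.append(f)
        pool.remove(f)  # sample without replacement
    return chosen

def combine_runs(features, scores, k, runs=50, seed=0):
    """Combine many stochastic selection outcomes by vote counting."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        votes.update(stochastic_select(features, scores, k, rng))
    # Keep the k features selected most often across all runs.
    return [f for f, _ in votes.most_common(k)]

features = ["f1", "f2", "f3", "f4", "f5"]
scores = {"f1": 0.9, "f2": 0.1, "f3": 0.8, "f4": 0.05, "f5": 0.7}
print(combine_runs(features, scores, k=3))
```

Voting over repeated stochastic runs is one simple way to stabilize a nondeterministic selector; other combination rules (rank averaging, intersection) fit the same skeleton.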

Speeding Up Logistic Model Tree Induction

Marc Sumner; Eibe Frank; Mark Hall

Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. We address this issue by using the AIC criterion [1] instead of cross-validation to prevent overfitting these models. In addition, a weight trimming heuristic is used which produces a significant speedup. We compare the training time and accuracy of the new induction process with the original one on various datasets and show that the training time often decreases while the classification accuracy diminishes only slightly.

Keywords: Training Time; Training Instance; Linear Logistic Regression; Simple Linear Regression Model; Model Selection Method.

- Short Papers | Pp. 675-683
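The model-selection idea the abstract relies on, scoring candidate models with AIC instead of cross-validation, can be illustrated with a toy comparison. The candidate models and their log-likelihoods below are invented numbers for illustration, not results from the paper.

```python
import math  # not strictly needed here, but typical alongside log-likelihoods

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fit results for three nested logistic models: log-likelihood
# improves with more parameters, but AIC penalizes the parameter count.
candidates = {
    "intercept only": {"loglik": -69.3, "k": 1},
    "two features":   {"loglik": -41.0, "k": 3},
    "five features":  {"loglik": -39.8, "k": 6},
}

best = min(candidates,
           key=lambda m: aic(candidates[m]["loglik"], candidates[m]["k"]))
print(best)  # AIC prefers "two features": the extra parameters don't pay off
```

Unlike cross-validation, this requires a single fit per candidate, which is the source of the speedup the abstract claims.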

A Random Method for Quantifying Changing Distributions in Data Streams

Haixun Wang; Jian Pei

In applications such as fraud and intrusion detection, it is of great interest to measure the evolving trends in the data. We consider the problem of quantifying changes between two datasets with class labels. Traditionally, changes are often measured by first estimating the probability distributions of the given data, and then computing the distance, for instance, the K-L divergence, between the estimated distributions. However, this approach is computationally infeasible for large, high dimensional datasets. The problem becomes more challenging in the streaming data environment, as the high speed makes it difficult for the learning process to keep up with the concept drifts in the data. To tackle this problem, we propose a method to quantify concept drifts using a universal model that incurs minimal learning cost. In addition, our model also provides the ability of performing classification.

Keywords: Decision Tree; Random Forest; Data Stream; Leaf Node; Training Dataset.

- Short Papers | Pp. 684-691
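The traditional baseline the abstract contrasts against, estimating the two distributions and computing K-L divergence between them, looks roughly like this for discrete class labels. The fraud/normal data is invented for illustration, and the paper's universal-model method is not shown.

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) over a shared discrete support, smoothed to avoid log 0."""
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps))
               for k in keys)

def empirical(labels):
    """Empirical probability distribution of a label sequence."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}

old = ["normal"] * 90 + ["fraud"] * 10   # earlier window of the stream
new = ["normal"] * 60 + ["fraud"] * 40   # later window: fraud rate has risen
drift = kl_divergence(empirical(new), empirical(old))
print(round(drift, 3))  # positive, reflecting the shifted class distribution
```

For a handful of class labels this is cheap; the infeasibility the abstract mentions arises when the distributions must be estimated over large, high-dimensional feature spaces rather than a single label variable.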

Deriving Class Association Rules Based on Levelwise Subspace Clustering

Takashi Washio; Koutarou Nakanishi; Hiroshi Motoda

Most approaches to Class Association Rule (CAR)-based classification have not adequately addressed the classification of instances that include numeric attributes. In this paper, a levelwise subspace clustering method that derives hyper-rectangular clusters is proposed to efficiently provide quantitative, interpretable and accurate CARs.

Keywords: Association Rule; Numeric Attribute; Dense Cluster; Subspace Cluster; Categorical Item.

- Short Papers | Pp. 692-700
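A levelwise, grid-based notion of dense subspace cells (in the spirit of CLIQUE-style subspace clustering, not the paper's specific method) can be sketched as follows; the point data, grid width and density threshold are arbitrary illustrative choices.

```python
from collections import Counter

def dense_cells(points, dims, width, min_count):
    """Grid the chosen dimensions and keep cells holding >= min_count points:
    the k-dimensional 'dense unit' step of a levelwise subspace search."""
    cells = Counter(tuple(int(p[d] // width) for d in dims) for p in points)
    return {cell for cell, n in cells.items() if n >= min_count}

points = [(0.1, 5.0), (0.2, 5.1), (0.3, 5.2), (0.15, 9.0), (3.0, 5.05)]
# In the 1-dimensional subspace of dimension 1, most points share cell [5, 6):
print(dense_cells(points, dims=(1,), width=1.0, min_count=3))
```

Dense cells found in low-dimensional subspaces are then joined levelwise into higher-dimensional hyper-rectangles, which is what makes the clusters directly readable as quantitative rule conditions.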

An Incremental Algorithm for Mining Generators Representation

Lijun Xu; Kanglin Xie

This paper presents an efficient algorithm for maintaining the generators representation in dynamic datasets. The generators representation is a lossless, concise representation of the set of frequent itemsets. Furthermore, the algorithm uses a novel optimization, based on generator borders, for the first time in the literature. Generator borders mark the borderline between frequent generators and other itemsets; new frequent generators can be found by monitoring them. Experiments show that our algorithm is more efficient than previous solutions.

Keywords: Association Rule; Frequent Generator; Frequent Itemsets; Concise Representation; Support Threshold.

- Short Papers | Pp. 701-708
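The generator concept itself is easy to state in code: an itemset is a generator when no proper subset has the same support. Below is a naive check (the `transactions` data is invented for illustration; the paper's incremental, borders-based algorithm is not reproduced).

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t)

def is_generator(itemset, transactions):
    """An itemset is a generator if every immediate proper subset has strictly
    greater support; by anti-monotonicity this covers all smaller subsets."""
    sup = support(itemset, transactions)
    return all(support(sub, transactions) > sup
               for sub in combinations(itemset, len(itemset) - 1))

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(is_generator(("a", "b"), transactions))       # {a} and {b} both more frequent
print(is_generator(("a", "b", "c"), transactions))  # {b, c} has the same support
```

Checking only immediate subsets suffices because support can only grow as items are removed, which is also why a levelwise traversal can prune non-generators early.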

Hybrid Technique for Artificial Neural Network Architecture and Weight Optimization

Cleber Zanchettin; Teresa Bernarda Ludermir

This work presents a technique that integrates the heuristics tabu search, simulated annealing, genetic algorithms and backpropagation. This approach obtained promising results in the simultaneous optimization of the artificial neural network architecture and weights.

Keywords: Genetic Algorithm; Cost Function; Simulated Annealing; Tabu Search; Current Solution.

- Short Papers | Pp. 709-716