Publications catalogue – books



Machine Learning: ECML 2007: 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok ; Jacek Koronacki ; Ramon Lopez de Mantaras ; Stan Matwin ; Dunja Mladenić ; Andrzej Skowron (eds.)

Conference: 18th European Conference on Machine Learning (ECML). Warsaw, Poland. September 17-21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Algorithm Analysis and Problem Complexity; Mathematical Logic and Formal Languages; Database Management

Availability

Institution detected: not detected · Year of publication: 2007 · Available online via SpringerLink

Información

Resource type:

books

Print ISBN

978-3-540-74957-8

Electronic ISBN

978-3-540-74958-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Additive Groves of Regression Trees

Daria Sorokina; Rich Caruana; Mirek Riedewald

We present a new regression algorithm called Groves of trees and show empirically that it is superior in performance to a number of other established regression methods. A Grove is an additive model usually containing a small number of large trees. Trees added to the Grove are trained on the residual error of other trees already in the Grove. We begin the training process with a single small tree in the Grove and gradually increase both the number of trees in the Grove and their size. This procedure ensures that the resulting model captures the additive structure of the response. A single Grove may still overfit to the training set, so we further decrease the variance of the final predictions with bagging. We show that in addition to exhibiting superior performance on a suite of regression test problems, bagged Groves of trees are very resistant to overfitting.
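
For concreteness, a simplified stagewise sketch of the additive idea described above (not the authors' full procedure, which cycles through the grove and gradually grows both the number and the size of the trees) might look as follows; the tree depth, grove size, bag count and use of scikit-learn are assumptions made for illustration.

```python
# A simplified stagewise sketch of the additive idea, assuming NumPy arrays
# X, y and scikit-learn's DecisionTreeRegressor; tree depth, grove size and
# bag count are illustrative choices, not the authors' settings.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_grove(X, y, n_trees=5, max_depth=6):
    """Fit each tree to the residual left by the trees already in the grove."""
    trees, residual = [], y.astype(float).copy()
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        residual -= tree.predict(X)          # the next tree models what is left
        trees.append(tree)
    return trees

def predict_grove(trees, X):
    return np.sum([t.predict(X) for t in trees], axis=0)

def bagged_groves(X, y, n_bags=10, **grove_kwargs):
    """Average several groves fit on bootstrap samples to reduce variance."""
    rng = np.random.default_rng(0)
    groves = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
        groves.append(fit_grove(X[idx], y[idx], **grove_kwargs))
    return groves

def predict_bagged(groves, X):
    return np.mean([predict_grove(g, X) for g in groves], axis=0)
```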

- Long Papers | Pp. 323-334

Efficient Computation of Recursive Principal Component Analysis for Structured Input

Alessandro Sperduti

Recently, a successful extension of Principal Component Analysis for structured input, such as sequences, trees, and graphs, has been proposed. This allows the embedding of discrete structures into vectorial spaces, where all the classical pattern recognition and machine learning methods can be applied. The proposed approach is based on eigenanalysis of extended vectorial representations of the input structures and substructures. One problem with the approach is that eigenanalysis can be computationally quite demanding when considering large datasets of structured objects. In this paper we propose a general approach for reducing the computational burden. Experimental results show a significant speed-up of the computation.

- Long Papers | Pp. 335-346

Hinge Rank Loss and the Area Under the ROC Curve

Harald Steck

In ranking as well as in classification problems, the Area under the ROC Curve (AUC), or the equivalent Wilcoxon-Mann-Whitney statistic, has recently attracted a lot of attention. We show that the AUC can be lower bounded based on the hinge-rank-loss, which simply is the rank version of the standard (parametric) hinge loss. This bound is asymptotically tight. Our experiments indicate that optimizing the (standard) hinge loss typically is an accurate approximation to optimizing the hinge rank loss, especially when using affine transformations of the data, as e.g. in ellipsoidal machines. This explains for the first time why standard training of support vector machines approximately maximizes the AUC, which has indeed been observed in many experiments in the literature.
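
As a reference point for the rank view of the AUC mentioned above, the sketch below computes the AUC from the Wilcoxon-Mann-Whitney statistic; the hinge-rank-loss bound itself is not reproduced, and the SciPy-based implementation is an assumption of the example.

```python
# A small sketch of the rank view of the AUC: the Wilcoxon-Mann-Whitney
# statistic computed from the ranks of the positive examples. scores and
# labels are NumPy arrays, labels in {+1, -1}.
import numpy as np
from scipy.stats import rankdata

def auc_from_ranks(scores, labels):
    ranks = rankdata(scores)                  # ranks 1..n, ties averaged
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    # sum of positive ranks, shifted by its minimum possible value, normalised
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```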

- Long Papers | Pp. 347-358

Clustering Trees with Instance Level Constraints

Jan Struyf; Sašo Džeroski

Constrained clustering investigates how to incorporate domain knowledge in the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance level constraints, such as must-link and cannot-link. This type of constraint has been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative clustering. This paper shows how clustering trees can support instance level constraints. Clustering trees are decision trees that partition the instances into homogeneous clusters. Clustering trees provide a symbolic description for each cluster. To handle non-trivial constraint sets, we extend clustering trees to support disjunctive descriptions. The paper’s main contribution is ClusILC, an efficient algorithm for building such trees. We present experiments comparing ClusILC to COP-k-means.
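
To make the constraint types concrete, a minimal sketch (covering only the must-link/cannot-link bookkeeping, not the ClusILC tree construction) could look like this; the data structures are illustrative assumptions.

```python
# A minimal sketch of the instance-level constraints only (not of ClusILC):
# count how many must-link / cannot-link constraints a candidate cluster
# assignment violates.
def constraint_violations(assignment, must_link, cannot_link):
    """assignment: dict mapping instance id -> cluster id;
    must_link, cannot_link: iterables of (instance id, instance id) pairs."""
    violated = 0
    for a, b in must_link:
        if assignment[a] != assignment[b]:    # should share a cluster
            violated += 1
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:    # should be in different clusters
            violated += 1
    return violated
```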

- Long Papers | Pp. 359-370

On Pairwise Naive Bayes Classifiers

Jan-Nikolas Sulzmann; Johannes Fürnkranz; Eyke Hüllermeier

Class binarizations are effective methods for improving weak learners by decomposing multi-class problems into several two-class problems. This paper analyzes how these methods can be applied to a Naive Bayes learner. The key result is that the pairwise variant of Naive Bayes is equivalent to a regular Naive Bayes. This result holds for several aggregation techniques for combining the predictions of the individual classifiers, including the commonly used voting and weighted voting techniques. On the other hand, Naive Bayes with one-against-all binarization is not equivalent to a regular Naive Bayes. Apart from the theoretical results themselves, the paper offers a discussion of their implications.
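
The pairwise decomposition analyzed above can be sketched as follows, assuming a Gaussian Naive Bayes base learner and plain voting as the aggregation; the paper's point is that such a construction is equivalent to a single Naive Bayes model, so the code is only meant to make the setup concrete.

```python
# A sketch of the pairwise (one-vs-one) decomposition with Gaussian Naive
# Bayes as the base learner and plain voting as the aggregation; X, y are
# NumPy arrays and the base learner choice is an assumption of the example.
from itertools import combinations
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fit_pairwise_nb(X, y):
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)            # restrict to the two classes
        models[(a, b)] = GaussianNB().fit(X[mask], y[mask])
    return models

def predict_pairwise_nb(models, X, classes):
    votes = np.zeros((len(X), len(classes)))
    index = {c: i for i, c in enumerate(classes)}
    for model in models.values():
        for row, pred in enumerate(model.predict(X)):
            votes[row, index[pred]] += 1      # each pairwise model casts a vote
    return np.asarray(classes)[votes.argmax(axis=1)]
```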

- Long Papers | Pp. 371-381

Separating Precision and Mean in Dirichlet-Enhanced High-Order Markov Models

Rikiya Takahashi

Robustly estimating the state-transition probabilities of high-order Markov processes is an essential task in many applications such as natural language modeling or protein sequence modeling. We propose a novel estimation algorithm called Hierarchical Separated Dirichlet Smoothing (HSDS), where Dirichlet distributions are hierarchically assumed to be the prior distributions of the state-transition probabilities. The key idea in HSDS is to separate the parameters of a Dirichlet distribution into the precision and mean, so that the precision depends on the context while the mean is given by the lower-order distribution. HSDS is designed to outperform Kneser-Ney smoothing especially when the number of states is small, where Kneser-Ney smoothing is currently known as the state-of-the-art technique for n-gram natural language models. Our experiments in protein sequence modeling showed the superiority of HSDS in both perplexity evaluation and classification tasks.
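
The separation of precision and mean can be written down in a short, hedged form: with a context-dependent precision and the lower-order distribution as the mean, the usual Dirichlet-smoothed estimate is the one below; how HSDS actually estimates the precision hierarchically is not shown.

```python
# A hedged sketch of the smoothing form implied by the abstract: a Dirichlet
# prior whose precision alpha_h depends on the context h and whose mean is
# the lower-order distribution. How HSDS fits alpha_h is not shown here.
def smoothed_prob(count_hs, count_h, alpha_h, lower_order_prob):
    """P(s | h) = (c(h, s) + alpha_h * P(s | shorter context)) / (c(h) + alpha_h)"""
    return (count_hs + alpha_h * lower_order_prob) / (count_h + alpha_h)
```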

- Long Papers | Pp. 382-393

Safe Q-Learning on Complete History Spaces

Stephan Timmer; Martin Riedmiller

In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history-based approach is able to learn policies comparable to the optimal solution on belief states. The algorithm presented is model-free and can be combined with any method for learning history spaces. We also present a procedure able to learn history spaces especially suited for our Q-learning algorithm.
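
As a rough illustration (not the paper's algorithm or its safety analysis), tabular Q-learning over a history space can be sketched as below, with the table indexed by the last k observation/action pairs; the environment interface and all learning parameters are assumptions of the example.

```python
# A rough sketch of tabular Q-learning indexed by the last k observation/
# action pairs rather than a latent state. The environment interface
# (reset/step), k, and the learning parameters are assumptions of the
# example; the paper's safety conditions are not reproduced.
from collections import defaultdict, deque
import random

def q_learning_on_histories(env, actions, k=3, episodes=500,
                            alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                        # Q[(history, action)]
    for _ in range(episodes):
        obs = env.reset()                         # assumed: returns an observation
        history = deque([(obs, None)], maxlen=k)  # observations must be hashable
        done = False
        while not done:
            h = tuple(history)
            a = (random.choice(actions) if random.random() < epsilon
                 else max(actions, key=lambda x: Q[(h, x)]))
            obs, reward, done = env.step(a)       # assumed environment API
            history.append((obs, a))
            h_next = tuple(history)
            target = reward if done else reward + gamma * max(
                Q[(h_next, x)] for x in actions)
            Q[(h, a)] += alpha * (target - Q[(h, a)])
    return Q
```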

- Long Papers | Pp. 394-405

Random k-Labelsets: An Ensemble Method for Multilabel Classification

Grigorios Tsoumakas; Ioannis Vlahavas

This paper proposes an ensemble method for multilabel classification. The RAndom k-labELsets (RAkEL) algorithm constructs each member of the ensemble by considering a small random subset of labels and learning a single-label classifier for the prediction of each element in the powerset of this subset. In this way, the proposed algorithm aims to take into account label correlations using single-label classifiers that are applied on subtasks with a manageable number of labels and an adequate number of examples per label. Experimental results on common multilabel domains involving protein, document and scene classification show that better performance can be achieved compared to popular multilabel classification approaches.
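
A hedged sketch of the labelset construction might look as follows: each ensemble member is trained on a random subset of k labels, with the label combinations on that subset encoded as single label-powerset classes, and per-label votes are averaged and thresholded; the base learner, k, ensemble size and encoding are illustrative assumptions rather than the authors' configuration.

```python
# A hedged sketch of the random k-labelset idea. Y is assumed to be a 0/1
# integer indicator matrix of shape (n_samples, n_labels); the base learner,
# k, m and the string encoding of label combinations are illustrative.
import random
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_rakel(X, Y, k=3, m=10, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(m):
        subset = rng.sample(range(Y.shape[1]), k)
        # encode each label combination on the subset as one powerset class
        powerset_y = ["".join(str(v) for v in row) for row in Y[:, subset]]
        members.append((subset, DecisionTreeClassifier().fit(X, powerset_y)))
    return members

def predict_rakel(members, X, n_labels, threshold=0.5):
    votes = np.zeros((len(X), n_labels))
    counts = np.zeros(n_labels)
    for subset, clf in members:
        decoded = np.array([[int(c) for c in p] for p in clf.predict(X)], float)
        votes[:, subset] += decoded               # per-label votes from this member
        counts[subset] += 1
    counts[counts == 0] = 1                       # labels never drawn get no votes
    return (votes / counts) >= threshold
```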

- Long Papers | Pp. 406-417

Seeing the Forest Through the Trees: Learning a Comprehensible Model from an Ensemble

Anneleen Van Assche; Hendrik Blockeel

Ensemble methods are popular learning methods that usually increase the predictive accuracy of a classifier, though at the cost of interpretability and insight into the decision process. In this paper we aim to overcome this issue of comprehensibility by learning a single decision tree that approximates an ensemble of decision trees. The new model is obtained by exploiting the class distributions predicted by the ensemble. These are employed to compute heuristics for deciding which tests are to be used in the new tree. As such we acquire a model that is able to give insight into the decision process, while being more accurate than the single model directly learned on the data. The proposed method is experimentally evaluated on a large number of UCI data sets, and compared to an existing approach that makes use of artificially generated data.
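
The sketch below is only a crude stand-in for the general idea of approximating an ensemble with one tree: it relabels data with the ensemble's predictions and fits a single shallow tree to them. The paper's own method instead uses the ensemble's class distributions to compute the split heuristics directly, and the baseline it is compared against generates artificial data; neither is reproduced here.

```python
# A crude stand-in only: relabel data with the ensemble's predictions and
# fit one shallow tree to them. The forest, depth and function name are
# assumptions of the example, not the paper's method.
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def distill_to_single_tree(X_train, y_train, X_pool, max_depth=5):
    ensemble = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    pseudo_labels = ensemble.predict(X_pool)          # ensemble acts as the oracle
    return DecisionTreeClassifier(max_depth=max_depth).fit(X_pool, pseudo_labels)
```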

- Long Papers | Pp. 418-429

Avoiding Boosting Overfitting by Removing Confusing Samples

Alexander Vezhnevets; Olga Barinova

Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on other ones. In this paper we show that standard boosting algorithms are not appropriate in the case of overlapping classes. This inadequacy is likely to be the major source of boosting overfitting when working with real-world data. To verify our conclusion we use the fact that any overlapping-classes task can be reduced to a deterministic task with the same Bayesian separating surface. This can be done by removing “confusing samples” – samples that are misclassified by a “perfect” Bayesian classifier. We propose an algorithm for removing confusing samples and experimentally study the behavior of AdaBoost trained on the resulting data sets. Experiments confirm that removing confusing samples helps boosting to reduce the generalization error and to avoid overfitting on both synthetic and real-world data sets. The process of removing confusing samples also provides an accurate error prediction based on the training sets alone.
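
As a heavily hedged illustration of the idea, the sketch below uses out-of-fold predictions of a random forest as a stand-in for the unknown “perfect” Bayesian classifier and drops the samples it misclassifies before training AdaBoost; this proxy is an assumption of the example, not the authors' removal algorithm.

```python
# A heavily hedged sketch: out-of-fold random-forest predictions stand in
# for the "perfect" Bayesian classifier (an illustrative proxy only);
# samples it misclassifies are dropped before AdaBoost is trained.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def train_without_confusing_samples(X, y):
    proxy = cross_val_predict(RandomForestClassifier(), X, y, cv=5)
    keep = proxy == y                         # drop the "confusing" samples
    return AdaBoostClassifier().fit(X[keep], y[keep])
```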

- Long Papers | Pp. 430-441