Publications catalog - books



Machine Learning: ECML 2007: 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok ; Jacek Koronacki ; Ramon Lopez de Mantaras ; Stan Matwin ; Dunja Mladenić ; Andrzej Skowron (eds.)

In conference: 18th European Conference on Machine Learning (ECML). Warsaw, Poland. September 17-21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Algorithm Analysis and Problem Complexity; Mathematical Logic and Formal Languages; Database Management

Availability

Detected institution: Not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74957-8

Electronic ISBN

978-3-540-74958-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Ensembles of Multi-Objective Decision Trees

Dragi Kocev; Celine Vens; Jan Struyf; Sašo Džeroski

Ensemble methods are able to improve the predictive performance of many base classifiers. Until now, they have been applied to classifiers that predict a single target attribute. Given the non-trivial interactions that may occur among the different targets in multi-objective prediction tasks, it is unclear whether ensemble methods also improve performance in this setting. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to multi-objective decision trees (MODTs), which are decision trees that predict multiple target attributes at once. We empirically investigate the performance of ensembles of MODTs. Our most important conclusions are: (1) ensembles of MODTs yield better predictive performance than single MODTs, and (2) ensembles of MODTs are as good as, or better than, ensembles of single-objective decision trees, i.e., a set of ensembles, one for each target. Moreover, ensembles of MODTs have smaller model size and are faster to learn than ensembles of single-objective decision trees.

- Short Papers | Pp. 624-631
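
As a rough illustration of the paper's setting, here is a minimal sketch of bagging multi-target trees, using scikit-learn's multi-output DecisionTreeRegressor as a stand-in for the predictive clustering trees the authors build on; the function names and toy data are hypothetical.

```python
# A sketch of bagging multi-objective decision trees (MODTs): each tree
# predicts all targets at once, and the ensemble averages their outputs.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_modts(X, Y, n_trees=25, seed=None):
    """Train multi-target trees on bootstrap resamples of (X, Y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        trees.append(DecisionTreeRegressor().fit(X[idx], Y[idx]))
    return trees

def predict(trees, X):
    """Average the multi-target predictions of all trees."""
    return np.mean([t.predict(X) for t in trees], axis=0)

# Toy usage: two related targets predicted jointly by a single ensemble.
X = np.random.rand(200, 5)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]])
ensemble = bag_modts(X, Y, seed=0)
print(predict(ensemble, X[:3]))  # shape (3, 2): one column per target
```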

Kernel-Based Grouping of Histogram Data

Tilman Lange; Joachim M. Buhmann

Organizing objects into groups based on their co-occurrence with a second, relevance variable has been widely studied, with the Information Bottleneck (IB) as one of the most prominent representatives. We present a kernel-based approach to pairwise clustering of discrete histograms using the Jensen-Shannon (JS) divergence, which can be seen as a test. This yields a cost criterion with a solid information-theoretic justification, which can be approximated in polynomial time with arbitrary precision. In addition, a relation to optimal hard clustering IB solutions can be established. To our knowledge, we are the first to devise algorithms for the IB with provable approximation guarantees. In practice, one obtains convincing results in the context of image segmentation using fast optimization heuristics.

- Short Papers | Pp. 632-639
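
For reference, a small sketch of the Jensen-Shannon divergence between two discrete histograms, the quantity the abstract builds its clustering cost on; the kernel construction and the approximation scheme from the paper are not reproduced here.

```python
# Jensen-Shannon divergence between two discrete histograms:
# JS(p, q) = H(m) - (H(p) + H(q)) / 2, with m = (p + q) / 2.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    p = np.asarray(p, float); p = p / p.sum()   # normalize counts
    q = np.asarray(q, float); q = q / q.sum()
    m = 0.5 * (p + q)
    def H(x):                                   # Shannon entropy in nats
        x = np.clip(x, eps, None)
        return -np.sum(x * np.log(x))
    return H(m) - 0.5 * (H(p) + H(q))

p, q = [10, 5, 1, 0], [2, 6, 8, 4]
print(js_divergence(p, q))  # symmetric and bounded by log 2
print(js_divergence(q, p))  # same value
```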

Active Class Selection

R. Lomasky; C. E. Brodley; M. Aernecke; D. Walt; M. Friedl

This paper presents active class selection (ACS), a new class of problems for multi-class supervised learning. If one can control the classes from which training data are generated, utilizing feedback during learning to guide the generation of new training data will yield better performance than learning from any fixed class distribution. ACS is the process of iteratively selecting class proportions for data generation. In this paper we present several methods for ACS. In an empirical evaluation, we show that for a fixed number of training instances, methods based on increasing class stability outperform methods that seek to maximize class accuracy or that use random sampling. Finally, we present results of a deployed system for our motivating application: training an artificial nose to discriminate vapors.

- Short Papers | Pp. 640-647
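
A minimal sketch of an ACS-style loop, assuming we can request new labeled instances from any chosen class. The stability heuristic below, sampling more from classes whose predictions changed between rounds, is a simplified reading of the paper's idea rather than its exact method, and the generator and names are hypothetical.

```python
# Hypothetical ACS loop: each round, allocate the labeling budget across
# classes in proportion to how unstable their predictions were.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
CENTERS = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])  # toy generator

def draw_from_class(c, n):
    """Stand-in for the controllable data generator ACS assumes."""
    return CENTERS[c] + rng.normal(size=(n, 2))

n_classes, budget = 3, 30
X = np.vstack([draw_from_class(c, 10) for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), 10)
probe = np.vstack([draw_from_class(c, 50) for c in range(n_classes)])
probe_y = np.repeat(np.arange(n_classes), 50)

prev = None
for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    pred = clf.predict(probe)
    if prev is None:
        props = np.full(n_classes, 1.0 / n_classes)   # first round: uniform
    else:
        flips = pred != prev                          # label changes per probe point
        inst = np.array([flips[probe_y == c].mean() for c in range(n_classes)])
        props = (inst + 1e-3) / (inst + 1e-3).sum()   # favor unstable classes
    for c, n in enumerate(rng.multinomial(budget, props)):
        if n:
            X = np.vstack([X, draw_from_class(c, n)])
            y = np.concatenate([y, np.full(n, c)])
    prev = pred

print("training-set class counts after ACS:", np.bincount(y))
```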

Sequence Labeling with Reinforcement Learning and Ranking Algorithms

Francis Maes; Ludovic Denoyer; Patrick Gallinari

Many problems in areas such as Natural Language Processing, Information Retrieval, or Bioinformatics involve the generic task of sequence labeling. In many cases, the aim is to assign a label to each element in a sequence. Until now, this problem has mainly been addressed with Markov models and Dynamic Programming.

We propose a new approach where the sequence labeling task is seen as a sequential decision process. This method is shown to be very fast with good generalization accuracy. Instead of searching for a globally optimal label sequence, we learn to construct this optimal sequence directly in a greedy fashion. First, we show that sequence labeling can be modelled using Markov Decision Processes, so that several Reinforcement Learning (RL) algorithms can be used for this task. Second, we introduce a new RL algorithm which is based on the ranking of local labeling decisions.

- Short Papers | Pp. 648-657
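
A schematic sketch of labeling as a greedy sequential decision process, in the spirit of the abstract: labels are assigned left to right, each decision conditioned on the observation and the labels chosen so far. The tiny feature map, tag set, and perceptron-style updates are illustrative stand-ins for the paper's RL and ranking-based training.

```python
# Greedy sequence labeling as a sequential decision process.
LABELS = ["O", "B", "I"]  # hypothetical tag set

def features(tokens, i, prev_label):
    """Tiny feature map: current token identity plus the previous decision."""
    return {f"tok={tokens[i]}": 1.0, f"prev={prev_label}": 1.0}

def score(w, feats, label):
    return sum(w.get((f, label), 0.0) * v for f, v in feats.items())

def greedy_label(w, tokens):
    """Construct the label sequence one greedy decision at a time."""
    labels, prev = [], "<s>"
    for i in range(len(tokens)):
        feats = features(tokens, i, prev)
        prev = max(LABELS, key=lambda l: score(w, feats, l))
        labels.append(prev)
    return labels

def train(data, epochs=5):
    """Perceptron-style updates on each local decision (a stand-in for the
    paper's RL and ranking-based updates)."""
    w = {}
    for _ in range(epochs):
        for tokens, gold in data:
            prev = "<s>"
            for i, g in enumerate(gold):
                feats = features(tokens, i, prev)
                pred = max(LABELS, key=lambda l: score(w, feats, l))
                if pred != g:  # reward the gold label, punish the mistake
                    for f, v in feats.items():
                        w[(f, g)] = w.get((f, g), 0.0) + v
                        w[(f, pred)] = w.get((f, pred), 0.0) - v
                prev = g       # condition on the gold prefix during training
    return w

data = [(["the", "big", "dog"], ["O", "B", "I"]), (["a", "cat"], ["O", "B"])]
w = train(data)
print(greedy_label(w, ["the", "dog"]))
```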

Efficient Pairwise Classification

Sang-Hyeun Park; Johannes Fürnkranz

Pairwise classification is a class binarization procedure that converts a multi-class problem into a series of two-class problems, one problem for each pair of classes. While it can be shown that for training, this procedure is more efficient than the more commonly used one-against-all approach, it still has to evaluate a quadratic number of classifiers when computing the predicted class for a given example. In this paper, we propose a method that allows faster computation of the predicted class when weighted or unweighted voting is used for combining the predictions of the individual classifiers. While its worst-case complexity is still quadratic in the number of classes, we show that even in the case of completely random base classifiers, our method still outperforms the conventional pairwise classifier. For the more practical case of well-trained base classifiers, its asymptotic computational complexity seems to be almost linear.

- Short Papers | Pp. 658-665
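
A minimal sketch of the one-vs-one setting the paper speeds up: k classes yield k(k-1)/2 binary classifiers, and prediction tallies one vote per pair. The paper's contribution, ordering evaluations so that most classifiers never need to be called, is only hinted at in a comment; all names here are illustrative.

```python
# One-vs-one (pairwise) classification with unweighted voting.
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pairwise(X, y):
    """One binary classifier per unordered pair of classes: k(k-1)/2 models."""
    classes = np.unique(y)
    models = {}
    for a, b in itertools.combinations(classes, 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return classes, models

def predict_voting(classes, models, x):
    votes = {c: 0 for c in classes}
    for m in models.values():
        # An efficient scheme (as in the paper) would stop once no class can
        # overtake the current leader; this naive version evaluates all pairs.
        votes[m.predict(x.reshape(1, -1))[0]] += 1
    return max(votes, key=votes.get)

X = np.random.rand(150, 4)
y = np.random.randint(0, 5, size=150)   # 5 classes -> 10 pairwise models
classes, models = train_pairwise(X, y)
print(predict_voting(classes, models, X[0]))
```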

Scale-Space Based Weak Regressors for Boosting

Jin-Hyeong Park; Chandan K. Reddy

Boosting is a simple yet powerful modeling technique that is used in many machine learning and data mining applications. In this paper, we propose a novel scale-space based boosting framework which applies scale-space theory to choose the optimal regressors during the iterations of the boosting algorithm. In other words, the data is considered at a different resolution in each iteration of the boosting algorithm. Our framework chooses the weak regressor that best fits the current resolution and, as the iterations progress, the resolution of the data is increased. The amount of increase in resolution follows from wavelet decomposition methods. For regression modeling, we use logitboost update equations based on the first derivative of the loss function. We demonstrate the advantages of using this scale-space based framework for regression problems and show results on several real-world regression datasets.

- Short Papers | Pp. 666-673
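
A hedged sketch of the coarse-to-fine idea: weak regressors fitted to residuals at progressively finer scales. The kernel smoother and the halving-bandwidth schedule are illustrative stand-ins for the paper's wavelet-based scale-space construction, and plain residual fitting replaces its logitboost-style updates.

```python
# Coarse-to-fine boosting: fit smoothed residuals, then halve the bandwidth.
import numpy as np

def fit_smoother(x, r, bandwidth):
    """Nadaraya-Watson kernel smoother of residuals r at one scale."""
    def f(q):
        w = np.exp(-0.5 * ((q[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ r) / w.sum(axis=1)
    return f

def scale_space_boost(x, y, rounds=6, lr=0.5, bw=1.0):
    models, resid = [], y.astype(float).copy()
    for _ in range(rounds):
        f = fit_smoother(x, resid.copy(), bw)
        models.append(f)
        resid -= lr * f(x)   # plain residual fitting, not logitboost
        bw /= 2.0            # move to a finer scale each round
    return lambda q: lr * sum(f(q) for f in models)

x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.3 * np.sin(20 * np.pi * x)  # coarse + fine structure
model = scale_space_boost(x, y)
print(np.mean((model(x) - y) ** 2))  # training MSE after all scales
```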

k-Means with Large and Noisy Constraint Sets

Dan Pelleg; Dorit Baras

We focus on the problem of clustering with soft instance-level constraints. Recently, the CVQE algorithm was proposed in this context. It modifies the objective function of traditional k-means to include penalties for violated constraints. CVQE was shown to efficiently produce high-quality clusterings of UCI data. In this work, we examine the properties of CVQE and propose a modification that results in a more intuitive objective function, with lower computational complexity. We present our extensive experimentation, which provides insight into CVQE and shows that our new variant can dramatically improve clustering quality while reducing run time. We show its superiority in a large-scale surveillance scenario with noisy constraints.

- Short Papers | Pp. 674-682
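
A simplified sketch of soft-constrained k-means in the spirit of CVQE: the assignment step minimizes squared distance plus a penalty for each violated must-link or cannot-link constraint. The fixed per-violation cost is an illustrative simplification of CVQE's distance-based penalties, and the names are hypothetical.

```python
# Soft-constrained k-means: assignment cost = squared distance + violation penalty.
import numpy as np

def constrained_kmeans(X, k, must, cannot, penalty=5.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        for i, x in enumerate(X):
            cost = ((centers - x) ** 2).sum(axis=1)
            for a, b in must:
                if i in (a, b):           # pay to differ from must-link partner
                    j = b if i == a else a
                    cost += penalty * (np.arange(k) != labels[j])
            for a, b in cannot:
                if i in (a, b):           # pay to match cannot-link partner
                    j = b if i == a else a
                    cost += penalty * (np.arange(k) == labels[j])
            labels[i] = cost.argmin()
        for c in range(k):                # standard k-means center update
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4.0])
labels, _ = constrained_kmeans(X, 2, must=[(0, 1)], cannot=[(0, 59)])
print(np.bincount(labels))
```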

Towards ‘Interactive’ Active Learning in Multi-view Feature Sets for Information Extraction

Katharina Probst; Rayid Ghani

Research in multi-view active learning has typically focused on algorithms for selecting the next example to label. This is often at the cost of lengthy wait-times for the user between each query iteration. We deal with a real-world information extraction task, extracting attribute-value pairs from product descriptions, where the learning system needs to be interactive and the user’s time needs to be used efficiently. The first step uses coEM with naive Bayes as the semi-supervised algorithm. This paper focuses on the second step which is an interactive active learning phase. We present an approximation to coEM with naive Bayes that can incorporate user feedback almost instantly and can use any sample-selection strategy for active learning. Our experimental results show high levels of accuracy while being orders of magnitude faster than using the standard coEM with naive Bayes, making our IE system practical by optimizing user time.

- Short Papers | Pp. 683-690
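
A minimal sketch of the interactive loop the paper targets: a naive Bayes model picks its least-confident unlabeled example, the "user" labels it, and the model is refit. Plain single-view uncertainty sampling stands in for the paper's fast approximation to coEM over two feature views; the data and names are toy assumptions.

```python
# Interactive uncertainty sampling with naive Bayes (single-view stand-in
# for the paper's approximate coEM over two views).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 20))                     # toy count features
true_y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)

labeled = list(np.where(true_y == 0)[0][:5]) + list(np.where(true_y == 1)[0][:5])
unlabeled = [i for i in range(100) if i not in labeled]

for _ in range(15):
    clf = MultinomialNB().fit(X[labeled], true_y[labeled])
    probs = clf.predict_proba(X[unlabeled])
    pick = int(np.argmin(probs.max(axis=1)))               # least-confident example
    labeled.append(unlabeled.pop(pick))                    # "user" supplies its label

print("accuracy on the remaining pool:", clf.score(X[unlabeled], true_y[unlabeled]))
```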

Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Tapani Raiko; Alexander Ilin; Juha Karhunen

Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to problems with high dimensionality. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. In the case of very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.

- Short Papers | Pp. 691-698
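
For contrast with the paper's approach, a common baseline sketch: PCA with missing values via iterative rank-r SVD imputation. This is not the authors' gradient-based subspace rule or its variational Bayesian extension, just a simple reference method on synthetic low-rank data.

```python
# PCA with missing values via iterative rank-r SVD imputation (a baseline,
# not the paper's method).
import numpy as np

def pca_impute(X, rank=2, iters=50):
    X = X.copy()
    mask = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    X[mask] = np.take(col_mean, np.where(mask)[1])  # init with column means
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[mask] = approx[mask]                      # refresh only missing cells
    return X

rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))  # exactly rank 2
X = truth.copy()
X[rng.random(X.shape) < 0.6] = np.nan                       # 60% missing
filled = pca_impute(X, rank=2)
print("RMSE vs. ground truth:", np.sqrt(np.mean((filled - truth) ** 2)))
```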

Transfer Learning in Reinforcement Learning Problems Through Partial Policy Recycling

Jan Ramon; Kurt Driessens; Tom Croonenborghs

We investigate the relation between transfer learning in reinforcement learning with function approximation and supervised learning with concept drift. We present a new incremental relational regression tree algorithm that is capable of dealing with concept drift through tree restructuring and show that it enables a Q-learner to transfer knowledge from one task to another by recycling those parts of the generalized Q-function that still hold interesting information for the new task. We illustrate the performance of the algorithm in several experiments.

- Short Papers | Pp. 699-707
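
A schematic sketch of the transfer idea on a toy task: a Q-function learned on a source task is recycled as the starting point for a related target task. Tabular Q-learning on a chain stands in for the paper's relational Q-learner and incremental regression trees; the environment and all names are hypothetical.

```python
# Transfer by recycling a Q-function: warm-start the target task with the
# source task's Q-values.
import numpy as np

def q_learning(rewards, Q=None, episodes=200, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions left/right."""
    rng = np.random.default_rng(seed)
    n = len(rewards)
    Q = np.zeros((n, 2)) if Q is None else Q.copy()  # recycle source Q if given
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            s2 = max(0, min(n - 1, s + (1 if a else -1)))
            Q[s, a] += alpha * (rewards[s2] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

source = np.zeros(10); source[-1] = 1.0     # goal at the right end
target = np.zeros(10); target[-1] = 2.0     # related task: same structure, new payoff
Q_src = q_learning(source)
Q_warm = q_learning(target, Q=Q_src, episodes=50)   # transfer: fewer episodes needed
Q_cold = q_learning(target, episodes=50)            # learn from scratch
print(Q_warm[0].argmax(), Q_cold[0].argmax())       # 1 = "right", toward the goal
```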