Publications catalog - books



Knowledge Discovery in Databases: PKDD 2007: 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok ; Jacek Koronacki ; Ramon Lopez de Mantaras ; Stan Matwin ; Dunja Mladenič ; Andrzej Skowron (eds.)

Conference: 11th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). Warsaw, Poland. September 17, 2007 - September 21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Year of publication Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74975-2

Electronic ISBN

978-3-540-74976-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Bridged Refinement for Transfer Learning

Dikan Xing; Wenyuan Dai; Gui-Rong Xue; Yong Yu

Traditional machine learning usually assumes that the training and test data are governed by the same distribution. This assumption might be violated when the training and test data come from different time periods or domains. In such situations, traditional machine learning methods that are unaware of the distribution shift may fail. This paper proposes a novel algorithm, namely bridged refinement, to take the shift into consideration. The algorithm corrects the labels predicted by a shift-unaware classifier towards a target distribution and takes the mixture distribution of the training and test data as a bridge to better transfer from the training data to the test data. In the experiments, our algorithm successfully refines the classification labels predicted by three state-of-the-art algorithms (the Support Vector Machine, the naïve Bayes classifier and the Transductive Support Vector Machine) on eleven data sets. The relative reduction of error rates is about 50% on average.

- Long Papers | Pp. 324-335

A Prediction-Based Visual Approach for Cluster Exploration and Cluster Validation by HOV

Ke-Bing Zhang; Mehmet A. Orgun; Kang Zhang

Predictive knowledge discovery is an important knowledge acquisition method. It is also used in the clustering process of data mining. Visualization is very helpful for high-dimensional data analysis, but it is not precise, which limits its usability in quantitative cluster analysis. In this paper, we adopt a visual technique called HOV to explore and verify clustering results with quantified measurements. With the quantified contrast between grouped data distributions produced by HOV, users can detect clusters and verify their validity efficiently.

- Long Papers | Pp. 336-349

Flexible Grid-Based Clustering

Marc-Ismaël Akodjènou-Jeannin; Kavé Salamatian; Patrick Gallinari

Grid-based clustering is particularly appropriate for dealing with massive datasets. The principle is to first summarize the dataset with a grid representation, and then to merge grid cells in order to obtain clusters. All previous methods use grids with hyper-rectangular cells. In this paper we propose a flexible grid built from arbitrarily shaped polyhedra for the data summary. For the clustering step, a graph is then extracted from this representation. Its edges are weighted by combining density and spatial information. The clusters are identified as the main connected components of this graph. We present experiments indicating that our grid often leads to better results than an adaptive rectangular grid method.

- Short Papers | Pp. 350-357
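
The two-step principle described in the abstract above (summarize the data on a grid, then merge cells into clusters) can be sketched in Python as follows. This is a minimal illustration of the conventional hyper-rectangular baseline, not the flexible polyhedral grid the paper proposes. The function name `grid_cluster`, the parameters `cell_size` and `min_points`, and the unweighted adjacency rule are all assumptions made for this example.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_points):
    """Cluster points by merging adjacent dense cells of a regular grid.

    Hypothetical helper: a plain rectangular-grid baseline, not the
    polyhedral grid or the weighted graph described in the paper.
    """
    # Step 1: summarize the dataset as point lists per hyper-rectangular cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # Keep only cells that hold enough points to count as dense.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    # Step 2: merge dense cells that touch (including diagonally) by finding
    # connected components with a breadth-first search.
    labels = {}
    n_clusters = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = n_clusters
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = n_clusters
                    queue.append(neighbor)
        n_clusters += 1

    # Points that fall in sparse cells are labeled -1 (treated as noise).
    result = [-1] * len(points)
    for key, members in cells.items():
        for idx in members:
            result[idx] = labels.get(key, -1)
    return result


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
cluster_labels = grid_cluster(points, cell_size=1.0, min_points=2)
```

In this toy run the first four points share two adjacent dense cells and form one cluster, the next two form a second cluster, and the isolated last point is left unclustered.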

Polyp Detection in Endoscopic Video Using SVMs

Luís A. Alexandre; João Casteleiro; Nuno Nobre

Colon cancer is one of the most common cancers in developed countries. Most of these cancers start with a polyp. Polyps are easily detected by physicians. Our goal is to mimic this detection ability so that endoscopic videos can be pre-scanned with our algorithm before the physician analyses them. The method will indicate which parts of the video need attention (because polyps were detected there) and hence can speed up the procedure. In this paper we present a method for polyp detection in endoscopic images that uses an SVM for classification. Our experiments yielded an area under the Receiver Operating Characteristic (ROC) curve of 93.16 ± 0.09% on a database of 4620 images, indicating that the proposed approach is well suited to the detection of polyps in endoscopic video.

- Short Papers | Pp. 358-365

A Density-Biased Sampling Technique to Improve Cluster Representativeness

Ana Paula Appel; Adriano Arantes Paterlini; Elaine P. M. de Sousa; Agma J. M. Traina; Caetano Traina

The volume and complexity of data collected by modern applications have grown significantly, leading to increasingly costly operations for both data manipulation and analysis. Sampling is a useful technique for reducing the data to a more manageable volume. Uniform sampling has been widely used but, in datasets exhibiting skewed cluster distribution, biased sampling shows better results. This paper presents the BBS algorithm, which aims at keeping the skewed tendency of the clusters from the original data. We also present experimental results obtained with the proposed BBS algorithm.

- Short Papers | Pp. 366-373

Expectation Propagation for Rating Players in Sports Competitions

Adriana Birlutiu; Tom Heskes

Rating players in sports competitions based on game results is one example of paired comparison data analysis. Since an exact Bayesian treatment is intractable, several techniques for approximate inference have been proposed in the literature. In this paper we compare several variants of expectation propagation (EP). EP generalizes assumed density filtering (ADF) by iteratively improving the approximations that are made in the filtering step of ADF. Furthermore, we distinguish between two variants of EP: EP-Correlated, which takes into account the correlations between the strengths of the players and EP-Independent, which ignores those correlations. We evaluate the different approaches on a large tennis dataset to find that EP does significantly better than ADF (iterative improvement indeed helps) and EP-Correlated does significantly better than EP-Independent (correlations do matter).

- Short Papers | Pp. 374-381

Efficient Closed Pattern Mining in Strongly Accessible Set Systems (Extended Abstract)

Mario Boley; Tamás Horváth; Axel Poigné; Stefan Wrobel

Many problems in data mining can be viewed as a special case of the problem of enumerating the closed elements of an independence system with respect to some specific closure operator. Motivated by real-world applications, e.g., in track mining, we consider a generalization of this problem to strongly accessible set systems and arbitrary closure operators. For this more general problem setting, the closed sets can be enumerated with polynomial delay if deciding membership in the set system and computing the closure operator can be solved in polynomial time. We discuss potential applications in graph mining.

- Short Papers | Pp. 382-389
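
To make the notion of enumerating closed sets under a closure operator more concrete, here is a minimal Python sketch for the familiar special case of frequent itemsets, which form an independence system. It is not the paper's algorithm for strongly accessible set systems and does not attempt to reproduce its polynomial-delay guarantee. The function name `closed_frequent_itemsets`, the choice of closure operator (intersection of all transactions containing a set) and the sample data are assumptions made for illustration, and `min_support` is assumed to be at least 1.

```python
def closed_frequent_itemsets(transactions, min_support):
    """Enumerate closed frequent itemsets with their absolute support.

    Naive sketch: repeatedly extend a closed set by one item, apply the
    closure operator, and remember every closed set already seen.
    """
    transactions = [frozenset(t) for t in transactions]
    if not transactions or len(transactions) < min_support:
        return {}
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def closure(itemset):
        # Assumed closure operator: items shared by all covering transactions.
        covering = [t for t in transactions if itemset <= t]
        return frozenset.intersection(*covering)

    root = closure(frozenset())
    found = {root: len(transactions)}
    stack = [root]
    while stack:
        current = stack.pop()
        for item in items:
            if item in current:
                continue
            candidate = current | {item}
            count = support(candidate)
            if count < min_support:
                continue  # membership test of the set system fails
            closed = closure(candidate)
            if closed not in found:
                found[closed] = count
                stack.append(closed)
    return found


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
closed_sets = closed_frequent_itemsets(baskets, min_support=2)
```

Here items `a` and `b` always occur together, so neither `{a}` nor `{b}` is closed; both collapse into the closed set `{a, b}`.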

Discovering Emerging Patterns in Spatial Databases: A Multi-relational Approach

Michelangelo Ceci; Annalisa Appice; Donato Malerba

Spatial Data Mining (SDM) has great potential in supporting public policy and in underpinning the functioning of society. One task in SDM is the discovery of the characteristics and peculiarities of communities sharing socio-economic aspects, in order to identify potential, needs and public intervention. Emerging patterns (EPs) are a special kind of pattern that contrasts two classes. In this paper, we address the problem of extracting EPs from spatial data. To this end, we resort to a multi-relational approach in order to deal with the complexity of discovering EPs from spatial data: (i) the spatial dimension implicitly defines spatial properties and relations, and (ii) spatial phenomena are affected by autocorrelation. Experiments on real datasets are described.

- Short Papers | Pp. 390-397
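
As a rough illustration of how an emerging pattern contrasts two classes, the following Python sketch computes the growth rate of a pattern, that is, the ratio of its support in a target class to its support in a background class. It works on flat item sets rather than the multi-relational spatial data the paper addresses. The function names `support` and `growth_rate` and the sample records are assumptions made for this example.

```python
def support(pattern, records):
    """Fraction of records that contain every item of the pattern."""
    if not records:
        return 0.0
    return sum(1 for record in records if pattern <= record) / len(records)


def growth_rate(pattern, background, target):
    """Ratio of the pattern's support in the target class to the background.

    Returns 0.0 when the pattern occurs in neither class and infinity when
    it occurs only in the target class.
    """
    background_support = support(pattern, background)
    target_support = support(pattern, target)
    if background_support == 0:
        return float("inf") if target_support > 0 else 0.0
    return target_support / background_support


# Hypothetical records describing areas in two classes.
background = [{"urban"}, {"urban", "high_income"}, {"rural"}, {"rural"}]
target = [{"urban", "high_income"}, {"urban", "high_income"},
          {"urban"}, {"rural", "high_income"}]

pattern = frozenset({"urban", "high_income"})
rate = growth_rate(pattern, background, target)
```

A pattern is usually reported as emerging when its growth rate meets a user-chosen threshold; in this toy data the pattern is twice as frequent in the target class as in the background class.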

Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases

Colin Cooper; Michele Zito

We investigate the statistical properties of the databases generated by the IBM QUEST program. Motivated by the claim (also supported by empirical evidence) that item occurrences in real-life market basket databases follow a rather different pattern, we propose an alternative model for generating artificial data.

- Short Papers | Pp. 398-405

Learning Multi-dimensional Functions: Gas Turbine Engine Modeling

Chris Drummond

This paper shows how multi-dimensional functions, describing the operation of complex equipment, can be learned. The functions are points in a shape space, each produced by morphing a prototypical function located at its origin. The prototypical function and the space’s dimensions, which define morphological operations, are learned from a set of existing functions. New ones are generated by averaging the coordinates of similar functions and using these to morph the prototype appropriately. This paper discusses applying this approach to learning new functions for components of gas turbine engines. Experiments on a set of compressor maps, multi-dimensional functions relating the performance parameters of a compressor, show that it transforms old maps into new ones more accurately than existing methods.

- Short Papers | Pp. 406-413