Publications catalog - books



Knowledge Discovery in Databases: PKDD 2007: 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok ; Jacek Koronacki ; Ramon Lopez de Mantaras ; Stan Matwin ; Dunja Mladenič ; Andrzej Skowron (eds.)

In conference: 11th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). Warsaw, Poland. September 17, 2007 - September 21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74975-2

Electronic ISBN

978-3-540-74976-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Pre-processing Large Spatial Data Sets with Bayesian Methods

Saara Hyvönen; Esa Junttila; Marko Salmenkivi

Binary data appears in many spatial applications such as dialectology and ecology. We demonstrate that a simple Bayesian modeling approach can be used in pre-processing large spatial data sets with missing or uncertain data. Our experiments on real and synthetic data show that conducting the pre-processing phase before applying conventional data mining methods, such as PCA, clustering or NMF, improves the results significantly.

- Short Papers | Pp. 498-505

Tag Recommendations in Folksonomies

Robert Jäschke; Leandro Marinho; Andreas Hotho; Lars Schmidt-Thieme; Gerd Stumme

Collaborative tagging systems allow users to assign keywords, so-called “tags”, to resources. Tags are used for navigation, finding resources and serendipitous browsing and thus provide an immediate benefit for users. These systems usually include tag recommendation mechanisms that ease the process of finding good tags for a resource and also consolidate the tag vocabulary across users. In practice, however, only very basic recommendation strategies are applied.

In this paper we evaluate and compare two recommendation algorithms on large-scale real-life datasets: an adaptation of user-based collaborative filtering and a graph-based recommender built on top of FolkRank. We show that both provide better results than non-personalized baseline methods. The graph-based recommender in particular outperforms existing methods considerably.

- Short Papers | Pp. 506-514

Providing Naïve Bayesian Classifier-Based Private Recommendations on Partitioned Data

Cihan Kaleli; Huseyin Polat

Data collected for collaborative filtering (CF) purposes might be split between various parties. Integrating such data is helpful for both e-companies and customers due to mutual advantages. However, due to privacy reasons, data owners do not want to disclose their data. We hypothesize that if privacy measures are provided, data holders might decide to integrate their data to perform richer CF services. In this paper, we investigate how to achieve naïve Bayesian classifier (NBC)-based CF tasks on partitioned data with privacy. We perform experiments on real data, analyze our outcomes, and provide some suggestions.

- Short Papers | Pp. 515-522

Multi-party, Privacy-Preserving Distributed Data Mining Using a Game Theoretic Framework

Hillol Kargupta; Kamalika Das; Kun Liu

Analysis of privacy-sensitive data in a multi-party environment often assumes that the parties are well-behaved and abide by the protocols. Parties compute whatever is needed, communicate correctly following the rules, and do not collude with other parties to expose a third party’s sensitive data. This paper argues that most of these assumptions fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). It offers a more realistic formulation of the PPDM problem as a multi-party game in which each party tries to maximize its own objectives. It develops a game-theoretic framework to analyze the behavior of each party in such games and presents a detailed analysis of the well-known secure sum computation as an example.
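
For readers unfamiliar with the secure sum computation mentioned in this abstract, the following is a minimal sketch of the commonly described ring-based version, simulated in a single process. It is not taken from the paper: the function name `secure_sum`, the default modulus, and the assumption that every party is honest and does not collude are illustrative choices, and the true sum is assumed to be smaller than the modulus.

```python
import random


def secure_sum(private_values, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum over the parties' private values.

    Assumptions (hypothetical, for illustration only): parties are honest,
    do not collude, and the true sum is smaller than `modulus`.
    """
    rng = rng or random.Random()
    # The initiating party picks a random mask that only it knows.
    mask = rng.randrange(modulus)
    # It sends (mask + its own value) mod modulus to the next party.
    running = (mask + private_values[0]) % modulus
    # Every other party adds its own value to the masked running total,
    # so no party sees another party's raw value.
    for value in private_values[1:]:
        running = (running + value) % modulus
    # The total returns to the initiator, which removes the mask.
    return (running - mask) % modulus
```

The sketch only works under the well-behaved-party assumptions that the abstract calls into question: if the two neighbors of a party collude, they can recover that party's value by subtracting the message it received from the message it sent.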

- Short Papers | Pp. 523-531

Multilevel Conditional Fuzzy C-Means Clustering of XML Documents

Michal Kozielski

XML documents are a special kind of data with a hierarchical structure. Typical clustering algorithms do not meet the requirements of analyzing such data. A novel clustering method dedicated to XML documents, called (), is presented in the paper. The method clusters feature vectors encoding XML documents on the different structure levels. Application of algorithm to method is proposed in the paper and the advantage of this fuzzy method over hard approach to algorithm is discussed and proved. An application of method to accelerating query execution on XML documents is discussed in the paper. The experimental results obtained on two data sets with different characteristics show that the proposed method of multilevel conditional fuzzy clustering of XML documents outperforms hard multilevel clustering.
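
As background for the fuzzy clustering named in the title, the sketch below shows the membership update of standard fuzzy c-means, in which each feature vector belongs to every cluster to a degree between 0 and 1. It is not the multilevel conditional variant proposed in the paper; the function name `fcm_memberships`, the Euclidean distance, and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return memberships[k][i]: degree to which point k belongs to cluster i.

    Standard fuzzy c-means update (illustrative, not the paper's method):
    u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)), with m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: split its
            # membership equally among those centers.
            zeros = [d == 0.0 for d in dists]
            n = sum(zeros)
            memberships.append([1.0 / n if z else 0.0 for z in zeros])
            continue
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
               for d_i in dists]
        memberships.append(row)
    return memberships
```

A hard clustering would instead assign each vector to its single nearest center, which is the kind of baseline the abstract compares against.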

- Short Papers | Pp. 532-539

Uncovering Fraud in Direct Marketing Data with a Fraud Auditing Case Builder

Fletcher Lu

This paper illustrates an automated system that replicates the investigative operation of human fraud auditors. Human fraud auditors often utilize fraud detection methods that exploit structure in database tables to uncover outliers that may be part of a fraud case. From the uncovered outliers, an auditor will build a case of fraud by searching data related to the outlier possibly across many different databases and tables within these different databases. This paper illustrates an industrial implementation of an adaptive fraud case building system that uses machine learning to conduct the search and decision-making process with an automated outlier detection component. This system was successfully applied to uncover fraud cases in real marketing data.

- Short Papers | Pp. 540-547

Real Time GPU-Based Fuzzy ART Skin Recognition

Mario Martínez-Zarzuela; Francisco Javier Díaz Pernas; David González Ortega; José Fernando Díez Higuera; Míriam Antón Rodríguez

Graphics Processing Units (GPUs) have evolved into powerful programmable processors, becoming increasingly used in many research fields such as computer vision. For non-intrusive human body parts detection and tracking, skin filtering is a powerful tool. In this paper we propose the use of a GPU-designed implementation of a Fuzzy ART Neural Network for robust real-time skin recognition. Both learning and testing processes are done on the GPU using chrominance components in TSL color space. Within the GPU, classification of several pixels can be made simultaneously, allowing skin recognition at high frame rates. System performance depends both on video resolution and number of neural network committed categories. Our application can process 296 fps or 79 fps at video resolutions of 320x240 and 640x480 pixels respectively.
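
To make the role of the committed categories concrete, here is a minimal CPU sketch of the classification step of a standard Fuzzy ART network: complement coding, the category choice function, and the vigilance test. It is not the authors' GPU implementation; the function names and the values of `rho` (vigilance) and `alpha` (choice parameter) are assumptions chosen for illustration, and inputs are assumed to be scaled to [0, 1].

```python
def complement_code(x):
    """Complement coding: append 1 - v for every component v in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(a, b)]


def classify(x, categories, rho=0.75, alpha=0.001):
    """Return the index of the best committed category that passes the
    vigilance test for input x, or None if no category matches.

    `categories` holds complement-coded weight vectors (illustrative values).
    """
    i = complement_code(x)
    norm_i = sum(i)
    # Rank categories by the choice function |I ^ w_j| / (alpha + |w_j|).
    ranked = sorted(
        range(len(categories)),
        key=lambda j: sum(fuzzy_and(i, categories[j])) / (alpha + sum(categories[j])),
        reverse=True,
    )
    for j in ranked:
        # Vigilance test: |I ^ w_j| / |I| must reach rho.
        if sum(fuzzy_and(i, categories[j])) / norm_i >= rho:
            return j
    return None
```

Each input is compared against every committed category, which is consistent with the abstract's remark that performance depends on the number of committed categories as well as on the video resolution.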

- Short Papers | Pp. 548-555

A Cooperative Game Theoretic Approach to Prototype Selection

Narayanan Rama Suri; V. Santosh Srinivas; M. Narasimha Murty

In this paper we consider the task of prototype selection whose primary goal is to reduce the storage and computational requirements of the Nearest Neighbor classifier while achieving better classification accuracies. We propose a solution to the prototype selection problem using techniques from cooperative game theory and show its efficacy experimentally.

- Short Papers | Pp. 556-564

Dynamic Bayesian Networks for Real-Time Classification of Seismic Signals

Carsten Riggelsen; Matthias Ohrnberger; Frank Scherbaum

We present a novel method for automatic classification of seismological data streams, focusing on the detection of earthquake signals. We consider the approach as being a first step towards a generic method that provides for classifying a broad range of seismic patterns by modeling the interrelationships between essential features of seismograms in a graphical model. Through a continuous Wavelet transform the features are extracted, yielding a time-frequency-amplitude decomposition. The extracted features obey certain Markov properties, which allows us to form a joint distribution in terms of a Dynamic Bayesian Network. We performed experiments using real seismic data recorded at different stations in the European Broadband Network, for which we achieve an average classification accuracy of 95%.

- Short Papers | Pp. 565-572

Robust Visual Mining of Data with Error Information

Jianyong Sun; Ata Kabán; Somak Raychaudhury

Recent results on robust density-based clustering have indicated that the uncertainty associated with the actual measurements can be exploited to locate objects that are atypical for a reason unrelated to measurement errors. In this paper, we develop a robust mixture model, which, in addition, is able to nonlinearly map such data for visual exploration. Our robust visual mining approach aims to combine statistically sound density-based analysis with visual presentation of the density structure, and to provide visual support for the identification and exploration of ‘genuine’ peculiar objects of interest that are not due to the measurement errors. In this model, an exact inference is not possible despite the latent space being discretised, and we resort to employing a structured variational EM. We present results on synthetic data as well as a real application, for visualising peculiar quasars from an astrophysical survey, given photometric measurements with errors.

- Short Papers | Pp. 573-580