Publications catalog - books


Knowledge Discovery in Databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings

Alípio Mário Jorge ; Luís Torgo ; Pavel Brazdil ; Rui Camacho ; João Gama (eds.)

Conference: 9th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). Porto, Portugal. October 3, 2005 - October 7, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29244-9

Electronic ISBN

978-3-540-31665-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

An Entropy-Based Approach for Generating Multi-dimensional Sequential Patterns

Chang-Hwan Lee

This paper proposes a new method for generating multi-dimensional sequential patterns. Whereas current sequential pattern methods generate patterns within a single attribute, the proposed method detects them across different attributes. We employ an information-theoretic method, based on the Hellinger entropy measure, to generate multi-dimensional sequential patterns. Several theorems are proposed to reduce the computational complexity of sequential pattern systems. The proposed method is tested on synthesized transaction databases.

Keywords: Information Content; Sequential Pattern; Target Attribute; Hypothesis Space; Transaction Database.

- Short Papers | Pp. 585-592
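
The abstract of "An Entropy-Based Approach for Generating Multi-dimensional Sequential Patterns" names a Hellinger entropy measure but does not define it. As a point of reference only, the sketch below computes the standard Hellinger distance between two discrete probability distributions; that the paper's measure is built on this quantity is an assumption, and the function name is hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions.

    Assumption: p and q are equal-length sequences of probabilities.
    This is the textbook definition, not necessarily the exact measure
    used in the paper.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Sum of squared differences between the square roots of the probabilities.
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    # The 1/2 factor keeps the distance in the range [0, 1].
    return math.sqrt(squared / 2.0)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support.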

Visual Terrain Analysis of High-Dimensional Datasets

Wenyuan Li; Kok-Leong Ong; Wee-Keong Ng

Most real-world datasets are, to a certain degree, skewed. Given that they are also large, they pose a major challenge in data analysis. More importantly, we cannot ignore such datasets, as they arise frequently in a wide variety of applications. Regardless of the analytic method, the effectiveness of an analysis can often be improved if the characteristics of the dataset are known in advance. In this paper, we propose a novel technique to preprocess such datasets to obtain this insight. Our work is inspired by the resonance phenomenon, where similar objects resonate to a given response function. The key analytic result of our work is the data terrain, which shows properties of the dataset to enable effective and efficient analysis. We demonstrate our work in the context of various real-world problems. In doing so, we establish it as a tool for preprocessing data before applying computationally expensive algorithms.

- Short Papers | Pp. 593-600

An Auto-stopped Hierarchical Clustering Algorithm for Analyzing 3D Model Database

Tian-yang Lv; Yu-hui Xing; Shao-bing Huang; Zheng-xuan Wang; Wan-li Zuo

In research on shape-based 3D model retrieval, the analysis and classification of 3D model databases is an important topic for improving retrieval performance. However, it encounters difficulties due to the lack of valuable prior knowledge and the semantic gaps that exist in 3D model retrieval. The paper proposes a new auto-stopped hierarchical clustering algorithm to overcome these problems, which combines outlier detection with clustering. The Princeton Shape Benchmark, along with 2 data sets from UCI, is employed to evaluate the performance of the algorithm. The new algorithm outperforms other auto-stopped algorithms and obtains a better classification of the 3D model database.

Keywords: shape-based 3D model retrieval; clustering; outlier detection.

- Short Papers | Pp. 601-608

A Comparison Between Block CEM and Two-Way CEM Algorithms to Cluster a Contingency Table

Mohamed Nadif; Gérard Govaert

For data consisting of a set of objects described by a set of variables, we recently proposed a new mixture model that addresses the block clustering problem on both sets, and developed the block CEM algorithm. In this paper, we embed the block clustering problem for a contingency table in the mixture approach. Using a Poisson model and adopting the classification maximum likelihood principle, we apply an adapted version of block CEM. We evaluate its performance and compare it with a simple use of CEM applied to the two sets separately. We present detailed experimental results on simulated data and demonstrate the value of this new algorithm.

- Short Papers | Pp. 609-616

An Imbalanced Data Rule Learner

Canh Hao Nguyen; Tu Bao Ho

Imbalanced data learning has recently begun to receive much attention from the research and industrial communities, as traditional machine learners no longer give satisfactory results. Solutions to the problem generally attempt to adapt standard learners to the imbalanced data setting. Basically, higher weights are assigned to small class examples to avoid their being overshadowed by the large class ones. The difficulty of determining a reasonable weight for each example remains. In this work, we propose a scheme to weight examples of the small class based solely on local data distributions. The approach is for categorical data, and a rule learning algorithm is constructed taking the weighting scheme into account. Empirical evaluations demonstrate the advantages of this approach.

Keywords: Weighting Scheme; Small Class; Rule Evaluation; Imbalanced Data; Learn Decision Tree.

- Short Papers | Pp. 617-624

Improvements in the Data Partitioning Approach for Frequent Itemsets Mining

Son N. Nguyen; Maria E. Orlowska

Frequent itemset mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper presents another approach to further enhancing the performance of frequent itemset computation. We have made a series of observations that led us to devise data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirm our general and formally presented observations.

Keywords: Association rules; Frequent itemset; Partition; Performance.

- Short Papers | Pp. 625-633
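
The abstract of "Improvements in the Data Partitioning Approach for Frequent Itemsets Mining" refers to the final step of the Partition algorithm, in which the union of all local candidate sets is checked against the whole database. The sketch below illustrates that baseline two-phase scheme only; it does not reproduce the pre-processing methods proposed in the paper. The function names are hypothetical, and the brute-force local miner is a stand-in chosen for brevity, suitable only for tiny inputs.

```python
from itertools import combinations
from math import ceil


def local_frequent_itemsets(transactions, min_support):
    """Brute-force stand-in: itemsets frequent within one partition."""
    min_count = ceil(min_support * len(transactions))
    counts = {}
    for t in transactions:
        items = sorted(t)
        # Enumerate every non-empty subset of the transaction.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {itemset for itemset, c in counts.items() if c >= min_count}


def partition_mine(transactions, min_support, n_partitions):
    """Two-phase Partition scheme: local mining, then one global count."""
    if not transactions:
        return {}
    size = ceil(len(transactions) / n_partitions)
    # Phase 1: any globally frequent itemset is frequent in at least one
    # partition, so the union of local results is a complete candidate set.
    candidates = set()
    for start in range(0, len(transactions), size):
        chunk = transactions[start:start + size]
        candidates |= local_frequent_itemsets(chunk, min_support)
    # Phase 2 (the "final step"): count every candidate over the full data.
    min_count = ceil(min_support * len(transactions))
    frequent = {}
    for cand in candidates:
        support = sum(1 for t in transactions if set(cand) <= set(t))
        if support >= min_count:
            frequent[cand] = support
    return frequent
```

A call such as `partition_mine([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}], 0.5, 2)` returns a dictionary mapping each globally frequent itemset (as a sorted tuple) to its support count. The cost of the second phase grows with the size of the candidate union, which is the step the abstract says the proposed pre-processing shrinks.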

On-Line Adaptive Filtering of Web Pages

Richard Nock; Babak Esfandiari

We present a browser extension that dynamically learns to filter unwanted Uniform Resource Locators (such as advertisements or flashy images) based on minimal user feedback. Our extension builds upon one of the top ten Mozilla Firefox plug-ins, which filters URLs without learning capabilities. We apply a weighted majority-type learning algorithm working on regular expressions. Experimental results confirm that the accuracy of the predictions converges quickly to very high levels, along with other key measures: recall, specificity, and precision.

Keywords: Regular Expression; Weighted Majority; Interface Agent; Test Driver; Spam Detection.

- Short Papers | Pp. 634-642
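
The abstract of "On-Line Adaptive Filtering of Web Pages" mentions a weighted majority-type algorithm working on regular expressions, without giving details. The sketch below shows the classic weighted majority scheme under stated assumptions: each regular expression acts as an expert that votes to block a URL when it matches, and experts that disagree with the user's feedback have their weight multiplied by a penalty factor. The class name, the voting rule, and the `beta` parameter are illustrative and are not taken from the paper.

```python
import re


class RegexWeightedMajority:
    """Weighted majority vote over regular-expression experts (illustrative)."""

    def __init__(self, patterns, beta=0.5):
        self.experts = [re.compile(p) for p in patterns]
        self.weights = [1.0] * len(self.experts)
        self.beta = beta  # penalty factor in (0, 1) applied to wrong experts

    def _votes(self, url):
        # An expert votes "block" (True) when its pattern matches the URL.
        return [bool(rx.search(url)) for rx in self.experts]

    def predict(self, url):
        """Return True if the weighted vote favours blocking the URL."""
        votes = self._votes(url)
        block = sum(w for w, v in zip(self.weights, votes) if v)
        allow = sum(w for w, v in zip(self.weights, votes) if not v)
        return block > allow

    def update(self, url, should_block):
        """Penalise every expert whose vote disagrees with user feedback."""
        for i, vote in enumerate(self._votes(url)):
            if vote != should_block:
                self.weights[i] *= self.beta
```

With this design, a single correction from the user is enough to reduce the influence of an over-eager pattern, which matches the abstract's emphasis on minimal user feedback.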

A Bi-clustering Framework for Categorical Data

Ruggero G. Pensa; Céline Robardet; Jean-François Boulicaut

Bi-clustering is a promising conceptual clustering approach. Within categorical data, it provides a collection of (possibly overlapping) bi-clusters, i.e., linked clusters for both objects and attribute-value pairs. We propose a generic framework for bi-clustering which makes it possible to compute a bi-partition from collections of local patterns that capture locally strong associations between objects and properties. To validate this framework, we have studied in detail the instance CDK-Means. It is a K-Means-like clustering on collections of formal concepts, i.e., connected closed sets on both dimensions. It makes it possible to build bi-partitions with user control over the overlap between bi-clusters. We provide an experimental validation on many benchmark datasets and discuss the interestingness of the computed bi-partitions.

Keywords: Local Pattern; Formal Concept; Benchmark Dataset; Jaccard Index; Scalability Issue.

- Short Papers | Pp. 643-650

Privacy-Preserving Collaborative Filtering on Vertically Partitioned Data

Huseyin Polat; Wenliang Du

Collaborative filtering (CF) systems are widely used by e-commerce sites to provide predictions using existing databases of ratings recorded from groups of people evaluating various items. Sometimes, however, such systems’ ratings are split among different parties. To provide better filtering services, such parties may wish to share their data. However, due to privacy concerns, data owners do not want to disclose data. This paper presents a privacy-preserving protocol for CF based on vertically partitioned data. We conducted various experiments to evaluate the overall performance of our scheme.

- Short Papers | Pp. 651-658

Indexed Bit Map (IBM) for Mining Frequent Sequences

Lionel Savary; Karine Zeitouni

Sequential pattern mining has been an emerging problem in data mining. In this paper, we propose a new algorithm for mining frequent sequences. It requires only one scan of the database, thanks to an indexed structure associated with a bit map representation. It thus allows fast data access and compact storage in main memory. The experimental results show the efficiency of our method compared to existing algorithms. It has been tested on synthetic data and on real data containing sequences of activities from a time-use survey of an urban population.

- Short Papers | Pp. 659-666