Publications catalog - books

Advances in Data Mining: Theoretical Aspects and Applications: 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 14-18, 2007. Proceedings

Petra Perner (ed.)

Conference: 7th Industrial Conference on Data Mining (ICDM), Leipzig, Germany, July 14-18, 2007

Abstract/description – provided by the publisher

Not available.

Keywords – provided by the publisher

Database Management; Pattern Recognition; Image Processing and Computer Vision; Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); Artificial Intelligence (incl. Robotics)

Availability
Detected institution    Year of publication    Browse          Download    Request
Not detected            2007                   SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73434-5

Electronic ISBN

978-3-540-73435-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Case Based Reasoning and the Search for Knowledge

Michael M. Richter

A major goal of this paper is to compare Case Based Reasoning with other methods of searching for knowledge. We consider knowledge as a resource that can be traded. It has no value in itself; its value is measured by the usefulness of applying it in some process. Such a process has information needs that have to be satisfied, and the concept used to measure this is the economic term utility. In general, utility depends on the user and their context, i.e., it is subjective. Here we introduce levels of context, from general to individual. We illustrate that on the lower, i.e., more personal, levels CBR is quite useful, in particular in comparison with traditional information retrieval methods.

- Invited Talk | Pp. 1-14

Subsets More Representative Than Random Ones

Ilia Nouretdinov

Suppose we have a database that describes a set of objects, and our aim is to find a representative subset of smaller size. Representativeness here means the quality of prediction achieved when the subset is used instead of the whole set in a typical machine learning procedure. We investigate how to find a subset that is more representative than a random selection of the same size.

- Aspects of Classification and Prediction | Pp. 15-20
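
As a rough illustration of the notion of representativeness used above, the sketch below scores a candidate subset by training a simple classifier on it and testing on the remaining rows, then compares a naive hill-climbing search against random subsets of the same size. The dataset, classifier and search heuristic are illustrative assumptions, not the author's procedure.

```python
# Hedged sketch: score a subset's "representativeness" by the accuracy a simple
# learner reaches when trained on the subset and tested on the rest of the data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

def representativeness(X, y, idx):
    """Train on the rows in idx, test on the complement; higher is better."""
    mask = np.zeros(len(X), dtype=bool)
    mask[idx] = True
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[mask], y[mask])
    return accuracy_score(y[~mask], clf.predict(X[~mask]))

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
k = 15  # size of the subset we look for

# Baseline: average score of purely random subsets of size k.
random_scores = [representativeness(X, y, rng.choice(len(X), k, replace=False))
                 for _ in range(50)]

# Simple hill-climbing: swap one member at a time if the score improves.
best = rng.choice(len(X), k, replace=False)
best_score = representativeness(X, y, best)
for _ in range(200):
    cand = best.copy()
    cand[rng.integers(k)] = rng.integers(len(X))
    if len(set(cand)) == k:               # keep the subset free of duplicates
        s = representativeness(X, y, cand)
        if s > best_score:
            best, best_score = cand, s

print(f"random baseline: {np.mean(random_scores):.3f}, searched subset: {best_score:.3f}")
```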

Concepts for Novelty Detection and Handling Based on a Case-Based Reasoning Process Scheme

Petra Perner

Novelty detection, the ability to identify new or unknown situations that were never experienced before, is useful for intelligent systems aspiring to operate in environments where data are acquired incrementally. This characteristic is common to numerous problems in medical diagnosis and visual perception. We propose to see novelty detection as a case-based reasoning process. Our novelty-detection method is able to detect a novel situation as well as to use the novel events for immediate reasoning. To ensure this capacity we combine statistical and similarity inference and learning. This view of CBR takes into account the properties of the data, such as uncertainty, and allows the underlying concepts, such as storage, learning, retrieval and indexing, to be formalized and performed efficiently.

- Aspects of Classification and Prediction | Pp. 21-33
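
To make the case-based view of novelty detection concrete, here is a minimal sketch in which an incoming event is compared against a stored case base by similarity; if no stored case is similar enough, the event is flagged as novel and immediately retained for future reasoning. The distance measure and threshold are illustrative assumptions, and the statistical side of the combined inference is not shown.

```python
# Hedged sketch of similarity-based novelty detection with a growing case base.
import numpy as np

class CaseBase:
    def __init__(self, threshold):
        self.cases = []              # stored cases (feature vectors)
        self.threshold = threshold   # maximum distance to count as "known"

    def process(self, event):
        event = np.asarray(event, dtype=float)
        if not self.cases:
            self.cases.append(event)
            return "novel"
        dists = [np.linalg.norm(event - c) for c in self.cases]
        if min(dists) > self.threshold:  # no sufficiently similar case
            self.cases.append(event)     # learn the novel event at once
            return "novel"
        return "known"

cb = CaseBase(threshold=1.0)
for e in [[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9]]:
    print(e, "->", cb.process(e))
```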

An Efficient Algorithm for Instance-Based Learning on Data Streams

Jürgen Beringer; Eyke Hüllermeier

The processing of data streams in general, and the mining of such streams in particular, have recently attracted considerable attention in various research fields. A key problem in stream mining is to extend existing machine learning and data mining methods so as to meet the increased requirements imposed by the data stream scenario, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to respond appropriately to changes in the data characteristics and underlying distributions, among other requirements. This paper considers the problem of classification on data streams and develops an instance-based learning algorithm for that purpose. The experimental studies presented in the paper suggest that this algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt quickly to an evolving environment, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.

- Aspects of Classification and Prediction | Pp. 34-48
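
A minimal sliding-window variant of instance-based stream classification, sketched below, illustrates the adaptation issue discussed in the abstract: new labelled instances displace the oldest ones, so the nearest-neighbour model can follow a drifting concept. The window size, k and the toy stream are assumptions, not the authors' algorithm.

```python
# Hedged sketch of windowed kNN over a stream (test-then-train evaluation).
from collections import deque
import numpy as np

class WindowedKNN:
    def __init__(self, k=5, window=200):
        self.k = k
        self.window = deque(maxlen=window)   # (x, y) pairs, oldest dropped first

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        if not self.window:
            return None
        nearest = sorted(self.window, key=lambda xy: np.linalg.norm(x - xy[0]))[: self.k]
        labels = [y for _, y in nearest]
        return max(set(labels), key=labels.count)   # majority vote

    def learn(self, x, y):
        self.window.append((np.asarray(x, dtype=float), y))

model = WindowedKNN(k=3, window=100)
rng = np.random.default_rng(1)
for t in range(500):
    x = rng.normal(size=2)
    y = int(x[0] + x[1] > 0)      # toy concept generating the labels
    pred = model.predict(x)        # predict first ...
    model.learn(x, y)              # ... then incorporate the labelled instance
```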

Softening the Margin in Discrete SVM

Carlotta Orsenigo; Carlo Vercellis

Discrete support vector machines are models for classification recently introduced in the context of statistical learning theory. Their distinctive feature is the formulation of mixed integer programming problems aimed at deriving optimal separating hyperplanes with minimum empirical error and maximum generalization capability. A new family of discrete SVM is proposed in this paper, in which the hyperplane establishes a variable softening of the margin to improve the separation among distinct classes. Theoretical bounds are derived to finely tune the parameters of the optimization problem. Computational tests on benchmark datasets in the biolife science application domain indicate the effectiveness of the proposed approach, which appears to dominate traditional SVM in terms of accuracy and percentage of support vectors.

- Aspects of Classification and Prediction | Pp. 49-62
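
For orientation only, a generic discrete-SVM-style mixed integer program (not necessarily the authors' exact model) counts margin violations with binary variables z_i and a large constant M; in the classical soft-margin SVM the binary z_i is replaced by a continuous slack variable ξ_i ≥ 0, and softening the margin amounts to trading margin width against these violation terms through the parameter C.

```latex
% Generic MIP-style discrete SVM with binary misclassification indicators.
% Illustrative sketch only; the paper's objective and softening scheme may differ.
\begin{align*}
\min_{w,\,b,\,z}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} z_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}x_i + b\bigr) \ge 1 - M z_i, & i = 1,\dots,n, \\
& z_i \in \{0,1\}, & i = 1,\dots,n.
\end{align*}
```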

Feature Selection Using Ant Colony Optimization (ACO): A New Method and Comparative Study in the Application of Face Recognition System

Hamidreza Rashidy Kanan; Karim Faez; Sayyed Mostafa Taheri

Feature Selection (FS) and the reduction of pattern dimensionality is one of the most important steps in pattern recognition systems. One approach in the feature selection area is to employ population-based optimization algorithms such as Genetic Algorithm (GA)-based and Ant Colony Optimization (ACO)-based methods. This paper presents a novel feature selection method based on Ant Colony Optimization (ACO). The ACO algorithm is inspired by the social behavior of ants in their search for the shortest paths to food sources. Most common techniques for ACO-based feature selection use a priori information about the features. In the proposed algorithm, however, classifier performance and the length of the selected feature vector are adopted as heuristic information for ACO, so the optimal feature subset can be selected without a priori information about the features. This approach is easy to implement and, because it uses a single simple classifier, its computational complexity is very low. Simulation results on a face recognition system and the ORL database show the superiority of the proposed algorithm.

- Aspects of Classification and Prediction | Pp. 63-76
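
The sketch below shows the general ACO-wrapper pattern described above: each ant samples a feature subset guided by per-feature pheromone, subsets are scored by a simple classifier with a penalty on subset length, and pheromone is evaporated and then reinforced on the best subset found. The dataset, sampling rule, fitness weighting and parameters are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of ACO-style wrapper feature selection with a kNN classifier.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]
rng = np.random.default_rng(0)

def fitness(subset):
    """Cross-validated accuracy minus a small penalty on the number of features."""
    if not subset.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, subset], y, cv=3).mean()
    return acc - 0.01 * subset.sum()

pheromone = np.ones(n_features)
best_subset, best_fit = None, -np.inf

for iteration in range(20):
    for ant in range(10):
        # sample each feature with probability proportional to its pheromone level
        p = pheromone / pheromone.max()
        subset = rng.random(n_features) < 0.5 * p
        f = fitness(subset)
        if f > best_fit:
            best_subset, best_fit = subset.copy(), f
    pheromone *= 0.9                 # evaporation
    pheromone[best_subset] += 1.0    # reinforce features of the best subset

print("selected features:", np.flatnonzero(best_subset), "fitness:", round(best_fit, 3))
```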

Outlier Detection with Streaming Dyadic Decomposition

Chetan Gupta; Robert Grossman

In this work we introduce a new algorithm for detecting outliers on streaming data in ℝ^d. The basic idea is to compute a dyadic decomposition of the streaming data into cubes in ℝ^d. A dyadic decomposition can be obtained by recursively bisecting the cube the data lies in. A dyadic decomposition obtained in the streaming setting is referred to as a streaming dyadic decomposition. If we view the streaming dyadic decomposition as a tree with a fixed maximum (and sufficient) size (depth), then outliers are naturally defined by cubes that contain a small number of points, either in the cube itself or in the cube together with its neighboring cubes. We discuss some properties of detecting outliers with streaming dyadic decomposition and present experimental results over real and artificial data sets.

- Aspects of Classification and Prediction | Pp. 77-91
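
A stripped-down sketch of the underlying idea: bisecting a bounding cube a fixed number of times per axis induces a grid of dyadic cells, cell counts are updated as points stream in, and points that land in cells that remain nearly empty are reported as outliers. The bounds, depth and sparsity threshold are assumptions, and the paper's tree structure and neighbouring-cube handling are omitted.

```python
# Hedged sketch: dyadic cells at a fixed depth, streaming counts, sparse cells = outliers.
from collections import defaultdict
import numpy as np

def dyadic_cell(x, low, high, depth):
    """Index of the dyadic cell containing x after `depth` bisections per axis."""
    scale = 2 ** depth
    idx = np.floor((np.asarray(x) - low) / (high - low) * scale).astype(int)
    return tuple(np.clip(idx, 0, scale - 1))

rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0, 1, size=(1000, 2)),   # bulk of the data
                    [[8.0, 8.0], [-7.5, 9.0]]])         # two injected outliers

counts = defaultdict(int)
low, high, depth = -10.0, 10.0, 4
for x in stream:                       # single pass over the stream
    counts[dyadic_cell(x, low, high, depth)] += 1

threshold = 2                          # cells with <= threshold points are suspicious
outlier_cells = {c for c, n in counts.items() if n <= threshold}
outliers = [x for x in stream if dyadic_cell(x, low, high, depth) in outlier_cells]
print(len(outliers), "points fall in sparse cells")
```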

VISRED – Numerical Data Mining with Linear and Nonlinear Techniques

Antonio Dourado; Edgar Ferreira; Paulo Barbeiro

Numerical data mining is a task for which several techniques have been developed that can provide quick insight into a practical problem, if an easy-to-use common software platform is available. VISRED - Data Visualisation by Space Reduction, presented here, aims to be such a tool for data classification and clustering. It allows the quick application of Principal Component Analysis, Nonlinear Principal Component Analysis, and Multi-dimensional Scaling (classical and non-classical). For clustering, several techniques have been included: hierarchical, k-means, subtractive, fuzzy k-means, and SOM (Self-Organizing Map, in batch and recursive versions). It reads from and writes to Excel sheets. Its utility is shown with two applications: the visbreaker process of an oil refinery and the UCI benchmark problem of breast cancer diagnosis.

- Aspects of Classification and Prediction | Pp. 92-106
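
As an illustration of the kind of pipeline such a tool automates, the sketch below reduces a numerical dataset to two dimensions with PCA and clusters the projection with k-means, using the UCI breast cancer data mentioned in the abstract. The library and parameter choices are assumptions; the sketch does not reproduce VISRED itself (NLPCA, MDS, SOM and Excel I/O are not covered).

```python
# Hedged sketch: dimensionality reduction (PCA) followed by clustering (k-means).
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)           # the UCI benchmark mentioned above
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

plt.scatter(X2[:, 0], X2[:, 1], c=labels, s=10)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.title("PCA projection with k-means clusters")
plt.show()
```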

Clustering by Random Projections

Thierry Urruty; Chabane Djeraba; Dan A. Simovici

Clustering algorithms for multidimensional numerical data must overcome special difficulties due to the irregularities of data distribution. We present a clustering algorithm for numerical data that combines ideas from random projection techniques and density-based clustering. The algorithm consists of two phases: a first phase that uses random projections to detect clusters, and a second phase that applies certain post-processing techniques to the clusters obtained from several random projections. Experiments were performed on synthetic data consisting of randomly-generated points in ℝ^d, synthetic images containing randomly distributed colored regions, and, finally, real images. Our results suggest the potential of our algorithm for image segmentation.

- Clustering | Pp. 107-119
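
A minimal sketch of the first-phase idea: dense intervals of one-dimensional random projections hint at clusters in the original space, so points that repeatedly fall into dense histogram bins across many projections are kept as cluster cores. The data, thresholds and voting rule are illustrative assumptions; the paper's post-processing phase is not reproduced.

```python
# Hedged sketch: vote for points that land in dense bins of 1-D random projections.
import numpy as np

rng = np.random.default_rng(0)
# toy data: two Gaussian blobs plus uniform background noise in R^5
X = np.vstack([rng.normal(-3, 0.5, size=(200, 5)),
               rng.normal(+3, 0.5, size=(200, 5)),
               rng.uniform(-8, 8, size=(100, 5))])

n_proj, n_bins = 20, 30
dense_votes = np.zeros(len(X), dtype=int)

for _ in range(n_proj):
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)                      # random unit direction
    proj = X @ u                                # 1-D projection of every point
    hist, edges = np.histogram(proj, bins=n_bins)
    bin_idx = np.clip(np.digitize(proj, edges) - 1, 0, n_bins - 1)
    dense_bins = hist > 2 * hist.mean()         # bins denser than average
    dense_votes += dense_bins[bin_idx]          # one vote per dense-bin hit

core = dense_votes >= 0.7 * n_proj              # dense in most projections
print(core.sum(), "of", len(X), "points retained as cluster cores")
```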

Lightweight Clustering Technique for Distributed Data Mining Applications

Lamine M. Aouad; Nhien-An Le-Khac; Tahar M. Kechadi

Many parallel and distributed clustering algorithms have already been proposed. Most of them are based on the aggregation of local models according to some collected local statistics. In this paper, we propose a lightweight distributed clustering algorithm based on a minimum variance increase criterion, which requires very limited communication overhead. We also introduce the notion of distributed perturbation to improve the globally generated clustering. We show that this algorithm improves the quality of the overall clustering and manages to find the real structure and number of clusters of the global dataset.

- Clustering | Pp. 120-134
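
The communication-light pattern described above can be sketched as follows: each site clusters its own data and ships only centroids and cluster sizes, and a coordinator merges the pair of centroids with the smallest Ward-style variance increase until the desired number of global clusters remains. The local clustering method, the merge rule details and the distributed perturbation step are assumptions or omissions relative to the paper.

```python
# Hedged sketch: local k-means summaries merged globally by minimum variance increase.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sites = [rng.normal(loc=c, scale=0.6, size=(300, 2)) for c in ([0, 0], [5, 5], [0, 5])]

# Local phase: each site sends only (centroid, size) pairs.
summaries = []
for data in sites:
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(data)
    sizes = np.bincount(km.labels_, minlength=4)
    summaries += [(c, n) for c, n in zip(km.cluster_centers_, sizes) if n > 0]

# Global phase: greedy Ward-style merging of the collected centroids.
def variance_increase(a, na, b, nb):
    return (na * nb) / (na + nb) * np.sum((a - b) ** 2)

target = 3
while len(summaries) > target:
    best = min(((i, j) for i in range(len(summaries)) for j in range(i + 1, len(summaries))),
               key=lambda ij: variance_increase(*summaries[ij[0]], *summaries[ij[1]]))
    (a, na), (b, nb) = summaries[best[0]], summaries[best[1]]
    merged = ((na * a + nb * b) / (na + nb), na + nb)       # weighted mean centroid
    summaries = [s for k, s in enumerate(summaries) if k not in best] + [merged]

print("global centroids:", [np.round(c, 2) for c, _ in summaries])
```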