Publications catalog - books


Selected Contributions in Data Analysis and Classification

Paula Brito ; Guy Cucumel ; Patrice Bertrand ; Francisco de Carvalho (eds.)

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Statistical Theory and Methods; Data Mining and Knowledge Discovery; Pattern Recognition

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73558-8

Electronic ISBN

978-3-540-73560-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Dependencies and Variation Components of Symbolic Interval-Valued Data

Lynne Billard

In 1987, Diday added a new dimension to data analysis with his fundamental paper introducing the notions of symbolic data and their analyses. He and his colleagues, among others, have developed innumerable techniques to analyse symbolic data; yet even more is waiting to be done. One area that has seen much activity in recent years involves the search for a measure of dependence between two symbolic random variables. This paper presents a covariance function for interval-valued data. It also discusses how the total, between-interval, and within-interval variations relate; in particular, this relationship shows that a covariance function based only on interval midpoints does not capture all the variation in the data. While important in its own right, the covariance function also plays a central role in many multivariate methods.
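The relationship between total, between-interval, and within-interval variation can be illustrated with a minimal sketch. It assumes, as is common in the symbolic data literature but not stated in the abstract, that values are uniformly distributed within each interval. The function name `interval_variation` and the sample intervals are hypothetical, and this is not the chapter's covariance function itself.

```python
def interval_variation(intervals):
    """Return (total, between, within) variance for a list of (a, b) intervals.

    Assumption: values are uniformly distributed inside each interval.
    """
    n = len(intervals)
    mids = [(a + b) / 2 for a, b in intervals]
    mean = sum(mids) / n

    # Variation of the interval midpoints around the overall mean.
    between = sum((m - mean) ** 2 for m in mids) / n

    # Average variance of a uniform distribution on [a, b]: (b - a)^2 / 12.
    within = sum((b - a) ** 2 / 12 for a, b in intervals) / n

    # Total variance: mean second moment of uniform [a, b] minus squared mean.
    total = sum((a * a + a * b + b * b) / 3 for a, b in intervals) / n - mean ** 2
    return total, between, within


total, between, within = interval_variation([(0, 2), (1, 5), (4, 6)])
print(total, between, within)
```

Under this assumption the total equals the sum of the between and within parts, so a measure built only on midpoints (the between part) leaves out the within-interval spread.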

Part I - Analysis of Symbolic Data | Pp. 3-12

On the Analysis of Symbolic Data

Paula Brito

Symbolic data extend the classical tabular model, where each individual takes exactly one value for each variable, by allowing multiple, possibly weighted, values for each variable. New variable types (interval-valued, categorical multi-valued and modal variables) have been introduced, which allow representing the variability and/or uncertainty inherent in the data. But are we still in the same framework when we allow the variables to take multiple values? Are the definitions of basic notions still so straightforward? What properties remain valid? In this paper we discuss some issues that arise when trying to apply classical data analysis techniques to symbolic data. The central question of the measurement of dispersion, and the consequences of different possible choices in the design of multivariate methods, will be addressed.

Part I - Analysis of Symbolic Data | Pp. 13-22

Symbolic Analysis to Learn Evolving CyberTraffic

Costantina Caruso; Donato Malerba

Monitoring Internet traffic in order to both dynamically tune network resources and ensure service continuity is a major challenge. Two main research issues characterize the analysis of the huge amount of data generated by Internet traffic: 1) learning a normal adaptive model which must be able to detect anomalies, and 2) the computational efficiency of the learning algorithm, so that it can work properly on-line. In this chapter, we propose a methodology which returns a set of symbolic objects representing an adaptive model of ‘normal’ daily network traffic. The model can then be used to discover traffic anomalies of interest to the network administrator.

Part I - Analysis of Symbolic Data | Pp. 23-33

A Clustering Algorithm for Symbolic Interval Data Based on a Single Adaptive Hausdorff Distance

Francisco de A. T. de Carvalho

This paper introduces a dynamic clustering method for partitioning symbolic interval data. This method furnishes a partition and a prototype for each cluster by optimizing an adequacy criterion that measures the fit between the clusters and their representatives. To compare symbolic interval data, the method uses a single adaptive Hausdorff distance that changes at each iteration but is the same for all the clusters. Experiments with real and synthetic symbolic interval data sets showed the usefulness of the proposed method.
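As a rough illustration of the kind of distance involved, the sketch below computes the Hausdorff distance between two closed intervals and a weighted sum over several interval variables. The per-variable weights stand in for the adaptive part; how the chapter actually updates them at each iteration is not given in the abstract, so `weights` is a hypothetical fixed input here, as are the function names.

```python
def hausdorff_interval(x, y):
    """Hausdorff distance between closed intervals x = (a, b) and y = (c, d)."""
    (a, b), (c, d) = x, y
    # For intervals this reduces to the larger of the two endpoint gaps.
    return max(abs(a - c), abs(b - d))


def weighted_hausdorff(xs, ys, weights):
    """Weighted sum of per-variable Hausdorff distances.

    Assumption: one weight per variable, shared by all clusters.
    """
    return sum(w * hausdorff_interval(x, y) for x, y, w in zip(xs, ys, weights))


item = [(1, 4), (0, 1)]        # one object described by two interval variables
prototype = [(2, 7), (0, 3)]   # a cluster prototype on the same variables
print(weighted_hausdorff(item, prototype, [1.0, 0.5]))
```

Using one weight vector for every cluster is what "single" would mean in this sketch; a per-cluster variant would keep a separate weight vector for each cluster.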

Part I - Analysis of Symbolic Data | Pp. 35-44

An Agglomerative Hierarchical Clustering Algorithm for Improving Symbolic Object Retrieval

Floriana Esposito; Claudia d’Amato

One of the main novelties of symbolic data analysis is the introduction of symbolic objects (SOs): “aggregated data” that synthesize information concerning a group of individuals of a population. SOs are particularly suitable for representing (and managing) census data that require the availability of aggregated information. This paper proposes a new (conceptual) hierarchical agglomerative clustering algorithm whose output is a “tree” of progressively more general SO descriptions. Such a tree can be used to improve the resource retrieval task, specifically for finding the SO to which an individual belongs and/or determining a more general representation of a given SO (i.e. finding a more general segment of information to which an SO belongs).

Part I - Analysis of Symbolic Data | Pp. 45-53

3WaySym-Scal: Three-Way Symbolic Multidimensional Scaling

Patrick J. F. Groenen; Suzanne Winsberg

Multidimensional scaling aims at reconstructing dissimilarities between pairs of objects by distances in a low-dimensional space. However, in some cases the dissimilarity itself is not known, but the range, or a histogram of the dissimilarities, is given. This type of data falls in the wider class of symbolic data (see Bock and Diday (2000)). We model three-way two-mode data consisting of an interval of dissimilarities for each object pair from each of K sources by a set of intervals of the distances defined as the minimum and maximum distance between two sets of embedded rectangles representing the objects. In this paper, we provide a new algorithm called 3WaySym-Scal using iterative majorization, which is based on an algorithm, I-Scal, developed for the two-way case where the dissimilarities are given by a range of values, i.e. an interval (see Groenen et al. (2006)). The advantage of iterative majorization is that each iteration is guaranteed to improve the solution until no improvement is possible. We present the results on an empirical data set on synthetic musical tones.
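The minimum and maximum distance between two rectangles mentioned in the abstract can be sketched as follows. The parametrization by a center and a half-width ("spread") per dimension, and the function name, are assumptions made for illustration; the sketch covers only this geometric step, not the majorization algorithm.

```python
import math


def rectangle_distance_bounds(center_i, spread_i, center_j, spread_j):
    """Minimum and maximum Euclidean distance between two axis-aligned rectangles.

    Each rectangle is given by its center and its half-width per dimension.
    """
    lower_sq = 0.0
    upper_sq = 0.0
    for ci, si, cj, sj in zip(center_i, spread_i, center_j, spread_j):
        gap = abs(ci - cj)   # distance between centers on this axis
        reach = si + sj      # combined half-widths on this axis
        # Closest points: zero on this axis if the projections overlap.
        lower_sq += max(0.0, gap - reach) ** 2
        # Farthest points: opposite edges on this axis.
        upper_sq += (gap + reach) ** 2
    return math.sqrt(lower_sq), math.sqrt(upper_sq)


low, high = rectangle_distance_bounds((0, 0), (0.5, 0.5), (3, 4), (0.5, 0.5))
print(low, high)
```

The resulting pair plays the role of the interval of distances that is fitted to an observed interval of dissimilarities.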

Part I - Analysis of Symbolic Data | Pp. 55-67

Clustering and Validation of Interval Data

André Hardy; Joffray Baune

The paper addresses the problem of assessing the validity of the clusters found by a clustering algorithm. The determination of the “true” number of “natural” clusters has often been considered as the central problem of cluster validation. Many different stopping rules have been proposed in the research literature but most of them are applicable only to classical data (qualitative or quantitative). In this paper we investigate the problem of the determination of the number of clusters for symbolic objects described by interval variables. We consider five classical methods and two hypothesis tests based on the Poisson point process. We extend these methods to interval data. We apply them to the meteorological stations data set.

Part I - Analysis of Symbolic Data | Pp. 69-81

Building Symbolic Objects from Data Streams

Georges Hébrail; Yves Lechevallier

With the increase of computer use in all sectors of activity, more and more data are available as streams of structured records so that it is not possible to store all data before analyzing them in a data mining perspective. New data management systems have been studied to handle such data streams and new algorithms have been developed to perform stream mining. In this paper, we propose approaches to extend the construction of symbolic objects to data streams: symbolic objects are built and maintained as a representation of a complete stream or a sliding window on the stream.
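One way to picture a symbolic object maintained over a sliding window is the sketch below, which keeps the last few records and describes each variable by its [min, max] interval. The class name, the dictionary record format, and the choice of interval descriptions are all hypothetical; the abstract does not specify the chapter's actual construction.

```python
from collections import deque


class SlidingIntervalObject:
    """Interval-valued description of the most recent records of a stream."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest record automatically.
        self.window = deque(maxlen=window_size)

    def update(self, record):
        """Add one record (a dict of numeric variables) and return the description."""
        self.window.append(record)
        return self.describe()

    def describe(self):
        """Return {variable: (min, max)} over the records currently in the window."""
        variables = self.window[0].keys()
        return {
            v: (min(r[v] for r in self.window), max(r[v] for r in self.window))
            for v in variables
        }


so = SlidingIntervalObject(window_size=3)
for load in [1.0, 4.0, 2.0, 3.0]:
    description = so.update({"load": load})
print(description)
```

Recomputing the minimum and maximum on every update is simple but costs time proportional to the window size; a production stream system would likely use an incremental summary instead.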

Part I - Analysis of Symbolic Data | Pp. 83-94

Feature Clustering Method to Detect Monotonic Chain Structures in Symbolic Data

Manabu Ichino

Finding a linear structure in multidimensional data is a main purpose of principal component analysis (PCA). This paper describes a feature clustering method to detect monotonic chain structures embedded in symbolic data tables, based on the Cartesian System Model, which is a mathematical model to manipulate symbolic objects.

Part I - Analysis of Symbolic Data | Pp. 95-102

Symbolic Markov Chains

Monique Noirhomme-Fraiture; Etienne Cuvelier

Stochastic processes have long had wide applications in quite different domains. The standard theory considers a discrete or continuous state space. We consider here the concept of a stochastic process associated with all the cases of symbolic variables: quantitative, categorical single and multiple, interval, modal. More particularly, we adapt the definition of a Markov chain and give the equivalent of the Chapman-Kolmogorov theorem in all cases.
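For reference, the classical Chapman-Kolmogorov relation for a finite-state, time-homogeneous Markov chain says that the (m+n)-step transition matrix is the product of the m-step and n-step matrices. The sketch below checks this numerically for a hypothetical two-state matrix; it covers only the classical case, not the symbolic extensions the chapter develops.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def n_step(P, n):
    """n-step transition matrix of a time-homogeneous chain with one-step matrix P."""
    size = len(P)
    # Start from the identity matrix (zero steps).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, P)
    return result


P = [[0.9, 0.1],
     [0.5, 0.5]]  # hypothetical one-step transition probabilities

lhs = n_step(P, 3)                         # three steps at once
rhs = matmul(n_step(P, 1), n_step(P, 2))   # one step followed by two steps
print(lhs, rhs)
```

Both computations give the same matrix up to floating-point rounding, which is the content of the relation for m = 1 and n = 2.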

Part I - Analysis of Symbolic Data | Pp. 103-111