Publications catalog - books



Data Science and Classification

Vladimir Batagelj ; Hans-Hermann Bock ; Anuška Ferligoj ; Aleš Žiberna (eds.)

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-34415-5

Electronic ISBN

978-3-540-34416-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin · Heidelberg 2006

Table of contents

Data Science and Classification

Vladimir Batagelj; Hans-Hermann Bock; Anuška Ferligoj; Aleš Žiberna (eds.)

Pp. not available

A Tree-Based Similarity for Evaluating Concept Proximities in an Ontology

Emmanuel Blanchard; Pascale Kuntz; Mounira Harzallah; Henri Briand

The problem of evaluating semantic similarity in a network structure has seen a noticeable renewal of interest, linked to the importance of ontologies in the Semantic Web. Various semantic measures have been proposed in the literature to evaluate the strength of the semantic link between two concepts, or two groups of concepts, within either two different ontologies or the same ontology. This paper presents a theoretical synthesis of several semantic measures based on an ontology restricted to subsumption links. We outline some limitations of these measures and introduce a new one, the Proportion of Shared Specificity. This measure, which does not depend on an external corpus, takes into account the density of links in the graph between two concepts. The measures are compared numerically on several large samples from WordNet.

Part I - Similarity and Dissimilarity | Pp. 3-11
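As a point of reference for the subsumption-based measures discussed in the abstract above, the following sketch computes the Wu-Palmer similarity, a classical edge-counting measure on a tree of "is-a" links. It is not the Proportion of Shared Specificity, whose definition is not given here; the toy taxonomy and all function names are hypothetical.

```python
# Hypothetical toy taxonomy: concept -> parent (subsumption links only).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def ancestors(c):
    """Path from concept c up to the root, c itself included."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path


def depth(c):
    """Number of edges between c and the root."""
    return len(ancestors(c)) - 1


def lcs(c1, c2):
    """Lowest common subsumer: the deepest ancestor shared by c1 and c2."""
    above_c2 = set(ancestors(c2))
    for a in ancestors(c1):  # walks upward, so the first hit is the deepest
        if a in above_c2:
            return a
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    d1, d2 = depth(c1), depth(c2)
    if d1 + d2 == 0:
        return 1.0  # both concepts are the root
    return 2 * depth(lcs(c1, c2)) / (d1 + d2)
```

For example, `wu_palmer("dog", "cat")` gives 0.5 because their common subsumer "mammal" sits halfway down the tree, while "dog" and "sparrow" only share the root and score 0. Such measures depend only on depths, not on how densely the graph is linked between the two concepts, which is the kind of limitation the abstract points to.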

Improved Fréchet Distance for Time Series

Ahlame Chouakria-Douzal; Panduranga Naidu Nagabhushan

This paper focuses on the Fréchet distance, introduced by Maurice Fréchet in 1906 to account for the proximity between curves (Fréchet (1906)). The major limitation of this proximity measure is that it is based on the closeness of the values, independently of the local trends. To alleviate this setback, we propose a dissimilarity index that extends the Fréchet distance to include information on the dependency between local trends. A synthetic dataset is generated to reproduce and illustrate the conditions under which the Fréchet distance is limited. The proposed dissimilarity index is then compared with the Fréchet estimate, and results illustrating its efficiency are reported.

Part I - Similarity and Dissimilarity | Pp. 13-20
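The limitation described in the abstract above can be reproduced in a few lines. The sketch computes the discrete Fréchet distance (the usual dynamic-programming approximation for sampled series) together with a correlation of successive increments, used here as an assumed measure of dependency between local trends. How the paper combines the two into its index is not reproduced; the function names are hypothetical.

```python
import math


def discrete_frechet(x, y):
    """Discrete Fréchet distance between two sequences of values,
    computed by dynamic programming over order-preserving couplings."""
    n, m = len(x), len(y)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def temporal_correlation(x, y):
    """Assumed trend-dependency measure: cosine of the increment vectors.
    Returns 1 for identical local trends and -1 for opposite ones.
    Undefined (division by zero) if either series is constant."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


x = [1, 2, 3, 4]
y = [2, 3, 4, 5]  # same local trends as x
z = [2, 1, 4, 3]  # close values, but zig-zag trends
```

Both `y` and `z` are at Fréchet distance 1 from `x`, yet `temporal_correlation(x, y)` is 1.0 while `temporal_correlation(x, z)` is only about 0.17: the value-based distance cannot tell the two situations apart.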

Comparison of Distance Indices Between Partitions

Lucile Denœud; Alain Guénoche

In this paper, we compare five classical distance indices on $\mathcal{P}_n$, the set of partitions on $n$ elements. First, we recall the definition of the transfer distance between partitions and an algorithm to evaluate it. Then, we build sets $\mathcal{P}_k(P)$ of partitions at $k$ transfers from an initial partition $P$. Finally, we compare the distributions of the five index values between $P$ and the elements of $\mathcal{P}_k(P)$.

Part I - Similarity and Dissimilarity | Pp. 21-28
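The transfer distance mentioned in the abstract above is the minimum number of elements that must be moved from one class to another to turn one partition into the other. It equals $n$ minus the largest total overlap achievable by a one-to-one matching between the classes of the two partitions. The sketch below finds that matching by brute force, which is only practical for partitions with few classes; an assignment algorithm such as the Hungarian method would be the usual choice at scale. Function names are hypothetical.

```python
from itertools import permutations


def contingency(p, q):
    """Contingency table between two partitions given as label lists."""
    rows = sorted(set(p))
    cols = sorted(set(q))
    table = [[0] * len(cols) for _ in rows]
    for a, b in zip(p, q):
        table[rows.index(a)][cols.index(b)] += 1
    return table


def transfer_distance(p, q):
    """n minus the best one-to-one class matching (brute force over matchings)."""
    t = contingency(p, q)
    r, s = len(t), len(t[0])
    size = max(r, s)
    # Pad with empty rows/columns so both partitions have the same number of classes.
    sq = [[t[i][j] if i < r and j < s else 0 for j in range(size)]
          for i in range(size)]
    best = max(sum(sq[i][perm[i]] for i in range(size))
               for perm in permutations(range(size)))
    return len(p) - best
```

For instance, `transfer_distance([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])` is 1 (the third element changed class), and relabeling the classes of a partition leaves the distance at 0.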

Design of Dissimilarity Measures: A New Dissimilarity Between Species Distribution Areas

Christian Hennig; Bernhard Hausdorf

We give some guidelines for the choice and design of dissimilarity measures and illustrate some of them by constructing a new dissimilarity measure between species distribution areas in biogeography. Species distribution data can be digitized as presences and absences in certain geographic units. Unlike all measures already present in the literature, the geco coefficient introduced in this paper takes the geographic distance between the units into account. The advantages of the new measure are illustrated by a study of its sensitivity to incomplete sampling and to changes in the definition of the geographic units in two real data sets.

Part I - Similarity and Dissimilarity | Pp. 29-37
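To make the idea of a geography-aware dissimilarity concrete, here is a hypothetical sketch in the spirit of the abstract above; it is not claimed to be the geco coefficient as defined in the paper. Each present unit of one species is matched to the nearest present unit of the other, and the geographic distance is mapped into [0, 1] by a tolerance `f` (an assumed parameter). When `f` tends to 0, any mismatch counts fully and the formula reduces to the Kulczynski presence/absence dissimilarity.

```python
import math


def geo_aware_dissimilarity(A, B, coords, f):
    """Hypothetical geography-aware dissimilarity between two species' areas.

    A, B   : non-empty sets of geographic-unit ids where each species is present
    coords : unit id -> (x, y) coordinates of the unit
    f      : distance at or beyond which two units count as fully different (f > 0)
    """
    def u(d):
        # Nearby units count as a partial mismatch, distant ones as a full one.
        return min(d / f, 1.0)

    def dist(a, b):
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return math.hypot(x1 - x2, y1 - y2)

    a_to_b = sum(min(u(dist(a, b)) for b in B) for a in A) / len(A)
    b_to_a = sum(min(u(dist(a, b)) for a in A) for b in B) / len(B)
    return 0.5 * (a_to_b + b_to_a)


coords = {1: (0, 0), 2: (1, 0), 3: (10, 0)}
```

With `f = 2`, two species found in adjacent units 1 and 2 get dissimilarity 0.5 rather than the maximal value 1 that a pure presence/absence measure would give, while species in the distant units 1 and 3 still get 1.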

Dissimilarities for Web Usage Mining

Fabrice Rossi; Francisco De Carvalho; Yves Lechevallier; Alzennyr Da Silva

Obtaining a set of homogeneous classes of pages according to the browsing patterns identified in web server log files can be very useful for analyzing the organization of a site and its adequacy to user needs. Such a set of homogeneous classes is often obtained from a dissimilarity measure between the visited pages, defined via the visits extracted from the logs. There are, however, many ways to define such a measure. This paper analyzes different dissimilarity measures by comparing the semantic structure of the site, as identified by experts, with the clustering obtained by applying standard algorithms to the dissimilarity matrices generated by the chosen measures.

Part I - Similarity and Dissimilarity | Pp. 39-46
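As one illustration of how a page dissimilarity can be "defined via the visits extracted from the logs", the sketch below describes each page by the set of sessions in which it appears and uses the Jaccard dissimilarity between those sets. This is only one of the many possible choices the abstract mentions, not necessarily one the paper evaluates; session extraction from raw logs is assumed to be done already, and the function name is hypothetical.

```python
def page_dissimilarities(sessions):
    """sessions: list of sets of page ids visited in each session.

    Returns a dict mapping each page pair (p, q), with p < q, to the Jaccard
    dissimilarity between the sets of sessions containing p and q."""
    visits = {}
    for session_id, pages in enumerate(sessions):
        for page in pages:
            visits.setdefault(page, set()).add(session_id)

    names = sorted(visits)
    diss = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            shared = len(visits[p] & visits[q])
            either = len(visits[p] | visits[q])
            diss[(p, q)] = 1 - shared / either
    return diss


sessions = [{"home", "news"}, {"home", "news", "sport"}, {"sport"}]
```

Here "home" and "news" always co-occur and get dissimilarity 0, whereas "home" and "sport" share one session out of three and get 2/3. The resulting matrix can then be passed to a standard clustering algorithm.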

Properties and Performance of Shape Similarity Measures

Remco C. Veltkamp; Longin Jan Latecki

This paper gives an overview of shape dissimilarity measure properties, such as metric and robustness properties, and of retrieval performance measures. Fifteen shape similarity measures are briefly described and compared. Their retrieval results on the MPEG-7 Core Experiment CE-Shape-1 test set, both as reported in the literature and as obtained by a reimplementation, are compared and discussed.

Part I - Similarity and Dissimilarity | Pp. 47-56
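A retrieval performance measure commonly reported on CE-Shape-1 is the bull's-eye score: for each query shape, count how many shapes of its own class appear among the 2 × (class size) most similar shapes, then divide the total by the maximum possible count. The sketch assumes equal class sizes and that the query itself is counted as a hit; the function name and the toy data are hypothetical.

```python
def bullseye_score(dissim, labels, class_size):
    """Bull's-eye retrieval score.

    dissim     : square matrix, dissim[q][j] = dissimilarity between shapes q and j
    labels     : class label of each shape (every class has class_size members)
    Returns a value in [0, 1]; 1 means every class was retrieved completely."""
    n = len(labels)
    hits = 0
    for q in range(n):
        ranked = sorted(range(n), key=lambda j: dissim[q][j])
        top = ranked[: 2 * class_size]  # the query itself is normally ranked first
        hits += sum(1 for j in top if labels[j] == labels[q])
    return hits / (n * class_size)


# Toy "shapes" reduced to positions on a line, with a distance-based dissimilarity.
positions = [0.0, 0.1, 5.0, 5.1, 10.0, 10.1]
dissim = [[abs(a - b) for b in positions] for a in positions]
```

With labels that follow the geometry (`["a", "a", "b", "b", "c", "c"]`) the score is 1.0; scrambling the labels lowers it, since some same-class shapes fall outside each query's top 4.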

Hierarchical Clustering for Boxplot Variables

Javier Arroyo; Carlos Maté; Antonio Muñoz-San Roque

Boxplots are well-known exploratory charts used to extract meaningful information from batches of data at a glance. Their strength lies in their ability to summarize data while retaining the key information, which is also a desirable property of symbolic variables. In this paper, boxplots are presented as a new kind of symbolic variable. In addition, two different approaches to measuring distances between boxplot variables are proposed. The usefulness of these distances is illustrated by a hierarchical clustering of boxplot data.

Part II - Classification and Clustering | Pp. 59-66
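To show what treating a boxplot as a symbolic variable can look like, the sketch below summarizes each batch by its five-number summary (minimum, first quartile, median, third quartile, maximum) and compares two boxplots with the Euclidean distance between those summaries. This is a deliberately simple, hypothetical distance and not one of the two approaches proposed in the paper; the quartile convention (`method="inclusive"`) is also an assumption.

```python
import math
import statistics


def five_number_summary(batch):
    """Boxplot of a batch as (min, Q1, median, Q3, max)."""
    q1, med, q3 = statistics.quantiles(batch, n=4, method="inclusive")
    return (min(batch), q1, med, q3, max(batch))


def boxplot_distance(b1, b2):
    """Hypothetical boxplot distance: Euclidean distance between summaries."""
    return math.dist(b1, b2)


b1 = five_number_summary([1, 2, 3, 4, 5])
b2 = five_number_summary([2, 3, 4, 5, 6])
```

Here every summary statistic of `b2` is shifted by 1 relative to `b1`, so their distance is √5. A matrix of such pairwise distances can be fed to any standard hierarchical clustering routine.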

Evaluation of Allocation Rules Under Some Cost Constraints

Farid Beninel; Michel Grun Rehomme

Allocation of individuals or objects to labels or classes is a central problem in statistics, particularly in supervised classification methods such as Linear and Quadratic Discriminant Analysis, Logistic Discrimination, Neural Networks, Support Vector Machines, and so on. Misallocations occur when the allocated class differs from the class of origin. These errors can stem from various sources, such as data quality, the definition of the explained categorical variable, or the choice of the learning sample. In general, the cost of an error depends on its type, so the percentage of correctly classified objects alone is not informative enough.

In this paper, we evaluate allocation rules while taking error costs into account. We use a statistical index that generalizes the percentage of correctly classified objects.

Part II - Classification and Clustering | Pp. 67-73
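The point made in the abstract above, that the percentage of correctly classified objects hides the cost of different error types, can be seen with a simple cost-weighted index. The index below (1 minus total cost divided by the worst possible cost) is a hypothetical illustration, not the statistic used in the paper; it reduces to the percentage of correct allocations under uniform 0-1 costs.

```python
def cost_weighted_score(confusion, cost):
    """Hypothetical cost-aware generalization of the proportion correctly classified.

    confusion[i][j] : number of objects of origin class i allocated to class j
    cost[i][j]      : cost of allocating an object of class i to class j
                      (cost[i][i] == 0, at least one positive cost)
    Returns 1 - total cost / (n * largest unit cost)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    total = sum(confusion[i][j] * cost[i][j] for i in range(k) for j in range(k))
    max_cost = max(max(row) for row in cost)
    return 1 - total / (n * max_cost)


rule_a = [[40, 10], [5, 45]]
rule_b = [[45, 5], [10, 40]]
uniform = [[0, 1], [1, 0]]
asymmetric = [[0, 1], [5, 0]]  # misallocating class 1 is five times as costly
```

Both rules classify 85% of objects correctly, so they tie under `uniform` costs. Under `asymmetric` costs, `rule_a` scores 0.93 and `rule_b` only 0.89, because `rule_b` makes more of the expensive error. Normalizing by the largest unit cost keeps the index in [0, 1]; other normalizations are equally defensible.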

Crisp Partitions Induced by a Fuzzy Set

Slavka Bodjanova

The relationship between fuzzy sets and crisp partitions defined on the same finite set of objects X is studied. The granular structure of a fuzzy set is described by rough fuzzy sets, and the quality of approximation of a fuzzy set by a crisp partition is evaluated. A measure of rough dissimilarity between clusters from a crisp partition of X with respect to a fuzzy set A defined on X is introduced. Properties of this measure are explored and some applications are provided. The classification of membership grades of A into linguistic categories is discussed.

Part II - Classification and Clustering | Pp. 75-82
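In the rough fuzzy set view mentioned in the abstract above, a crisp partition approximates a fuzzy set A blockwise: the lower approximation of a block is the smallest membership grade of A inside it, the upper approximation the largest. The sketch computes these per block and a hypothetical accuracy-style quality ratio (size-weighted lower memberships over size-weighted upper memberships), which is an assumption for illustration rather than the quality measure or rough dissimilarity defined in the paper.

```python
def rough_fuzzy_approximation(A, partition):
    """A: dict object -> membership grade in [0, 1].
    partition: list of blocks (lists of objects) covering X.
    Returns one (lower, upper) = (min, max) membership pair per block."""
    return [(min(A[x] for x in block), max(A[x] for x in block))
            for block in partition]


def approximation_quality(A, partition):
    """Hypothetical quality ratio in [0, 1]: equals 1 exactly when A is
    constant on every block of the partition (nothing is lost)."""
    approx = rough_fuzzy_approximation(A, partition)
    lower = sum(len(b) * lo for b, (lo, _) in zip(partition, approx))
    upper = sum(len(b) * up for b, (_, up) in zip(partition, approx))
    return lower / upper if upper > 0 else 1.0


A = {"a": 0.2, "b": 0.4, "c": 0.9, "d": 0.9}
partition = [["a", "b"], ["c", "d"]]
```

Here the block {a, b} only knows that membership lies between 0.2 and 0.4, while {c, d} represents A exactly, giving a quality of 2.2 / 2.6 ≈ 0.85.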