Publications catalogue - books



From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V. University of Magdeburg, March 9-11, 2005

Myra Spiliopoulou ; Rudolf Kruse ; Christian Borgelt ; Andreas Nürnberger ; Wolfgang Gaul (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution: not detected
Year of publication: 2006
Available at: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-31313-7

Electronic ISBN

978-3-540-31314-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer Berlin · Heidelberg 2006

Table of contents

Part-of-Speech Induction by Singular Value Decomposition and Hierarchical Clustering

Reinhard Rapp

Part-of-speech induction involves the automatic discovery of word classes and the assignment of each word of a vocabulary to one or several of these classes. The approach proposed here is based on the analysis of word distributions in a large collection of German newspaper texts. Its main advantage over other attempts is that it combines the hierarchical clustering of context vectors with a previous step of dimensionality reduction that minimizes the effects of sampling errors.

- Text Mining | Pp. 422-429
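
As a rough illustration of the pipeline described in this abstract (not the author's implementation), the following Python sketch builds word co-occurrence vectors from a toy corpus, reduces them with an SVD, and clusters the reduced vectors hierarchically. The corpus, the number of retained dimensions, and the cluster count are illustrative assumptions.

```python
# Minimal sketch: cluster words by SVD-reduced context vectors.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts of each word with its immediate neighbours.
counts = np.zeros((len(vocab), len(vocab)))
for left, right in zip(corpus, corpus[1:]):
    counts[index[left], index[right]] += 1
    counts[index[right], index[left]] += 1

# Dimensionality reduction via SVD to smooth out sampling noise.
u, s, _ = np.linalg.svd(counts, full_matrices=False)
k = 3  # number of retained dimensions (assumption)
reduced = u[:, :k] * s[:k]

# Hierarchical clustering of the reduced context vectors.
tree = linkage(reduced, method="average", metric="cosine")
labels = fcluster(tree, t=4, criterion="maxclust")
for word, label in sorted(zip(vocab, labels), key=lambda p: p[1]):
    print(label, word)
```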

Near Similarity Search and Plagiarism Analysis

Benno Stein; Sven Meyer zu Eissen

Existing methods for text plagiarism analysis are mainly based on “chunking”, a process of grouping a text into meaningful units, each of which is encoded as an integer. Together these numbers form a document’s signature or fingerprint. An overlap of two documents’ fingerprints indicates a possibly plagiarized text passage. Most approaches use MD5 hashes to construct fingerprints, which is bound up with two problems: (i) it is computationally expensive, and (ii) a small chunk size must be chosen to identify matching passages, which additionally increases the effort for fingerprint computation, fingerprint comparison, and fingerprint storage.

This paper proposes a new class of fingerprints that can be considered as an abstraction of the classical vector space model. These fingerprints operationalize the concept of “near similarity” and enable one to quickly identify candidate passages for plagiarism. Experiments show that a plagiarism analysis based on our fingerprints leads to a speed-up by a factor of five and higher—without compromising the recall performance.

- Text Mining | Pp. 430-437
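
For orientation, here is a minimal sketch of the classical chunk-and-MD5 fingerprinting that this abstract contrasts its new fingerprints with. The chunk size, the example texts, and the helper name `fingerprint` are illustrative assumptions, not the authors' code.

```python
# Classical fingerprinting: hash every overlapping word chunk with MD5 and
# compare the resulting sets; shared hashes flag candidate plagiarized passages.
import hashlib

def fingerprint(text: str, chunk_size: int = 4) -> set[int]:
    """Hash every overlapping chunk of `chunk_size` words to an integer."""
    words = text.lower().split()
    chunks = (" ".join(words[i:i + chunk_size])
              for i in range(len(words) - chunk_size + 1))
    return {int(hashlib.md5(c.encode()).hexdigest(), 16) for c in chunks}

doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox jumps over a sleeping cat"

overlap = fingerprint(doc_a) & fingerprint(doc_b)
print(f"{len(overlap)} shared chunk hashes -> candidate plagiarized passages")
```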

Objective Function-based Discretization

Frank Höppner

Decision tree learners inspect marginal class distributions of numerical attributes to infer a predicate that can be used as a decision node in the tree. Since such discretization techniques examine the marginal distribution only, they may fail completely to predict the class correctly, even in cases for which a decision tree with a 100% classification rate exists. In this paper, an objective function-based clustering algorithm is modified to yield a discretization of numerical variables that overcomes these problems. The underlying clustering algorithm is the fuzzy c-means algorithm, which is modified (a) to take the class information into account and (b) to organize all cluster prototypes in a regular grid, such that the grid rather than the individual clusters is optimized.

- Fuzzy Data Analysis | Pp. 438-445
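
The failure mode mentioned in the abstract can be seen in a small synthetic example: with XOR-style labels, every marginal class distribution looks uninformative, while a grid over both attributes separates the classes perfectly. The snippet below is a toy illustration under these assumptions, not the authors' algorithm.

```python
# XOR-like data: marginal splits are ~50/50, but the four quadrants are pure.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(1000, 2))
y = (x[:, 0] * x[:, 1] > 0).astype(int)   # XOR-style labels

# Marginal view: class proportions on either side of any single split.
for j in (0, 1):
    left = y[x[:, j] < 0].mean()
    right = y[x[:, j] >= 0].mean()
    print(f"attribute {j}: P(class 1 | left)={left:.2f}, P(class 1 | right)={right:.2f}")

# Joint (grid) view: each quadrant contains a single class.
for sx in (-1, 1):
    for sy in (-1, 1):
        cell = y[(np.sign(x[:, 0]) == sx) & (np.sign(x[:, 1]) == sy)]
        print(f"quadrant ({sx:+d},{sy:+d}): P(class 1)={cell.mean():.2f}")
```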

Understanding and Controlling the Membership Degrees in Fuzzy Clustering

Frank Klawonn

Fuzzy cluster analysis uses membership degrees to assign data objects to clusters in order to better handle ambiguous data that share properties of different clusters. However, the introduction of membership degrees requires a new parameter called the fuzzifier. In this paper the beneficial and detrimental effects of the fuzzifier on the clustering results are analysed, and based on these considerations a more general approach to fuzzy clustering is proposed, providing better control over the membership degrees and their influence in fuzzy cluster analysis.

- Fuzzy Data Analysis | Pp. 446-453
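
To make the role of the fuzzifier concrete, the following sketch evaluates the standard fuzzy c-means membership formula for one data point and two cluster centres under different fuzzifier values m. The point, the centres, and the chosen values of m are illustrative assumptions.

```python
# Standard fuzzy c-means memberships: u_i proportional to d_i^(-2/(m-1)).
import numpy as np

def memberships(x, centres, m):
    """Membership degrees of point x in each cluster for fuzzifier m."""
    d2 = np.array([np.sum((x - c) ** 2) for c in centres])
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum()

x = np.array([1.0, 0.0])
centres = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]

for m in (1.1, 2.0, 5.0):
    print(f"m={m}: {np.round(memberships(x, centres, m), 3)}")
# Small m -> nearly crisp (0/1) memberships; large m -> memberships approach 1/c.
```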

Autonomous Sensor-based Landing Systems: Fusion of Vague and Incomplete Information by Application of Fuzzy Clustering Techniques

Bernd Korn

Enhanced Vision Systems (EVS) are currently being developed with the goal of alleviating restrictions in airspace and airport capacity under low-visibility conditions. EVS relies on weather-penetrating, forward-looking sensors that augment the naturally existing visual cues in the environment and provide a real-time image of prominent topographical objects that may be identified by the pilot. In this paper an automatic analysis of millimetre-wave (MMW) radar images for Enhanced Vision Systems is presented. The core part of the system is a fuzzy rule-based inference machine which controls the data analysis based on the uncertainty in the current knowledge in combination with a priori knowledge. Compared with standard TV or IR images, the quality of MMW images is rather poor and the data are highly corrupted by noise and clutter. Therefore, one main task of the inference machine is to handle uncertainties as well as ambiguities and inconsistencies in order to draw the right conclusions. The outputs of different sensor data analysis processes are fused and evaluated within a fuzzy/possibilistic clustering algorithm, whose results serve as input to the inference machine. The only a priori knowledge used in the presented approach is what pilots already know from airport charts, which are available for almost every airport. The performance of the approach is demonstrated with real data acquired during extensive flight tests at several airports in Northern Germany.

- Fuzzy Data Analysis | Pp. 454-461

Outlier Preserving Clustering for Structured Data Through Kernels

Marie-Jeanne Lesot

In this paper, we propose a kernel-based clustering algorithm that highlights both the major trends and the atypical behaviours present in a dataset, so as to provide a complete characterisation of the data; thanks to the kernel framework, the algorithm can be applied independently of the nature of the data, without requiring any adaptation. We apply it to XML data describing student results in several exams: we propose a kernel to handle such data and present the results obtained with a real dataset.

- Fuzzy Data Analysis | Pp. 462-469
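
The paper's kernel is not spelled out in the abstract; purely as a hedged illustration of how a kernel over exam-result records might look, the sketch below compares students represented as {exam: grade} dictionaries via a Gaussian similarity summed over shared exams. The kernel definition, the helper name `exam_kernel`, the grade scale, and the data are assumptions, not the kernel proposed in the paper.

```python
# Illustrative (assumed) kernel between students given as {exam: grade} dicts:
# sum a Gaussian similarity of grades over the exams both students took.
# The resulting kernel matrix can be fed to any kernel-based clustering method.
import math

def exam_kernel(a: dict, b: dict, sigma: float = 2.0) -> float:
    """Sum of Gaussian grade similarities over the exams shared by a and b."""
    return sum(math.exp(-((a[e] - b[e]) ** 2) / (2 * sigma ** 2))
               for e in set(a) & set(b))

students = {
    "s1": {"algebra": 14, "analysis": 12},
    "s2": {"algebra": 13, "analysis": 11, "statistics": 9},
    "s3": {"statistics": 6, "analysis": 5},
}

for i in students:
    for j in students:
        print(i, j, round(exam_kernel(students[i], students[j]), 3))
```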

Classification-relevant Importance Measures for the West German Business Cycle

Daniel Enache; Claus Weihs; Ursula Garczarek

When analyzing business cycle data, one observes that the relevant predictor variables are often highly correlated. This paper presents a method to obtain measures of importance for the classification of data in which such multicollinearity is present. In systems with highly correlated variables it is interesting to know what changes occur when a certain predictor is changed by one unit and all other predictors change according to their correlation with the first, rather than performing a ceteris paribus analysis. The approach described in this paper uses directional derivatives to obtain such importance measures. It is shown how the directions of interest can be estimated, and different evaluation strategies for characteristics of classification models are presented. The method is then applied to linear discriminant analysis and the multinomial logit model for the classification of West German business cycle phases.

- Economics and Mining in Business Processes | Pp. 470-477
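
A rough sketch of the idea of a correlation-aware importance measure: instead of a ceteris paribus unit change, move all predictors along their estimated correlation with the chosen one and take a finite-difference directional derivative of the model's class probability. The synthetic data, the logistic regression stand-in for LDA/multinomial logit, and the step size are assumptions, not the authors' estimation procedure.

```python
# Importance of predictor 0 as a finite-difference directional derivative of
# the predicted class probability, moving correlated predictors together.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)           # strongly correlated with x1
X = np.column_stack([x1, x2])
y = (x1 + x2 + rng.normal(size=n) > 0).astype(int)

model = LogisticRegression().fit(X, y)             # stand-in classifier

# Direction: a unit change in predictor 0, with the other predictors following
# their correlation with it (column 0 of the correlation matrix).
direction = np.corrcoef(X, rowvar=False)[:, 0]

x0 = X.mean(axis=0)                                # evaluation point
h = 1e-4                                           # finite-difference step
p = lambda v: model.predict_proba(v.reshape(1, -1))[0, 1]
importance = (p(x0 + h * direction) - p(x0)) / h
print("directional importance of predictor 0:", round(importance, 4))
```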

The Classification of Local and Branch Labour Markets in the Upper Silesia

Witold Hantke

The paper focuses on the differentiation of unemployment in Upper Silesia. All analyses have been carried out with reference to both particular professions and subregions of Silesia. The paper shows how the internal diversity of the province influences the situation of the labour markets. It could be interesting to compare the present and future classifications, because the time scope of the research covers the year preceding European integration.

- Economics and Mining in Business Processes | Pp. 478-485

An Overview of Artificial Life Approaches for Clustering

David Kämpf; Alfred Ultsch

Recently, artificial life approaches for clustering have been proposed. However, research on artificial life is mainly concerned with the simulation of systems based on models of real life; in addition, artificial life methods have been utilized to solve optimization problems. This paper gives a short overview of artificial life and its applications in general. From this starting point we will focus on artificial life approaches used for clustering. These approaches are characterized by the fact that solutions are emergent rather than predefined and preprogrammed, the data are seen as active rather than passive objects, and new data can be added incrementally to the system. We will present existing concepts for clustering with artificial life and highlight their differences and strengths.

- Economics and Mining in Business Processes | Pp. 486-493

Design Problems of Complex Economic Experiments

Jonas Kunze

Economic experiments are a source of valuable data about economic decision making. Although many experiments exist, few have been conducted in which the subjects face complex decision tasks.

Inventory management problems in supply chain management represent such complex decision tasks because of their time delays and nonlinearities.

The published experiments on inventory management in supply chains are reviewed and some of the design problems of these experiments are discussed. The paper focuses especially on incentives, presentation effects, and the concreteness of the experiment.

- Economics and Mining in Business Processes | Pp. 494-501