Publications catalog - books

From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V. University of Magdeburg, March 9-11, 2005

Myra Spiliopoulou ; Rudolf Kruse ; Christian Borgelt ; Andreas Nürnberger ; Wolfgang Gaul (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Publication year Browse Download Request
Not detected 2006 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-31313-7

Electronic ISBN

978-3-540-31314-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer Berlin · Heidelberg 2006

Table of contents

Robust Transformations and Outlier Detection with Autocorrelated Data

Andrea Cerioli; Marco Riani

The analysis of regression data is often improved by using a transformation of the response rather than the original response itself. However, finding a suitable transformation can be strongly affected by the influence of a few individual observations. Outliers can have an enormous impact on the fitting of statistical models and can be hard to detect due to masking and swamping. These difficulties are enhanced in the case of models for dependent observations, since any anomalies are with respect to the specific autocorrelation structure of the model. In this paper we develop a forward search approach which is able to robustly estimate the Box-Cox transformation parameter under a first-order spatial autoregression model.
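For orientation, the Box-Cox family referred to in this abstract is usually written as below, for a positive response y and transformation parameter λ. This is the textbook form; the paper itself may work with a normalized variant, so the block is only a reminder of what is being estimated.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \log y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```

With λ = 1 the response is left essentially unchanged (up to a shift), while λ = 0 corresponds to the log transformation.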

- Robust Methods in Multivariate Statistics | Pp. 262-269

Robust Multivariate Methods: The Projection Pursuit Approach

Peter Filzmoser; Sven Serneels; Christophe Croux; Pierre J. Van Espen

Projection pursuit was originally introduced to identify structures in multivariate data clouds (Huber, 1985). The idea of projecting data to a low-dimensional subspace can also be applied to multivariate statistical methods. The robustness of the methods can be achieved by applying robust estimators to the lower-dimensional space. Robust estimation in high dimensions can thus be avoided, which usually results in faster computation. Moreover, flat data sets, where the number of variables is much higher than the number of observations, can be analyzed more easily in a robust way.

We will focus on the projection pursuit approach for robust continuum regression (Serneels et al., 2005). A new algorithm is introduced and compared with the reference algorithm as well as with classical continuum regression.

- Robust Methods in Multivariate Statistics | Pp. 270-277

Finding Persisting States for Knowledge Discovery in Time Series

Fabian Mörchen; Alfred Ultsch

Knowledge discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series. We propose a new method for meaningful unsupervised discretization of numeric time series called “Persist”, based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. In evaluations with artificial and real-life data it clearly outperforms existing methods.
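The idea of comparing marginal and self-transition probabilities can be illustrated with a small sketch. It is one plausible reading of the abstract, not the authors' published algorithm: the function names, the Bernoulli framing, the symmetrized divergence, and the signed averaging are all assumptions made for illustration.

```python
import math
from collections import Counter


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def persistence_score(symbols):
    """Average signed symmetric KL between self-transition and marginal probabilities.

    Hypothetical score: positive when symbols tend to repeat more often than
    their marginal frequency suggests, negative when they repeat less often.
    """
    n = len(symbols)
    counts = Counter(symbols)                      # marginal counts
    outgoing = Counter(symbols[:-1])               # symbols that have a successor
    self_trans = Counter(a for a, b in zip(symbols, symbols[1:]) if a == b)

    scores = []
    for s, c in counts.items():
        if outgoing[s] == 0:
            continue                               # symbol only appears at the very end
        marginal = c / n
        self_p = self_trans[s] / outgoing[s]
        sym_kl = 0.5 * (bernoulli_kl(self_p, marginal) + bernoulli_kl(marginal, self_p))
        sign = 1 if self_p >= marginal else -1
        scores.append(sign * sym_kl)
    return sum(scores) / len(scores)


persistent = list("aaaaaaaaaabbbbbbbbbb")   # long runs: state-like symbols
alternating = list("ab" * 10)               # no runs at all
```

On these two toy sequences the score is positive for `persistent` and negative for `alternating`, which matches the intuition that symbols should correspond to states that last for a while.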

- Robust Methods in Multivariate Statistics | Pp. 278-285

Restricted Co-inertia Analysis

Pietro Amenta; Enrico Ciavolino

In this paper, an extension of Co-inertia Analysis is proposed. This extension is based on an objective function which directly takes into account the external information, as linear restrictions on one set of variables, by rewriting the Co-inertia Analysis objective function according to the principle of the Restricted Eigenvalue Problem (Rao (1973)).

- Data Mining and Explorative Multivariate Data Analysis | Pp. 286-293

Hausman Principal Component Analysis

Vartan Choulakian; Luigi Dambra; Biagio Simonetti

The aim of this paper is to obtain discrete-valued weights of the variables by constraining them to Hausman weights (−1, 0, 1) in principal component analysis. This is done in two steps: first, we start with the centroid method, which produces the most restricted optimal weights −1 and 1; then we extend the weights to −1, 0, or 1.

- Data Mining and Explorative Multivariate Data Analysis | Pp. 294-301

Nonlinear Time Series Modelling: Monitoring a Drilling Process

Amor Messaoud; Claus Weihs; Franz Hering

Exponential autoregressive (ExpAr) time series models are able to reveal certain types of nonlinear dynamics such as fixed points and limit cycles. In this work, these models are used to model a drilling process. This modelling approach provides an on-line strategy for monitoring the process with control charts, in order to detect dynamic disturbances and to ensure high-quality production.
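As a reminder of the model class, an exponential autoregressive model of order p is commonly written as follows. The notation (coefficients φ_i and π_i, scale parameter γ, noise ε_t) is the standard textbook one and is an assumption here; the chapter's own parameterization may differ.

```latex
X_t = \sum_{i=1}^{p} \left( \phi_i + \pi_i \, e^{-\gamma X_{t-1}^{2}} \right) X_{t-i} + \varepsilon_t ,
\qquad \gamma > 0 .
```

The autoregressive coefficients move smoothly between φ_i + π_i (when X_{t-1} is near zero) and φ_i (when |X_{t-1}| is large), which is what allows amplitude-dependent behaviour such as limit cycles.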

- Data Mining and Explorative Multivariate Data Analysis | Pp. 302-309

Word Length and Frequency Distributions in Different Text Genres

Gordana Antić; Ernst Stadlober; Peter Grzybek; Emmerich Kelih

In this paper we study word length frequency distributions of a systematic selection of 80 Slovenian texts (private letters, journalistic texts, poems and cooking recipes). The adequacy of four two-parametric Poisson models is analyzed according to their goodness-of-fit properties, and the corresponding model parameter ranges are checked for their suitability to discriminate the text sorts given. We find that the Singh-Poisson distribution seems to be the best choice for both problems: first, it is an appropriate model for three of the text sorts (private letters, journalistic texts and poems); and second, the parameter space of the model can be split into regions constituting all four text sorts.

- Text Mining | Pp. 310-317

Bootstrapping an Unsupervised Morphemic Analysis

Christoph Benden

Unsupervised morphemic analysis may be divided into two phases: 1) establishment of an initial morpheme set, and 2) optimization of this generally imperfect first approximation. This paper focuses on the first phase, that is, the establishment of an initial morphemic analysis, whereby methodological questions regarding ‘unsupervision’ will be touched on. The basic segmentation algorithm employed goes back to Harris (1955). Proposals for the antecedent transformation of graphemic representations into (partial) phonemic ones are discussed, as well as the postprocessing step of reapplying the initially gained morphemic candidates. Instead of directly using numerical (count) measures, a proposal is put forward which exploits numerical interpretations of a universal morphological assumption on morphemic order for the evaluation of the computationally gained segmentations and their quantitative properties.
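The segmentation idea attributed to Harris (1955) is often described in terms of "successor variety": after each prefix of a word, count how many distinct symbols can follow it in the corpus, and place a boundary where that count peaks. The sketch below is a letter-based toy adaptation under stated assumptions (a tiny hand-made word list, strict interior peaks as the cut criterion); it is not the chapter's implementation, and the names `successor_variety`, `segment`, and `LEXICON` are hypothetical.

```python
def successor_variety(word, lexicon):
    """For each proper prefix of `word`, count the distinct letters that
    follow that prefix among the words in `lexicon`."""
    varieties = []
    for i in range(1, len(word)):
        prefix = word[:i]
        successors = {w[i] for w in lexicon if len(w) > i and w.startswith(prefix)}
        varieties.append(len(successors))
    return varieties


def segment(word, lexicon):
    """Cut `word` after every prefix whose successor variety is a strict
    interior peak (assumed criterion; other peak rules are possible)."""
    sv = successor_variety(word, lexicon)
    # sv[k] belongs to the prefix of length k + 1, so a peak at k means "cut after k + 1 letters".
    cuts = [k + 1 for k in range(1, len(sv) - 1)
            if sv[k] > sv[k - 1] and sv[k] > sv[k + 1]]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


# Toy word list, invented for illustration only.
LEXICON = ["read", "reads", "reading", "reader", "real", "red",
           "walk", "walks", "walking", "walker"]

print(successor_variety("reading", LEXICON))  # [1, 2, 2, 3, 1, 1]
print(segment("reading", LEXICON))            # ['read', 'ing']
```

The peak after "read" arises because three different letters (s, i, e) can follow that prefix in the toy list, whereas only one letter follows "readi". A realistic setting would use a large corpus and, as the abstract notes, possibly phonemic rather than graphemic representations.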

- Text Mining | Pp. 318-325

Automatic Extension of Feature-based Semantic Lexicons via Contextual Attributes

Chris Biemann; Rainer Osswald

We describe how a feature-based semantic lexicon can be automatically extended using large, unstructured text corpora. Experiments are carried out using the lexicon HaGenLex and the Wortschatz corpus. The semantic classes of nouns are determined via the adjectives that modify them. It turns out to be reasonable to combine several classifiers for single attributes into one for complex semantic classes. The method is evaluated thoroughly and possible improvements are discussed.

- Text Mining | Pp. 326-333

Learning Ontologies to Improve Text Clustering and Classification

Stephan Bloehdorn; Philipp Cimiano; Andreas Hotho

Recent work has shown improvements in text clustering and classification tasks by integrating conceptual features extracted from ontologies. In this paper we present text mining experiments in the medical domain in which the ontological structures used are acquired automatically in an unsupervised learning process from the text corpus in question. We compare results obtained using the automatically learned ontologies with those obtained using manually engineered ones. Our results show that both types of ontologies improve results on text clustering and classification tasks, whereby the automatically acquired ontologies yield an improvement competitive with the manually engineered ones.

- Text Mining | Pp. 334-341