Publication catalogue - books
From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V. University of Magdeburg, March 9-11, 2005
Myra Spiliopoulou ; Rudolf Kruse ; Christian Borgelt ; Andreas Nürnberger ; Wolfgang Gaul (eds.)
Abstract/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-31313-7
Electronic ISBN
978-3-540-31314-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer Berlin · Heidelberg 2006
Table of contents
On External Indices for Mixtures: Validating Mixtures of Genes
Ivan G. Costa; Alexander Schliep
Mixture models represent the results of gene expression cluster analysis more naturally than 'hard' partitions. The same holds for the representation of gene labels, such as functional annotations, where a gene is often assigned to more than one annotation term. Another important characteristic of functional annotations is their higher degree of detail compared with groups of co-expressed genes. In other words, genes with similar function should be grouped together, but the converse does not hold. Both facts, however, have been neglected by the validation studies of gene expression analysis presented so far. To overcome the first problem, we propose an external index that extends the corrected Rand index to the comparison of two mixtures. To address the second, more challenging problem, we cluster the terms of the functional annotation, in order to handle the difference in coarseness between the two mixtures being compared. We use simulated and biological data to show the usefulness of our proposals. The results show that distinct solutions can only be differentiated after the component clustering has been applied.
- Bioinformatics and Biostatistics | Pp. 662-669
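Costa and Schliep extend the corrected Rand index to mixtures. For reference, the standard hard-partition version (the adjusted Rand index of Hubert and Arabie, 1985) can be sketched in Python as below. The mixture extension proposed in the paper is not reproduced here, and the function name is a hypothetical choice for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Corrected (adjusted) Rand index for two hard partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")
    # Contingency-table cell counts n_ij and the two marginals
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    # Expected index under random labelling with fixed marginals
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])    # identical up to relabelling
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
```

The correction for chance is what lets two partitions with different numbers of groups be compared on a common scale, where 0 corresponds to chance agreement.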
Tests for Multiple Change Points in Binary Markov Sequences
Joachim Krauth
In Krauth (2005) we derived a finite conditional conservative test for a change point in a Bernoulli sequence with first-order Markov dependence. This approach was based on the property of intercalary independence of Markov processes (Dufour and Torrès, 2000) and on the CUSUM statistic considered in Krauth (1999, 2000) for the case of independent binomial trials. Here, we derive finite conditional tests for multiple change points in binary first-order Markov sequences, additionally using conditional modified maximum likelihood estimates for multiple change points (Krauth, 2004) and exact Fisher tests.
- Bioinformatics and Biostatistics | Pp. 670-677
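For orientation, a generic CUSUM-type statistic for a single change point in a 0/1 sequence is sketched below. It assumes the split after position k that maximizes |S_k - k*m/n|, where S_k is the number of ones among the first k trials and m the total number of ones. This is not Krauth's exact statistic: it ignores the Markov dependence and computes no conditional test or p-value, and the function name is hypothetical.

```python
def cusum_change_point(x):
    """Return (k_hat, max_abs_cusum) for a binary sequence x.

    D_k = (number of ones among the first k values) - k * m / n, where m is
    the total number of ones; k_hat is the split after position k that
    maximizes |D_k|. Independence of the trials is implicitly assumed.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(v not in (0, 1) for v in x):
        raise ValueError("sequence must be binary")
    m = sum(x)
    best_k, best_d = 1, -1.0
    partial = 0
    for k in range(1, n):
        partial += x[k - 1]
        d = abs(partial - k * m / n)
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d


k_hat, stat = cusum_change_point([0] * 5 + [1] * 5)
```

A large maximum indicates that the success rate before and after k_hat differs; judging how large is "large" is exactly what the finite conditional tests in the paper provide.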
UnitExpressions: A Rational Normalization Scheme for DNA Microarray Data
Alfred Ultsch
A new normalization scheme for DNA microarray data, called UnitExpressions, is introduced. The central idea is to derive a precise model of unexpressed genes, the category to which most of the expression rates in a typical microarray experiment belong. Pareto probability density estimation (PDE) and EM are used to calculate a precise model of this distribution. UnitExpressions represent a lower bound on the probability that a gene on a microarray is expressed. With UnitExpressions, experiments from different microarrays can be compared, even across different studies. UnitExpressions are compared to standardized LogRatios for distance calculation in hierarchical clustering.
- Bioinformatics and Biostatistics | Pp. 678-683
A Ridge Classification Method for High-dimensional Observations
Martin Grüning; Siegfried Kropf
Current experimental techniques, such as gene expression analysis with microarrays, lead to situations in which the number of variables far exceeds the number of observations. In such cases the standard classification methodology fails because the covariance matrix is singular. One way to circumvent this problem is to use ridge estimates instead of the sample covariance matrix.
Raudys and Skurichina presented an analytic formula for the asymptotic error of the one-parameter ridge classification rule. Building on their approach, we derived a new formula which, unlike theirs, is also valid when the covariance matrix is singular. Under suitable conditions, the formula allows one to calculate the ridge parameter that minimizes the classification error. Simulation results are presented.
- Classification of High-dimensional Biological and Medical Data | Pp. 684-691
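To illustrate why a ridge estimate sidesteps the singularity problem, here is a minimal numpy sketch, assuming the one-parameter ridge estimate S + λI of the pooled covariance plugged into the linear discriminant rule with equal priors. The ridge parameter is fixed by hand (lam=1.0, an arbitrary choice); the asymptotic error formula and the optimal λ derived in the paper are not implemented, and the function names and simulated data are hypothetical.

```python
import numpy as np


def fit_ridge_lda(X0, X1, lam):
    """Two-class linear discriminant using the ridge covariance estimate S + lam * I."""
    if lam <= 0:
        raise ValueError("lam must be positive so that S + lam * I is invertible")
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled sample covariance: its rank is at most n0 + n1 - 2, so it is
    # singular whenever the number of variables exceeds that.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    S_ridge = S + lam * np.eye(S.shape[0])
    # Discriminant direction w = (S + lam * I)^{-1} (mu1 - mu0)
    w = np.linalg.solve(S_ridge, mu1 - mu0)
    # Assign to class 1 when w.x exceeds the projected midpoint of the two means
    threshold = w @ (mu0 + mu1) / 2
    return w, threshold


def predict(X, w, threshold):
    return (X @ w > threshold).astype(int)


rng = np.random.default_rng(0)
p = 50  # more variables than the 20 training observations
X0 = rng.normal(0.0, 1.0, size=(10, p))
X1 = rng.normal(1.0, 1.0, size=(10, p))
w, t = fit_ridge_lda(X0, X1, lam=1.0)

X_test = np.vstack([rng.normal(0.0, 1.0, size=(100, p)),
                    rng.normal(1.0, 1.0, size=(100, p))])
y_test = np.repeat([0, 1], 100)
accuracy = (predict(X_test, w, t) == y_test).mean()
```

With lam=0 the solve would fail on this data, since S has rank at most 18 in 50 dimensions; any positive lam makes S + lam * I positive definite.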
Assessing the Trustworthiness of Clustering Solutions Obtained by a Function Optimization Scheme
Ulrich Möller; Dörte Radke
We present a method for finding clustering structures that are both good and trustworthy. The method analyzes re-clustering results obtained by varying the search path in the space of partitions. From the scatter of results, the joint optimum of the given quality criteria is determined, and the re-occurrence probability of this optimum (called optimum consensus) is estimated. Then the finest structure that emerged robustly with scores typical of high partition quality is determined. When applied to tumor gene expression benchmark data, the method assigned fewer tissue samples to the wrong class than methods based on either consensus or quality criteria.
- Classification of High-dimensional Biological and Medical Data | Pp. 692-699
Variable Selection for Discrimination of More Than Two Classes Where Data are Sparse
Gero Szepannek; Claus Weihs
In classification, the required number of observations grows drastically as the number of variables increases. In this paper we present an approach that achieves the maximal possible variable selection by splitting a multi-class classification problem into pairwise problems. The principle exploits the possibility that a variable that discriminates between two classes does not necessarily do so for every class pair.
We further present the construction of a classification rule from the pairwise solutions by means of the Pairwise Coupling algorithm of Hastie and Tibshirani (1998). The suggested procedure can be applied to any classification method. Finally, situations with a lack of data in multidimensional spaces are investigated on different simulated data sets to illustrate the problem and the possible gain. The principle is compared to the classical approach of linear and quadratic discriminant analysis.
- Classification of High-dimensional Biological and Medical Data | Pp. 700-707
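The coupling step used by Szepannek and Weihs can be sketched as the iterative scheme of Hastie and Tibshirani (1998), which searches for class probabilities p whose implied pairwise probabilities p_i / (p_i + p_j) best match the pairwise estimates r_ij. Assumptions: the pairwise classifiers that produce r are not shown, equal weights are used by default (Hastie and Tibshirani weight each pair by its training-set size n_ij), and the function name is hypothetical.

```python
def pairwise_coupling(r, weights=None, tol=1e-10, max_iter=1000):
    """Estimate class probabilities from pairwise estimates r[i][j] of
    P(class i | class i or class j).

    Requires r[i][j] + r[j][i] == 1 and 0 < r[i][j] < 1 for i != j;
    the diagonal of r is ignored.
    """
    k = len(r)
    if weights is None:
        # Hypothetical default: equal weights instead of pairwise sample sizes n_ij
        weights = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j and not 0.0 < r[i][j] < 1.0:
                raise ValueError("off-diagonal r[i][j] must lie strictly between 0 and 1")
    p = [1.0 / k] * k  # start from uniform class probabilities
    for _ in range(max_iter):
        p_old = p[:]
        for i in range(k):
            # mu_ij = p_i / (p_i + p_j) is the model's fitted pairwise probability
            num = sum(weights[i][j] * r[i][j] for j in range(k) if j != i)
            den = sum(weights[i][j] * p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den
            total = sum(p)
            p = [v / total for v in p]  # renormalize; mu_ij is unaffected by scaling
        if max(abs(a - b) for a, b in zip(p, p_old)) < tol:
            break
    return p


true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.5 for j in range(3)]
     for i in range(3)]
estimate = pairwise_coupling(r)
```

Because r is built here from known probabilities, the iteration should recover them; with real pairwise classifiers the r_ij are generally inconsistent, and the coupled p is a compromise between them.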
The Assessment of Second Primary Cancers (SPCs) in a Series of Splenic Marginal Zone Lymphoma (SMZL) Patients
Stefano De Cantis; Anna Maria Taormina
The purpose of this study is to estimate the risk of second primary cancer (SPC) in 129 consecutive patients with splenic marginal zone lymphoma (SMZL) diagnosed in three Italian haematological centres. The person-years method was used, in which the expected number of cases is derived as the sum of the products of age- and sex-specific rates and the corresponding time at risk. The SPC Standardized Incidence Ratio (SIR) was 2.03 (95% confidence interval [1.05, 3.56], p < 0.05), and the corresponding Absolute Excess Risk (AER) was 145.8 per 10,000 SMZL patients per year. Our findings show a high frequency of additional cancers in patients with SMZL and suggest that the incidence rate of SPCs is significantly different from that expected in the general population.
- Medical and Health Sciences | Pp. 708-715
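The person-years computation behind the SIR and AER can be sketched as follows. The strata, rates, and counts below are invented for illustration and are not the study's data; rates are assumed to be expressed per person-year, the function names are hypothetical, and the confidence interval for the SIR (an exact Poisson interval is a common choice) is omitted.

```python
def expected_cases(rates, person_years):
    """Expected count E: sum over strata of reference rate times person-years at risk."""
    if rates.keys() != person_years.keys():
        raise ValueError("rates and person-years must cover the same strata")
    return sum(rates[s] * person_years[s] for s in rates)


def standardized_incidence_ratio(observed, expected):
    """SIR = O / E."""
    return observed / expected


def absolute_excess_risk(observed, expected, total_person_years, per=10000):
    """AER = (O - E) / person-years, scaled to `per` patients per year."""
    return (observed - expected) / total_person_years * per


# Invented illustrative strata (age band, sex); rates are per person-year.
rates = {("60-69", "M"): 0.01, ("60-69", "F"): 0.005}
person_years = {("60-69", "M"): 200.0, ("60-69", "F"): 400.0}
observed = 8

E = expected_cases(rates, person_years)
sir = standardized_incidence_ratio(observed, E)
aer = absolute_excess_risk(observed, E, sum(person_years.values()))
```

Here E = 4 expected cases against 8 observed gives SIR = 2.0, and the 4 excess cases over 600 person-years give an AER of about 66.7 per 10,000 patients per year.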
Heart Rate Classification Using Support Vector Machines
Michael Vogt; Ulrich Moissl; Jochen Schaab
This contribution describes a classification technique that improves heart rate estimation during hemodialysis treatments. After the heart rate is estimated from the pressure signal of the dialysis machine, a classifier decides whether it has been identified correctly and rejects it if necessary. Since the classifier is a support vector machine, particular attention is paid to the automatic selection of its user parameters. In this context, different optimization techniques are compared, including a gradient projection method as the most recent development.
- Medical and Health Sciences | Pp. 716-723
Visual Mining in Music Collections
Fabian Mörchen; Alfred Ultsch; Mario Nöcker; Christian Stamm
We describe a system for organizing large music collections with databionic mining techniques. Visualization based on perceptually motivated audio features and Emergent Self-Organizing Maps enables the unsupervised discovery of timbrally consistent clusters that may or may not correspond to musical genres and artists. We demonstrate the visualization capabilities of the U-Map. Intuitive browsing of large music collections is offered based on the paradigm of topographic maps. The user can navigate the sound space and interact with the maps to play music or show the context of a song.
- Music Analysis | Pp. 724-731
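The U-Map builds on the U-Matrix idea: each unit of a trained Emergent Self-Organizing Map receives a height derived from the distances between its prototype vector and those of its grid neighbors, so that cluster borders appear as ridges. Below is a minimal numpy sketch for an already-trained map on a toroidal grid, assuming an 8-neighborhood and the mean distance as height (the exact definition used by the authors may differ). Training the map and extracting the audio features are not shown, and the function name is hypothetical.

```python
import numpy as np


def u_matrix_torus(weights):
    """U-Matrix heights for a self-organizing map on a toroidal grid.

    weights has shape (rows, cols, dim). Each unit's height is the mean
    Euclidean distance between its prototype and those of its 8 grid
    neighbors, with wrap-around at the borders (toroidal topology).
    """
    heights = np.zeros(weights.shape[:2])
    shifts = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for dr, dc in shifts:
        # np.roll wraps around, which implements the toroidal neighborhood
        neighbor = np.roll(weights, shift=(dr, dc), axis=(0, 1))
        heights += np.linalg.norm(weights - neighbor, axis=-1)
    return heights / len(shifts)


# Toy map: a single outlying prototype forms a "mountain" in the U-Matrix
grid = np.zeros((5, 5, 2))
grid[2, 2] = [3.0, 4.0]
h = u_matrix_torus(grid)
```

Units inside a homogeneous region get low heights (valleys), while units separating dissimilar prototypes get high ones; displayed as elevation, this is the topographic-map view that the browsing paradigm relies on.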
Modeling Memory for Melodies
Daniel Müllensiefen; Christian Hennig
The aim of the presented study was to find structural descriptions of melodies that influence recognition memory for melodies. Twenty-four melodies were played twice to 42 test persons. In the second turn, some of the melodies were changed, and the subjects were asked whether they thought the melody was exactly the same as in the first turn. The variables used to predict the subjects' judgments comprise data about the subjects' musical experience, features of the original melody and its position in the piece of music, and information about the change between the first and second turns. Classification and regression methods were applied and tested on a subsample. The prediction problem turned out to be difficult. The results seem to be strongly influenced by differences between subjects and between melodies that were not captured by the regressor variables.
- Music Analysis | Pp. 732-739