Publications catalog - books

Statistical Methods for Biostatistics and Related Fields

Wolfgang Härdle; Yuichi Mori; Philippe Vieu

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: Not detected | Year of publication: 2007 | Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-32690-8

Electronic ISBN

978-3-540-32691-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Survival Analysis

Makoto Tomita

This paper focuses on the Fréchet distance, introduced by Maurice Fréchet in 1906 to measure the proximity between curves (Fréchet (1906)). The major limitation of this proximity measure is that it is based on the closeness of the values, independently of the local trends. To alleviate this setback, we propose a dissimilarity index that extends the above estimates to include information on the dependency between local trends. A synthetic dataset is generated to reproduce and illustrate the conditions under which the Fréchet distance is limited. The proposed dissimilarity index is then compared with the Fréchet estimate, and results illustrating its efficiency are reported.

Part I - Biostatistics | Pp. 207-217
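The limitation described in the abstract above (the Fréchet distance looks at how close the values are, not at whether the local trends agree) can be seen with a minimal sketch. It uses the discrete Fréchet distance of Eiter and Mannila, a standard approximation of the continuous distance; whether the chapter works with the discrete or the continuous version is not stated in this listing, and the proposed dissimilarity index itself is not reproduced. Curves are assumed to be lists of (time, value) points, and the function name is our own.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as sequences of points.

    ca[i][j] is the shortest "leash" that lets two walkers traverse p[0..i]
    and q[0..j] monotonically (no backtracking), filled in by dynamic programming.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both curves need at least one point")
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance between the two points
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


# Two short curves (time, value): one rises, the other falls, but values stay close.
rising = [(0.0, 0.0), (1.0, 0.1)]
falling = [(0.0, 0.1), (1.0, 0.0)]
print(discrete_frechet(rising, falling))
```

For these two curves the distance equals the vertical gap of 0.1, even though their local trends are opposite, which is the kind of situation the proposed index is meant to handle.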

Ozone Pollution Forecasting Using Conditional Mean and Conditional Quantiles with Functional Covariates

Hervé Cardot; Christophe Crambes; Pascal Sarda

Microarrays are part of a new class of biotechnologies that allow the expression levels of thousands of genes to be monitored simultaneously. In microarray data analysis, comparing gene expression profiles across different conditions and selecting biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large data sets. To identify genes with altered expression under two experimental conditions, we describe in this chapter a new nonparametric statistical approach. Specifically, we propose estimating the distributions of a t-type statistic and its null statistic using kernel methods. A comparison of these two distributions by means of a likelihood ratio test can identify genes with significantly changed expression. A method for calculating the cut-off point and the acceptance region is also derived. This methodology is applied to a leukemia data set containing expression levels of 7129 genes. The corresponding results are compared to the traditional t-test and the normal mixture model.

Part II - Related Sciences | Pp. 221-243
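The idea of comparing the distribution of a t-type statistic with that of its null counterpart through kernel estimates can be sketched as follows. This is not the chapter's procedure: the Welch statistic, the construction of the null statistic by one random permutation of each gene's sample labels, the Gaussian kernel and Silverman's bandwidth are all our assumptions, and the likelihood ratio test with its cut-off point is reduced here to computing the estimated ratio f0(z)/f(z) at each gene's observed statistic. All function names are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch-type t statistic for the difference in means of samples x and y."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def gaussian_kde(sample, bandwidth=None):
    """Gaussian kernel density estimate of a 1-D sample, returned as a function."""
    n = len(sample)
    if bandwidth is None:
        # Silverman's rule-of-thumb bandwidth (our assumption, not the chapter's choice)
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t):
        return scale * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in sample)

    return density


def permuted_statistic(x, y, rng):
    """Null version of the statistic: recompute it after shuffling the group labels."""
    pooled = list(x) + list(y)
    rng.shuffle(pooled)
    return welch_t(pooled[: len(x)], pooled[len(x):])


def density_ratios(genes, seed=0):
    """Estimated ratio f0(z) / f(z) at each gene's observed statistic z.

    genes is a list of (condition_1_values, condition_2_values) pairs.
    Small ratios flag genes whose statistic is unlikely under the null.
    """
    rng = random.Random(seed)
    observed = [welch_t(x, y) for x, y in genes]
    null = [permuted_statistic(x, y, rng) for x, y in genes]
    f, f0 = gaussian_kde(observed), gaussian_kde(null)
    return [f0(z) / f(z) for z in observed]


# Synthetic example: 100 genes, 5 replicates per condition, first 3 genes shifted.
data_rng = random.Random(42)
genes = [([data_rng.gauss(0, 1) for _ in range(5)], [data_rng.gauss(0, 1) for _ in range(5)])
         for _ in range(100)]
genes[:3] = [([data_rng.gauss(4, 1) for _ in range(5)], [data_rng.gauss(0, 1) for _ in range(5)])
             for _ in range(3)]
ratios = density_ratios(genes)
```

A gene would then be declared changed when its ratio falls below a threshold; how that threshold is derived is exactly the cut-off calculation the chapter describes, so it is left open here.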

Nonparametric Functional Methods: New Tools for Chemometric Analysis

Frédéric Ferraty; Aldo Goia; Philippe Vieu

In this contribution, we have shown how spectrometric data can be successfully analysed by considering them as curve data and by using the recent nonparametric methodology for curve data. Note, however, that all the statistical background is presented in a general way (and not only for spectrometric data). Similarly, the XploRe quantlets that we provide can be used directly in any other applied setting involving curve data. For brevity, and because it was not the purpose here, we only presented the results given by the nonparametric functional methodology without discussing any comparison with alternative methods (but relevant references on these points are given throughout the contribution).

Also for brevity, we presented just two statistical problems (namely regression from curve data and curve discrimination) among the several problems that can be treated by nonparametric functional methods (on this point too, our contribution contains several references to other problems that could be attacked similarly). We chose these two problems for two reasons: first, these issues are highly relevant to many applied studies involving curve analysis, and second, their theoretical and practical importance led to the emergence of various automated computational procedures.

Part II - Related Sciences | Pp. 245-264
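Regression from curve data, one of the two problems mentioned above, is commonly handled with a kernel estimator that weights each training curve by its distance to a new curve. The sketch below is a minimal Python version written under our own assumptions (all curves sampled on a common equispaced grid, an L2 semi-metric, and an asymmetric quadratic kernel); it is not the XploRe quantlets from the chapter, which also offer other semi-metrics, and the function names are hypothetical.

```python
import math


def l2_semimetric(curve_a, curve_b, step):
    """Approximate L2 distance between two curves sampled on the same equispaced grid."""
    return math.sqrt(step * sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


def quadratic_kernel(u):
    """Asymmetric quadratic kernel on [0, 1]; distances are never negative."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def functional_kernel_regression(curves, responses, new_curve, bandwidth, step):
    """Nadaraya-Watson type estimate of E[Y | X = new_curve] from curve data."""
    weights = [quadratic_kernel(l2_semimetric(c, new_curve, step) / bandwidth) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training curve lies within the bandwidth; increase it")
    return sum(w * y for w, y in zip(weights, responses)) / total


# Synthetic example: curves k * sin(pi t) on [0, 1] with scalar response k.
step = 0.1
grid = [i * step for i in range(11)]
training = [[k * math.sin(math.pi * t) for t in grid] for k in range(5)]
targets = [float(k) for k in range(5)]
query = [2.5 * math.sin(math.pi * t) for t in grid]
print(functional_kernel_regression(training, targets, query, bandwidth=1.0, step=step))
```

Only the curves for k = 2 and k = 3 fall inside the bandwidth here, and they are equally far from the query, so the estimate is their average, 2.5. The choice of semi-metric matters much more than the kernel in practice; for spectrometric data a derivative-based semi-metric is a common alternative.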

Variable Selection in Principal Component Analysis

Yuichi Mori; Masaya Iizuka; Tomoyuki Tarumi; Yutaka Tanaka

While there exist several criteria by which to select a reasonable subset of variables in the context of PCA, we introduce herein variable selection using criteria in modified PCA (M.PCA) among others.

In order to perform such variable selection via XploRe, the quantlib vaspca, which reads all the necessary quantlets for selection, is first called, and then the quantlet mpca is run using a number of selection parameters.

In the first four sections we briefly explain variable selection in PCA, outline M.PCA, and describe the flow of four selection procedures, based mainly on , , and . In the last two sections, we illustrate the quantlet mpca and its performance with two numerical examples.

Part II - Related Sciences | Pp. 265-283
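The listing does not name the M.PCA selection criteria, so the sketch below only illustrates the general shape of a backward elimination procedure: start from all variables and repeatedly drop the one whose removal hurts the criterion least. As a stand-in criterion we use the RV coefficient between the full centered data and the retained columns; this is a simplification of our own, the actual M.PCA criteria are defined in the chapter and differ from it, and the function names are hypothetical (they do not correspond to the vaspca or mpca quantlets).

```python
import math


def center_columns(rows):
    """Subtract each column mean from a row-major data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def gram(rows, cols):
    """n x n matrix of inner products between observations, using only `cols`."""
    return [[sum(a[j] * b[j] for j in cols) for b in rows] for a in rows]


def rv_coefficient(rows, cols_a, cols_b):
    """RV coefficient between two column subsets of a centered data matrix."""
    ga, gb = gram(rows, cols_a), gram(rows, cols_b)

    def frob(g, h):  # trace(G H) for symmetric G, H, i.e. the Frobenius inner product
        return sum(x * y for ra, rb in zip(g, h) for x, y in zip(ra, rb))

    return frob(ga, gb) / math.sqrt(frob(ga, ga) * frob(gb, gb))


def backward_elimination(rows, keep):
    """Greedily drop the variable whose removal keeps RV(all, subset) highest."""
    rows = center_columns(rows)
    all_cols = list(range(len(rows[0])))
    if not 1 <= keep <= len(all_cols):
        raise ValueError("keep must be between 1 and the number of variables")
    subset = list(all_cols)
    history = []  # (removed variable, criterion value after removal)
    while len(subset) > keep:
        def score(j):
            return rv_coefficient(rows, all_cols, [c for c in subset if c != j])

        drop = max(subset, key=score)
        subset.remove(drop)
        history.append((drop, rv_coefficient(rows, all_cols, subset)))
    return subset, history


# Toy data: variable 0 carries almost all of the variation, variable 1 very little.
data = [[10.0, 0.1], [-10.0, 0.2], [5.0, -0.1], [-5.0, -0.2]]
selected, steps = backward_elimination(data, keep=1)
print(selected, steps)
```

The history records the criterion after each removal, which is what one would inspect to decide how many variables to keep.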

Spatial Statistics

Pavel Čížek; Wolfgang Härdle; Jürgen Symanzik

Part II - Related Sciences | Pp. 285-304

Functional Data Analysis

Michal Benko

In many fields of applied statistics, the object of interest depends on some continuous parameter, e.g. continuous time. Typical examples in biostatistics are growth curves or temperature measurements. Although, for technical reasons, we can measure temperature only at discrete intervals, temperature is clearly a continuous process. Temperature during one year is a function with argument “time”. By collecting one-year temperature functions for several years or for different weather stations, we obtain a bunch (sample) of functions — . The questions arising in the statistical analysis of functional data are basically identical to those of the standard statistical analysis of univariate or multivariate objects. From a theoretical point of view, the design of a stochastic model for functional data and the statistical analysis of a functional data set can often be taken one-to-one from conventional multivariate analysis. In fact, the first way to deal with functional data is to discretize them and perform a standard multivariate analysis on the resulting random vectors. The aim of this chapter is to introduce functional data analysis (FDA) and to discuss the practical usage and implementation of FDA methods.

This chapter is organized as follows: Section 16.1 defines the basic mathematical and statistical framework for FDA, and Section 16.2 introduces the most popular implementation of functional data analysis, the functional basis expansion. In Section 16.4 we present the basic theory of functional principal components and smoothed functional principal components, together with a practical application to the temperature data set of the Canadian weather stations.

Part II - Related Sciences | Pp. 305-327
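A basis expansion represents each discretized curve by a few coefficients. For periodic data such as annual temperature, a Fourier basis is a common choice; the sketch below computes least squares Fourier coefficients under the assumption that the observations lie on an equispaced grid covering exactly one period. Under that assumption the discrete sines and cosines are orthogonal, so the least squares fit reduces to simple averages. The function names are ours, the data are synthetic (not the Canadian weather data), and the chapter may use other bases or add a roughness penalty.

```python
import math


def fourier_coefficients(values, n_harmonics):
    """Least squares Fourier coefficients of a curve observed at t_i = i / n, i = 0..n-1.

    Assumes the grid covers exactly one period, so the discrete sines and
    cosines are orthogonal and the least squares solution is a set of averages.
    """
    n = len(values)
    if 2 * n_harmonics >= n:
        raise ValueError("need 2 * n_harmonics < number of grid points")
    mean_level = sum(values) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(values))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(values))
        coeffs.append((a_k, b_k))
    return mean_level, coeffs


def evaluate_expansion(mean_level, coeffs, t):
    """Evaluate the fitted expansion at time t (in units of one period)."""
    return mean_level + sum(
        a * math.cos(2 * math.pi * k * t) + b * math.sin(2 * math.pi * k * t)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


# Synthetic "monthly" curve: level 3, one annual cosine wave, one half-year sine wave.
monthly = [3 + 2 * math.cos(2 * math.pi * i / 12) - math.sin(4 * math.pi * i / 12)
           for i in range(12)]
level, harmonics = fourier_coefficients(monthly, n_harmonics=3)
print(level, harmonics)
```

Once each curve is reduced to its coefficient vector, standard multivariate tools (including principal components) can be applied to the coefficients, which is the route from basis expansion to functional PCA.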

Analysis of Failure Time with Microearthquakes Applications

Graciela Estévez-Pérez; Alejandro Quintela del Río

Some gene expression data contain outliers and noise because of experimental error. In clustering, outliers and noise can result in false positives and false negatives. This motivates us to develop a weighting method that adjusts the expression data so that the effects of outliers and noise decrease, resulting in fewer false positives and false negatives in clustering.

In this paper, we describe the weighting adjustment method and apply it to a yeast cell cycle data set. Based on the adjusted yeast cell cycle expression data, the hierarchical clustering method with a correlation coefficient measure performs better than that based on standardized expression data. The clustering method based on the adjusted data can group some functionally related genes together and yields higher quality clusters.

Part II - Related Sciences | Pp. 329-345
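The clustering step mentioned above (hierarchical clustering with a correlation coefficient measure) can be sketched in a few lines. The weighting adjustment itself is not described in enough detail in this listing to reproduce, so it is omitted; the distance 1 - r between expression profiles and the average-linkage merging rule are our assumptions about a typical setup, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(ca, cb):
        # Average pairwise distance between the members of two clusters
        return sum(dist[a][b] for a in ca for b in cb) / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]  # j > i, so index i still points at the merged cluster
    return clusters


# Two rising and two falling profiles on very different scales.
profiles = [[1, 2, 3, 4], [10, 20, 30, 41], [4, 3, 2, 1], [40, 31, 20, 10]]
print(average_linkage(profiles, n_clusters=2))
```

Because the distance is based on correlation, profiles are grouped by the shape of their expression pattern rather than by absolute level, which is also why outlying measurements that distort a profile's shape can move a gene into the wrong cluster.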

Polychotomous Regression: Application to Landcover Prediction

Frédéric Ferraty; Martin Paegelow; Pascal Sarda

Part II - Related Sciences | Pp. 347-356

The Application of Fuzzy Clustering to Satellite Images Data

Hizir Sofyan; Muzailin Affan; Khaled Bawahidi

Part II - Related Sciences | Pp. 357-366