Publications catalog - books
Statistical Modeling and Analysis for Complex Data Problems
Pierre Duchesne; Bruno Rémillard (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-0-387-24554-6
Electronic ISBN
978-0-387-24555-3
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer Science+Business Media, Inc. 2005
Subject coverage
Table of contents
Dependence Properties of Meta-Elliptical Distributions
Belkacem Abdous; Christian Genest; Bruno Rémillard
A distribution is said to be meta-elliptical if its associated copula is elliptical. Various properties of these copulas are critically reviewed in terms of association measures, concepts, and stochastic orderings, including tail dependence. Most results pertain to the bivariate case.
Pp. 1-15
The Statistical Significance of Palm Beach County
David Andrews; Andrey Feuerverger
This paper emphasizes certain issues and problems that arise when a statistical analysis must be undertaken on complex and evolving data, under tight constraints of time. In such circumstances, it typically is not possible to develop extensive or problem-specific methodology, yet an answer may be required almost immediately, and must be correct, defensible, understandable, and carry impact. It must also be able to withstand the test of comparison with analyses yet to come.
We illustrate these points by presenting the background to, and an analysis of, the State of Florida results in the 7 November, 2000 U.S. Presidential elections with emphasis on Palm Beach County. The analysis we discuss was carried out in the days immediately following that election. The statistical evidence strongly suggested that the use of the ‘butterfly’ ballot in Palm Beach County had resulted in a significant number of votes having been counted for presidential candidate Pat Buchanan which had not so been intended. The design of the ‘butterfly’ ballot suggests that many of these votes had likely been intended for the Democratic candidate Al Gore. This confusion was sufficient to affect the overall outcome of the 2000 U.S. Presidential election, conferring the office to George W. Bush, and this result is statistically significant.
Pp. 17-40
Bayesian Functional Estimation of Hazard Rates for Randomly Right Censored Data Using Fourier Series Methods
Jean-François Angers; Brenda MacGibbon
This paper discusses a Bayesian functional estimation method, based on Fourier series, for the estimation of the hazard rate from randomly right-censored data. A nonparametric approach, assuming that the hazard rate has no specific and prespecified parametric form, is used. A simulation study is also carried out to compare the proposed methodology with the estimators introduced in Antoniadis et al. (1999). The method is illustrated with a real data set consisting of survival data from bone marrow transplant patients.
Pp. 41-57
Conditions for the Validity of F-Ratio Tests for Treatment and Carryover Effects in Crossover Designs
François Bellavance; Serge Tardif
Continuous data from crossover trials are often analysed using ordinary least squares with the assumption of independent errors. Because each experimental unit receives a sequence of treatments and repeated measurements are collected, it is more realistic to assume that the errors within an experimental unit are correlated. In this paper, we extend to crossover designs the conditions on the covariance structure of the errors, found by Huynh and Feldt (1970) for randomized block and split-plot designs, that will not invalidate the F-ratio tests for treatment and carryover effects. We also show that results on optimal crossover designs remain valid under this more general structure of the covariance matrix.
Pp. 59-73
Bias in Estimating the Variance of K-Fold Cross-Validation
Yoshua Bengio; Yves Grandvalet
Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation, based on a single computation of the K-fold cross-validation estimator. The analysis that accompanies this result is based on the eigen-decomposition of the covariance matrix of errors, which has only three different eigenvalues corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how it can make naive estimators (that don't take into account the error correlations due to the overlap between training and test sets) grossly underestimate variance. This is confirmed by numerical experiments in which the three components of the variance are compared when the difficulty of the learning problem and the number of folds are varied.
Pp. 75-95
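The eigen-structure mentioned in the abstract above can be checked on a toy case. The following is a minimal sketch, not the authors' code: it assumes equal-size test folds and a covariance matrix of per-example errors with one value on the diagonal, one within a fold, and one between folds. The names `sigma2`, `omega`, `gamma` and all helper functions are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch only. The symbols sigma2, omega, gamma and the
# equal-size block layout are assumptions made for this example.

def block_covariance(n, k, sigma2, omega, gamma):
    """Covariance of n per-example errors split into k equal test folds."""
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    m = n // k  # fold size
    return [
        [sigma2 if i == j else (omega if i // m == j // m else gamma)
         for j in range(n)]
        for i in range(n)
    ]

def matvec(a, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def eigenvalue_along(a, v):
    """Return lam if a @ v equals lam * v up to rounding, else raise."""
    av = matvec(a, v)
    pivot = next(i for i, x in enumerate(v) if x != 0)
    lam = av[pivot] / v[pivot]
    if any(abs(p - lam * q) > 1e-9 for p, q in zip(av, v)):
        raise ValueError("not an eigenvector")
    return lam

def variance_of_mean(a):
    """Variance of the average of the errors: sum of all entries / n^2."""
    n = len(a)
    return sum(sum(row) for row in a) / n ** 2

n, k = 6, 3
sigma2, omega, gamma = 1.0, 0.3, 0.1
cov = block_covariance(n, k, sigma2, omega, gamma)

# Three kinds of eigenvector: a contrast inside one fold, a contrast
# between two folds, and the constant vector.
within = eigenvalue_along(cov, [1, -1, 0, 0, 0, 0])
between = eigenvalue_along(cov, [1, 1, -1, -1, 0, 0])
overall = eigenvalue_along(cov, [1] * n)

true_var = variance_of_mean(cov)   # accounts for correlated errors
naive_var = sigma2 / n             # pretends the errors are independent

print("eigenvalues:", within, between, overall)
print("variance of the mean:", true_var, "naive:", naive_var)
```

With positive `omega` and `gamma`, the naive figure is smaller than the variance computed from the full matrix, which is the kind of underestimation the abstract describes. The variance of the mean equals the eigenvalue of the constant vector divided by `n`, so in this toy setup only that one eigenvalue drives it.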
Effective Construction of Modified Histograms in Higher Dimensions
Alain Berlinet; Laurent Rouvière
Density estimation raises delicate problems in higher dimensions, especially when strong convergence is required and data marginals can be highly correlated. Modified histograms have been introduced to circumvent the problem of low bin counts when convergence is considered in the sense of information divergence. These estimates are defined from some reference probability density and an associated partition which is defined in the univariate case from the quantiles of the reference density. Therefore, in the multivariate case, the definition of the partition causes an additional problem related to the lack of total order. In this paper, we present a method for constructing modified multivariate histograms such that the corresponding partition is well adapted to the observed data. The approach is based on a data-driven coordinate system selected by cross-validation. We discuss the performance of our estimate with the help of a finite sample simulation study.
Pp. 97-119
On Robust Diagnostics at Individual Lags Using RA-ARX Estimators
Imad Bou-Hamad; Pierre Duchesne
The aim of this paper is to present robust individual tests in autoregressive models with exogenous variables. We derive the asymptotic distribution of the RA-ARX estimators introduced in Duchesne (2004a), following an approach similar to Bustos and Yohai (1986). In particular, we give the asymptotic covariance structure of the RA-ARX estimators. Using this result, we establish the asymptotic distribution of the robustified residual autocorrelations under the null hypothesis of adequacy, which is normal. Some simulation results are reported.
Pp. 121-140
Bootstrap Confidence Intervals for Periodic Preventive Replacement Policies
Pascal Croteau; Robert Cléroux; Christian Léger
This paper presents bootstrap confidence intervals for the optimal time and the minimum cost of a periodic preventive replacement policy. The bootstrap is applied to different parametric and nonparametric estimators of the renewal function and thus of the cost function. A simulation study shows that the bootstrap approach can prove useful in practice for some estimators.
Pp. 141-159
Statistics for Comparison of Two Independent cDNA Filter Microarrays
André Dabrowski
The great interest in gene expression in microbiology has led to the development of the cDNA array experiment. A very large number of genes can be tested simultaneously for their level of expression through these arrays. This gives rise to data where there are but a few observations for any one gene upon which to base a test of significance, and a very large number of these tests to carry out. Here we consider experiments with two filter microarrays: one from a control and one from a treated preparation. The question of interest is whether or not the largest observed gene-wise differences are indicative of true differences, or whether the results can be due to background variation. We develop a parametric approach distinct from those based on resampling methods, and obtain a hypothesis test applicable to this experiment that can be easily implemented in standard statistical software. We illustrate the approach on data on herpes-infected cells.
Pp. 161-178
Large Deviations for Interacting Processes in the Strong Topology
Donald A. Dawson; Pierre Del Moral
Strong large deviations principles for a general class of discrete generation and interacting particle systems are developed. The analysis is essentially conducted through an original projective interpretation of the τ-topology, combined with a powerful projective transfer result presented by the first author with J. Gärtner. These large deviations principles simplify and encompass the ones obtained in an earlier joint work of the second author with A. Guionnet. They are illustrated with simplified versions of McKean-Vlasov diffusions and Boltzmann-type collision models. We also describe the impact of this analysis on a recently developed class of genealogical and interacting particle interpretations of nonlinear Feynman-Kac-Schrödinger path measures.
Pp. 179-208