Publications catalog - books
Bioinformatics and Computational Biology Solutions Using R and Bioconductor
Robert Gentleman ; Vincent J. Carey ; Wolfgang Huber ; Rafael A. Irizarry ; Sandrine Dudoit (eds.)
Abstract/Description - provided by the publisher
Not available.
Keywords - provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-0-387-25146-2
Electronic ISBN
978-0-387-29362-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer Science+Business Media, Inc. 2005
Subject coverage
Table of contents
Analysis Overview
V. J. Carey; R. Gentleman
Chapters in this part of the book address tasks common in the downstream analysis (after preprocessing) of high-dimensional data. The basic assumption is that preprocessing has led to a sample for which it is reasonable to make comparisons between samples or between feature-vectors assembled across samples. Most examples are based on microarray data, but the principles are much broader and apply to many other sources of data. In this overview, the basic concepts and assumptions are briefly sketched.
Keywords: Microarray Data; Expression Measure; Array Comparative Genomic Hybridization; cDNA Array; Computational Learning Theory.
Part III - Statistical analysis for genomic experiments | Pp. 183-187
Distance Measures in DNA Microarray Data Analysis
R. Gentleman; B. Ding; S. Dudoit; J. Ibrahim
Both supervised and unsupervised machine learning techniques require selection of a measure of distance between, or similarity among, the objects to be classified or clustered. Different measures of distance or similarity will lead to different machine learning performance. The appropriateness of a distance measure will typically depend on the types of features being used in the learning process. In this chapter, we examine the properties of distance measures in the context of the analysis of gene expression data from DNA microarray experiments. The feature vectors represent transcript levels, i.e., mRNA abundance or relative abundance, either across biological samples (if comparing genes) or across genes (if comparing samples). We consider different aspects of distances that help address the heterogeneity of the data and differences in interpretation depending on the source of the data (cDNA arrays versus short oligonucleotide arrays). Traditional measures, such as Euclidean and Manhattan distances as well as correlation-based distances, are considered. Other dissimilarity functions, which involve comparisons of distributions based on the Kullback-Leibler and mutual information criteria, are also examined.
Keywords: Mutual Information; Distance Measure; Linear Discriminant Analysis; Mahalanobis Distance; Expression Measure.
Part III - Statistical analysis for genomic experiments | Pp. 189-208
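As an illustration of why the choice of distance matters, here is a minimal Python sketch of three of the traditional measures named in the abstract (Euclidean, Manhattan, and a correlation-based distance taken here as one minus the Pearson correlation). It is not taken from the chapter; the function names and the two expression profiles are hypothetical.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """One minus the Pearson correlation of x and y (assumed definition)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Hypothetical expression profiles: gene_b is gene_a scaled by two.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]

print("Euclidean:  ", euclidean(gene_a, gene_b))
print("Manhattan:  ", manhattan(gene_a, gene_b))
print("Correlation:", correlation_distance(gene_a, gene_b))
```

The two profiles have the same shape but different magnitude, so the correlation-based distance is essentially zero while the Euclidean and Manhattan distances are not.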
Cluster Analysis of Genomic Data
K. S. Pollard; M. J. van der Laan
We provide an overview of existing partitioning and hierarchical clustering algorithms in R. We discuss statistical issues and methods in choosing the number of clusters, the choice of clustering algorithm, and the choice of dissimilarity matrix. We also show how to visualize a clustering result by plotting ordered dissimilarity matrices in R. A new R package hopach, which implements the Hierarchical Ordered Partitioning And Collapsing Hybrid (HOPACH) algorithm, is presented (van der Laan and Pollard, 2003). The methodology is applied to a renal cell cancer gene expression data set.
Keywords: Cluster Algorithm; Cluster Result; Fuzzy Cluster; Dissimilarity Matrix; Hierarchical Cluster Algorithm.
Part III - Statistical analysis for genomic experiments | Pp. 209-228
Analysis of Differential Gene Expression Studies
D. Scholtens; A. von Heydebreck
In this chapter, we focus on the analysis of differential gene expression studies. Many microarray studies are designed to detect genes associated with different phenotypes, for example, the comparison of cancer tumors and normal cells. In some multifactor experiments, genetic networks are perturbed with various treatments to understand the effects of those treatments and their interactions with each other in the dynamic cellular network. For even the simplest experiments, investigators must consider several issues for appropriate gene selection. We discuss strategies for gene-at-a-time analyses, nonspecific and meta-data driven prefiltering techniques, and commonly used test statistics for detecting differential expression. We show how these strategies and statistical tools are implemented and used in Bioconductor. We also demonstrate the use of factorial models for probing complex biological systems and highlight the importance of carefully coordinating known cellular behavior with statistical modeling to make biologically relevant inference from microarray studies.
Keywords: Gene Ontology; Outlier Detection; Limma Package; Single Outlier; Multiple Testing Procedure.
Part III - Statistical analysis for genomic experiments | Pp. 229-248
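A gene-at-a-time analysis can be sketched as computing one test statistic per gene. The Python example below uses the Welch two-sample t-statistic as an assumed stand-in for the "commonly used test statistics" mentioned in the abstract. It does not reproduce the Bioconductor implementation, and the gene names and expression values are hypothetical.

```python
import math
import statistics


def welch_t(group_1, group_2):
    """Welch two-sample t-statistic (unequal variances)."""
    mean_diff = statistics.mean(group_1) - statistics.mean(group_2)
    standard_error = math.sqrt(
        statistics.variance(group_1) / len(group_1)
        + statistics.variance(group_2) / len(group_2)
    )
    return mean_diff / standard_error


# Hypothetical expression values: gene -> (tumor samples, normal samples).
expression = {
    "gene_1": ([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]),
    "gene_2": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
}

# One statistic per gene, computed independently of the other genes.
t_statistics = {
    gene: welch_t(tumor, normal) for gene, (tumor, normal) in expression.items()
}

for gene, t_value in t_statistics.items():
    print(gene, round(t_value, 3))
```

Because every gene yields its own statistic, the resulting p-values then need a multiple testing adjustment.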
Multiple Testing Procedures: the multtest Package and Applications to Genomics
K. S. Pollard; S. Dudoit; M. J. van der Laan
The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Typical testing scenarios are illustrated by applying various MTPs implemented in multtest to the Acute Lymphoblastic Leukemia (ALL) data set of Chiaretti et al. (2004), with the aim of identifying genes whose expression measures are associated with (possibly censored) biological and clinical outcomes.
Keywords: Acute Lymphoblastic Leukemia; False Discovery Rate; Null Distribution; Expression Measure; Augmentation Procedure.
Part III - Statistical analysis for genomic experiments | Pp. 249-271
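To give a feel for what a stepwise multiple testing procedure does, the following Python sketch computes Holm step-down adjusted p-values, which control the family-wise error rate. This is a simpler, marginal p-value procedure used here as an assumption for illustration. It is not the resampling-based approach implemented in multtest, and the function name and input p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the hypotheses, from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[index])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
print([round(p, 4) for p in adjusted])
```

A gene is rejected at level alpha when its adjusted p-value is at most alpha, so the adjusted values can be compared directly against the chosen error rate.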
Machine Learning Concepts and Tools for Statistical Genomics
V. J. Carey
In this chapter, supervised machine learning methods are described in the context of microarray applications. The most widely used families of machine learning methods are described, along with various approaches to learner assessment. The Bioconductor interfaces to machine learning tools are described and illustrated. Key problems of model selection and interpretation are reviewed in examples.
Keywords: Machine Learning; Random Forest; Class Label; Machine Learning Method; Generalization Error.
Part III - Statistical analysis for genomic experiments | Pp. 273-292
Ensemble Methods of Computational Inference
T. Hothorn; M. Dettling; P. Bühlmann
Prognostic modeling of tumor classes, disease status, and survival time based on information obtained from gene expression profiling techniques is studied in this chapter. The basic principles of ensemble methods like bagging, random forests, and boosting are explained. The application of those methods to data from patients suffering acute lymphoblastic leukemia or renal cell cancer is illustrated. The problem of identifying the best method for a certain prediction task is addressed by means of benchmark experiments.
Keywords: Acute Lymphoblastic Leukemia; Random Forest; Bootstrap Sample; Renal Cell Cancer; Ensemble Method.
Part III - Statistical analysis for genomic experiments | Pp. 293-311
Browser-based Affymetrix Analysis and Annotation
C. A. Smith
webbioc is a CGI-based interface to Bioconductor methods for preprocessing and analyzing Affymetrix data. It wraps up the functionality of a number of Bioconductor packages into a consistent environment that can be deployed for use by small groups or large departments. Without ever seeing a command prompt, it will take the user from raw data to annotated lists of the most significantly differentially expressed genes. It will optionally make use of a back-end computer cluster for batch processing. This chapter will discuss the appropriate circumstances under which webbioc should be deployed and the pros and cons of using it versus the typical command line environment of R. Installation and configuration will be fully covered. Use of the Web-based interface will be visually demonstrated. Finally, we will describe how to expand the interface by adding additional analysis modules.
Keywords: Bioconductor Package; Command Line Interface; Multiple Testing Procedure; Shell Script; Portable Batch System.
Part III - Statistical analysis for genomic experiments | Pp. 313-326
Introduction and Motivating Examples
R. Gentleman; W. Huber; V. J. Carey
Part IV - Graphs and networks | Pp. 329-336
Graphs
W. Huber; R. Gentleman; V. J. Carey
In this chapter, we describe and discuss various definitions and algorithms for graphs, their representation, and uses. The presentation is formal and we leave references to software and usage for the later chapters. Our goal is to use graphs to explore, navigate, represent, and model biological data. Hence, we must often specialize general concepts and ideas to the tasks at hand. Some of our motivation is taken from the area of social network analysis where many similar problems have been considered and there is a rich history of both concepts and methods.
Keywords: Bipartite Graph; Random Graph; Social Network Analysis; Edge Weight; Subgroup Find.
Part IV - Graphs and networks | Pp. 337-346
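One common way to represent an undirected graph and navigate it is an adjacency list combined with breadth-first search. The Python sketch below is a generic illustration under that assumption, not the representation used in the book's software. The node names and edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency list (dict of sets) from edge pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical interaction edges between four genes.
edges = [
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
    ("geneA", "geneC"),
]
adjacency = build_adjacency(edges)
distances = shortest_path_lengths(adjacency, "geneA")
print(sorted(distances.items()))
```

Nodes that cannot be reached from the source are simply absent from the returned dictionary, which makes disconnected components easy to detect.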