Publications catalog - books

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

Robert Gentleman ; Vincent J. Carey ; Wolfgang Huber ; Rafael A. Irizarry ; Sandrine Dudoit (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution  Publication year  Browse  Download  Request
Not detected  2005  SpringerLink

Information

Resource type:

books

Print ISBN

978-0-387-25146-2

Electronic ISBN

978-0-387-29362-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer Science+Business Media, Inc. 2005

Table of contents

Preprocessing Overview

W. Huber; R. A. Irizarry; R. Gentleman

In this chapter, we give a brief overview of the tasks of microarray data preprocessing. There are a variety of microarray technology platforms in use, and each of them requires specific considerations. These are described in detail in the other chapters in this part of the book. This overview chapter describes relevant data structures and provides some broadly applicable theoretical background.

Keywords: Nominal Concentration; Probe Intensity; Stepwise Approach; Expression Matrix; Laboratory Information Management System.

Part I - Preprocessing data from genomic experiments | Pp. 3-12

Preprocessing High-density Oligonucleotide Arrays

B. M. Bolstad; R. A. Irizarry; L. Gautier; Z. Wu

High-density oligonucleotide expression arrays are a widely used microarray platform. Affymetrix GeneChip arrays dominate this market. An important distinction between the GeneChip and other technologies is that on GeneChips, multiple short probes are used to measure gene expression levels. This makes preprocessing particularly important when using this platform. This chapter begins by describing how to import probe-level data into the system and how these data can be examined using the facilities of the AffyBatch class. Then we will describe background adjustment, normalization, and summarization methods. Functionality for GeneChip probe-level data is provided by the affy, affyPLM, affycomp, gcrma, and affypdnn packages. All these tools are useful for preprocessing probe-level data stored in an AffyBatch object into expression-level data stored in an exprSet object. Because there are many competing methods for this preprocessing step, it is useful to have a way to assess the differences. In Bioconductor, this can be carried out using the affycomp package, which we discuss briefly.

Keywords: Perfect Match; Background Correction; Probe Intensity; Expression Measure; Quantile Normalization.

Part I - Preprocessing data from genomic experiments | Pp. 13-32

Quality Assessment of Affymetrix GeneChip Data

B. M. Bolstad; F. Collin; J. Brettschneider; K. Simpson; L. Cope; R. A. Irizarry; T.P. Speed

This chapter covers quality assessment for Affymetrix GeneChip data. The focus is on procedures available from the affy and affyPLM packages. Initially, some exploratory plots provided by the affy package, including images of the raw probe-level data, boxplots, histograms, and M vs A plots, are examined. Next, methods for assessing RNA degradation are discussed; specifically, we compare the standard procedures recommended by Affymetrix and RNA degradation plots. Finally, we investigate how appropriate probe-level models yield good quality assessment tools. Chip pseudo-images of residuals and weights obtained from fitting robust linear models to the probe-level data can be used as a visual tool for identifying artifacts on GeneChip microarrays. Other output from the probe-level modeling tools provides summary plots that may be used to identify aberrant chips.

Keywords: Probe Intensity; GeneChip Data; GeneChip Microarrays; Spatial Artifact; Robust Linear Model.

Part I - Preprocessing data from genomic experiments | Pp. 33-47

Preprocessing Two-Color Spotted Arrays

Y. H. Yang; A. C. Paquet

Preprocessing of two-color spotted arrays can be broadly divided into two main categories: quality assessment and normalization. In this chapter, we will focus on functions from the arrayQuality and marray packages that perform these tasks. The chapter begins by describing various data structures and tools available in these packages for reading and storing primary data from two-color spotted arrays. This is followed by descriptions of various exploratory tools such as MAplots, spatial plots, and boxplots to assess data quality of an array. Finally, algorithms available for performing appropriate normalization to remove sources of systematic variation are discussed. We will illustrate the above-mentioned functions using a case study.

Keywords: Background Intensity; Target Information; Diagnostic Plot; Limma Package; Color Palette.

Part I - Preprocessing data from genomic experiments | Pp. 49-69

Cell-Based Assays

W. Huber; F. Hahne

This chapter describes methods and tools for processing and visualizing data from high-throughput cell-based assays. Such assays are used to examine the contribution of genes to a biological process or phenotype (Carpenter and Sabatini, 2004). In principle, this can be done for any gene or combination of genes and for any biological process of interest. There is a variety of technologies, but all of them rely on the availability of genomic resources such as whole genome sequences, full-length cDNA libraries, and siRNA collections, or on libraries of protein-specific ligands (compounds). Typically, all or at least large parts of the experimental procedures and data collection are automated. Cell-based assays offer the potential for clustering of genes based on their functional profiles (Piano et al., 2002) and epistatic analyses to elucidate complex genetic networks (Tong et al., 2004).

Keywords: Bivariate Normal Distribution; Epistatic Analysis; Rectangular Table; Complex Genetic Network; General Purpose Tool.

Part I - Preprocessing data from genomic experiments | Pp. 71-90

SELDI-TOF Mass Spectrometry Protein Data

X. Li; R. Gentleman; X. Lu; Q. Shi; J.D. Iglehart; L. Harris; A. Miron

The term proteome is used to denote the set of proteins encoded by a genome, and proteomics is the study of the expression and interactions of the proteins, which can depend on many factors such as cell type, treatment, tissue type, developmental state, and disease state. Conceptually, this is similar to the transcriptomics technologies discussed in Chapters 2-4; however, due to the more complicated chemistry of proteins, compared to RNA, the field has a different and diverse set of technologies and produces a wide range of specific challenges. Here we discuss one particular mass spectrometry technology.

Keywords: Peak Detection; Average Standard Deviation; Array Surface; Baseline Subtraction; Calibration Spectrum.

Part I - Preprocessing data from genomic experiments | Pp. 91-109

Meta-data Resources and Tools in Bioconductor

R. Gentleman; V. J. Carey; J. Zhang

Closing the gap between knowledge of sequence and knowledge of function requires aggressive, integrative use of biological research databases of many different types. For greatest effectiveness, analysis processes and interpretation of analytic results must be guided using relevant knowledge about the systems under investigation. However, this knowledge is often widely scattered and encoded in a variety of formats. In this section, we consider some of the different sources of biological information as well as the software tools that can be used to access these data and to integrate them into an analysis. Bioconductor provides tools for creating, distributing, and accessing annotation resources in ways that have been found effective in workflows for statistical analysis of microarray and other high-throughput assays.

Keywords: Gene Ontology; Hash Table; Enzyme Commission; Gene Ontology Annotation; Evidence Code.

Part II - Meta-data: biological annotation and visualization | Pp. 113-133

Querying On-line Resources

V. J. Carey; D. Temple Lang; J. Gentry; J. Zhang; R. Gentleman

Many different meta-data resources are available on-line, and several of these provide a Web services model for interactions. R and Bioconductor support the use of different technologies (including HTTP, SOAP, and XML-RPC) for accessing different Web services. In this chapter we describe the tools for accessing Web services and demonstrate their use in a number of examples. Our view is very similar to that proposed by Stein (2002), who emphasized Web services as the basic computational resource for bioinformatics. Well-designed Web services will play an essential role in solving many bioinformatic problems, and R has the capability of playing many different roles, both on the client and the server side.

Keywords: Simple Object Access Protocol; Code Chunk; Entrez System; Basic Computational Resource; U95Av2 GeneChip.

Part II - Meta-data: biological annotation and visualization | Pp. 135-146

Interactive Outputs

C. A. Smith; W. Huber; R. Gentleman

In this chapter, we discuss creation of interactive outputs. We focus on the generation of reports, marked up in HTML, that link sets of genes with on-line resources, such as those supplied by the EBI or the NCBI, and which can be shared between different investigators. We discuss both the simple creation of these pages as well as some of the underlying software tools that can be used to construct new and different outputs. Although linked Web pages form the most commonly used outputs, we also consider some other tools that can be used to produce Web graphics that respond to the mouse in different ways.

Keywords: Annotation Data; Scalable Vector Graphic; Cascade Style Sheet; Annotation Package; Interactive Output.

Part II - Meta-data: biological annotation and visualization | Pp. 147-160

Visualizing Data

W. Huber; X. Li; R. Gentleman

Visualization is an essential part of exploring, analyzing, and reporting data. Visualizations are used in all chapters in this monograph and in most scientific papers. Here we review some of the recurring concepts in visualizing genomic and biological data. We discuss scatterplots to investigate the dependency between pairs of variables, heatmaps for the visualization of matrix-like data, the visualization of distance relationships between objects, and the visualization of data along genomic coordinates.

Keywords: Acute Myeloid Leukemia; Visualize Data; Visualization Method; Sense Strand; Antisense Strand.

Part II - Meta-data: biological annotation and visualization | Pp. 161-179