Publications catalog - books

Computational and Statistical Approaches to Genomics

Wei Zhang; Ilya Shmulevich (eds.)

Second Edition.

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: Not detected
Year of publication: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-0-387-26287-1

Electronic ISBN

978-0-387-26288-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer Science+Business Media, Inc. 2006

Table of contents

Statistical Methods in Serial Analysis of Gene Expression (SAGE)

Ricardo Z. N. Vêncio; Helena Brentani

In this chapter we aim to provide a guide to the state of the art in statistical methods for SAGE analysis. We only touch on some issues, in order to stay focused on differential expression detection, but we hope the main ideas will help the reader track the original literature. Estimating a tag's abundance can be as simple as dividing its observed count by the total number of sequenced tags, but it can also receive more sophisticated treatment, such as multinomial estimation, correction of potential sequencing errors, and incorporation of prior knowledge. Given an (assumed) error-corrected data set, one can then search for tags that are differentially expressed across conditions. Several methods for this are discussed, and we stress the importance of biological replication in the experimental design to capture generalizable information. Finally, we point out that only the accumulation of experimental data in public databases, with biological replication and sound statistics, can improve the general usefulness of SAGE, MPSS, or EST counting data, helping to elucidate basic and applied gene expression questions.
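As a rough illustration of the two steps the abstract describes, here is a minimal Python sketch that estimates a tag's frequency from counts (optionally with a small pseudocount) and tests one tag for differential expression between two libraries. The counts, library sizes, and the choice of Fisher's exact test are illustrative assumptions, not the chapter's own procedure.

```python
# Illustrative sketch: SAGE tag abundance estimation and a simple
# differential-expression test between two libraries (made-up numbers).
from scipy.stats import fisher_exact

def tag_abundance(count, library_size, pseudocount=0.0):
    """Estimated tag frequency; pseudocount > 0 gives a simple smoothed
    (Bayesian-style) estimate instead of the raw count/total proportion."""
    return (count + pseudocount) / (library_size + 2 * pseudocount)

# Hypothetical counts for one tag in two SAGE libraries (e.g., tumor vs. normal).
count_a, total_a = 18, 50_000
count_b, total_b = 4, 60_000

print("naive estimates:", tag_abundance(count_a, total_a), tag_abundance(count_b, total_b))
print("smoothed estimates:", tag_abundance(count_a, total_a, 1.0), tag_abundance(count_b, total_b, 1.0))

# One common way to ask whether the tag differs between libraries:
# a 2x2 table of (this tag, all other tags) x (library A, library B).
table = [[count_a, total_a - count_a],
         [count_b, total_b - count_b]]
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher's exact test: OR={odds_ratio:.2f}, p={p_value:.3g}")
```

More refined treatments, such as multinomial estimation, sequencing-error correction, and replicate-aware tests, would replace both the smoothing and the single-tag test shown here.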

Pp. 209-233

Normalized Maximum Likelihood Models for Boolean Regression with Application to Prediction and Classification in Genomics

Ioan Tabus; Jorma Rissanen; Jaakko Astola

Boolean regression models are useful tools for various applications in nonlinear filtering, prediction, classification, and clustering. We discuss the so-called normalized maximum likelihood (NML) models for these classes of models and their connection with the minimum description length (MDL) principle. Examples of discriminating cancer types with Boolean regression models demonstrate the efficiency of the method, especially its ability to select sets of feature genes that discriminate at error rates significantly smaller than those obtainable with other methods.
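The following is a simplified Python sketch of the NML/MDL idea for selecting feature genes with binary data: group the samples by the pattern of the selected genes and charge each group a Bernoulli NML code length, then prefer the gene set with the shortest total code length. This is only an approximation of the chapter's construction, and the data layout and function names are assumptions.

```python
# Simplified MDL/NML feature-gene selection sketch for a binary class label.
from itertools import combinations
from math import comb, log2
import numpy as np

def bernoulli_nml_normalizer(m):
    """Normalizing sum of the NML distribution for a Bernoulli model on m bits."""
    if m == 0:
        return 1.0
    total = 0.0
    for j in range(m + 1):
        p = j / m
        total += comb(m, j) * (p ** j) * ((1 - p) ** (m - j))  # 0**0 == 1 in Python
    return total

def nml_code_length(features, labels):
    """Code length (bits) of the 0/1 label vector given the selected binary
    features: samples are grouped by feature pattern and each group's labels
    are encoded with the Bernoulli NML model (likelihood + parametric complexity)."""
    patterns = [tuple(row) for row in features]
    length = 0.0
    for pat in set(patterns):
        grp = labels[np.array([p == pat for p in patterns])]
        m = len(grp)
        ones = int(grp.sum())
        for count in (ones, m - ones):
            if count > 0:                       # 0 * log 0 = 0 convention
                length -= count * log2(count / m)
        length += log2(bernoulli_nml_normalizer(m))
    return length

def best_feature_set(X, y, k=2):
    """Exhaustively pick the k genes (columns of binary matrix X) whose NML
    code length for y is smallest, i.e. the MDL-preferred regressor set."""
    return min(combinations(range(X.shape[1]), k),
               key=lambda cols: nml_code_length(X[:, list(cols)], y))
```

In this toy version the shortest code length plays the role of the model-selection criterion; the chapter develops the exact NML models for Boolean regression and their use in classification.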

Pp. 235-258

Inference of Genetic Regulatory Networks via Best-Fit Extensions

Harri Lähdesmäki; Ilya Shmulevich; Olli Yli-Harja; Jaakko Astola

The ability to efficiently infer the structure of Boolean networks has immense potential for understanding the regulatory interactions in real genetic networks. We have considered a learning strategy that is well suited for situations in which inconsistencies in observations are likely to occur. This strategy produces a Boolean network that makes as few misclassifications as possible and is a generalization of the well-known Consistency Problem. We have focused on the computational complexity of this problem. It turns out that for many function classes, the Best-Fit Extension Problem for Boolean networks is polynomial-time solvable, including those networks having bounded indegree and those in which no assumptions whatsoever about the functions are made. This promising result provides motivation for developing efficient algorithms for inferring network structures from gene expression data.
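A hedged sketch of the best-fit idea for a single node of a Boolean network follows: over candidate regulator sets of bounded indegree, choose for every input pattern the output value that disagrees with the fewest observed transitions. The data layout and function name are assumptions made for illustration.

```python
# Best-fit sketch for one node: find regulators and a Boolean function that
# misclassify as few observed state transitions as possible.
from itertools import combinations
import numpy as np

def best_fit_function(inputs, target, indegree=2):
    """inputs: (samples x genes) 0/1 matrix of predecessor states;
    target: 0/1 vector of the node's next state.
    Returns (regulator set, truth table dict, number of misclassifications)."""
    best = None
    for regs in combinations(range(inputs.shape[1]), indegree):
        sub = inputs[:, list(regs)]
        table, errors = {}, 0
        for pattern in {tuple(row) for row in sub}:
            mask = np.all(sub == pattern, axis=1)
            ones = int(target[mask].sum())
            zeros = int(mask.sum()) - ones
            # The error-minimizing output for this pattern is the majority label.
            table[pattern] = 1 if ones >= zeros else 0
            errors += min(ones, zeros)
        if best is None or errors < best[2]:
            best = (regs, table, errors)
    return best
```

For bounded indegree this exhaustive search runs in polynomial time in the number of genes, which mirrors the tractability result the abstract refers to.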

Pp. 259-278

Regularization and Noise Injection for Improving Genetic Network Models

Eugene van Someren; Lodewyk Wessels; Marcel Reinders; Eric Backer

Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
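A small Python sketch of the leave-one-out point made above: with a simple nearest-centroid rule on a handful of "genes" and repeated small synthetic samples, the spread of the LOO estimate across draws makes its high variance visible. The sample size, effect size, and classifier are invented for illustration only.

```python
# Illustration of the variance of leave-one-out (LOO) error estimation
# on small samples, using a nearest-centroid classifier on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def loo_error(X, y):
    """Leave-one-out error of a nearest-centroid classifier."""
    errors = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xtr, ytr = X[mask], y[mask]
        centroids = {c: Xtr[ytr == c].mean(axis=0) for c in (0, 1)}
        pred = min(centroids, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        errors += int(pred != y[i])
    return errors / len(y)

# Repeatedly draw small two-class samples (n = 20, 3 genes) and look at how
# much the LOO estimate fluctuates from draw to draw.
estimates = []
for _ in range(200):
    y = np.repeat([0, 1], 10)
    X = rng.normal(loc=y[:, None] * 0.8, scale=1.0, size=(20, 3))
    estimates.append(loo_error(X, y))
print("mean LOO error:", np.mean(estimates), "std across draws:", np.std(estimates))
```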

Pp. 279-295

Parallel Computation and Visualization Tools for Codetermination Analysis of Multivariate Gene Expression Relations

Edward B. Suh; Edward R. Dougherty; Seungchan Kim; Michael L. Bittner; Yidong Chen; Daniel E. Russ; Robert L. Martino

Pp. 297-310

Single Nucleotide Polymorphisms and Their Applications

Rudy Guerra; Zhaoxia Yu

SNPs are highly abundant in the human genome and account for most of its sequence variation, which makes them a valuable resource for population genetics, evolution, and gene mapping. In this chapter we give an overview of the major issues arising in their application to haplotype and haplotype-block estimation and to genetic association. The discussion should make clear that many statistical methods have been developed for these problems, but much remains to be understood about the relative merits of the competing methods, and perhaps more importantly about their practical utility.
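As a toy version of the kind of haplotype-estimation method the chapter surveys, the sketch below uses EM to estimate two-SNP haplotype frequencies from unphased genotypes; with two loci, the only phase-ambiguous case is the double heterozygote. The 0/1/2 minor-allele genotype coding and the example data are assumptions for illustration.

```python
# EM estimation of two-SNP haplotype frequencies from unphased genotypes.
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def _phased(g1, g2):
    """Haplotype pair for a genotype with at most one heterozygous locus
    (any pairing of the alleles gives the same haplotype multiset)."""
    a = (0, 0) if g1 == 0 else (1, 1) if g1 == 2 else (0, 1)
    b = (0, 0) if g2 == 0 else (1, 1) if g2 == 2 else (0, 1)
    return [(a[0], b[0]), (a[1], b[1])]

def em_haplotype_freqs(genotypes, n_iter=100):
    """genotypes: list of (g1, g2) with each g in {0, 1, 2} (minor-allele counts).
    Returns a dict of EM-estimated haplotype frequencies."""
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = Counter()
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:            # double heterozygote: two phases
                w_cis = freqs[(0, 0)] * freqs[(1, 1)]
                w_trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = w_cis / (w_cis + w_trans)  # E-step: posterior phase weight
                for h in ((0, 0), (1, 1)):
                    counts[h] += w
                for h in ((0, 1), (1, 0)):
                    counts[h] += 1.0 - w
            else:
                for h in _phased(g1, g2):
                    counts[h] += 1.0
        total = 2 * len(genotypes)             # M-step: renormalize counts
        freqs = {h: counts[h] / total for h in HAPLOTYPES}
    return freqs

# Example with made-up genotype data for seven individuals.
data = [(0, 0), (1, 1), (2, 2), (1, 0), (1, 1), (2, 1), (0, 1)]
print(em_haplotype_freqs(data))
```

Practical haplotype and haplotype-block methods scale this idea to many linked SNPs and handle missing data, which is where the methodological choices the chapter compares come into play.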

Pp. 311-349

The Contribution of Alternative Transcription and Alternative Splicing to the Complexity of Mammalian Transcriptomes

Mihaela Zavolan; Christian Schönbach

Pp. 351-380

Computational Imaging and Statistical Analysis of Tissue Microarrays: Quantitative Automated Analysis of Tissue Microarrays

Aaron J. Berger; Robert L. Camp; David L. Rimm

Pp. 381-403