Publications catalog - books

Computational and Statistical Approaches to Genomics

Wei Zhang ; Ilya Shmulevich (eds.)

Second Edition.

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-0-387-26287-1

Electronic ISBN

978-0-387-26288-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer Science+Business Media, Inc. 2006

Table of contents

Microarray Image Analysis and Gene Expression Ratio Statistics

Yidong Chen; Edward R. Dougherty; Michael L. Bittner; Paul Meltzer; Jeffery Trent

Various image analysis issues have been addressed: target segmentation, background detection, target detection, and intensity measurement. Since microarray technology is still under development and image quality varies considerably, a robust and precise image analysis algorithm that reduces background interference and extracts precise signal intensity and expression ratios for each gene is critical to the success of further statistical analysis. The overall methodology discussed in this chapter has been developed and enhanced through five years of experience working with cDNA microarray images. It continues to be expanded and revised as new issues arise.

Pp. 1-19

Statistical Considerations in the Assessment of cDNA Microarray Data Obtained Using Amplification

Jing Wang; Kevin R. Coombes; Keith Baggerly; Limei Hu; Stanley R. Hamilton; Wei Zhang

Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.

Pp. 21-36

Sources of Variation in Microarray Experiments

Kathleen F. Kerr; Edward H. Leiter; Laurent Picard; Gary A. Churchill

This paper proposes a novel method for biometric identification based on arm swing motions, with a template update to improve long-term stability. In our previous work, we studied arm swing identification and proposed a basic method for realizing a personal identification function on mobile terminals. The method treats the acceleration signals of an arm swing motion as individual characteristics and uses DP-matching as a tolerant similarity measure between two arm swing motions, which enables users to unlock a mobile terminal simply by swinging it. However, the method has a problem with long-term stability: the arm swing motions of the same individual tend to fluctuate from trial to trial, and the difference between the enrolled and trial motions increases over time. Therefore, in this paper, we propose an approach for updating the enrollment template for DP-matching to solve this problem. We employ an efficient adaptive update method using a minimum route determination algorithm in DP-matching. Identification experiments involving 12 persons over 6 weeks confirm that the proposed method achieves an equal error rate of 4.0%, compared with 14.7% for the conventional method.

Pp. 37-47

Studentizing Microarray Data

Keith A. Baggerly; Kevin R. Coombes; Kenneth R. Hess; David N. Stivers; Lynne V. Abruzzo; Wei Zhang

Pp. 49-59

Exploratory Clustering of Gene Expression Profiles of Mutated Yeast Strains

Merja Oja; Janne Nikkilä; Petri Törönen; Garry Wong; Eero Castrén; Samuel Kaski

Pp. 61-74

Selecting Informative Genes for Cancer Classification Using Gene Expression Data

Tatsuya Akutsu; Satoru Miyano

Boolean regression models are useful tools for various applications in nonlinear filtering, prediction, classification, and clustering. We discuss here the so-called normalized maximum likelihood (NML) models for these classes of models and discuss the connections with the minimum description length principle. Examples of discrimination of cancer types with these models for the Boolean regression demonstrate the efficiency of the method, especially its ability to select sets of feature genes for discrimination at error rates significantly smaller than those obtainable with other methods.

Pp. 75-88

Finding Functional Structures in Glioma Gene-Expressions Using Gene Shaving Clustering and MDL Principle

Ciprian D. Giurcaneanu; Cristian Mircean; Gregory N. Fuller; Ioan Tabus

Pp. 89-118

Design Issues and Comparison of Methods for Microarray-Based Classification

Edward R. Dougherty; Sanju N. Attoor

Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.

Pp. 119-136
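The leave-one-out estimation mentioned in the abstract above can be illustrated with a minimal sketch. The nearest-centroid rule, the function names, and the toy two-gene data set below are all assumptions chosen for illustration; they are not taken from the chapter.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        # Column-wise mean of the rows belonging to this class.
        centroids[label] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Sorting the labels makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda lab: dist2(centroids[lab], x))


def leave_one_out_error(xs, ys):
    """Hold out each sample once, train on the rest, and count misclassifications."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Hypothetical toy data: six samples, two "genes" per sample.
samples = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
labels = ["A", "A", "A", "B", "B", "B"]
loo_error = leave_one_out_error(samples, labels)
print(loo_error)
```

With only six samples, a single misclassified sample shifts the estimate by one sixth, which gives a feel for the high variance of the estimator that the abstract warns about.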

Analyzing Protein Sequences Using Signal Analysis Techniques

Karen M. Bloch; Gonzalo R. Arce

This chapter discusses the use of frequency and time-frequency signal processing methods for the analysis of protein sequence data. The amino acid sequence of a protein may be considered as a twenty-symbol alphabet sequence, or it may be considered as a sequence of numerical values reflecting various physicochemical aspects of the amino acids such as hydrophobicity, bulkiness, or electron-ion interaction potential. When primary protein sequence information is mapped into numerical values, it is possible to treat the sequence as a signal and apply well-known signal processing methods for analysis. These methods allow proteins to be clustered into functional families and can also lead to the identification of biologically active sites. This chapter discusses frequency and time-frequency methods for protein sequence analysis and illustrates these concepts using various protein families. In addition, a method for selecting appropriate numerical mappings of amino acids is introduced.

Pp. 137-161
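As a rough illustration of the idea in the abstract above, a protein sequence can be mapped to numbers and examined in the frequency domain. The sketch below assumes the Kyte-Doolittle hydropathy scale as the numerical mapping and a plain discrete Fourier transform; the chapter may use different mappings and transforms, and the function names and toy sequence are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values (an assumed choice of numerical mapping).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(sequence):
    """Map each residue to its hydropathy value and subtract the mean."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    average = sum(values) / len(values)
    return [v - average for v in values]


def power_spectrum(signal):
    """Squared DFT magnitudes for frequency indices 0 .. n // 2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Hypothetical toy sequence: hydrophobic L alternating with hydrophilic K.
toy_sequence = "LK" * 8
spec = power_spectrum(to_signal(toy_sequence))
peak = max(range(len(spec)), key=spec.__getitem__)
print(peak, len(toy_sequence) / peak)  # frequency index and period in residues
```

For this alternating toy sequence the spectrum peaks at the highest frequency index, corresponding to a period of two residues. Real proteins give far less clean spectra, so peaks would normally be compared across members of a family.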

Scale-Dependent Statistics of the Numbers of Transcripts and Protein Sequences Encoded in the Genome

Vladimir A. Kuznetsov

Pp. 163-208