Publications catalog - books
Computational and Statistical Approaches to Genomics
Wei Zhang ; Ilya Shmulevich (eds.)
Second Edition.
Summary/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-0-387-26287-1
Electronic ISBN
978-0-387-26288-8
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer Science+Business Media, Inc. 2006
Subject coverage
Table of contents
Microarray Image Analysis and Gene Expression Ratio Statistics
Yidong Chen; Edward R. Dougherty; Michael L. Bittner; Paul Meltzer; Jeffery Trent
Various image analysis issues have been addressed: target segmentation, background detection, target detection, and intensity measurement. Since microarray technology is still under development and image quality varies considerably, a robust and precise image analysis algorithm that reduces background interference and extracts precise signal intensity and expression ratios for each gene is critical to the success of further statistical analysis. The overall methodology discussed in this chapter has been developed and enhanced through five years of experience working with cDNA microarray images. It continues to be expanded and revised as new issues arise.
Pp. 1-19
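The abstract above names background detection, intensity measurement, and expression ratios as the core steps. As a minimal sketch only, and not the chapter's algorithm, a background-corrected log ratio for one two-channel spot could be computed as follows. The function name, its parameters, and the `floor` guard against non-positive signals are all illustrative assumptions.

```python
import math


def log2_expression_ratio(red, red_bg, green, green_bg, floor=1.0):
    """Return log2((red - red_bg) / (green - green_bg)) for one spot.

    Hypothetical helper: `floor` is an assumed lower bound that keeps
    the background-subtracted signals positive so the log is defined.
    """
    red_signal = max(red - red_bg, floor)
    green_signal = max(green - green_bg, floor)
    return math.log2(red_signal / green_signal)


# A spot whose corrected red signal is four times the green signal.
ratio = log2_expression_ratio(red=900, red_bg=100, green=300, green_bg=100)
```

Working on the log scale makes up- and down-regulation symmetric around zero, which is one common reason to prefer it over the raw ratio.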
Statistical Considerations in the Assessment of cDNA Microarray Data Obtained Using Amplification
Jing Wang; Kevin R. Coombes; Keith Baggerly; Limei Hu; Stanley R. Hamilton; Wei Zhang
Pp. 21-36
Sources of Variation in Microarray Experiments
Kathleen F. Kerr; Edward H. Leiter; Laurent Picard; Gary A. Churchill
Pp. 37-47
Studentizing Microarray Data
Keith A. Baggerly; Kevin R. Coombes; Kenneth R. Hess; David N. Stivers; Lynne V. Abruzzo; Wei Zhang
Pp. 49-59
Exploratory Clustering of Gene Expression Profiles of Mutated Yeast Strains
Merja Oja; Janne Nikkilä; Petri Törönen; Garry Wong; Eero Castrén; Samuel Kaski
Pp. 61-74
Selecting Informative Genes for Cancer Classification Using Gene Expression Data
Tatsuya Akutsu; Satoru Miyano
Boolean regression models are useful tools for various applications in nonlinear filtering, prediction, classification, and clustering. We discuss the so-called normalized maximum likelihood (NML) models for this class of models and their connections with the minimum description length principle. Examples of discriminating cancer types with Boolean regression models demonstrate the efficiency of the method, especially its ability to select sets of feature genes for discrimination at error rates significantly smaller than those obtainable with other methods.
Pp. 75-88
Finding Functional Structures in Glioma Gene-Expressions Using Gene Shaving Clustering and MDL Principle
Ciprian D. Giurcaneanu; Cristian Mircean; Gregory N. Fuller; Ioan Tabus
Pp. 89-118
Design Issues and Comparison of Methods for Microarray-Based Classification
Edward R. Dougherty; Sanju N. Attoor
Except in situations where the amount of data is large in comparison to the number of variables, classifier design and error estimation involve subtle issues. This is especially so in applications such as cancer classification where there is no prior knowledge concerning the vector-label distributions involved. It is clearly prudent to try to achieve classification using small numbers of genes and rules of low complexity (low VC dimension), and to use cross-validation when it is not possible to obtain large independent samples for testing. Even when one uses a cross-validation method such as leave-one-out estimation, one is still confronted by the high variance of the estimator. In many applications, large samples are impossible owing to either cost or availability. Therefore, it is unlikely that a statistical approach alone will provide satisfactory results. Rather, one can use the results of classification analysis to discover gene sets that potentially provide good discrimination, and then focus attention on these. In the same vein, one can utilize the common engineering approach of integrating data with human knowledge to arrive at satisfactory systems.
Pp. 119-136
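The abstract above recommends low-complexity rules and cross-validation when large independent test samples are unavailable. The following is a minimal sketch of leave-one-out error estimation, assuming a nearest-centroid rule as the low-complexity classifier. That rule and all function names are illustrative choices, not taken from the chapter.

```python
def fit_centroids(samples, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def leave_one_out_error(samples, labels):
    """Hold out each sample once, train on the rest, and count the mistakes."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        centroids = fit_centroids(train_x, train_y)
        if predict(centroids, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


# Toy data: one expression value per sample; the last sample looks mislabeled.
expression = [[0.0], [0.1], [1.0], [1.1], [0.05]]
tumor_class = ["a", "a", "b", "b", "b"]
error_rate = leave_one_out_error(expression, tumor_class)
```

With only five samples, a single held-out mistake moves the estimate by 0.2, which illustrates the high variance of the estimator that the abstract warns about.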
Analyzing Protein Sequences Using Signal Analysis Techniques
Karen M. Bloch; Gonzalo R. Arce
This chapter discusses the use of frequency and time-frequency signal processing methods for the analysis of protein sequence data. The amino acid sequence of a protein may be considered as a sequence over a twenty-symbol alphabet, or as a sequence of numerical values reflecting various physicochemical aspects of the amino acids, such as hydrophobicity, bulkiness, or electron-ion interaction potential. When primary protein sequence information is mapped into numerical values, it is possible to treat the sequence as a signal and apply well-known signal processing methods for analysis. These methods allow proteins to be clustered into functional families and can also lead to the identification of biologically active sites. The chapter illustrates these concepts using various protein families. In addition, a method for selecting appropriate numerical mappings of amino acids is introduced.
Pp. 137-161
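The mapping described in the abstract above, from amino acids to numerical values and then to a frequency-domain view, can be sketched as follows. The sketch assumes the Kyte-Doolittle hydropathy scale as the numerical mapping and a plain discrete Fourier transform as the analysis step. The abstract names hydrophobicity only in general terms, so both choices and the function names are illustrative.

```python
import cmath

# Kyte-Doolittle hydropathy values (assumed mapping; other scales are possible).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(sequence):
    """Map a one-letter amino acid sequence to a list of hydropathy values."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def power_spectrum(signal):
    """Return the DFT power at frequencies k = 0 .. n // 2 after mean removal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# A toy peptide alternating hydrophobic (I) and hydrophilic (K) residues.
spectrum = power_spectrum(to_signal("IKIKIKIK"))
peak_index = spectrum.index(max(spectrum))  # highest bin: a period of two residues
```

The direct O(n²) transform keeps the example dependency-free; for real sequences an FFT routine would be the usual choice.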
Scale-Dependent Statistics of the Numbers of Transcripts and Protein Sequences Encoded in the Genome
Vladimir A. Kuznetsov
Pp. 163-208