Publications catalog - books



Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings

Ion Măndoiu ; Alexander Zelikovsky (eds.)

Conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7, 2007 - May 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72030-0

Electronic ISBN

978-3-540-72031-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

GFBA: A Biclustering Algorithm for Discovering Value-Coherent Biclusters

Xubo Fei; Shiyong Lu; Horia F. Pop; Lily R. Liang

Clustering has been one of the most popular approaches used in gene expression data analysis. A clustering method is typically used to partition genes according to the similarity of their expression under different conditions. However, it is often the case that some genes behave similarly only on a subset of conditions and their behavior is uncorrelated over the rest of the conditions. Because traditional clustering methods fail to identify such gene groups, the biclustering paradigm was recently introduced to overcome this limitation. In contrast to traditional clustering, a biclustering method produces biclusters, each of which identifies a set of genes and a set of conditions under which these genes behave similarly. The boundary of a bicluster is usually fuzzy in practice, as genes and conditions can belong to multiple biclusters at the same time but with different membership degrees. However, to the best of our knowledge, a method that can discover fuzzy value-coherent biclusters is still missing. In this paper, (i) we propose a new fuzzy bicluster model for value-coherent biclusters; (ii) based on this model, we define an objective function whose minimum will characterize good fuzzy value-coherent biclusters; and (iii) we propose a genetic-algorithm-based method, Genetic Fuzzy Biclustering Algorithm (GFBA), to identify fuzzy value-coherent biclusters. Our experiments show that GFBA is very efficient in converging to the global optimum.

Pp. 1-12

Significance Analysis of Time-Course Gene Expression Profiles

Fang-Xiang Wu

This paper proposes a statistical method for significance analysis of time-course gene expression profiles, called SATgene. SATgene models time-dependent gene expression profiles by autoregressive equations plus Gaussian noise, and time-independent gene expression profiles by constants plus Gaussian noise. The F-test for regression analysis is used to calculate the confidence probability (significance level) that a time-course gene expression profile is not time-independent. The user can use this confidence probability to select significantly expressed genes from a time-course gene expression dataset. One synthetic dataset and one biological dataset were employed to evaluate the performance of SATgene against traditional gene selection methods: the pairwise R-fold change method and the standard deviation method. The results show that SATgene outperforms the traditional methods.

Pp. 13-24
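
The F-test idea in the SATgene abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the autoregressive order, so the sketch assumes an AR(1) model, and the function name `f_statistic_ar1` and the two example profiles are hypothetical. It compares the residual sum of squares of a constant model with that of an AR(1) model fitted by least squares.

```python
def f_statistic_ar1(profile):
    """F statistic for a constant model versus an AR(1) model (assumed order).

    Both models are fitted to profile[1:], so the constant model is the
    AR(1) model with its slope fixed at zero (the two models are nested).
    """
    y = profile[1:]    # responses x_t for t = 2..T
    z = profile[:-1]   # lagged predictors x_{t-1}
    n = len(y)
    if n < 3:
        raise ValueError("need at least four time points")
    y_mean = sum(y) / n
    z_mean = sum(z) / n
    # Constant model: residual sum of squares around the mean.
    rss_const = sum((v - y_mean) ** 2 for v in y)
    # AR(1) model y = intercept + slope * z, ordinary least squares.
    szz = sum((v - z_mean) ** 2 for v in z)
    szy = sum((zv - z_mean) * (yv - y_mean) for zv, yv in zip(z, y))
    slope = szy / szz
    intercept = y_mean - slope * z_mean
    rss_ar = sum((yv - (intercept + slope * zv)) ** 2 for zv, yv in zip(z, y))
    # One extra parameter in the AR(1) model, n - 2 residual degrees of freedom.
    return (rss_const - rss_ar) / (rss_ar / (n - 2))


# Hypothetical profiles: one with a clear temporal trend, one roughly constant.
trend = [1.0, 1.5, 2.2, 2.9, 3.1, 3.8, 4.4, 5.2]
flat = [2.0, 2.1, 2.05, 1.9, 2.0, 1.95, 2.1, 2.0]
print(f_statistic_ar1(trend), f_statistic_ar1(flat))
```

Under the null hypothesis of a time-independent profile, this statistic would be referred to an F distribution with 1 and n - 2 degrees of freedom to obtain a significance level. A larger value indicates stronger evidence of time dependence.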

Data-Driven Smoothness Enhanced Variance Ratio Test to Unearth Responsive Genes in 0-Time Normalized Time-Course Microarray Data

Juntao Li; Jianhua Liu; R. Krishna Murthy Karuturi

Discovering responsive or differentially expressed genes in time-course microarray studies is an important step before further interpretation is carried out. The statistical challenge in this task arises from the high prevalence of studies in which the following hold: (1) no or too few replicates; (2) a 0-time or starting-point reference; and (3) an undefined or unknown pattern of response. One simple and effective criterion that comes to the rescue is the smoothness criterion, which assumes that a responsive gene exhibits a smooth pattern of response whereas a non-responsive gene exhibits a non-smooth response. Smoothness of response may be guaranteed if the expression is sufficiently sampled, and it can be measured in terms of the first-order or serial autocorrelation of the gene expression time-course using the Durbin-Watson (DW) test. However, the DW test ignores the variance of the response, which also plays an important role in the discovery of responsive genes, while variance alone is not appropriate because noise variance is nonuniform across genes. Hence, we propose a novel Data-driven Smoothness Enhanced Variance Ratio Test (dSEVRaT), which effectively combines smoothness and variance of the gene expression time-course. We demonstrate that dSEVRaT does significantly better than the DW test as well as other tests on both simulated and real data. Further, we demonstrate that dSEVRaT can address both 0-time normalized data and other data equally well.

Pp. 25-36
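
The Durbin-Watson statistic mentioned in the abstract above can be computed in a few lines. This sketch covers only the standard DW statistic, not dSEVRaT itself. As an assumption, deviations from the series mean stand in for regression residuals, and the function name and example series are hypothetical.

```python
def durbin_watson(series):
    """Durbin-Watson statistic of a series, using deviations from the mean
    as residuals (an assumption made for this illustration)."""
    mean = sum(series) / len(series)
    residuals = [value - mean for value in series]
    # Sum of squared successive differences of the residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(r * r for r in residuals)
    return numerator / denominator


# Hypothetical expression time-courses.
smooth = [0.0, 0.5, 1.0, 1.4, 1.6, 1.5, 1.2, 0.8]
jagged = [0.0, 1.0, -0.8, 0.9, -1.1, 0.7, -0.9, 1.0]
print(durbin_watson(smooth), durbin_watson(jagged))
```

The statistic lies between 0 and 4. Values well below 2 indicate positive serial autocorrelation (a smooth response), values near 2 indicate no serial correlation, and values above 2 indicate an alternating pattern. The statistic is unchanged when the series is rescaled, which is consistent with the abstract's remark that the DW test ignores the variance of the response.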

Efficiently Finding the Most Parsimonious Phylogenetic Tree Via Linear Programming

Srinath Sridhar; Fumei Lam; Guy E. Blelloch; R. Ravi; Russell Schwartz

Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we introduce an integer linear programming formulation to find the most parsimonious phylogenetic tree from a set of binary variation data. The method uses a flow-based formulation that could use exponential numbers of variables and constraints in the worst case. The method has, however, proved extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods. The program solves several large mtDNA and Y-chromosome instances within a few seconds, giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality.

Pp. 37-48
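
To make the parsimony objective in the abstract above concrete, the following sketch scores a given rooted binary tree on binary character data with Fitch's classical small-parsimony algorithm. This is a different and much simpler technique than the integer linear programming formulation the paper introduces: it only evaluates one fixed tree and does not search for the optimal one. The tree encoding (nested tuples of leaf names), the function names and the example data are all hypothetical.

```python
def fitch_site(tree, states):
    """Return (candidate_states, changes) for one character on a rooted
    binary tree given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):          # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:                         # children agree: no extra change
        return common, left_cost + right_cost
    # Children disagree: one state change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Total number of state changes over all sites (Fitch parsimony)."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical binary variation data for four taxa.
sequences = {"A": "000", "B": "001", "C": "110", "D": "111"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, sequences), parsimony_score(tree_2, sequences))
```

On this toy input the first topology needs fewer changes than the second, so it is the more parsimonious of the two.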

A Multi-Stack Based Phylogenetic Tree Building Method

Róbert Busa-Fekete; András Kocsor; Csaba Bagyinka

Here we introduce a new Multi-Stack (MS) based phylogenetic tree building method. The Multi-Stack approach organizes the candidate subtrees (i.e. those having the same number of leaves) into limited priority queues, always selecting the k-best subtrees according to their distance estimation error. Using the k-best subtrees, our method iteratively applies a novel subtree joining strategy to generate candidate higher-level subtrees from the existing low-level ones. This new MS method uses the Constrained Least Squares Criteria (CLSC), which guarantees the non-negativity of the edge weights.

The method was evaluated on real-life datasets as well as on artificial data. Our empirical study consists of three very different biological domains, and the artificial tests were carried out by applying a proper model population generator which evolves the sequences according to the predetermined branching pattern of a randomly generated model tree. The MS method was compared with the Unweighted Pair Group Method (UPGMA), Neighbor-Joining (NJ), Maximum Likelihood (ML) and Fitch-Margoliash (FM) methods in terms of Branch Score Distance (BSD) and Distance Estimation Error (DEE). The results show clearly that the MS method can achieve improvements in building phylogenetic trees.

Pp. 49-60

A New Linear-Time Heuristic Algorithm for Computing the Parsimony Score of Phylogenetic Networks: Theoretical Bounds and Empirical Performance

Guohua Jin; Luay Nakhleh; Sagi Snir; Tamir Tuller

Phylogenies play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks, rather than trees, of relationships.

In a series of papers, we have extended the maximum parsimony (MP) criterion to phylogenetic networks, demonstrated its appropriateness, and established the intractability of the problem of scoring the parsimony of a phylogenetic network. In this work we show the hardness of approximation for the general case of the problem, devise a very fast (linear-time) heuristic algorithm for it, and implement it on simulated as well as biological data.

Pp. 61-72

A Bootstrap Correspondence Analysis for Factorial Microarray Experiments with Replications

Qihua Tan; Jesper Dahlgaard; Basem M. Abdallah; Werner Vach; Moustapha Kassem; Torben A. Kruse

Characterized by simultaneous measurement of the effects of experimental factors and their interactions, the economic and efficient factorial design is well accepted in microarray studies. To date, the only statistical method for analyzing microarray data obtained using factorial design has been the analysis of variance (ANOVA) model, which is a gene-by-gene approach and relies on multiple assumptions. We introduce a multivariate approach, the bootstrap correspondence analysis (BCA), to identify and validate genes that are significantly regulated in factorial microarray experiments and show its advantages over the traditional method. Applications of our method to two microarray experiments using factorial design have detected genes that are up- or down-regulated due to the main experimental factors or as a result of interactions. Model comparison showed that although both BCA and ANOVA capture the main regulatory profiles in the data, our multivariate approach is more efficient in identifying genes with biological and functional significance.

Pp. 73-84

Clustering Algorithms Optimizer: A Framework for Large Datasets

Roy Varshavsky; David Horn; Michal Linial

Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) reliance on tuning of predetermined parameters, such as a priori knowledge of the number of clusters; (ii) use of nondeterministic procedures that yield inconsistent outcomes. Thus, a framework that addresses these shortcomings is desirable. We provide a data-driven framework that includes two interrelated steps. The first is SVD-based dimension reduction and the second is automated tuning of the algorithm's parameter(s). The dimension reduction step is efficiently adjusted for very large datasets. The optimal parameter setting is identified according to the internal evaluation criterion known as the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate the effectiveness of this platform by incorporating the standard K-Means and the Quantum Clustering algorithms. The implementations are applied to several gene-expression benchmarks with significant success.

Pp. 85-96

Ranking Function Based on Higher Order Statistics (RF-HOS) for Two-Sample Microarray Experiments

Jahangheer Shaik; Mohammed Yeasin

This paper proposes a novel ranking function, called RFHOS, which incorporates higher order cumulants into the ranking function for finding differentially expressed genes. Traditional ranking functions assume a data distribution (e.g., Normal) and use only the first two cumulants for statistical significance analysis. Ranking functions based on second order statistics are often inadequate for ranking small-sample data (e.g., microarray data). Also, the relatively small number of samples in the data makes it hard to estimate the parameters accurately, causing inaccuracies in the ranking of the genes. The proposed ranking function is based on higher order statistics (HOS) and accounts for both the amplitude and the phase information by incorporating the HOS. The incorporation of HOS departs from the implicit symmetry assumed for the Gaussian distribution. In this paper the performance of RFHOS is compared against other well-known ranking functions designed for ranking genes in two-sample microarray experiments.

Pp. 97-108
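
The abstract above contrasts the first two cumulants (mean and variance) with higher order ones. As a minimal sketch, the first four cumulants of a sample can be estimated from its central moments. This does not reproduce the RFHOS ranking function, whose definition the abstract does not give. The function name and example data are hypothetical, and these are simple plug-in estimates that are biased for small samples.

```python
def sample_cumulants(values):
    """Plug-in estimates of the first four cumulants of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    k1 = mean                 # first cumulant: mean
    k2 = m2                   # second cumulant: variance
    k3 = m3                   # third cumulant: asymmetry (skew)
    k4 = m4 - 3.0 * m2 ** 2   # fourth cumulant: excess "tailedness"
    return k1, k2, k3, k4


# Hypothetical expression values: one symmetric sample, one skewed sample.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
print(sample_cumulants(symmetric))
print(sample_cumulants(skewed))
```

For a Gaussian distribution the third and higher cumulants are zero, so nonzero estimates of `k3` or `k4` carry information that a ranking based only on mean and variance would miss.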

Searching for Recombinant Donors in a Phylogenetic Network of Serial Samples

Patricia Buendia; Giri Narasimhan

Determining the evolutionary history of a sampled sequence can become quite complex when multiple recombination events are part of its past. With at least five new recombination detection methods published in the last year, the growing list of over 40 methods suggests that this field is generating a lot of interest. In previous studies comparing recombination detection methods, the evaluation procedures did not measure how many recombinant sequences, breakpoints and donors were correctly identified. In this paper we present an algorithm that scans a phylogenetic network and uses its edge lengths and topology to identify the parental/donor sequences and breakpoint positions for each query sequence. Its findings can be used to evaluate the output of recombination detection programs. It may also assist in understanding how network size and complexity may shape recombination signals in a set of DNA sequences. The results may prove useful in the phylogenetic study of serially-sampled viral data with recombination events.

Pp. 109-120