Catálogo de publicaciones - libros

Compartir en
redes sociales


Algorithms in Bioinformatics: 6th International Workshop, WABI 2006, Zurich, Switzerland, September 11-13, 2006, Proceedings

Philipp Bücher ; Bernard M. E. Moret (eds.)

En conferencia: 6º International Workshop on Algorithms in Bioinformatics (WABI) . Zurich, Switzerland . September 11, 2006 - September 13, 2006

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

No disponibles.

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2006 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-39583-6

ISBN electrónico

978-3-540-39584-3

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2006

Tabla de contenidos

Protein Side-Chain Placement Through MAP Estimation and Problem-Size Reduction

Eun-Jong Hong; Tomás Lozano-Pérez

We present an exact method for the global minimum energy conformation (GMEC) search of protein side-chains. Our method consists of a branch-and-bound (B&B) framework and a new subproblem-pruning scheme. The pruning scheme consists of upper/lower-bounding methods and problem-size reduction techniques. We explore a way of using the tree-reweighted max-product algorithm for computing lower-bounds of the GMEC energy. The problem-size reduction techniques are necessary when the size of the subproblem is too large to rely on more accurate yet expensive bounding methods. The experimental results show our pruning scheme is effective and our B&B method exactly solves protein sequence design cases that are very hard to solve with the dead-end elimination.

Pp. 219-230

On the Complexity of the Crossing Contact Map Pattern Matching Problem

Shuai Cheng Li; Ming Li

Contact maps are concepts that are often used to represent structural information in molecular biology. The contact map pattern matching (CMPM) problem is to decide if a contact map (called the ) is a substructure of another contact map (called the ). In general, the problem is NP-hard, but when there are restrictions on the form of the pattern, the problem can, in some case, be solved in polynomial time. In particular, a polynomial time algorithm has been proposed [1] for the case when the patterns are so-called . In this paper we show that the problem is actually NP-hard, and show a flaw in the proposed polynomial-time algorithm. Through the same method, we also show that a related problem, namely, the , is NP-hard.

Pp. 231-241

A Fuzzy Dynamic Programming Approach to Predict RNA Secondary Structure

Dandan Song; Zhidong Deng

Due to the recent discovery of many RNAs with great diversity of functions, there is a resurgence of research in using RNA primary sequences to predict their secondary structures, due to the discovery of many new RNAs with a great diversity of functions. Among the proposed computational approaches, the well-known traditional approaches such as the Nussinov approach and the Zuker approach are essentially based on deterministic dynamic programming, whereas the stochastic context-free grammar (SCFG), the Bayesian estimation, and the partition function approaches are based on stochastic dynamic programming. In addition, heuristic approaches like artificial neural network and genetic algorithm have also been presented to address this challenging problem. But the prediction accuracy of these approaches is still far from perfect. Here based on the fuzzy sets theory, we propose a fuzzy dynamic programming approach to predict RNA secondary structure, which takes advantage of the fuzzy sets theory to reduce parameter sensitivity and import qualitative prior knowledge through fuzzy goal distribution. Based on the experiments performed on a dataset of tRNA sequences, it is shown that the prediction accuracy of our proposed approach is significantly improved compared with the BJK grammar model of the SCFG approach.

Pp. 242-251

Landscape Analysis for Protein-Folding Simulation in the H-P Model

Kathleen Steinhöfel; Alexandros Skaliotis; Andreas A. Albrecht

The hydrophobic-hydrophilic (H-P) model for protein folding was introduced by Dill et al.[7]. A problem instance consists of a sequence of amino acids, each labeled as either hydrophobic (H) or hydrophilic (P). The sequence must be placed on a 2D or 3D grid without overlapping, so that adjacent amino acids in the sequence remain adjacent in the grid. The goal is to minimize the energy, which in the simplest variation corresponds to maximizing the number of adjacent hydrophobic pairs. The protein folding problem in the H-P model is NP-hard in both 2D and 3D. Recently, Fu and Wang [10] proved an (()ln ) algorithm for -dimensional protein folding simulation in the HP-model. Our preliminary results on stochastic search applied to protein folding utilize complete move sets proposed by Lesh et al.[15] and Blazewicz et al.[4]. We obtain that after (/) Markov chain transitions, the probability to be in a minimum energy conformation is at least 1–, where is the maximum neighbourhood size and Γ is the maximum value of the minimum escape height from local minima of the underlying energy landscape. We note that the time bound depends on the specific instance. Based on [10] we conjecture Γ≤. We analyse experimentally on selected benchmark problems [15,21] for the 2D case.

Pp. 252-261

Rapid RNA Folding Including Pseudoknots Via Graph Tree Decomposition

Jizhen Zhao; Russell L. Malmberg; Liming Cai

The prediction of RNA secondary structure including pseudoknots remains a challenge due to the intractable computation of the sequence conformation from intriguing nucleotide interactions. Optimal algorithms often assume a restricted class for the predicted RNA structures and yet still require a high-degree polynomial time complexity, which is too expensive to use. Heuristic methods may yield time-efficient algorithms but they do not guarantee optimality of the predicted structure. This paper introduces a new and efficient algorithm for the prediction of RNA structure with pseudoknots for which the structure is not restricted. Novel prediction techniques are developed based on graph tree decomposition. In particular, stem overlapping relationships are defined with a graph, in which a specialized maximum independent set (IS) corresponds to the desired optimal structure. Such a graph is tree decomposable; dynamic programming over a tree decomposition of the graph leads to an efficient algorithm. The new algorithm is evaluated on a large number of RNA sequence sets taken from diverse resources. It demonstrates overall sensitivity and specificity that outperforms or is comparable with those of previous optimal and heuristic algorithms yet it requires significantly less time than other optimal algorithms.

Pp. 262-273

Flux-Based Topology-Based Similarity of Metabolic Genes

Oleg Rokhlenko; Tomer Shlomi; Roded Sharan; Eytan Ruppin; Ron Y. Pinter

We present an effectively computable measure of functional gene similarity that is based on metabolic gene activity across a variety of growth media. We applied this measure to 750 genes comprising the metabolic network of the budding yeast. Comparing the computed functional similarities to those obtained by using experimental expression data, we show that our computational method captures similarities beyond those that are obtained by the topological analysis of metabolic networks, thus revealing—at least in part—dynamic characteristics of gene function. We also suggest that network centrality partially explains functional centrality ( the number of functionally highly similar genes) by reporting a significant correlation between the two. Finally, we find that functional similarities between topologically distant genes occur between genes with different GO annotations.

Pp. 274-285

Combinatorial Methods for Disease Association Search and Susceptibility Prediction

Dumitru Brinza; Alexander Zelikovsky

Accessibility of high-throughput genotyping technology makes possible genome-wide association studies for common complex diseases. When dealing with common diseases, it is necessary to search and analyze multiple independent causes resulted from interactions of multiple genes scattered over the entire genome. This becomes computationally challenging since interaction even of pairs gene variations require checking more than 10 possibilities genome-wide. This paper first explores the problem of searching for for a given population sample of diseased and non-diseased individuals. A proposed fast complimentary greedy search finds multi-SNP combinations with non-trivially high association on real data. Exploiting the developed methods for searching associated risk and resistance factors, the paper addresses the . We first propose a relevant optimum clustering formulation and the model-fitting algorithm transforming clustering algorithms into susceptibility prediction algorithms. For three available real data sets (Crohn’s disease (Daly et al, 2001), autoimmune disorder (Ueda et al, 2003), and tick-borne encephalitis (Barkash et al, 2006)), the accuracies of the prediction based on the combinatorial search (respectively, 84%, 83%, and 89%) are higher by 15% compared to the accuracies of the best previously known methods. The prediction based on the complimentary greedy search almost matches the best accuracy but is much more scalable.

Pp. 286-297

Integer Linear Programs for Discovering Approximate Gene Clusters

Sven Rahmann; Gunnar W. Klau

We contribute to the discussion about the concept of by presenting a class of definitions that (1) can be written as integer linear programs (ILPs) and (2) allow several variations that include existing definitions such as common intervals, -windows, and max-gap clusters or gene teams. While the ILP formulation does not directly lead to optimal algorithms, it provides unprecedented generality and is competitive in practice for those cases where efficient algorithms are known. It allows for the first time a non-heuristic study of large approximate clusters in several genomes. Source code and datasets are available at .

Pp. 298-309

Approximation Algorithms for Bi-clustering Problems

Lusheng Wang; Yu Lin; Xiaowen Liu

One of the main goals in the analysis of microarray data is to identify groups of genes and groups of experimental conditions (including environments, individuals and tissues), that exhibit similar expression patterns. This is the so-called bi-clustering problem. In this paper, we consider two variations of the bi-clustering problem: the Consensus Submatrix Problem and the Bottleneck Submatrix Problem. The input of the problems contains a × matrix and integers and . The Consensus Submatrix Problem is to find a × submatrix with < and < and a consensus vector such that the sum of distance between all rows in the submatrix and the vector is minimized. The Bottleneck Submatrix Problem is to find a × submatrix with < and <, an integer and a center vector such that the distance between every row in the submatrix and the vector is at most and is minimized. We show that both problems are NP-hard and give randomized approximation algorithms for special cases of the two problems. Using standard techniques, we can derandomize the algorithms to get polynomial time approximation schemes for the two problems. To our knowledge, this is the first time that approximation algorithms with guaranteed ratio are presented for microarray analysis.

Pp. 310-320

Improving the Layout of Oligonucleotide Microarrays: Pivot Partitioning

Sérgio A. de Carvalho; Sven Rahmann

The production of commercial DNA microarrays is based on a light-directed chemical synthesis driven by a set of masks or micromirror arrays. Because of the natural properties of light and the ever shrinking feature sizes, the arrangement of the probes on the chip and the order in which their nucleotides are synthesized play an important role on the quality of the final product. We propose a new model called for evaluating the layout of microarrays. We also present a new algorithm, called Pivot Partitioning, that improves the quality of layouts, according to existing measures, by over 6% when compared to the best known algorithms.

Pp. 321-332