Catálogo de publicaciones - libros

Compartir en
redes sociales


Algorithms in Bioinformatics: 6th International Workshop, WABI 2006, Zurich, Switzerland, September 11-13, 2006, Proceedings

Philipp Bücher ; Bernard M. E. Moret (eds.)

En conferencia: 6º International Workshop on Algorithms in Bioinformatics (WABI) . Zurich, Switzerland . September 11, 2006 - September 13, 2006

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

No disponibles.

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2006 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-39583-6

ISBN electrónico

978-3-540-39584-3

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2006

Tabla de contenidos

Multiple Structure Alignment and Consensus Identification for Proteins

Jieping Ye; Ivaylo Ilinkin; Ravi Janardan; Adam Isom

An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus structure which captures common substructures present in the given proteins. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins. A distinguishing feature of the algorithm is that it works directly with the coordinate representation in three dimensions with no loss of spatial information, unlike some other multiple structure alignment algorithms that operate on sets of backbone vectors translated to the origin; hence, the algorithm is able to generate true alignments. Experimental studies on several protein datasets show that the algorithm is quite competitive with a well-known algorithm called CE-MC. A web-based tool has also been developed to facilitate remote access to the algorithm over the Internet.

Pp. 115-125

Procrastination Leads to Efficient Filtration for Local Multiple Alignment

Aaron E. Darling; Todd J. Treangen; Louxin Zhang; Carla Kuiken; Xavier Messeguer; Nicole T. Perna

We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simultaneously, (2) seed extension (chaining) in order of decreasing multiplicity, and (3) procrastination when low multiplicity matches are encountered. The resulting local multiple alignments may have nucleotide substitutions and internal gaps as large as characters in any occurrence of the motif. The algorithm consumes memory and time where is the sequence length. We score the significance of multiple alignments using entropy-based motif scoring methods. We demonstrate the performance of our filtration method on Alu-repeat rich segments of the human genome and a large set of Hepatitis C virus genomes. The GPL implementation of our algorithm in C++ is called and is freely available from http://gel.ahabs.wisc.edu/procrastination

Pp. 126-137

Controlling Size When Aligning Multiple Genomic Sequences with Duplications

Minmei Hou; Piotr Berman; Louxin Zhang; Webb Miller

For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the -globin gene cluster.

Pp. 138-149

Reducing Distortion in Phylogenetic Networks

Daniel H. Huson; Mike A. Steel; Jim Whitfield

When multiple genes are used in a phylogenetic study, the result is often a collection of incompatible trees. Phylogenetic networks and super-networks can be employed to analyze and visualize the incompatible signals in such a data set. In many situations, it is important to have control over the amount of imcompatibility that is represented in a phylogenetic network, for example reducing noise by removing splits that do not recur among the source trees. Current algorithms for computing hybridization networks from trees are based on a combinatorial analysis of the arising set of splits, and are thus sensitive to false positive splits. Here, a filter is desirable that can identify and remove splits that are not compatible with a hybridization scenario. To address these issues, the concept of the distortion of a tree relative to a split is defined as a measure of how much the tree needs to be modified in order to accommodate the split, and some of its properties are investigated. We demonstrate the usefulness of the approach by recovering a plausible hybridization scenario for buttercups from a pair of gene trees that cannot be obtained by existing methods. In a second example, a set of seven gene trees from microgastrine braconid wasps is investigated using filtered networks. A user-friendly implementation of the method is provided as a plug-in for the program SplitsTree4.

Pp. 150-161

Imputing Supertrees and Supernetworks from Quartets

Barbara Hollan; Glenn Conner; Katharina T. Huber; Vincent Moulton

A contemporary and sometimes contentious problem in genome phylogeny is to reconcile the fact that an accurately reconstructed gene tree does not necessarily correspond to a species phylogeny. Thus, in practice, species phylogenies are commonly obtained by applying consensus tree/supertree methods to collections of gene trees. However, such methods can suppress true conflicts in gene trees arising from processes such as gene transfer and gene duplication/loss.

Pp. 162-162

A Unifying View of Genome Rearrangements

Anne Bergeron; Julia Mixtacki; Jens Stoye

Genome rearrangements have been modeled by a variety of operations such as inversions, translocations, fissions, fusions, transpositions and block interchanges. The operation, introduced by Yancopoulos , allows to model all the classical operations while simplifying the algorithms. In this paper we show a simple way to apply this operation to the most general type of genomes with a mixed collection of linear and circular chromosomes. We also describe a graph structure that allows simplifying the theory and distance computation considerably, as neither capping nor concatenation of the linear chromosomes are necessary.

Pp. 163-173

Efficient Sampling of Transpositions and Inverted Transpositions for Bayesian MCMC

István Miklós; Timothy Brooks Paige; Péter Ligeti

The evolutionary distance between two organisms can be determined by comparing the order of appearance of orthologous genes in their genomes. Above the numerous parsimony approaches that try to obtain the shortest sequence of rearrangement operations sorting one genome into the other, Bayesian Markov chain Monte Carlo methods have been introduced a few years ago. The computational time for convergence in the Markov chain is the product of the number of needed steps in the Markov chain and the computational time needed to perform one MCMC step. Therefore faster methods for making one MCMC step can reduce the mixing time of an MCMC in terms of computer running time.

We introduce two efficient algorithms for characterizing and sampling transpositions and inverted transpositions for Bayesian MCMC. The first algorithm characterizes the transpositions and inverted transpositions by the number of breakpoints the mutations change in the breakpoint graph, the second algorithm characterizes the mutations by the change in the number of cycles. Both algorithms run in () time, where is the size of the genome. This is a significant improvement compared with the so far available brute force method with () running time and memory usage.

Pp. 174-185

Alignment with Non-overlapping Inversions in ()-Time

Augusto F. Vellozo; Carlos E. R. Alves; Alair Pereira do Lago

Alignments of sequences are widely used for biological sequence comparisons. Only biological events like mutations, insertions and deletions are usually modeled and other biological events like inversions are not automatically detected by the usual alignment algorithms.

Alignment with inversions does not have a known polynomial algorithm and a simplification to the problem that considers only non-overlapping inversions were proposed by Schöniger and Waterman [20] in 1992 as well as a corresponding () solution. An improvement to an algorithm with ( log)-time complexity was announced in an extended abstract [1] and, in this present paper, we give an algorithm that solves this simplified problem in ()-time and ()-space in the more general framework of an edit graph.

Inversions have recently [4,7,13,17] been discovered to be very important in Comparative Genomics and Scherer et al. in 2005 [11] experimentally verified inversions that were found to be polymorphic in the human genome. Moreover, 10% of the 1,576 putative inversions reported overlap RefSeq genes in the human genome. We believe our new algorithms may open the possibility to more detailed studies of inversions on DNA sequences using exact optimization algorithms and we hope this may be particularly interesting if applied to regions around known rearrangements boundaries. Scherer report 29 such cases and prioritize them as candidates for biological and evolutionary studies.

Pp. 186-196

Accelerating Motif Discovery: Motif Matching on Parallel Hardware

Geir Kjetil Sandve; Magnar Nedland; Øyvind Bø Syrstad; Lars Andreas Eidsheim; Osman Abul; Finn Drabløs

Discovery of motifs in biological sequences is an important problem, and several computational methods have been developed to date. One of the main limitations of the established motif discovery methods is that the running time is prohibitive for very large data sets, such as upstream regions of large sets of cell-cycle regulated genes. Parallel versions have been developed for some of these methods, but this requires supercomputers or large computer clusters. Here, we propose and define an abstract module PAMM (Parallel Acceleration of Motif Matching) with motif matching on parallel hardware in mind. As a proof-of-concept, we provide a concrete implementation of our approach called MAMA. The implementation is based on the MEME algorithm, and uses an implementation of PAMM based on specialized hardware to accelerate motif matching. Running MAMA on a standard PC with specialized hardware on a single PCI-card compares favorably to running parallel MEME on a cluster of 12 computers.

Pp. 197-206

Segmenting Motifs in Protein-Protein Interface Surfaces

Jeff M. Phillips; Johannes Rudolph; Pankaj K. Agarwal

Protein-protein interactions form the basis for many intercellular events. In this paper we develop a tool for understanding the structure of these interactions. Specifically, we define a method for identifying a set of structural motifs on protein-protein interface surfaces. These motifs are secondary structures, akin to -helices and -sheets in protein structure; they describe how multiple residues form knob-into-hole features across the interface. These motifs are generated entirely from geometric properties and are easily annotated with additional biological data. We point to the use of these motifs in analyzing hotspot residues.

Pp. 207-218