Catálogo de publicaciones - libros

Compartir en
redes sociales


Algorithms in Bioinformatics: 5th International Workshop, WABI 2005, Mallorca, Spain, October 3-6, 2005, Proceedings

Rita Casadio ; Gene Myers (eds.)

En conferencia: 5º International Workshop on Algorithms in Bioinformatics (WABI) . Mallorca, Spain . October 3, 2005 - October 6, 2005

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

No disponibles.

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2005 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-29008-7

ISBN electrónico

978-3-540-31812-5

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2005

Tabla de contenidos

Spectral Clustering Gene Ontology Terms to Group Genes by Function

Nora Speer; Christian Spieth; Andreas Zell

With the invention of biotechnological high throughput methods like DNA microarrays, biologists are capable of producing huge amounts of data. During the analysis of such data the need for a grouping of the genes according to their biological function arises. In this paper, we propose a method that provides such a grouping. As functional information, we use Gene Ontology terms. Our method clusters all GO terms present in a data set using a Spectral Clustering method. Then, mapping the genes back to their annotation, genes can be associated to one or more clusters of defined biological processes. We show that our Spectral Clustering method is capable of finding clusters with high inner cluster similarity.

Palabras clave: Gene Ontology; Directed Acyclic Graph; Semantic Similarity; Spectral Cluster; Spectral Domain.

Pp. 1-12

Dynamic De-Novo Prediction of microRNAs Associated with Cell Conditions: A Search Pruned by Expression

Chaya Ben-Zaken Zilberstein; Michal Ziv-Ukelson

Biological background: Plant microRNAs (miRNAs) are short RNA sequences that bind to target genes (mRNAs) and change their expression levels by redirecting their stabilities and marking them for cleavage. In Arabidopsis thaliana, microRNAs have been shown to regulate development and are believed to impact expression both under many conditions, such as stress and stimuli, as well as in various tissue types. Methods: mirXdeNovo is a novel prototype tool for the de-novo prediction of microRNAs associated with a given cell condition. The work of mirXdeNovo is composed of two off-line preprocessing stages, which are executed only once per genome in the database, and a dynamic online main stage, which is executed again and again for each newly obtained expression profile. During the preprocessing stages, a set of candidate microRNAs is computed for the genome of interest and then each microRNA is associated with a set of mRNAs which are its predicted targets. Then, during the main stage, given a newly obtained cell condition represented by a vector describing the expression level of each of the genes under this condition, the tool will efficiently compute the subset of microRNA candidates which are predicted to be active under this condition. The efficiency of the main stage is based in a novel branch-and-bound search of a tree constructed over the microRNA candidates and annotated with the corresponding predicted targets. This search exploits the monotonicity of the target prediction decision with respect to microRNA prefix size in order to apply an efficient yet admissible pruning. Our testing indicates that this pruning results in a substantial speed up over the naive search. Biological Results: We employed mirXdeNovo to conduct a study, using the plant Arabidopsis thaliana as our model organism and the subject of our ”hunt for microRNAs”. During the preprocessing stage, 2000 microRNA precursor candidates were extracted from the genome. Our study included the 3’UTRs of 5800 mRNAs. 380 different conditions were analyzed including various tissues and hormonal treatments. This led to the discovery of some interesting and statistically significant newly predicted microRNAs, annotated with their potential condition of activity.

Palabras clave: Cell Condition; microRNA Target; Prototype Tool; Stable Duplex; microRNA Precursor.

Pp. 13-26

Clustering Gene Expression Series with Prior Knowledge

Laurent Bréhélin

Microarrays allow monitoring of thousands of genes over time periods. Recently, gene clustering approaches specially adapted to deal with the time dependences of these data have been proposed. According to these methods, we investigate here how to use prior knowledge about the approximate profile of some classes to improve the classification result. We propose a Bayesian approach to this problem. A mixture model is used to describe and classify the data. The parameters of this model are constrained by a prior distribution defined with a new type of model that can express both our prior knowledge about the profile of classes of interest and the temporal nature of the data. Then, an EM algorithm estimates the parameters of the mixture model by maximizing its posterior probability. Supplementary Material: http://www.lirmm.fr/~brehelin/WABI05.pdf

Palabras clave: Prior Knowledge; Mixture Model; Hide Markov Model; Prior Distribution; Prior Probability.

Pp. 27-38

A Linear Time Biclustering Algorithm for Time Series Gene Expression Data

Sara C. Madeira; Arlindo L. Oliveira

Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources. In this work, we examine a particular setting of the problem, where we are concerned with finding biclusters in time series expression data. In this context, we are interested in finding biclusters with consecutive columns. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear on the size of the data matrix. This complexity is obtained by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results in both synthetic and real data that show the effectiveness of the approach.

Palabras clave: Gene Expression Data; Internal Node; Linear Time Algorithm; Gene Expression Matrix; Path Label.

Pp. 39-52

Time-Window Analysis of Developmental Gene Expression Data with Multiple Genetic Backgrounds

Tamir Tuller; Efrat Oron; Erez Makavy; Daniel A. Chamovitz; Benny Chor

We study gene expression data, derived from developing tissues, under multiple genetic backgrounds (mutations). Motivated by the perceived behavior under these background, our main goals are to explore time windows questions : 1 Find a large set of genes that have a similar behavior in two different genetic backgrounds, under an appropriate time shift. 2 Find a model that approximates the dynamics of a gene network in developing tissues at different continuous time windows. We first explain the biological significance of these problems, and then explore their computational complexity, which ranges from polynomial to NP-hard. We developed algorithms and heuristics for the different problems, and ran those on synthetic and biological data, with very encouraging results.

Palabras clave: Time Window; Dissimilarity Measure; Gene Expression Dataset; Expensive Operation; Cop9 Signalosome.

Pp. 53-64

A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem

Gang Wu; Jia-Huai You; Guohui Lin

A lookahead branch-and-bound algorithm (LBnB) is proposed for solving the Maximum Quartet Consistency (MQC) Problem where the input is a complete set of quartets on the taxa and the goal is to construct a phylogeny that satisfies the maximum number of given quartets. It integrates a number of previous efforts on exact algorithms, heuristics, and approximation algorithms for the NP-hard MQC problem, and a few improved search techniques, especially a lookahead scheme, to solve the problem optimally. The theoretical running time analysis of the LBnB algorithm is provided, and an extensive simulation study has been well designed to compare the algorithm to previous existing exact algorithms and a best heuristic Hypercleaning. The experimental results on both synthetic and real datasets show that LBnB outperformed other exact algorithms, and it was competitive to Hypercleaning on many datasets.

Palabras clave: Dynamic Programming; Search Tree; Real Dataset; Exact Algorithm; Constraint Programming.

Pp. 65-76

Computing the Quartet Distance Between Trees of Arbitrary Degree

Chris Christiansen; Thomas Mailund; Christian N. S. Pedersen; Martin Randers

We present two algorithms for computing the quartet distance between trees of arbitrary degree. The quartet distance between two unrooted evolutionary trees is the number of quartets—sub-trees induced by four leaves—that differs between the trees. Previous algorithms focus on computing the quartet distance between binary trees. In this paper, we present two algorithms for computing the quartet distance between trees of arbitrary degrees. One in time O ( n ^3) and space O ( n ^2) and one in time O ( n ^2 d ^2) and space O ( n ^2), where n is the number of species and d is the maximal degree of the internal nodes of the trees. We experimentally compare the two algorithms and discuss possible directions for improving the running time further.

Palabras clave: Binary Tree; Internal Node; Directed Edge; Extended Tree; Original Tree.

Pp. 77-88

Using Semi-definite Programming to Enhance Supertree Resolvability

Shlomo Moran; Satish Rao; Sagi Snir

Supertree methods are used to construct a large tree over a large set of taxa, from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of few dozens of taxa, the use of a supertree method in order to construct the tree of life is inevitable. Supertree methods are broadly divided according to the input trees: When the input trees are unrooted, the basic reconstruction unit is a quartet tree. In this case, the basic decision problem of whether there exists a tree that agrees with all quartets is NP-complete. On the other hand, when the input trees are rooted, the basic reconstruction unit is a rooted triplet, and the above decision problem has a polynomial time algorithm. However, when there is no tree which agrees with all triplets, it would be desirable to find the tree that agrees with the maximum number of triplets. However, this optimization problem was shown to be NP-hard. Current heuristic approaches perform mincut on a graph representing the triplets inconsistency and return a tree that is guaranteed to satisfy some required properties. In this work we present a different heuristic approach that guarantees the properties provided by the current methods and give experimental evidence that it significantly outperforms currently used methods. This method is based on divide and conquer where we use a semi-definite programming approach in the divide step.

Palabras clave: Connectivity Graph; Internal Vertex; Input Tree; Good Edge; Parsimony Score.

Pp. 89-103

An Efficient Reduction from Constrained to Unconstrained Maximum Agreement Subtree

Z. S. Peng; H. F. Ting

We propose and study the Maximum Constrained Agreement Subtree (MCAST) problem, which is a variant of the classical Maximum Agreement Subtree (MAST) problem. Our problem allows users to apply their domain knowledge to control the construction of the agreement subtrees in order to get better results. We show that the MCAST problem can be reduced to the MAST problem efficiently and thus we have algorithms for MCAST with running times matching the fastest known algorithms for MAST.

Palabras clave: Evolutionary Tree; Mast Problem; Bipartite Matchings; Maximum Agreement; Rooted Subtrees.

Pp. 104-115

Pattern Identification in Biogeography

Ganeshkumar Ganapathy; Barbara Goodson; Robert Jansen; Vijaya Ramachandran; Tandy Warnow

We develop and study two distance metrics for area cladograms (leaf-labeled trees where many leaves can share the same label): the edge contract-and-refine metric and the MAAC distance metric. We demonstrate that in contrast to phylogenies, the contract-and-refine distance between two area cladograms is not identical to the character encoding distance, and the latter is not a metric. We present a polynomial time algorithm to compute the MAAC distance, based on a polynomial-time algorithm for computing the largest common pruned subtree of two area cladograms. We also describe a linear time algorithm to decide if two area cladograms are identical.

Palabras clave: Mast Problem; Edit Distance; Ecological Zone; Maximum Agreement; Leaf Label.

Pp. 116-127