Research in Computational Molecular Biology: 11th Annual International Conference, RECOMB 2007, Oakland, CA, USA, April 21-25, 2007. Proceedings

Terry Speed ; Haiyan Huang (eds.)

Conference: 11th Annual International Conference on Research in Computational Molecular Biology (RECOMB). Oakland, CA, USA. April 21, 2007 - April 25, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution    Year of publication    Browse    Download    Request
Not detected    2007    SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-71680-8

Electronic ISBN

978-3-540-71681-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Connectedness Profiles in Protein Networks for the Analysis of Gene Expression Data

Joël Pradines; Vlado Dančík; Alan Ruttenberg; Victor Farutin

Knowledge about protein function is often encoded in the form of large and sparse undirected graphs where vertices are proteins and edges represent their functional relationships. One elementary task in the computational utilization of these networks is that of quantifying the density of edges, referred to as connectedness, inside a prescribed protein set. For instance, many functional modules can be identified because of their high connectedness. Since individual proteins can have very different numbers of interactions, a connectedness measure should be well-normalized for vertex degree. Namely, its distribution across random sets of vertices should not be affected when these sets are biased for hubs. We show that such degree-robustness can be achieved via an analytical framework based on a model of random graph with given expected degrees. We also introduce the concept of connectedness profile, which characterizes the relation between adjacency in a graph and a prescribed order of its vertices. A straightforward application to gene expression data and protein networks is the identification of tissue-specific functional modules or cellular processes perturbed in an experiment. The strength of the mapping between gene-expression score and interaction in the network is measured by the area of the connectedness profile. Deriving the distribution of this area under the random graph enables us to define degree-robust statistics that can be computed in , being the network size. These statistics can identify groups of microarray experiments that are pathway-coherent, and more generally, vertex attributes that relate to adjacency in a graph.

Pp. 296-310
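
The idea of normalizing connectedness for vertex degree, described in the abstract above, can be illustrated with a minimal sketch. It is not the authors' statistic: it only compares the observed number of edges inside a protein set with the number expected under a random graph with given expected degrees, assuming an edge (u, v) appears with probability d_u * d_v / sum(d). The function names and the toy network are hypothetical.

```python
from itertools import combinations


def vertex_degrees(edges):
    """Return a dict mapping each vertex to its degree in an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def observed_edges(edges, subset):
    """Count edges with both endpoints inside `subset`."""
    members = set(subset)
    return sum(1 for u, v in edges if u in members and v in members)


def expected_edges(degree, subset):
    """Expected internal edges under the assumed expected-degree model,
    where edge (u, v) has probability d_u * d_v / sum of all degrees."""
    total = sum(degree.values())
    members = sorted(set(subset))
    return sum(
        degree.get(u, 0) * degree.get(v, 0) / total
        for u, v in combinations(members, 2)
    )


def connectedness_ratio(edges, subset):
    """Observed / expected internal edges; values above 1 suggest a set that is
    denser than its vertex degrees alone would predict."""
    expected = expected_edges(vertex_degrees(edges), subset)
    return observed_edges(edges, subset) / expected if expected > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical toy network: a triangle a-b-c plus a pendant vertex d.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(connectedness_ratio(toy_edges, ["a", "b", "c"]))
```

Because the expectation grows with the degrees of the set members, a set made of hubs needs more internal edges to reach the same ratio, which is the intuition behind degree-robustness.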

Multivariate Segmentation in the Analysis of Transcription Tiling Array Data

Antonio Piccolboni

Tiling DNA microarrays extend current microarray technology by probing the non-repeat portion of a genome at regular intervals in an unbiased fashion. A fundamental problem in the analysis of these data is the detection of genomic regions that are differentially transcribed across multiple conditions. We propose a linear time algorithm based on segmentation techniques and linear modeling that can work at a user-selected false discovery rate. It also attains a four-fold sensitivity gain over the only competing algorithm when applied to a whole genome transcription data set spanning the embryonic development of .

Pp. 311-324

A Bayesian Model That Links Microarray mRNA Measurements to Mass Spectrometry Protein Measurements

Anitha Kannan; Andrew Emili; Brendan J. Frey

An important problem in biology is to understand correspondences between mRNA microarray levels and mass spectrometry peptide counts. Recently, a compendium of mRNA expression levels and protein abundances was released for the entire genome of the laboratory mouse, Mus musculus. The availability of these two data sets facilitates using machine learning methods to automatically infer plausible correspondences between the gene products. Knowing these correspondences can be helpful either for predicting protein abundances from microarray data or as an independent source of information that can be used for learning richer models such as regulatory networks. We propose a probabilistic model that relates protein abundances to mRNA expression levels. Using cross-mapped data from the above-mentioned studies, we learn the model and then score the genes for their strength of relationship by performing probabilistic inference in the learned model. While we gave a simplified outline of our technique in a publication aimed at biologists (Cell 2006), in this paper we give a complete description of the Bayesian model and the computational technique used to perform inference. In addition, we demonstrate that the Bayesian technique achieves mappings with higher statistical significance than standard linear regression and a maximum likelihood version of the proposed model.

Pp. 325-338

Rearrangements in Genomes with Centromeres Part I: Translocations

Michal Ozery-Flato; Ron Shamir

A centromere is a special region in the chromosome that plays a vital role during cell division. Every new chromosome created by a genome rearrangement event must have a centromere in order to survive. This constraint has been ignored in the computational modeling and analysis of genome rearrangements to date. Unlike genes, the different centromeres are indistinguishable, they have no orientation, and only their location is known. A prevalent rearrangement event in the evolution of multi-chromosomal species is translocation, i.e., the exchange of tails between two chromosomes. A translocation may create a chromosome with no centromere in it. In this paper we study, for the first time, centromere-aware genome rearrangements. We present a polynomial time algorithm for computing a shortest sequence of translocations transforming one genome into the other, where all of the intermediate chromosomes must contain centromeres. We view this as a first step towards analysis of more general genome rearrangement models that take centromeres into consideration.

Pp. 339-353
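
The centromere constraint on translocations described in the abstract above can be sketched in a few lines. This is an illustrative toy model, not the authors' algorithm: chromosomes are assumed to be Python lists of gene labels, the marker `"C"` is a hypothetical stand-in for a centromere, only the simple tail-for-tail exchange is modeled, and a chromosome is assumed to be viable when it carries exactly one centromere.

```python
CENTROMERE = "C"  # hypothetical marker for a centromere position


def translocate(x, y, i, j):
    """Exchange tails: cut x before index i and y before index j,
    then swap the two suffixes."""
    return x[:i] + y[j:], y[:j] + x[i:]


def is_viable(chromosome):
    """Assumption of this sketch: a viable chromosome has exactly one centromere."""
    return chromosome.count(CENTROMERE) == 1


def legal_translocation(x, y, i, j):
    """Return the two new chromosomes if both are viable, otherwise None."""
    new_x, new_y = translocate(x, y, i, j)
    if is_viable(new_x) and is_viable(new_y):
        return new_x, new_y
    return None


if __name__ == "__main__":
    x = [1, 2, CENTROMERE, 3, 4]
    y = [5, CENTROMERE, 6, 7]
    # Cutting both chromosomes after their centromeres keeps one in each product.
    print(legal_translocation(x, y, 3, 2))
    # Cutting x before its centromere moves it to the other product: rejected.
    print(legal_translocation(x, y, 1, 2))
```

The second call shows the failure mode the abstract mentions: one product has no centromere and the other has two, so the event is rejected.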

Identification of Deletion Polymorphisms from Haplotypes

Erik Corona; Benjamin Raphael; Eleazar Eskin

Numerous efforts are underway to catalog genetic variation in human populations. While the majority of studies of genetic variation have focused on single base pair differences between individuals, i.e., single nucleotide polymorphisms (SNPs), several recent studies have demonstrated that larger scale structural variation, including copy number polymorphisms and inversion polymorphisms, is also common. However, direct techniques for detection and validation of structural variants are generally much more expensive than detection and validation of SNPs. For some types of structural variation, in particular deletions, the polymorphism produces a distinct signature in the SNP data. In this paper, we describe a new probabilistic method for detecting deletion polymorphisms from SNP data. The key idea in our method is that we estimate the frequency of the haplotypes in a region of the genome both with and without the possibility of a deletion in the region and apply a generalized likelihood ratio test to assess the significance of a deletion. Application of our method to the HapMap Phase I data revealed 319 candidate deletions: 142 of these overlap with variants identified in earlier studies, while 177 are novel. Using Phase II HapMap data we predict 6730 deletions.

Pp. 354-365

Free Energy Estimates of All-Atom Protein Structures Using Generalized Belief Propagation

Hetunandan Kamisetty; Eric P. Xing; Christopher J. Langmead

We present a technique for approximating the free energy of protein structures using Generalized Belief Propagation (GBP). The accuracy and utility of these estimates are then demonstrated in two different application domains. First, we show that the entropy component of our free energy estimates can be useful in distinguishing native protein structures from decoys, i.e., structures with similar internal energy to that of the native structure, but otherwise incorrect. Our method is able to correctly identify the native fold from among a set of decoys with 87.5% accuracy over a total of 48 different immunoglobulin folds. The remaining 12.5% of native structures are ranked among the top 4 of all structures. Second, we show that our estimates of the free energy change upon mutation for three different data sets have linear correlations between 0.63 and 0.70 with experimental values and statistically significant p-values. Together, these results suggest that GBP is an effective means for computing free energy in all-atom models of protein structures. GBP is also efficient, taking a few minutes to run on a typically sized protein, further suggesting that GBP may be an attractive alternative to more costly molecular dynamics simulations for some tasks.

Pp. 366-380

Minimizing and Learning Energy Functions for Side-Chain Prediction

Chen Yanover; Ora Schueler-Furman; Yair Weiss

Side-chain prediction is an important subproblem of the general protein folding problem. Despite much progress in side-chain prediction, performance is far from satisfactory. As an example, the ROSETTA program, which uses simulated annealing to select the minimum energy conformations, correctly predicts the first two side-chain angles for approximately 72% of the buried residues in a standard data set. Is further improvement more likely to come from better search methods, or from better energy functions? Given that exact minimization of the energy is NP-hard, it is difficult to get a systematic answer to this question.

In this paper, we present a novel search method and a novel method for learning energy functions from training data that are both based on Tree Reweighted Belief Propagation (TRBP). We find that TRBP can find the optimum of the ROSETTA energy function in a few minutes of computation for approximately 85% of the proteins in a standard benchmark set. TRBP can also effectively bound the partition function which enables using the Conditional Random Fields (CRF) framework for learning.

Interestingly, finding the global minimum does not significantly improve side-chain prediction for an energy function based on ROSETTA’s default energy terms (less than 0.1%), while learning new weights gives a significant boost from 72% to 78%. Using a recently modified ROSETTA energy function with a softer Lennard-Jones repulsive term, the global optimum does improve prediction accuracy from 77% to 78%. Here again, learning new weights improves side-chain modeling even further to 80%. Finally, the highest accuracy (82.6%) is obtained using an extended rotamer library and CRF-learned weights. Our results suggest that combining machine learning with approximate inference can improve the state of the art in side-chain prediction.

Pp. 381-395

Protein Conformational Flexibility Analysis with Noisy Data

Anshul Nigham; David Hsu

Protein conformational changes play a critical role in biological functions such as ligand-protein and protein-protein interactions. Due to the noise in structural data, determining salient conformational changes reliably and efficiently is a challenging problem. This paper presents an efficient algorithm for analyzing protein conformational changes using noisy data. It applies a statistical flexibility test to all contiguous fragments of a protein and combines the information from these tests to compute a consensus flexibility measure for each residue of the protein. We tested the algorithm using data from the Protein Data Bank and the Macromolecular Movements Database. The results show that our algorithm can reliably detect different types of salient conformational changes, including well-known examples such as hinge motion as well as the flap motion of HIV-1 protease. The software implementing our algorithm is available at .

Pp. 396-411

Deterministic Pharmacophore Detection Via Multiple Flexible Alignment of Drug-Like Molecules

Yuval Inbar; Dina Schneidman-Duhovny; Oranit Dror; Ruth Nussinov; Haim J. Wolfson

We present a novel, highly efficient method for the detection of a pharmacophore from a set of ligands/drugs that interact with a target receptor. A pharmacophore is a spatial arrangement of physico-chemical features in a ligand that is responsible for the interaction with a specific receptor. In the absence of a known 3D receptor structure, a pharmacophore can be identified from a multiple structural alignment of the ligand molecules. The key advantages of the presented algorithm are: (a) its ability to multiply align flexible ligands in a deterministic manner, (b) its ability to focus on subsets of the input ligands, which may share a large common substructure, resulting in the detection of both outlier molecules and alternative binding modes, and (c) its computational efficiency, which allows the detection of pharmacophores shared by a large number of molecules on a standard PC. The algorithm was extensively tested on a dataset of almost 80 ligands acting on 12 different receptors. The results, which were achieved using a standard default parameter set, were consistent with reference pharmacophores that were derived from the bound ligand-receptor complexes. The pharmacophores detected by the algorithm are expected to be a key component in the discovery of new leads by screening large drug-like molecule databases.

Pp. 412-429

Design of Compact, Universal DNA Microarrays for Protein Binding Microarray Experiments

Anthony A. Philippakis; Aaron M. Qureshi; Michael F. Berger; Martha L. Bulyk

Our group has recently developed a compact, universal protein binding microarray (PBM) that can be used to determine the binding preferences of transcription factors (TFs) [1]. This design represents all possible sequence variants of a given length k (i.e., all k-mers) on a single array, allowing a complete characterization of the binding specificities of a given TF. Here, we present the mathematical foundations of this design based on de Bruijn sequences generated by linear feedback shift registers. We show that these sequences represent the maximum number of variants for any given set of array dimensions (i.e., number of spots and spot lengths), while also exhibiting desirable pseudo-randomness properties. Moreover, de Bruijn sequences can be selected that represent gapped sequence patterns, further increasing the coverage of the array. This design yields a powerful experimental platform that allows the binding preferences of TFs to be determined with unprecedented resolution.

Pp. 430-443
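
The property that makes de Bruijn sequences attractive for the array design in the abstract above, namely that every k-mer occurs exactly once, can be checked with a short sketch. Note the swapped-in technique: the abstract describes sequences generated by linear feedback shift registers, whereas this example uses the standard recursive Lyndon-word construction, chosen only because it is compact. The function names are hypothetical.

```python
def de_bruijn(alphabet, k):
    """Return a cyclic de Bruijn sequence of order k over `alphabet`,
    built with the standard Lyndon-word (FKM) recursion."""
    n = len(alphabet)
    a = [0] * (n * k)
    sequence = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linear_de_bruijn(alphabet, k):
    """Unroll the cyclic sequence by appending its first k - 1 symbols,
    so that every k-mer appears as a plain substring."""
    cyclic = de_bruijn(alphabet, k)
    return cyclic + cyclic[:k - 1]


if __name__ == "__main__":
    k = 3
    seq = linear_de_bruijn("ACGT", k)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # 4**3 = 64 distinct 3-mers packed into 64 + 2 = 66 bases.
    print(len(seq), len(kmers), len(set(kmers)))
```

In an actual array the long sequence would still have to be split into overlapping spots; that step, and the selection of sequences covering gapped patterns, is not modeled here.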