Publications catalog - books



Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings

Ion Măndoiu ; Alexander Zelikovsky (eds.)

Conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7, 2007 - May 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Year of publication Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72030-0

Electronic ISBN

978-3-540-72031-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Using Multi Level Nearest Neighbor Classifiers for G-Protein Coupled Receptor Sub-families Prediction

Mudassir Fayyaz; Asifullah Khan; Adnan Mujahid; Alex Kavokin

Prediction based on protein hydrophobicity yields a potentially good classification rate compared with other compositions for G-protein coupled receptor (GPCR) families and their respective sub-families. In the current study, we use the hydrophobicity of the proteins to obtain a Fourier spectrum of the protein sequence, which is then used for classification. The classification of 17 GPCR sub-families is based on the Nearest Neighbor (NN) method, which is employed at two levels. At level-1 classification, the GPCR super-family is recognized, and at level-2, the respective sub-families of the predicted super-family are classified. Compared with the Support Vector Machine (SVM), the NN approach has shown better performance using both jackknife and independent data set testing. The results are reported using three performance measures, the Matthews Correlation Coefficient (MCC), overall accuracy (ACC) and reliability (R), on both training and independent data sets. Our results are compared with the overall class accuracies obtained for super-families using an existing technique. The multilevel classifier has shown promising performance and has achieved overall ACC and MCC of 97.02% and 0.95 using the jackknife test, and 87.50% and 0.85 for the independent data set test, respectively.

Pp. 564-576
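The two-level scheme described in the abstract above (hydrophobicity signal, Fourier spectrum, nearest neighbor at family and then sub-family level) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Kyte-Doolittle hydrophobicity scale, the fixed spectrum length of 64 points, the Euclidean distance, and the toy sequences and labels are all choices made here, not details taken from the chapter.

```python
import numpy as np

# Kyte-Doolittle hydrophobicity scale (an assumed choice; the chapter's
# actual scale is not stated in the abstract).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def spectrum(seq, n=64):
    """Magnitude spectrum of the mean-centered hydrophobicity signal.

    The signal is zero-padded or truncated to n points so that
    sequences of different lengths give vectors of equal size.
    """
    signal = np.array([KD[a] for a in seq], dtype=float)
    signal -= signal.mean()
    return np.abs(np.fft.rfft(signal, n=n))

def nearest(x, feats, labels):
    """Label of the training vector closest to x (Euclidean distance)."""
    dist = np.linalg.norm(feats - x, axis=1)
    return labels[int(np.argmin(dist))]

def two_level_predict(seq, train):
    """train is a list of (sequence, family, subfamily) tuples."""
    feats = np.array([spectrum(s) for s, _, _ in train])
    families = [f for _, f, _ in train]
    x = spectrum(seq)
    # Level 1: pick the family of the overall nearest neighbor.
    family = nearest(x, feats, families)
    # Level 2: search only among training samples of that family.
    idx = [i for i, f in enumerate(families) if f == family]
    subfamily = nearest(x, feats[idx], [train[i][2] for i in idx])
    return family, subfamily

# Hypothetical toy data, not real GPCR sequences.
toy_train = [
    ("AILVAILVAILV", "A", "A1"),
    ("KRDEKRDEKRDE", "A", "A2"),
    ("GSTPGSTPGSTP", "B", "B1"),
]
print(two_level_predict("KRDEKRDEKRDE", toy_train))
```

Restricting the second search to the predicted family keeps the sub-family decision from being influenced by neighbors in unrelated families, at the cost of propagating any level-1 error.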

Invited Talk: Ab Initio Gene Finding Engines: What Is Under the Hood

Mark Borodovsky

I will revisit the statistical and computational foundations of ab initio gene finding algorithms that best fit current challenges in analysis of genomic data. With the number of new sequenced genomes rapidly growing, there is a need to generate high quality gene annotations in less time.

Pp. 577-577

Reconstruction of 3D Structures from Protein Contact Maps

Marco Vassura; Luciano Margara; Filippo Medri; Pietro di Lena; Piero Fariselli; Rita Casadio

Proteins are large organic compounds made of amino acids arranged in a linear chain (primary structure). Most proteins fold into unique three-dimensional (3D) structures, interchangeably called tertiary, folded, or native structures. Discovering the tertiary structure of a protein (the Protein Folding Problem) can provide important clues about how the protein performs its function, and it is one of the most important problems in bioinformatics. A contact map of a given protein is a binary matrix $M$ such that $M_{i,j} = 1$ iff the physical distance between amino acids $i$ and $j$ in the native structure is less than or equal to a pre-assigned threshold $t$. The contact map of each protein is a distinctive signature of its folded structure. Predicting the tertiary structure of a protein directly from its primary structure is a very complex and still unsolved problem. An alternative and probably more feasible approach is to predict the contact map of a protein from its primary structure and then to compute the tertiary structure starting from the predicted contact map. This last problem has recently been proven to be NP-hard [6]. In this paper we give a heuristic method that is able to reconstruct in a few seconds a 3D model that exactly matches the target contact map. We wish to emphasize that our method computes an exact model for the protein independently of the contact map threshold. To our knowledge, our method outperforms all other techniques in the literature [5,10,17,19] both in the quality of the provided solutions and in running time. Our experimental results are obtained on a non-redundant data set consisting of 1760 proteins, which is by far the largest benchmark set used so far. Average running times range from 3 to 15 seconds depending on the contact map threshold and on the size of the protein.
Repeated applications of our method (starting from randomly chosen distinct initial solutions) show that the same contact map may admit (depending on the threshold) quite different 3D models. Extensive experimental results show that contact map thresholds ranging from 10 to 18 Ångstrom allow the reconstruction of 3D models that are very similar to the protein's native structure. Our heuristic is freely available for testing on the web at the following URL: http://vassura.web.cs.unibo.it/cmap23d/

Pp. 578-589
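The contact map definition given in the abstract above, and the notion of a 3D model that "exactly matches" a target map, can be illustrated with a short sketch. It assumes one representative 3D point per amino acid (for example the C-alpha atom, an assumption made here) and does not attempt the reconstruction heuristic itself; the function names and toy coordinates are hypothetical.

```python
import numpy as np

def contact_map(coords, threshold):
    """Binary matrix with 1 where two residues are within `threshold`.

    coords: array-like of shape (n_residues, 3), one point per residue.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= threshold).astype(int)

def matches_target(coords, target_map, threshold):
    """True if the model's contact map equals the target map exactly."""
    return np.array_equal(contact_map(coords, threshold), target_map)

# Toy chain of four residues on a line (distances in Angstrom).
toy_coords = [[0.0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [30.0, 0, 0]]
toy_map = contact_map(toy_coords, threshold=8.0)
print(toy_map)
```

A check like `matches_target` is the natural acceptance test for any reconstruction procedure: a candidate model is exact when every pairwise contact and non-contact agrees with the target.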

A Feature Selection Algorithm Based on Graph Theory and Random Forests for Protein Secondary Structure Prediction

Gulsah Altun; Hae-Jin Hu; Stefan Gremalschi; Robert W. Harrison; Yi Pan

The protein secondary structure prediction problem is one of the most widely studied problems in bioinformatics. Predicting the secondary structure of a protein is an important step toward determining its tertiary structure and thus its function. This paper explores the protein secondary structure problem using a novel feature selection algorithm combined with a machine learning approach based on random forests. For feature reduction, we propose a graph-theoretical algorithm that finds cliques in the non-position-specific evolutionary profiles of proteins obtained from BLOSUM62. The features selected by this algorithm are then used to condense the position-specific evolutionary information obtained from PSI-BLAST. Our results show that we are able to save a significant amount of space and time and still achieve high accuracy even when the number of features is reduced by 25%.

Pp. 590-600

DNA Sites Buried in Nucleosome Become Accessible at Room Temperature: A Discrete-Event-Simulation Based Modeling Approach

Amin R. Mazloom; Kalyan Basu; Subhrangsu S. Mandal; Mehran Sorourian; Sajal Das

The conformation of a canonical nucleosome inhibits the direct access of binding proteins to portions of nucleosomal DNA. Nucleosome dynamics establish certain pathways through which the nucleosome is remodeled (spontaneously, covalently or non-covalently) and the buried DNA sites become accessible. Currently, for most pathways, no single model is available to capture their temporal behavior. Moreover, traditional diffusion-based models are in most cases not precise. In this work we give a systematic overview of such pathways. We then manipulate the probability of a binding site on an array of nucleosomes and on chromatin of a given length in base pairs. We further identify three of the widely accepted thermally driven (passive) pathways and model them using stochastic processes and Discrete-Event Simulation. As output of the models we have sought either the site access rate or the sliding rate of the nucleosome. We also show that results from these models match the experimental data where available.

Pp. 601-614

Comparative Analysis of Gene-Coexpression Networks Across Species

Shiquan Wu; Jing Li

This paper presents a large-scale analysis of gene-coexpression networks (GCNs) across four plant species, i.e. Arabidopsis, Barley, Soybean, and Wheat, over 1471 DNA microarrays. We first identify a set of 5164 metagenes that are highly conserved across all of them. For each of the four species, a GCN is constructed by linking reliably coexpressed metagene pairs based on their expression profiles within that species. Similarly, an overall GCN for the four species is constructed based on gene expression profiles across the four species. On average, more than 50K correlation links have been generated for each of the five networks. A number of recent studies have shown that the topological structures of GCNs and some other biological networks have common characteristics, and that GCNs across species may reveal conserved genetic modules that contain functionally related genes. However, no studies on GCNs across crop species have been reported. In this study, we focus on the comparative analysis of statistical properties of the topological structure of the above five networks across Arabidopsis and three crop species. We show that: (1) the five networks are scale-free and their degree distributions follow the power law; (2) these networks have the small-world property; (3) these networks share very similar values for a variety of network parameters such as degree distributions, network diameters, cluster coefficients, and frequency distributions of correlation patterns (sub-graphs); (4) these networks are non-random and stable; (5) cliques and clique-like subgraphs are over-represented in these networks. Further analysis can be carried out to investigate conserved functional modules and regulatory pathways across the four species based on these networks. A web-based computing tool has been designed to visualize expression profiles of metagenes across the four species.

Pp. 615-626
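As a rough illustration of how a coexpression network like those in the abstract above might be built and summarized, the sketch below links gene pairs whose expression profiles are strongly correlated and then computes node degrees and local clustering coefficients. The use of absolute Pearson correlation, the cutoff of 0.8, and the toy expression matrix are assumptions made here; the abstract does not state how "reliable" coexpression was defined.

```python
import numpy as np

def coexpression_network(expr, cutoff=0.8):
    """Boolean adjacency matrix from an expression matrix.

    expr: array of shape (n_genes, n_samples); rows are gene profiles.
    Two genes are linked when |Pearson r| >= cutoff (assumed criterion).
    """
    corr = np.corrcoef(expr)
    adj = np.abs(corr) >= cutoff
    np.fill_diagonal(adj, False)  # no self-links
    return adj

def degrees(adj):
    """Number of links per gene."""
    return adj.sum(axis=1)

def clustering_coefficient(adj):
    """Local clustering coefficient per gene (0 where degree < 2)."""
    a = adj.astype(int)
    deg = a.sum(axis=1)
    # diag(A^3) counts closed 3-step walks: twice the triangles at a node.
    triangles = np.diag(a @ a @ a) / 2
    possible = deg * (deg - 1) / 2
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(possible > 0, triangles / possible, 0.0)

# Hypothetical toy profiles: three mutually correlated genes, one unrelated.
toy_expr = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, -1, 1, -1, 1],
], dtype=float)
toy_adj = coexpression_network(toy_expr)
print(degrees(toy_adj), clustering_coefficient(toy_adj))
```

A histogram of `degrees(adj)` on a log-log scale is the usual first check for the power-law behavior the abstract reports, and the mean of `clustering_coefficient(adj)` is one of the quantities compared when testing for the small-world property.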

Comparative Pathway Prediction Via Unified Graph Modeling of Genomic Structure Information

Jizhen Zhao; Dongsheng Che; Liming Cai

Genomic information other than sequence similarity is important for comparative-analysis-based prediction of biological pathways. There is evidence that structure information such as protein-DNA interactions and operons is very useful in improving pathway prediction accuracy. This paper introduces a graph model that can unify protein-DNA interaction and operon information as well as homologous relationships between the genes involved. Under this model, pathway prediction corresponds to finding the maximum independent set in the model graph, which is solved efficiently via non-trivial tree decomposition-based techniques. The developed algorithm is evaluated on the prediction of 30 pathways in E. coli K12 using those in B. subtilis 168 as templates. The overall accuracy of the new method is higher than that of methods based solely on sequence similarity or on using different categories of structure information separately.

Pp. 627-637
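The abstract above reduces pathway prediction to a maximum independent set problem solved with tree decomposition-based techniques. The chapter's algorithm is not reproduced here; as a minimal sketch of the underlying dynamic-programming idea, the code below solves the maximum weight independent set on a graph that is itself a tree (the simplest case, treewidth 1). The function name, input format, and toy graphs are hypothetical.

```python
def max_independent_set_tree(adj, weights, root=0):
    """Best total weight of an independent set in a tree.

    adj: dict mapping each node to a list of its neighbors (must be a tree).
    weights: dict mapping each node to a non-negative weight.
    """
    # Iterative DFS to get a parent map and a parent-before-child order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)

    incl = {}  # best weight in v's subtree when v is selected
    excl = {}  # best weight in v's subtree when v is not selected
    for v in reversed(order):  # children are processed before parents
        incl[v] = weights[v]
        excl[v] = 0
        for u in adj[v]:
            if u != parent[v]:
                incl[v] += excl[u]                # child must be left out
                excl[v] += max(incl[u], excl[u])  # child is free to choose
    return max(incl[root], excl[root])

# Toy examples: a path of five unit-weight nodes and a weighted star.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(max_independent_set_tree(path, {v: 1 for v in path}))
print(max_independent_set_tree(star, {0: 5, 1: 2, 2: 2, 3: 2}))
```

On general graphs the same include/exclude bookkeeping is carried out per bag of a tree decomposition rather than per node, which is what makes the approach efficient when the treewidth is small.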

Extending the Calculus of Looping Sequences to Model Protein Interaction at the Domain Level

Roberto Barbuti; Andrea Maggiolo-Schettini; Paolo Milazzo

In previous papers we introduced a formalism, called the Calculus of Looping Sequences (CLS), for describing biological systems and their evolution. CLS is based on term rewriting. Terms can be constructed by composing symbols of a given alphabet in sequences, which may be closed (looping) and may contain other terms. In this paper we extend CLS to represent protein interaction at the domain level. The extension, called the Calculus of Linked Looping Sequences (LCLS), is obtained by labeling the alphabet symbols used in terms. Two symbols with the same label are considered to be linked. We introduce a type system to express a notion of well-formedness of LCLS terms, we give an operational semantics of the new calculus, and we show the application of LCLS to the description of a biological system.

Pp. 638-649