Publications catalog - books


Advances in Bioinformatics and Computational Biology: 2nd Brazilian Symposium on Bioinformatics, BSB 2007, Angra dos Reis, Brazil, August 29-31, 2007. Proceedings

Marie-France Sagot ; Maria Emilia M. T. Walter (eds.)

Conference: 2nd Brazilian Symposium on Bioinformatics (BSB). Angra dos Reis, Brazil. August 29, 2007 - August 31, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73730-8

Electronic ISBN

978-3-540-73731-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Automating Molecular Docking with Explicit Receptor Flexibility Using Scientific Workflows

K. S. Machado; E. K. Schroeder; D. D. Ruiz; O. Norberto de Souza

Computer-assisted drug design (CADD) is a process involving the execution of many computer programs, ensuring that the ligand binds optimally to its receptor. This process is usually executed using shell scripts whose input parameter assignments and result analyses are complex and time-consuming. Moreover, receptors and ligands are naturally flexible molecules. In order to explicitly model the receptor flexibility during molecular docking experiments, we propose to use different receptor conformations derived from a molecular dynamics simulation trajectory. This work presents an integrated scientific workflow solution aiming at automating molecular docking with explicit inclusion of receptor flexibility. Enhydra JAWE and Shark software tools were used to model and execute workflows, respectively. To test our approach we performed docking experiments with the enzyme InhA (receptor) and three ligands: NADH, IPCF and TCL. The results illustrate the effectiveness of both the proposed workflow and the implementation of the docking processes.

- Selected Articles | Pp. 1-11

Gene Set Enrichment Analysis Using Non-parametric Scores

Ariel E. Bayá; Mónica G. Larese; Pablo M. Granitto; Juan Carlos Gómez; Elizabeth Tapia

Gene Set Enrichment Analysis (GSEA) is a well-known technique used for studying groups of functionally related genes and their correlation with phenotype. This method creates a ranked list of genes, which is used to calculate an enrichment score. In this work, we introduce two different metrics for gene ranking in GSEA, namely the Wilcoxon and the Baumgartner-Weiß-Schindler tests. The advantage of these metrics is that they do not assume any particular distribution on the data. We compared them with the signal-to-noise ratio metric originally proposed by the developers of GSEA on a type 2 diabetes mellitus (DM2) database. Statistical significance is evaluated by means of false discovery rate and p-value calculations. Results show that the Baumgartner-Weiß-Schindler test detects more pathways with statistical significance. One of them could be related to DM2, according to the literature, but further research is needed.

- Selected Articles | Pp. 12-21
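
As an illustration of the non-parametric ranking idea in the abstract above, the following is a minimal sketch of ranking genes by a Wilcoxon rank-sum statistic. It is not the authors' implementation: the function names, the toy expression values, and the use of a z-score under the normal approximation (without tie correction) are all assumptions made for this example.

```python
from math import sqrt

def rank_sum_z(group_a, group_b):
    """Wilcoxon rank-sum statistic of group_a vs group_b, as a z-score.

    Assumption: normal approximation, average ranks for ties,
    no tie correction of the variance.
    """
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy 1-based ranks i+1 .. j; each gets the average.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n_a * (n_a + n_b + 1) / 2.0
    sd = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (rank_sum_a - mean) / sd

def rank_genes(expression, labels):
    """Return gene names sorted from most up-regulated in class 1 to most down-regulated.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of 0/1 phenotype labels, one per sample (hypothetical layout).
    """
    scores = {}
    for gene, values in expression.items():
        case = [v for v, lab in zip(values, labels) if lab == 1]
        control = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = rank_sum_z(case, control)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (invented for illustration only).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "g_up": [5, 6, 7, 1, 2, 3],
    "g_flat": [1, 4, 5, 2, 3, 6],
    "g_down": [1, 2, 3, 5, 6, 7],
}
ranking = rank_genes(expression, labels)
```

The resulting ranked list is the kind of input from which an enrichment score would then be computed; that step is not shown here.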

Comparison of Simple Encoding Schemes in GA’s for the Motif Finding Problem: Preliminary Results

Giovanna Martínez-Arellano; Carlos A. Brizuela

The DNA motif finding problem is of great relevance in molecular biology. Weak signals that mark transcription factor binding sites involved in gene regulation are considered to be challenging to find. These signals (motifs) consist of a short string of unknown length that can be located anywhere in the gene promoter region. Therefore, the problem consists in discovering short, conserved sites in genomic DNA without knowing the length or the chemical composition of the site, turning the original problem into a combinatorial one, where computational tools can be applied to find the solution. Pevzner and Sze [7] studied a precise combinatorial formulation of this problem, called the planted motif problem, which is of particular interest because it is a challenging model for commonly used motif-finding algorithms [15]. In this work, we analyze two different encoding schemes for genetic algorithms to solve the planted motif finding problem. One representation encodes the initial position for the motif occurrences at each sequence, and the other encodes a candidate motif. We test the performance of both algorithms on a set of planted motif instances. Preliminary experimental results show a promising superior performance of the algorithm encoding the candidate motif over the more standard position-based scheme.

- Selected Articles | Pp. 22-33
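
The two encodings named in the abstract above can be sketched as two ways of scoring an individual. This is only an illustration under stated assumptions: the abstract does not give the fitness functions, so the Hamming-distance scores, the function names, and the toy sequences below are hypothetical, and the genetic algorithm itself (selection, crossover, mutation) is omitted.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def score_motif_encoding(motif, sequences):
    """Candidate-motif encoding: the individual is the motif string itself.

    Assumed score: for each sequence take the closest window, then sum
    the Hamming distances (lower is better).
    """
    length = len(motif)
    return sum(
        min(hamming(motif, seq[i:i + length]) for i in range(len(seq) - length + 1))
        for seq in sequences
    )

def score_position_encoding(positions, sequences, length):
    """Position encoding: the individual is one start position per sequence.

    Assumed score: build the column-wise consensus of the selected windows
    and sum the Hamming distances of the windows to it (lower is better).
    """
    windows = [seq[p:p + length] for seq, p in zip(sequences, positions)]
    consensus = "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))
    return sum(hamming(consensus, w) for w in windows), consensus

# Toy instance (invented): "ACGT" planted with at most one mutation.
sequences = ["TTACGTAA", "ACGTTTTT", "GGGACCTG"]
motif_score = score_motif_encoding("ACGT", sequences)
position_score, consensus = score_position_encoding([2, 0, 3], sequences, 4)
```

Under the candidate-motif encoding the search space depends on the motif length, while under the position encoding it depends on the number and length of the sequences, which is one way to see why the two schemes can behave differently.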

Multi-Objective Clustering Ensemble with Prior Knowledge

Katti Faceli; André C. P. L. F. de Carvalho; Marcílio C. P. de Souto

In this paper, we introduce an approach to integrate prior knowledge in cluster analysis, which is different from the existing ones for semi-supervised clustering methods. In order to aid the discovery of alternative structures present in the data, we consider the knowledge of some existing complete classification of such data. The approach proposed is based on our Multi-Objective Clustering Ensemble algorithm (MOCLE). This algorithm generates a concise and stable set of partitions, which represents different trade-offs between several measures of partition quality. The prior knowledge is automatically integrated in MOCLE by embedding it into one of the objective functions. In this case, the function gives as output the quality of a partition, considering the prior knowledge of one of the known structures of the data.

- Selected Articles | Pp. 34-45

Biological Sequence Comparison Application in Heterogeneous Environments with Dynamic Programming Algorithms

Marcelo N. P. Santana; Alba Cristina M. A. Melo

This paper presents the design and evaluation of a task allocation framework for Biological Sequence Comparison applications that use dynamic programming and run in heterogeneous environments. The framework is composed of four modules, and either task allocation policies or applications can be integrated into it. The results obtained with four different task allocation policies in a 10-machine heterogeneous environment show that, for some sequence sizes, we were able to reduce the execution time of the parallel application by 54.2% with the appropriate allocation policy.

- Selected Articles | Pp. 46-56

New EST Trimming Procedure Applied to SUCEST Sequences

Christian Baudet; Zanoni Dias

In order to improve EST trimming, we proposed a new method consisting of a new set of procedures to detect regions that do not belong to the sequenced organism or have low quality or low complexity. Most trimming procedures process ESTs in a pipeline where the output of a step is adopted as the input for the following one. In our method, all artifact detection steps process the raw EST and their results are combined in the last step, which outputs the trimmed sequence. This strategy reduces the occurrence of false negatives and, additionally, has the advantage of producing better artifact composition characterization for the analyzed sequences. We evaluated our method using SUCEST [1] ESTs. Based on the results, we concluded that our method suits projects that want to produce more reliable clusters.

- Selected Articles | Pp. 57-68

A Method for Inferring Biological Functions Using Homologous Genes Among Three Genomes

Daniel A. S. Anjos; Gustavo G. Zerlotini; Guilherme A. Pinto; Maria Emilia M. T. Walter; Marcelo M. Brigido; Guilherme P. Telles; Carlos Juliano M. Viana; Nalvo F. Almeida

In this work, we propose 3, a method to infer a particular biological function in an organism, by finding homologous genes among three genomes, comparing the genes of the investigated organism with the genes of two other organisms, one having and the other not having this function. Our 3 method takes as input identified families of paralogous genes in each of the genomes, and produces a three-set Venn diagram, each set representing a genome. The intersection of three (two) sets shows the families of similar genes having strong similarities among the three (two) genomes. The gene families of a genome not having strong similarities with any family of the other two genomes appear outside the intersections. We have used our method to determine potential pathogenic genes of the fungus, comparing it with seven fungi, three at a time, one pathogenic and the other non-pathogenic. To validate 3, we first investigate the Pfam classification of the families belonging to the intersections and compare with INPARANOID and 3 methods.

- Selected Articles | Pp. 69-80

Validating Gene Clusterings by Selecting Informative Gene Ontology Terms with Mutual Information

Ivan G. Costa; Marcilio C. P. de Souto; Alexander Schliep

We propose a method for global validation of gene clusterings. The method selects a set of informative and non-redundant GO terms through an exploration of the Gene Ontology structure guided by mutual information. Our approach yields a global assessment of the clustering quality, and a higher level interpretation for the clusters, as it relates GO terms with specific clusters. We show that in two gene expression data sets our method offers an improvement over previous approaches.

- Selected Articles | Pp. 81-92
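
To make the role of mutual information in the abstract above concrete, here is a minimal sketch that measures how informative a single GO term annotation is about a clustering. It is not the authors' method: the exploration of the Gene Ontology structure and the redundancy filtering are not shown, and the function name and toy annotations are assumptions made for this example.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length label sequences,
    estimated from their empirical joint distribution."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )

# Toy data (invented): cluster label per gene, and 0/1 annotation per gene.
clusters = [0, 0, 1, 1]
term_aligned = [1, 1, 0, 0]    # annotation coincides with the clusters
term_unrelated = [1, 0, 1, 0]  # annotation independent of the clusters

mi_aligned = mutual_information(clusters, term_aligned)
mi_unrelated = mutual_information(clusters, term_unrelated)
```

A term whose annotation coincides with the cluster boundaries scores high, while a term spread evenly across clusters scores near zero, which is the intuition behind using this quantity to pick informative terms.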

An Optimized Distance Function for Comparison of Protein Binding Sites

Gábor Iván

An important field of application of string processing algorithms is the comparison of protein or nucleotide sequences. In this paper we present an algorithm capable of determining the dissimilarity of protein sequences originating from protein binding sites found in the RS-PDB database, a repaired and cleaned version of the publicly available Protein Data Bank (PDB). The special way these protein sequences are constructed enabled us to optimize the algorithm, achieving runtimes several times faster than the unoptimized approach. One task for which the proposed algorithm can be useful is searching for conserved sequences in protein chains.

- Selected Articles | Pp. 93-100

Comparing RNA Structures: Towards an Intermediate Model Between the Edit and the LAPCS Problems

Guillaume Blin; Guillaume Fertin; Gaël Herry; Stéphane Vialette

In the recent past, RNA structure comparison has appeared as an important field of bioinformatics. In this paper, we introduce a new and general intermediate model for comparing RNA structures: the Maximum Arc-Preserving Common Subsequence problem (or MAPCS). This new model lies between two well-known problems, namely the Longest Arc-Preserving Common Subsequence (LAPCS) and the Edit distance. After showing the relationship between MAPCS, LAPCS, Edit, and also the Maximum Linear Graph problem, we will investigate the computational complexity landscape of MAPCS, depending on the RNA structure complexity.

- Selected Articles | Pp. 101-112