Bioinformatics Research and Development: First International Conference, BIRD 2007, Berlin, Germany, March 12-14, 2007. Proceedings

Sepp Hochreiter ; Roland Wagner (eds.)

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-71232-9

Electronic ISBN

978-3-540-71233-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

satDNA Analyzer 1.2 as a Valuable Computing Tool for Evolutionary Analysis of Satellite-DNA Families: Revisiting Y-Linked Satellite-DNA Sequences of Rumex (Polygonaceae)

Rafael Navajas-Pérez; Manuel Ruiz Rejón; Manuel Garrido-Ramos; José Luis Aznarte; Cristina Rubio-Escudero

In a previous paper [1] we showed that Y-linked satellite-DNA sequences of Rumex (Polygonaceae) evolve at reduced rates relative to autosomal satellite-DNA sequences. In the present paper, we re-analyze the same set of sequences using satDNA Analyzer 1.2, software that we developed specifically for the analysis of satellite-DNA evolution. We not only confirm our previous findings but also show that the satDNA Analyzer 1.2 package is a powerful tool for users interested in the evolutionary analysis of satellite-DNA sequences. In particular, we obtain more accurate calculations of the location of Strachan positions and of evolutionary rates, among other useful statistics. All results are displayed in a comprehensive, multicoloured graphical representation delivered as an easy-to-use HTML file. Furthermore, satDNA Analyzer 1.2 saves time, since every utility is automated and collected in a single software package, so the user does not need to switch between programs. It also significantly reduces miscalculations caused by human error, which are especially likely in large files.

- Session 4: Medical, SNPs, Genomics II | Pp. 131-139

A Soft Hierarchical Algorithm for the Clustering of Multiple Bioactive Chemical Compounds

Jehan Zeb Shah; Naomie bt Salim

Most of the methods used to cluster chemical structures, such as Ward’s, Group Average, K-means and Jarvis-Patrick, are known as hard or crisp methods because they partition a dataset into strictly disjoint subsets; they are therefore not suitable for clustering chemical structures that exhibit more than one activity. Fuzzy clustering algorithms such as fuzzy c-means provide an inherent mechanism for clustering overlapping structures (objects), but this potential, which comes from their fuzzy membership functions, has not been utilized effectively. In this work, a fuzzy hierarchical algorithm is developed that benefits not only from the fuzzy clustering process but also from the multiple membership values it produces. The algorithm divides every cluster whose size exceeds a predetermined threshold into two sub-clusters based on the membership values of each structure. A structure is assigned to one cluster if its membership value for that cluster is very high, or to both clusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and on a large dataset of compound structures derived from MDL’s MDDR database. The results show a significant improvement over a similar implementation of the hard c-means algorithm.

- Session 4: Medical, SNPs, Genomics II | Pp. 140-153
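
The soft splitting rule described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two cluster centers are taken as given (a full version would fit them with fuzzy c-means iterations and would recurse only on clusters above the size threshold), the function names are hypothetical, and the tolerance `similar_tol` that decides when two memberships count as "very similar" is an assumed parameter. The membership formula is the standard fuzzy c-means one.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each center (m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def soft_split(points, centers, similar_tol=0.2):
    """Split points into two sub-clusters that are allowed to overlap.

    Assumed rule: a point goes to both sub-clusters when its two memberships
    differ by at most similar_tol, otherwise to the one with higher membership.
    """
    left, right = [], []
    for p in points:
        u_left, u_right = fuzzy_memberships(p, centers)
        if abs(u_left - u_right) <= similar_tol:
            left.append(p)
            right.append(p)
        elif u_left > u_right:
            left.append(p)
        else:
            right.append(p)
    return left, right
```

With centers at `(0.0,)` and `(1.0,)`, the point `(0.5,)` has equal memberships and lands in both sub-clusters, which is the behaviour a hard partition cannot express.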

A Novel Method for Flux Distribution Computation in Metabolic Networks

Da Jiang; Shuigeng Zhou; Jihong Guan

In recent years, the study of metabolic networks has attracted considerable attention from the research community. Although the topological structures of the genome-scale metabolic networks of some organisms have been investigated, their metabolic flux distributions remain unclear. Understanding flux distributions in metabolic networks, especially in gene-knockout mutants, helps to suggest potential ways of improving strain design. The traditional method of flux distribution computation, flux balance analysis (FBA), is based on the idea of maximizing biomass yield. However, this method overestimates the production of biomass. In this paper, we develop a novel approach to overcome this drawback of FBA. First, we adopt a series of extended equations to model reaction flux. Second, we build the stoichiometric matrix of a metabolic network using a more complex but more accurate model, carbon mole balance, rather than the mass balance used in FBA. Computational results on real-world data show that our approach outperforms FBA in the accuracy of flux distribution computation.

- Session 5: Systems Biology | Pp. 154-167
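
For orientation, the traditional FBA baseline named in the abstract above is usually stated as a linear program. The notation here is the conventional one and is not taken from the source: S is the stoichiometric matrix, v the vector of reaction fluxes, c a weight vector that selects the biomass reaction, and v_min, v_max the flux bounds. The authors' extended equations and carbon mole balance model are not reproduced.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The constraint S v = 0 expresses the steady-state mass balance that, according to the abstract, the proposed method replaces with a carbon mole balance.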

Inverse Bifurcation Analysis of a Model for the Mammalian / Regulatory Module

James Lu; Heinz W. Engl; Rainer Machné; Peter Schuster

Given a large, complex ordinary differential equation model of a gene regulatory network, relating its dynamical properties to its network structure is a challenging task. Biologically important questions include: What network components are responsible for the various dynamical behaviors that arise? Can the underlying dynamical behavior be essentially attributed to a small number of modules? In this paper, we demonstrate that inverse bifurcation analysis can be used to address such questions. We show that sparsity-promoting regularization strategies, in combination with numerical bifurcation analysis, can be used to identify small sets of “influential” submodules and parameters within a given network. In addition, hierarchical strategies can be used to generate parameter solutions with an increasing number of non-zero entries. We apply the proposed methods to analyze a model of the mammalian / regulatory module.

- Session 5: Systems Biology | Pp. 168-184

Weighted Cohesiveness for Identification of Functional Modules and Their Interconnectivity

Zelmina Lubovac; David Corne; Jonas Gamalielsson; Björn Olsson

Systems biology offers a holistic perspective in which individual proteins are viewed as elements in a network of protein-protein interactions (PPI) and have contextual functions within functional modules. To facilitate the identification and analysis of such modules, we have previously proposed a Gene Ontology-weighted clustering coefficient for identifying modules in PPI networks, and a method named SWEMODE (Semantic WEights for MODule Elucidation) in which this measure is used to identify network modules. Here, we introduce, test and evaluate novel aspects of the method. One of them is the use of the k-core graph instead of the original protein-protein interaction graph. In addition, by taking the spatial aspect into account, using the GO cellular component annotation when calculating weighted cohesiveness, we improve on the results of previous work, where only two of the GO aspects (molecular function and biological process) were combined. We evaluate the predicted modules by calculating their overlap with MIPS functional complexes. We also identify the “most frequent” proteins, i.e. the proteins that most frequently participate in overlapping modules, and investigate their role in the interconnectivity between modules. We find that the majority of the identified proteins are involved in the assembly and arrangement of cell structures, such as the cell wall and cell envelope.

- Session 5: Systems Biology | Pp. 185-198
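
The abstract above replaces the full interaction graph with a core graph. Assuming this refers to the standard k-core (the largest subgraph in which every node has at least k neighbours), it can be computed by repeatedly pruning low-degree nodes. The sketch below is illustrative only: the function name and the adjacency-dictionary input are assumptions, the graph is taken to be simple and undirected with every node present as a key, and none of the Gene Ontology weighting used by SWEMODE is included.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            if len(graph[node]) < k:
                # Remove the node and detach it from its remaining neighbours.
                for neighbour in graph.pop(node):
                    if neighbour in graph:
                        graph[neighbour].discard(node)
                changed = True
    return set(graph)
```

For a triangle of proteins with one extra protein attached to a single corner, the 2-core keeps the triangle and drops the pendant protein.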

Modelling and Simulation of the Genetic Phenomena of Additivity and Dominance via Gene Networks of Parallel Aggregation Processes

Elena Tsiporkova; Veselka Boeva

The contribution develops a mathematical model that allows the phenomenon of additive-dominance heterosis to be interpreted and simulated as a network of interacting parallel aggregation processes. First, the overall heterosis potential is expressed in terms of the heterosis potentials of the individual genes controlling the trait of interest. These genes are then viewed as interacting agents engaged in achieving a trade-off between their individual contributions to the overall heterosis potential. Each agent is initially assigned a vector of interaction coefficients, representing the relative degrees of influence the agent is prepared to accept from the other agents. The individual heterosis potentials of the different agents are then combined in parallel with weighted mean aggregations, one for each agent, so that a new heterosis potential is obtained for each agent. These parallel aggregations are repeated until a consensus between the agents is attained.

- Session 5: Systems Biology | Pp. 199-211
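
The repeated parallel aggregation described in the abstract above can be sketched as a simple iteration. This is an illustration under stated assumptions, not the authors' model: each row of `weights` plays the role of one agent's vector of interaction coefficients and is assumed to be non-negative and to sum to 1, consensus is declared when all potentials agree within a tolerance `tol`, and the function name and parameters are hypothetical.

```python
def aggregate_to_consensus(potentials, weights, tol=1e-9, max_rounds=10_000):
    """Repeat parallel weighted-mean aggregation until all agents agree.

    potentials: initial value held by each agent.
    weights: weights[i][j] is the influence agent i accepts from agent j.
    Returns (consensus value, number of aggregation rounds performed).
    """
    values = list(potentials)
    for rounds in range(max_rounds):
        if max(values) - min(values) <= tol:
            return values[0], rounds
        # One weighted mean per agent, all computed from the same old values.
        values = [sum(w * v for w, v in zip(row, values)) for row in weights]
    raise RuntimeError("no consensus reached within max_rounds")
```

Convergence is not guaranteed for every weight matrix (two agents that only ever copy each other will swap values forever), which is why the sketch stops with an error after `max_rounds` instead of looping indefinitely.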

Protein Remote Homology Detection Based on Binary Profiles

Qiwen Dong; Lei Lin; Xiaolong Wang

Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences that uses the evolutionary information of profiles for efficient remote homology detection. Frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. These binary profiles constitute a new building block for protein sequences. Each protein sequence is mapped to a high-dimensional vector of the number of occurrences of each binary profile. The resulting vectors are then used to train support vector machine classifiers, which classify the test protein sequences. The method is further improved by applying an efficient feature extraction algorithm from natural language processing, namely the latent semantic analysis model. Testing on the SCOP 1.53 database shows that the method based on binary profiles outperforms those based on many other basic building blocks, including N-grams, patterns and motifs. The ROC50 score is 0.698, which is higher than that of other methods by nearly 10 percent.

- Session 6: Sequence Analysis I (Coding) | Pp. 212-223
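
The conversion from frequency profiles to binary profiles, and the occurrence counting that follows it, can be sketched as below. The abstract above does not give the threshold value or say whether the comparison is strict, so `threshold` is a required argument and the strict `>` is an assumption; the function names are hypothetical, and the PSI-BLAST, SVM and latent semantic analysis stages are left out.

```python
from collections import Counter


def binarize_profile(frequencies, threshold):
    """Turn one frequency-profile column into a binary profile (tuple of 0/1)."""
    return tuple(1 if f > threshold else 0 for f in frequencies)


def binary_profile_counts(frequency_profile, threshold):
    """Count how often each binary profile occurs along one sequence.

    frequency_profile holds one column of residue frequencies per sequence
    position. The returned Counter is a sparse form of the feature vector.
    """
    return Counter(binarize_profile(col, threshold) for col in frequency_profile)
```

A real profile column has 20 entries, one per amino acid; a toy three-letter alphabet is enough to see that positions with the same set of frequent residues collapse into the same building block.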

Biological Sequences Encoding for Supervised Classification

Rabie Saidi; Mondher Maddouri; Engelbert Mephu Nguifo

The classification of biological sequences is one of the significant challenges in bioinformatics, for both protein and nucleic acid sequences. The huge volume of these data, their ambiguity and, especially, the high cost of analysis in terms of time and money make the use of data mining a necessity rather than a choice. However, data mining techniques, which often process data in relational format, cannot handle biological sequences in their native format. Hence, a pre-processing step is unavoidable. This work presents the encoding of biological sequences as a preparation step before their classification. We present three existing encoding methods based on motif extraction. We also propose an improvement to one of these methods and carry out a comparative study that takes into account not only the effect of each method on classification accuracy but also the number of generated attributes and the CPU time.

- Session 6: Sequence Analysis I (Coding) | Pp. 224-238

Fast Search Algorithms for Position Specific Scoring Matrices

Cinzia Pizzi; Pasi Rastas; Esko Ukkonen

Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and some empirical results on their performance on DNA sequences are reported. The algorithms basically generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to the scoring matrix search. As compared to the naive search, our algorithms can be faster by a factor which is proportional to the length of the pattern. In our experimental comparison of different algorithms the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.

- Session 6: Sequence Analysis I (Coding) | Pp. 239-250
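
As a point of reference for the abstract above, the baseline it compares against (a naive window-by-window scan, here with lookahead pruning) can be sketched as follows. This is not one of the authors' Aho–Corasick, filtration or superalphabet algorithms. The matrix representation (one dictionary of symbol scores per pattern position), the function name and the "score at least threshold" convention are assumptions, and every symbol in the sequence is assumed to have an entry in each column.

```python
def pssm_search(sequence, matrix, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    m = len(matrix)
    # best_rest[j]: highest score still attainable from matrix columns j..m-1.
    best_rest = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        best_rest[j] = best_rest[j + 1] + max(matrix[j].values())
    hits = []
    for start in range(len(sequence) - m + 1):
        score = 0.0
        for j in range(m):
            score += matrix[j][sequence[start + j]]
            # Lookahead pruning: give up once the threshold is out of reach.
            if score + best_rest[j + 1] < threshold:
                break
        else:
            # No break: the full window was scored and meets the threshold.
            hits.append((start, score))
    return hits
```

Without the pruning test this is the naive search, which always scores all m positions of every window; the pruning only helps when the threshold is high enough for most windows to be rejected early.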

A Markovian Approach for the Segmentation of Chimpanzee Genome

Christelle Melodelima; Christian Gautier

Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to analysing complex structures such as DNA sequences. Numerous methodological difficulties are encountered when using HMMs to model non-geometric distributions such as exon lengths, to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to suggest new tools for the exploration of genome data. We show that, by introducing macro-states, HMMs can be used to analyse complex gene structures with bell-shaped length distributions. Our HMM methods take into account many biological properties and were developed to model the isochore organisation of the chimpanzee genome, which is considered a fundamental level of genome organisation. A clear isochore structure in the chimpanzee genome, correlated with gene density and guanine-cytosine content, has been identified.

- Session 7: Sequence Analysis II | Pp. 251-262