Publications catalog - books

Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings

Ion Măndoiu; Alexander Zelikovsky (eds.)

Conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7, 2007 - May 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: none detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72030-0

Electronic ISBN

978-3-540-72031-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Algorithm for Haplotype Inferring Via Galled-Tree Networks with Simple Galls

Arvind Gupta; Ján Maňuch; Ladislav Stacho; Xiaohong Zhao

The problem of determining haplotypes from genotypes has gained considerable prominence in the research community. The focus is on determining sets of SNP values on individual chromosomes, since such information captures the genetic causes of diseases. Present algorithmic tools for haplotyping make effective use of phylogenetic trees. The underlying assumption is that recombinations are not present, an assumption based on experimental results. However, these results do not fully exclude recombinations, and models are needed that incorporate this extra degree of complication. Recently, Gusfield studied two cases: haplotyping via imperfect phylogenies with a single homoplasy and via galled-tree networks with one gall. In earlier work we characterized the existence of the galled-tree networks. Building on this, we present a polynomial algorithm for haplotyping via galled-tree networks with simple galls (having two mutations). Finally, we give experimental results comparing our algorithm with PHASE on simulated data.

Pp. 121-132

Estimating Bacterial Diversity from Environmental DNA: A Maximum Likelihood Approach

Frederick Cohan; Danny Krizanc; Yun Lu

The ability to measure bacterial diversity is a prerequisite for the systematic study of bacterial biogeography and ecology. In this paper we describe a method of estimating diversity from an environmental sample of DNA and apply it to data taken from samples from the Sargasso Sea. Our approach combines the coverage depth method of Venter et al. [2] and the contig spectrum approach of Angly et al. [4], but uses maximum likelihood to recover the diversity rather than using hand-fit models as in [2]. We assume four species abundance distributions, then maximize the likelihood of fitting the coverage depth at different positions of the consensus sequence provided in the Sargasso Sea sample. The resulting estimates match well with those obtained using less mathematically rigorous approaches.

Pp. 133-144

Invited Talk: Modern Homology Search

Ming Li

Homology search, finding similar parts between two sequences, is the most fundamental task in bioinformatics. A large fraction of the world’s supercomputing time is consumed by homology search.

Pp. 145-145

Statistical Absolute Evaluation of Gene Ontology Terms with Gene Expression Data

Pramod K. Gupta; Ryo Yoshida; Seiya Imoto; Rui Yamaguchi; Satoru Miyano

We propose a new testing procedure for the automatic ontological analysis of gene expression data. The objective of the ontological analysis is to retrieve functional annotations, e.g. Gene Ontology terms, relevant to the cellular mechanisms underlying the gene expression profiles, and a large number of tools have been developed for this purpose. Most existing tools implement the same approach, which exploits rank statistics of the genes ordered by the strength of statistical evidence, e.g. p-values computed by testing hypotheses at the individual gene level. However, such an approach often causes serious false discoveries. In particular, one of the most crucial drawbacks is that rank-based approaches can wrongly judge an ontology term as statistically significant even though all of the genes annotated by the term are irrelevant to the underlying cellular mechanisms. In this paper, we first point out some drawbacks of the rank-based approaches from a statistical point of view, and then propose a new testing procedure to overcome them. The proposed method has its theoretical basis in statistical meta-analysis, and the hypothesis to be tested is suitably stated for the problem of ontological analysis. We perform Monte Carlo experiments to highlight the disadvantages of the rank-based approach and the advantages of the proposed method. Finally, we demonstrate the applicability of the proposed method with an ontological analysis of gene expression data from human diabetes.

Pp. 146-157
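
The abstract above does not say which meta-analytic statistic the authors use. As a purely illustrative sketch of how gene-level p-values can be combined into a single term-level p-value, the following uses Fisher's combined probability test, a classical meta-analysis technique that may differ from the paper's procedure. The function name and the example p-values are hypothetical, and the genes are assumed to be independent.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Closed-form survival function of chi-square with an even number (2k)
    # of degrees of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical p-values for the genes annotated by one ontology term.
gene_p_values = [0.05, 0.05]
print(fisher_combined_p(gene_p_values))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail probability.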

Discovering Relations Among GO-Annotated Clusters by Graph Kernel Methods

Italo Zoppis; Daniele Merico; Marco Antoniotti; Bud Mishra; Giancarlo Mauri

The biological interpretation of large-scale gene expression data is one of the challenges in current bioinformatics. The state-of-the-art approach is to perform clustering and then compute a functional characterization via enrichments by Gene Ontology terms [1]. To better assist the interpretation of results, it may be useful to establish connections among different clusters. This machine learning step is sometimes termed cluster meta-analysis, and several approaches have already been proposed; in particular, they usually rely on enrichments based on flat lists of GO terms. However, GO terms are organized in taxonomical graphs, whose structure should be taken into account when performing enrichment studies. To tackle this problem, we propose a kernel approach that can exploit this graph structure. Finally, we compare our approach against a specific flat list method by analyzing the cdc15-subset of the well-known Spellman's Yeast Cell Cycle dataset [2].

Pp. 158-169

An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets

George Lee; Carlos Rodriguez; Anant Madabhushi

The recent explosion in availability of gene and protein expression data for cancer detection has necessitated the development of sophisticated machine learning tools for high dimensional data analysis. Previous attempts at gene expression analysis have typically used a linear dimensionality reduction method such as Principal Components Analysis (PCA). Linear dimensionality reduction methods do not however account for the inherent nonlinearity within the data. The motivation behind this work is to demonstrate that nonlinear dimensionality reduction methods are more adept at capturing the nonlinearity within the data compared to linear methods, and hence would result in better classification and potentially aid in the visualization and identification of new data classes. Consequently, in this paper, we empirically compare the performance of 3 commonly used linear versus 3 nonlinear dimensionality reduction techniques from the perspective of (a) distinguishing objects belonging to cancer and non-cancer classes and (b) new class discovery in high dimensional gene and protein expression studies for different types of cancer. Quantitative evaluation using a support vector machine and a decision tree classifier revealed statistically significant improvement in classification accuracy by using nonlinear dimensionality reduction methods compared to linear methods.

Pp. 170-181

NEURONgrid: A Toolkit for Generating Parameter-Space Maps Using NEURON in a Grid Environment

Robert J. Calin-Jageman; Chao Xie; Yi Pan; Art Vandenberg; Paul S. Katz

Neuroscience research increasingly involves the exploration of computational models of neurons and neural networks. To ensure systematic model exploration, it is often desirable to conduct a parameter-space analysis in which the behavior of the model is catalogued over a very large range of parameter permutations. Here we report the development and testing of a toolkit called NEURONgrid for conducting this type of analysis in a grid environment using NEURON (Hines & Carnevale, 1997, 2001), a popular and powerful simulation platform for the neurosciences. NEURONgrid provides helper classes within NEURON for manipulating parameters, a package of NEURON for running in a grid environment, and a management client that enables neuroscientists to submit a parameter-space analysis, monitor progress, and download results. NEURONgrid provides a user-friendly means for conducting intensive model exploration within the neurosciences. It is available for download at http://neurongrid.homeip.net.

Pp. 182-191
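
The parameter-space analysis described in the abstract above can be sketched in a few lines. This is not NEURONgrid code and does not use NEURON: `toy_model`, its `gain` and `threshold` parameters, and the grid values are hypothetical stand-ins chosen only to show how a model's behavior is catalogued over every combination of parameter values.

```python
from itertools import product

def toy_model(gain, threshold):
    """Hypothetical stand-in for a neuron simulation: counts 'spikes' in 10 steps."""
    spikes = 0
    for step in range(1, 11):
        if gain * step > threshold:
            spikes += 1
    return spikes

def parameter_space_map(model, grid):
    """Run the model on every combination of the parameter values in grid."""
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = model(**params)
    return names, results

# Hypothetical parameter grid: 3 gains x 2 thresholds = 6 model runs.
grid = {"gain": [0.5, 1.0, 2.0], "threshold": [2.0, 5.0]}
names, space_map = parameter_space_map(toy_model, grid)
for values, spikes in sorted(space_map.items()):
    print(dict(zip(names, values)), "->", spikes)
```

Each run depends only on its own parameter combination, so the runs are independent of one another, which is what makes this kind of sweep a natural fit for a grid environment.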

An Adaptive Resolution Tree Visualization of Large Influenza Virus Sequence Datasets

Leonid Zaslavsky; Yiming Bao; Tatiana A. Tatusova

Rapid growth of the amount of influenza genome sequence data requires enhancing exploratory analysis tools. Results of the preliminary analysis should be represented in an easy-to-comprehend form and allow convenient manipulation of the data.

We developed an adaptive approach to visualization of large sequence datasets on the web. A dataset is presented in an aggregated tree form with special representation of sub-scale details. The representation is calculated from the full phylogenetic tree and the amount of available screen space. Metadata, such as distribution over seasons or geographic locations, are aggregated/refined consistently with the tree. The user can interactively request further refinement or aggregation for different parts of the tree.

The technique is implemented in JavaScript on the client side. It is part of the new AJAX-based implementation of the NCBI Influenza Virus Resource.

Pp. 192-202

Wavelet Image Interpolation (WII): A Wavelet-Based Approach to Enhancement of Digital Mammography Images

Gordana Derado; F. DuBois Bowman; Rajan Patel; Mary Newell; Brani Vidakovic

Cancer detection using mammography focuses, in part, on characteristics of tiny microcalcifications, including the number, size, and spatial arrangement of the microcalcifications, as well as morphological features of individual microcalcifications. We have developed state-of-the-art wavelet-based methods to enhance the resolution of microcalcifications visible on digital mammograms, aimed at improving the specificity of breast cancer diagnoses. In our research, we develop, refine, and evaluate a Wavelet Image Interpolation (WII) procedure and create accompanying software to implement it. WII involves the application of an inverse wavelet transformation to a coarse or degraded image and constructed detail coefficients to produce an enhanced higher resolution image. The construction of detail coefficients is supervised by the observed image and innate regular scaling assessed by a statistical model. We found that our proposed procedure is efficient and useful in capturing relevant clinical information in the context of digital mammographic imaging. Our proposed methodology was tested by an experienced radiologist using 40 images from the University of South Florida Digital Database for Screening Mammography (DDSM).

Pp. 203-214
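
The core step named in the abstract above, applying an inverse wavelet transform to a coarse image together with constructed detail coefficients, can be illustrated in one dimension. This is a minimal sketch and not the authors' WII procedure: it assumes the orthonormal Haar wavelet, works on a single image row, and fills the detail coefficients with zeros where the paper constructs them from a statistical model. All function names are hypothetical.

```python
import math

def haar_forward_step(signal):
    """One level of the forward orthonormal Haar transform (1-D, even length)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """One level of the inverse orthonormal Haar transform (1-D)."""
    if len(approx) != len(detail):
        raise ValueError("approx and detail must have the same length")
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal

def upsample_row(coarse_row, detail=None):
    """Double the resolution of one image row.

    The coarse pixels are rescaled and treated as approximation coefficients.
    ASSUMPTION: when no detail is supplied, zeros are used; the paper instead
    constructs the detail coefficients from a statistical model.
    """
    approx = [math.sqrt(2.0) * p for p in coarse_row]
    if detail is None:
        detail = [0.0] * len(approx)
    return haar_inverse_step(approx, detail)

row = [10.0, 20.0, 40.0]
print(upsample_row(row))
```

With zero detail coefficients the result simply repeats each coarse pixel, so any gain in resolution comes entirely from how the detail coefficients are constructed.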

High Level Programming Environment System for Protein Structure Data

Yanchao Wang; Rajshekhar Sunderraman; Piyaphol Phoungphol

In this paper, we present an application system that extends the Object-Oriented Database (OODB) system by adding domain-specific layers to manage protein structure data. Protein-QL, a domain-specific query language, and Protein-OODB layers are added above the OODB. We have implemented this system for the protein domain, but it can easily be extended to other biological domains to build a bio-OODBMS. We define a protein's primary, secondary, and tertiary structures as internal data types to simplify queries in Protein-QL, so that domain scientists can easily master the query language and formulate data requests. We use EyeDB as the base OODB to communicate with Protein-OODB. Our system uses Java RMI to return results to the clients so that transactions can be conveniently executed by the clients.

Pp. 215-226