Publications catalog - books

Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings

Ion Măndoiu ; Alexander Zelikovsky (eds.)

Conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7, 2007 - May 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution: Not detected
Publication year: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72030-0

Electronic ISBN

978-3-540-72031-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

An Efficient Algorithm for Finding Gene-Specific Probes for DNA Microarrays

Mun-Ho Choi; In-Seon Jeong; Seung-Ho Kang; Hyeong-Seok Lim

The accuracy of a DNA microarray is fairly dependent on the quality of the probes it uses; a good probe should be specific to exactly one gene. Most sequence-based algorithms use the edit distance to the target sequences as the measure of probe specificity. We propose a novel algorithm for finding gene-specific probes that avoids large amounts of redundant edit-distance computation while maintaining the same accuracy as an exhaustive search. Our approach exploits the fact that when the starting position of a probe candidate is moved by only a few base pairs, the change in the edit distance to the off-target sequence is limited. The proposed algorithm does not use any index structures and is insensitive to the length of the probes. Our approach enables short (20-30 bases) or long (50 or more bases) probes to be computed for genomes of size 10M within a day.

Pp. 453-464
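
The bounded-change property mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain Levenshtein distance between each fixed-length probe candidate and a whole off-target string; the function names are hypothetical, and this is not the authors' algorithm. Shifting the window by one base removes one character and appends another, so by the triangle inequality the distance can change by at most 2 per shift.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def window_distances(gene, off_target, probe_len):
    """Edit distance from every probe candidate (sliding window) to off_target."""
    return [edit_distance(gene[s:s + probe_len], off_target)
            for s in range(len(gene) - probe_len + 1)]


distances = window_distances("ACGTACGTTAGC", "ACGTTAGC", 8)
# Adjacent candidates differ by one dropped and one added base,
# so consecutive distances differ by at most 2.
steps = [abs(x - y) for x, y in zip(distances, distances[1:])]
print(distances, max(steps))
```

A search that knows the distance at one start position can therefore skip neighbouring positions whose distance cannot yet cross a specificity threshold, which is the kind of redundancy the abstract refers to.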

Multiple Sequence Local Alignment Using Monte Carlo EM Algorithm

Chengpeng Bi

The Expectation Maximization (EM) motif-finding algorithm is one of the most popular motif discovery methods. However, the EM algorithm depends heavily on its initialization and can easily become trapped in local optima. This paper implements a Monte Carlo version of the EM algorithm that performs multiple sequence local alignment to overcome the drawbacks inherent in conventional EM motif-finding algorithms. The newly implemented algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively performs Monte Carlo simulation and parameter update steps until convergence. MCEMDA is compared with other popular motif-finding algorithms using simulated, prokaryotic and eukaryotic motif sequences. Results show that MCEMDA outperforms the other algorithms. MCEMDA also successfully discovers a helix-turn-helix motif in protein sequences. It provides a general framework for motif-finding algorithm development. A website of this program will be available at .

Pp. 465-476
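
The alternation of a Monte Carlo simulation step and a parameter update step described in the abstract above can be sketched as follows. This is a simplified illustration, not MCEMDA itself: it assumes DNA sequences over ACGT, one motif occurrence per sequence, no background model, a fixed number of rounds instead of a convergence test, and hypothetical function names.

```python
import random

ALPHABET = "ACGT"


def update_pwm(seqs, starts, width, pseudo=0.5):
    """Parameter update: re-estimate column probabilities from the sampled sites."""
    pwm = []
    for k in range(width):
        counts = {c: pseudo for c in ALPHABET}  # pseudocounts keep weights > 0
        for seq, s in zip(seqs, starts):
            counts[seq[s + k]] += 1
        total = sum(counts.values())
        pwm.append({c: counts[c] / total for c in ALPHABET})
    return pwm


def sample_starts(seqs, pwm, rng):
    """Monte Carlo step: draw one motif start per sequence, weighted by the model."""
    width = len(pwm)
    starts = []
    for seq in seqs:
        weights = []
        for s in range(len(seq) - width + 1):
            w = 1.0
            for k in range(width):
                w *= pwm[k][seq[s + k]]
            weights.append(w)
        starts.append(rng.choices(range(len(weights)), weights=weights)[0])
    return starts


def mc_em(seqs, width, rounds=50, seed=0):
    """Alternate sampling and update steps for a fixed number of rounds."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]  # random initial model
    pwm = update_pwm(seqs, starts, width)
    for _ in range(rounds):
        starts = sample_starts(seqs, pwm, rng)
        pwm = update_pwm(seqs, starts, width)
    return pwm, starts


seqs = ["ACGTTGACAGT", "TTGACATTTTC", "GGGTTGACACC", "CATTGACAGGA"]
pwm, starts = mc_em(seqs, width=6, rounds=20, seed=1)
print(starts)
```

Sampling the alignment positions, rather than always taking the most likely one, is what lets a scheme of this kind move away from a poor initialization.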

Cancer Class Discovery Using Non-negative Matrix Factorization Based on Alternating Non-negativity-Constrained Least Squares

Hyunsoo Kim; Haesun Park

Many bioinformatics problems deal with chemical concentrations that should be non-negative. Non-negative matrix factorization (NMF) is an approach that takes advantage of non-negativity in data. We have recently developed sparse NMF algorithms via alternating non-negativity-constrained least squares in order to obtain sparser basis vectors or sparser mixing coefficients for each sample, which leads to easier interpretation. However, the additional sparsity constraints are not always required. In this paper, we conduct cancer class discovery using NMF based on alternating non-negativity-constrained least squares (NMF/ANLS) without any additional sparsity constraints, after introducing a rigorous convergence criterion for biological data analysis.

Pp. 477-487

A Support Vector Machine Ensemble for Cancer Classification Using Gene Expression Data

Chen Liao; Shutao Li

In this paper, we propose a support vector machine (SVM) ensemble classification method. First, the dataset is preprocessed with the Wilcoxon rank-sum test to filter out irrelevant genes. Then one SVM is trained on the training set and tested on the training set itself to obtain prediction results. The samples that are misclassified or predicted with low confidence are selected to train the second SVM, which is tested in the same way. Similarly, the third SVM is trained on the samples that the second SVM cannot classify correctly with high confidence. The three SVMs form the SVM ensemble classifier. Finally, the testing set is fed into the ensemble classifier, and the final test predictions are obtained by majority voting. Experiments are performed on two standard benchmark datasets: Breast Cancer and ALL/AML Leukemia. Experimental results demonstrate that the proposed method reaches state-of-the-art classification performance.

Pp. 488-495
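
Two steps from the abstract above, selecting the hard samples for the next SVM and combining the three classifiers by majority vote, can be sketched without any SVM library. The sketch assumes binary labels coded as +1/-1 and signed decision values as the confidence measure; the `margin` cutoff and the function names are hypothetical, since the abstract does not specify them.

```python
def select_hard_samples(scores, labels, margin=1.0):
    """Indices of samples that are misclassified or classified with low confidence.

    scores: signed decision values from a trained classifier
    labels: true labels, +1 or -1
    margin: assumed confidence cutoff on |score|
    """
    return [i for i, (s, y) in enumerate(zip(scores, labels))
            if s * y <= 0 or abs(s) < margin]


def majority_vote(votes_per_classifier):
    """Combine per-classifier label lists (+1/-1); an odd count avoids ties."""
    return [1 if sum(votes) > 0 else -1 for votes in zip(*votes_per_classifier)]


hard = select_hard_samples([2.0, -0.3, 0.5, -1.5], [1, -1, 1, 1])
print(hard)  # samples passed on to train the next classifier

final = majority_vote([[1, -1, 1], [1, 1, -1], [-1, -1, -1]])
print(final)
```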

Combining SVM Classifiers Using Genetic Fuzzy Systems Based on AUC for Gene Expression Data Analysis

Xiujuan Chen; Yichuan Zhao; Yan-Qing Zhang; Robert Harrison

Recently, the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) have been receiving much attention as measures of the performance of machine learning algorithms. In this paper, we propose an SVM classifier fusion model using a genetic fuzzy system. Genetic algorithms are applied to tune the fuzzy membership functions. The performance of the SVM classifiers is evaluated by their AUCs. Our experiments show that the AUC-based genetic fuzzy SVM fusion model produces not only better AUC but also better accuracy than individual SVM classifiers.

Pp. 496-505
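
For reference, the AUC used as the evaluation measure in the abstract above equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name is hypothetical, labels are assumed to be 1 (positive) and 0 (negative), and ties count as one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pairs ordered correctly: 0.75
```

The quadratic loop is fine for small benchmark sets; a rank-based formulation gives the same value in O(n log n) for larger ones.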

A BP-SCFG Based Approach for RNA Secondary Structure Prediction with Consecutive Bases Dependency and Their Relative Positions Information

Dandan Song; Zhidong Deng

The prediction of RNA secondary structure is a fundamental problem in computational biology. However, none of the existing RNA secondary structure prediction approaches explicitly takes information about local neighboring bases into account. That is, when predicting whether a base is paired, only the long-range correlation is considered. Because a substructure consists of multiple bases, it is affected by the dependency between consecutive bases and by their relative positions in the sequence. In this paper we propose a novel RNA secondary structure prediction approach that combines a Back Propagation (BP) neural network with statistical calculation based on the Stochastic Context-Free Grammar (SCFG) approach, so that the dependency between consecutive bases and their relative positions in the sequence are incorporated into the prediction process. On a tRNA dataset and rRNA datasets from three species, our experimental results show that prediction accuracy improves in every case compared with the SCFG approach alone.

Pp. 506-517

Delta: A Toolset for the Structural Analysis of Biological Sequences on a 3D Triangular Lattice

Minghui Jiang; Martin Mayne; Joel Gillespie

The lattice approach to biological structural analysis was made popular by the HP model for protein folding, but had not been used previously for RNA secondary structure prediction. We introduce the Delta toolset for the structural analysis of biological sequences on a 3D triangular lattice. The Delta toolset includes a proof-of-concept RNA folding program that is both fast and accurate in predicting the secondary structures with pseudoknots of short RNA sequences.

Pp. 518-529

Statistical Estimate for the Size of the Protein Structural Vocabulary

Xuezheng Fu; Bernard Chen; Yi Pan; Robert W. Harrison

The concept of structural clusters defining the vocabulary of protein structure is one of the central concepts in the modern theory of protein folding. Typically, clusters are found by a variation of the K-means or K-NN algorithm. In this paper we study approaches to estimating the number of clusters in data. The optimal number of clusters is believed to result in a reliable clustering. Stability with respect to bootstrap sampling was adopted as the cluster validation measure for estimating the reliable clustering. To test this algorithm, six random subsets were drawn from the unique chains in the PDB. The algorithm converged in each case to a unique set of reliable clusters. Since these clusters were drawn randomly from the total current set of chains, counting the number of coincidences and using basic sampling theory provides a rigorous statistical estimate of the number of unique clusters in the dataset.

Pp. 530-538

Coclustering Based Parcellation of Human Brain Cortex Using Diffusion Tensor MRI

Cui Lin; Shiyong Lu; Danqing Wu; Jing Hua; Otto Muzik

The fundamental goal of computational neuroscience is to discover anatomical features that reflect the functional organization of the brain. Investigations of the physical connections between neuronal structures and measurements of brain activity in vivo have given rise to the concepts of anatomical and functional connectivity, which have been useful for our understanding of brain mechanisms and their plasticity. However, at present there is no generally accepted computational framework for the quantitative assessment of cortical connectivity. Accurate analytical and modeling tools that can reveal anatomical connectivity patterns and facilitate the interpretation of high-level knowledge regarding brain functions are strongly demanded. In this paper, we present a coclustering algorithm, called the Business model based Coclustering Algorithm (BCA), which allows an automated and reproducible assessment of the connectivity pattern between different cortical areas based on Diffusion Tensor Imaging (DTI) data. The proposed BCA algorithm not only partitions the cortical mantle into well-defined clusters, but at the same time maximizes the connection strength between these clusters. Moreover, the BCA algorithm is computationally robust and allows both outlier detection and operator-independent determination of the number of clusters. We applied the BCA algorithm to human DTI datasets and show good performance in detecting anatomical connectivity patterns in the human brain.

Pp. 539-550

An Algorithm for Hierarchical Classification of Genes of Prokaryotic Genomes

Hongwei Wu; Fenglou Mao; Victor Olman; Ying Xu

In this paper we present our hierarchical classification of genes for prokaryotic genomes from a methodological point of view. Our classification scheme is unique in that (1) the functional equivalence relationships among genes are assessed by using both sequence similarity and genomic context information, (2) genes are grouped into clusters at multiple resolution levels based on their equivalence relationships with each other, and (3) gene clusters, which are either orone another, naturally form a hierarchical structure. This classification scheme has been applied to the genes of 224 complete prokaryotic genomes (release as of March 2005). The classification results are available at http://csbl.bmb.uga.edu/HCG, and are validated through comparisons with the taxonomy of these 224 genomes and with two existing gene classification schemes, Clusters of Orthologous Groups of proteins (COG) and Pfam.

Pp. 551-563