Publications catalog - books



Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings

Ion Măndoiu ; Alexander Zelikovsky (eds.)

In conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7, 2007 - May 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution  Year of publication  Browse  Download  Request
Not detected  2007  SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72030-0

Electronic ISBN

978-3-540-72031-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Mining Discriminative Distance Context of Transcription Factor Binding Sites on ChIP Enriched Regions

Hyunmin Kim; Katherina J. Kechris; Lawrence Hunter

Genome-wide identification of transcription factor binding sites (TFBSs) is critical for understanding transcriptional regulation of the gene expression network. ChIP-chip experiments accelerate the mapping of target TFBSs under diverse cellular conditions. We address the problem of discriminating potential TFBSs in ChIP-enriched regions from those in non-ChIP-enriched regions using ensemble rule algorithms and a variety of predictive variables, including those based on sequence and chromosomal context. In addition, we developed an input variable based on a scoring scheme that reflects the distance context of surrounding putative TFBSs. In tests focused on hepatocyte regulators, this novel feature improved the identification of potential TFBSs, and the measured importance of the predictive variables was consistent with their biological meaning. In summary, we found that distance-based features are better discriminators of ChIP-enriched TFBSs than features based on sequence or chromosomal context.

Pp. 338-349

Enhanced Prediction of Cleavage in Bovine Precursor Sequences

Allison N. Tegge; Sandra L. Rodriguez-Zas; J. V. Sweedler; Bruce R. Southey

Neuropeptides are important signaling molecules that influence a wide variety of biological processes. The prediction of neuropeptides from precursor proteins is difficult due to the numerous and complex series of enzymatic processing and posttranslational modification steps. Statistical models for predicting cleavage sites were used to overcome the challenge of identifying neuropeptides. Binary logistic models were trained on a bovine dataset and validated on a mammalian dataset that contained no bovine precursors. A model that incorporated amino acid locations and properties provided more accurate and precise cleavage predictions than one using amino acid locations alone. All models consistently resulted in highly accurate predictions of cleavage sites in both datasets. The proposed logistic model can accurately predict cleavage sites in mammalian species and minimize the time-consuming and costly experimental validation of neuropeptides.

Pp. 350-360

Invited Talk: A Computational Study of Bidirectional Promoters in the Human Genome

Mary Qu Yang; Laura L. Elnitski

A bidirectional promoter is a region along a strand of DNA that regulates the expression of two genes flanking the region. Each of these genes is transcribed in a direction that points away from the other gene; two such genes are said to be in a head-to-head configuration. We search the UCSC List of Known Genes and GenBank Expressed Sequence Tag (EST) data for pairs of genes in such a configuration in order to identify new bidirectional promoters.

The EST data constitutes a larger and more intricate dataset than the UCSC List of Known Genes. However, working with EST data presents a challenge, as the EST database may be highly redundant and may also contain overlapping ESTs. To deal with these problems, we have developed an algorithm to identify bidirectional promoters based on the above data sources; the algorithm is capable of handling redundant ESTs, and also ESTs that overlap or disagree in orientation.

This analysis resulted in the identification of thousands of new candidate head-to-head gene pairs, corroborated the 5’ ends of many known human genes, revealed new 5’ exons of previously characterized genes, and in some cases identified novel genes. Further analyses yielded evidence for coordinate expression of genes in a head-to-head configuration, and examined the prevalence of bidirectional promoters in different biological pathways.

Pp. 361-371
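The head-to-head configuration described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it ignores ESTs, redundancy, and orientation conflicts, and only checks adjacent transcription start sites on the same chromosome. The `Gene` record, the function name `head_to_head_pairs`, and the 1000 bp `max_gap` cutoff are assumptions made for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    # Hypothetical minimal gene record; field names are illustrative.
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def head_to_head_pairs(genes, max_gap=1000):
    """Return (minus_gene, plus_gene) name pairs whose adjacent TSSs diverge.

    A minus-strand gene is transcribed toward lower coordinates and a
    plus-strand gene toward higher ones, so the two point away from each
    other when the minus-strand TSS comes first. max_gap is an assumed cutoff.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        # Compare each TSS only with its immediate neighbor on the chromosome.
        for left, right in zip(ordered, ordered[1:]):
            diverging = left.strand == "-" and right.strand == "+"
            if diverging and right.tss - left.tss <= max_gap:
                pairs.append((left.name, right.name))
    return pairs


genes = [
    Gene("A", "chr1", "-", 1000),
    Gene("B", "chr1", "+", 1400),
    Gene("C", "chr1", "+", 5000),
    Gene("D", "chr1", "-", 5200),
    Gene("E", "chr2", "-", 100),
    Gene("F", "chr2", "+", 5000),
]
pairs = head_to_head_pairs(genes)
```

In this toy input only A and B qualify: C and D point toward each other, and E and F diverge but lie farther apart than the assumed cutoff.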

The Identification of Antisense Gene Pairs Through Available Software

Mark J. Lawson; Liqing Zhang

Antisense genes have been shown to have a variety of functions in prokaryotes and, more recently, in eukaryotes. They are hypothesized to be an important part of every genome and have been shown to be evolutionarily conserved. It is therefore in our interest to develop software for identifying antisense pairs. While a variety of approaches and software exist, each approach has its limitations, and the software is not meant for large-scale analyses that identify both cis and trans antisense genes. Here we present a novel way to identify antisense genes and show the results we obtained with it. While it is by no means a perfect solution, we show a possible way that may lead to more accurate prediction of antisense genes.

Pp. 372-381

Inferring Weak Adaptations and Selection Biases in Proteins from Composition and Substitution Matrices

Steinar Thorvaldsen; Elinor Ytterstad; Tor Flå

There is a growing need for statistical methods to analyse the increasing amounts of bio-sequence data. We present statistical methods that are useful when a protein alignment can be divided into two groups based on known features or traits. The approach is based on stratification of the data, and to show the applicability of the methods we present an analysis of genomic data from proteobacteria orders. A dataset of 25 periplasmic/extracellular bacterial enzyme proteins was compiled to identify genotypic characteristics that separate the cold-adapted proteins from orthologous sequences with a higher optimal growth temperature. Our results reveal that the cold-adapted proteins have a significantly more positively charged exterior. Life in a cold climate seems to be enabled by many minor structural modifications rather than a particular amino acid substitution. Redistribution of charge might be one of the most important signatures of cold adaptation.

Pp. 382-393

Markov Model Variants for Appraisal of Coding Potential in Plant DNA

Michael E. Sparks; Volker Brendel; Karin S. Dorman

Markov chain models are commonly used for content-based appraisal of coding potential in genomic DNA. The ability of these models to distinguish coding from non-coding sequences depends on the method of parameter estimation, the validity of the estimated parameters for the species of interest, and the extent to which oligomer usage characterizes coding potential. We assessed performances of Markov chain models in two model plant species, and rice, comparing canonical fixed-order, -interpolated, and top-down and bottom-up deleted interpolated Markov models. All methods achieved comparable identification accuracies, with differences usually within statistical error. Because classification performance is related to G+C composition, we also considered a strategy where training and test data are first partitioned by G+C content. All methods demonstrated considerable gains in accuracy under this approach, especially in rice. The methods studied were implemented in the C programming language and organized into a library, , distributed under the GNU LGPL.

Pp. 394-405
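As a rough illustration of content-based appraisal with a fixed-order Markov chain, the sketch below trains one model on coding sequences and one on non-coding sequences and scores a query by its log-odds ratio. It is a simplified, homogeneous chain with add-one smoothing, not one of the interpolated variants compared in the paper and not the authors' library; the function names, the `order` default, and the toy training strings are assumptions.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(base | previous `order` bases) with additive smoothing.

    Returns a function prob(context, base). Unseen contexts fall back to a
    uniform distribution through the pseudocounts.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(alphabet)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq, coding, noncoding, order=2):
    """Sum of log(P_coding / P_noncoding) over every transition in seq."""
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding(context, base) / noncoding(context, base))
    return score


# Toy training data, purely illustrative (not real coding or non-coding DNA).
coding_model = train_markov(["ATGGCGGCGGCGGCG"])
noncoding_model = train_markov(["ATATATATATATAT"])

gc_rich_score = log_odds("GCGGCGGCG", coding_model, noncoding_model)
at_rich_score = log_odds("ATATATAT", coding_model, noncoding_model)
```

A positive score means the coding model explains the query better than the non-coding model. Partitioning by G+C content, as the abstract describes, would amount to training one such pair of models per G+C bin.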

Predicting Palmitoylation Sites Using a Regularised Bio-basis Function Neural Network

Zheng Rong Yang

Palmitoylation is one of the most important post-translational modifications involving molecular signalling activities. Two simple methods have been developed very recently for predicting palmitoylation sites, but the sensitivity (the prediction accuracy of palmitoylation sites) of both methods is low (< 65%). A regularised bio-basis function neural network is implemented in this paper aiming to improve the sensitivity. A set of protein sequences with experimentally determined palmitoylation sites are downloaded from NCBI for the study. The protein-oriented cross-validation strategy is used for proper model construction. The experiments show that the regularised bio-basis function neural network significantly outperforms the two existing methods as well as the support vector machine and the radial basis function neural network. Specifically the sensitivity has been significantly improved with a slightly improved specificity (the prediction accuracy of non-palmitoylation sites).

Pp. 406-417

A Novel Kernel-Based Approach for Predicting Binding Peptides for HLA Class II Molecules

Hao Yu; Minlie Huang; Xiaoyan Zhu; Yabin Guo

Peptides that bind to Human Leukocyte Antigens (HLA) can be presented to T-cell receptors and trigger an immune response. Identification of specific binding peptides is critical for immunology research and vaccine design. However, accurate prediction of peptides binding to HLA molecules is challenging. A variety of methods such as HMM and ANN have been applied to predict peptides that can bind to HLA class I molecules, so that the number of candidate binders for experimental assay can be greatly reduced. Predicting peptides that bind to HLA class II molecules, however, is a more complex process. In this paper, we propose a kernel-based method that integrates the BLOSUM matrix with a string kernel to form a new kernel. The substitution score between amino acids in the BLOSUM matrix is incorporated into the computation of the similarity between two binding peptides, which gives the kernel more biological meaning than traditional string kernels. The promising results show the advantages of this approach over other methods.

Pp. 418-429

A Database for Prediction of Unique Peptide Motifs as Linear Epitopes

Margaret Dah-Tsyr Chang; Hao-Teng Chang; Rong-Yuan Huang; Wen-Shyong Tzou; Chih-Hong Liu; Wei-Jun Zhung; Hsien-Wei Wang; Chun-Tien Chang; Tun-Wen Pai

A linear epitope prediction database (LEPD) is designed for identification of unique peptide motifs (UPMs) as specific linear epitopes for all protein families defined by Pfam. The UPMs in LEPD are extracted from each protein family by employing reinforced merging techniques that merge the primary unique patterns into a consecutive peptide based on the neighboring relationships and various levels of parameter settings. These merged peptide motifs are examined using the physicochemical and structural propensity scales for antigenic characteristics and are verified by employing background model analysis for specificity. The filtered UPMs with high antigenicity and specificity are considered as linear epitopes that provide important information for designing antibodies and vaccines. The predicted epitopes of each protein family in the LEPD can be searched in a straightforward manner, and the corresponding chemical properties can be displayed in graphical and tabular formats. To verify the specificity of the predicted epitopes, each identified UPM is analyzed by scanning over the complete genomes of a series of model organisms. For any query protein possessing a resolved 3D structure, the proposed database also provides interactive visualization of the protein structures for allocation and comparison of the predicted linear epitopes. The accuracy of the prediction algorithm is evaluated to be higher than 70% in terms of mapping a UPM as a linear epitope as compared to the known databases.

Pp. 430-440

A Novel Greedy Algorithm for the Minimum Common String Partition Problem

Dan He

The Minimum Common String Partition problem (MCSP) is to partition two given input strings into the same collection of substrings, where the number of substrings in the partition is minimized. This problem is a key problem in genome rearrangement, and is closely related to the problem of sorting by reversals with duplicates. MCSP is NP-hard, even for the most trivial case, 2-MCSP, where each letter occurs at most twice in each input string. There are various approximation algorithms that achieve very good approximation ratios but have complicated implementations, for example, a 1.5-approximation algorithm for 2-MCSP, a 1.1037-approximation algorithm for 2-MCSP and a 4-approximation algorithm for 3-MCSP. There is also a simple greedy algorithm for MCSP which extracts the longest common substring from the given strings at each step. In this paper, we propose a novel greedy algorithm for MCSP, which at each step extracts the longest common substring containing a symbol that occurs only once, whenever such a symbol exists. We show that our algorithm is more “worst case” greedy at each step than the greedy algorithm and that the expected performance of our algorithm is better than that of the greedy algorithm. Our experiments show that our method achieves a better partition on average than the greedy algorithm does. Another advantage of our algorithm is that it is much faster than the greedy algorithm.

Pp. 441-452
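The simple greedy baseline mentioned in the abstract above (repeatedly extract a longest common substring from the still-unmatched parts of both strings) can be sketched as follows. This is an illustrative implementation of that baseline only, not the paper's novel algorithm. The function names are assumptions, and ties between equally long substrings are broken arbitrarily by scan order.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of a longest common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def greedy_mcsp(s, t):
    """Greedy common string partition: list of blocks covering both s and t."""
    if sorted(s) != sorted(t):
        raise ValueError("inputs must contain the same multiset of letters")
    free_s, free_t = [s], [t]  # unmatched fragments of each input
    blocks = []
    while free_s:
        best = None
        # Search every pair of unmatched fragments for the longest match.
        for i, frag_s in enumerate(free_s):
            for j, frag_t in enumerate(free_t):
                length, pos_s, pos_t = longest_common_substring(frag_s, frag_t)
                if best is None or length > best[0]:
                    best = (length, i, pos_s, j, pos_t)
        length, i, pos_s, j, pos_t = best
        frag_s, frag_t = free_s.pop(i), free_t.pop(j)
        blocks.append(frag_s[pos_s:pos_s + length])
        # Whatever remains on either side of the match stays unmatched.
        free_s += [p for p in (frag_s[:pos_s], frag_s[pos_s + length:]) if p]
        free_t += [p for p in (frag_t[:pos_t], frag_t[pos_t + length:]) if p]
    return blocks


blocks = greedy_mcsp("abcd", "cdab")
```

Because both inputs hold the same multiset of letters, every step matches at least one character, so the loop terminates with both strings fully covered. The variant proposed in the paper would instead prefer, at each step, the longest common substring containing a symbol that occurs only once.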