Publications catalog - books
Bioinformatics Research and Applications: Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007. Proceedings
Ion Măndoiu ; Alexander Zelikovsky (eds.)
Conference: 3rd International Symposium on Bioinformatics Research and Applications (ISBRA). Atlanta, GA, USA. May 7-10, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-72030-0
Electronic ISBN
978-3-540-72031-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
Mining Discriminative Distance Context of Transcription Factor Binding Sites on ChIP Enriched Regions
Hyunmin Kim; Katherina J. Kechris; Lawrence Hunter
Genome-wide identification of transcription factor binding sites (TFBSs) is critical for understanding transcriptional regulation of the gene expression network. ChIP-chip experiments accelerate the procedure of mapping target TFBSs for diverse cellular conditions. We address the problem of discriminating potential TFBSs in ChIP-enriched regions from those in non-ChIP-enriched regions using ensemble rule algorithms and a variety of predictive variables, including those based on sequence and chromosomal context. In addition, we developed an input variable based on a scoring scheme that reflects the distance context of surrounding putative TFBSs. Focusing on hepatocyte regulators, this novel feature improved the performance of identifying potential TFBSs, and the measured importance of the predictive variables was consistent with their biological meaning. In summary, we found that distance-based features are better discriminators of ChIP-enriched TFBSs than other features based on sequence or chromosomal context.
Pp. 338-349
Enhanced Prediction of Cleavage in Bovine Precursor Sequences
Allison N. Tegge; Sandra L. Rodriguez-Zas; J. V. Sweedler; Bruce R. Southey
Neuropeptides are important signaling molecules that influence a wide variety of biological processes. The prediction of neuropeptides from precursor proteins is difficult due to the numerous and complex series of enzymatic processing and posttranslational modification steps. Bioinformatics prediction of cleavage sites using statistical models was used to overcome the challenge of identifying neuropeptides. Binary logistic models were trained on a bovine dataset and validated on a mammalian dataset that contained no bovine precursors. A model that incorporated amino acid locations and properties provided more accurate and precise cleavage predictions than one using amino acid locations alone. All models consistently resulted in highly accurate predictions of cleavage sites in both datasets. The proposed logistic model can accurately predict cleavage sites in mammalian species and minimize the time-consuming and costly experimental validation of neuropeptides.
Pp. 350-360
Invited Talk: A Computational Study of Bidirectional Promoters in the Human Genome
Mary Qu Yang; Laura L. Elnitski
A bidirectional promoter is a region along a strand of DNA that regulates the expression of two genes flanking the region. Each of these genes is transcribed in a direction that points away from the other gene; two such genes are said to be in a head-to-head configuration. We search the UCSC List of Known Genes and GenBank Expressed Sequence Tag (EST) data for pairs of genes in such a configuration in order to identify new bidirectional promoters.
The EST data constitutes a larger and more intricate dataset than the UCSC List of Known Genes. However, working with EST data presents a challenge, as the EST database may be highly redundant and may also contain overlapping ESTs. To deal with these problems, we have developed an algorithm to identify bidirectional promoters based on the above data sources; the algorithm is capable of handling redundant ESTs, and also ESTs that overlap or disagree in orientation.
This analysis resulted in the identification of thousands of new candidate head-to-head gene pairs, corroborated the 5’ ends of many known human genes, revealed new 5’ exons of previously characterized genes, and in some cases identified novel genes. Further analyses yielded evidence for coordinate expression of genes in a head-to-head configuration, and examined the prevalence of bidirectional promoters in different biological pathways.
Pp. 361-371
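The head-to-head configuration described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which also handles redundant and overlapping ESTs. It assumes each gene is reduced to a chromosome, a strand, and a transcription start site (TSS), only considers genes that are adjacent by TSS, and uses a hypothetical `max_gap` threshold of 1000 bp that the abstract does not specify. The names `Gene` and `head_to_head_pairs` are illustrative.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def head_to_head_pairs(genes, max_gap=1000):
    """Return (minus_gene, plus_gene) name pairs that point away from each other.

    max_gap is an assumed cutoff on the distance between the two TSSs.
    """
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.tss))
    for _, group in groupby(ordered, key=lambda g: g.chrom):
        group = list(group)
        # Only neighbouring TSSs on the same chromosome are compared.
        for left, right in zip(group, group[1:]):
            diverging = left.strand == "-" and right.strand == "+"
            if diverging and right.tss - left.tss <= max_gap:
                pairs.append((left.name, right.name))
    return pairs


example = [
    Gene("A", "chr1", "-", 1000),
    Gene("B", "chr1", "+", 1500),
    Gene("C", "chr1", "+", 5000),
    Gene("D", "chr2", "-", 100),
    Gene("E", "chr2", "+", 5000),
]
result = head_to_head_pairs(example)
```

Here only A and B qualify: B and C are on the same strand, and D and E diverge but lie farther apart than the assumed threshold.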
The Identification of Antisense Gene Pairs Through Available Software
Mark J. Lawson; Liqing Zhang
Antisense genes have been shown to have a variety of functions in prokaryotes and, more recently, in eukaryotes as well. They are hypothesized to be an important part of every genome and have been shown to be evolutionarily conserved. Naturally, it is in our interest to develop software for identifying antisense pairs. While a variety of approaches and software do exist, each approach has its limitations, and the software is not meant for large-scale analyses that identify both cis and trans antisense genes. Here we present a novel way to identify antisense genes and show the results we obtained through it. While by no means a perfect solution, it shows a possible way toward more accurate prediction of antisense genes.
Pp. 372-381
Inferring Weak Adaptations and Selection Biases in Proteins from Composition and Substitution Matrices
Steinar Thorvaldsen; Elinor Ytterstad; Tor Flå
There is a desire for increasing use of statistical methods in analysing the growing amounts of bio-sequences. We present statistical methods that are useful when a protein alignment can be divided into two groups based on known features or traits. The approach is based on stratification of the data, and to show the applicability of the methods we present analysis of genomic data from proteobacteria orders. A dataset of 25 periplasmic/extracellular bacterial enzyme proteins was compiled to identify genotypic characteristics that separate the cold adapted proteins from ortholog sequences with a higher optimal growth temperature. Our results reveal that the cold adapted protein has a significantly more positively charged exterior. Life in a cold climate seems to be enabled by many minor structural modifications rather than a particular amino acid substitution. Redistribution of charge might be one of the most important signatures for cold adaptation.
Pp. 382-393
Markov Model Variants for Appraisal of Coding Potential in Plant DNA
Michael E. Sparks; Volker Brendel; Karin S. Dorman
Markov chain models are commonly used for content-based appraisal of coding potential in genomic DNA. The ability of these models to distinguish coding from non-coding sequences depends on the method of parameter estimation, the validity of the estimated parameters for the species of interest, and the extent to which oligomer usage characterizes coding potential. We assessed the performance of Markov chain models in two model plant species, one of which is rice, comparing canonical fixed-order, interpolated, and top-down and bottom-up deleted interpolated Markov models. All methods achieved comparable identification accuracies, with differences usually within statistical error. Because classification performance is related to G+C composition, we also considered a strategy where training and test data are first partitioned by G+C content. All methods demonstrated considerable gains in accuracy under this approach, especially in rice. The methods studied were implemented in the C programming language and organized into a library distributed under the GNU LGPL.
Pp. 394-405
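As an illustration of the canonical fixed-order variant mentioned in this abstract, the following sketch scores a sequence by the log-likelihood ratio of a coding model to a non-coding model. It is not the authors' C library. The function names, the order of 2, the add-one pseudocount, and the toy training sequences are all assumptions, and the sketch ignores codon periodicity and the interpolated variants.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train(seqs, order=2, pseudo=1.0):
    """Fit a fixed-order Markov chain; return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def logp(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudo * len(ALPHABET)
        # The pseudocount keeps unseen transitions from having zero probability.
        return math.log((seen.get(base, 0.0) + pseudo) / total)

    return logp


def log_odds(seq, coding, noncoding, order=2):
    """Positive scores favour the coding model, negative the non-coding one."""
    return sum(
        coding(seq[i - order:i], seq[i]) - noncoding(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )


# Toy training data, purely illustrative.
coding_model = train(["ATGATGATGATG"])
noncoding_model = train(["AAAAAAAAAAAA"])
score_coding_like = log_odds("ATGATG", coding_model, noncoding_model)
score_noncoding_like = log_odds("AAAAAA", coding_model, noncoding_model)
```

Partitioning by G+C content, as the abstract describes, would amount to training one such pair of models per G+C bin and scoring each test sequence with the pair for its bin.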
Predicting Palmitoylation Sites Using a Regularised Bio-basis Function Neural Network
Zheng Rong Yang
Palmitoylation is one of the most important post-translational modifications involving molecular signalling activities. Two simple methods have been developed very recently for predicting palmitoylation sites, but the sensitivity (the prediction accuracy of palmitoylation sites) of both methods is low (< 65%). A regularised bio-basis function neural network is implemented in this paper with the aim of improving the sensitivity. A set of protein sequences with experimentally determined palmitoylation sites was downloaded from NCBI for the study. A protein-oriented cross-validation strategy is used for proper model construction. The experiments show that the regularised bio-basis function neural network significantly outperforms the two existing methods as well as the support vector machine and the radial basis function neural network. Specifically, the sensitivity has been significantly improved, with a slightly improved specificity (the prediction accuracy of non-palmitoylation sites).
Pp. 406-417
A Novel Kernel-Based Approach for Predicting Binding Peptides for HLA Class II Molecules
Hao Yu; Minlie Huang; Xiaoyan Zhu; Yabin Guo
Peptides that bind to Human Leukocyte Antigens (HLA) can be presented to T-cell receptors and trigger an immune response. Identification of specific binding peptides is critical for immunology research and vaccine design. However, accurate prediction of peptides binding to HLA molecules is challenging. A variety of methods such as HMM and ANN have been applied to predict peptides that can bind to HLA class I molecules, so the number of candidate binders for experimental assay can be greatly reduced. However, predicting peptides that bind to HLA class II molecules is a more complex process. In this paper, we propose a kernel-based method that integrates the BLOSUM matrix with a string kernel to form a new kernel. The substitution score between amino acids in the BLOSUM matrix is incorporated into computing the similarity between two binding peptides, which carries more biological meaning than traditional string kernels. The promising results of this approach show advantages over other methods.
Pp. 418-429
A Database for Prediction of Unique Peptide Motifs as Linear Epitopes
Margaret Dah-Tsyr Chang; Hao-Teng Chang; Rong-Yuan Huang; Wen-Shyong Tzou; Chih-Hong Liu; Wei-Jun Zhung; Hsien-Wei Wang; Chun-Tien Chang; Tun-Wen Pai
A linear epitope prediction database (LEPD) is designed for identification of unique peptide motifs (UPMs) as specific linear epitopes for all protein families defined by Pfam. The UPMs in LEPD are extracted from each protein family by employing reinforced merging techniques that merge the primary unique patterns into a consecutive peptide based on the neighboring relationships and various levels of parameter settings. These merged peptide motifs are examined using the physicochemical and structural propensity scales for antigenic characteristics and are verified by employing background model analysis for specificity. The filtered UPMs with high antigenicity and specificity are considered as linear epitopes that provide important information for designing antibodies and vaccines. The predicted epitopes of each protein family in the LEPD can be searched in a straightforward manner, and the corresponding chemical properties can be displayed in graphical and tabular formats. To verify the specificity of the predicted epitopes, each identified UPM is analyzed by scanning over the complete genomes of a series of model organisms. For any query protein possessing a resolved 3D structure, the proposed database also provides interactive visualization of the protein structures for allocation and comparison of the predicted linear epitopes. The accuracy of the prediction algorithm is evaluated to be higher than 70% in terms of mapping a UPM as a linear epitope as compared to the known databases.
Pp. 430-440
A Novel Greedy Algorithm for the Minimum Common String Partition Problem
Dan He
The Minimum Common String Partition problem (MCSP) is to partition two given input strings into the same collection of substrings, where the number of substrings in the partition is minimized. This problem is a key problem in genome rearrangement, and is closely related to the problem of sorting by reversals with duplicates. MCSP is NP-hard, even for the most trivial case, 2-MCSP, where each letter occurs at most twice in each input string. There are various approximation algorithms which can achieve very good approximation ratios but with complicated implementations, for example, a 1.5-approximation algorithm for 2-MCSP, a 1.1037-approximation algorithm for 2-MCSP and a 4-approximation algorithm for 3-MCSP. There is also a simple greedy algorithm for MCSP which extracts the longest common substring from the given strings at each step. In this paper, we propose a novel greedy algorithm for MCSP, where at each step we extract the longest common substring containing a symbol that occurs only once, whenever there is such a symbol. We show that our algorithm is more “worst case” greedy at each step than the greedy algorithm and that the expected performance of our algorithm is better than that of the greedy algorithm. Our experiments show that our method achieves a better partition on average than the greedy algorithm does. Another advantage of our algorithm is that it is much faster than the greedy algorithm.
Pp. 441-452
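The simple greedy baseline mentioned in this abstract, which extracts the longest common substring at each step, can be sketched as follows. This is the baseline only, not the paper's novel algorithm, and it is a straightforward quadratic dynamic-programming version written for clarity. The function names are illustrative, and ties between equally long substrings are broken by first occurrence, which is an assumption.

```python
def longest_common_unmarked(a, b, used_a, used_b):
    """Return (length, i, j) of the longest common substring avoiding marked positions."""
    best = (0, -1, -1)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            free = not used_a[i - 1] and not used_b[j - 1]
            if free and a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def greedy_mcsp(a, b):
    """Greedily partition a and b into common blocks; return the blocks in pick order."""
    if sorted(a) != sorted(b):
        raise ValueError("inputs must contain the same letters with the same counts")
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    blocks = []
    covered = 0
    while covered < len(a):
        # The unmarked letters of a and b always match as multisets, so length >= 1.
        length, i, j = longest_common_unmarked(a, b, used_a, used_b)
        blocks.append(a[i:i + length])
        for k in range(length):
            used_a[i + k] = True
            used_b[j + k] = True
        covered += length
    return blocks


partition = greedy_mcsp("abcab", "ababc")
```

For these inputs the sketch first extracts "abc" and then "ab", giving a common partition of size 2.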