Publication catalog: books
Intelligent Data Engineering and Automated Learning: IDEAL 2005: 6th International Conference, Brisbane, Australia, July 6-8, 2005, Proceedings
Marcus Gallagher ; James P. Hogan ; Frederic Maire (eds.)
Conference: 6th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL). Brisbane, QLD, Australia. July 6-8, 2005
Summary/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Database Management; Algorithm Analysis and Problem Complexity; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet); Computers and Society
Information
Resource type:
books
Print ISBN
978-3-540-26972-4
Electronic ISBN
978-3-540-31693-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publishing rights information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11508069_51
Matching Peptide Sequences with Mass Spectra
K. W. Lau; B. Stapley; S. Hubbard; H. Yin
We study a method of mapping both mass spectra and peptide sequences to feature vectors, and examine the correlation between the two. A method for calculating the feature vector from a mass spectrum is presented, together with a method for representing sequences. A correlation metric comparing both representations is studied; it shows strong correlation between the two representations of the same peptide. The results also demonstrate that the correlation effect is increased by using longer sequences induced from the theoretical mass spectra. The method is a promising step towards de novo sequencing.
- Bioinformatics | Pp. 390-397
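The abstract above does not say how spectra and sequences are turned into feature vectors. As a rough illustration only, the sketch below assumes one common choice: predict singly charged b- and y-ion masses from a candidate sequence, bin theoretical and observed peaks into fixed-width m/z vectors, and compare them with cosine similarity. The bin width, mass range and the use of cosine similarity are assumptions, not the authors' representation or correlation metric.

```python
import math

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def theoretical_peaks(peptide):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(masses)):
        peaks.append(sum(masses[:i]) + PROTON)          # b-ion
        peaks.append(sum(masses[i:]) + WATER + PROTON)  # y-ion
    return peaks


def to_vector(mz_values, intensities=None, bin_width=1.0, max_mz=2000.0):
    """Bin peaks into a fixed-length feature vector (assumed 1 Da bins)."""
    vec = [0.0] * int(max_mz / bin_width)
    if intensities is None:
        intensities = [1.0] * len(mz_values)
    for mz, inten in zip(mz_values, intensities):
        idx = int(mz / bin_width)
        if 0 <= idx < len(vec):
            vec[idx] += inten
    return vec


def cosine(u, v):
    """Cosine similarity, used here as a stand-in correlation metric."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Hypothetical "observed" spectrum: simply the theoretical peaks of PEPTIDE.
    observed = to_vector(theoretical_peaks("PEPTIDE"))
    # I and L have identical residue masses, so PEPTLDE cannot be told apart here.
    for candidate in ("PEPTIDE", "PEPTLDE", "GGGGGGG"):
        print(candidate, round(cosine(observed, to_vector(theoretical_peaks(candidate))), 3))
```

The isobaric I/L case shows a limit of any purely mass-based representation, whichever vectorization is used.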
doi: 10.1007/11508069_52
Extraction by Example: Induction of Structural Rules for the Analysis of Molecular Sequence Data from Heterogeneous Sources
Olivo Miotto; Tin Wee Tan; Vladimir Brusic
Biological research requires information from multiple data sources that use a variety of database-specific formats. Gathering this information manually is time-consuming and error-prone, making automated data aggregation a compelling option for large studies. We describe a method for extracting information from diverse sources using structural rules specified by example. We developed a system for the aggregation of biological knowledge (ABK) and used it to conduct an epidemiological study of dengue virus (DENV) sequences. Additional information on geographical origin and isolation date is critical for understanding evolutionary relationships, but these data are inconsistently structured in database entries. Using three public databases, we found that structural rules can be applied successfully even to inconsistently structured data distributed across multiple fields. High reusability, combined with the ability to integrate analysis tools, makes this method suitable for a wide variety of large-scale studies involving viral sequences.
- Bioinformatics | Pp. 398-405
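To make the idea of a rule induced from an example concrete, here is a toy sketch, not the ABK system: a user marks one value in one record, the code records which field held it and the text immediately around it, and the resulting regular expression is applied to other records. The record layout (dicts of field name to text), the field names `note` and `country`, and the sample records are all hypothetical.

```python
import re


def induce_rule(record, example):
    """Learn (field, pattern) from one record in which `example` was marked."""
    for field, text in record.items():
        pos = text.find(example)
        if pos < 0:
            continue
        before, after = text[:pos], text[pos + len(example):]
        # Left anchor: the last token before the value, or the field start.
        left = re.search(r"(\S+\s*)$", before)
        left_anchor = re.escape(left.group(1)) if left else "^"
        # Right anchor: the character that followed the value, or the field end.
        right_anchor = re.escape(after[0]) if after else "$"
        return field, re.compile(left_anchor + r"(.+?)" + right_anchor)
    return None


def apply_rules(record, rules):
    """Try each induced rule in turn and return the first value extracted."""
    for field, pattern in rules:
        match = pattern.search(record.get(field, ""))
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Two hypothetical sources that store the country in different places.
    rules = [
        induce_rule({"organism": "Dengue virus 2", "note": "isolated in Thailand, 1998"}, "Thailand"),
        induce_rule({"country": "Brazil: Rio de Janeiro"}, "Brazil"),
    ]
    print(apply_rules({"note": "isolated in Viet Nam, 2001"}, rules))
    print(apply_rules({"country": "Cuba: Havana"}, rules))
```

Keeping one rule per source and trying them in order is what lets the same extraction target survive inconsistent structure across databases in this toy.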
doi: 10.1007/11508069_53
A Multi-population Test Approach to Informative Gene Selection
Jun Luo; Jinwen Ma
This paper proposes a multi-population test method for selecting the informative genes of a tumor from microarray data, based on a statistical multi-population test in which the sample data are grouped evenly. To test the effectiveness of the method, we use a support vector machine (SVM) to construct a tumor diagnosis system (i.e., a binary classifier) from the informative genes identified in the colon and leukemia datasets. The experiments show that the diagnosis system built with the multi-population test method achieves a correct diagnosis rate of 100% on the colon dataset and 97.1% on the leukemia dataset.
- Bioinformatics | Pp. 406-413
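The abstract above gives neither the test statistic nor the grouping scheme. The sketch below substitutes a standard one-way ANOVA F statistic computed per gene over the class groups and ranks genes by it; it approximates the idea of a multi-population test but is not the authors' method.

```python
import numpy as np


def anova_f(X, labels):
    """One-way ANOVA F statistic for every gene (column of X) across label groups."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = X.shape[0], len(groups)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for g in groups:
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        ss_between += len(Xg) * (mean_g - grand_mean) ** 2
        ss_within += ((Xg - mean_g) ** 2).sum(axis=0)
    # Small constant guards against genes with zero within-group variance.
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def select_genes(X, labels, top_k):
    """Indices of the top_k genes by F statistic, most significant first."""
    return np.argsort(anova_f(X, labels))[::-1][:top_k]
```

The selected columns, `X[:, select_genes(X, y, k)]`, could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) to build the binary diagnosis classifier the abstract describes.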
doi: 10.1007/11508069_54
Gene Selection of DNA Microarray Data Based on Regularization Networks
Xin Zhou; Kezhi Mao
Microarray data normally contain a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100), which makes discriminant analysis of DNA microarray data difficult. Selecting the genes that are important to the discriminant problem is therefore of great practical significance in microarray data analysis. In the context of pattern classification, gene selection can be cast as a feature selection problem. Feature selection approaches are broadly grouped into filter and wrapper methods. Wrapper methods generally outperform filter methods, but their accuracy comes at the cost of intensive computation. In the present study, we propose a wrapper-based gene selection algorithm that employs a Regularization Network as the classifier. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we use does not require repeated training in the leave-one-out procedure.
- Bioinformatics | Pp. 414-421
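Why leave-one-out can avoid retraining: for regularized least squares the fitted values are a linear smoother, f = H y with H = K (K + λI)^-1, and the leave-one-out residual of sample i is exactly (y_i - f_i) / (1 - H_ii). The sketch below assumes a linear kernel, no bias term, ±1 class labels, and the mean squared LOO residual as the wrapper's evaluation criterion; the paper's own kernel and criterion may differ.

```python
import numpy as np


def loo_residuals(X, y, lam=1.0):
    """Exact leave-one-out residuals for linear-kernel regularized least squares."""
    K = X @ X.T
    n = len(y)
    # K and (K + lam*I)^-1 commute, so solve(K + lam*I, K) equals K (K + lam*I)^-1.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    return (y - H @ y) / (1.0 - np.diag(H))


def forward_select(X, y, n_genes, lam=1.0):
    """Greedy wrapper selection; each candidate is scored without any retraining loop."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_genes):
        errors = [np.mean(loo_residuals(X[:, selected + [j]], y, lam) ** 2) for j in remaining]
        best = remaining[int(np.argmin(errors))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

One call to `loo_residuals` costs a single n-by-n solve, instead of n separate fits on n - 1 samples each.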
doi: 10.1007/11508069_55
Application of Mixture Models to Detect Differentially Expressed Genes
Liat Ben-Tovim Jones; Richard Bean; Geoff McLachlan; Justin Zhu
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
- Bioinformatics | Pp. 422-431
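A minimal sketch of the local-FDR idea from the abstract above, under simplifying assumptions: each gene already has a z-score, the null component is fixed at N(0, 1), and the non-null genes are modelled by a single normal component fitted by EM. The prior probability π0 that a gene is not differentially expressed is estimated along the way, and the local FDR of a gene is its posterior probability of being null. The published approach may use more components or estimate the null differently.

```python
import numpy as np


def normal_pdf(z, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_mixture(z, n_iter=200):
    """EM for f(z) = pi0*N(0,1) + (1 - pi0)*N(mu1, s1^2) with the null held fixed."""
    z = np.asarray(z, dtype=float)
    top = np.sort(z)[-max(1, len(z) // 10):]
    pi0, mu1, s1 = 0.9, top.mean(), z.std()
    for _ in range(n_iter):
        # E-step: posterior probability that each gene is null.
        f0 = pi0 * normal_pdf(z)
        f1 = (1.0 - pi0) * normal_pdf(z, mu1, s1)
        tau0 = f0 / (f0 + f1)
        tau1 = 1.0 - tau0
        # M-step: update the prior pi0 and the non-null component.
        pi0 = tau0.mean()
        mu1 = np.sum(tau1 * z) / np.sum(tau1)
        s1 = np.sqrt(np.sum(tau1 * (z - mu1) ** 2) / np.sum(tau1))
    return pi0, mu1, s1


def local_fdr(z, pi0, mu1, s1):
    z = np.asarray(z, dtype=float)
    f0 = pi0 * normal_pdf(z)
    return f0 / (f0 + (1.0 - pi0) * normal_pdf(z, mu1, s1))


def call_genes(z, params, c0=0.2):
    """Decision rule: call genes with local FDR <= c0; their mean local FDR estimates the FDR."""
    fdr = local_fdr(z, *params)
    selected = np.flatnonzero(fdr <= c0)
    est_fdr = fdr[selected].mean() if selected.size else 0.0
    return selected, est_fdr
```

The threshold `c0` is an illustrative choice; lowering it trades more false negatives for fewer false positives.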
doi: 10.1007/11508069_56
A Comparative Study of Two Novel Predictor Set Scoring Methods
Chia Huey Ooi; Madhu Chetty
Due to the large number of genes measured in a typical microarray dataset, feature selection plays an essential role in tumor classification. Relevance and redundancy, in turn, are key components in determining the optimal predictor set. However, a third component, the relative weights given to the first two, assumes equal, if not greater, importance in feature selection. Based on this third component, we developed two novel feature selection methods capable of producing high, unbiased classification accuracy on multiclass microarray datasets. In an in-depth analysis comparing the two methods, the optimal values of the relative weights are also estimated.
- Bioinformatics | Pp. 432-439
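The abstract above names relevance, redundancy and a relative weight between them, but not the formulas. Purely for illustration, the sketch uses one hypothetical scoring rule inside greedy forward selection: relevance is a per-gene ANOVA F statistic scaled to (0, 1], antiredundancy is one minus the mean absolute Pearson correlation with genes already chosen, and the two are combined as relevance^alpha * antiredundancy^(1 - alpha). The weight `alpha` plays the role of the third component; the specific functions are assumptions.

```python
import numpy as np


def f_statistic(X, y):
    """One-way ANOVA F statistic per gene across the classes in y."""
    classes = np.unique(y)
    grand = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    k, n = len(classes), len(y)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)


def select_predictors(X, y, n_select, alpha=0.5):
    """Greedy selection trading relevance against redundancy with weight alpha."""
    relevance = f_statistic(X, y)
    relevance = relevance / relevance.max()          # scale to (0, 1]
    corr = np.abs(np.corrcoef(X, rowvar=False))      # gene-by-gene |correlation|
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        antired = 1.0 - corr[np.ix_(candidates, selected)].mean(axis=1)
        antired = np.maximum(antired, 1e-12)         # guard against rounding below zero
        score = relevance[candidates] ** alpha * antired ** (1.0 - alpha)
        selected.append(candidates[int(np.argmax(score))])
    return selected
```

With `alpha=1` only relevance counts, so a near-duplicate of the best gene is picked next; smaller `alpha` values push the search towards genes that add new information.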
doi: 10.1007/11508069_57
Deriving Matrix of Peptide-MHC Interactions in Diabetic Mouse by Genetic Algorithm
Menaka Rajapakse; Lonce Wyse; Bertil Schmidt; Vladimir Brusic
Finding motifs that elucidate the rules governing peptide binding to medically important receptors is important for screening targets for drugs and vaccines. This paper focuses on peptide binding to the I-Ag7 molecule of the non-obese diabetic (NOD) mouse, an animal model for insulin-dependent diabetes mellitus (IDDM). A number of motifs describing peptide binding to I-Ag7 have been proposed. These motifs resulted from independent experimental studies carried out on small data sets. Testing with multiple data sets showed that each motif at best describes only a subset of the solution space, so these motifs lack generalization ability. This study seeks a motif with higher generalization ability, one that can predict binders in all I-Ag7 data sets with high accuracy. A binding score matrix representing the peptide binding motif of I-Ag7 was derived using a genetic algorithm (GA). The evolved score matrix significantly outperformed previously reported motifs.
- Bioinformatics | Pp. 440-447
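The abstract above does not describe the GA's encoding or operators. A minimal sketch under stated assumptions: peptides are fixed 9-mers, the score matrix holds one weight per (position, amino acid), a peptide's score is the sum of its weights, fitness is the area under the ROC curve separating binders from non-binders, and evolution uses tournament selection, uniform crossover, Gaussian mutation and elitism. The toy data, with hypothetical anchor residues at the first and fourth positions, are invented for illustration and are not real MHC binding data.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 9


def encode(peptides):
    """Map 9-mer strings to integer arrays indexing AMINO."""
    index = {a: i for i, a in enumerate(AMINO)}
    return np.array([[index[a] for a in p] for p in peptides])


def scores(W, seqs):
    """Additive matrix score: sum of W[position, residue] over each peptide."""
    return W[np.arange(seqs.shape[1]), seqs].sum(axis=1)


def auc(pos, neg):
    """Probability that a random binder outscores a random non-binder (ties count 1/2)."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def evolve(binders, nonbinders, pop_size=30, generations=40,
           mut_rate=0.1, mut_sd=0.5, seed=0):
    rng = np.random.default_rng(seed)

    def fitness(W):
        return auc(scores(W, binders), scores(W, nonbinders))

    def tournament(pop, fit, k=3):
        contenders = rng.choice(len(pop), size=k, replace=False)
        return pop[contenders[np.argmax(fit[contenders])]]

    pop = rng.normal(size=(pop_size, PEPTIDE_LEN, len(AMINO)))
    fit = np.array([fitness(W) for W in pop])
    history = [fit.max()]
    for _ in range(generations):
        children = [pop[np.argmax(fit)].copy()]  # elitism: best matrix survives unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, fit), tournament(pop, fit)
            child = np.where(rng.random(a.shape) < 0.5, a, b)   # uniform crossover
            mutate = rng.random(child.shape) < mut_rate          # Gaussian mutation mask
            child = child + mutate * rng.normal(scale=mut_sd, size=child.shape)
            children.append(child)
        pop = np.array(children)
        fit = np.array([fitness(W) for W in pop])
        history.append(fit.max())
    return pop[np.argmax(fit)], history


def toy_data(n=100, seed=1):
    """Hypothetical data: binders carry Y at position 1 and L at position 4."""
    rng = np.random.default_rng(seed)
    nonbinders = rng.integers(0, len(AMINO), size=(n, PEPTIDE_LEN))
    binders = rng.integers(0, len(AMINO), size=(n, PEPTIDE_LEN))
    binders[:, 0] = AMINO.index("Y")
    binders[:, 3] = AMINO.index("L")
    return binders, nonbinders
```

Elitism makes the best fitness non-decreasing across generations; in practice the fitness would be measured on held-out data sets to test the generalization the abstract is after.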
doi: 10.1007/11508069_58
SVM Based Prediction of Bacterial Transcription Start Sites
James Gordon; Michael Towsey
Identifying bacterial promoters is the key to understanding gene expression. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). Knowing the TSS position, one can predict promoter positions to within a few base pairs, and vice versa. As a route to promoter identification, we formally address the problem of TSS prediction, drawing on the RegulonDB database of known (mapped) TSS locations. The accepted method of finding promoters (and therefore TSSs) is to use position weight matrices (PWMs). We use an alternative approach based on support vector machines (SVMs). In particular, we quantify performance of several SVM models versus a PWM approach, using area under the detection-error tradeoff (DET) curve as a performance metric. SVM models are shown to outperform the PWM at TSS prediction, and to substantially reduce numbers of false positives, which are the bane of this problem.
- Bioinformatics | Pp. 448-453
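For reference, the position weight matrix baseline that the abstract above compares against can be sketched as follows: log-odds scores built from aligned sites with pseudocounts against a uniform background, summed along a window and slid across a sequence. The pseudocount, the uniform 0.25 background and the example sites are assumptions; the sites are toy variants of TATAAT, the textbook consensus of the E. coli sigma-70 -10 element, not RegulonDB data. The SVM models from the paper are not shown.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PWM (bits) from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = len(sites) + len(BASES) * pseudocount
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of the PWM's width along the sequence."""
    width = len(pwm)
    return [score(pwm, sequence[i:i + width]) for i in range(len(sequence) - width + 1)]
```

Thresholding `scan` output gives candidate sites; every window above the threshold that is not a true site is a false positive, which is the failure mode the abstract reports the SVM models reduce.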
doi: 10.1007/11508069_59
Exploiting Sequence Dependencies in the Prediction of Peroxisomal Proteins
Mark Wakabayashi; John Hawkins; Stefan Maetschke; Mikael Bodén
Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier – based on the Support Vector Machine – produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy of most tested models.
- Bioinformatics | Pp. 454-461
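One hypothetical way to expose the dependencies the abstract above refers to: encode the C-terminal window as position-specific one-hot features, and add conjunction features pairing each residue of the terminal tripeptide with each preceding residue. A linear classifier such as a linear SVM can then weight specific combinations, for example a given residue at position -5 only when the protein ends in SKL. The 12-residue window and this particular encoding are assumptions, not the paper's feature set.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AMINO)}


def pts1_features(protein, window=12, motif_len=3):
    """Sorted indices of the active binary features for the C-terminal window.

    Block 1: window * 20 one-hot position-specific residue features.
    Block 2: motif_len * (window - motif_len) * 400 conjunction features, one per
    (motif position, preceding position, motif residue, preceding residue).
    """
    tail = protein[-window:]
    if len(tail) < window:
        raise ValueError("sequence shorter than the C-terminal window")
    active = [pos * 20 + AA_INDEX[aa] for pos, aa in enumerate(tail)]
    offset = window * 20
    n_pre = window - motif_len
    for m in range(motif_len):
        aa_m = AA_INDEX[tail[n_pre + m]]
        for p in range(n_pre):
            aa_p = AA_INDEX[tail[p]]
            active.append(offset + ((m * n_pre + p) * 20 + aa_m) * 20 + aa_p)
    return sorted(active)
```

With the defaults the feature space has 12 * 20 + 3 * 9 * 400 = 11,040 dimensions, of which 39 are active for any protein, so a sparse representation is the natural input to the classifier.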
doi: 10.1007/11508069_60
Protein Fold Recognition Using Neural Networks and Support Vector Machines
Nan Jiang; Wendy Xinyu Wu; Ian Mitchell
In this paper, a new fold recognition model with mixed environment-specific substitution mapping (called MESSM) is proposed with three key features: 1) a structurally derived substitution score is generated using neural networks; 2) a mixed environment-specific substitution mapping is developed by combining the structurally derived substitution score with a sequence profile from well-developed sequence substitution matrices; 3) a support vector machine is employed to measure the significance of the sequence-structure alignment. Tested on two benchmark problems, the MESSM model shows performance comparable to more computationally intensive, energy-potential-based fold recognition models. The results also demonstrate that the new fold recognition model with mixed substitution mapping performs better than models using either the structure or the sequence profile alone. The MESSM model offers a new way to develop an efficient tool for protein fold recognition.
- Bioinformatics | Pp. 462-469
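The second feature in the abstract above mixes a structure-derived substitution score with a sequence profile. One simple, hypothetical reading is a per-position weighted average of two score tables, used as the match score in a Smith-Waterman local alignment of a query sequence against a template. The weight, the linear gap penalty and the toy score tables are assumptions; the neural-network scoring and the SVM significance step are not shown.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def mixed_score(struct_row, profile_row, residue, weight):
    """Weighted mix of structure-derived and sequence-profile scores for one residue."""
    return weight * struct_row[residue] + (1.0 - weight) * profile_row[residue]


def local_alignment_score(template_struct, template_profile, query, weight=0.5, gap=-2.0):
    """Smith-Waterman with a linear gap; each template position has its own score rows."""
    n, m = len(template_struct), len(query)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = mixed_score(template_struct[i - 1], template_profile[i - 1], query[j - 1], weight)
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def favoring(residues, hit=2.0, miss=-1.0):
    """Toy score rows: each template position rewards one residue and penalizes the rest."""
    return [{a: (hit if a == r else miss) for a in AMINO} for r in residues]
```

For example, with structure rows from `favoring("ACD")` and profile rows from `favoring("WWW")`, the query "ACD" aligns well at `weight=1.0` and not at all at `weight=0.0`, which shows how the weight shifts trust between the two sources.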