Publications catalog - books

Intelligent Data Engineering and Automated Learning: IDEAL 2005: 6th International Conference, Brisbane, Australia, July 6-8, 2005, Proceedings

Marcus Gallagher ; James P. Hogan ; Frederic Maire (eds.)

In conference: 6th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL). Brisbane, QLD, Australia. July 6, 2005 - July 8, 2005

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Database Management; Algorithm Analysis and Problem Complexity; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet); Computers and Society

Availability
Detected institution    Publication year    Browse    Download    Request
Not detected    2005    SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26972-4

Electronic ISBN

978-3-540-31693-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Matching Peptide Sequences with Mass Spectra

K. W. Lau; B. Stapley; S. Hubbard; H. Yin

We study a method of mapping both mass spectra and sequences to feature vectors, and the correlation between them. The method of calculating the feature vector from mass spectra is presented, together with a method for representing sequences. A correlation metric comparing both representations is studied. It shows strong correlation between the two representations for the same peptides. It also demonstrates that the effect of correlation is increased by using the longer sequences induced from the theoretical mass spectra. The method provides a promising step towards de novo sequencing.

- Bioinformatics | Pp. 390-397

Extraction by Example: Induction of Structural Rules for the Analysis of Molecular Sequence Data from Heterogeneous Sources

Olivo Miotto; Tin Wee Tan; Vladimir Brusic

Biological research requires information from multiple data sources that use a variety of database-specific formats. Manual gathering of information is time consuming and error-prone, making automated data aggregation a compelling option for large studies. We describe a method for extracting information from diverse sources that involves structural rules specified by example. We developed a system for aggregation of biological knowledge (ABK) and used it to conduct an epidemiological study of dengue virus (DENV) sequences. Additional information on geographical origin and isolation date is critical for understanding evolutionary relationships, but this data is inconsistently structured in database entries. Using three public databases, we found that structural rules can be used successfully even when applied to inconsistently structured data that is distributed across multiple fields. High reusability, combined with the ability to integrate analysis tools, makes this method suitable for a wide variety of large-scale studies involving viral sequences.

- Bioinformatics | Pp. 398-405

A Multi-population Test Approach to Informative Gene Selection

Jun Luo; Jinwen Ma

This paper proposes a multi-population test method for informative gene selection of a tumor from microarray data, based on the statistical multi-population test with the sample data being grouped evenly. To test the effectiveness of the multi-population test method, we use the support vector machine (SVM) to construct a tumor diagnosis system (i.e., a binary classifier) based on the identified informative genes on the colon and leukemia data. The experiments show that the diagnosis system constructed with the multi-population test method can achieve a 100% correctness rate of diagnosis on the colon dataset and a 97.1% correctness rate of diagnosis on the leukemia dataset.

- Bioinformatics | Pp. 406-413

Gene Selection of DNA Microarray Data Based on Regularization Networks

Xin Zhou; Kezhi Mao

Microarray data normally contain a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This makes the discriminant analysis of DNA microarray data hard to handle. Selecting the genes important to the discriminant problem is hence of much practical significance in microarray data analysis. If put in the context of pattern classification, gene selection can be cast as a feature selection problem. Feature selection approaches are broadly grouped into filter and wrapper methods. The wrapper method outperforms the filter method in general. However, the accuracy of wrapper methods is coupled with intensive computations. In the present study, we propose a wrapper-based gene selection algorithm that employs the Regularization Network as the classifier. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we use does not demand repeated training in the leave-one-out procedure.

- Bioinformatics | Pp. 414-421

Application of Mixture Models to Detect Differentially Expressed Genes

Liat Ben-Tovim Jones; Richard Bean; Geoff McLachlan; Justin Zhu

An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.

- Bioinformatics | Pp. 422-431
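
As a rough illustration of the local FDR idea in the abstract above, the sketch below evaluates a two-component mixture for a per-gene score z. Under such a mixture, the local FDR of a gene is the posterior probability that it belongs to the null (not differentially expressed) component. The normal densities, the parameter values, the 0.2 cut-off, and the gene names are all assumptions made for this example; they are not taken from the paper.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0, alt_mean, alt_sd):
    """Posterior probability that a gene with score z is NOT differentially expressed.

    pi0 is the prior probability of the null component, assumed here to be N(0, 1).
    The alternative component is assumed to be N(alt_mean, alt_sd^2).
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)

# Hypothetical mixture parameters and scores, for illustration only.
PI0, ALT_MEAN, ALT_SD = 0.9, 2.5, 1.0
scores = {"gene_a": 0.1, "gene_b": 2.0, "gene_c": 4.0}

# Decision rule: call a gene differentially expressed if its local FDR is small.
selected = [g for g, z in scores.items()
            if local_fdr(z, PI0, ALT_MEAN, ALT_SD) < 0.2]
```

In practice the mixture parameters, including the prior null probability, would be estimated from the data rather than fixed by hand as they are here.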

A Comparative Study of Two Novel Predictor Set Scoring Methods

Chia Huey Ooi; Madhu Chetty

Due to the large number of genes measured in a typical microarray dataset, feature selection plays an essential role in tumor classification. In turn, relevance and redundancy are key components in determining the optimal predictor set. However, a third component, the relative weights given to the first two, also assumes an equal, if not greater, importance in feature selection. Based on this third component, we developed two novel feature selection methods capable of producing high, unbiased classification accuracy in multiclass microarray datasets. In an in-depth analysis comparing the two methods, the optimal values of the relative weights are also estimated.

- Bioinformatics | Pp. 432-439

Deriving Matrix of Peptide-MHC Interactions in Diabetic Mouse by Genetic Algorithm

Menaka Rajapakse; Lonce Wyse; Bertil Schmidt; Vladimir Brusic

Finding motifs that can elucidate rules that govern peptide binding to medically important receptors is important for screening targets for drugs and vaccines. This paper focuses on elucidation of peptide binding to the I-Ag7 molecule of the non-obese diabetic (NOD) mouse, an animal model for insulin-dependent diabetes mellitus (IDDM). A number of motifs that describe peptide binding to I-Ag7 have been proposed. These motifs result from independent experimental studies carried out on small data sets. Testing with multiple data sets showed that each of the motifs at best describes only a subset of the solution space, and these motifs therefore lack generalization ability. This study focuses on seeking a motif with higher generalization ability so that it can predict binders in all I-Ag7 data sets with high accuracy. A binding score matrix representing the peptide binding motif to I-Ag7 was derived using a genetic algorithm (GA). The evolved score matrix significantly outperformed previously reported motifs.

- Bioinformatics | Pp. 440-447

SVM Based Prediction of Bacterial Transcription Start Sites

James Gordon; Michael Towsey

Identifying bacterial promoters is the key to understanding gene expression. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). Knowing the TSS position, one can predict promoter positions to within a few base pairs, and vice versa. As a route to promoter identification, we formally address the problem of TSS prediction, drawing on the RegulonDB database of known (mapped) TSS locations. The accepted method of finding promoters (and therefore TSSs) is to use position weight matrices (PWMs). We use an alternative approach based on support vector machines (SVMs). In particular, we quantify performance of several SVM models versus a PWM approach, using area under the detection-error tradeoff (DET) curve as a performance metric. SVM models are shown to outperform the PWM at TSS prediction, and to substantially reduce numbers of false positives, which are the bane of this problem.

- Bioinformatics | Pp. 448-453
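
For readers unfamiliar with the position weight matrix (PWM) baseline mentioned in the abstract above, the following is a minimal sketch of how a PWM can be built from aligned sites and slid along a sequence. The toy sites, the test sequence, the pseudocount, and the uniform background are all made up for illustration. This is not the authors' model, and it does not use RegulonDB data.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the log-odds scores of the bases in one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start index) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [(score_window(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)

# Hypothetical aligned sites and test sequence (toy data, not real promoters).
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = build_pwm(toy_sites)
sequence = "GGCGCTATAATGCGC"
score, position = best_hit(pwm, sequence)
```

A detector built this way reports every window whose score exceeds a threshold, so lowering the threshold to catch more true sites also admits more false positives.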

Exploiting Sequence Dependencies in the Prediction of Peroxisomal Proteins

Mark Wakabayashi; John Hawkins; Stefan Maetschke; Mikael Bodén

Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier – based on the Support Vector Machine – produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy of most tested models.

- Bioinformatics | Pp. 454-461

Protein Fold Recognition Using Neural Networks and Support Vector Machines

Nan Jiang; Wendy Xinyu Wu; Ian Mitchell

In this paper, a new fold recognition model with mixed environment-specific substitution mapping (called MESSM) is proposed with three key features: 1) a structurally-derived substitution score is generated using neural networks; 2) a mixed environment-specific substitution mapping is developed by combining the structurally-derived substitution score with the sequence profile from well-developed sequence substitution matrices; 3) a support vector machine is employed to measure the significance of the sequence-structure alignment. Tested on two benchmark problems, the MESSM model shows performance comparable to that of more computationally intensive, energy-potential-based fold recognition models. The results also demonstrate that the new fold recognition model with mixed substitution mapping performs better than one using either the structure or the sequence profile alone. The MESSM model presents a new way to develop an efficient tool for protein fold recognition.

- Bioinformatics | Pp. 462-469