Publications catalog - books
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics: 5th European Conference, EvoBIO 2007, Valencia, Spain, April 11-13, 2007. Proceedings
Elena Marchiori ; Jason H. Moore ; Jagath C. Rajapakse (eds.)
Conference: 5th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO). Valencia, Spain. April 11-13, 2007
Abstract/description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Artificial Intelligence (incl. Robotics); Programming Techniques; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity; Computational Biology/Bioinformatics; Pattern Recognition
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-71782-9
Electronic ISBN
978-3-540-71783-6
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
Identifying Regulatory Sites Using Neighborhood Species
Claudia Angelini; Luisa Cutillo; Italia De Feis; Richard van der Wath; Pietro Liò
The annotation of transcription factor binding sites in newly sequenced genomes is an important and challenging problem. We have previously shown how a regression model that linearly relates gene expression levels to the matching scores of nucleotide patterns allows us to identify DNA-binding sites from a collection of co-regulated genes and their nearby non-coding DNA sequences. Our methodology uses Bayesian models and stochastic search techniques to select transcription factor binding site candidates. Here we show that this methodology allows us to identify binding sites in nearby species. We present examples of annotation crossing from to . We found that the eng1 motif also regulates a set of 9 genes in . Our framework may be useful for transferring information during the annotation of a new species. Finally, we discuss a number of statistical and biological issues related to the identification of binding sites through covariates of gene expression and sequence.
Pp. 1-10
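The Angelini et al. abstract above relates expression levels linearly to motif matching scores. A minimal sketch of that core relationship follows, assuming the matching score is simply the number of exact motif occurrences in an upstream sequence, and replacing the authors' Bayesian model and stochastic search with an ordinary least-squares fit of a single motif. The sequences, expression values and the motif `ACGTG` are hypothetical.

```python
from statistics import mean


def motif_score(sequence: str, motif: str) -> int:
    """Matching score (assumed here): count of overlapping exact occurrences."""
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical upstream regions of four co-regulated genes and their expression levels.
upstream = ["ACGTGACGTG", "TTTTACGTGA", "GGGGCCCCAA", "ACGTGTACGTGACGTG"]
expression = [2.1, 1.0, 0.1, 3.0]

scores = [motif_score(s, "ACGTG") for s in upstream]
intercept, slope = fit_line(scores, expression)
print(scores, round(intercept, 3), round(slope, 3))
```

With many candidate motifs, each would contribute one covariate, and deciding which covariates enter the model is the selection problem the authors address with Bayesian stochastic search.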
Genetic Programming and Other Machine Learning Approaches to Predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding Levels (%PPB) of Drugs
Francesco Archetti; Stefano Lanzeni; Enza Messina; Leonardo Vanneschi
Computational methods allowing reliable pharmacokinetic predictions for newly synthesized compounds are critically relevant for drug discovery and development. Here we present an empirical study focusing on various versions of Genetic Programming and other well-known Machine Learning techniques to predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding (%PPB) levels. Since these two parameters characterize, respectively, the harmful effects of a drug and its distribution in the human body, their accurate prediction is essential for the selection of effective molecules. The obtained results confirm that Genetic Programming is a promising technique for predicting pharmacokinetic parameters, in terms of both accuracy and generalization ability.
Pp. 11-23
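Genetic Programming evolves candidate predictors as expression trees and ranks them by prediction error. The sketch below shows only that scoring step, under stated assumptions: trees are nested tuples, division is "protected" (a common GP convention), fitness is root-mean-square error, and the descriptor names `logP` and `mw_scaled` are hypothetical, not the features used by Archetti et al.

```python
import math
import operator

# Function set; protected division returns 1.0 when the denominator is ~0.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
}


def evaluate(tree, row):
    """Evaluate an expression tree on one compound's descriptors."""
    if isinstance(tree, tuple):  # internal node: (operator, left, right)
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):    # terminal: descriptor name
        return row[tree]
    return float(tree)           # terminal: numeric constant


def rmse(tree, rows, targets):
    """Fitness of a candidate program: root-mean-square prediction error."""
    sq = [(evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical training data: two compounds with made-up descriptor values.
rows = [{"logP": 1.0, "mw_scaled": 0.5}, {"logP": 2.0, "mw_scaled": 0.1}]
targets = [2.5, 4.1]

good = ("+", ("*", 2, "logP"), "mw_scaled")             # 2 * logP + mw_scaled
poor = ("/", "logP", ("-", "mw_scaled", "mw_scaled"))   # divides by zero, protected to 1.0
print(rmse(good, rows, targets), rmse(poor, rows, targets))
```

Selection, crossover and mutation would then operate on a population of such trees, favoring lower error.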
Hypothesis Testing with Classifier Systems for Rule-Based Risk Prediction
Flavio Baronti; Antonina Starita
Analysis of medical datasets has some specific requirements not always fulfilled by standard Machine Learning methods. In particular, heterogeneous and missing data must be tolerated, and the results should be easily interpretable. Moreover, with genetic data, the combination of two or more attributes often leads to non-linear effects that are not detectable for each attribute on its own. We present a new ML algorithm, HCS, that takes inspiration from learning classifier systems, decision trees and statistical hypothesis testing. We show the results of applying this algorithm to a well-known benchmark dataset and to HNSCC, a dataset studying the connection of smoking and genetic patterns with the development of oral cancer.
Pp. 24-34
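HCS combines rule learning with statistical hypothesis testing. As a hedged illustration of what testing a single rule can look like (this is not the HCS algorithm itself), the sketch builds the 2x2 table of rule coverage versus outcome and computes a one-sided Fisher exact p-value. The attribute names `smoker` and `variant` and the records are hypothetical; a conjunctive rule echoes the abstract's point that attribute combinations can matter where single attributes do not.

```python
from math import comb


def contingency(records, rule, outcome):
    """Counts (a, b, c, d): covered & positive, covered & negative,
    uncovered & positive, uncovered & negative."""
    a = b = c = d = 0
    for r in records:
        covered, positive = rule(r), outcome(r)
        if covered and positive:
            a += 1
        elif covered:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the covered-positive count under the hypergeometric null
    (margins fixed); small values mean the rule is enriched for the outcome."""
    n, covered, positives = a + b + c + d, a + b, a + c
    tail = sum(comb(positives, k) * comb(n - positives, covered - k)
               for k in range(a, min(covered, positives) + 1))
    return tail / comb(n, covered)


records = [
    {"smoker": True, "variant": True, "cancer": True},
    {"smoker": True, "variant": True, "cancer": True},
    {"smoker": True, "variant": True, "cancer": True},
    {"smoker": True, "variant": False, "cancer": False},
    {"smoker": False, "variant": True, "cancer": False},
    {"smoker": False, "variant": False, "cancer": False},
]
table = contingency(records, lambda r: r["smoker"] and r["variant"], lambda r: r["cancer"])
print(table, fisher_one_sided(*table))
```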
Robust Peak Detection and Alignment of nanoLC-FT Mass Spectrometry Data
Marius C. Codrea; Connie R. Jiménez; Sander Piersma; Jaap Heringa; Elena Marchiori
In liquid chromatography-mass spectrometry (LC-MS) based expression proteomics, samples from different groups are analyzed comparatively in order to detect differences that may be caused by the disease under study (potential biomarker detection). To this end, advanced computational techniques are needed. Peak alignment and detection are two key steps in the analysis of LC-MS datasets. In this paper we propose an algorithm for LC-MS peak detection and alignment. The goal of the algorithm is to group together peaks generated by the same peptide but detected in different samples. It employs clustering with a new weighted similarity measure and automatic selection of the number of clusters. Moreover, it supports parallelization by processing the data in blocks. Finally, it allows available domain knowledge to be incorporated to constrain and refine the search for aligned peaks. Application of the algorithm to an LC-MS dataset generated by a spike-in experiment substantiates the effectiveness of the proposed technique.
Pp. 35-46
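The alignment step groups peaks from different samples that likely come from the same peptide. The sketch below is a simplified stand-in for the authors' method: the weighted similarity (weights `w_mz`, `w_rt` and the tolerances) is hypothetical, and fixed-threshold single-linkage grouping via union-find replaces their clustering with automatic selection of the number of clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    sample: str
    mz: float  # mass-to-charge ratio
    rt: float  # retention time (minutes)


def similarity(p, q, mz_tol=0.01, rt_tol=0.5, w_mz=0.7, w_rt=0.3):
    """Weighted similarity in [0, 1]; 0 when either tolerance is exceeded."""
    d_mz, d_rt = abs(p.mz - q.mz), abs(p.rt - q.rt)
    if d_mz > mz_tol or d_rt > rt_tol:
        return 0.0
    return w_mz * (1 - d_mz / mz_tol) + w_rt * (1 - d_rt / rt_tol)


def group_peaks(peaks, threshold=0.5):
    """Single-linkage grouping with union-find; only links peaks from different samples."""
    parent = list(range(len(peaks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            if peaks[i].sample != peaks[j].sample and similarity(peaks[i], peaks[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, peak in enumerate(peaks):
        groups.setdefault(find(i), []).append(peak)
    return list(groups.values())


# Hypothetical peaks from two samples (s1, s2).
peaks = [
    Peak("s1", 500.250, 20.1), Peak("s2", 500.252, 20.3),
    Peak("s1", 712.400, 35.0), Peak("s2", 712.405, 35.2),
]
for group in group_peaks(peaks):
    print([(p.sample, p.mz, p.rt) for p in group])
```

Because the outer loop only compares pairs, splitting the peak list into m/z blocks would let blocks be grouped independently, in the spirit of the block-wise parallelization the abstract mentions.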
One-Versus-One and One-Versus-All Multiclass SVM-RFE for Gene Selection in Cancer Classification
Kai-Bo Duan; Jagath C. Rajapakse; Minh N. Nguyen
We propose a feature selection method for multiclass classification. The proposed method selects features by backward elimination and, at each step, computes feature ranking scores from the weight vectors of multiple two-class linear Support Vector Machine classifiers obtained from a one-versus-one or one-versus-all decomposition of the multiclass classification problem. We evaluated the proposed method on three gene expression datasets for multiclass cancer classification. For comparison, a filter-based feature selection method was included in the numerical study. The study demonstrates the effectiveness of the proposed method in selecting a compact set of genes that yields good classification accuracy.
Pp. 47-56
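In the backward-elimination loop, each step scores every remaining gene from the weight vectors of the binary linear SVMs and removes the lowest-scoring gene. The sketch assumes the score is the sum of squared weights across all binary classifiers, one natural aggregation and not necessarily the exact rule used by Duan et al. The weights are hypothetical; in a real run they would come from retraining the one-versus-one or one-versus-all SVMs on the remaining genes before every step.

```python
def ranking_scores(weight_vectors):
    """Score per feature: sum over binary classifiers of the squared weight."""
    n_features = len(weight_vectors[0])
    return [sum(w[k] ** 2 for w in weight_vectors) for k in range(n_features)]


def rfe_step(features, weight_vectors):
    """Drop the lowest-scoring feature; returns (removed, remaining)."""
    scores = ranking_scores(weight_vectors)
    worst = min(range(len(features)), key=scores.__getitem__)
    return features[worst], features[:worst] + features[worst + 1:]


# Hypothetical weights from three one-versus-one classifiers (3 classes) over 3 genes.
genes = ["g1", "g2", "g3"]
weights = [
    [0.9, 0.1, -0.4],   # class A vs class B
    [-0.7, 0.05, 0.3],  # class A vs class C
    [0.2, -0.1, 0.8],   # class B vs class C
]
removed, remaining = rfe_step(genes, weights)
print(removed, remaining)
```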
Understanding Signal Sequences with Machine Learning
Jean-Luc Falcone; Renée Kreuter; Dominique Belin; Bastien Chopard
Protein translocation, the transport of newly synthesized proteins out of the cell, is a fundamental mechanism of life. We are interested in understanding how cells recognize the proteins that are to be exported and how the necessary information is encoded in the so-called “Signal Sequences”. In this paper, we address these problems by building a physico-chemical model of signal sequence recognition from experimental data. This model was built using . In a first phase, the classifiers were built from a set of features derived from current knowledge about signal sequences. The model was then expanded by feature generation with . The resulting predictors are efficient, achieving an accuracy of more than 99% on our set of wild-type proteins. Furthermore, the generated features can provide biological insight into the export mechanism. Our tool is freely available through a web interface.
Pp. 57-67
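The first-phase classifiers were built from features grounded in current knowledge of signal sequences. As an illustration only, the sketch computes two features commonly associated with signal peptides: a hydrophobic core, scored here with the Kyte-Doolittle hydropathy scale over a sliding window, and the net charge of the N-terminal region. The window size, region length and example sequence are assumptions, not the feature set of Falcone et al.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydrophobicity(seq, window=7):
    """Highest mean hydropathy over any window of `window` residues."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    return max(sum(values[i:i + window]) / window
               for i in range(len(values) - window + 1))


def signal_features(seq, n_region=5):
    """Illustrative feature vector for a candidate signal sequence."""
    head = seq.upper()[:n_region]
    return {
        "max_hydrophobicity": max_window_hydrophobicity(seq),
        "n_region_net_charge": sum(head.count(x) for x in "KR") - sum(head.count(x) for x in "DE"),
        "length": len(seq),
    }


print(signal_features("MKKRLLLLLLLAA"))  # hypothetical sequence
```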
Targeting Differentially Co-regulated Genes by Multiobjective and Multimodal Optimization
Oscar Harari; Cristina Rubio-Escudero; Igor Zwir
A critical challenge of the postgenomic era is to understand how genes are differentially regulated in and between genetic networks. The fact that such co-regulated genes may be differentially regulated suggests that subtle differences in the shared cis-acting regulatory elements are likely significant; however, it is unknown which of these features increase or reduce gene expression. In principle, this expression can be measured by microarray experiments, though these incorporate systematic errors and, moreover, produce only a limited classification (e.g., up- or down-regulated genes). In this work, we present an unsupervised machine learning method to tackle the complexities governing gene expression, which treats gene expression data as one feature among many. It analyzes features concurrently, recognizes dynamic relations and generates profiles, which are groups of promoters sharing common features. The method uses multiobjective techniques to evaluate the performance of profiles and takes a multimodal approach to produce alternative descriptions of the same expression target. We apply this method to probe the regulatory networks governed by the PhoP/PhoQ two-component system in the enteric bacteria and . Our analysis uncovered profiles that were experimentally validated, suggesting correlations between promoter regulatory features and gene expression kinetics measured by green fluorescent protein (GFP) assays.
Pp. 68-77
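The method evaluates profiles with multiobjective techniques, so a profile is worth keeping when no other profile beats it on every objective at once. A minimal Pareto-dominance filter is sketched below; the two objectives (coverage and coherence, both maximized) and the profile scores are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one (maximization)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(profiles):
    """Names of non-dominated profiles, in input order."""
    return [name for name, obj in profiles.items()
            if not any(dominates(other, obj) for o, other in profiles.items() if o != name)]


# Hypothetical profiles scored as (coverage, coherence).
profiles = {"P1": (0.8, 0.4), "P2": (0.6, 0.9), "P3": (0.5, 0.3), "P4": (0.7, 0.5)}
print(pareto_front(profiles))
```

Returning the whole front, rather than a single best profile, keeps several trade-offs available side by side.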
Modeling Genetic Networks: Comparison of Static and Dynamic Models
Cristina Rubio-Escudero; Oscar Harari; Oscar Cordón; Igor Zwir
Biomedical research has been revolutionized by high-throughput techniques and the enormous amount of biological data they generate. Interest in network models and systems biology is rising rapidly. Genetic networks have become an essential tool for mining these data, since they explain the function of genes in terms of how they influence other genes. Many modeling approaches have been proposed for building genetic networks. However, the advantages and disadvantages of each model are not clear. Network-building models can be distinguished in several ways; one of the most important is whether the data being mined are static or dynamic. In this work we compare static and dynamic models on a problem related to inflammation and the host response to injury. We show how both models provide complementary information and cross-validate the results obtained.
Pp. 78-89
A Genetic Embedded Approach for Gene Selection and Classification of Microarray Data
Jose Crispin Hernandez Hernandez; Béatrice Duval; Jin-Kao Hao
Classification of microarray data requires the selection of subsets of relevant genes in order to achieve good classification performance. This article presents a genetic embedded approach that performs the selection task for an SVM classifier. The main feature of the proposed approach is its highly specialized crossover and mutation operators, which take into account gene ranking information provided by the SVM classifier. The effectiveness of our approach is assessed on three well-known benchmark datasets from the literature, with highly competitive results.
Pp. 90-101
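The abstract's key idea is that crossover and mutation use the SVM's gene ranking. The two operators below are hypothetical illustrations of that idea, not the operators of Hernandez Hernandez et al. Chromosomes are 0/1 gene-selection masks and `ranks[i] == 0` marks the most relevant gene (for example, by SVM weight magnitude); crossover keeps genes shared by both parents and fills up to a target size with the best-ranked remaining ones, and mutation drops the worst-ranked selected gene and may add the best-ranked unselected one.

```python
import random


def ranking_guided_crossover(p1, p2, ranks, size):
    """Hypothetical operator: keep genes selected by both parents, then fill up to
    `size` with the best-ranked genes selected by exactly one parent."""
    shared = [i for i in range(len(p1)) if p1[i] and p2[i]]
    single = sorted((i for i in range(len(p1)) if p1[i] != p2[i]), key=lambda i: ranks[i])
    chosen = set(shared + single[:max(0, size - len(shared))])
    return [1 if i in chosen else 0 for i in range(len(p1))]


def ranking_guided_mutation(chromosome, ranks, rng, p_add=0.5):
    """Hypothetical operator: drop the worst-ranked selected gene (keeping at least one),
    and with probability p_add switch on the best-ranked unselected gene."""
    child = list(chromosome)
    selected = [i for i, bit in enumerate(child) if bit]
    unselected = [i for i, bit in enumerate(child) if not bit]
    if len(selected) > 1:
        child[max(selected, key=lambda i: ranks[i])] = 0
    if unselected and rng.random() < p_add:
        child[min(unselected, key=lambda i: ranks[i])] = 1
    return child


# ranks[i] == 0 marks the most relevant gene (e.g., largest SVM weight magnitude).
ranks = [0, 3, 1, 4, 2]
child = ranking_guided_crossover([1, 1, 0, 0, 1], [1, 0, 1, 0, 0], ranks, size=3)
mutant = ranking_guided_mutation(child, ranks, random.Random(42))
print(child, mutant)
```

Biasing both operators toward well-ranked genes narrows the search compared with uniform bit-flip mutation, which is the intuition behind embedding classifier information in the genetic operators.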
Modeling the Shoot Apical Meristem in : Parameter Estimation for Spatial Pattern Formation
Tim Hohm; Eckart Zitzler
Understanding the self-regulatory mechanisms controlling the spatial and temporal structure of multicellular organisms represents one of the major challenges in molecular biology. In the context of plants, shoot apical meristems (SAMs), which are populations of dividing, undifferentiated cells that generate organs at the tips of stems and branches throughout the life of a plant, are of particular interest and are currently being studied intensively. Here, one key goal is to identify the genetic regulatory network organizing the structure of a SAM and generating the corresponding spatial gene expression patterns.
This paper addresses one step in the design of SAM models based on ordinary differential equations (ODEs): parameter estimation for spatial pattern formation. We assume that the topology of the genetic regulatory network is given, while the parameters of an ODE system need to be determined such that a particular stable pattern over the SAM cell population emerges. To this end, we propose an evolutionary algorithm-based approach and investigate different ways to improve the efficiency of the search process. Preliminary results are presented for the Brusselator, a well-known reaction-diffusion system.
Pp. 102-113
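The preliminary results of Hohm and Zitzler use the Brusselator. The sketch below illustrates the general setup rather than the authors' algorithm: a 1-D ring of cells with Brusselator kinetics, du/dt = a - (b + 1)u + u^2 v + D_u Lap(u) and dv/dt = bu - u^2 v + D_v Lap(v), integrated with explicit Euler steps; a fitness equal to the squared distance between the simulated u-pattern and a target pattern; and a simple (1+1) evolution strategy over (a, b). The diffusion coefficients, grid size, step size, mutation strength and the synthetic target are all assumptions.

```python
import random


def brusselator_step(u, v, a, b, d_u, d_v, dt):
    """One explicit Euler step of the Brusselator on a 1-D ring of cells (grid spacing 1)."""
    n = len(u)

    def lap(x, i):
        # Discrete Laplacian with periodic boundaries.
        return x[(i - 1) % n] - 2 * x[i] + x[(i + 1) % n]

    new_u = [u[i] + dt * (a - (b + 1) * u[i] + u[i] * u[i] * v[i] + d_u * lap(u, i))
             for i in range(n)]
    new_v = [v[i] + dt * (b * u[i] - u[i] * u[i] * v[i] + d_v * lap(v, i))
             for i in range(n)]
    return new_u, new_v


def simulate(params, n_cells=20, steps=500, dt=0.01, d_u=1.0, d_v=8.0, seed=1):
    """Start near the homogeneous steady state (u = a, v = b / a) with small noise."""
    a, b = params
    rng = random.Random(seed)  # fixed seed: the same params always give the same pattern
    u = [a + 0.01 * rng.uniform(-1.0, 1.0) for _ in range(n_cells)]
    v = [b / a] * n_cells
    for _ in range(steps):
        u, v = brusselator_step(u, v, a, b, d_u, d_v, dt)
    return u


def fitness(params, target):
    """Squared distance between the simulated u-pattern and the target (lower is better).
    Multiplication instead of ** avoids OverflowError if a run diverges."""
    return sum((x - y) * (x - y) for x, y in zip(simulate(params), target))


def one_plus_one_es(target, start, sigma=0.1, iterations=30, seed=0):
    """(1+1)-ES: Gaussian mutation of (a, b); keep the child if it is no worse."""
    rng = random.Random(seed)
    best, best_fit = start, fitness(start, target)
    for _ in range(iterations):
        child = tuple(max(1e-3, p + rng.gauss(0.0, sigma)) for p in best)
        child_fit = fitness(child, target)
        if child_fit <= best_fit:  # NaN from a diverged run compares False and is rejected
            best, best_fit = child, child_fit
    return best, best_fit


# Synthetic target generated from known parameters (a = 2, b = 3); in practice the
# target would be the desired expression pattern over the cell population.
target = simulate((2.0, 3.0))
best, best_fit = one_plus_one_es(target, start=(1.5, 2.5), iterations=20)
print(best, best_fit)
```

The (1+1) strategy is the simplest evolutionary optimizer that fits this loop; the paper investigates ways to make the search more efficient, which this sketch does not attempt.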