Publications catalog - books



Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006, Proceedings

Ulf Leser ; Felix Naumann ; Barbara Eckman (eds.)

Conference: 3rd International Workshop on Data Integration in the Life Sciences (DILS). Hinxton, UK. July 20, 2006 - July 22, 2006

Summary/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Health Informatics; Database Management; Information Systems Applications (incl. Internet); Bioinformatics; Computer Appl. in Life Sciences

Availability
Detected institution: Not detected
Year of publication: 2006
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-36593-8

Electronic ISBN

978-3-540-36595-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Data Structures for Genome Annotation, Alternative Splicing, and Validation

Sven Mielordt; Ivo Grosse; Jürgen Kleffe

To establish a clean basis for studying alternative splicing and gene regulation in life science projects, two things are essential: powerful data modeling and a strict validation procedure for assigning levels of reliability to given gene models. One common problem of public genome databases is insufficiently organized and poorly linked description data. This makes it difficult to study the relations between the alternative isoforms of a gene that are relevant to medicine and plant genome research. This is a severe obstacle to the integration of biological data, and it motivated us to establish a new modeling instance that we call the splice template, or sTMP. Every sTMP has a unique splicing pattern, but the lengths of the first and last exons remain undefined. This makes it possible to model different gene isoforms with the same splicing pattern. Using this more fine-grained data structure uncovers many cases of plurivalent mRNA-CDS relations. The human genome contains more than 3,000 extra CDSs compatible with the categories sTMP, mRNA and CDS, beyond the classical one-to-one relations between mRNAs and CDSs. In one case, 11 extra CDSs are compatible with a single mRNA. Two kinds of information are now accessible: crosslinks between mRNAs that derive from different sTMPs but lead to the same CDS, and disease-related ruptures in UTR regions. This makes it possible to discover and validate disease- and tissue-specific differences in alternative splicing, gene expression and regulation. Another problem in public databases is an overly relaxed standard for labeling genes “confirmed by ESTs and full-length cDNAs.” We provide a pipeline that handles gene annotations from different sources, integrates them into complex gene models and assigns strict validation tags. These tags are constrained by a local low-error model for the alignments between genome annotation and transcripts.
The data structures are being implemented and made publicly available at the Plant Data Warehouse of the Bioinformatics Center Gatersleben-Halle (http://portal.bic-gh.de/sTMP).

- Systems I | Pp. 114-123
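
The sTMP idea above can be illustrated with a minimal sketch. It groups transcripts by splicing pattern while ignoring the outer ends of the first and last exons. The function names, the exon encoding and the example transcripts are all hypothetical, and the sketch assumes every exon lies on the same strand in genomic coordinates. It is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), genomic coordinates, same strand assumed
SpliceKey = Tuple[Tuple[int, int], ...]


def splice_key(exons: List[Exon]) -> SpliceKey:
    """Return the splicing pattern as (exon end, next exon start) pairs.

    Only inner exon boundaries are used, so the start of the first exon and
    the end of the last exon (the undefined sTMP ends) do not affect the key.
    """
    ordered = sorted(exons)
    return tuple((ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1))


def group_by_template(transcripts: Dict[str, List[Exon]]) -> Dict[SpliceKey, List[str]]:
    """Group transcript names that share the same splicing pattern (hypothetical helper)."""
    groups: Dict[SpliceKey, List[str]] = defaultdict(list)
    for name, exons in transcripts.items():
        groups[splice_key(exons)].append(name)
    return dict(groups)


# Hypothetical transcripts for illustration only.
mrnas = {
    "mRNA_a": [(100, 200), (300, 400), (500, 650)],
    "mRNA_b": [(80, 200), (300, 400), (500, 700)],  # different outer ends, same splicing
    "mRNA_c": [(100, 200), (500, 650)],             # middle exon skipped
}
print(group_by_template(mrnas))
```

Here `mRNA_a` and `mRNA_b` fall into one template because they differ only in the outer ends of their first and last exons. `mRNA_c` forms its own template.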

BioFuice: Mapping-Based Data Integration in Bioinformatics

Toralf Kirsten; Erhard Rahm

We introduce the BioFuice approach for integrating data from different private and public data sources and ontologies. BioFuice follows a peer-to-peer-like data integration approach based on bidirectional mappings. Sources and mappings are associated with a domain model to support semantically meaningful interoperability. BioFuice extends the generic iFuice integration platform, which uses specific operators for data fusion and workflow-like script programs. BioFuice supports exploratory data analysis and provides query and search capabilities. We outline the integration approach with an illustrative scenario and describe the architecture of BioFuice and its query interface.
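
Mapping-based integration of this kind can be pictured with a small sketch. It treats a mapping as a set of correspondences between object IDs of two sources. Because a mapping can be inverted, it can be traversed in either direction, and mappings can be chained through a shared source. The function names and identifiers are hypothetical. They do not reproduce the actual iFuice operators or script syntax.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A mapping: a set of (source_id, target_id) correspondences between two sources.
Mapping = Set[Tuple[str, str]]


def inverse(m: Mapping) -> Mapping:
    """Traverse a mapping in the opposite direction (bidirectional use)."""
    return {(b, a) for a, b in m}


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Chain A->B and B->C correspondences into A->C correspondences."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for b, c in m2:
        targets[b].add(c)
    return {(a, c) for a, b in m1 for c in targets.get(b, set())}


def traverse(ids: Set[str], m: Mapping) -> Set[str]:
    """Return the objects reachable from `ids` through mapping `m`."""
    return {b for a, b in m if a in ids}


# Hypothetical gene -> protein -> GO term scenario.
gene_to_protein: Mapping = {("geneX", "protP"), ("geneY", "protQ")}
protein_to_go: Mapping = {("protP", "GO:a"), ("protP", "GO:b"), ("protQ", "GO:b")}

gene_to_go = compose(gene_to_protein, protein_to_go)
print(sorted(gene_to_go))
print(sorted(traverse({"GO:b"}, inverse(gene_to_go))))  # genes annotated with GO:b
```

With these hypothetical IDs, composing the two mappings links `geneX` to `GO:a` and `GO:b` and links `geneY` to `GO:b`. Inverting the composed mapping then answers the reverse question of which genes carry a given term.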

- Systems I | Pp. 124-135

A Method for Similarity-Based Grouping of Biological Data

Vaida Jakonienė; David Rundqvist; Patrick Lambrix

Similarity-based grouping of data entries in one or more data sources underlies many data management tasks, such as structuring search results, removing redundancy in databases and integrating data. Similarity-based grouping of data entries is not a trivial task for life science data sources, because the stored data is complex, highly correlated and represented at different levels of granularity. The contribution of this paper is two-fold: 1) we propose a method for similarity-based grouping, and 2) we show results from test cases. The main steps of the method are the specification of grouping rules, pairwise grouping between entries, the actual grouping of similar entries, and the evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of these choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of similarity-based grouping tasks and give deeper insights into the selected grouping tasks, the analyzed data source and the influence of different strategies on the results.

- Potpourri | Pp. 136-151
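
The pairwise-grouping and actual-grouping steps can be sketched as follows. This version assumes one possible strategy, not necessarily one used in the paper. The hypothetical grouping rule is that two entries are similar when the Jaccard similarity of their keyword sets reaches a threshold. Actual grouping then takes the transitive closure of the similar pairs (single linkage) using union-find. The entry names and keywords are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (hypothetical grouping rule)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_entries(entries: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Pairwise grouping, then transitive (single-linkage) grouping via union-find."""
    parent = {e: e for e in entries}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = list(entries)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Pairwise step: decide whether this pair is similar enough.
            if jaccard(entries[a], entries[b]) >= threshold:
                parent[find(a)] = find(b)  # grouping step: merge their groups
    groups: Dict[str, List[str]] = defaultdict(list)
    for e in ids:
        groups[find(e)].append(e)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical entries described by keyword sets.
entries = {
    "P1": {"kinase", "atp", "membrane"},
    "P2": {"kinase", "atp"},
    "P3": {"kinase", "atp", "nucleus"},
    "P4": {"transport", "membrane"},
}
print(group_entries(entries, threshold=0.5))
```

Swapping in a different similarity function or a stricter linkage rule changes the groups. This is the kind of strategy choice whose influence the described method lets one evaluate.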

On Querying OBO Ontologies Using a DAG Pattern Query Language

Amarnath Gupta; Simone Santini

The Open Biomedical Ontologies (OBO) is a consortium that serves as a repository of ontologies structured as directed acyclic graphs (DAGs). In this paper we present DQL, a language for querying a database of directed acyclic graphs. The query language has a comprehension-style syntax and contains a pattern specification sub-language, DPL. DPL can be viewed as an extension of tree-pattern query languages such as XPath. The language allows the extraction of nodes, paths and subgraphs from DAGs, and permits the construction of result structures by composing them. We show that using such a language on OBO ontologies, such as the Gene Ontology, makes it possible to express more complex and scientifically valuable queries.

- Potpourri | Pp. 152-167
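
A tree-pattern language falls short on DAGs because a node can have several parents, so more than one path can connect two terms. The sketch below shows that kind of path and subgraph extraction in plain Python. It is not DQL or DPL syntax. The ontology fragment and term IDs (`t1` to `t4`) are hypothetical.

```python
from typing import Dict, List


def all_paths(parents: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Enumerate every path from `start` up to `goal` along parent (is_a) edges."""
    if start == goal:
        return [[start]]
    paths: List[List[str]] = []
    for p in parents.get(start, []):
        for tail in all_paths(parents, p, goal):
            paths.append([start] + tail)
    return paths


def subgraph_nodes(paths: List[List[str]]) -> List[str]:
    """Collect the nodes of the subgraph formed by the extracted paths."""
    return sorted({node for path in paths for node in path})


# Hypothetical ontology fragment: t4 has two parents, so this is a DAG, not a tree.
parents = {"t4": ["t2", "t3"], "t2": ["t1"], "t3": ["t1"], "t1": []}
paths = all_paths(parents, "t4", "t1")
print(paths)
print(subgraph_nodes(paths))
```

Two distinct paths lead from `t4` to the root `t1`. A query that only follows a single parent chain, as in a tree, would miss one of them.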

Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases

Greeshma Neglur; Robert L. Grossman; Natalia Maltsev; Clement Yu

This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. Similarity between two graphs can be computed using a metric based on the Maximal Common Subgraph (MCS). Our algorithm maintains an inverted file that indexes all pathways in a database on their edges. Using this index, it finds and ranks all pathways similar to the user's query pathway in time linear in the total number of occurrences, across the entire database, of the edges shared with the query.

- Potpourri | Pp. 168-184
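
The inverted-file idea can be sketched in a few lines. The index maps each labeled edge to the pathways that contain it. A query then touches only the posting lists of its own edges, so the work grows with the total number of those postings. The score used here is a simple count of shared edges, chosen purely for illustration as a stand-in for the paper's MCS-based metric. The pathway IDs and compound labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (substrate compound, product compound)


def build_index(pathways: Dict[str, Set[Edge]]) -> Dict[Edge, List[str]]:
    """Inverted file: edge -> list of pathway IDs containing that edge."""
    index: Dict[Edge, List[str]] = defaultdict(list)
    for pid, edges in pathways.items():
        for edge in edges:
            index[edge].append(pid)
    return dict(index)


def rank(index: Dict[Edge, List[str]], query: Set[Edge]) -> List[Tuple[str, int]]:
    """Rank pathways by the number of edges shared with the query.

    Only the posting lists of the query's edges are scanned.
    """
    scores: Counter = Counter()
    for edge in query:
        for pid in index.get(edge, []):
            scores[pid] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical pathways over compounds A..E, X, Y.
pathways = {
    "P1": {("A", "B"), ("B", "C"), ("C", "D")},
    "P2": {("A", "B"), ("B", "E")},
    "P3": {("X", "Y")},
}
index = build_index(pathways)
print(rank(index, {("A", "B"), ("B", "C")}))
```

`P3` shares no edge with the query, so it is never touched and does not appear in the result.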

Arevir: A Secure Platform for Designing Personalized Antiretroviral Therapies Against HIV

Kirsten Roomp; Niko Beerenwinkel; Tobias Sing; Eugen Schülter; Joachim Büch; Saleta Sierra-Aragon; Martin Däumer; Daniel Hoffmann; Rolf Kaiser; Thomas Lengauer; Joachim Selbig

Despite the availability of antiretroviral combination therapies, success in the drug treatment of HIV-infected patients is limited. One reason for therapy failure is the development of drug-resistant genetic variants. In principle, the viral genomic sequence provides resistance information and could thus guide the selection of an optimal drug combination. In practice, however, two factors impair the benefit of this procedure: (1) the difficulty of inferring the clinically relevant information from the genotype of the virus, and (2) the restricted availability of this information. We have developed a secure platform for collaborative research aimed at optimizing anti-HIV therapies, called Arevir. A relational database schema was designed and implemented together with a web-based user interface. Our system provides a basis for monitoring patients, decision support and computational analyses. It thus merges clinical, diagnostic and bioinformatics efforts to exploit genomic and patient therapy data in clinical practice.

- Systems II | Pp. 185-194

The Distributed Annotation System for Integration of Biological Data

Andreas Prlić; Ewan Birney; Tony Cox; Thomas A. Down; Rob Finn; Stefan Gräf; David Jackson; Andreas Kähäri; Eugene Kulesha; Roger Pettett; James Smith; Jim Stalker; Tim J. P. Hubbard

The Distributed Annotation System (DAS) is a protocol for sharing biological data that allows dynamic data integration. It has become widely used in both the genome and protein bioinformatics communities. Here we provide an overview of the available DAS infrastructure and present our latest developments. These include a registration server that facilitates service discovery by DAS clients while automatically monitoring service availability. Currently there are 108 registered DAS servers, provided by 24 institutions in 10 countries.

- Systems II | Pp. 195-203

An Information Management System for Collaboration Within Distributed Working Environment

Maria Samsonova; Andrei Pisarev; Konstantin Kozlov; Ekaterina Poustelnikova; Arthur Tkachenko

For several years we have applied the systems biology approach to investigate the dynamic regulatory mechanisms controlling the expression of segmentation genes in the Drosophila embryo. Several factors caused serious problems with data and workflow management: ongoing data acquisition, the development of new processing and analysis methods, and the modification and improvement of existing ones. The different geographical locations of the research groups pose additional difficulties. To solve these problems we have developed an information management system using multiagent and REST architectures. This system is easily extensible to new data processing and analysis methods and flexible in the specification and modification of these methods. It is also scalable and supports distributed processing and analysis of data.

- Systems II | Pp. 204-215

Ontology Analysis on Complexity and Evolution Based on Conceptual Model

Zhe Yang; Dalu Zhang; Chuan Ye

As ontologies grow tremendously in size, their complexity increases. Ontology evaluation therefore becomes extremely important for developers. They need to determine the fundamental characteristics of ontologies in order to improve quality, estimate cost and reduce future maintenance. Our research examines the concepts and their hierarchy in the ontology conceptual model. This is a feature common to most ontologies, and it reflects their fundamental complexity. We propose well-defined complexity metrics that mainly examine the quantity, ratio and correlativity of concepts and relationships, in order to evaluate ontologies from the viewpoints of complexity and evolution. In the study, we measured the three ontologies of the Gene Ontology to verify our metrics. The results indicate that these metrics work well. They show that the biological process ontology is the most complex, and that the molecular function ontology is the least stable over its evolution.

- Short Papers | Pp. 216-223
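
Count- and ratio-based metrics of the kind described above can be sketched over an is_a hierarchy. The sketch counts concepts and relationships, computes relationships per concept, and finds the maximum hierarchy depth. These particular metrics and the toy ontology are illustrative assumptions, not the paper's exact definitions. Applying the same function to successive releases of an ontology and comparing the numbers would give a rough view of its evolution.

```python
from typing import Dict, List


def complexity_metrics(parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Illustrative quantity/ratio metrics for an is_a hierarchy (child -> parents)."""
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    n_rel = sum(len(ps) for ps in parents.values())
    memo: Dict[str, int] = {}

    def depth(c: str) -> int:
        # Length of the longest is_a path from concept c up to a root.
        if c not in memo:
            ps = parents.get(c, [])
            memo[c] = 0 if not ps else 1 + max(depth(p) for p in ps)
        return memo[c]

    return {
        "concepts": len(concepts),
        "relationships": n_rel,
        "relationships_per_concept": n_rel / len(concepts) if concepts else 0.0,
        "max_depth": max((depth(c) for c in concepts), default=0),
    }


# Hypothetical toy ontology: d has two parents (b and c), both under root a.
ontology = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(complexity_metrics(ontology))
```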

Distributed Execution of Workflows in the INB

Ismael Navas-Delgado; Antonio J. Pérez; Jose F. Aldana-Montes; Oswaldo Trelles

Our workflow platform presents the available tools as a single, uniform pool of services that can be used to enhance query processing. This proposal is based on an architecture for publishing biological data and services. It is designed to be a flexible client for BioMOBY servers, extending them with persistence of the information retrieved by each user. We also present some biological results obtained with the proposed workflow execution system. This work has been developed and implemented at the National Institute for Bioinformatics (INB) in Spain (available at http://www.inab.org/MOWServ).

- Short Papers | Pp. 224-231