Publications catalog - books



Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006, Proceedings

Ulf Leser; Felix Naumann; Barbara Eckman (eds.)

In conference: 3rd International Workshop on Data Integration in the Life Sciences (DILS). Hinxton, UK. July 20, 2006 - July 22, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Information Storage and Retrieval; Health Informatics; Database Management; Information Systems Applications (incl. Internet); Bioinformatics; Computer Appl. in Life Sciences

Availability

Detected institution: Not detected
Year of publication: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-36593-8

Electronic ISBN

978-3-540-36595-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Knowledge Networks of Biological and Medical Data: An Exhaustive and Flexible Solution to Model Life Science Domains

Sascha Losko; Karsten Wenger; Wenzel Kalus; Andrea Ramge; Jens Wiehler; Klaus Heumann

The huge amount of unstructured information generated by academic and industrial research groups must be easily available to facilitate scientific projects. In particular, information that is conveyed by unstructured or semi-structured text represents a vast resource for the scientific community. Systems capable of mining these textual data sets are the only option to unveil the information hidden in free text on a large scale. The BioLT Literature Mining Tool allows exhaustive extraction of information from text resources. Using advanced tagger/parser mechanisms and topic-specific dictionaries, the BioLT tool delivers structured relationships. Beyond information hidden in free text, other resources in biological and medical research are relevant, including experimental data from “-omics” platforms, phenotype information and clinical data. The BioXM Knowledge Management Environment efficiently models such complex research environments. This platform enables scientists to create knowledge networks with flexible workflows for handling experimental information and metadata, including annotation or ontologies. Information from public databases can be incorporated using the embedded BioRS Integration and Retrieval System. Users can navigate and modify the information networks. Thus, research projects can be modeled and extended dynamically.

- Short Papers | Pp. 232-239
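The abstract above mentions tagger/parser mechanisms with topic-specific dictionaries that turn free text into structured relationships. As a rough, hedged illustration of the general idea (the dictionary entries and the co-occurrence pairing below are invented for this sketch and are not BioLT's actual method):

```python
import re

# Hypothetical topic-specific dictionary mapping entity names to types;
# real literature-mining dictionaries are far larger and curated.
DICTIONARY = {
    "TP53": "gene",
    "apoptosis": "process",
    "MDM2": "gene",
}

def extract_relationships(sentence):
    """Tag dictionary entities found in a sentence and pair co-occurring
    ones as candidate relationships (a crude stand-in for a tagger/parser)."""
    found = [term for term in DICTIONARY
             if re.search(r"\b" + re.escape(term) + r"\b", sentence)]
    # Every pair of co-occurring entities becomes a candidate relation.
    return [(a, b) for i, a in enumerate(found) for b in found[i + 1:]]

print(extract_relationships("MDM2 regulates TP53 during apoptosis."))
```

A production system would add sentence parsing and relation typing; this sketch only shows how dictionary hits yield structured pairs from unstructured text.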

On Characterising and Identifying Mismatches in Scientific Workflows

Khalid Belhajjame; Suzanne M. Embury; Norman W. Paton

Workflows are gaining importance as a means for modelling and enacting scientific experiments. A major issue which arises when aggregating a collection of analysis operations within a workflow is the compatibility of their inputs and outputs: the analysis operations are supplied by independently developed web services which are likely to have incompatible inputs and outputs. We use the term mismatch to refer to such incompatibility. This paper characterises the mismatches a scientific workflow may suffer from and specifies mappings for their resolution.

- Short Papers | Pp. 240-247
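The mismatch notion above can be made concrete with a small sketch: compare the declared output types of one service against the declared input types of the next and report disagreements. The port and format names are illustrative assumptions, not taken from the paper:

```python
# A minimal sketch of detecting an input/output mismatch between two
# chained workflow operations; port and format names are hypothetical.
def find_mismatches(producer_outputs, consumer_inputs):
    """Return (port, produced, expected) triples where types disagree."""
    return [(port, produced, consumer_inputs[port])
            for port, produced in producer_outputs.items()
            if port in consumer_inputs and produced != consumer_inputs[port]]

blast_service = {"sequence": "FASTA"}      # what the first service emits
align_service = {"sequence": "GenBank"}    # what the next service expects
print(find_mismatches(blast_service, align_service))
```

Resolving such a mismatch would then mean inserting a mapping (e.g., a format converter) on the offending port.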

Collection-Oriented Scientific Workflows for Integrating and Analyzing Biological Data

Timothy McPhillips; Shawn Bowers; Bertram Ludäscher

Steps in scientific workflows often generate collections of results, causing the data flowing through workflows to become increasingly nested. Because conventional workflow components (or actors) typically operate on simple or application-specific data types, additional actors often are required to manage these nested data collections. As a result, conventional workflows become increasingly complex as data becomes more nested. This paper describes a new paradigm for developing scientific workflows that transparently manages nested data collections. Collection-oriented workflows have a number of advantages over conventional approaches, including simpler workflow designs (e.g., requiring fewer actors and control-flow constructs) that are invariant under changes in data nesting. Our implementation within the Kepler scientific workflow system enables the explicit representation of collections and collection schemas, supports concurrent operation over collection contents via multi-level pipeline parallelism, and allows collection-aware actors to be composed readily from conventional actors.

- Workflow | Pp. 248-263
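The core claim above, that a workflow design can stay invariant under changes in data nesting, can be sketched with a wrapper that lifts a conventional scalar actor over arbitrarily nested collections. This is an assumed simplification, not the paper's actual framework:

```python
def map_nested(actor, data):
    """Apply a conventional (scalar) actor to every leaf of a nested
    collection, preserving the nesting structure -- the basic idea
    behind making actors collection-aware."""
    if isinstance(data, list):
        return [map_nested(actor, item) for item in data]
    return actor(data)

# The same scalar actor works unchanged however deeply the data is nested.
double = lambda x: x * 2
print(map_nested(double, [1, [2, [3, 4]], 5]))
```

Without such a lift, each extra level of nesting would require extra control-flow actors, which is exactly the complexity the paper argues against.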

Towards a Model of Provenance and User Views in Scientific Workflows

Shirley Cohen; Sarah Cohen-Boulakia; Susan Davidson

Scientific experiments are becoming increasingly large and complex, with a commensurate increase in the amount and complexity of data generated. Data, both intermediate and final results, is derived by chaining and nesting together multiple database searches and analytical tools. In many cases, the means by which the data are produced is not known, making the data difficult to interpret and the experiment impossible to reproduce. Provenance in scientific workflows is thus of paramount importance.

In this paper, we provide a formal model of provenance for scientific workflows which is general (i.e., it can be used with existing workflow systems, such as Kepler, myGrid and Chimera) and sufficiently expressive to answer the provenance queries we encountered in a number of case studies. Interestingly, our model not only takes into account the chained and nested structure of scientific workflows, but also allows asking for provenance at different levels of abstraction (user views).

- Workflow | Pp. 264-279
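The chained derivations described above can be illustrated with a toy provenance record that links each data item to the step and inputs that produced it. The data structure and step names are assumptions for this sketch; the paper's formal model (with user views and abstraction levels) is considerably richer:

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """A data item plus the step and inputs it was derived from."""
    name: str
    step: str = ""                      # the step that produced this item
    inputs: list = field(default_factory=list)

def provenance(item):
    """Walk back through the derivation chain, collecting step names
    in the order they were applied."""
    steps = []
    for src in item.inputs:
        steps.extend(provenance(src))
    if item.step:
        steps.append(item.step)
    return steps

# Hypothetical two-step chain: raw sequences -> BLAST hits -> phylogenetic tree.
raw = Item("sequences")
hits = Item("hits", step="blast", inputs=[raw])
tree = Item("tree", step="align+build", inputs=[hits])
print(provenance(tree))
```

A provenance query like "how was this tree produced?" then reduces to walking this chain, which is what makes the experiment interpretable and reproducible.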

An Extensible Light-Weight XML-Based Monitoring System for Sequence Databases

Dieter Van de Craen; Frank Neven; Kerstin Koch

Life science researchers want biological information of interest to become available to them as soon as possible. A monitoring system is a solution that relieves biologists from periodically exploring databases: it allows them to express their interest in certain data by means of queries/constraints, and they are then notified when new data arrives satisfying these queries/constraints. We describe a sequence monitoring system, XSeqM, where users can combine metadata queries on sequence records with constraints on an alignment against a given source sequence. The system is an XML-based solution in which constraints are specified through search fields in a user-friendly web interface and are then translated to corresponding XPath expressions. The system is easily extensible, since adding new databases only amounts to specifying new mappings from search fields to XPath expressions. To protect private source sequences obtained in labs, it is imperative that researchers do not have to upload their sequences to a general untrusted system, but can run XSeqM locally. To keep the system light-weight, we therefore introduce an optimization technique based on query containment to reduce the number of XPath evaluations, which constitute the bottleneck of the system. We experimentally validate this technique and show that it can drastically improve the running time.

- Workflow | Pp. 280-296
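The containment optimization above can be sketched as follows: if stored query A is contained in query B (every record matching A also matches B) and B already failed on a new record, then A need not be evaluated at all. To keep the sketch self-contained, queries here are simple sets of required keywords rather than the XPath expressions XSeqM actually uses:

```python
def contained_in(a, b):
    """Keyword-conjunction query a is contained in b when a's conditions
    imply b's, i.e., a requires at least all of b's keywords."""
    return b <= a

def matching_queries(queries, record_words):
    """Evaluate stored queries against a new record, skipping any query
    contained in one that already failed."""
    results = {}
    # Evaluate broader queries (fewer keywords) first so that their
    # failures can prune the narrower queries they contain.
    for name, q in sorted(queries.items(), key=lambda kv: len(kv[1])):
        failed = (queries[n] for n, ok in results.items() if not ok)
        if any(contained_in(q, f) for f in failed):
            results[name] = False   # pruned without evaluation
            continue
        results[name] = q <= record_words
    return results

queries = {"broad": {"kinase"}, "narrow": {"kinase", "human"}}
print(matching_queries(queries, {"mouse", "protein"}))
```

Here "narrow" is never evaluated against the record, because it is contained in "broad", which already failed; with expensive XPath evaluations in place of set tests, this pruning is where the reported speedup would come from.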