Publications catalog - books
Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006, Proceedings
Ulf Leser ; Felix Naumann ; Barbara Eckman (eds.)
Conference: 3rd International Workshop on Data Integration in the Life Sciences (DILS) . Hinxton, UK . July 20, 2006 - July 22, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Information Storage and Retrieval; Health Informatics; Database Management; Information Systems Applications (incl. Internet); Bioinformatics; Computer Appl. in Life Sciences
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-36593-8
Electronic ISBN
978-3-540-36595-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11799511_21
Knowledge Networks of Biological and Medical Data: An Exhaustive and Flexible Solution to Model Life Science Domains
Sascha Losko; Karsten Wenger; Wenzel Kalus; Andrea Ramge; Jens Wiehler; Klaus Heumann
The huge amount of unstructured information generated by academic and industrial research groups must be easily available to facilitate scientific projects. In particular, information that is conveyed by unstructured or semi-structured text represents a vast resource for the scientific community. Systems capable of mining these textual data sets are the only option to unveil the information hidden in free text on a large scale. The BioLT Literature Mining Tool allows exhaustive extraction of information from text resources. Using advanced tagger/parser mechanisms and topic-specific dictionaries, the BioLT tool delivers structured relationships. Beyond information hidden in free text, other resources in biological and medical research are relevant, including experimental data from “-omics” platforms, phenotype information and clinical data. The BioXM Knowledge Management Environment efficiently models such complex research environments. This platform enables scientists to create knowledge networks with flexible workflows for handling experimental information and metadata, including annotation or ontologies. Information from public databases can be incorporated using the embedded BioRS Integration and Retrieval System. Users can navigate and modify the information networks. Thus, research projects can be modeled and extended dynamically.
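The abstract describes dictionary-based tagging of free text followed by extraction of structured relationships. BioLT's actual tagger/parser machinery and dictionaries are not described here; the following is a minimal sketch of the general technique, using hypothetical entity dictionaries and naive sentence-level co-occurrence as a stand-in for the real relationship extraction:

```python
import re

# Hypothetical topic-specific dictionaries; BioLT's real dictionaries and
# tagger/parser mechanisms are not given in the abstract.
DICTIONARIES = {
    "GENE": {"TP53", "BRCA1"},
    "DISEASE": {"breast cancer", "sarcoma"},
}

def tag_entities(text):
    """Return (term, type, position) hits for dictionary terms found in text."""
    hits = []
    for etype, terms in DICTIONARIES.items():
        for term in terms:
            for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                hits.append((m.group(), etype, m.start()))
    return sorted(hits, key=lambda h: h[2])

def extract_relationships(text):
    """Naive relationship extraction: pair entities of different types
    that co-occur in the same sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tags = tag_entities(sentence)
        for i, (t1, e1, _) in enumerate(tags):
            for t2, e2, _ in tags[i + 1:]:
                if e1 != e2:
                    relations.append((t1, e1, t2, e2))
    return relations

text = "Mutations in BRCA1 are associated with breast cancer."
print(extract_relationships(text))
# → [('BRCA1', 'GENE', 'breast cancer', 'DISEASE')]
```

A production system would replace the co-occurrence heuristic with syntactic parsing, as the abstract's mention of "advanced tagger/parser mechanisms" suggests.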
- Short Papers | Pp. 232-239
doi: 10.1007/11799511_22
On Characterising and Identifying Mismatches in Scientific Workflows
Khalid Belhajjame; Suzanne M. Embury; Norman W. Paton
Workflows are gaining importance as a means for modelling and enacting scientific experiments. A major issue which arises when aggregating a collection of analysis operations within a workflow is the compatibility of their inputs and outputs: the analysis operations are supplied by independently developed web services which are likely to have incompatible inputs and outputs. We use the term mismatch to refer to such incompatibility. This paper characterises the mismatches a scientific workflow may suffer from and specifies mappings for their resolution.
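The core idea of checking whether one service's output is compatible with the next service's input can be sketched as follows. The service names, port names, and type labels below are hypothetical; a real system would obtain them from WSDL descriptions or semantic annotations of the web services:

```python
# Hypothetical service signatures (inputs/outputs with type labels).
services = {
    "fetch_sequence": {"inputs": {"accession": "string"},
                       "outputs": {"seq": "FASTA"}},
    "run_blast": {"inputs": {"query": "BLAST_query"},
                  "outputs": {"hits": "BLAST_report"}},
}

def find_mismatch(producer, out_port, consumer, in_port):
    """Return a mismatch record if the producer's output type is
    incompatible with the consumer's input type, else None."""
    out_type = services[producer]["outputs"][out_port]
    in_type = services[consumer]["inputs"][in_port]
    if out_type != in_type:
        return {
            "from": (producer, out_port, out_type),
            "to": (consumer, in_port, in_type),
            "resolution": f"insert mapping {out_type} -> {in_type}",
        }
    return None

# Connecting fetch_sequence.seq to run_blast.query reveals a mismatch
# that a mapping (shim) step would have to resolve.
print(find_mismatch("fetch_sequence", "seq", "run_blast", "query"))
```

The paper's contribution is a richer characterisation of mismatch kinds than this simple type-equality check; the sketch only illustrates where such checks sit in workflow composition.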
- Short Papers | Pp. 240-247
doi: 10.1007/11799511_23
Collection-Oriented Scientific Workflows for Integrating and Analyzing Biological Data
Timothy McPhillips; Shawn Bowers; Bertram Ludäscher
Steps in scientific workflows often generate collections of results, causing the data flowing through workflows to become increasingly nested. Because conventional workflow components (or actors) typically operate on simple or application-specific data types, additional actors are often required to manage these nested data collections. As a result, conventional workflows become increasingly complex as data becomes more nested. This paper describes a new paradigm for developing scientific workflows that transparently manages nested data collections. Collection-oriented workflows have a number of advantages over conventional approaches, including simpler workflow designs (e.g., requiring fewer actors and control-flow constructs) that are invariant under changes in data nesting. Our implementation within the Kepler scientific workflow system enables the explicit representation of collections and collection schemas, concurrent operation over collection contents via multi-level pipeline parallelism, and allows collection-aware actors to be composed readily from conventional actors.
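The key point, that a simple actor can be lifted to operate over arbitrarily nested collections without the workflow author managing the nesting, can be illustrated with a small sketch. The wrapper and the example actor below are hypothetical, not the paper's implementation:

```python
def collection_map(func, data):
    """Apply func to leaf items while transparently preserving nested
    collection structure, so the actor itself stays collection-unaware."""
    if isinstance(data, list):
        return [collection_map(func, item) for item in data]
    return func(data)

def reverse_complement(seq):
    """A conventional 'actor' that only knows about single sequences."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

# Nested collections of sequences (e.g., runs of samples of reads);
# the nesting depth is irrelevant to the actor.
runs = [["ATG", "GGC"], [["TTA"]]]
print(collection_map(reverse_complement, runs))
# → [['CAT', 'GCC'], [['TAA']]]
```

Changing the nesting of `runs` requires no change to the workflow, which mirrors the paper's claim that collection-oriented designs are invariant under changes in data nesting.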
- Workflow | Pp. 248-263
doi: 10.1007/11799511_24
Towards a Model of Provenance and User Views in Scientific Workflows
Shirley Cohen; Sarah Cohen-Boulakia; Susan Davidson
Scientific experiments are becoming increasingly large and complex, with a commensurate increase in the amount and complexity of data generated. Data, both intermediate and final results, is derived by chaining and nesting together multiple database searches and analytical tools. In many cases, the means by which the data are produced is not known, making the data difficult to interpret and the experiment impossible to reproduce. Provenance in scientific workflows is thus of paramount importance.
In this paper, we provide a formal model of provenance for scientific workflows which is general (i.e., it can be used with existing workflow systems such as Kepler, myGrid and Chimera) and sufficiently expressive to answer the provenance queries we encountered in a number of case studies. Interestingly, our model not only takes into account the chained and nested structure of scientific workflows, but also allows asking for provenance at different levels of abstraction (user views).
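The basic shape of such a provenance model, recording for each data item which step produced it and from which inputs, can be sketched in a few lines. All names below are hypothetical and only illustrate derivation tracking, not the paper's formal model:

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    value: str
    step: str = "input"                            # step that produced this item
    parents: list = field(default_factory=list)    # items it was derived from

def run_step(name, func, *inputs):
    """Run a workflow step and record the derivation edge."""
    return Item(func(*(i.value for i in inputs)), step=name, parents=list(inputs))

def provenance(item, depth=0):
    """Walk the derivation chain back to the original inputs."""
    lines = ["  " * depth + f"{item.step}: {item.value}"]
    for p in item.parents:
        lines.extend(provenance(p, depth + 1))
    return lines

seq = Item("ATGGC")
trimmed = run_step("trim", lambda s: s[:3], seq)
annotated = run_step("annotate", lambda s: s + "*", trimmed)
print("\n".join(provenance(annotated)))
# → annotate: ATG*
#     trim: ATG
#       input: ATGGC
```

Abstraction levels (user views) could then be modeled by collapsing sub-chains of this derivation graph into single composite steps, which is the flavor of what the paper formalizes.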
- Workflow | Pp. 264-279
doi: 10.1007/11799511_25
An Extensible Light-Weight XML-Based Monitoring System for Sequence Databases
Dieter Van de Craen; Frank Neven; Kerstin Koch
Life science researchers want biological information of interest to become available to them as soon as possible. A monitoring system relieves biologists of periodically exploring databases: it allows them to express their interest in certain data by means of queries/constraints, and they are then notified when new data arrives satisfying these queries/constraints. We describe a sequence monitoring system, XSeqM, where users can combine metadata queries on sequence records with constraints on an alignment against a given source sequence. The system is an XML-based solution in which constraints are specified through search fields in a user-friendly web interface and are then translated to corresponding XPath expressions. The system is easily extensible, as adding a new database only amounts to specifying new mappings from search fields to XPath expressions. To protect private source sequences obtained in labs, it is imperative that researchers do not have to upload their sequences to a general untrusted system, but can run XSeqM locally. To keep the system light-weight, we therefore introduce an optimization technique based on query containment to reduce the number of XPath evaluations, which constitute the bottleneck of the system. We experimentally validate this technique and show that it can drastically improve the running time.
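The field-to-XPath translation step described in the abstract can be sketched with Python's standard `xml.etree.ElementTree`, which evaluates a limited XPath subset. The field names, mappings, and sample records below are hypothetical, not XSeqM's actual configuration:

```python
import xml.etree.ElementTree as ET

# Hypothetical field-to-XPath mappings; as the abstract notes, supporting a
# new database would only require adding new entries to a table like this.
FIELD_XPATH = {
    "organism": ".//entry[organism='{0}']",
    "keyword": ".//entry[keyword='{0}']",
}

def field_to_xpath(field, value):
    """Translate a web-form search field into an XPath expression."""
    return FIELD_XPATH[field].format(value)

XML = """<db>
  <entry><organism>Homo sapiens</organism><accession>P1</accession></entry>
  <entry><organism>Mus musculus</organism><accession>P2</accession></entry>
</db>"""

root = ET.fromstring(XML)
matches = root.findall(field_to_xpath("organism", "Homo sapiens"))
print([e.findtext("accession") for e in matches])
# → ['P1']
```

The query-containment optimization would sit on top of this: if one user's constraint set is subsumed by another's already-evaluated query, the system can reuse that result instead of evaluating a new XPath expression against every incoming record.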
- Workflow | Pp. 280-296