Publications catalog - books

Distributed, High-Performance and Grid Computing in Computational Biology: International Workshop, GCCB 2006, Eilat, Israel, January 21, 2007. Proceedings

Werner Dubitzky ; Assaf Schuster ; Peter M. A. Sloot ; Michael Schroeder ; Mathilde Romberg (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-69841-8

Electronic ISBN

978-3-540-69968-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Combining a High-Throughput Bioinformatics Grid and Bioinformatics Web Services

Chunyan Wang; Paul M. K. Gordon; Andrei L. Turinsky; Jason Burgess; Terry Dalton; Christoph W. Sensen

We have created a high-throughput grid for biological sequence analysis, which is freely accessible via bioinformatics Web services. The system allows the execution of computationally intensive sequence alignment algorithms, such as Smith-Waterman or hidden Markov model searches, with speedups up to three orders of magnitude over single-CPU installations. Users around the world can now process highly sensitive sequence alignments with a turnaround time similar to that of BLAST tools. The grid combines high-throughput accelerators at two bioinformatics facilities in different geographical locations. The tools include TimeLogic DeCypher boards, a Paracel GeneMatcher2 accelerator, and Paracel BlastMachines. The Sun N1 Grid Engine software performs distributed resource management. Clients communicate with the grid through existing open BioMOBY Web services infrastructure. We also illustrate bioinformatics grid strategies for distributed load balancing, and report several nontrivial technical solutions that may serve as templates for adaptation by other bioinformatics groups.

- Session 1a. “Sequence Analysis” | Pp. 1-10

Using Public Resource Computing and Systematic Pre-calculation for Large Scale Sequence Analysis

Thomas Rattei; Mathias Walter; Roland Arnold; David P. Anderson; Werner Mewes

High volumes of serial computational tasks in bioinformatics, such as homology searches or profile matching, are often executed in distributed environments using classical batching systems like LSF or Sun Grid-Engine. Despite their simple usability, they are limited to organizationally owned resources. In contrast to proprietary projects that implement large-scale grids, we report on a grid-enabled solution for sequence homology and protein domain searches using BOINC, the Berkeley Open Infrastructure for Network Computing. We demonstrate that BOINC provides a powerful and versatile platform for public resource computing in bioinformatics that makes large-scale pre-calculation of sequence analyses feasible. The FASTA/Smith-Waterman and HMMer applications for BOINC are freely available from the authors upon request. Data from SIMAP is publicly available through Web services at http://mips.gsf.de/simap.

- Session 1a. “Sequence Analysis” | Pp. 11-18

Accelerated microRNA-Precursor Detection Using the Smith-Waterman Algorithm on FPGAs

Patrick May; Gunnar W. Klau; Markus Bauer; Thomas Steinke

During the last few years, more and more functions of RNA have been discovered that were previously thought to be carried out by proteins alone. One of the most striking discoveries was the detection of microRNAs, a class of noncoding RNAs that play an important role in post-transcriptional gene regulation. Large-scale analyses are needed for the still-growing amount of sequence data derived from new experimental technologies. In this paper we present a framework for the detection of the distinctive precursor structure of microRNAs that is based on the well-known Smith-Waterman algorithm. By conducting the computation of the local alignment on an FPGA, we are able to gain a substantial speedup compared to a pure software implementation, bringing together supercomputer performance and bioinformatics research. We conducted experiments on real genomic data and found several new putative hits for microRNA precursor structures.

- Session 1a. “Sequence Analysis” | Pp. 19-32
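
For readers unfamiliar with the Smith-Waterman local alignment mentioned in the abstract above, the following is a minimal software sketch of its scoring recurrence. It is not the authors' FPGA implementation or their precursor-detection framework. The function name and the scoring parameters (match, mismatch, and a linear gap penalty) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    Uses a linear gap penalty and keeps only two rows of the DP matrix.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0                   # best cell seen so far (local alignment score)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACT"))
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently of each other. That property is the usual reason the recurrence maps well onto parallel hardware.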

Implementation of a Distributed Architecture for Managing Collection and Dissemination of Data for Fetal Alcohol Spectrum Disorders Research

Andrew Arenson; Ludmila Bakhireva; Tina Chambers; Christina Deximo; Tatiana Foroud; Joseph Jacobson; Sandra Jacobson; Kenneth Lyons Jones; Sarah Mattson; Philip May; Elizabeth Moore; Kimberly Ogle; Edward Riley; Luther Robinson; Jeffrey Rogers; Ann Streissguth; Michel Tavares; Joseph Urbanski; Helen Yezerets; Craig A. Stewart

We implemented a distributed system for management of data for an international collaboration studying Fetal Alcohol Spectrum Disorders (FASD). Subject privacy was protected, researchers without dependable Internet access were accommodated, and researchers’ data were shared globally. Data dictionaries codified the nature of the data being integrated, data compliance was assured through multiple consistency checks, and recovery systems provided a secure, robust, persistent repository. The system enabled new types of science to be done, using distributed technologies that are expedient for current needs while taking useful steps towards integrating the system in a future grid-based cyberinfrastructure. The distributed architecture, verification steps, and data dictionaries suggest general strategies for researchers involved in collaborative studies, particularly where data must be de-identified before being shared. The system met both the collaboration’s needs and the NIH Roadmap’s goal of wide access to databases that are robust and adaptable to researchers’ needs.

- Session 1b. “Grids for Screening and Property Prediction” | Pp. 33-44

Grid-Enabled High Throughput Virtual Screening

Nicolas Jacq; Vincent Breton; Hsin-Yen Chen; Li-Yung Ho; Martin Hofmann; Hurng-Chun Lee; Yannick Legré; Simon C. Lin; Astrid Maaß; Emmanuel Medernach; Ivan Merelli; Luciano Milanesi; Giulio Rastelli; Matthieu Reichstadt; Jean Salzemann; Horst Schwichtenberg; Mahendrakar Sridhar; Vinod Kasam; Ying-Ta Wu; Marc Zimmermann

Large-scale grids for in silico drug discovery open opportunities of particular interest for neglected and emerging diseases. In 2005 and 2006, we were able to deploy large-scale virtual docking within the framework of the WISDOM initiative against malaria and avian influenza, requiring about 100 years of CPU time on the EGEE, Auvergrid and TWGrid infrastructures. These achievements demonstrated the relevance of large-scale grids for virtual screening by molecular docking. They also allowed us to evaluate the performance of the grid infrastructures and to identify specific issues raised by large-scale deployment.

- Session 1b. “Grids for Screening and Property Prediction” | Pp. 45-59

Grid Computing for the Estimation of Toxicity: Acute Toxicity on Fathead Minnow (Pimephales promelas)

Uko Maran; Sulev Sild; Paolo Mazzatorta; Mos Casalegno; Emilio Benfenati; Mathilde Romberg

The computational estimation of toxicity is time-consuming and therefore needs support from distributed, high-performance and/or grid computing. The major technology behind the estimation of toxicity is quantitative structure-activity relationship modelling. It is a complex procedure involving data gathering, preparation and analysis. The current paper describes the use of grid computing in the computational estimation of toxicity and provides a comparative study on the acute toxicity to fathead minnow (Pimephales promelas), comparing the heuristic multi-linear regression and artificial neural network approaches for quantitative structure-activity relationship models.

- Session 1b. “Grids for Screening and Property Prediction” | Pp. 60-74

Peer-to-Peer Experimentation in Protein Structure Prediction: An Architecture, Experiment and Initial Results

Xueping Quan; Chris Walton; Dietlind L. Gerloff; Joanna L. Sharman; Dave Robertson

Peer-to-peer approaches offer some direct solutions to modularity and scaling properties in large scale distributed systems but their role in supporting precise experimental analysis in bioinformatics has not been explored closely in practical settings. We describe a method by which precision in experimental process can be maintained within a peer-to-peer architecture and show how this can support experiments. As an example we show how our system is used to analyse real data of relevance to the structural bioinformatics community. Comparative models of yeast protein structures from three individual resources were analysed for consistency between them. We created a new resource containing only model fragments supported by agreement between the methods. Resources of this kind provide small sets of likely accurate predictions for non-expert users and are of interest in applied bioinformatics research.

- Session 1b. “Grids for Screening and Property Prediction” | Pp. 75-98

Gene Prediction in Metagenomic Libraries Using the Self Organising Map and High Performance Computing Techniques

Nigel McCoy; Shaun Mahony; Aaron Golden

This paper describes a novel approach for annotating metagenomic libraries obtained from environmental samples utilising the self organising map (SOM) neural network formalism. A parallel implementation of the SOM is presented and its particular usefulness in metagenomic annotation highlighted. The benefits of the parallel algorithm and performance increases are explained, the latest results from annotation on an artificially generated metagenomic library presented and the viability of this approach for implementation on existing metagenomic libraries is assessed.

- Session 2a. “Data Management” | Pp. 99-109

A Distributed System for Genetic Linkage Analysis

Mark Silberstein; Dan Geiger; Assaf Schuster

Linkage analysis is a tool used by geneticists for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, analyses of large inbred pedigrees with extensive missing data are often beyond the capabilities of a single computer. We present a distributed system for computing multipoint LOD scores of large inbred pedigrees. It achieves high performance via efficient parallelization of the algorithms in a state-of-the-art serial program for these tasks, and through utilization of thousands of resources residing in multiple opportunistic grid environments. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either installation of software or maintenance of a complicated distributed environment. The main algorithmic challenges have been to efficiently split large tasks for distributed execution in a highly dynamic, non-dedicated running environment, as well as to utilize resources in all the available grid environments. Meeting these challenges has provided nearly interactive response time for shorter tasks while simultaneously serving massively parallel ones. The system, which is being used extensively by medical centers worldwide, achieves speedups of up to three orders of magnitude and allows analyses that were previously infeasible.

- Session 2a. “Data Management” | Pp. 110-123

Enabling Data Sharing and Collaboration in Complex Systems Applications

Michael A. Johnston; Jordi Villà-Freixa

We describe a model for the data storage, retrieval and manipulation requirements of complex system applications based on pervasive, integrated, application-specific databases shared over a peer-to-peer network. Such a model can significantly increase productivity through transparent data sharing and querying, as well as aid collaborations. We show a proof of concept of this approach as implemented in the Adun molecular simulation application, together with a discussion of its limitations and possible extensions.

- Session 2a. “Data Management” | Pp. 124-140