Publications catalog - books
Euro-Par 2007 Parallel Processing: 13th International Euro-Par Conference, Rennes, France, August 28-31, 2007. Proceedings
Anne-Marie Kermarrec; Luc Bougé; Thierry Priol (eds.)
In conference: 13th European Conference on Parallel Processing (Euro-Par). Rennes, France. August 28, 2007 - August 31, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Computer System Implementation; Computer Systems Organization and Communication Networks; Software Engineering/Programming and Operating Systems; Theory of Computation; Numeric Computing; Database Management
Availability
Detected institution | Publication year | Browse | Download | Request
---|---|---|---|---
Not detected | 2007 | SpringerLink | |
Information
Resource type:
books
Print ISBN
978-3-540-74465-8
Electronic ISBN
978-3-540-74466-5
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
Efficient Parallel Simulation of Large-Scale Neuronal Networks on Clusters of Multiprocessor Computers
Hans E. Plesser; Jochen M. Eppler; Abigail Morrison; Markus Diesmann; Marc-Oliver Gewaltig
To understand the principles of information processing in the brain, we depend on models with more than 10^5 neurons and 10^9 connections. These networks can be described as graphs of threshold elements that exchange point events over their connections.
From the computer science perspective, the key challenges are to represent the connections succinctly; to transmit events and update neuron states efficiently; and to provide a comfortable user interface. We present here the neural simulation tool NEST, a neuronal network simulator which addresses all these requirements. To simulate very large networks with acceptable time and memory requirements, NEST uses a hybrid strategy, combining distributed simulation across cluster nodes (MPI) with thread-based simulation on each computer. Benchmark simulations of a computationally hard biological neuronal network model demonstrate that hybrid parallelization yields significant performance benefits on clusters of multi-core computers, compared to purely MPI-based distributed simulation.
- Topic 9: Parallel and Distributed Programming | Pp. 672-681
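The hybrid strategy described above can be sketched in a few lines. This is a hypothetical illustration, not NEST's actual code: the Neuron type and the membrane update are placeholders, with an OpenMP loop standing in for the thread layer on each node and MPI_Allgather for the per-step spike exchange across nodes.

```cpp
// Hypothetical sketch of hybrid MPI+thread neuronal simulation (not NEST's
// API). Compile with an MPI wrapper and -fopenmp.
#include <mpi.h>
#include <vector>
#include <cstdio>

struct Neuron { double v = 0.0; };           // placeholder neuron state

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<Neuron> local(1000);         // this rank's network partition
    std::vector<int> all(size);              // spike counts from all ranks

    for (int t = 0; t < 100; ++t) {
        int fired = 0;
        // Thread level: update this node's neurons concurrently.
        #pragma omp parallel for reduction(+:fired)
        for (long i = 0; i < (long)local.size(); ++i) {
            local[i].v += 0.1;               // placeholder membrane dynamics
            if (local[i].v > 1.0) { local[i].v = 0.0; ++fired; }
        }
        // Process level: exchange spike events across cluster nodes each step.
        MPI_Allgather(&fired, 1, MPI_INT, all.data(), 1, MPI_INT,
                      MPI_COMM_WORLD);
    }
    if (rank == 0) std::printf("simulated on %d ranks\n", size);
    MPI_Finalize();
}
```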
MCSTL: The Multi-core Standard Template Library
Johannes Singler; Peter Sanders; Felix Putze
Future gain in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, applications will soon have to support parallelism explicitly. The MCSTL simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core 32-thread SUN T1.
- Topic 9: Parallel and Distributed Programming | Pp. 682-694
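The drop-in idea survives in today's standard library. A minimal sketch using C++17 execution policies as a modern stand-in for MCSTL (on GCC, link against TBB with -ltbb): the call site is identical to sequential std::sort, and the policy argument requests a parallel implementation, much as recompiling against MCSTL parallelized existing STL calls.

```cpp
// Sketch of the drop-in parallel-STL idea, using C++17 execution policies
// as a modern stand-in for MCSTL.
#include <algorithm>
#include <execution>
#include <random>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> data(1 << 24);
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (auto& x : data) x = dist(gen);

    // Same call site as sequential std::sort; the policy selects a
    // parallel implementation.
    std::sort(std::execution::par, data.begin(), data.end());

    std::printf("sorted: %d\n", (int)std::is_sorted(data.begin(), data.end()));
}
```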
Library Support for Parallel Sorting in Scientific Computations
Holger Dachsel; Michael Hofmann; Gudula Rünger
Sorting is an integral part of numerous algorithms and, therefore, efficient sorting support is needed by many applications. This paper presents a parallel sorting library providing efficient implementations of parallel sorting methods that can be easily adapted to a specific application. A parallel implementation of the Fast Multipole Method is used to demonstrate the configuration and the usage of the library. We also describe a parallel sorting method which provides the ability to adapt to the actual amount of memory available. Performance results for a BlueGene/L supercomputer are given.
- Topic 9: Parallel and Distributed Programming | Pp. 695-704
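For orientation, here is a minimal MPI parallel sort in the same spirit, not the paper's library API: each rank sorts its partition locally, and the root gathers and merges the sorted runs. A memory-adaptive method like the one described would replace the single gather with bounded-memory exchanges.

```cpp
// Illustrative two-phase MPI parallel sort (local sort, then gather+merge).
#include <mpi.h>
#include <algorithm>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 4;                          // elements per rank (tiny demo)
    std::vector<int> local(n);
    for (int i = 0; i < n; ++i) local[i] = (rank * 7 + i * 13) % 100;
    std::sort(local.begin(), local.end());    // phase 1: local sort

    std::vector<int> all(rank == 0 ? n * size : 0);
    MPI_Gather(local.data(), n, MPI_INT,
               all.data(), n, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        // phase 2: merge the sorted runs, one per rank
        for (int r = 1; r < size; ++r)
            std::inplace_merge(all.begin(), all.begin() + r * n,
                               all.begin() + (r + 1) * n);
        std::printf("globally sorted: %d\n",
                    (int)std::is_sorted(all.begin(), all.end()));
    }
    MPI_Finalize();
}
```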
Domain-Specific Optimization Strategy for Skeleton Programs
Kento Emoto; Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi
Skeletal parallel programming enables us to develop parallel programs easily by composing ready-made components called skeletons. However, a simply-composed skeleton program often lacks efficiency due to overheads of intermediate data structures and communications. Many studies have focused on optimizations by fusing successive skeletons to eliminate the overheads. Existing fusion transformations, however, are too general to achieve adequate efficiency for some classes of problems. Thus, a specific fusion optimization is needed for a specific class. In this paper, we propose a strategy for domain-specific optimization of skeleton programs. In this strategy, one starts with a normal form that abstracts the programs of interest, then develops fusion rules that transform a skeleton program into the normal form, and finally makes an efficient parallel implementation of the normal form. We illustrate the strategy with a case study: optimization of skeleton programs involving neighbor elements, which is often seen in scientific computations.
- Topic 9: Parallel and Distributed Programming | Pp. 705-714
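The simplest instance of such a fusion rule is map f ∘ map g = map (f ∘ g). A small sketch (map_skel is an illustrative skeleton, not the paper's library) showing how composing the functions eliminates the intermediate data structure:

```cpp
// Fusion of two map skeletons into one: the fused version does a single
// traversal and allocates no intermediate vector.
#include <vector>
#include <cstdio>

template <typename T, typename F>
std::vector<T> map_skel(const std::vector<T>& xs, F f) {
    std::vector<T> ys;
    ys.reserve(xs.size());
    for (const T& x : xs) ys.push_back(f(x));
    return ys;
}

int main() {
    std::vector<int> xs = {1, 2, 3, 4};
    auto g = [](int x) { return x + 1; };
    auto f = [](int x) { return x * 2; };

    // Unfused: two traversals and an intermediate vector.
    auto slow = map_skel(map_skel(xs, g), f);

    // Fused: one traversal, no intermediate structure.
    auto fast = map_skel(xs, [&](int x) { return f(g(x)); });

    std::printf("%d %d\n", slow.back(), fast.back()); // both print 10
}
```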
Topic 10: Parallel Numerical Algorithms
Ian Duff; Michel Daydé; Matthias Bollhoefer; Anne Trefethen
Efficient and robust parallel and distributed algorithms with portable and easy-to-use implementations for the solution of fundamental problems in numerical mathematics are essential components of most parallel software systems for scientific and engineering applications.
- Topic 10: Parallel Numerical Algorithms | Pp. 715-716
An Efficient Parallel Particle Tracker for Advection-Diffusion Simulations in Heterogeneous Porous Media
Anthony Beaudoin; Jean-Raynald de Dreuzy; Jocelyne Erhel
The heterogeneity of natural geological formations has a major impact on the contamination of groundwater by migration of pollutants. In order to get an asymptotic behavior of the solute dispersion, numerical simulations require large scale computations. We have developed fully parallel software in which the transport model is an original parallel particle tracker. Our performance results on a distributed memory parallel architecture show the efficiency of our algorithm for the whole range of geological parameters studied.
- Topic 10: Parallel Numerical Algorithms | Pp. 717-726
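The transport kernel of such a tracker reduces to a random-walk step per particle, x ← x + v·dt + sqrt(2·D·dt)·ξ with ξ ~ N(0,1). A minimal 1D sketch under a uniform velocity field (the paper's heterogeneous fields would interpolate v at each particle position); since particles are independent, the loop distributes naturally across processes, which is what the parallel tracker exploits:

```cpp
// Minimal random-walk particle tracker for 1D advection-diffusion
// (illustrative only, not the paper's software).
#include <random>
#include <vector>
#include <cmath>
#include <cstdio>

int main() {
    const double dt = 0.01, D = 1e-3;        // time step, diffusion coefficient
    const double vx = 1.0;                   // uniform velocity; heterogeneous
                                             // fields would interpolate v here
    std::mt19937 gen(1);
    std::normal_distribution<double> noise(0.0, 1.0);

    std::vector<double> x(10000, 0.0);
    for (int step = 0; step < 1000; ++step)
        for (double& xp : x)                 // independent particles
            xp += vx * dt + std::sqrt(2.0 * D * dt) * noise(gen);

    double mean = 0.0;
    for (double xp : x) mean += xp;
    std::printf("mean displacement %.3f (expected %.3f)\n",
                mean / x.size(), vx * dt * 1000);
}
```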
A Fully Scalable Parallel Algorithm for Solving Elliptic Partial Differential Equations
Juan A. Acebrón; Renato Spigler
A comparison is made between the probabilistic domain decomposition (DD) method and a certain deterministic DD method for solving linear elliptic boundary-value problems. Since in the deterministic approach the CPU time is affected by intercommunications among the processors, it turns out that the probabilistic method performs better, especially when the number of subdomains (hence, of processors) is increased. This fact is clearly illustrated by some examples. The probabilistic DD algorithm has been implemented in an MPI environment, in order to exploit distributed computer architectures. Scalability and fault-tolerance of the probabilistic DD algorithm are emphasized.
- Topic 10: Parallel Numerical Algorithms | Pp. 727-736
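To convey the flavor of the probabilistic approach, here is a sketch using walk-on-spheres, a related Monte Carlo technique rather than the authors' exact scheme: the solution of Laplace's equation at an interface point is estimated as the expected boundary value at the exit point of a random walk. The walks are independent and hence communication-free, which is the source of the scalability and fault tolerance emphasized above.

```cpp
// Walk-on-spheres estimate of u at one point for Laplace's equation on the
// unit disk with Dirichlet data g (illustrative stand-in, not the paper's).
#include <random>
#include <cmath>
#include <cstdio>

int main() {
    const double PI = std::acos(-1.0);
    auto g = [](double x, double y) { return x * x - y * y; }; // harmonic data
    std::mt19937 gen(7);
    std::uniform_real_distribution<double> angle(0.0, 2.0 * PI);

    const double eps = 1e-4;
    const int walks = 100000;
    double px = 0.3, py = 0.2, sum = 0.0;    // evaluation (interface) point

    for (int w = 0; w < walks; ++w) {        // walks are independent: this
        double x = px, y = py;               // loop parallelizes trivially
        while (true) {
            double r = 1.0 - std::sqrt(x * x + y * y); // distance to boundary
            if (r < eps) break;
            double a = angle(gen);           // jump uniformly on the sphere
            x += r * std::cos(a);
            y += r * std::sin(a);
        }
        double norm = std::sqrt(x * x + y * y);
        sum += g(x / norm, y / norm);        // project to boundary, sample g
    }
    // since g is harmonic, the exact solution is g itself: 0.09 - 0.04 = 0.05
    std::printf("u(0.3,0.2) ~= %f (exact 0.05)\n", sum / walks);
}
```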
Locality Optimized Shared-Memory Implementations of Iterated Runge-Kutta Methods
Matthias Korch; Thomas Rauber
Iterated Runge-Kutta (IRK) methods are a class of explicit solution methods for initial value problems of ordinary differential equations (ODEs) which possess a considerable potential for parallelism across the method and the ODE system. In this paper, we consider the sequential and parallel implementation of IRK methods with the main focus on the optimization of the locality behavior. We introduce different implementation variants for sequential and shared-memory computer systems and analyze their runtime and cache performance on two modern supercomputer systems.
- Topic 10: Parallel Numerical Algorithms | Pp. 737-747
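For reference, one IRK step works as follows: the stage values of an implicit Runge-Kutta method are approximated by a fixed number of fixed-point iterations, and within each iteration the stage evaluations are mutually independent, which is the parallelism across the method. A minimal scalar sketch with the 2-stage Gauss method (illustrative, not the authors' implementation):

```cpp
// One-dimensional iterated RK sketch: 2-stage Gauss method, stages computed
// by fixed-point iteration, applied to the test ODE y' = -y.
#include <cmath>
#include <cstdio>

double f(double t, double y) { return -y; }

int main() {
    const double s3 = std::sqrt(3.0);
    const double A[2][2] = {{0.25, 0.25 - s3 / 6}, {0.25 + s3 / 6, 0.25}};
    const double b[2] = {0.5, 0.5}, c[2] = {0.5 - s3 / 6, 0.5 + s3 / 6};

    double t = 0.0, y = 1.0, h = 0.1;
    for (int n = 0; n < 10; ++n) {
        double Y[2] = {y, y};                 // initial stage guesses
        for (int k = 0; k < 4; ++k) {         // m fixed-point iterations
            double F[2] = {f(t + c[0] * h, Y[0]), f(t + c[1] * h, Y[1])};
            for (int i = 0; i < 2; ++i)       // stages update independently,
                Y[i] = y + h * (A[i][0] * F[0] + A[i][1] * F[1]); // in parallel
        }
        y += h * (b[0] * f(t + c[0] * h, Y[0]) + b[1] * f(t + c[1] * h, Y[1]));
        t += h;
    }
    std::printf("y(1) ~= %f (exact %f)\n", y, std::exp(-1.0));
}
```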
Toward Scalable Matrix Multiply on Multithreaded Architectures
Bryan Marker; Field G. Van Zee; Kazushige Goto; Gregorio Quintana-Ortí; Robert A. van de Geijn
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16 CPU Itanium2 server supports these observations.
- Topic 10: Parallel Numerical Algorithms | Pp. 748-757
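The partitioning point can be made concrete: with a 1D row slab per thread, every thread streams through all of B, while a 2D block assignment confines each thread to a fraction of A's rows and B's columns. A naive illustrative sketch of the 2D assignment (no cache blocking, unlike a real BLAS):

```cpp
// 2D block partitioning of C across threads: each thread owns one block and
// reads only the corresponding rows of A and columns of B.
#include <thread>
#include <vector>
#include <cstdio>

int main() {
    const int n = 256, grid = 2;              // p = grid*grid = 4 threads
    std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);

    auto block = [&](int bi, int bj) {        // compute one 2D block of C
        int s = n / grid;
        for (int i = bi * s; i < (bi + 1) * s; ++i)
            for (int j = bj * s; j < (bj + 1) * s; ++j) {
                double sum = 0.0;
                for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;           // blocks are disjoint: no races
            }
    };

    std::vector<std::thread> pool;
    for (int bi = 0; bi < grid; ++bi)
        for (int bj = 0; bj < grid; ++bj)
            pool.emplace_back(block, bi, bj);
    for (auto& th : pool) th.join();

    std::printf("C[0] = %f (expect %d)\n", C[0], n);
}
```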
Task Scheduling for Parallel Multifrontal Methods
Olivier Beaumont; Abdou Guermouche
We present a new scheduling algorithm for task graphs arising from parallel multifrontal methods for sparse linear systems. The algorithm is based on the theorem proved by Prasanna and Musicus [1] for tree-shaped task graphs in which all tasks exhibit the same degree of parallelism. We propose extended versions of this algorithm that take communication between tasks and memory balancing into account. The efficiency of the proposed approach is assessed by experiments on a set of large sparse matrices from several libraries.
- Topic 10: Parallel Numerical Algorithms | Pp. 758-766
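In the Prasanna-Musicus model a task of work w runs in time w/p^α on p processors; sibling subtrees then finish simultaneously when each receives a processor share proportional to w^(1/α). A small sketch of that allocation rule (illustrative only; the paper's extensions add communication and memory-balancing terms):

```cpp
// Processor shares under the p^alpha speedup model: allocating proportionally
// to w^(1/alpha) equalizes the finish times of sibling subtrees.
#include <cmath>
#include <vector>
#include <cstdio>

int main() {
    const double alpha = 0.9;                 // degree of parallelism of tasks
    const double P = 16.0;                    // processors available
    std::vector<double> work = {100.0, 40.0, 20.0}; // sibling subtree works

    double total = 0.0;
    for (double w : work) total += std::pow(w, 1.0 / alpha);

    for (std::size_t i = 0; i < work.size(); ++i) {
        double p = P * std::pow(work[i], 1.0 / alpha) / total; // share of P
        std::printf("subtree %zu: work %.0f -> %.2f processors, time %.2f\n",
                    i, work[i], p, work[i] / std::pow(p, alpha));
    }
    // all printed times coincide: the subtrees finish together
}
```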