Publications catalog - books


Euro-Par 2007 Parallel Processing: 13th International Euro-Par Conference, Rennes, France, August 28-31, 2007. Proceedings

Anne-Marie Kermarrec ; Luc Bougé ; Thierry Priol (eds.)

In conference: 13th European Conference on Parallel Processing (Euro-Par). Rennes, France. August 28, 2007 - August 31, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Computer Systems Organization and Communication Networks; Software Engineering/Programming and Operating Systems; Theory of Computation; Numeric Computing; Database Management

Availability
Detected institution Year of publication Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74465-8

Electronic ISBN

978-3-540-74466-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Efficient Parallel Simulation of Large-Scale Neuronal Networks on Clusters of Multiprocessor Computers

Hans E. Plesser; Jochen M. Eppler; Abigail Morrison; Markus Diesmann; Marc-Oliver Gewaltig

To understand the principles of information processing in the brain, we depend on models with more than 10^5 neurons and 10^9 connections. These networks can be described as graphs of threshold elements that exchange point events over their connections.

From the computer science perspective, the key challenges are to represent the connections succinctly; to transmit events and update neuron states efficiently; and to provide a comfortable user interface. We present here the neural simulation tool NEST, a neuronal network simulator which addresses all these requirements. To simulate very large networks with acceptable time and memory requirements, NEST uses a hybrid strategy, combining distributed simulation across cluster nodes (MPI) with thread-based simulation on each computer. Benchmark simulations of a computationally hard biological neuronal network model demonstrate that hybrid parallelization yields significant performance benefits on clusters of multi-core computers, compared to purely MPI-based distributed simulation.

- Topic 9: Parallel and Distributed Programming | Pp. 672-681

MCSTL: The Multi-core Standard Template Library

Johannes Singler; Peter Sanders; Felix Putze

Future gain in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, applications will soon have to support parallelism explicitly. The MCSTL simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core 32-thread SUN T1.

- Topic 9: Parallel and Distributed Programming | Pp. 682-694

Library Support for Parallel Sorting in Scientific Computations

Holger Dachsel; Michael Hofmann; Gudula Rünger

Sorting is an integral part of numerous algorithms and, therefore, efficient sorting support is needed by many applications. This paper presents a parallel sorting library providing efficient implementations of parallel sorting methods that can be easily adapted to a specific application. A parallel implementation of the Fast Multipole Method is used to demonstrate the configuration and the usage of the library. We also describe a parallel sorting method which provides the ability to adapt to the actual amount of memory available. Performance results for a BlueGene/L supercomputer are given.

- Topic 9: Parallel and Distributed Programming | Pp. 695-704

Domain-Specific Optimization Strategy for Skeleton Programs

Kento Emoto; Kiminori Matsuzaki; Zhenjiang Hu; Masato Takeichi

Skeletal parallel programming enables us to develop parallel programs easily by composing ready-made components called skeletons. However, a simply-composed skeleton program often lacks efficiency due to overheads of intermediate data structures and communications. Many studies have focused on optimizations by fusing successive skeletons to eliminate the overheads. Existing fusion transformations, however, are too general to achieve adequate efficiency for some classes of problems. Thus, a specific fusion optimization is needed for a specific class. In this paper, we propose a strategy for domain-specific optimization of skeleton programs. In this strategy, one starts with a normal form that abstracts the programs of interest, then develops fusion rules that transform a skeleton program into the normal form, and finally makes an efficient parallel implementation of the normal form. We illustrate the strategy with a case study: optimization of skeleton programs involving neighbor elements, which is often seen in scientific computations.

- Topic 9: Parallel and Distributed Programming | Pp. 705-714

Topic 10 Parallel Numerical Algorithms

Ian Duff; Michel Daydé; Matthias Bollhoefer; Anne Trefethen

Efficient and robust parallel and distributed algorithms with portable and easy-to-use implementations for the solution of fundamental problems in numerical mathematics are essential components of most parallel software systems for scientific and engineering applications.

- Topic 10: Parallel Numerical Algorithms | Pp. 715-716

An Efficient Parallel Particle Tracker for Advection-Diffusion Simulations in Heterogeneous Porous Media

Anthony Beaudoin; Jean-Raynald de Dreuzy; Jocelyne Erhel

The heterogeneity of natural geological formations has a major impact on the contamination of groundwater by migration of pollutants. In order to get an asymptotic behavior of the solute dispersion, numerical simulations require large scale computations. We have developed fully parallel software, where the transport model is an original parallel particle tracker. Our performance results on a distributed memory parallel architecture show the efficiency of our algorithm, for the whole range of geological parameters studied.

- Topic 10: Parallel Numerical Algorithms | Pp. 717-726

A Fully Scalable Parallel Algorithm for Solving Elliptic Partial Differential Equations

Juan A. Acebrón; Renato Spigler

A comparison is made between the probabilistic domain decomposition (DD) method and a certain deterministic DD method for solving linear elliptic boundary-value problems. Since in the deterministic approach the CPU time is affected by intercommunications among the processors, it turns out that the probabilistic method performs better, especially when the number of subdomains (hence, of processors) is increased. This fact is clearly illustrated by some examples. The probabilistic DD algorithm has been implemented in an MPI environment, in order to exploit distributed computer architectures. Scalability and fault-tolerance of the probabilistic DD algorithm are emphasized.

- Topic 10: Parallel Numerical Algorithms | Pp. 727-736

Locality Optimized Shared-Memory Implementations of Iterated Runge-Kutta Methods

Matthias Korch; Thomas Rauber

Iterated Runge-Kutta (IRK) methods are a class of explicit solution methods for initial value problems of ordinary differential equations (ODEs) which possess a considerable potential for parallelism across the method and the ODE system. In this paper, we consider the sequential and parallel implementation of IRK methods with the main focus on the optimization of the locality behavior. We introduce different implementation variants for sequential and shared-memory computer systems and analyze their runtime and cache performance on two modern supercomputer systems.

- Topic 10: Parallel Numerical Algorithms | Pp. 737-747

Toward Scalable Matrix Multiply on Multithreaded Architectures

Bryan Marker; Field G. Van Zee; Kazushige Goto; Gregorio Quintana-Ortí; Robert A. van de Geijn

We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16 CPU Itanium2 server supports these observations.

- Topic 10: Parallel Numerical Algorithms | Pp. 748-757

Task Scheduling for Parallel Multifrontal Methods

Olivier Beaumont; Abdou Guermouche

We present a new scheduling algorithm for task graphs arising from parallel multifrontal methods for sparse linear systems. This algorithm is based on the theorem proved by Prasanna and Musicus [1] for tree-shaped task graphs, when all tasks exhibit the same degree of parallelism. We propose extended versions of this algorithm to take communication between tasks and memory balancing into account. The efficiency of the proposed approach is assessed by a set of experiments on a set of large sparse matrices from several libraries.

- Topic 10: Parallel Numerical Algorithms | Pp. 758-766