Publications catalog - books

Euro-Par 2007 Parallel Processing: 13th International Euro-Par Conference, Rennes, France, August 28-31, 2007. Proceedings

Anne-Marie Kermarrec ; Luc Bougé ; Thierry Priol (eds.)

In conference: 13th European Conference on Parallel Processing (Euro-Par). Rennes, France. August 28, 2007 - August 31, 2007

Abstract/description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Computer Systems Organization and Communication Networks; Software Engineering/Programming and Operating Systems; Theory of Computation; Numeric Computing; Database Management

Availability
Detected institution: not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74465-8

Electronic ISBN

978-3-540-74466-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Topic 1 Support Tools and Environments

Liviu Iftode; Christine Morin; Marios Dikaiakos; Erich Focht

Despite an impressive body of research, parallel and distributed computing remains a complex task prone to subtle software bugs, which can affect both the correctness and the performance of the computation. The increasing demand to distribute computing over large-scale distributed platforms, such as grids and large clusters, overlaps with an increasing pressure to make computing more dependable. To address these challenges, the parallel and distributed computing community continuously requires better tools and environments to design, program, debug, test, tune, and monitor programs. This topic aims to bring together tool designers, developers, and users to share their concerns, ideas, solutions, and products, covering a wide range of platforms.

- Topic 1: Support Tools and Environments | Pp. 1-2

Automatic Structure Extraction from MPI Applications Tracefiles

Marc Casas; Rosa M. Badia; Jesús Labarta

Obtaining useful tracefiles of message-passing applications for performance analysis on supercomputers is a long and tedious task. When hundreds or thousands of processors are used, the tracefile size can grow to 10 or 20 GB, and analyzing or even storing such large traces is clearly a problem. The methodology we have developed and implemented performs an automatic analysis that can be applied to huge tracefiles, obtaining their internal structure and selecting meaningful parts of the tracefile. The paper presents the methodology and the results we have obtained from real applications.

- Topic 1: Support Tools and Environments | Pp. 3-12
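
The abstract does not spell out the detection algorithm, so the following is only a toy sketch of the general idea rather than the authors' method: if a tracefile is reduced to a one-dimensional signal (here, a hypothetical "events per time bin" metric), autocorrelation can expose the period of the application's iterative structure, which is the kind of internal structure such an analysis recovers when selecting meaningful parts of a trace.

```python
import numpy as np

def dominant_period(signal):
    """Estimate the period of a repetitive phase structure via autocorrelation.

    `signal` is a 1-D metric derived from a tracefile, e.g. the number of
    MPI events falling into each fixed-width time bin (an assumed metric).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # non-negative lags only
    ac /= ac[0]                                        # normalize: lag 0 == 1
    # The first clear local maximum after lag 0 approximates the period.
    for lag in range(1, x.size - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1] and ac[lag] > 0.3:
            return lag
    return None

# Synthetic "trace signal": 40 iterations of a 25-bin compute/communicate pattern.
pattern = np.concatenate([np.full(20, 5.0), np.full(5, 50.0)])
signal = np.tile(pattern, 40) + np.random.default_rng(0).normal(0, 1, 25 * 40)
print("estimated period (bins):", dominant_period(signal))  # expected: 25
```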

Automatic Generation of Dynamic Tuning Techniques

Paola Caymes-Scutari; Anna Morajko; Tomàs Margalef; Emilio Luque

The use of parallel/distributed programming is increasing, as it enables high performance computing. However, meeting the expectations of high performance requires a high degree of expertise. Fortunately, in general, every parallel application follows a particular programming scheme, such as Master/Worker or Pipeline. By studying the bottlenecks of these schemes, the performance problems they present can be mathematically modelled. In this paper we present a performance problem specification language to automate the development of tuning techniques, called “tunlets”. Tunlets can be incorporated into MATE (Monitoring, Analysis and Tuning Environment), which dynamically adapts applications to the current conditions of the execution environment. In summary, each tunlet provides an automatic way to monitor, analyze and tune an application according to its mathematical model.

- Topic 1: Support Tools and Environments | Pp. 13-22
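
The tunlet language itself is not reproduced in the abstract, so the sketch below only illustrates, with invented names and an assumed cost model, the kind of mathematical model a tunlet could encode for a Master/Worker scheme: predict the runtime as a function of the number of workers and let the tuner apply the minimiser.

```python
def predicted_runtime(n_workers, total_work, dispatch_cost):
    """Toy Master/Worker model (an assumption, not the paper's model):
    work is shared evenly, but the master pays a serial per-worker cost."""
    return total_work / n_workers + dispatch_cost * n_workers

def best_worker_count(total_work, dispatch_cost, max_workers):
    """The tuning decision a dynamic tuner would apply at run time:
    evaluate the model and keep the worker count that minimises it."""
    return min(range(1, max_workers + 1),
               key=lambda n: predicted_runtime(n, total_work, dispatch_cost))

if __name__ == "__main__":
    # 1000 s of aggregate work, 0.5 s of master-side overhead per worker.
    n = best_worker_count(total_work=1000.0, dispatch_cost=0.5, max_workers=128)
    print(f"use {n} workers, predicted runtime "
          f"{predicted_runtime(n, 1000.0, 0.5):.1f} s")
```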

A Scheduling Toolkit for Multiprocessor-Task Programming with Dependencies

Jörg Dümmler; Raphael Kunis; Gudula Rünger

The performance of many scientific applications for distributed memory platforms can be increased by utilizing multiprocessor-task programming. Obtaining the minimum parallel runtime requires an appropriate schedule that takes the computation and communication performance of the target platform into account. However, many tools and environments for multiprocessor-task programming lack support for an integrated scheduler. This paper presents a scheduling toolkit that provides this support and integrates popular scheduling algorithms. The implemented scheduling algorithms provide an infrastructure to automatically determine a schedule for multiprocessor-tasks with dependencies represented by a task graph.

- Topic 1: Support Tools and Environments | Pp. 23-32
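
The abstract names neither the integrated scheduling algorithms nor their interface, so the following is only a minimal sketch of one classic family of approaches to scheduling multiprocessor-tasks with dependencies: split the task graph into layers of independent tasks and, per layer, choose between running tasks one after another on all processors or concurrently on equal shares, under an assumed Amdahl-style runtime model.

```python
from collections import defaultdict

def runtime(work, serial_fraction, procs):
    """Assumed Amdahl-style cost model for a moldable multiprocessor-task."""
    return work * (serial_fraction + (1 - serial_fraction) / procs)

def schedule_layers(tasks, deps, total_procs):
    """Layer-based scheduling sketch for M-tasks with dependencies.

    tasks: {name: (work, serial_fraction)}; deps: {name: predecessors}.
    """
    level = {}
    def depth(t):  # topological level: all predecessors lie in earlier layers
        if t not in level:
            level[t] = 1 + max((depth(p) for p in deps.get(t, ())), default=-1)
        return level[t]
    layers = defaultdict(list)
    for t in tasks:
        layers[depth(t)].append(t)

    makespan = 0.0
    for lvl in sorted(layers):
        group = layers[lvl]
        # Option 1: run the layer's tasks one after another on all processors.
        seq = sum(runtime(*tasks[t], total_procs) for t in group)
        # Option 2: run them concurrently on equal processor shares.
        share = max(1, total_procs // len(group))
        par = max(runtime(*tasks[t], share) for t in group)
        makespan += min(seq, par)
    return makespan

tasks = {"a": (100, 0.05), "b": (40, 0.3), "c": (40, 0.3), "d": (80, 0.1)}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print("predicted makespan:", schedule_layers(tasks, deps, total_procs=16))
```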

Makefile::Parallel Dependency Specification Language

Alberto Simões; Rúben Fonseca; José João Almeida

Some processes are not easy to program from scratch for parallel machines (clusters), but can easily be split into simple steps. Makefile::Parallel is a tool that lets users specify how processes depend on each other.

The language syntax resembles the well-known Makefile [1] format, but instead of specifying dependencies between files or targets, Makefile::Parallel specifies dependencies between processes (or jobs).

The scheduler reads the specification, submits jobs to the cluster scheduler (in our case, Rocks PBS) and waits for them to end. When each process finishes, dependencies are recalculated and directly dependent jobs are submitted.

The Makefile::Parallel language includes features for specifying parametric rules, used to split and join process dependencies: some tasks can be split into smaller jobs working on different portions of files, and at the end another process can join the results.

- Topic 1: Support Tools and Environments | Pp. 33-41
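
Makefile::Parallel's concrete syntax is not reproduced in the abstract, so the sketch below merely mimics the described behaviour in plain Python: a dependency table is walked, each job is submitted once all of its prerequisites have finished, and directly dependent jobs are released as jobs complete. The pipeline and the stand-in submit function are hypothetical.

```python
def run_dag(deps, submit):
    """Submit jobs respecting a Makefile-like dependency table.

    deps: {job: set of jobs it depends on}. `submit(job)` stands in for
    handing the job to a cluster scheduler (e.g. PBS) and waiting for it.
    """
    remaining = {job: set(d) for job, d in deps.items()}
    done = set()
    while remaining:
        ready = [j for j, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError(f"cyclic dependency among: {sorted(remaining)}")
        for job in ready:  # a real scheduler could submit these concurrently
            submit(job)
            done.add(job)
            del remaining[job]

# Hypothetical split/process/join pipeline, as in the parametric-rule example.
deps = {
    "split": set(),
    "part1": {"split"},
    "part2": {"split"},
    "join":  {"part1", "part2"},
}
run_dag(deps, submit=lambda job: print("submitting", job))
```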

Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework

Samuel Thibault; Raymond Namyst; Pierre-André Wacrenier

Exploiting the full computational power of current, increasingly hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper [1], we showed that using a bubble-based thread scheduler can significantly improve applications’ performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles among the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.

- Topic 1: Support Tools and Environments | Pp. 42-51
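
BubbleSched's actual API is not given in the abstract, so the following is only a hedged sketch, with invented names, of the core idea: threads are grouped into nested "bubbles" that mirror the application's affinity structure, and a scheduler distributes those bubbles recursively over a tree modelling the hierarchical machine.

```python
class Bubble:
    """A nested group of threads with mutual affinity (illustrative only)."""
    def __init__(self, *members):  # members: thread names or sub-bubbles
        self.members = list(members)

def flatten(bubble):
    for m in bubble.members:
        yield from flatten(m) if isinstance(m, Bubble) else (m,)

def distribute(entity, level):
    """Recursively map a bubble onto a topology tree.

    `level` is a node of the machine tree: a leaf is a core name, an inner
    node is a list of child levels (machine -> sockets -> cores).
    """
    if not isinstance(entity, Bubble):      # a single thread
        core = level
        while isinstance(core, list):       # descend to the first core
            core = core[0]
        return {entity: core}
    if not isinstance(level, list):         # whole bubble pinned to one core
        return {t: level for t in flatten(entity)}
    placement = {}
    for i, member in enumerate(entity.members):  # spread over child levels
        placement.update(distribute(member, level[i % len(level)]))
    return placement

# Two sockets with two cores each; two teams of cooperating threads.
machine = [["s0c0", "s0c1"], ["s1c0", "s1c1"]]
app = Bubble(Bubble("t0", "t1"), Bubble("t2", "t3"))
print(distribute(app, machine))  # each team lands on its own socket
```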

A Profiling Tool for Detecting Cache-Critical Data Structures

Jie Tao; Tobias Gaugler; Wolfgang Karl

Poor cache behavior can significantly limit the achievable speedup and scalability of parallel applications. This means that optimizing a program with respect to cache locality can potentially bring considerable performance gains. As a consequence, programmers usually perform cache locality optimization to acquire the expected performance of their applications.

Within this work, we developed a data profiling tool with the goal of supporting users in this task by allowing them to detect the optimization targets in their programs. In contrast to similar tools, which mostly focus on code regions, we address data structures, because they are the direct objects that programmers have to work with. Based on the Performance Monitoring Unit (PMU) provided by modern processors, the tool is capable of finding cache-critical variables, arrays, or even a segment of an array. It can also locate these access hotspots down to concrete positions such as individual functions and code lines. This feature allows the user to apply the tool for efficient cache optimization.

- Topic 1: Support Tools and Environments | Pp. 52-61
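
The abstract does not describe the tool's internals beyond its use of the PMU, so the sketch below only illustrates the attribution step in hedged form: given miss addresses sampled by a PMU, map each sample onto the registered address range of a data structure (or a segment of an array) and rank structures by miss count. The structure names, ranges, and samples are synthetic.

```python
import bisect
from collections import Counter

class DataLayout:
    """Registry mapping address ranges to data-structure names."""
    def __init__(self):
        self.starts, self.ends, self.names = [], [], []

    def register(self, name, start, size):
        i = bisect.bisect(self.starts, start)
        self.starts.insert(i, start)
        self.ends.insert(i, start + size)
        self.names.insert(i, name)

    def lookup(self, addr):
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.ends[i]:
            return self.names[i]
        return None  # miss fell outside every registered structure

layout = DataLayout()
layout.register("matrix_a[0:half]", 0x1000, 0x4000)
layout.register("matrix_a[half:n]", 0x5000, 0x4000)
layout.register("index_table",      0xA000, 0x0800)

# Synthetic PMU samples: data addresses on which cache misses occurred.
samples = [0x1010, 0x1ABC, 0x5420, 0xA044, 0x1FFC, 0x5004, 0x1234]
misses = Counter(layout.lookup(a) for a in samples)
for name, count in misses.most_common():
    print(f"{count:3d} misses  {name}")
```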

On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications

Karl Fuerlinger; Michael Gerndt; Jack Dongarra

Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvantage of profiling is the loss of temporal information, which makes it impossible to causally relate performance phenomena to events that happened earlier or later during execution. We investigate techniques to add a temporal dimension to profiling data by incrementally capturing profiles during the runtime of the application, and we discuss the insights that can be gained from this type of performance data. The context in which we explore these ideas is an existing profiling tool for OpenMP applications.

- Topic 1: Support Tools and Environments | Pp. 62-71
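
As a hedged illustration of incremental profiling (not the specific OpenMP tool the abstract alludes to), the sketch below snapshots a cumulative profile at fixed intervals and reports per-interval differences, restoring the temporal dimension that a single end-of-run profile loses. The synthetic workload and region names are invented.

```python
import time
from collections import Counter

def incremental_profile(workload, intervals, interval_len):
    """Capture the profile repeatedly and difference successive snapshots."""
    counts = Counter()      # cumulative profile: samples per program region
    history = []
    for _ in range(intervals):
        deadline = time.perf_counter() + interval_len
        while time.perf_counter() < deadline:
            counts[workload()] += 1      # one sample attributed to a region
        history.append(counts.copy())    # incremental snapshot
    return [history[0]] + [
        Counter({k: history[i][k] - history[i - 1][k] for k in history[i]})
        for i in range(1, len(history))
    ]

# Synthetic application whose hot region shifts over time.
start = time.perf_counter()
def workload():
    phase = (time.perf_counter() - start) // 0.1  # new behaviour every 100 ms
    return "init" if phase < 1 else "solver" if phase < 3 else "output"

for i, delta in enumerate(incremental_profile(workload, 4, 0.1)):
    print(f"interval {i}:", dict(delta))  # per-interval profile, not cumulative
```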

Fine Tuning Algorithmic Skeletons

Denis Caromel; Mario Leyton

Algorithmic skeletons correspond to a high-level programming model that takes advantage of nestable programming patterns to hide the complexity of parallel/distributed applications. Programmers have to define the nested skeleton structure and provide the muscle (sequential) portions of code that are specific to a problem.

An inadequate structure definition or inefficient muscle code can lead to performance degradation of the application. Related research has focused on optimizing the skeleton structure. Nevertheless, to our knowledge, no attention has been paid to helping the programmer write performance-efficient muscle code.

We present the Calcium skeleton framework as the environment in which to perform fine tuning of algorithmic skeletons. Calcium provides structured parallelism in Java using ProActive, a grid middleware implementing the active object programming model and providing a deployment framework.

Finally, using a skeleton solution of the NQueens counting problem in Calcium, we validate the fine-tuning approach in a grid environment.

- Topic 1: Support Tools and Environments | Pp. 72-81
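
Calcium's Java API is not shown in the abstract, so the following is only a language-shifted sketch (Python instead of Java, with invented combinator names) of the two obligations the abstract lists: defining a nested skeleton structure and supplying the sequential "muscle" code. The paper's point is that inefficiency can hide in either part, so both are worth tuning.

```python
def seq(muscle):
    """Leaf skeleton: wrap a sequential muscle function."""
    return muscle

def pipe(*stages):
    """Pipeline skeleton: compose stages left to right."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

def farm(worker, split, join):
    """Farm skeleton: split the input, map the nested worker, join results.

    A real framework would execute the parts in parallel; this sketch maps
    them sequentially to keep the nested structure visible.
    """
    def run(x):
        return join([worker(part) for part in split(x)])
    return run

# Nested structure: a farm whose worker is itself a pipeline of two muscles.
solve = farm(
    worker=pipe(seq(lambda xs: [v * v for v in xs]),  # muscle 1: square
                seq(sum)),                            # muscle 2: reduce
    split=lambda xs: [xs[:len(xs) // 2], xs[len(xs) // 2:]],
    join=sum,
)
print(solve(list(range(10))))  # sum of squares of 0..9 == 285
```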

Topic 2 Performance Prediction and Evaluation

Wolfgang Nagel; Bruno Gaujal; Tugrul Dayar; Nihal Pekergin

Parallel algorithms used to be evaluated using some version of the PRAM model, in which actual execution platforms are abstracted as ideal parallel machines. On the other hand, the performance of hardware is often given in terms of individual peak performances, which can be useless for actual applications. The real challenge for performance prediction and evaluation of parallel systems is to combine the top-layer and low-layer points of view, where congestion, control mechanisms, failures, or even breakdowns alter the behavior of a large distributed platform running a parallel application. The papers presented in this track address this challenge in many different ways, using new modelling techniques, sophisticated measurements, or experimental approaches.

- Topic 2: Performance Prediction and Evaluation | Pp. 83-83