Publications catalog - books



Shared Memory Parallel Programming with OpenMP: 5th International Workshop on OpenMP Applications and Tools, WOMPAT 2004, Houston, TX, USA, May 17-18, 2004

Barbara M. Chapman (ed.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Theory of Computation; Mathematics of Computing

Availability

Detected institution: Not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-24560-5

Electronic ISBN

978-3-540-31832-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

© Springer-Verlag Berlin/Heidelberg 2005

Table of contents

An Evaluation of Auto-Scoping in OpenMP

In [1], Dieter an Mey proposes the addition of an AUTO attribute to the OpenMP DEFAULT clause. A DEFAULT(AUTO) clause would cause the compiler to automatically determine the scoping of variables that are not explicitly classified as shared, private or reduction. While this new feature would be useful and powerful, its implementation would rely on automatic parallelization technology, which has been shown to have significant limitations. In this paper, we implement support for the DEFAULT(AUTO) clause in the Polaris parallelizing compiler. Our modified version of the compiler will translate regions with DEFAULT(AUTO) clauses into regions that have explicit scopings for all variables. An evaluation of our implementation on a subset of the SPEC OpenMP Benchmark Suite shows that with current automatic parallelization technologies, a number of important regions cannot be statically scoped, resulting in a significant loss of speedup. We also compare our compiler’s performance to that of an Early Access version of the Sun Studio 9 Fortran 95 compiler [2].

Pp. Not available
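
The limitation the abstract describes is easy to picture in code. Below is a minimal, hypothetical C/OpenMP sketch (not taken from the paper): with indirect indexing, no static analysis can prove that writes to a[idx[i]] never collide, so a DEFAULT(AUTO) implementation would have to give up and serialize the region, while a programmer who knows idx is a permutation can scope the loop explicitly.

    #include <omp.h>

    /* Hypothetical illustration: a loop that defeats static autoscoping.
     * Whether iterations are independent hinges on idx containing no
     * duplicates -- a runtime property the compiler cannot see. */
    void scatter_scale(double *a, const double *b, const int *idx, int n)
    {
        /* Explicit scoping supplied by a programmer who knows idx is a
         * permutation; an autoscoping compiler must be conservative here. */
        #pragma omp parallel for default(none) shared(a, b, idx, n)
        for (int i = 0; i < n; i++)
            a[idx[i]] = 2.0 * b[i];
    }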

Parallelization of General Matrix Multiply Routines Using OpenMP

Jonathan L. Bentz; Ricky A. Kendall

An application programmer interface (API) is developed to facilitate, via OpenMP, the parallelization of the double precision general matrix multiply routine called from within GAMESS [1] during the execution of the coupled-cluster module for calculating physical properties of molecules. Results are reported using the ATLAS library and the Intel MKL on an Intel machine, and using the ESSL and the ATLAS library on an IBM SP.

Keywords: Execution Time; Application Programmer Interface; Thread Number; OpenMP Implementation; Fast Execution Time.

Pp. 1-11
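
As a rough sketch of the idea in the entry above (the names below are illustrative, and a plain triple loop stands in for the tuned serial dgemm from ATLAS, MKL, or ESSL that the paper's API actually dispatches to), the multiply can be split row-blockwise across OpenMP threads:

    #include <omp.h>

    /* C = A * B for row-major m x k and k x n operands: each thread owns
     * complete rows of C, so no two threads write the same element. */
    void gemm_omp(int m, int n, int k,
                  const double *A, const double *B, double *C)
    {
        #pragma omp parallel for
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                double acc = 0.0;
                for (int p = 0; p < k; p++)
                    acc += A[i * k + p] * B[p * n + j];
                C[i * n + j] = acc;
            }
    }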

Performance Analysis of Hybrid OpenMP/MPI N-Body Application

Rocco Aversa; Beniamino Di Martino; Nicola Mazzocca; Salvatore Venticinque

In this paper we show, through a case study, how adopting the MPI model for distributed parallelism and OpenMP parallelizing compiler technology for the inner shared-memory parallelism yields a hierarchically distributed-shared memory implementation of an algorithm presenting multiple levels of parallelism. The chosen application solves the well-known N-body problem.

Keywords: Shared Memory; Message Passing Interface; Parallel Programming Model; High Performance Fortran; OpenMP Directive.

Pp. 12-18
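
A minimal sketch of such a two-level scheme, assuming nothing about the paper's actual code (the pair potential and all names are placeholders): MPI ranks each own a slice of the bodies, and OpenMP threads share the force loop inside each rank.

    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    #define N 4096                       /* total number of bodies (illustrative) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *pos = malloc(N * sizeof *pos);   /* replicated positions */
        double *frc = malloc(N * sizeof *frc);   /* forces, reduced below */
        for (int i = 0; i < N; i++) { pos[i] = i; frc[i] = 0.0; }

        int lo = rank * N / size, hi = (rank + 1) * N / size;

        /* Inner shared-memory level: threads split this rank's slice. */
        #pragma omp parallel for schedule(dynamic)
        for (int i = lo; i < hi; i++)
            for (int j = 0; j < N; j++)
                if (j != i)          /* softened placeholder interaction */
                    frc[i] += 1.0 / ((pos[i] - pos[j]) * (pos[i] - pos[j]) + 1.0);

        /* Outer distributed level: combine per-rank contributions. */
        MPI_Allreduce(MPI_IN_PLACE, frc, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        free(pos); free(frc);
        MPI_Finalize();
        return 0;
    }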

Performance and Scalability of OpenMP Programs on the Sun Fire™ E25K Throughput Computing Server

Myungho Lee; Brian Whitney; Nawal Copty

The Sun Fire™ E25K server is the first generation of high-end servers built to the Throughput Computing strategy of Sun Microsystems. The server can scale up to 72 UltraSPARC® IV processors with Chip MultiThreading (CMT) technology, and execute up to 144 threads simultaneously. The Sun Studio™ 9 software includes compilers and tools that provide support for the C, C++, and Fortran OpenMP Version 2.0 specifications, and that fully exploit the capabilities of the UltraSPARC IV processor and the E25K server. This paper gives an overview of the Sun Fire E25K server and OpenMP support in the Sun Studio 9 software. The paper presents the latest world-class SPEC OMPL benchmark results on the Sun Fire E25K, and compares these results with those on the UltraSPARC III Cu based Sun Fire 15K server. Results show that base and peak ratios for the SPEC OMPL benchmarks increase by approximately 50% on the Sun Fire E25K server, compared to a Sun Fire 15K server with the same number of processors and with higher clock frequency.

Keywords: Peak Ratio; Parallel Region; Automatic Parallelization; Master Thread; High Clock Frequency.

Pp. 19-28

What Multilevel Parallel Programs Do When You Are Not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

Gabriele Jost; Jesús Labarta; Judit Gimenez

In this paper we present a performance analysis case study of two multilevel parallel benchmark codes implemented in three different programming paradigms applicable to shared memory computer architectures. We describe how detailed analysis techniques help to differentiate between the influences of the programming model itself and other factors, such as implementation-specific behavior of the operating system or architectural issues.

Keywords: Shared Memory; Programming Paradigm; Thread Number; OpenMP Directive; Remote Memory Access.

Pp. 29-40
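
Of the three paradigms compared, nested OpenMP is the least commonly seen; a minimal sketch of the pattern (illustrative only, not from the paper's benchmark codes) is:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_nested(1);          /* enable nested parallelism (the
                                       historical switch for this style) */
        omp_set_num_threads(2);     /* outer team size */

        #pragma omp parallel        /* outer, coarse-grain level */
        {
            int outer = omp_get_thread_num();

            #pragma omp parallel num_threads(2)   /* inner, loop level */
            printf("outer %d / inner %d\n", outer, omp_get_thread_num());
        }
        return 0;
    }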

SIMT/OMP: A Toolset to Study and Exploit Memory Locality of OpenMP Applications on NUMA Architectures

Jie Tao; Martin Schulz; Wolfgang Karl

OpenMP has become the dominant standard for shared memory programming. It is traditionally used for Symmetric Multiprocessor Systems, but has more recently also found its way to parallel architectures with distributed shared memory like NUMA machines. This combines the advantages of OpenMP’s easy-to-use programming model with the scalability and cost-effectiveness of NUMA architectures. In NUMA (Non-Uniform Memory Access) environments, however, OpenMP codes suffer from the longer latencies of remote memory accesses. This can be observed for both hardware and software DSM systems. In this paper we present SIMT/OMP, a simulation environment capable of modeling NUMA scenarios and providing comprehensive performance data about the interconnection traffic. We use this tool to study the impact of NUMA on the performance of OpenMP applications and show how the memory layout of these codes can be improved using a visualization tool. Based on these techniques, we have achieved performance increases of up to a factor of five on some of our benchmarks, especially in larger system configurations.

Keywords: Memory Access; Remote Access; Shared Memory System; Memory Layout; Remote Memory Access.

Pp. 41-52
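
The memory-layout improvements the abstract mentions typically come down to first-touch page placement; a minimal sketch of that standard remedy, independent of the SIMT/OMP toolset itself:

    #include <stdlib.h>
    #include <omp.h>

    #define N (1 << 24)

    int main(void)
    {
        double *a = malloc(N * sizeof *a);

        /* Parallel initialization: on a first-touch NUMA system each page
         * is placed near the thread (hence node) that touches it first. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* Same static schedule, so the same thread-to-index mapping: the
         * compute loop now hits mostly node-local memory. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = a[i] * 2.0 + 1.0;

        free(a);
        return 0;
    }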

Dragon: A Static and Dynamic Tool for OpenMP

Oscar Hernandez; Chunhua Liao; Barbara Chapman

A program analysis tool can play an important role in helping users understand and improve OpenMP codes. Dragon is a robust interactive program analysis tool based on the Open64 compiler, an open-source OpenMP C/C++/Fortran 77/90 compiler for Intel Itanium systems. We developed the Dragon tool on top of Open64 to exploit its powerful analyses in order to provide static as well as dynamic (feedback-based) information which can be used to develop or optimize OpenMP codes. Dragon enables users to visualize and print essential program structures and obtain runtime information on their applications. Current features include static/dynamic call graphs, control flow graphs, data dependence analysis, and interprocedural array region summaries, which help users understand procedure side effects within parallel loops. Ongoing work extends Dragon to display data access patterns at runtime and to provide support for runtime instrumentation and optimizations.

Keywords: Flow Graph; Control Flow Graph; Call Graph; Dynamic Tool; Array Region.

Pp. 53-66
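
The interprocedural array region summaries mentioned above answer questions of the following shape (hypothetical code, not from the paper): does a call inside a loop confine its side effects to a per-iteration slice of an array? Here the summary "update_row writes a[i][0..N-1] only" is what makes the loop safely parallel.

    #include <omp.h>

    #define N 512

    static void update_row(double *row, int n)
    {
        for (int j = 0; j < n; j++)    /* side effect: writes row[0..n-1] */
            row[j] = row[j] * 0.5 + 1.0;
    }

    void smooth(double a[N][N])
    {
        /* Safe because the callee's write region is disjoint per iteration,
         * a fact visible only by summarizing the callee interprocedurally. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            update_row(a[i], N);
    }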

The ParaWise Expert Assistant – Widening Accessibility to Efficient and Scalable Tool Generated OpenMP Code

Stephen Johnson; Emyr Evans; Haoqiang Jin; Constantinos Ierotheou

Despite the apparent simplicity of the OpenMP directive shared memory programming model and the sophisticated dependence analysis and code generation capabilities of the ParaWise/CAPO tools, experience shows that a level of expertise is required to produce efficient parallel code. In a real-world application the investigation of a single loop in a generated parallel code can soon become an in-depth inspection of numerous dependencies in many routines. The additional understanding of dependencies is also needed to effectively interpret the information provided and supply the required feedback. The ParaWise Expert Assistant has been developed to automate this investigation and present questions to the user about, and in the context of, their application code. In this paper, we demonstrate that knowledge of dependence information and OpenMP is no longer essential to produce efficient parallel code with the Expert Assistant. It is hoped that this will enable a far wider audience to use the tools and subsequently exploit the benefits of large parallel systems.

Keywords: Dependence Analysis; Finite Element Code; Parallel Execution; Code Author; Application Code.

Pp. 67-82
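
The kind of question the Expert Assistant poses can be recast in code (a hypothetical example, not from the paper): a tool cannot always tell whether a scalar's value flows between iterations, but the code author can.

    #include <omp.h>

    /* If the user confirms that x is always written before it is read in
     * every iteration, the apparent loop-carried dependence on x vanishes
     * and x can simply be privatized. */
    void scale(double *a, const double *b, int n)
    {
        double x;
        #pragma omp parallel for private(x)   /* valid given that answer */
        for (int i = 0; i < n; i++) {
            x = b[i] * b[i];                  /* x defined before any use */
            a[i] = x + 1.0 / (x + 1.0);
        }
    }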

Automatic Scoping of Variables in Parallel Regions of an OpenMP Program

Yuan Lin; Christian Terboven; Dieter an Mey; Nawal Copty

The process of manually specifying scopes of variables when writing an OpenMP program is both tedious and error-prone. To improve productivity, an autoscoping feature was proposed in [1]. This feature leverages the analysis capability of a compiler to determine the appropriate scopes of variables. In this paper, we present the proposed autoscoping rules and describe the autoscoping feature provided in the Sun Studio™ 9 Fortran 95 compiler. To investigate how much work can be saved by using autoscoping and the performance impact of this feature, we study the process of parallelizing PANTA, a 50,000-line 3D Navier-Stokes solver, using OpenMP. With pure manual scoping, a total of 1389 variables have to be explicitly privatized by the programmer. With the help of autoscoping, only 13 variables have to be manually scoped. Both versions of PANTA achieve the same performance.

Keywords: Parallel Region; Data Race; Automatic Parallelization; OpenMP Directive; Default Clause.

Pp. 83-97
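
For scale, the manual scoping burden the abstract quantifies looks like the clause lists below; under the proposed rules, the compiler infers each classification (shared, private, reduction) from how the variable is used. A minimal, hypothetical C analogue of the Fortran setting described:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void)
    {
        double a[N], sum = 0.0, t;

        for (int i = 0; i < N; i++)
            a[i] = 0.5 * i;

        /* Everything autoscoping must infer, written out by hand:
         * a is only read (shared), t is redefined before each use
         * (private), sum accumulates across iterations (reduction). */
        #pragma omp parallel for default(none) shared(a) private(t) reduction(+:sum)
        for (int i = 0; i < N; i++) {
            t = a[i] * a[i];
            sum += t;
        }

        printf("sum = %f\n", sum);
        return 0;
    }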