Publications catalog - books
Languages and Compilers for High Performance Computing: 17th International Workshop, LCPC 2004, West Lafayette, IN, USA, September 22-24, 2004, Revised Selected Papers
Rudolf Eigenmann; Zhiyuan Li; Samuel P. Midkiff (eds.)
Conference: 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC), West Lafayette, IN, USA, September 22-24, 2004
Abstract/description - provided by the publisher
Not available.
Keywords - provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28009-5
Electronic ISBN
978-3-540-31813-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Subject coverage
Table of contents
doi: 10.1007/11532378
Languages and Compilers for High Performance Computing
Rudolf Eigenmann; Zhiyuan Li; Samuel P. Midkiff (eds.)
Pp. not available
doi: 10.1007/11532378_1
Experiences in Using Cetus for Source-to-Source Transformations
Troy A. Johnson; Sang-Ik Lee; Long Fei; Ayon Basumallik; Gautam Upadhyaya; Rudolf Eigenmann; Samuel P. Midkiff
Cetus is a compiler infrastructure for the source-to-source transformation of programs. Since its creation nearly three years ago, it has grown to over 12,000 lines of Java code, been made available publicly on the web, and become a basis for several research projects. We discuss our experience using Cetus for a selection of these research projects. The focus of this paper is not the projects themselves, but rather how Cetus made these projects possible, how the needs of these projects influenced the development of Cetus, and the solutions we applied to problems we encountered with the infrastructure. We believe the research community can benefit from such a discussion, as shown by the strong interest in the mini-workshop on compiler research infrastructures where some of this information was first presented.
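Cetus itself is written in Java and this abstract does not show its API; as a hedged illustration of what a source-to-source rewriting pass does, here is a minimal Python sketch over a hypothetical two-node expression IR, with a constant-folding pass that rewrites the tree and unparses it back to source text.

```python
# Toy source-to-source pass: a tiny expression IR with a bottom-up
# constant-folding rewrite. Illustrative only; not Cetus's Java API.

from dataclasses import dataclass

@dataclass
class Num:
    value: int
    def unparse(self): return str(self.value)

@dataclass
class Add:
    left: object
    right: object
    def unparse(self): return f"({self.left.unparse()} + {self.right.unparse()})"

def fold_constants(node):
    """Rewrite Add(Num, Num) subtrees into a single Num, bottom-up."""
    if isinstance(node, Add):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return node

tree = Add(Add(Num(1), Num(2)), Add(Num(3), Num(4)))
print(tree.unparse())                  # ((1 + 2) + (3 + 4))
print(fold_constants(tree).unparse())  # 10
```

A real infrastructure like Cetus provides the parser, symbol tables, and traversal machinery around such passes; the rewrite-and-unparse cycle is the part this sketch shows.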
Keywords: Intermediate Representation; Call Graph; Analysis Pass; Shared Memory System; Local Symbol.
Pp. 1-14
doi: 10.1007/11532378_2
The LLVM Compiler Framework and Infrastructure Tutorial
Chris Lattner; Vikram Adve
The LLVM Compiler Infrastructure (http://llvm.cs.uiuc.edu) is a robust system that is well suited for a wide variety of research and development work. This brief paper introduces the LLVM system and provides pointers to more extensive documentation, complementing the tutorial presented at LCPC.
Pp. 15-16
doi: 10.1007/11532378_3
An Overview of the Open Research Compiler
Chengyong Wu; Ruiqi Lian; Junchao Zhang; Roy Ju; Sun Chan; Lixia Liu; Xiaobing Feng; Zhaoqing Zhang
The Open Research Compiler (ORC), jointly developed by Intel Microprocessor Technology Labs and the Institute of Computing Technology at the Chinese Academy of Sciences, has become the leading open source compiler for the Itanium™ Processor Family (IPF, previously known as IA-64). Since its first release in 2002, it has been widely used in academia and industry worldwide as a compiler and architecture research infrastructure and as a code base for further development. In this paper, we present an overview of the design of the major components in ORC, especially the new features in the code generator. We discuss the development methodology that was important to achieving the objectives of ORC. Performance comparisons with other IPF compilers and a brief summary of the research work based on ORC are also presented.
Keywords: Control Flow Graph; Register Allocation; Instruction Schedule; Program Language Design; VLIW Processor.
Pp. 17-31
doi: 10.1007/11532378_4
Trimaran: An Infrastructure for Research in Instruction-Level Parallelism
Lakshmi N. Chakrapani; John Gyllenhaal; Wen-mei W. Hwu; Scott A. Mahlke; Krishna V. Palem; Rodric M. Rabbah
Trimaran is an integrated compilation and performance monitoring infrastructure. The architecture space that Trimaran covers is characterized by HPL-PD, a parameterized processor architecture supporting novel features such as predication, control and data speculation, and compiler-controlled management of the memory hierarchy. Trimaran also includes a full suite of analysis and optimization modules, as well as a graph-based intermediate language. Optimization and analysis modules can be easily added, deleted, or bypassed, thus facilitating compiler optimization research. Similarly, computer architecture research can be conducted by varying the HPL-PD machine via the machine description language HMDES. Trimaran also provides a detailed simulation environment and a flexible performance monitoring environment that automatically tracks the machine as it is varied.
Keywords: Memory Hierarchy; Intermediate Representation; Design Space Exploration; Register Allocation; Software Pipeline.
Pp. 32-41
doi: 10.1007/11532378_5
Phase-Based Miss Rate Prediction Across Program Inputs
Xipeng Shen; Yutao Zhong; Chen Ding
Previous work shows the possibility of predicting the cache miss rate (CMR) for all inputs of a program. However, most optimization techniques need to know more than the miss rate of the whole program. Many of them benefit from knowing the miss rate of each execution phase of a program for all inputs. In this paper, we describe a method that divides a program into phases that have a regular locality pattern. Using a regression model, it predicts the reuse signature and then the cache miss rate of each phase for all inputs. We compare the prediction with the actual measurement. The average prediction is over 98% accurate for a set of floating-point programs. The predicted CMR traces match the simulated ones despite dramatic fluctuations of the miss rate over time. This technique can be used for improving dynamic optimization, benchmarking, and compiler design.
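The paper's model predicts per-phase miss rates from reuse signatures; as a much-simplified sketch of the regression idea, the following fits a per-phase line of miss rate against log input size from a few training runs and extrapolates to an unseen input. All numbers are invented for illustration.

```python
# Toy per-phase miss-rate prediction via ordinary least squares.
# The actual paper regresses on reuse signatures; this sketch reduces
# the idea to miss-rate-vs-log10(input size) with made-up data.

import math

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Training runs for one phase: (input size, observed miss rate).
phase_runs = [(1_000, 0.020), (10_000, 0.035), (100_000, 0.050)]
xs = [math.log10(n) for n, _ in phase_runs]
ys = [mr for _, mr in phase_runs]
a, b = fit_line(xs, ys)

# Extrapolate the phase's miss rate to an unseen input size.
predicted = a + b * math.log10(1_000_000)
print(f"predicted miss rate: {predicted:.3f}")  # 0.065
```

Doing this once per phase, rather than once per program, is what lets the predicted trace follow the miss rate's fluctuation over time.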
Keywords: Loop Nest; Cache Size; Phase Sequence; Program Input; Cache Block.
Pp. 42-55
doi: 10.1007/11532378_6
Speculative Subword Register Allocation in Embedded Processors
Bengu Li; Youtao Zhang; Rajiv Gupta
Multimedia and network processing applications make extensive use of subword data. Since registers are capable of holding a full data word, when a subword variable is assigned a register, only part of the register is used. We propose an instruction set extension to the ARM embedded processor which allows two data items to reside in a register as long as each of them can be stored in 16 bits. The instructions are used by the register allocator to speculatively move the value of an otherwise spilled variable into a register which has already been assigned to another variable. The move is speculative because it only succeeds if the two values (the value already present in the register and the value being moved into the register) can be simultaneously held in the register using 16 bits each. When this value is reloaded for further use, an attempt is first made to retrieve the value from its speculatively assigned register. If this attempt succeeds, the load from memory is avoided. On average, our technique avoids 47% of dynamic reloads caused by spills.
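The packing condition the abstract describes, that two values can share a register only while each fits in 16 bits, can be sketched as follows. This is a hedged functional model in Python, not the proposed ARM instruction encoding; all names are illustrative.

```python
# Model of the speculative subword idea: two values share one 32-bit
# "register" while each fits in 16 unsigned bits; otherwise the
# speculative move fails and the value stays spilled in memory.

MASK16 = 0xFFFF

def pack(lo, hi):
    """Try to pack two values into one 32-bit word.
    Returns the packed word, or None if either needs more than 16 bits
    (the speculation fails)."""
    if not (0 <= lo <= MASK16 and 0 <= hi <= MASK16):
        return None
    return (hi << 16) | lo

def unpack_lo(word): return word & MASK16          # reload first value
def unpack_hi(word): return (word >> 16) & MASK16  # reload second value

reg = pack(0x1234, 0x00FF)          # both fit: reload avoids memory
print(hex(unpack_lo(reg)), hex(unpack_hi(reg)))  # 0x1234 0xff
print(pack(0x12345, 7))             # too wide: None, fall back to spill
```

The reload path mirrors the abstract: try the speculatively assigned register first, and only fall back to the memory load when the packed copy is unavailable.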
Keywords: Physical Register; Register Allocation; Embed Processor; Interference Graph; Speculative Pass.
Pp. 56-71
doi: 10.1007/11532378_7
Empirical Performance-Model Driven Data Layout Optimization
Qingda Lu; Xiaoyang Gao; Sriram Krishnamoorthy; Gerald Baumgartner; J. Ramanujam; P. Sadayappan
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.
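The hybrid approach the abstract describes, an analytical model whose cost components are measured rather than assumed, can be sketched in a few lines. The costs and operation counts below are invented for illustration; the paper applies the idea to data layout choices in the Tensor Contraction Engine.

```python
# Sketch of an empirically calibrated performance model: measure
# per-operation costs once (microbenchmarks), then rank candidate
# layouts by predicted time instead of executing every version.
# All numbers are made up for illustration.

# Empirically measured per-operation costs (seconds) on the target.
cost = {"flop": 1.0e-9, "stride1_load": 2.0e-9, "strided_load": 9.0e-9}

# Operation counts each candidate data layout would incur.
candidates = {
    "row-major": {"flop": 2e6, "stride1_load": 1e6,   "strided_load": 1e6},
    "blocked":   {"flop": 2e6, "stride1_load": 1.8e6, "strided_load": 0.2e6},
}

def predicted_time(counts):
    """Total time = sum of (operation count x measured unit cost)."""
    return sum(counts[op] * cost[op] for op in counts)

best = min(candidates, key=lambda k: predicted_time(candidates[k]))
for name, counts in candidates.items():
    print(f"{name}: {predicted_time(counts):.4e} s")
print("chosen layout:", best)
```

Because only the unit costs are measured, ranking many candidate layouts costs a handful of multiplications, which is the speed advantage over ATLAS-style exhaustive search that the abstract claims.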
Keywords: Execution Time; Tile Size; Matrix Multiplication Algorithm; Input Operand; Tensor Contraction.
Pp. 72-86
doi: 10.1007/11532378_8
Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays
Ganesh Bikshandi; Basilio B. Fraguela; Jia Guo; María J. Garzarán; Gheorghe Almási; José Moreira; David Padua
In this paper, we describe our experience in writing parallel numerical algorithms using Hierarchically Tiled Arrays (HTAs). HTAs are classes of objects that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to be executed by a collection of servers holding the components (tiles) of the HTAs. The tiled and recursive nature of HTAs facilitates the development of algorithms with a high degree of parallelism as well as locality. We have implemented HTAs as a MATLAB™ toolbox, overloading conventional operators and array functions such that HTA operations appear to the programmer as extensions of MATLAB™. We have successfully used it to write some widely used parallel numerical programs. The resulting programs are easier to understand and maintain than their MPI counterparts.
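The paper's toolbox is MATLAB, so the following is only a hedged analogue: a toy Python class showing the operator-overloading idea, where an array is stored as tiles and an overloaded "+" works tile by tile, the granularity at which a real HTA system would distribute work to servers.

```python
# Toy analogue of an HTA: a 1-D array stored as a list of tiles, with
# an overloaded "+" applied tile by tile. In the real system each
# tile's work could run on the server holding that tile.

class HTA:
    """A 1-D array partitioned into tiles (plain Python lists)."""
    def __init__(self, tiles):
        self.tiles = [list(t) for t in tiles]

    def __add__(self, other):
        # Tile-by-tile elementwise add: the natural unit of
        # distribution and locality in the HTA model.
        return HTA([[a + b for a, b in zip(t1, t2)]
                    for t1, t2 in zip(self.tiles, other.tiles)])

    def flatten(self):
        return [x for t in self.tiles for x in t]

x = HTA([[1, 2], [3, 4]])
y = HTA([[10, 20], [30, 40]])
print((x + y).flatten())  # [11, 22, 33, 44]
```

Overloading is what keeps the program single-threaded in appearance: `x + y` reads like ordinary array code while the tiling underneath determines where the work runs.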
Keywords: Parallel Program; Main Loop; Tile Array; Compiler Support; Legality Check.
Pp. 87-101
doi: 10.1007/11532378_9
A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces
Arun Kejariwal; Paolo D’Alberto; Alexandru Nicolau; Constantine D. Polychronopoulos
Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of rectangular iteration spaces, the problem of partitioning of non-rectangular iteration spaces – e.g. triangular, trapezoidal iteration spaces – has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
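As a simplified 2-D illustration of the geometric idea (the paper handles general N-dimensional non-rectangular spaces), the following partitions the triangular iteration space of `for i in 0..N-1: for j in 0..i-1` among P processors by choosing row boundaries so that each slab holds roughly the same number of iterations. The closed form comes from the triangle's area; function names are illustrative.

```python
# Geometric partitioning of a triangular iteration space: row i holds
# i iterations, so rows 0..i-1 hold i*(i-1)/2. Invert that to find row
# boundaries giving each of P processors ~1/P of the total work.

import math

def triangular_cuts(N, P):
    """Row boundaries; processor p owns rows cuts[p] .. cuts[p+1]-1."""
    total = N * (N - 1) // 2
    cuts, target = [0], 0.0
    for p in range(1, P):
        target += total / P
        # Smallest integer row i with i*(i-1)/2 >= target.
        i = math.ceil((1 + math.sqrt(1 + 8 * target)) / 2)
        cuts.append(min(i, N))
    cuts.append(N)
    return cuts

cuts = triangular_cuts(N=100, P=4)
work = [sum(range(cuts[p], cuts[p + 1])) for p in range(4)]
print(cuts, work)  # near-equal work per processor
```

Splitting rows evenly instead (25 rows each) would give the last processor roughly three times the work of the first two combined, which is the load imbalance a geometric partition avoids.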
Keywords: Iteration Space; Loop Nest; Index Point; Convex Polytope; Cache Line.
Pp. 102-116