Publications catalog - books
Languages and Compilers for High Performance Computing: 17th International Workshop, LCPC 2004, West Lafayette, IN, USA, September 22-24, 2004, Revised Selected Papers
Rudolf Eigenmann; Zhiyuan Li; Samuel P. Midkiff (eds.)
Conference: 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC), West Lafayette, IN, USA, September 22-24, 2004
Abstract/description - provided by the publisher
Not available.
Keywords - provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28009-5
Electronic ISBN
978-3-540-31813-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Subject coverage
Table of contents
doi: 10.1007/11532378
Languages and Compilers for High Performance Computing
Rudolf Eigenmann; Zhiyuan Li; Samuel P. Midkiff (eds.)
Pp. not available
doi: 10.1007/11532378_1
Experiences in Using Cetus for Source-to-Source Transformations
Troy A. Johnson; Sang-Ik Lee; Long Fei; Ayon Basumallik; Gautam Upadhyaya; Rudolf Eigenmann; Samuel P. Midkiff
Cetus is a compiler infrastructure for the source-to-source transformation of programs. Since its creation nearly three years ago, it has grown to over 12,000 lines of Java code, been made available publicly on the web, and become a basis for several research projects. We discuss our experience using Cetus for a selection of these research projects. The focus of this paper is not the projects themselves, but rather how Cetus made these projects possible, how the needs of these projects influenced the development of Cetus, and the solutions we applied to problems we encountered with the infrastructure. We believe the research community can benefit from such a discussion, as shown by the strong interest in the mini-workshop on compiler research infrastructures where some of this information was first presented.
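Cetus itself is written in Java and this abstract does not show its API; as a hedged illustration of what a source-to-source rewriting pass does, here is a minimal Python sketch over a hypothetical two-node expression IR, with a constant-folding pass that rewrites the tree and unparses it back to source text.

```python
# Toy source-to-source pass: a tiny expression IR with a bottom-up
# constant-folding rewrite. Illustrative only; not Cetus's Java API.

from dataclasses import dataclass

@dataclass
class Num:
    value: int
    def unparse(self): return str(self.value)

@dataclass
class Add:
    left: object
    right: object
    def unparse(self): return f"({self.left.unparse()} + {self.right.unparse()})"

def fold_constants(node):
    """Rewrite Add(Num, Num) subtrees into a single Num, bottom-up."""
    if isinstance(node, Add):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return node

tree = Add(Add(Num(1), Num(2)), Add(Num(3), Num(4)))
print(tree.unparse())                  # ((1 + 2) + (3 + 4))
print(fold_constants(tree).unparse())  # 10
```

A real infrastructure like Cetus provides the parser, symbol tables, and traversal machinery around such passes; the rewrite-and-unparse cycle is the part this sketch shows.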
Keywords: Intermediate Representation; Call Graph; Analysis Pass; Shared Memory System; Local Symbol.
Pp. 1-14
doi: 10.1007/11532378_2
The LLVM Compiler Framework and Infrastructure Tutorial
Chris Lattner; Vikram Adve
The LLVM Compiler Infrastructure (http://llvm.cs.uiuc.edu) is a robust system that is well suited for a wide variety of research and development work. This brief paper introduces the LLVM system and provides pointers to more extensive documentation, complementing the tutorial presented at LCPC.
Pp. 15-16
doi: 10.1007/11532378_3
An Overview of the Open Research Compiler
Chengyong Wu; Ruiqi Lian; Junchao Zhang; Roy Ju; Sun Chan; Lixia Liu; Xiaobing Feng; Zhaoqing Zhang
The Open Research Compiler (ORC), jointly developed by Intel Microprocessor Technology Labs and the Institute of Computing Technology at the Chinese Academy of Sciences, has become the leading open source compiler for the Itanium™ Processor Family (IPF, previously known as IA-64). Since its first release in 2002, it has been widely used in academia and industry worldwide as a compiler and architecture research infrastructure and as a code base for further development. In this paper, we present an overview of the design of the major components in ORC, especially the new features in the code generator. We discuss the development methodology that was important to achieving the objectives of ORC. Performance comparisons with other IPF compilers and a brief summary of the research work based on ORC are also presented.
Keywords: Control Flow Graph; Register Allocation; Instruction Schedule; Program Language Design; VLIW Processor.
Pp. 17-31
doi: 10.1007/11532378_4
Trimaran: An Infrastructure for Research in Instruction-Level Parallelism
Lakshmi N. Chakrapani; John Gyllenhaal; Wen-mei W. Hwu; Scott A. Mahlke; Krishna V. Palem; Rodric M. Rabbah
Trimaran is an integrated compilation and performance monitoring infrastructure. The architecture space that Trimaran covers is characterized by HPL-PD, a parameterized processor architecture supporting novel features such as predication, control and data speculation, and compiler-controlled management of the memory hierarchy. Trimaran also includes a full suite of analysis and optimization modules, as well as a graph-based intermediate language. Optimization and analysis modules can be easily added, deleted, or bypassed, thus facilitating compiler optimization research. Similarly, computer architecture research can be conducted by varying the HPL-PD machine via the machine description language HMDES. Trimaran also provides a detailed simulation environment and a flexible performance monitoring environment that automatically tracks the machine as it is varied.
Keywords: Memory Hierarchy; Intermediate Representation; Design Space Exploration; Register Allocation; Software Pipeline.
Pp. 32-41
doi: 10.1007/11532378_5
Phase-Based Miss Rate Prediction Across Program Inputs
Xipeng Shen; Yutao Zhong; Chen Ding
Previous work shows the possibility of predicting the cache miss rate (CMR) for all inputs of a program. However, most optimization techniques need to know more than the miss rate of the whole program. Many of them benefit from knowing the miss rate of each execution phase of a program for all inputs. In this paper, we describe a method that divides a program into phases that have a regular locality pattern. Using a regression model, it predicts the reuse signature and then the cache miss rate of each phase for all inputs. We compare the prediction with the actual measurement. The average prediction is over 98% accurate for a set of floating-point programs. The predicted CMR traces match the simulated ones despite dramatic fluctuations of the miss rate over time. This technique can be used for improving dynamic optimization, benchmarking, and compiler design.
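The paper's model predicts per-phase miss rates from reuse signatures; as a much-simplified sketch of the regression idea, the following fits a per-phase line of miss rate against log input size from a few training runs and extrapolates to an unseen input. All numbers are invented for illustration.

```python
# Toy per-phase miss-rate prediction via ordinary least squares.
# The actual paper regresses on reuse signatures; this sketch reduces
# the idea to miss-rate-vs-log10(input size) with made-up data.

import math

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Training runs for one phase: (input size, observed miss rate).
phase_runs = [(1_000, 0.020), (10_000, 0.035), (100_000, 0.050)]
xs = [math.log10(n) for n, _ in phase_runs]
ys = [mr for _, mr in phase_runs]
a, b = fit_line(xs, ys)

# Extrapolate the phase's miss rate to an unseen input size.
predicted = a + b * math.log10(1_000_000)
print(f"predicted miss rate: {predicted:.3f}")  # 0.065
```

Doing this once per phase, rather than once per program, is what lets the predicted trace follow the miss rate's fluctuation over time.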
Keywords: Loop Nest; Cache Size; Phase Sequence; Program Input; Cache Block.
Pp. 42-55
doi: 10.1007/11532378_6
Speculative Subword Register Allocation in Embedded Processors
Bengu Li; Youtao Zhang; Rajiv Gupta
Multimedia and network processing applications make extensive use of subword data. Since registers are capable of holding a full data word, when a subword variable is assigned a register, only part of the register is used. We propose an instruction set extension to the ARM embedded processor which allows two data items to reside in a register as long as each of them can be stored in 16 bits. The instructions are used by the register allocator to speculatively move the value of an otherwise spilled variable into a register which has already been assigned to another variable. The move is speculative because it only succeeds if the two values (the value already present in the register and the value being moved into the register) can be simultaneously held in the register using 16 bits each. When this value is reloaded for further use, an attempt is first made to retrieve the value from its speculatively assigned register. If this attempt succeeds, the load from memory is avoided. On average, our technique avoids 47% of dynamic reloads caused by spills.
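The packing condition the abstract describes, that two values can share a register only while each fits in 16 bits, can be sketched as follows. This is a hedged functional model in Python, not the proposed ARM instruction encoding; all names are illustrative.

```python
# Model of the speculative subword idea: two values share one 32-bit
# "register" while each fits in 16 unsigned bits; otherwise the
# speculative move fails and the value stays spilled in memory.

MASK16 = 0xFFFF

def pack(lo, hi):
    """Try to pack two values into one 32-bit word.
    Returns the packed word, or None if either needs more than 16 bits
    (the speculation fails)."""
    if not (0 <= lo <= MASK16 and 0 <= hi <= MASK16):
        return None
    return (hi << 16) | lo

def unpack_lo(word): return word & MASK16          # reload first value
def unpack_hi(word): return (word >> 16) & MASK16  # reload second value

reg = pack(0x1234, 0x00FF)          # both fit: reload avoids memory
print(hex(unpack_lo(reg)), hex(unpack_hi(reg)))  # 0x1234 0xff
print(pack(0x12345, 7))             # too wide: None, fall back to spill
```

The reload path mirrors the abstract: try the speculatively assigned register first, and only fall back to the memory load when the packed copy is unavailable.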
Keywords: Physical Register; Register Allocation; Embed Processor; Interference Graph; Speculative Pass.
Pp. 56-71
doi: 10.1007/11532378_7
Empirical Performance-Model Driven Data Layout Optimization
Qingda Lu; Xiaoyang Gao; Sriram Krishnamoorthy; Gerald Baumgartner; J. Ramanujam; P. Sadayappan
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.
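The hybrid approach the abstract describes, an analytical model whose cost components are measured rather than assumed, can be sketched in a few lines. The costs and operation counts below are invented for illustration; the paper applies the idea to data layout choices in the Tensor Contraction Engine.

```python
# Sketch of an empirically calibrated performance model: measure
# per-operation costs once (microbenchmarks), then rank candidate
# layouts by predicted time instead of executing every version.
# All numbers are made up for illustration.

# Empirically measured per-operation costs (seconds) on the target.
cost = {"flop": 1.0e-9, "stride1_load": 2.0e-9, "strided_load": 9.0e-9}

# Operation counts each candidate data layout would incur.
candidates = {
    "row-major": {"flop": 2e6, "stride1_load": 1e6,   "strided_load": 1e6},
    "blocked":   {"flop": 2e6, "stride1_load": 1.8e6, "strided_load": 0.2e6},
}

def predicted_time(counts):
    """Total time = sum of (operation count x measured unit cost)."""
    return sum(counts[op] * cost[op] for op in counts)

best = min(candidates, key=lambda k: predicted_time(candidates[k]))
for name, counts in candidates.items():
    print(f"{name}: {predicted_time(counts):.4e} s")
print("chosen layout:", best)
```

Because only the unit costs are measured, ranking many candidate layouts costs a handful of multiplications, which is the speed advantage over ATLAS-style exhaustive search that the abstract claims.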
Keywords: Execution Time; Tile Size; Matrix Multiplication Algorithm; Input Operand; Tensor Contraction.
Pp. 72-86
doi: 10.1007/11532378_8
Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays
Ganesh Bikshandi; Basilio B. Fraguela; Jia Guo; María J. Garzarán; Gheorghe Almási; José Moreira; David Padua
In this paper, we describe our experience in writing parallel numerical algorithms using Hierarchically Tiled Arrays (HTAs). HTAs are classes of objects that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to be executed by a collection of servers holding the components (tiles) of the HTAs. The tiled and recursive nature of HTAs facilitates the development of algorithms with a high degree of parallelism as well as locality. We have implemented HTAs as a MATLAB™ toolbox, overloading conventional operators and array functions such that HTA operations appear to the programmer as extensions of MATLAB™. We have successfully used it to write some widely used parallel numerical programs. The resulting programs are easier to understand and maintain than their MPI counterparts.
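The paper's toolbox is MATLAB, so the following is only a hedged analogue: a toy Python class showing the operator-overloading idea, where an array is stored as tiles and an overloaded "+" works tile by tile, the granularity at which a real HTA system would distribute work to servers.

```python
# Toy analogue of an HTA: a 1-D array stored as a list of tiles, with
# an overloaded "+" applied tile by tile. In the real system each
# tile's work could run on the server holding that tile.

class HTA:
    """A 1-D array partitioned into tiles (plain Python lists)."""
    def __init__(self, tiles):
        self.tiles = [list(t) for t in tiles]

    def __add__(self, other):
        # Tile-by-tile elementwise add: the natural unit of
        # distribution and locality in the HTA model.
        return HTA([[a + b for a, b in zip(t1, t2)]
                    for t1, t2 in zip(self.tiles, other.tiles)])

    def flatten(self):
        return [x for t in self.tiles for x in t]

x = HTA([[1, 2], [3, 4]])
y = HTA([[10, 20], [30, 40]])
print((x + y).flatten())  # [11, 22, 33, 44]
```

Overloading is what keeps the program single-threaded in appearance: `x + y` reads like ordinary array code while the tiling underneath determines where the work runs.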
Keywords: Parallel Program; Main Loop; Tile Array; Compiler Support; Legality Check.
Pp. 87-101
doi: 10.1007/11532378_9
A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces
Arun Kejariwal; Paolo D’Alberto; Alexandru Nicolau; Constantine D. Polychronopoulos
Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during parallel execution of a nested loop directly depend on partitioning, i.e., the way the different iterations of a parallel loop are distributed across different processors. Thus, partitioning of parallel loops is of key importance for high performance and efficient use of multiprocessor systems. Although a significant amount of work has been done in partitioning and scheduling of rectangular iteration spaces, the problem of partitioning of non-rectangular iteration spaces – e.g. triangular, trapezoidal iteration spaces – has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
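As a simplified 2-D illustration of the geometric idea (the paper handles general N-dimensional non-rectangular spaces), the following partitions the triangular iteration space of `for i in 0..N-1: for j in 0..i-1` among P processors by choosing row boundaries so that each slab holds roughly the same number of iterations. The closed form comes from the triangle's area; function names are illustrative.

```python
# Geometric partitioning of a triangular iteration space: row i holds
# i iterations, so rows 0..i-1 hold i*(i-1)/2. Invert that to find row
# boundaries giving each of P processors ~1/P of the total work.

import math

def triangular_cuts(N, P):
    """Row boundaries; processor p owns rows cuts[p] .. cuts[p+1]-1."""
    total = N * (N - 1) // 2
    cuts, target = [0], 0.0
    for p in range(1, P):
        target += total / P
        # Smallest integer row i with i*(i-1)/2 >= target.
        i = math.ceil((1 + math.sqrt(1 + 8 * target)) / 2)
        cuts.append(min(i, N))
    cuts.append(N)
    return cuts

cuts = triangular_cuts(N=100, P=4)
work = [sum(range(cuts[p], cuts[p + 1])) for p in range(4)]
print(cuts, work)  # near-equal work per processor
```

Splitting rows evenly instead (25 rows each) would give the last processor roughly three times the work of the first two combined, which is the load imbalance a geometric partition avoids.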
Keywords: Iteration Space; Loop Nest; Index Point; Convex Polytope; Cache Line.
Pp. 102-116