Publications catalog - books

Languages and Compilers for High Performance Computing: 17th International Workshop, LCPC 2004, West Lafayette, IN, USA, September 22-24, 2004, Revised Selected Papers

Rudolf Eigenmann ; Zhiyuan Li ; Samuel P. Midkiff (eds.)

Conference: 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC). West Lafayette, IN, USA. September 22-24, 2004

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Not available.

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2005	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-28009-5

Electronic ISBN

978-3-540-31813-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

Subject coverage

Computer and information sciences

Languages and literature

Table of contents

Check whether your institution has access to download or request the full book or any of its chapters.

doi: 10.1007/11532378_30

Overflow Controlled SIMD Arithmetic

Jiahua Zhu; Hongjiang Zhang; Hui Shi; Binyu Zang; Chuanqi Zhu

Although "SIMD within a register" parallel architectures have existed for almost 10 years, automatic optimizations for such architectures are not yet well developed. Since most optimizations for SIMD architectures are transplanted from traditional vectorization techniques, many special features of SIMD architectures, such as packed operations, have not been thoroughly considered. Because operands are tightly packed within a register, there is no spare space to indicate overflow. To maintain the accuracy of automatically SIMDized programs, the operands must be unpacked to leave enough space for interim overflow, which introduces considerable overhead. Furthermore, the instructions for handling interim overflows can sometimes prevent other optimizations. In this paper, a new technique, OCSA (overflow controlled SIMD arithmetic), is proposed to reduce the negative effects caused by interim overflow handling and eliminate the interference of interim overflows. We have applied our algorithm to the Berkeley multimedia benchmarks. The experimental results show that the OCSA algorithm can significantly improve the performance of ADPCM-Decoder (110%), MESA-Reflect (113%) and DJVU-Encoder (106%).

Pp. 424-438
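The interim-overflow problem described in the abstract of "Overflow Controlled SIMD Arithmetic" can be illustrated with a small simulation. The sketch below is not the OCSA algorithm; it only assumes four 8-bit lanes packed into a 32-bit word and compares a per-lane average computed on packed operands with the same average computed on widened (unpacked) operands. All function names are hypothetical.

```python
LANES = 4  # assumed: four 8-bit lanes in one 32-bit word


def pack8(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 32-bit word back into four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def packed_add8(x, y):
    """Lane-wise 8-bit add with wraparound; carries never cross lanes."""
    # Add the low 7 bits of every lane, then restore the top bit with XOR.
    low = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    return (low ^ ((x ^ y) & 0x80808080)) & 0xFFFFFFFF


def average_packed(a, b):
    """(a + b) >> 1 per lane, with the interim sum confined to 8 bits."""
    total = packed_add8(pack8(a), pack8(b))
    return [lane >> 1 for lane in unpack8(total)]


def average_unpacked(a, b):
    """Same average, but each interim sum is held in a wider value."""
    return [((x + y) >> 1) & 0xFF for x, y in zip(a, b)]


a = [200, 10, 255, 128]
b = [100, 20, 255, 128]
print(average_packed(a, b))    # lanes whose sum exceeds 255 come out wrong
print(average_unpacked(a, b))  # widened lanes keep the carry bit
```

Only the lanes whose interim sum fits in 8 bits agree between the two versions, which is why a SIMDizing compiler has to either widen the operands or prove that the overflow cannot occur.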

doi: 10.1007/11532378_31

Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures

Patrick Carribault; Christophe Lemuet; Jean-Thomas Acquaviva; Albert Cohen; William Jalby

Branch predictors are associated with critical design issues for today's instruction-hungry processors. We study two important domains where the optimization of decision trees (implemented through switch-case or nested if-then-else constructs) makes the precise modeling of these hardware mechanisms decisive for performance: compute-intensive libraries with versioning and cloning, and high-performance interpreters. Contrary to common belief, the complexity of recent microarchitectures does not necessarily hamper the design of accurate cost models in the special case of decision trees. We build a simple model that illustrates why decision tree performance is predictable. Based on this model, we compare the most significant code generation strategies on the Itanium2 processor. We show that no strategy dominates in all cases and that, although they used to be penalized by traditional superscalar processors, indirect branches regain a lot of interest in the context of predicated execution and delayed branches. We validate our study with an improvement of 15% to 40% over the Intel ICC compiler for a DAXPY code focused on short vectors.

Keywords: Target Address; Branch Register; Branch Prediction; Branch Instruction; Switch Structure.

Pp. 439-454
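The abstract of "Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures" compares code generation strategies for switch-like decision trees. As a rough illustration of the shapes involved, the sketch below writes the same four-way decision as a linear if-then-else chain, a balanced comparison tree, and a table lookup (the software analogue of an indirect branch). Python does not expose branch hardware, so this shows only the structure of each strategy, not its cost; the case labels and function names are hypothetical.

```python
CASES = ["add", "sub", "mul", "div"]  # hypothetical case labels 0..3


def dispatch_linear(op):
    """Linear chain: one comparison per case, tested in order."""
    if op == 0:
        return "add"
    elif op == 1:
        return "sub"
    elif op == 2:
        return "mul"
    elif op == 3:
        return "div"
    return "default"


def dispatch_binary(op):
    """Balanced tree: halve the candidate range at each level."""
    if op < 2:
        if op == 0:
            return "add"
        if op == 1:
            return "sub"
    else:
        if op == 2:
            return "mul"
        if op == 3:
            return "div"
    return "default"


def dispatch_table(op):
    """Table lookup: one bounds check, then an indexed jump."""
    if 0 <= op < len(CASES):
        return CASES[op]
    return "default"


for op in range(-1, 5):
    print(op, dispatch_linear(op), dispatch_binary(op), dispatch_table(op))
```

All three functions return the same result for every integer input; they differ only in how many comparisons and which kind of branch a compiled equivalent would execute.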

doi: 10.1007/11532378_32

Extending the Applicability of Scalar Replacement to Multiple Induction Variables

Nastaran Baradaran; Pedro C. Diniz; Joonseok Park

Scalar replacement, or register promotion, uses scalar variables to save data that can be reused across loop iterations, reducing the number of memory operations at the expense of a possibly large number of registers. In this paper we present a compiler data reuse analysis capable of uncovering and exploiting reuse opportunities for array references that exhibit Multiple-Induction-Variable (MIV) subscripts, beyond the reach of current data reuse analysis techniques. We present experimental results of the application of scalar replacement to a sample set of kernel codes targeting a programmable hardware computing device, a Field-Programmable Gate Array (FPGA). The results show that, for memory-bound designs, scalar replacement alone leads to speedups that range from 2x to 6x at the expense of an increase in the FPGA design area in the range of 6x to 20x.

Keywords: Loop Nest; Loop Level; Array Variable; Memory Operation; Array Reference.

Pp. 455-469
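The trade-off named in the abstract of "Extending the Applicability of Scalar Replacement to Multiple Induction Variables" (fewer memory operations, more registers) can be sketched on a reference of the form `b[i + j]`, whose subscript depends on two induction variables. This is a hand-written illustration under assumed names, not the paper's reuse analysis: the hypothetical `CountingArray` class counts reads, and a small window of scalars stands in for registers.

```python
class CountingArray:
    """Read-only array wrapper that counts element loads (hypothetical helper)."""

    def __init__(self, data):
        self.data = list(data)
        self.loads = 0

    def __getitem__(self, i):
        self.loads += 1
        return self.data[i]


def fir_naive(b, w, n):
    """out[i] = sum_j w[j] * b[i + j]; every b[i + j] is loaded from memory."""
    out = []
    for i in range(n):
        acc = 0
        for j in range(len(w)):
            acc += w[j] * b[i + j]
        out.append(acc)
    return out


def fir_scalar_replaced(b, w, n):
    """Same result; b values are kept in a window of len(w) 'registers'."""
    m = len(w)
    window = [b[j] for j in range(m)]  # initial loads fill the registers
    out = []
    for i in range(n):
        out.append(sum(wj * x for wj, x in zip(w, window)))
        if i + 1 < n:
            # b[(i+1) + j] equals b[i + (j+1)], so shift and load one new value.
            window = window[1:] + [b[i + m]]
    return out


weights = [1, 2, 3]
naive_src = CountingArray(range(10))
reuse_src = CountingArray(range(10))
naive_out = fir_naive(naive_src, weights, 8)
reuse_out = fir_scalar_replaced(reuse_src, weights, 8)
print(naive_src.loads, reuse_src.loads)
```

With 8 outer iterations and 3 weights, the naive loop issues one load per inner iteration, while the scalar-replaced loop loads each element of `b` only once, at the cost of holding three values live across iterations.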

doi: 10.1007/11532378_33

Power-Aware Scheduling for Parallel Security Processors with Analytical Models

Yung-Chia Lin; Yi-Ping You; Chung-Wen Huang; Jenq-Kuen Lee; Wei-Kuan Shih; Ting-Ting Hwang

Techniques to reduce power dissipation in embedded systems have recently come into sharp focus in technology development. Among these techniques, dynamic voltage scaling (DVS), power gating (PG), and multiple-domain partitioning are regarded as effective schemes to reduce dynamic and static power. In this paper, we investigate the problem of power-aware scheduling of tasks running on a scalable encryption processor, which is equipped with heterogeneous distributed SOC designs and requires the effective integration of DVS, PG, and scheduling across correlated multiple-domain resources. We propose a novel heuristic that integrates the use of DVS and PG and increases the total energy saving. Furthermore, we propose an analytical-model approach to estimate the performance and energy requirements of the different components in the system. These techniques are essential for performing DVS and PG on correlated multiple-domain resources. Experiments are conducted in prototype environments for our security processors, and the results show that our algorithms achieve significant energy reductions.

Keywords: Slack Time; Dynamic Voltage Scaling; Schedule Result; Power Gating; Security Processor.

Pp. 470-484