Publications catalog - books

Advances in Computer Systems Architecture: 11th Asia-Pacific Conference, ACSAC 2006, Shanghai, China, September 6-8, 2006, Proceedings

Chris Jesshope; Colin Egan (eds.)

Conference: 11th Asia-Pacific Conference on Advances in Computer Systems Architecture (ACSAC). Shanghai, China. September 6-8, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Arithmetic and Logic Structures; Input/Output and Data Communications; Logic Design; Computer Communication Networks; Processor Architectures

Availability
Institution detected: none detected
Publication year: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-40056-1

Electronic ISBN

978-3-540-40058-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

The Era of Multi-core Chips – A Fresh Look on Software Challenges

Guang R. Gao

In the past few months, the world has witnessed the impressive pace at which microprocessor chip vendors are switching to multi-core chip technology. This shift, however, is presenting steep software challenges – both in the migration of application software and in the adaptation of system software. In this talk, we discuss the challenges as well as the opportunities facing software technology in the era of emerging multi-core chips. We review the failures of software efforts and the lessons learned during the boom years of parallel computing in the 1980s and early 1990s, and analyze the issues and challenges we face today as we once more try to exploit large-scale parallelism on multi-core chips and systems. We then predict the major technology innovations that should be made to ensure success this time. The talk begins with a discussion based on our own experience working with fine-grain multithreading, spanning execution/architecture models, system software technology, and relevant application software studies over the past decade. We then outline our recent experience with software issues for next-generation multi-core chip architectures. We present a case study of mapping OpenMP onto two representative classes of future multi-core architecture models, and discuss several fundamental performance issues facing system software designers.

Keywords: Software Technology; Recent Experience; Parallel Computing; Application Software; Data Communication.

Pp. 1-1

Streaming Networks for Coordinating Data-Parallel Programs (Position Statement)

Alex Shafarenko

A new coordination language for distributed data-parallel programs, called SNet, is presented. The intention of SNet is to introduce advanced structuring techniques into a coordination language: stream processing and various forms of subtyping. The talk presents the organisation of SNet and its major type inference algorithms, and briefly discusses the current state of the implementation and possible applications (a small illustrative sketch follows this entry).

Keywords: Stream Processing; Stream Network; Architecture Initiative; Single Stream; Computation Language.

Pp. 2-5
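
To make the coordination idea above concrete, here is a minimal Python sketch (not SNet syntax or semantics) of data-parallel "boxes" connected only by streams of records. The record fields, box names, and the use of generators are illustrative assumptions.

    from typing import Iterable, Iterator

    def source(n: int) -> Iterator[dict]:
        # produce records on the input stream
        for i in range(n):
            yield {"payload": list(range(i, i + 4))}

    def box_square(stream: Iterable[dict]) -> Iterator[dict]:
        # a data-parallel "box": applies its kernel to every record it sees
        for rec in stream:
            yield {**rec, "payload": [x * x for x in rec["payload"]]}

    def box_sum(stream: Iterable[dict]) -> Iterator[dict]:
        # adds a field to each record, a crude nod to record subtyping
        for rec in stream:
            yield {**rec, "total": sum(rec["payload"])}

    if __name__ == "__main__":
        # the "network" is just a composition of boxes over streams;
        # the boxes themselves know nothing about each other
        for rec in box_sum(box_square(source(3))):
            print(rec)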

Implementations of Square-Root and Exponential Functions for Large FPGAs

Mariusz Bajger; Amos R. Omondi

This paper discusses low-error, high-speed evaluation of two elementary functions: square root (which is required by the IEEE-754 standard on computer arithmetic) and exponential (which is common in scientific calculations). The basis of the proposed implementations is piecewise-linear interpolation, but with the constants chosen in a way that minimizes relative error. We show that by placing certain constraints on the errors at three points within each interpolation interval, relative errors are greatly reduced. The implementation targets are large FPGAs that have built-in multipliers, adders, and distributed memory. (A toy sketch of the interpolation idea follows this entry.)

Pp. 6-23
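
As a rough illustration of the piecewise-linear idea described in the entry above, the Python sketch below approximates sqrt(x) on [1, 2) with per-segment constants and then nudges each intercept so the error is roughly balanced at the segment endpoints and midpoint. The segment count, the balancing rule, and the software (rather than FPGA) setting are assumptions for illustration, not the authors' construction.

    import math

    SEGMENTS = 64                      # number of linear segments over [1.0, 2.0) (assumed)
    STEP = 1.0 / SEGMENTS

    def build_tables():
        # per-segment constants (a, b) so that sqrt(x) ~ a + b * (x - x0)
        tables = []
        for i in range(SEGMENTS):
            x0, x1 = 1.0 + i * STEP, 1.0 + (i + 1) * STEP
            b = (math.sqrt(x1) - math.sqrt(x0)) / (x1 - x0)   # endpoint interpolation
            a = math.sqrt(x0)
            # shift the intercept to roughly balance the error at the endpoints
            # and the midpoint: a crude stand-in for the three-point constraint
            # described in the abstract
            xm = 0.5 * (x0 + x1)
            err_mid = (a + b * (xm - x0)) / math.sqrt(xm) - 1.0
            a -= 0.5 * err_mid * math.sqrt(xm)
            tables.append((x0, a, b))
        return tables

    def sqrt_approx(x, tables):
        i = min(int((x - 1.0) * SEGMENTS), SEGMENTS - 1)
        x0, a, b = tables[i]
        return a + b * (x - x0)

    if __name__ == "__main__":
        tables = build_tables()
        worst = max(abs(sqrt_approx(1.0 + k / 100000.0, tables)
                        / math.sqrt(1.0 + k / 100000.0) - 1.0)
                    for k in range(100000))
        print(f"max relative error over [1, 2): {worst:.2e}")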

Using Branch Prediction Information for Near-Optimal I-Cache Leakage

Sung Woo Chung; Kevin Skadron

This paper describes a new on-demand wakeup prediction policy for instruction-cache leakage control that achieves better leakage savings than prior policies while avoiding their performance overheads. The proposed policy reduces leakage energy by more than 92% with less than 0.3% performance overhead on average. The key to this new on-demand policy is to use branch prediction information for the wakeup prediction. Inserting an extra wakeup stage between branch prediction and fetch allows the branch predictor to also serve as the wakeup predictor without any additional hardware; the extra stage thus hides the wakeup penalty without affecting branch prediction accuracy. Although extra pipeline stages typically add to the branch misprediction penalty, in this case the extra wakeup stage on the normal fetch path can be overlapped with misprediction recovery. With such consistently accurate wakeup prediction, all cache lines except the next expected cache line are kept in leakage-saving mode, minimizing leakage energy. (A toy model of the wakeup idea follows this entry.)

Keywords: Instruction Cache; Low Power; Leakage; Drowsy Cache; Branch Prediction.

Pp. 24-37
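
A toy Python model of the wakeup idea in the entry above: only the line named by the (branch/next-line) predictor for the next fetch is kept awake, and a wrong guess costs an extra wakeup cycle. The line size, wakeup latency, misprediction rate, and the purely sequential trace are invented for illustration; the model also ignores the overlap with misprediction recovery that the paper exploits.

    import random

    LINE_WORDS = 8            # instructions per I-cache line (assumed)
    WAKE_LATENCY = 1          # cycles to wake a drowsy line (assumed)

    def next_line_predictor(pc):
        # stand-in for the branch predictor: mostly sequential, sometimes wrong,
        # to mimic mispredicted fetch targets
        return pc + 1 if random.random() > 0.05 else pc + random.randint(2, 40)

    def simulate(trace, predictor):
        # only the line the predictor named for the next fetch is awake;
        # fetching from any other line pays an extra wakeup delay
        awake, cycles = None, 0
        for pc in trace:
            cycles += 1
            if pc // LINE_WORDS != awake:
                cycles += WAKE_LATENCY            # wrong (or no) line was woken
            awake = predictor(pc) // LINE_WORDS   # pre-wake the predicted next line
        return cycles

    if __name__ == "__main__":
        random.seed(0)
        trace = list(range(100000))               # purely sequential toy fetch trace
        base = len(trace)                         # ideal cycle count without wakeups
        got = simulate(trace, next_line_predictor)
        print(f"wakeup overhead: {(got - base) / base:.2%}")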

Scientific Computing Applications on the Imagine Stream Processor

Jing Du; Xuejun Yang; Guibin Wang; Fujiang Ao

The Imagine processor is designed to address the processor-memory gap through streaming technology. Good performance has been demonstrated for most media applications on Imagine; however, whether scientific computing applications are suited to Imagine remains an open question. In this paper, we study some key issues in mapping scientific computing applications to Imagine and present experimental results for several representative scientific computing applications on the ISIM simulator of Imagine. By evaluating these results, we isolate the set of scientific computing application characteristics that are well suited to the Imagine architecture, analyze the performance potential of scientific computing applications on Imagine compared with a conventional processor, and explore optimizations for scientific stream programs.

Keywords: scientific computing application; Imagine; stream; three level parallelism; multinest.

Pp. 38-51

Enhancing Last-Level Cache Performance by Block Bypassing and Early Miss Determination

Haakon Dybdahl; Per Stenström

While bypassing algorithms have been applied to first-level caches, we study for the first time their effectiveness for last-level caches, where miss penalties are significantly higher and algorithm complexity is not constrained by the speed of the pipeline. Our algorithm monitors the reuse behavior of blocks that are touched by delinquent loads and re-classifies them on the fly. Blocks classified as bypassed are installed only in the level-1 cache. We also leverage the algorithm to send out early miss requests for loads expected to access blocks classified as bypassed; such requests are sent to memory directly, without tag checks at intermediate levels of the cache hierarchy. Overall, we find that we can robustly reduce the miss rate by 23% and improve IPC by 14% on average for memory-bound SPEC2000 applications, without degrading the performance of the other SPEC2000 applications. (A small sketch of the classification idea follows this entry.)

Keywords: Main Memory; Cache Line; Shadow Instruction; Cache Block; Reuse Distance.

Pp. 52-66
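
A small Python sketch in the spirit of the entry above: a per-load-PC monitor counts fills that are evicted from the last-level cache without reuse and, past a threshold, classifies that load's blocks as bypass candidates (to be installed only in L1). The threshold, the PC-indexed table, and the example access pattern are assumptions; the paper's actual classification and early-miss mechanism may differ.

    from collections import defaultdict

    BYPASS_THRESHOLD = 4     # consecutive no-reuse evictions before bypassing (assumed)

    class BypassPredictor:
        # per-load-PC reuse monitor: blocks whose allocating load rarely sees
        # reuse in the last-level cache are classified as bypass candidates
        def __init__(self):
            self.no_reuse = defaultdict(int)      # load PC -> consecutive dead fills

        def should_bypass(self, load_pc):
            return self.no_reuse[load_pc] >= BYPASS_THRESHOLD

        def on_eviction(self, load_pc, was_reused):
            if was_reused:
                self.no_reuse[load_pc] = 0        # re-classify on the fly
            else:
                self.no_reuse[load_pc] += 1

    if __name__ == "__main__":
        pred = BypassPredictor()
        # a streaming load (0x400) whose blocks are never reused before eviction,
        # and a load (0x500) whose blocks are reused
        for _ in range(6):
            pred.on_eviction(0x400, was_reused=False)
            pred.on_eviction(0x500, was_reused=True)
        print(hex(0x400), "bypass?", pred.should_bypass(0x400))   # True
        print(hex(0x500), "bypass?", pred.should_bypass(0x500))   # False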

A Study of the Performance Potential for Dynamic Instruction Hints Selection

Rao Fu; Jiwei Lu; Antonia Zhai; Wei-Chung Hsu

Instruction hints have become an important way to communicate compile-time information to the hardware. They can be generated by the compiler and the post-link optimizer to reduce cache misses, improve branch prediction and minimize other performance bottlenecks. This paper discusses different instruction hints available on modern processor architectures and shows the potential performance impact on many benchmark programs. Some hints can be effectively selected at compile time with profile feedback. However, since the same program executable can behave differently on various inputs and performance bottlenecks may change on different micro-architectures, significant performance opportunities can be exploited by selecting instruction hints dynamically.

Keywords: Data Cache; Cache Line; Normalize Execution Time; Performance Bottleneck; Memory Instruction.

Pp. 67-80

Reorganizing UNIX for Reliability

Jorrit N. Herder; Herbert Bos; Ben Gras; Philip Homburg; Andrew S. Tanenbaum

In this paper, we discuss the architecture of a modular UNIX-compatible operating system, MINIX 3, that provides reliability beyond that of most other systems. With nearly the entire operating system running as a set of user-mode servers and drivers atop a minimal kernel, the system is fully compartmentalized. By moving most of the code to unprivileged user-mode processes and restricting the powers of each one, we gain proper fault isolation and limit the damage bugs can do. Moreover, the system has been designed to survive and automatically recover from failures in critical modules, such as device drivers, transparently to applications and without user intervention (a toy sketch of this recovery loop follows this entry). We used this new design to develop a highly reliable, open-source, POSIX-conformant member of the UNIX family. The resulting system is freely available and has been downloaded over 75,000 times since its release.

Keywords: Virtual Machine; Process Manager; Memory Manager; Device Driver; File Server.

Pp. 81-94
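
The automatic-recovery behaviour described above can be pictured with a toy user-space supervisor loop in Python: a "driver" runs as an ordinary process and is restarted whenever it dies. The restart limit, back-off, and the stand-in crashing command are illustrative; this is not MINIX 3's reincarnation mechanism, only a sketch of the idea.

    import subprocess
    import sys
    import time

    def supervise(cmd, max_restarts=5):
        # run a user-mode "driver" as an ordinary process and transparently
        # restart it whenever it exits with a non-zero status
        restarts = 0
        while restarts <= max_restarts:
            proc = subprocess.Popen(cmd)
            proc.wait()
            if proc.returncode == 0:
                return                            # clean exit, nothing to recover
            restarts += 1
            print(f"driver died (rc={proc.returncode}); restart #{restarts}")
            time.sleep(0.1)                       # brief back-off before recovery

    if __name__ == "__main__":
        # stand-in "driver" that crashes about half the time (illustrative only)
        supervise([sys.executable, "-c",
                   "import random, sys; sys.exit(random.choice([0, 1]))"])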

Critical-Task Anticipation Scheduling Algorithm for Heterogeneous and Grid Computing

Ching-Hsien Hsu; Ming-Yuan Own; Kuan-Ching Li

The problem of scheduling a weighted directed acyclic graph (DAG) onto a set of heterogeneous processors to minimize completion time has recently received much study. The NP-completeness of the problem has led researchers to propose various heuristic algorithms. In this paper, we present an efficient Critical-task Anticipation (CA) scheduling algorithm for heterogeneous computing systems. The CA scheduling algorithm introduces a new task-prioritizing scheme, based on the urgency and importance of tasks, that obtains better schedule lengths than the Heterogeneous Earliest Finish Time (HEFT) algorithm. To evaluate the proposed algorithm, we developed a simulator containing a parametric graph generator that produces weighted directed acyclic graphs with various characteristics, and implemented the CA algorithm alongside HEFT on it. The CA algorithm is shown to be effective in terms of speedup and easy to implement. (A toy HEFT-style scheduler is sketched after this entry.)

Keywords: Schedule Algorithm; Directed Acyclic Graph; Finish Time; Critical Score; Schedule Length.

Pp. 95-108
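
For context on the comparison above, the Python sketch below implements a small HEFT-style list scheduler on a toy weighted DAG with two heterogeneous processors: tasks are ordered by upward rank and each is placed on the processor giving the earliest finish time. The graph, costs, and two-processor setting are invented; the CA algorithm's own urgency-based prioritization is not reproduced here.

    from functools import lru_cache

    # task -> computation cost on each of two heterogeneous processors (invented)
    COMP = {"A": [4, 6], "B": [3, 5], "C": [6, 4], "D": [5, 5]}
    # (parent, child) -> communication cost when placed on different processors
    COMM = {("A", "B"): 2, ("A", "C"): 3, ("B", "D"): 2, ("C", "D"): 1}
    CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

    @lru_cache(maxsize=None)
    def upward_rank(task):
        # average cost plus the most expensive path to an exit task
        avg = sum(COMP[task]) / len(COMP[task])
        if not CHILDREN[task]:
            return avg
        return avg + max(COMM[(task, c)] + upward_rank(c) for c in CHILDREN[task])

    def schedule():
        order = sorted(COMP, key=upward_rank, reverse=True)   # priority by upward rank
        finish, placed, proc_free = {}, {}, [0.0, 0.0]
        for t in order:
            best = None
            for p in range(2):
                ready = max([finish[u] + (COMM[(u, t)] if placed[u] != p else 0)
                             for u in PARENTS[t]] or [0.0])
                eft = max(ready, proc_free[p]) + COMP[t][p]   # earliest finish time on p
                if best is None or eft < best[0]:
                    best = (eft, p)
            finish[t], placed[t] = best
            proc_free[best[1]] = best[0]
        return finish, placed

    if __name__ == "__main__":
        finish, placed = schedule()
        for t in sorted(finish):
            print(f"{t} -> P{placed[t]}, finishes at {finish[t]}")
        print("makespan:", max(finish.values()))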

Processor Directed Dynamic Page Policy

Dandan Huan; Zusong Li; Weiwu Hu; Zhiyong Liu

The widening gap between processor and memory performance makes memory subsystem design an increasingly important part of computer design. We propose a processor-directed dynamic page policy, derived from an investigation of the memory access patterns of applications. The policy changes the page mode adaptively under the direction of the processor, combining the advantages of the close-page and open-page policies. Because its direction information is based on future memory access behavior, rather than on the access history used by existing dynamic page policies, it is more accurate. Furthermore, the processor's memory access requests are scheduled according to the page policy to increase the page hit rate and reduce the page conflict miss rate. The performance of the SPEC CPU2000 benchmarks improves significantly: IPC increases by 7.1%, 5.9%, and 3.4% on average compared with the close-page, open-page, and conventional dynamic page policies, respectively. (A toy comparison of these policies is sketched after this entry.)

Keywords: Godson-2; Memory Control Policy; Dynamic Page Policy; Open Page; Close Page.

Pp. 109-122
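
A toy Python comparison of the page policies discussed in the entry above: a single DRAM bank is driven by close-page, open-page, and a "directed" policy that is told whether the next access reuses the open row (standing in for the processor's direction, which in the paper comes from knowledge of future accesses). The timing constants, trace, and single-bank model are illustrative assumptions.

    import random

    T_CAS, T_RCD, T_PRE = 10, 10, 10       # column access, activate, precharge (assumed)

    def keep_open(trace, i):
        # the "hint": keep the row open only if the next access reuses it
        return i + 1 < len(trace) and trace[i + 1] == trace[i]

    def cost(trace, policy):
        open_row, cycles = None, 0
        for i, row in enumerate(trace):
            # cost of this access given the state the policy left behind
            if open_row is None:
                cycles += T_RCD + T_CAS             # row closed: activate + access
            elif open_row == row:
                cycles += T_CAS                     # row-buffer hit
            else:
                cycles += T_PRE + T_RCD + T_CAS     # row conflict
            # the policy decides what to do with the row after the access
            if policy == "close":
                open_row = None
            elif policy == "open":
                open_row = row
            else:                                   # "directed"
                open_row = row if keep_open(trace, i) else None
        return cycles

    if __name__ == "__main__":
        random.seed(1)
        trace = []
        for _ in range(2000):                       # bursts of row reuse mixed with random rows
            r = random.randrange(64)
            trace += [r] * random.choice([1, 1, 4])
        for p in ("close", "open", "directed"):
            print(p, cost(trace, p))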