Publications catalog - books



Advances in Computer Systems Architecture: 11th Asia-Pacific Conference, ACSAC 2006, Shanghai, China, September 6-8, 2006, Proceedings

Chris Jesshope ; Colin Egan (eds.)

In conference: 11th Asia-Pacific Conference on Advances in Computer Systems Architecture (ACSAC). Shanghai, China. September 6, 2006 - September 8, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Arithmetic and Logic Structures; Input/Output and Data Communications; Logic Design; Computer Communication Networks; Processor Architectures

Availability
Detected institution Publication year Browse Download Request
Not detected 2006 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-40056-1

Electronic ISBN

978-3-540-40058-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

A Context-Switch Reduction Heuristic for Power-Aware Off-Line Scheduling

Biju Raveendran; Sundar Balasubramaniam; K Durga Prasad; S. Gurunarayanan

Scheduling algorithms significantly affect the performance of a real-time system. In systems with power constraints, context switches in a schedule result in wasted power consumption. We present a scheduling algorithm and a heuristic for reducing the number of context switches. The algorithm executes in near linear time in terms of the number of jobs, finds a feasible schedule in most cases if it exists, and reasonably reduces the number of context switches. Thus it is a power-aware scheduling algorithm.

Keywords: Schedule Algorithm; Feasible Schedule; Periodic Task; Context Switch; Task List.

Pp. 437-444

On the Reliability of Drowsy Instruction Caches

Soong Hyun Shin; Sung Woo Chung; Chu Shik Jhon

As technology scales down, leakage energy accounts for a larger portion of the total energy in a cache. Applying Dynamic Voltage Scaling (DVS) to a cache, which is then called a drowsy cache, is known as one of the most efficient techniques for reducing leakage energy in a cache. However, it increases the Soft Error Rate (SER), and many researchers have begun to doubt the reliability of a drowsy cache. In this paper, we show that the instruction cache (I-cache) can adopt DVS without reliability problems for several reasons. First, an I-cache always stores read-only data, so it rarely incurs unrecoverable errors: a soft error in the I-cache can be recovered by re-fetching from the lower-level memory. Second, the effect of soft errors on performance is negligible, because the SER is extremely low. Additionally, a considerable percentage of soft errors do not harm performance. The evaluation results show that the drowsy I-cache rarely increases unrecoverable errors and degrades performance only negligibly.

Pp. 445-451

Design of a Reconfigurable Cryptographic Engine

Kang Sun; Lingdi Ping; Jiebing Wang; Zugen Liu; Xuezeng Pan

Cryptographic algorithms are usually compute-intensive and are implemented more efficiently in hardware than in software. By taking advantage of FPGA technology, some work offers high-performance and flexible solutions for cryptographic algorithms. But FPGAs still have some drawbacks. To overcome the inherent shortcomings of FPGAs, a novel asynchronous reconfigurable cryptographic engine (ARCEN) is introduced. In this architecture, a reconfigurable cryptographic array is the kernel. It routes signals asynchronously between adjacent cells through Neighbor-to-Neighbor wires using a 4-phase handshaking protocol. The computation circuit for the reconfigurable cell is developed with modified DSDCVS logic. Experimental results show that the architecture performs better than an FPGA.

Keywords: Cryptographic Algorithm; Logic Cell; Hash Family; Operation Circuit; Asynchronous Circuit.

Pp. 452-458
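
The 4-phase handshaking protocol mentioned in the ARCEN abstract above is a standard asynchronous signalling scheme: request up, acknowledge up, request down, acknowledge down. The sketch below only illustrates that generic sequence in software; the function name and the list-based "bus" are assumptions for illustration and do not describe the ARCEN circuit itself.

```python
def four_phase_transfer(data_items):
    """Simulate a sender/receiver pair using a 4-phase (return-to-zero) handshake.

    Returns the items the receiver latched and the (req, ack) trace.
    Illustrative sketch only; not the ARCEN implementation.
    """
    req = ack = 0
    received = []
    trace = []
    for item in data_items:
        # Phase 1: sender drives the data and raises the request line.
        bus = item
        req = 1
        trace.append((req, ack))
        # Phase 2: receiver latches the data and raises acknowledge.
        received.append(bus)
        ack = 1
        trace.append((req, ack))
        # Phase 3: sender sees the acknowledge and lowers request.
        req = 0
        trace.append((req, ack))
        # Phase 4: receiver lowers acknowledge; both lines are idle again.
        ack = 0
        trace.append((req, ack))
    return received, trace


received, trace = four_phase_transfer([0xA, 0xB])
print(received)
print(trace[:4])
```

Each item costs four signal transitions, which is the usual trade-off of the return-to-zero scheme against a 2-phase handshake.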

Enhancing ICOUNT2.8 Fetch Policy with Better Fairness for SMT Processors

Caixia Sun; Hongwei Tang; Minxuan Zhang

In Simultaneous Multithreading (SMT) processors, the instruction fetch policy implicitly determines how shared resources are allocated among the co-scheduled threads, and consequently affects throughput and fairness. However, prior work on fetch policies focuses almost exclusively on throughput optimization. Fairness between threads in terms of progress rates is rarely studied. In this paper, we take fairness as the optimization goal and propose an enhanced version of ICOUNT2.8 with better fairness, called ICOUNT2.8-fairness. Results show that with ICOUNT2.8-fairness, RPRrange (a fairness metric defined in this paper) is less than 5% for all types of workloads, and the degradation of overall throughput is no more than 7%. In particular, for the two-thread MIX workload, ICOUNT2.8-fairness outperforms ICOUNT2.8 in throughput while also achieving better fairness.

Keywords: SMT; Instruction Fetch Policy; Throughput; Fairness.

Pp. 459-465
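
The abstract above builds on ICOUNT2.8. As that baseline is commonly described in the SMT literature, it fetches up to 8 instructions per cycle from up to 2 threads, preferring the threads with the fewest instructions in the pre-issue stages. The sketch below illustrates only that baseline; the fairness enhancement is not specified in the abstract, so it is not modelled. The function name, its arguments and the tie-breaking rule (lower thread id first) are assumptions.

```python
def icount_2_8(icounts, fetchable, max_threads=2, width=8):
    """Pick fetch slots for one cycle under a baseline ICOUNT2.8-style policy.

    icounts   -- dict: thread id -> instructions currently in pre-issue stages
    fetchable -- dict: thread id -> instructions the thread can supply this cycle
    Returns a dict: thread id -> number of instructions to fetch.
    """
    # Threads that cannot supply anything this cycle are not candidates.
    candidates = [t for t in icounts if fetchable.get(t, 0) > 0]
    # Lowest instruction count first; ties broken by thread id (assumption).
    candidates.sort(key=lambda t: (icounts[t], t))

    plan = {}
    remaining = width
    for t in candidates[:max_threads]:
        n = min(fetchable[t], remaining)
        if n > 0:
            plan[t] = n
            remaining -= n
    return plan


print(icount_2_8({0: 12, 1: 3, 2: 7, 3: 20}, {0: 8, 1: 5, 2: 8, 3: 8}))
```

In the call above, thread 1 has the lowest count and supplies its 5 available instructions, and thread 2 fills the remaining 3 slots.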

The New BCD Subtractor and Its Reversible Logic Implementation

Himanshu Thapliyal; M. B Srinivas

IEEE 754r is the ongoing revision of the IEEE 754 floating-point standard, and a major enhancement to the standard is the addition of a decimal format. In this paper we therefore propose a novel BCD subtractor called the carry skip BCD subtractor. We also propose a reversible logic implementation of the carry skip BCD subtractor. Reversible logic is emerging as a promising computing paradigm with applications in low-power CMOS, quantum computing, nanotechnology, and optical computing. It is not possible to realize quantum computing without reversible logic. We attempt to make the BCD subtractor optimal in terms of the number of reversible gates and garbage outputs.

Keywords: Reversible Logic; Full Adder; Toffoli Gate; Reversible Gate; Reversible Circuit.

Pp. 466-472
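
The keywords of the entry above mention the Toffoli gate. As a small illustration of what "reversible" means, the sketch below checks that the standard Toffoli (controlled-controlled-NOT) gate is a bijection on three bits and is its own inverse. It does not reproduce the paper's subtractor; the function name is an assumption.

```python
from itertools import product


def toffoli(a, b, c):
    """Standard Toffoli gate: the target bit c is flipped when both controls are 1."""
    return a, b, c ^ (a & b)


inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

# Reversible: every input maps to a distinct output (a bijection on 3 bits).
is_bijection = len(set(outputs)) == len(inputs)

# Self-inverse: applying the gate twice restores the original bits.
is_self_inverse = all(toffoli(*toffoli(*bits)) == bits for bits in inputs)

print(is_bijection, is_self_inverse)
```

With the target preset to 0, the third output equals `a AND b`, while the first two outputs simply copy the inputs. Outputs that are carried along only to keep a circuit reversible are what the abstract calls garbage outputs.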

Power-Efficient Microkernel of Embedded Operating System on Chip

Tianzhou Chen; Wei Hu; Yi Lian

Because of the absence of hardware support, almost all embedded operating systems have so far been based on SDRAM. As embedded hardware has progressed, embedded systems can provide more substantial support for embedded operating systems. In this paper we present an operating system microkernel, named SRAMOS, for embedded systems that can reside in on-chip SRAM. This microkernel can make the most of the low power consumption of SRAM. The experimental results show that this microkernel performs better than traditional embedded operating systems.

Keywords: power-efficient; microkernel; embedded operating system.

Pp. 473-479

Understanding Prediction Limits Through Unbiased Branches

Lucian Vintan; Arpad Gellert; Adrian Florea; Marius Oancea; Colin Egan

The majority of currently available branch predictors base their predictions on the previous k branch outcomes. Such predictors sustain high prediction accuracy, but they do not consider the impact of unbiased branches, which are difficult to predict. In this paper, we quantify and evaluate the impact of unbiased branches and show that any gain in prediction accuracy is proportional to the frequency of unbiased branches. Using the SPECcpu2000 integer benchmarks, we show that there is a significant proportion of unbiased branches, which severely impact prediction accuracy (averaging between 6% and 24% depending on the prediction context used).

Keywords: Prediction Accuracy; Distribution Index; Polarisation Index; High Prediction Accuracy; Path Information.

Pp. 480-487
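
The abstract above does not define how a branch is classified as unbiased. One plausible reading, assumed here and not taken from the paper, is to measure for each prediction context the fraction of its dominant outcome (a "polarisation" value) and to call the context unbiased when that fraction falls below a threshold. The function names, the context labels and the 0.95 threshold are all illustrative assumptions.

```python
from collections import defaultdict


def polarization(outcomes):
    """Map each context to max(fraction taken, fraction not taken).

    outcomes -- iterable of (context, taken) pairs, where taken is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # [not taken, taken]
    for ctx, taken in outcomes:
        counts[ctx][1 if taken else 0] += 1
    return {ctx: max(c) / sum(c) for ctx, c in counts.items()}


def unbiased_contexts(outcomes, threshold=0.95):
    """Contexts whose dominant outcome is below the (assumed) threshold."""
    return sorted(ctx for ctx, p in polarization(outcomes).items() if p < threshold)


# Hypothetical trace: context "A" is strongly biased, context "B" is not.
trace = [("A", True)] * 99 + [("A", False)] + [("B", True)] * 6 + [("B", False)] * 4
print(polarization(trace))
print(unbiased_contexts(trace))
```

A context here could be a branch address alone or an address combined with some history bits; lengthening the context generally splits an unbiased context into several more polarised ones.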

Bandwidth Optimization of the EMCI for a High Performance 32-bit DSP

Dong Wang; Xiao Hu; Shuming Chen; Yang Guo

Memory bandwidth and interface flexibility are often bottlenecks of embedded processors, and memory bandwidth optimization has become a hot research topic. This paper introduces four new bandwidth optimization methods for the External Memory Control Interface (EMCI) integrated in high-performance digital signal processors (DSPs): a programmable and decoupled structure, pipelined transmission in burst mode, programmable priority for arbitration, and preferential reading based on cache-line offset. The methods aim at maximum data-transfer throughput and architectural flexibility. The experimental results show that the performance improvement is remarkable, but that it differs between synchronous and asynchronous memories and depends on the application behavior. The decoupled structure proves to be of great benefit to architectural exploration and optimization for DSPs.

Keywords: Digit Signal Processor; Cache Line; Bandwidth Optimization; Memory Controller; Embed Processor.

Pp. 488-494

Research on Petersen Graphs and Hyper-cubes Connected Interconnection Networks

Wang Lei; Chen Zhiping

On the basis of the short diameter of the Petersen graph and the high connectivity of the hypercube, an innovative interconnection network named HRP(n) (Hyper-cubes and Rings connected Petersen Graph) is proposed and its characteristics are studied. It is proved that HRP(n) not only has regularity and good extensibility, but also has a shorter diameter and better connectivity than interconnection networks such as Q_n, TQ_n, CQ_n, and HP(n). In addition, unicasting, broadcasting, and fault-tolerant routing algorithms are designed for HRP(n); analyses show that these routing algorithms have good communication efficiency.

Keywords: Petersen graph; Interconnection network; Routing algorithm.

Pp. 495-501
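
The "short diameter" of the Petersen graph that the abstract above relies on can be checked directly: the graph has 10 nodes of degree 3 and diameter 2, whereas the 3-dimensional hypercube Q_3 has 8 nodes of degree 3 and diameter 3. The sketch below builds both graphs and measures their diameters by breadth-first search. It does not construct HRP(n), whose definition is not given in the abstract; all function names are assumptions.

```python
from collections import deque


def petersen_graph():
    """Adjacency sets of the Petersen graph (outer 5-cycle, spokes, inner pentagram)."""
    adj = {v: set() for v in range(10)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer cycle
        link(i, i + 5)                # spoke to the inner node
        link(5 + i, 5 + (i + 2) % 5)  # inner pentagram
    return adj


def hypercube(n):
    """Adjacency sets of Q_n: nodes are n-bit labels, neighbours differ in one bit."""
    return {v: {v ^ (1 << b) for b in range(n)} for v in range(2 ** n)}


def diameter(adj):
    """Largest shortest-path distance, found by BFS from every node."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


print(diameter(petersen_graph()), diameter(hypercube(3)))
```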

Cycle Period Analysis and Optimization of Timed Circuits

Lei Wang; Zhi-ying Wang; Kui Dai

In this paper, a method is proposed to analyze the minimum average cycle period of timed circuits. A timed Petri net is used to model the timed circuits. Our method focuses on structural analysis of the Petri net model of the timed circuits, which is another way to reduce the state space of the analyzed model. An algorithm is then proposed to optimize the performance of a timed circuit using an asynchronous retiming technique. The algorithm balances the asynchronous pipelines to reach the target cycle period while minimizing the area at the same time. Experimental results demonstrate the computational feasibility and effectiveness of both approaches.

Pp. 502-508
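
For background on the cycle-period analysis described above: in the classical theory of timed marked graphs (a restricted class of Petri nets), the minimum average cycle period equals the largest ratio of total delay to token count over all directed cycles. The sketch below computes that ratio by brute-force cycle enumeration on a small graph. It illustrates the classical result only and is not the structural method of the paper; the edge encoding, where each edge carries a delay and an initial token count, is an assumption.

```python
from fractions import Fraction


def max_cycle_ratio(edges):
    """Return max over simple directed cycles of (sum of delays) / (sum of tokens).

    edges -- list of (src, dst, delay, tokens) with comparable node labels.
    Brute force; suitable for small illustrative graphs only.
    """
    graph = {}
    for u, v, delay, tokens in edges:
        graph.setdefault(u, []).append((v, delay, tokens))
    nodes = sorted(set(graph) | {v for _, v, _, _ in edges})
    best = None

    def dfs(start, node, delay, tokens, visited):
        nonlocal best
        for nxt, d, m in graph.get(node, []):
            if nxt == start:
                total = tokens + m
                if total == 0:
                    raise ValueError("token-free cycle: the net would deadlock")
                ratio = Fraction(delay + d, total)
                if best is None or ratio > best:
                    best = ratio
            elif nxt > start and nxt not in visited:
                # Only visit nodes larger than start so each cycle is found once.
                dfs(start, nxt, delay + d, tokens + m, visited | {nxt})

    for s in nodes:
        dfs(s, s, 0, 0, {s})
    return best


# Hypothetical three-stage ring with one shortcut edge.
ring = [(0, 1, 2, 1), (1, 2, 3, 0), (2, 0, 1, 1), (1, 0, 4, 0)]
print(max_cycle_ratio(ring))
```

In this example the long cycle 0-1-2-0 has ratio 6/2 and the short cycle 0-1-0 has ratio 6/1, so the short cycle is the bottleneck that balancing would target.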