Publications catalog - books

Advances in Computer Systems Architecture: 11th Asia-Pacific Conference, ACSAC 2006, Shanghai, China, September 6-8, 2006, Proceedings

Chris Jesshope; Colin Egan (eds.)

Conference: 11th Asia-Pacific Conference on Advances in Computer Systems Architecture (ACSAC). Shanghai, China. September 6-8, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Arithmetic and Logic Structures; Input/Output and Data Communications; Logic Design; Computer Communication Networks; Processor Architectures

Availability
Institution detected: not detected | Publication year: 2006 | Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-40056-1

Electronic ISBN

978-3-540-40058-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Reducing the Branch Power Cost in Embedded Processors Through Static Scheduling, Profiling and SuperBlock Formation

Michael Hicks; Colin Egan; Bruce Christianson; Patrick Quick

Dynamic branch predictor logic alone accounts for approximately 10% of total processor power dissipation. Recent research indicates that the power cost of a large dynamic branch predictor is offset by the power savings created by its increased accuracy. We describe a method of reducing dynamic predictor power dissipation without degrading prediction accuracy by combining local delay region scheduling with run-time profiling of branches. Feedback into the static code is provided through hint bits, which remove the need for dynamic prediction of some individual branches. The method requires only minimal hardware modifications and coexists with a dynamic predictor.

Pp. 366-372
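
To illustrate the hint-bit scheme the abstract above describes, here is a minimal sketch in Python. It is not the authors' implementation: the trace format, the HINT_THRESHOLD value and all function names are assumptions. Branches whose profiled bias is strong enough receive a static hint and bypass the dynamic predictor, so only the remaining branches consume predictor lookups (and hence predictor power).

    from collections import defaultdict

    HINT_THRESHOLD = 0.95  # assumed bias needed to pin a branch statically

    def profile(trace):
        """Count taken/not-taken outcomes per branch PC from a profiling run."""
        stats = defaultdict(lambda: [0, 0])          # pc -> [taken, total]
        for pc, taken in trace:
            stats[pc][0] += int(taken)
            stats[pc][1] += 1
        return stats

    def assign_hint_bits(stats):
        """Give strongly biased branches a static hint (True/False); others get None."""
        hints = {}
        for pc, (taken, total) in stats.items():
            bias = taken / total
            if bias >= HINT_THRESHOLD:
                hints[pc] = True
            elif bias <= 1.0 - HINT_THRESHOLD:
                hints[pc] = False
            else:
                hints[pc] = None                     # still needs the dynamic predictor
        return hints

    def run(trace, hints):
        """Replay the trace; count how many dynamic-predictor lookups remain."""
        dynamic_lookups = 0
        for pc, taken in trace:
            if hints.get(pc) is None:
                dynamic_lookups += 1                 # predictor (and its power) is used
            # hinted branches are resolved statically, the predictor stays idle
        return dynamic_lookups

    trace = [(0x400, True)] * 98 + [(0x400, False)] * 2 + [(0x480, True), (0x480, False)] * 50
    hints = assign_hint_bits(profile(trace))
    print("dynamic lookups:", run(trace, hints), "of", len(trace))

In the toy trace, the strongly biased branch at 0x400 is resolved statically, halving the number of dynamic-predictor lookups.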

Fault-Free Pairwise Independent Hamiltonian Paths on Faulty Hypercubes

Sun-Yuan Hsieh

A Hamiltonian path in G is a path which contains every vertex of G exactly once. Two Hamiltonian paths P_1 = ⟨u_1, u_2, ..., u_n⟩ and P_2 = ⟨v_1, v_2, ..., v_n⟩ of G are said to be independent if u_1 = v_1, u_n = v_n, and u_i ≠ v_i for all 1 < i < n. A set of Hamiltonian paths {P_1, P_2, ..., P_k} of G is mutually independent if any two different Hamiltonian paths in the set are independent. It is well known that an n-dimensional hypercube Q_n is bipartite with two partite sets of equal size. Let F be the set of faulty edges of Q_n such that |F| ≤ n−2. In this paper, we show that Q_n − F contains (n−|F|−1) mutually independent Hamiltonian paths between any two vertices from different partite sets, where n ≥ 2.

Keywords: Interconnection networks; hypercubes; Hamiltonian; pairwise independent Hamiltonian paths; fault-tolerant embedding.

Pp. 373-379
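
To make the definition of independent Hamiltonian paths above concrete, the short check below (an illustrative example, not the paper's construction) verifies that two hand-picked Hamiltonian paths of the fault-free hypercube Q_3 between the vertices 000 and 111, which lie in different partite sets, are independent.

    def is_hamiltonian_path(path, n):
        """Path visits every vertex of Q_n exactly once; consecutive vertices differ in one bit."""
        if sorted(path) != list(range(2 ** n)):
            return False
        return all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))

    def independent(p, q):
        """Same endpoints, and a different internal vertex at every position (the abstract's definition)."""
        return (p[0] == q[0] and p[-1] == q[-1]
                and all(a != b for a, b in zip(p[1:-1], q[1:-1])))

    # Two Hamiltonian paths of Q_3 from 000 to 111 (vertices written as 3-bit integers).
    P1 = [0b000, 0b001, 0b011, 0b010, 0b110, 0b100, 0b101, 0b111]
    P2 = [0b000, 0b100, 0b101, 0b001, 0b011, 0b010, 0b110, 0b111]

    assert is_hamiltonian_path(P1, 3) and is_hamiltonian_path(P2, 3)
    assert independent(P1, P2)
    print("P1 and P2 are mutually independent Hamiltonian paths of Q_3")

With n = 3 and no faulty edges, the theorem promises n − |F| − 1 = 2 mutually independent paths, which is exactly what the pair above exhibits.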

Constructing Node-Disjoint Paths in Enhanced Pyramid Networks

Hsien-Jone Hsieh; Dyi-Rong Duh

Chen et al. in 2004 proposed a new hierarchical structure, called the enhanced pyramid network (EPM for short), obtained by replacing each mesh in a pyramid network (PM for short) with a torus. Recently, some topological properties of and communication on EPMs have been investigated or derived. These results have revealed that an EPM is an attractive alternative to a PM. This study investigates the node-disjoint paths between any two distinct nodes and the upper bound of the ω-wide-diameter of an EPM. The result shows that EPMs have smaller ω-wide-diameters than PMs.

Keywords: Enhanced pyramid networks; pyramid networks; fault-tolerance; wide diameter; node-disjoint paths; container; interconnection networks.

Pp. 380-386
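
The construction behind the abstract above replaces each mesh level of a pyramid with a torus, whose wraparound links raise every node's degree within the level. The small sketch below (an illustrative comparison, not the paper's analysis) shows the effect on a 4x4 level; more links per node is what allows an EPM to offer more node-disjoint paths, and hence a smaller ω-wide-diameter, than a PM.

    def mesh_neighbors(r, c, rows, cols):
        """4-neighbourhood of a mesh node, without wraparound."""
        cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [(i, j) for i, j in cand if 0 <= i < rows and 0 <= j < cols]

    def torus_neighbors(r, c, rows, cols):
        """4-neighbourhood of a torus node, with wraparound links."""
        return [((r - 1) % rows, c), ((r + 1) % rows, c),
                (r, (c - 1) % cols), (r, (c + 1) % cols)]

    rows = cols = 4
    mesh_min = min(len(mesh_neighbors(r, c, rows, cols)) for r in range(rows) for c in range(cols))
    torus_min = min(len(torus_neighbors(r, c, rows, cols)) for r in range(rows) for c in range(cols))
    print("minimum degree: mesh level =", mesh_min, "| torus level =", torus_min)   # 2 vs 4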

Striping Cache: A Global Cache for Striped Network File System

Sheng-Kai Hung; Yarsun Hsu

Caching has been widely used to enhance performance in computer systems, and this remains true in the distributed setting. In a distributed environment, caches are spread across the nodes and can be combined to form a global cache. However, overall performance cannot benefit from the global cache without efficient cooperation among these global resources. The local file system in each node knows nothing about stripes and thus cannot benefit from the related blocks of a stripe. We propose a striping cache (SC) which knows the related blocks of a stripe and can use them to improve the performance of a striped network file system. This high-level cache can benefit from previous reads and can aggregate small writes to improve overall performance. We implement this mechanism in our reliable parallel file system (RPFS). The experimental results show that both read and write performance can be improved with SC support. The improvement comes from the fact that we can reduce the number of disk accesses by employing SC.

Pp. 387-393
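
A minimal sketch of the stripe-aware idea described above, under an assumed interface (the class and callback names are not the RPFS API): reads fetch whole stripes so that later accesses to related blocks hit in the cache, and small writes are aggregated and flushed one stripe at a time.

    class StripingCache:
        """Toy global cache keyed by stripe: reads fetch whole stripes, small writes are aggregated."""

        def __init__(self, stripe_size, fetch_stripe, flush_stripe):
            self.stripe_size = stripe_size
            self.fetch_stripe = fetch_stripe   # callback: stripe_id -> list of blocks (one storage read)
            self.flush_stripe = flush_stripe   # callback: (stripe_id, blocks) -> None (one storage write)
            self.stripes = {}                  # stripe_id -> cached list of blocks
            self.dirty = set()

        def _stripe(self, sid):
            if sid not in self.stripes:                      # first touch fetches the whole stripe,
                self.stripes[sid] = self.fetch_stripe(sid)   # so related blocks hit later
            return self.stripes[sid]

        def read(self, block_no):
            sid, off = divmod(block_no, self.stripe_size)
            return self._stripe(sid)[off]

        def write(self, block_no, data):
            sid, off = divmod(block_no, self.stripe_size)
            self._stripe(sid)[off] = data      # small writes accumulate in the cached stripe
            self.dirty.add(sid)

        def sync(self):
            for sid in sorted(self.dirty):     # one large write per dirty stripe
                self.flush_stripe(sid, self.stripes[sid])
            self.dirty.clear()

    accesses = []
    cache = StripingCache(
        stripe_size=4,
        fetch_stripe=lambda sid: (accesses.append(("stripe read", sid)), [0] * 4)[1],
        flush_stripe=lambda sid, blocks: accesses.append(("stripe write", sid)),
    )
    cache.write(0, "a"); cache.write(1, "b"); cache.write(2, "c")   # three small writes
    cache.read(3)                                                    # hits the already-fetched stripe
    cache.sync()
    print(accesses)   # [('stripe read', 0), ('stripe write', 0)]

In the toy run, three small writes and one read cost only one stripe fetch and one aggregated stripe write.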

DTuplesHPC: Distributed Tuple Space for Desktop High Performance Computing

Yi Jiang; Guangtao Xue; Minglu Li; Jinyuan You

This paper introduces DTuplesHPC, a Linda-like [2] peer-to-peer tuple space middleware built on top of a distributed hash table. This tuple space middleware is capable of serving as a high-performance computing platform. The decoupled style of the tuple space model [1] is used instead of the message-passing model that is widely used in MPI-based high-performance computing. With the tuple space model, distributed computing can be freed from architectural considerations. First, the DTuples platform allows the dynamic organization of computing resources: a job can be submitted at any time, even if the computing resources only become ready later. Time and space are both decoupled in DTuplesHPC. Second, it brings the simple tuple space programming model to large-scale high-performance computing on the desktop. In our design, the in(), rd(), out(), copy-collect() and eval() primitives are supported. In this paper, we present the key design concepts of DTuples.

Keywords: High Performance Computing; Active Object; Distributed Hash Table; Code Fragment; Tuple Space.

Pp. 394-400
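
The primitives listed in the abstract follow the classic Linda model. The sketch below is a minimal, single-process tuple space showing out(), rd() and blocking in() with wildcard matching; it is not the DTuplesHPC API, and the distributed hash table layer, copy-collect() and eval() are omitted.

    import threading

    ANY = object()   # wildcard used in templates

    class TupleSpace:
        """Toy in-memory tuple space: out() publishes, rd() reads a copy, in() removes a match."""

        def __init__(self):
            self._tuples = []
            self._cv = threading.Condition()

        def out(self, *tup):
            with self._cv:
                self._tuples.append(tup)
                self._cv.notify_all()

        def _match(self, template, tup):
            return len(template) == len(tup) and all(
                t is ANY or t == v for t, v in zip(template, tup))

        def _take(self, template, remove):
            with self._cv:
                while True:
                    for tup in self._tuples:
                        if self._match(template, tup):
                            if remove:
                                self._tuples.remove(tup)
                            return tup
                    self._cv.wait()   # block until someone out()s a matching tuple

        def rd(self, *template):
            return self._take(template, remove=False)

        def in_(self, *template):   # 'in' is a Python keyword, hence in_
            return self._take(template, remove=True)

    ts = TupleSpace()
    worker = threading.Thread(target=lambda: ts.out("result", 42))
    worker.start()
    print(ts.in_("result", ANY))   # blocks until ('result', 42) appears, then removes it
    worker.join()

The producer and consumer never address each other directly and need not run at the same time once the tuple is stored, which is the space and time decoupling the abstract refers to.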

The Algorithm and Circuit Design of a 400MHz 16-Bit Hybrid Multiplier

Li Zhentao; Chen Shuming; Li Zhaoliang; Lei Conghua

In this paper we present the algorithm of a 16-bit hybrid multiplier, which can work in two modes. In normal mode, it performs a 16-bit multiplication. In SIMD mode, it performs two parallel 8-bit multiplications. The proposed algorithm is based on the radix-4 modified Booth's algorithm. Our algorithm generates ten partial products and a modifier, which is five fewer than other algorithms. We can get one 32-bit product or two 16-bit products by directly accumulating the ten partial products and the modifier, easing the design of the tree structures for compressing the partial products and of the final adder. The proposed algorithm is adopted by YHFT-DSP/800, a high-performance fixed-point DSP. The multiplier was full-custom designed in 0.18 µm CMOS technology. We also designed a test chip. The test results show that the multiplier works well at 400 MHz in normal mode and 480 MHz in SIMD mode. The simulated power is 35.8 mW at 400 MHz and 42.5 mW at 480 MHz.

Keywords: Digital Signal Processor; Partial Product; Test Chip; SIMD Instruction; Final Adder.

Pp. 401-408
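
The multiplier above builds on radix-4 modified Booth recoding. The sketch below shows plain radix-4 Booth recoding for 16-bit signed operands (eight digits, hence eight partial products); the paper's dual-mode arrangement, which produces ten partial products plus a modifier, is its own contribution and is not reproduced here.

    RECODE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
              0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

    def booth_radix4_digits(y, bits=16):
        """Radix-4 Booth digits of a signed `bits`-bit multiplier y (digit i has weight 4**i)."""
        pattern = (y & ((1 << bits) - 1)) << 1    # two's-complement pattern with a 0 appended on the right
        return [RECODE[(pattern >> (2 * i)) & 0b111] for i in range(bits // 2)]

    def booth_multiply(x, y, bits=16):
        """Multiply signed x by signed y by summing the Booth partial products."""
        partials = [d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, bits))]
        return sum(partials)

    for x, y in [(12345, -321), (-32768, 32767), (7, 7)]:
        assert booth_multiply(x, y) == x * y
    print("radix-4 Booth recoding reproduces x*y for 16-bit signed operands")

Each digit lies in {-2, -1, 0, 1, 2}, so every partial product is obtained from the multiplicand by at most a shift and a negation, which is what makes the recoding attractive in hardware.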

Live Range Aware Cache Architecture

Peng Li; Dongsheng Wang; Songliu Guo; Tao Tian; Weimin Zheng

The memory wall has long been a focus of computer architecture research. In this paper, we observe that in computers with a write-back cache, the memory write operation actually lags behind the commitment of the write instruction. By the time the memory write operation executes, the data may already have gone out of its live range. Based on this observation, a novel cache architecture called Live Range Aware Cache (LIRAC) is proposed. LIRAC can significantly reduce the number of write operations with minimal hardware support. The performance benefits of LIRAC are evaluated by trace-based analysis using the SimpleScalar simulator and the SPEC CPU 2000 benchmarks. Our results show that LIRAC can eliminate 21% of write operations on average and up to 85% in the best case.

Keywords: Live Range; Cache; Memory Hierarchy.

Pp. 409-415
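
A toy trace-driven sketch of the LIRAC idea, with an assumed trace format and a deliberately tiny one-line cache (not the authors' SimpleScalar-based methodology): when a 'kill' hint marks a cached address as having left its live range, a later eviction can drop the dirty line instead of writing it back.

    def simulate(trace, live_range_aware):
        """Tiny 1-line write-back cache; returns the number of memory write-backs performed."""
        line = None          # (addr, dirty, dead)
        writebacks = 0

        def evict():
            nonlocal line, writebacks
            if line and line[1] and not (live_range_aware and line[2]):
                writebacks += 1          # dirty and still live: the data must reach memory
            line = None

        for op, addr in trace:
            if op == "kill":             # compiler/runtime hint: addr's live range has ended
                if line and line[0] == addr:
                    line = (addr, line[1], True)
            else:                        # 'read' or 'write'
                if line and line[0] != addr:
                    evict()
                dirty = (op == "write") or (line is not None and line[1])
                line = (addr, dirty, False)
        evict()                          # flush at the end of the trace
        return writebacks

    trace = [("write", 0xA), ("kill", 0xA),    # 0xA is dead before it is ever written back
             ("write", 0xB), ("read", 0xC)]    # 0xB is still live when evicted by 0xC
    print("write-backs, conventional:", simulate(trace, live_range_aware=False))  # 2
    print("write-backs, LIRAC-style: ", simulate(trace, live_range_aware=True))   # 1

In the toy trace, the conventional cache performs two write-backs while the live-range-aware cache performs one, because the data at 0xA dies before it is ever evicted.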

The Challenges of Efficient Code-Generation for Massively Parallel Architectures

Jason M McGuiness; Colin Egan; Bruce Christianson; Guang Gao

Overcoming the memory wall [15] may be achieved by increasing the bandwidth and reducing the latency of the processor-to-memory connection, for example by implementing cellular architectures such as the IBM Cyclops. Such massively parallel architectures have sophisticated memory models. In this paper we used DIMES (the Delaware Iterative Multiprocessor Emulation System), developed by CAPSL at the University of Delaware, as a hardware evaluation tool for cellular architectures. The authors contend that there is an open question regarding the ideal approach to parallelism from the programmer's perspective: at the language level (such as UPC or HPF), using trace scheduling, or at the library level (for example OpenMP or POSIX threads). To investigate this, we have chosen a threaded Mandelbrot-set generator with a work-stealing algorithm to evaluate the DIMES cthread programming model for writing a simple multi-threaded program.

Keywords: Memory Model; Cellular Architecture; Work Thread; Distributed Processing Symposium; Memory Consistency Model.

Pp. 416-422
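
The experiment described above can be mimicked with ordinary threads. The sketch below is a threaded Mandelbrot-set generator with a simple work-stealing scheme (per-worker deques; an idle worker steals from the tail of a busy one). It is written with plain Python threading, not the DIMES cthread model, and because of Python's global interpreter lock it only illustrates the scheduling structure rather than real parallel speedup.

    import threading, random, collections

    WIDTH, HEIGHT, MAX_ITER, WORKERS = 64, 64, 100, 4

    def mandelbrot_row(y):
        row = []
        for x in range(WIDTH):
            c = complex(3.0 * x / WIDTH - 2.0, 2.0 * y / HEIGHT - 1.0)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < MAX_ITER:
                z, n = z * z + c, n + 1
            row.append(n)
        return row

    deques = [collections.deque() for _ in range(WORKERS)]   # one work deque per worker
    lock = threading.Lock()
    image = [None] * HEIGHT
    for y in range(HEIGHT):                                  # initial static distribution of rows
        deques[y % WORKERS].append(y)

    def worker(wid):
        while True:
            with lock:
                if deques[wid]:
                    y = deques[wid].popleft()                # take from the worker's own deque first
                else:
                    victims = [d for d in deques if d]
                    if not victims:
                        return                               # nothing left anywhere: done
                    y = random.choice(victims).pop()         # steal from the tail of a busy worker
            image[y] = mandelbrot_row(y)                     # uneven per-row cost is what motivates stealing

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("rows computed:", sum(r is not None for r in image))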

Reliable Systolic Computing Through Redundancy

Kunio Okuda; Siang Wun Song; Marcos Tatsuo Yamamoto

The systolic array paradigm has low communication demand because it does not use costly global communication and each processor communicates with only a few other processors. It is thus suitable for cluster computing. The systolic approach, however, is vulnerable in a heterogeneous environment where machines perform differently. In this paper we propose a redundant systolic solution with high availability to deal with this problem. We analyze the overhead that results from the need to coordinate the actions of the redundant processors and show that this overhead is worth the performance improvement it provides.

Keywords: cluster computing; heterogeneity; redundancy; high-availability.

Pp. 423-429
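
A small simulation sketch of the redundancy idea above (simulated step times and an assumed per-step coordination cost, not the paper's cluster experiments): every systolic cell is replicated on a fast and a slow machine, the coordinator accepts whichever replica finishes a step first, and the price is a small coordination overhead per step.

    import random

    random.seed(1)
    CELLS, STEPS, COORD_OVERHEAD = 8, 100, 0.02     # assumed per-step coordination cost (time units)

    def step_time(speed):
        """Simulated time one replica needs for one systolic step."""
        return random.expovariate(speed)

    def makespan(replicas_per_cell):
        """A systolic step completes only when every cell has a result from at least one replica."""
        overhead = COORD_OVERHEAD if any(len(r) > 1 for r in replicas_per_cell) else 0.0
        total = 0.0
        for _ in range(STEPS):
            total += max(min(step_time(s) for s in replicas) for replicas in replicas_per_cell)
            total += overhead
        return total

    FAST, SLOW = 4.0, 1.0
    plain     = [[FAST] if i % 2 else [SLOW] for i in range(CELLS)]   # heterogeneous, no redundancy
    redundant = [[FAST, SLOW] for _ in range(CELLS)]                  # every cell on both machines

    print("makespan, plain systolic:     %.1f" % makespan(plain))
    print("makespan, redundant systolic: %.1f" % makespan(redundant))

Because a lock-step systolic step can only finish when its slowest cell delivers a result, masking the slow machines through redundancy shortens every step, which is why the coordination overhead can be worth paying.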

A Diversity-Controllable Genetic Algorithm for Optimal Fused Traffic Planning on Sensor Networks

Yantao Pan; Xicheng Lu; Peidong Zhu; Shen Ma

In some sensor network applications, e.g. target tracing, multi-profile data about an event are fused at intermediate nodes. The optimal planning of such fused traffic is important for prolonging the network lifetime, because data communication consumes most of the energy of a sensor network. As a general method for such optimization problems, genetic algorithms suffer from tremendous communication diversities that increase greatly with the network size. In this paper, we propose a diversity-controllable genetic algorithm for optimizing fused traffic planning. Simulations show that it achieves remarkable improvements.

Keywords: Sensor Networks; Lifetime Optimization; Data Fusion.

Pp. 430-436
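
A generic sketch of diversity control in a genetic algorithm, on a toy bit-string objective (it is not the paper's traffic-planning encoding or its specific diversity operator; the target band and rates are assumptions): the mutation rate is raised when population diversity falls below a target band and lowered when it rises above it.

    import random

    random.seed(0)
    GENES, POP, GENERATIONS = 20, 30, 60
    DIV_LOW, DIV_HIGH = 0.15, 0.35          # assumed target band for average pairwise diversity

    def fitness(ind):                        # toy objective standing in for network lifetime
        return sum(ind)

    def diversity(pop):                      # mean Hamming distance of sampled pairs, normalised
        pairs = [(random.choice(pop), random.choice(pop)) for _ in range(50)]
        return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / (50.0 * GENES)

    def evolve():
        pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
        mut = 0.05
        for _ in range(GENERATIONS):
            d = diversity(pop)
            if d < DIV_LOW:
                mut = min(0.5, mut * 1.5)    # too converged: explore more
            elif d > DIV_HIGH:
                mut = max(0.01, mut / 1.5)   # too scattered: exploit more
            pop.sort(key=fitness, reverse=True)
            parents = pop[:POP // 2]
            children = []
            while len(children) < POP - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, GENES)
                child = a[:cut] + b[cut:]                              # one-point crossover
                child = [g ^ (random.random() < mut) for g in child]   # diversity-controlled mutation
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    best = evolve()
    print("best fitness:", fitness(best), "of", GENES)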