Publications catalog - books
Advances in Computer Systems Architecture: 11th Asia-Pacific Conference, ACSAC 2006, Shanghai, China, September 6-8, 2006, Proceedings
Chris Jesshope ; Colin Egan (eds.)
Conference: 11th Asia-Pacific Conference on Advances in Computer Systems Architecture (ACSAC). Shanghai, China. September 6, 2006 - September 8, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Computer System Implementation; Arithmetic and Logic Structures; Input/Output and Data Communications; Logic Design; Computer Communication Networks; Processor Architectures
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-40056-1
Electronic ISBN
978-3-540-40058-5
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11859802_31
Reducing the Branch Power Cost in Embedded Processors Through Static Scheduling, Profiling and SuperBlock Formation
Michael Hicks; Colin Egan; Bruce Christianson; Patrick Quick
Dynamic branch predictor logic alone accounts for approximately 10% of total processor power dissipation. Recent research indicates that the power cost of a large dynamic branch predictor is offset by the power savings created by its increased accuracy. We describe a method of reducing dynamic predictor power dissipation without degrading prediction accuracy by using a combination of local delay region scheduling and run-time profiling of branches. Feedback into the static code is achieved with hint bits and avoids the need for dynamic prediction for some individual branches. This method requires only minimal hardware modifications and coexists with a dynamic predictor.
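As a rough illustration of how profile feedback could turn into hint bits, the sketch below assigns a static hint to strongly biased branches and leaves the rest to the dynamic predictor. The function name `assign_hints`, the three hint labels, and the 0.95 bias threshold are assumptions made for this example; the abstract does not specify the hint encoding or the selection rule.

```python
def assign_hints(profile, bias_threshold=0.95):
    """Map each branch address to 'taken', 'not_taken' or 'dynamic'.

    `profile` maps a branch address to (taken_count, not_taken_count).
    `bias_threshold` is an illustrative assumption, not a value from the paper.
    """
    hints = {}
    for addr, (taken, not_taken) in profile.items():
        total = taken + not_taken
        if total == 0:
            hints[addr] = "dynamic"      # never executed: no evidence either way
        elif taken / total >= bias_threshold:
            hints[addr] = "taken"        # static hint, dynamic lookup can be skipped
        elif not_taken / total >= bias_threshold:
            hints[addr] = "not_taken"
        else:
            hints[addr] = "dynamic"      # weakly biased: leave to the dynamic predictor
    return hints


# Hypothetical profile data: address -> (taken, not taken)
example_profile = {
    0x100: (990, 10),
    0x104: (5, 995),
    0x108: (500, 500),
    0x10C: (0, 0),
}
example_hints = assign_hints(example_profile)
```

Only branches hinted as `dynamic` would need to consult the dynamic predictor in this sketch, which is where a power saving could come from.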
Pp. 366-372
doi: 10.1007/11859802_32
Fault-Free Pairwise Independent Hamiltonian Paths on Faulty Hypercubes
Sun-Yuan Hsieh
A Hamiltonian path in $G$ is a path which contains every vertex of $G$ exactly once. Two Hamiltonian paths $P_1 = \langle u_1, u_2, \ldots, u_n \rangle$ and $P_2 = \langle v_1, v_2, \ldots, v_n \rangle$ of $G$ are said to be independent if $u_1 = v_1$, $u_n = v_n$, and $u_i \neq v_i$ for all $1 < i < n$. A set of Hamiltonian paths $\{P_1, P_2, \ldots, P_k\}$ of $G$ is mutually independent if any two different Hamiltonian paths in the set are independent. It is well known that an $n$-dimensional hypercube $Q_n$ is bipartite with two partite sets of equal size. Let $F$ be the set of faulty edges of $Q_n$ such that $|F| \le n-2$. In this paper, we show that $Q_n - F$ contains $(n-|F|-1)$-mutually independent Hamiltonian paths between any two vertices from different partite sets, where $n \ge 2$.
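The independence condition defined above can be checked mechanically. The following is a minimal sketch, assuming vertices of the hypercube are encoded as integers whose binary digits are the coordinates; the function names and the three example paths on the 3-dimensional hypercube are chosen for illustration and do not come from the paper.

```python
def is_hamiltonian_path(path, n):
    """True if `path` visits every vertex of the n-dimensional hypercube
    exactly once, moving along one hypercube edge at each step."""
    if sorted(path) != list(range(2 ** n)):
        return False
    # Adjacent hypercube vertices differ in exactly one bit.
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))


def are_independent(p1, p2):
    """Independence as defined in the abstract: same first vertex, same last
    vertex, and different vertices at every interior position."""
    if len(p1) != len(p2) or p1[0] != p2[0] or p1[-1] != p2[-1]:
        return False
    return all(u != v for u, v in zip(p1[1:-1], p2[1:-1]))


# Three Hamiltonian paths from vertex 0 (000) to vertex 7 (111),
# which lie in different partite sets of the 3-dimensional hypercube.
p1 = [0, 1, 3, 2, 6, 4, 5, 7]
p2 = [0, 2, 6, 4, 5, 1, 3, 7]
p3 = [0, 1, 5, 4, 6, 2, 3, 7]

print(are_independent(p1, p2))  # True: the interiors never coincide
print(are_independent(p1, p3))  # False: both visit vertex 1 second
```

The sketch only verifies a given pair of paths; constructing the paths on a hypercube with faulty edges is the subject of the paper and is not attempted here.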
Keywords: Interconnection networks; hypercubes; Hamiltonian; pairwise independent Hamiltonian paths; fault-tolerant embedding.
Pp. 373-379
doi: 10.1007/11859802_33
Constructing Node-Disjoint Paths in Enhanced Pyramid Networks
Hsien-Jone Hsieh; Dyi-Rong Duh
Chen et al. in 2004 proposed a new hierarchy structure, called the enhanced pyramid network (EPM, for short), by replacing each mesh in a pyramid network (PM, for short) with a torus. Recently, some topological properties and communication on the EPMs have been investigated or derived. Their results have revealed that an EPM is an attractive alternative to a PM. This study investigates the node-disjoint paths between any two distinct nodes and the upper bound of the ω-wide-diameter of an EPM. This result shows that the EPMs have smaller ω-wide-diameters than the PMs.
Keywords: Enhanced pyramid networks; pyramid networks; fault-tolerance; wide diameter; node-disjoint paths; container; interconnection networks.
Pp. 380-386
doi: 10.1007/11859802_34
Striping Cache: A Global Cache for Striped Network File System
Sheng-Kai Hung; Yarsun Hsu
Caching is widely used to enhance performance in computer systems, and this remains true in the distributed paradigm. In a distributed environment, caches are distributed across the nodes and can be combined to form a global cache. However, overall performance cannot benefit from the global cache without efficient cooperation among these global resources. The local file system in each node knows nothing about a stripe and thus cannot benefit from the related blocks of a stripe. We propose a striping cache (SC) which knows the related blocks of a stripe and can use them to improve the performance of a striped network file system. This high-level cache can benefit from previous reads and can aggregate small writes to improve overall performance. We implement this mechanism in our reliable parallel file system (RPFS). The experimental results show that both read and write performance can be improved with SC support. The improvement comes from the fact that SC reduces the number of disk accesses.
Pp. 387-393
doi: 10.1007/11859802_35
DTuplesHPC: Distributed Tuple Space for Desktop High Performance Computing
Yi Jiang; Guangtao Xue; Minglu Li; Jinyuan You
This paper introduces DTuplesHPC, a Linda-like [2] peer-to-peer tuple space middleware built on top of a distributed hash table. This tuple space middleware is capable of serving as a high performance computing platform. The decoupled style of the tuple space model [1] is used instead of the message-passing model that is widely used in MPI-based high performance computing. With the help of the tuple space model, distributed computing can be liberated from architectural considerations. First, the DTuples platform allows the dynamic organization of computing resources: a job can be submitted at any time, even if the computation resources only become ready later. Time and space are both decoupled in DTuplesHPC. Second, it brings the simple tuple space programming model to large-scale high performance computing on the desktop. In our design, the in(), rd(), out(), copy-collect() and eval() primitives are supported. In this paper, we present the key design concepts of DTuples.
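To make the programming model concrete, here is a toy, single-process sketch of the three basic Linda-style primitives named above. It is not the DTuplesHPC API: the class `TupleSpace`, the use of `None` as a wildcard, the `timeout` parameter, and the method name `in_` (needed because `in` is a Python keyword) are all assumptions for illustration, and the distributed hash table, copy-collect() and eval() are left out.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space; None in a template matches any field."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *fields):
        """Deposit a tuple and wake any callers waiting for a match."""
        with self._cond:
            self._tuples.append(tuple(fields))
            self._cond.notify_all()

    def rd(self, *template, timeout=None):
        """Return a matching tuple without removing it; block until one exists."""
        return self._take(template, remove=False, timeout=timeout)

    def in_(self, *template, timeout=None):
        """Remove and return a matching tuple; block until one exists."""
        return self._take(template, remove=True, timeout=timeout)

    def _find(self, template):
        for t in self._tuples:
            if len(t) == len(template) and all(
                p is None or p == f for p, f in zip(template, t)
            ):
                return t
        return None

    def _take(self, template, remove, timeout):
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None  # timed out without a match
            t = self._find(template)
            if remove:
                self._tuples.remove(t)
            return t


# A consumer may start waiting before the producer has written anything,
# which is the time decoupling the abstract refers to.
space = TupleSpace()
results = []
consumer = threading.Thread(target=lambda: results.append(space.in_("job", None)))
consumer.start()
space.out("job", 42)
consumer.join()
```

After the consumer thread finishes, `results` holds the tuple `("job", 42)` and the space is empty again, because in_() removes what it matches while rd() leaves it in place.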
Keywords: High Performance Computing; Active Object; Distribute Hash Table; Code Fragment; Tuple Space.
Pp. 394-400
doi: 10.1007/11859802_36
The Algorithm and Circuit Design of a 400MHz 16-Bit Hybrid Multiplier
Li Zhentao; Chen Shuming; Li Zhaoliang; Lei Conghua
In this paper we present the algorithm of a 16-bit hybrid multiplier that can work in two modes. In normal mode, it performs a 16-bit multiplication. In SIMD mode, it performs two parallel 8-bit multiplications. The proposed algorithm is based on the radix-4 modified Booth's algorithm. Our algorithm generates ten partial products and a modifier, which is five fewer than the other algorithms. We can get one 32-bit product or two 16-bit products by directly accumulating the ten partial products and the modifier, easing the design of the tree structures for compressing the partial products and the final adder. The proposed algorithm is adopted by YHFT-DSP/800, a high-performance fixed-point DSP. The multiplier was full-custom designed in 0.18 µm CMOS technology. We also designed a test chip. The test results show that the multiplier works well at 400 MHz in normal mode and 480 MHz in SIMD mode. The simulated power is 35.8 mW at 400 MHz and 42.5 mW at 480 MHz.
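For readers unfamiliar with the base technique, the sketch below shows textbook radix-4 modified Booth recoding for a signed 16-bit multiplier in software. It is not the paper's hybrid algorithm: the textbook form yields eight partial products for a signed 16-bit operand, whereas the abstract describes ten partial products plus a modifier to support both modes. The function names are illustrative.

```python
def booth_radix4_digits(y, bits=16):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits,
    each in {-2, -1, 0, 1, 2}, least significant digit first."""
    y &= (1 << bits) - 1           # two's-complement bit pattern of y
    digits = []
    prev = 0                       # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def booth_multiply(x, y, bits=16):
    """Signed product obtained by summing the partial products d_k * x * 4**k.
    Returns (product, list_of_partial_products)."""
    partial_products = [
        d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, bits))
    ]
    return sum(partial_products), partial_products


product, parts = booth_multiply(-7, 9)
```

Each digit selects one of 0, ±x or ±2x, which hardware can form with a shift and an optional negation; halving the number of partial products relative to bit-by-bit multiplication is what makes the recoding attractive.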
Keywords: Digital Signal Processor; Partial Product; Test Chip; SIMD Instruction; Final Adder.
Pp. 401-408
doi: 10.1007/11859802_37
Live Range Aware Cache Architecture
Peng Li; Dongsheng Wang; Songliu Guo; Tao Tian; Weimin Zheng
The memory wall is a perennial focus of computer architecture research. In this paper, we observe that in computers with a write-back cache, the memory write operation actually lags behind the commitment of the write instruction. By the time the memory write operation executes, the data might already have gone out of its live range. Based on this observation, a novel cache architecture called LIve Range Aware Cache (LIRAC) is proposed. LIRAC can significantly reduce the number of write operations with minimal hardware support. The performance benefits of LIRAC are evaluated by trace-based analysis using the SimpleScalar simulator and the SPEC CPU 2000 benchmarks. Our results show that LIRAC can eliminate 21% of write operations on average and up to 85% in the best case.
Keywords: Live Range; Cache; Memory Hierarchy.
Pp. 409-415
doi: 10.1007/11859802_38
The Challenges of Efficient Code-Generation for Massively Parallel Architectures
Jason M McGuiness; Colin Egan; Bruce Christianson; Guang Gao
Overcoming the memory wall [15] may be achieved by increasing the bandwidth and reducing the latency of the processor-to-memory connection, for example by implementing cellular architectures such as the IBM Cyclops. Such massively parallel architectures have sophisticated memory models. In this paper we used DIMES (the Delaware Iterative Multiprocessor Emulation System), developed by CAPSL at the University of Delaware, as a hardware evaluation tool for cellular architectures. The authors contend that there is an open question regarding the ideal approach to parallelism from the programmer's perspective: for example, at the language level (such as UPC or HPF), using trace scheduling, or at the library level (for example OpenMP or POSIX threads). To investigate this, we have chosen to use a threaded Mandelbrot-set generator with a work-stealing algorithm to evaluate the DIMES cthread programming model for writing a simple multi-threaded program.
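The kind of benchmark described above can be sketched with ordinary Python threads standing in for DIMES cthreads, which are not reproduced here. In this assumed design each worker owns a double-ended queue of image rows, takes its own work from the tail, and steals from the head of another worker's queue when it runs dry; the image size, region, and function names are illustrative choices.

```python
import threading
from collections import deque


def escape_time(c, max_iter=50):
    """Iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def pixel(row, col, width, height):
    """Map a pixel to a point in the region [-2, 1] x [-1, 1]."""
    return complex(-2.0 + 3.0 * col / (width - 1),
                   -1.0 + 2.0 * row / (height - 1))


def render(width=16, height=8, workers=4):
    image = [[0] * width for _ in range(height)]
    queues = [deque() for _ in range(workers)]
    # Deliberately unbalanced: every row starts on worker 0, so the
    # other workers can only make progress by stealing.
    queues[0].extend(range(height))

    def next_row(me):
        try:
            return queues[me].pop()                  # own work: newest first
        except IndexError:
            pass
        for victim in range(workers):
            if victim != me:
                try:
                    return queues[victim].popleft()  # steal: oldest first
                except IndexError:
                    continue
        return None                                  # nothing left anywhere

    def work(me):
        while (row := next_row(me)) is not None:
            for col in range(width):
                image[row][col] = escape_time(pixel(row, col, width, height))

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Because no new rows are created after start-up, a worker can safely exit once every queue is empty. The result is independent of which worker computes which row, so the output matches a sequential rendering.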
Keywords: Memory Model; Cellular Architecture; Work Thread; Distribute Processing Symposium; Memory Consistency Model.
Pp. 416-422
doi: 10.1007/11859802_39
Reliable Systolic Computing Through Redundancy
Kunio Okuda; Siang Wun Song; Marcos Tatsuo Yamamoto
The systolic array paradigm has low communication demand because it does not use costly global communication and each processor communicates with few other processors. It is thus suitable for use in cluster computing. The systolic approach, however, is vulnerable in a heterogeneous environment where machines perform differently. In this paper we propose a redundant systolic solution with high availability to deal with this problem. We analyze the overhead that results from the need to coordinate the actions of the redundant processors and show that the performance improvement is worth this overhead.
Keywords: cluster computing; heterogeneity; redundancy; high-availability.
Pp. 423-429
doi: 10.1007/11859802_40
A Diversity-Controllable Genetic Algorithm for Optimal Fused Traffic Planning on Sensor Networks
Yantao Pan; Xicheng Lu; Peidong Zhu; Shen Ma
In some sensor network applications, e.g. target tracing, multi-profile data about an event are fused at intermediate nodes. The optimal planning of such fused traffic is important for prolonging the network lifetime, because data communication consumes most of the energy in sensor networks. As a general method for such optimization problems, genetic algorithms suffer from tremendous communication diversities that increase greatly with the network size. In this paper, we propose a diversity-controllable genetic algorithm for optimizing fused traffic planning. Simulation shows that it achieves remarkable improvements.
Keywords: Sensor Networks; Lifetime Optimization; Data Fusion.
Pp. 430-436