Publications catalog - books


Architecture of Computing Systems: ARCS 2007: 20th International Conference, Zurich, Switzerland, March 12-15, 2007. Proceedings

Paul Lukowicz ; Lothar Thiele ; Gerhard Tröster (eds.)

Conference: 20th International Conference on Architecture of Computing Systems (ARCS). Zurich, Switzerland. March 12, 2007 - March 15, 2007

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Computer Communication Networks; Computer System Implementation; Operating Systems; Software Engineering; Information Systems Applications (incl. Internet); Information Storage and Retrieval

Availability
Detected institution   Year of publication   Browse   Download   Request
Not detected   2007   SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-71267-1

Electronic ISBN

978-3-540-71270-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

A Reconfigurable Processor for Forward Error Correction

Afshin Niktash; Hooman T. Parizi; Nader Bagherzadeh

In this paper, we introduce a reconfigurable processor optimized for implementing Forward Error Correction (FEC) algorithms and provide implementation results for the Viterbi and Turbo decoding algorithms. In this architecture, an array of processing elements performs the required operations in parallel. Each processing element encapsulates multiple functional units that are highly optimized for FEC algorithms. A data buffer coupled with a high-bandwidth interconnection network feeds data to the array and collects the results. A processing element controller orchestrates the operation and the data movement. FEC algorithms such as Viterbi, Turbo, Reed-Solomon, and LDPC are widely used in digital communication and could be implemented on this architecture. Unlike traditional approaches to programmable FEC architectures, this architecture is instruction-level programmable, which maximizes its flexibility and programmability.

Pp. 1-13

FPGA-Accelerated Deletion-Tolerant Coding for Reliable Distributed Storage

Peter Sobe; Volker Hampel

Distributed storage systems often have to guarantee data availability despite failures or temporary downtimes of storage nodes. For this purpose, a deletion-tolerant code is applied that allows missing parts of a codeword to be reconstructed, i.e., to tolerate a given number of failures. The Reed/Solomon (R/S) code is the most general deletion-tolerant code and can be adapted to a required number of tolerable failures. R/S is optimal in that it has the least information overhead, but it consumes significantly more computation power than parity-based codes. Reconfigurable hardware with specialized arithmetic can be employed for particular finite-field operations in R/S coding, so that the higher computation effort is compensated by faster and parallel operations. We present architectures for application-specific acceleration by FPGAs. This paper provides strategies for efficient communication with the accelerating FPGA and a performance comparison between a pure software-based solution and the accelerated system.

Pp. 14-27
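
The contrast drawn in the abstract above between parity-based codes and the finite-field arithmetic of R/S coding can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field GF(2^8), the reduction polynomial 0x11D, and the function names `gf256_mul` and `xor_parity` are assumptions chosen for illustration.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements by shift-and-add.

    The reduction polynomial 0x11D is an assumed, commonly used choice;
    the source does not state which field or polynomial the paper uses.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly          # reduce back into 8 bits
        b >>= 1
    return result


def xor_parity(blocks):
    """Parity-based code: XOR of all blocks, tolerates one missing block."""
    parity = bytes(len(blocks[0]))
    for blk in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, blk))
    return parity
```

Parity needs only XOR, whereas each R/S redundancy symbol needs one such field multiplication per data symbol. This bit-serial loop is the kind of operation that specialized hardware arithmetic could evaluate in parallel.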

LIRAC: Using Live Range Information to Optimize Memory Access

Peng Li; Dongsheng Wang; Haixia Wang; Meijuan Lu; Weimin Zheng

The processor-memory wall is a perennial focus of computer architecture research. While existing cache architectures can significantly mitigate the gap between processor and memory, they are not very effective in certain scenarios. For example, when scratch data is cached, it is not necessary to write back modified data. However, existing cache architectures do not provide enough support for distinguishing this kind of situation. Based on this observation, we propose a novel cache architecture called LIve Range Aware Cache (LIRAC). This cache scheme can significantly reduce cache write-backs with minimal hardware support.

The performance of LIRAC is evaluated using trace-driven analysis and the SimpleScalar simulator. We used SPEC CPU 2000 benchmarks and a number of multimedia applications. Simulation results show that LIRAC can eliminate 21% of cache write-backs on average and up to 85% in the best case.

The idea of LIRAC can be extended and used in write buffers and in CMPs with transactional memory. In this paper, we also propose LIve Range Aware BUFfer (LIRABuf). Simulation results show that the improvement from LIRABuf is also significant.

Pp. 28-42
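
The observation in the LIRAC abstract, that modified scratch data need not be written back once it is no longer live, can be illustrated with a small trace-driven sketch in Python. The abstract does not describe the actual hardware mechanism, so the `"dead"` hint, the fully associative LRU cache, and the function name `count_writebacks` are all assumptions.

```python
from collections import OrderedDict


def count_writebacks(trace, capacity, use_live_range):
    """Count dirty-line write-backs for a tiny fully associative LRU cache.

    trace: list of (op, addr) pairs with op in {"read", "write", "dead"}.
    "dead" is a hypothetical hint that the value's live range has ended.
    """
    cache = OrderedDict()   # addr -> dirty flag, least recently used first
    writebacks = 0
    for op, addr in trace:
        if op == "dead":
            if use_live_range and addr in cache:
                cache[addr] = False   # dead data: skip its write-back
            continue
        dirty = cache.pop(addr, False) or op == "write"
        cache[addr] = dirty           # (re)insert as most recently used
        if len(cache) > capacity:
            _, victim_dirty = cache.popitem(last=False)   # evict LRU line
            writebacks += victim_dirty
    return writebacks
```

Running the same trace with `use_live_range` set to `False` and then `True` shows how many write-backs the hint removes for that trace. The percentages reported in the abstract come from the authors' own simulations, not from a model like this one.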

Optimized Register Renaming Scheme for Stack-Based x86 Operations

Xuehai Qian; He Huang; Zhenzhong Duan; Junchao Zhang; Nan Yuan; Yongbin Zhou; Hao Zhang; Huimin Cui; Dongrui Fan

The stack-based floating point unit (FPU) in the x86 architecture limits its floating point (FP) performance. A flat register file can improve FP performance but affects x86 compatibility. This paper presents an optimized two-phase floating point register renaming scheme used in implementing an x86-compliant processor. The two-phase renaming scheme eliminates the implicit dependencies between consecutive FP instructions as well as redundant operations. As two applications of the method, the techniques used in the second phase of the scheme can eliminate redundant loads and reduce the mis-speculation ratio of the load-store queue. Moreover, the performance of a binary translation system that translates x86 instructions to a MIPS-like ISA can also be boosted by adding the architectural support required by this optimized scheme.

Pp. 43-56
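
The general idea of mapping stack-relative x87 register names onto a flat register file can be sketched in a few lines of Python. This is not the two-phase scheme of the paper, whose details the abstract does not give. The class name `X87StackMap` and its methods are hypothetical, and handling FXCH as a pure map swap is shown as one known technique, not as the authors' method.

```python
class X87StackMap:
    """Maps stack-relative names ST(i) onto a flat file of 8 registers."""

    def __init__(self):
        self.top = 0                        # top-of-stack pointer
        self.slot_to_reg = list(range(8))   # stack slot -> flat register

    def reg(self, i):
        """Flat register currently named ST(i)."""
        return self.slot_to_reg[(self.top + i) % 8]

    def push(self):
        """Load-style push: TOP decrements; returns the new ST(0)."""
        self.top = (self.top - 1) % 8
        return self.reg(0)

    def pop(self):
        """Pop: TOP increments, so the old ST(1) becomes ST(0)."""
        self.top = (self.top + 1) % 8

    def fxch(self, i):
        """Exchange ST(0) and ST(i) by swapping map entries, moving no data."""
        a, b = self.top, (self.top + i) % 8
        regs = self.slot_to_reg
        regs[a], regs[b] = regs[b], regs[a]
```

Because every instruction reads or updates `top`, consecutive FP instructions depend on each other implicitly even when their data do not. Resolving ST(i) to a flat register early is what makes those data dependencies explicit.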

A Customized Cross-Bar for Data-Shuffling in Domain-Specific SIMD Processors

Praveen Raghavan; Satyakiran Munaga; Estela Rey Ramos; Andy Lambrechts; Murali Jayapala; Francky Catthoor; Diederik Verkest

Shuffle operations are among the most common operations in SIMD-based embedded system architectures. In this paper we study different families of shuffle operations that frequently occur in embedded applications running on SIMD architectures. These shuffle operations are used to drive the design of a custom shuffler for domain-specific SIMD processors. The energy efficiency of various crossbar-based custom shufflers is analyzed and compared with the widely used full crossbar. We show that by customizing the crossbar to implement the specific shuffle operations required in the target application domain, we can reduce the energy consumption of shuffle operations by up to 80%. We also illustrate the tradeoffs between flexibility and energy efficiency of custom shufflers and show that customization offers reasonable benefits without compromising the flexibility required for the target application domain.

Pp. 57-68
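
To make the notion of a shuffle and of a customized crossbar concrete, the following Python sketch treats a shuffle as an index pattern and counts the crossbar connections a set of patterns actually uses. The two 4-lane patterns and the function names are hypothetical, and the crosspoint count is only a rough structural proxy, not the energy model used in the paper.

```python
def shuffle(vec, pattern):
    """Output lane i takes its value from input lane pattern[i]."""
    return [vec[src] for src in pattern]


def crosspoints_needed(patterns):
    """Distinct (output lane, input lane) connections used by the patterns."""
    return {(out, src) for p in patterns for out, src in enumerate(p)}


# Two hypothetical 4-lane patterns, assumed for illustration only.
REVERSE = [3, 2, 1, 0]       # reverse the lane order
INTERLEAVE = [0, 2, 1, 3]    # interleave the two halves

used = crosspoints_needed([REVERSE, INTERLEAVE])
full_crossbar = 4 * 4        # a full crossbar connects every output to every input
```

A shuffler that supports only these two patterns needs the connections in `used`, which is fewer than the `full_crossbar` count. That reduction is what customization trades against flexibility.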

Customized Placement for High Performance Embedded Processor Caches

Subramanian Ramaswamy; Sudhakar Yalamanchili

In this paper, we propose the use of compiler-controlled customized placement policies for embedded processor data caches. Profile-driven customized placement improves the sharing of cache resources across memory lines, thereby reducing conflict misses and lowering the average memory access time (AMAT) and, consequently, execution time. Alternatively, customized placement policies can be used to reduce the cache size and associativity for a fixed AMAT, with an attendant reduction in power and area. These advantages are achieved with a small increase in the complexity of the address translation used to index the cache. The consequent increase in critical path length is offset by lowered miss rates. Simulation experiments with embedded benchmark kernels show that caches with customized placement provide miss rates comparable to those of traditional caches with larger sizes and higher associativities.

Pp. 69-82
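
The trade-off described in the abstract above, a slightly longer critical path offset by a lower miss rate, follows from the standard definition AMAT = hit time + miss rate × miss penalty. The Python sketch below evaluates it for two configurations. All numbers are hypothetical and chosen only to show the arithmetic; they are not results from the paper.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time (e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical values: customized placement lengthens the hit path slightly
# but removes some conflict misses.
baseline = amat(hit_time=1.0, miss_rate=0.10, miss_penalty=50.0)
custom = amat(hit_time=1.1, miss_rate=0.06, miss_penalty=50.0)
```

With these assumed values the customized cache has the lower AMAT, because the saved miss cycles outweigh the added hit latency. With a small enough miss penalty the comparison would go the other way.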

A Multiprocessor Cache for Massively Parallel SoC Architectures

Jörg-Christian Niemann; Christian Liß; Mario Porrmann; Ulrich Rückert

In this paper, we present an advanced multiprocessor cache architecture for chip multiprocessors (CMPs). It is designed for the scalable GigaNetIC CMP, which is based on massively parallel on-chip computing clusters. Our write-through multiprocessor cache is configurable with respect to the most relevant design options. It is intended for use in universal co-processors as well as in network processing units. For early verification of the software and early exploration of various hardware configurations, we have developed a SystemC-based simulation model of the complete chip multiprocessor. For detailed hardware-software co-verification, we use our FPGA-based rapid prototyping system RAPTOR2000 to emulate our architecture with near-ASIC performance. Finally, we demonstrate the performance gains that our multiprocessor cache enables in different application scenarios.

Pp. 83-97

Improving Resource Discovery in the Arigatoni Overlay Network

Raphaël Chand; Luigi Liquori; Michel Cosnard

Arigatoni is a structured multi-layer overlay network providing various services with variable guarantees, and promoting intermittent participation in the virtual organization, where peers can appear, disappear and organize themselves dynamically. Arigatoni is mainly concerned with how resources are declared and discovered in the overlay, allowing global computers to make secure, PKI-based use of global aggregated computational power, storage, information resources, etc. Arigatoni provides fully decentralized, asynchronous and scalable resource discovery, and provides mechanisms for dealing with dynamic virtual organizations. This paper introduces a nontrivial improvement of the original resource discovery protocol by allowing to register and to ask for . Simulations show that it is efficient and scalable.

Pp. 98-111

An Effective Multi-hop Broadcast in Vehicular Ad-Hoc Network

Tae-Hwan Kim; Won-Kee Hong; Hie-Cheol Kim

Multi-hop broadcast protocols in vehicular ad-hoc networks (VANETs) require more prompt message dissemination than traditional broadcast protocols because they mainly deal with vital data involved in driver safety. In this paper, a time reservation-based relay node selection algorithm is proposed in order to achieve immediate message dissemination. All nodes in the communication range of a relay node randomly choose their waiting time within a given time-window. The time-window range is determined by the distance from the previous relay node and a reservation ratio of the time-window. The node with the shortest waiting time is selected as the new relay node. The experimental results show that the proposed algorithm has a shorter end-to-end delay time than the distance-based relay node selection algorithm regardless of node density. In particular, when the node density is low, the proposed algorithm has a 25.7% shorter end-to-end delay time and 46% better performance in terms of the compound metric than the distance-based relay node selection algorithm.

Pp. 112-125
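
The relay selection steps in the abstract above (each node draws a random waiting time from a window that depends on its distance and a reservation ratio, and the shortest wait wins) can be sketched in Python. The abstract does not give the formula for the window, so the mapping below is an assumption made purely for illustration: each node gets a sub-window of length `reservation_ratio * window`, and farther nodes get an earlier one. The function names are hypothetical as well.

```python
import random


def waiting_time(distance, comm_range, window, reservation_ratio, rng):
    """Random wait inside a distance-dependent sub-window (assumed formula)."""
    # Assumption: farther nodes get an earlier start, so they tend to win.
    start = (1.0 - distance / comm_range) * (1.0 - reservation_ratio) * window
    return start + rng.uniform(0.0, reservation_ratio * window)


def select_relay(distances, comm_range, window, reservation_ratio, rng):
    """distances: dict of node id -> distance from the previous relay node."""
    waits = {
        node: waiting_time(d, comm_range, window, reservation_ratio, rng)
        for node, d in distances.items()
    }
    # The node whose timer expires first becomes the new relay.
    return min(waits, key=waits.get), waits


relay, waits = select_relay(
    {"a": 50.0, "b": 240.0},   # hypothetical distances in meters
    comm_range=250.0, window=10.0, reservation_ratio=0.2,
    rng=random.Random(0),
)
```

Under this assumed mapping the sub-windows of the two example nodes do not overlap, so the farther node `"b"` is always selected. With a larger reservation ratio the sub-windows would overlap and the random draw would decide among nodes at similar distances.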

Functional Knowledge Exchange Within an Intelligent Distributed System

Oliver Buchtala; Bernhard Sick

Humans learn from other humans – and intelligent nodes of a distributed system operating in a dynamic environment (e.g., robots, smart sensors, or software agents) should do the same! Humans learn not only by communicating facts but also by exchanging rules. The latter can be seen as a more generic, abstract kind of knowledge. We refer to these two kinds of knowledge as “descriptive” and “functional” knowledge, respectively. In a dynamic environment, where new knowledge arises or old knowledge becomes obsolete, intelligent nodes must adapt on-line to their local environment by means of self-learning mechanisms. If they exchange functional knowledge in addition to descriptive knowledge, they can, for instance, cope efficiently with a particular phenomenon before they observe it in their local environment. In this article, we present an architecture of so-called organic nodes that face a classification problem. We show how a need for new functional knowledge is detected, how new rules are determined, and how the exchange of locally acquired rules within a network of organic nodes leads to a certain kind of self-optimization of the overall system. We show the potential of our methods using an artificial scenario and a real-world scenario from the field of intrusion detection in computer networks.

Pp. 126-141