Publications catalog - books

Embedded Computer Systems: Architectures, Modeling, and Simulation: 7th International Workshop, SAMOS 2007, Samos, Greece, July 16-19, 2007. Proceedings

Stamatis Vassiliadis ; Mladen Bereković ; Timo D. Hämäläinen (eds.)

Conference: 7th International Workshop on Embedded Computer Systems (SAMOS). Samos, Greece. July 16, 2007 - July 19, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Theory of Computation; Computer Hardware; Processor Architectures; Computer Communication Networks; System Performance and Evaluation; Computer System Implementation

Availability

Detected institution	Publication year	Browse	Download	Request
None detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73622-6

Electronic ISBN

978-3-540-73625-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic and information engineering

Arts

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/978-3-540-73625-7_11

An Automatically-Retargetable Time-Constraint-Driven Instruction Scheduler for Post-compiling Optimization of Embedded Code

José O. Carlomagno F.; Luiz F. P. Santos; Luiz C. V. dos Santos

Although SoC design space exploration requires retargetable tools and real-time constraint awareness, conventional compiler infrastructure rarely provides both. This paper proposes a novel, automatically retargetable, time-constraint-aware instruction scheduler to fulfill both needs. The tool is based upon a unified representation of instruction precedence and timing constraints. It relies on a formal model of the target processor, written in an architecture description language. Experimental results show that the technique not only handles time-constraint analysis efficiently, but also exploits the constraints successfully to guide code optimization. To give proper evidence of retargetability, we present results for the MIPS, PowerPC and SPARC processors. We obtained speed-ups of 1.18 to 1.23 over pre-optimized code.

- Scheduling & Programming Models | Pp. 86-95

doi: 10.1007/978-3-540-73625-7_12

Improving TriMedia Cache Performance by Profile Guided Code Reordering

Norbert Esser; Renga Sundararajan; Joachim Trescher

There is an ever-increasing gap between memory and processor performance. As a result, exploiting the cache becomes increasingly important, especially for embedded systems, where cache sizes are much smaller than those of general-purpose processors. The fine-tuning of an application with respect to cache behavior is now largely dependent on the skill of the application programmer. Given the difficulty of predicting cache behavior, this is a cumbersome task even when great skill is applied. A wide range of approaches, in hardware as well as in software, can be used to relieve the programmer’s burden. On the hardware side, we can experiment, for example, with cache sizes, line sizes, replacement policies, and cache organization. On the software side, we can use various optimization techniques like software pipelining, branch prediction, and code reordering. The research described in this paper focussed on improving performance by using code reordering techniques.

This paper reports on the work that we have done to reduce the number of line-fetches in the instruction cache. We have extended the functionality of the linker in the TriMedia compiler chain, such that the number of fetches during program execution is reduced. By reordering the code, we ensure that hot code stays in the cache and the cache is not polluted with cold code. Because fewer fetches are needed we expect a performance increase. By analyzing and profiling code, we obtain execution statistics that can help us find better code-allocations.

- Scheduling & Programming Models | Pp. 96-106

doi: 10.1007/978-3-540-73625-7_13

A Streaming Machine Description and Programming Model

Paul Carpenter; David Rodenas; Xavier Martorell; Alex Ramirez; Eduard Ayguadé

In this paper we present the initial development of a streaming environment based on a programming model and machine description. The stream programming model consists of an extension to the C language and its translation towards a streaming machine. The extensions will be a set of OpenMP-like directives. We show how a serial application can be converted into a streaming parallel application using the proposed annotations. We also show how the machine description can be used to parametrize a cost-model simulator to predict the performance of the stream program. The cost model allows the compiler to determine the best task partitioning and scheduling for each architecture.

- Scheduling & Programming Models | Pp. 107-116

doi: 10.1007/978-3-540-73625-7_14

Mapping and Performance Evaluation for Heterogeneous MP-SoCs Via Packing

Bastian Ristau; Gerhard Fettweis

The computational demand of signal processing algorithms is rising continuously. Heterogeneous embedded multiprocessor systems-on-chips are one solution to tackle this demand. But to take advantage of the benefits of these systems, new strategies are required for mapping applications to such a system and for evaluating the system’s performance at a very early design stage. We will present a static, analytical, bottom-up methodology for temporal and spatial mapping of applications to MP-SoCs based on packing. Furthermore, we will demonstrate how the result can be used for performance evaluation and system improvement without the need for simulations.

- Multi-processor Architectures | Pp. 117-126

doi: 10.1007/978-3-540-73625-7_15

Strategies for Compiling μTC to Novel Chip Multiprocessors

Thomas A. M. Bernard; Chris R. Jesshope; Peter M. W. Knijnenburg

Microthreaded C, also called μTC, is a concurrent language based on the C language which allows the programmer to code concurrency-oriented applications targeting chip multiprocessors. μTC source code contains fine-grained concurrent control structures, where the concurrency is explicitly written via new keywords. This language is used as an interface for defining dynamic concurrency and as an intermediate language to capture concurrency from data-parallel languages such as Single-Assignment C, or as the target for parallelizing compilers for sequential languages such as C. This paper presents an overview of the μTC language, emphasizing the aspects of memory synchronization and concurrent control structures. In order to understand the properties and scopes of the language, we also present the outlines of the architectures after discussing the global concepts of the microthreading model. Finally, we show the toolchain we are currently developing to support the model, focusing on compiler strategies.

- Multi-processor Architectures | Pp. 127-138

doi: 10.1007/978-3-540-73625-7_16

Image Quantisation on a Massively Parallel Embedded Processor

Jan Jacobs; Leroy van Engelen; Jan Kuper; Gerard J. M. Smit

Recent advances in embedded processing architectures allow for new powerful algorithms, which exploit the intrinsic parallelism present in image processing applications. This paper describes the results of the mapping process of stochastic image quantisation on a massively parallel processor. The problem can be modeled in a parallel way. Despite the fact that the implementation is IO bound, good speedups are achieved (16× compared to a standard image processing package running on a Pentium processor).

- Multi-processor Architectures | Pp. 139-148

doi: 10.1007/978-3-540-73625-7_17

Stream Image Processing on a Dual-Core Embedded System

Michael G. Benjamin; David Kaeli

Effective memory utilization is critical to reap the benefits of the multi-core processors emerging on embedded systems. In this paper we explore the use of a stream model to effectively utilize memory hierarchies. We target image processing algorithms running on the Analog Devices Blackfin BF561 fixed-point, dual-core DSP. Using optimized assembly to effectively use cores reduces runtime, but also underscores the need to mitigate the memory bottleneck. Like other embedded processors, the Blackfin BF561 has L2 SRAM available. Applying the stream model allows us to effectively make full use of both cores and the L2 SRAM. We achieve almost a 10X speedup in execution time compared to non-optimized C code.

- Multi-processor Architectures | Pp. 149-158

doi: 10.1007/978-3-540-73625-7_18

MORA: A New Coarse-Grain Reconfigurable Array for High Throughput Multimedia Processing

Marco Lanuzza; Stefania Perri; Pasquale Corsonello

This paper presents a new coarse-grain reconfigurable array optimized for multimedia processing. The system has been designed to provide dense support for arithmetic operations, wide internal data bandwidth and efficiently distributed memory resources. All these characteristics are combined into a cohesive structure that efficiently supports a block-level pipelined dataflow, which is particularly suitable for stream-oriented applications. Moreover, the new reconfigurable architecture is highly flexible and easily scalable. Thanks to all these features, the proposed architecture can be drastically more speed- and area-efficient than a state-of-the-art FPGA in executing multimedia-oriented applications.

- Reconfigurable Architectures | Pp. 159-168

doi: 10.1007/978-3-540-73625-7_19

FPGA Design Methodology for a Wavelet-Based Scalable Video Decoder

Hendrik Eeckhaut; Harald Devos; Philippe Faes; Mark Christiaens; Dirk Stroobandt

Client-side diversification led the video-coding community to develop scalable video-codecs supporting efficient decoding at varying quality levels. This scalability has a lot of advantages but the corresponding decoding algorithm is complex and really stresses the system bandwidth as it replaces the block-based DCT-approach with frame-based wavelets. This has a tremendous impact on the hardware architecture. We present the implementation of the RESUME decoder using reconfigurable hardware designed through the use of state-of-the-art HW/SW-codesign techniques. These techniques were augmented with automatic loop transformations and regression testing. Our efforts resulted in a design capable of decoding more than 25 frames per second at lossless CIF resolution.

- Reconfigurable Architectures | Pp. 169-178

doi: 10.1007/978-3-540-73625-7_20

Evaluating Large System-on-Chip on Multi-FPGA Platform

Ari Kulmala; Erno Salminen; Timo D. Hämäläinen

This paper presents a configurable base architecture tailorable for different applications. It provides a simple and rapid way to evaluate and prototype large Multi-Processor System-on-Chip architectures on multiple FPGAs, with support for the Globally Asynchronous Locally Synchronous scheme. It allows early hardware/software co-verification and optimization. The architecture abstracts the underlying hardware details from the processors so that knowledge about the exact locations of individual components is not required for communication. The implemented example architecture contains 58 IP blocks, including 35 Nios II soft processors. As a proof of concept, an MPEG-4 video encoder is run on the example architecture.

- Reconfigurable Architectures | Pp. 179-189