Catálogo de publicaciones - libros
Embedded Computer Systems: Architectures, Modeling, and Simulation: 7th International Workshop, SAMOS 2007, Samos, Greece, July 16-19, 2007. Proceedings
Stamatis Vassiliadis ; Mladen Bereković ; Timo D. Hämäläinen (eds.)
En conferencia: 7º International Workshop on Embedded Computer Systems (SAMOS) . Samos, Greece . July 16, 2007 - July 19, 2007
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
Theory of Computation; Computer Hardware; Processor Architectures; Computer Communication Networks; System Performance and Evaluation; Computer System Implementation
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2007 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-73622-6
ISBN electrónico
978-3-540-73625-7
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2007
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2007
Tabla de contenidos
An Automatically-Retargetable Time-Constraint-Driven Instruction Scheduler for Post-compiling Optimization of Embedded Code
José O. Carlomagno F.; Luiz F. P. Santos; Luiz C. V. dos Santos
Although SoC design space exploration requires retargetable tools and real-time constraint awareness, conventional compiler infrastructure barely provides both. This paper proposes a novel, automatically retargetable, time-constraint aware instruction scheduler to fulfill both needs. The tool is based upon a unified representation of instruction precedence and timing constraints. It relies on a formal model of the target processor, written in an architecture description language. Experimental results show that the technique not only handles time-constraint analysis efficiently, but also exploits them successfully to guide code optimization. To give proper evidence of retargetability, we present results for the processors MIPS, PowerPC and SPARC. We obtained speed-ups of 1.18 to 1.23 over pre-optimized code.
- Scheduling & Programming Models | Pp. 86-95
Improving TriMedia Cache Performance by Profile Guided Code Reordering
Norbert Esser; Renga Sundararajan; Joachim Trescher
There is an ever-increasing gap between memory and processor performance. As a result, exploiting the cache becomes increasingly important, especially for embedded systems where cache sizes are much smaller than that of general purpose processors. The fine-tuning of an application with respect to cache behavior is now largely dependent on the skill of the application programmer. Given the difficulty of predicting cache behavior, this is, even when great skill is applied, a cumbersome task. A wide range of approaches, in hardware as well as in software, can be used to relieve the programmer’s burden. On the hardware side, we can experiment, for example, with cache sizes, line sizes, replacement policies, and cache organization. On the software side, we can use various optimization techniques like software pipelining, branch prediction, and code reordering. The research described in this paper focussed on improving performance by using code reordering techniques.
This paper reports on the work that we have done to reduce the number of line-fetches in the instruction cache. We have extended the functionality of the linker in the TriMedia compiler chain, such that the number of fetches during program execution is reduced. By reordering the code, we ensure that hot code stays in the cache and the cache is not polluted with cold code. Because fewer fetches are needed we expect a performance increase. By analyzing and profiling code, we obtain execution statistics that can help us find better code-allocations.
- Scheduling & Programming Models | Pp. 96-106
A Streaming Machine Description and Programming Model
Paul Carpenter; David Rodenas; Xavier Martorell; Alex Ramirez; Eduard Ayguadé
In this paper we present the initial development of a streaming environment based on a programming model and machine description. The stream programming model consists of an extension to the C language and it’s translation towards a streaming machine. The extensions will be a set of OpenMP-like directives. We show how a serial application can be converted into a streaming parallel application using the proposed annotations. We also show how the machine description can be used to parametrize a cost-model simulator to predict the performance of the stream program. The cost model allows the compiler to determine the best task partitioning and scheduling for each architecture.
- Scheduling & Programming Models | Pp. 107-116
Mapping and Performance Evaluation for Heterogeneous MP-SoCs Via Packing
Bastian Ristau; Gerhard Fettweis
The computational demand of signal processing algorithms is rising continuously. Heterogeneous embedded multiprocessor systems-on-chips are one solution to tackle this demand. But to be able to take advantage of the benefits of these systems, new strategies are required how to map applications to such a system and how to evaluate the system’s performance at a very early design stage. We will present a static, analytical, bottom-up methodology for temporal and spatial mapping of applications to MP-SoCs based on packing. Furthermore we will demonstrate how the result can be used for performance evaluation and system improvement without the need for simulations.
- Multi-processor Architectures | Pp. 117-126
Strategies for Compiling TC to Novel Chip Multiprocessors
Thomas A. M. Bernard; Chris R. Jesshope; Peter M. W. Knijnenburg
Microthreaded C also called is a concurrent language based on the C language which allows the programmer to code concurrency-oriented applications for targeting chip multiprocessors. source code contains fine-grained concurrent control structures, where the concurrency is explicitly written via new keywords. This language is used as an interface for defining dynamic concurrency and as an intermediate language to capture concurrency from data-parallel languages such as Single-Assignment C, or as the target for parallelizing compilers for sequential languages such as C. This paper presents an overview of language, emphasizing the aspects of memory synchronization and concurrent control structures. In order to understand the properties and scopes of the language, we also present the outlines of the architectures after discussing the global concepts of the microthreading model. Finally we show the toolchain we are currently developing to support the model, focusing on compiler strategies.
- Multi-processor Architectures | Pp. 127-138
Image Quantisation on a Massively Parallel Embedded Processor
Jan Jacobs; Leroy van Engelen; Jan Kuper; Gerard J. M. Smit
Recent advances in embedded processing architectures allow for new powerful algorithms, which exploit the intrinsic parallelism present in image processing applications. This paper describes the results of the mapping process of stochastic image quantisation on a massively parallel processor. The problem can be modeled in a parallel way. Despite the fact that the implementation is IO bound, good speedups are achieved (16× compared to a standard image processing package running on a Pentium processor).
- Multi-processor Architectures | Pp. 139-148
Stream Image Processing on a Dual-Core Embedded System
Michael G. Benjamin; David Kaeli
Effective memory utilization is critical to reap the benefits of the multi-core processors emerging on embedded systems. In this paper we explore the use of a stream model to effectively utilize memory hierarchies. We target image processing algorithms running on the Analog Devices Blackfin BF561 fixed-point, dual-core DSP. Using optimized assembly to effectively use cores reduces runtime, but also underscores the need to mitigate the memory bottleneck. Like other embedded processors, the Blackfin BF561 has L2 SRAM available. Applying the stream model allows us to effectively make full use of both cores and the L2 SRAM. We achieve almost a 10X speedup in execution time compared to non-optimized C code.
- Multi-processor Architectures | Pp. 149-158
MORA: A New Coarse-Grain Reconfigurable Array for High Throughput Multimedia Processing
Marco Lanuzza; Stefania Perri; Pasquale Corsonello
This paper presents a new coarse-grain reconfigurable array optimized for multimedia processing. The system has been designed to provide a dense support for arithmetic operations, wide internal data bandwidth and efficiently distributed memory resources. All these characteristics are combined into a cohesive structure that efficiently supports a block-level pipelined dataflow, which is particularly suitable for stream oriented applications. Moreover, the new reconfigurable architecture is highly flexible and easily scalable. Thanks to all these features, the proposed architecture can be drastically more speed- and area-efficient than a state of the art FPGA in executing multimedia oriented applications.
- Reconfigurable Architectures | Pp. 159-168
FPGA Design Methodology for a Wavelet-Based Scalable Video Decoder
Hendrik Eeckhaut; Harald Devos; Philippe Faes; Mark Christiaens; Dirk Stroobandt
Client-side diversification led the video-coding community to develop scalable video-codecs supporting efficient decoding at varying quality levels. This scalability has a lot of advantages but the corresponding decoding algorithm is complex and really stresses the system bandwidth as it replaces the block-based DCT-approach with frame-based wavelets. This has a tremendous impact on the hardware architecture. We present the implementation of the RESUME decoder using reconfigurable hardware designed through the use of state-of-the-art HW/SW-codesign techniques. These techniques were augmented with automatic loop transformations and regression testing. Our efforts resulted in a design capable of decoding more than 25 frames per second at lossless CIF resolution.
- Reconfigurable Architectures | Pp. 169-178
Evaluating Large System-on-Chip on Multi-FPGA Platform
Ari Kulmala; Erno Salminen; Timo D. Hämäläinen
This paper presents a configurable base architecture tailorable for different applications. It allows simple and rapid way to evaluate and prototype large Multi-Processor System-on-Chip architectures on multiple FPGAs with support to Globally Asynchronous Locally Synchronous scheme. It allows early hardware/software co-verification and optimization. The architecture abstracts the underlying hardware details from the processors so that knowledge about the exact locations of individual components are not required for communication. Implemented example architecture contains 58 IP blocks, including 35 Nios II soft processors. As a proof of concept, a MPEG-4 video encoder is run on the example architecture.
- Reconfigurable Architectures | Pp. 179-189