Publications catalog - books


Parallel and Distributed Processing and Applications: 5th International Symposium, ISPA 2007 Niagara Falls, Canada, August 29-31, 2007 Proceedings

Ivan Stojmenovic ; Ruppa K. Thulasiram ; Laurence T. Yang ; Weijia Jia ; Minyi Guo ; Rodrigo Fernandes de Mello (eds.)

In conference: 5th International Symposium on Parallel and Distributed Processing and Applications (ISPA). Niagara Falls, ON, Canada. August 28, 2007 - September 1, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Algorithm Analysis and Problem Complexity; Computer Communication Networks; Information Systems Applications (incl. Internet); System Performance and Evaluation; Software Engineering

Availability
Detected institution Year of publication Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74741-3

Electronic ISBN

978-3-540-74742-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

A Network Performance Sensitivity Metric for Parallel Applications

Jeffrey J. Evans; Cynthia S. Hood

Excessive run time variability of parallel application codes on commodity clusters is a significant challenge. To gain insight into this problem our earlier work developed tools to emulate parallel applications (PACE) by simulating computation and using the cluster’s interconnection network for communication, and to further study parallel application run time effects (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used parallel benchmarks and application code fragments. The metric suggests that a parallel application’s sensitivity to network performance variation can be quantified relative to its behavior in optimal network performance conditions. Ideas on how this metric can be useful to parallel application development, cluster system performance management and system administration are also presented.

- Networks | Pp. 920-931

The Influence of Interference Networks in QoS Parameters in a WLAN 802.11g Environment

Jasmine P. L. Araújo; Josiane C. Rodrigues; Simone G. C. Fraiha; Felipe M. Lamarão; Nandamudi L. Vijaykumar; Gervásio P. S. Cavalcante; Carlos R. L. Francês

This paper proposes a strategy to determine how much a given network can affect the QoS parameters of another through interference. To achieve this, a measurement campaign was carried out in two stages: first with a single AP and then with two APs less than three meters apart, using the same channel. After the measurements, the results were analyzed and a set of inferences was made using Bayesian Networks, whose inputs were the experimental data, i.e. QoS metrics such as throughput, jitter, packet loss and PMOS, and physical metrics such as power and distance.

- Networks | Pp. 932-945

Instruction Selection for Subword Level Parallelism Optimizations for Application Specific Instruction Processors

Miao Wang; Guiming Wu; Zhiying Wang

Application Specific Instruction Processors (ASIPs) have the potential to meet the high-performance demands of multimedia applications, such as image processing, audio and video encoding, speech processing, and digital signal processing. To achieve lower cost and better energy efficiency in high-performance embedded systems built from ASIPs, subword parallelism optimization will become an important way to accelerate multimedia applications. One major problem, however, is how to exploit subword parallelism on ASIPs with limited resources. This paper shows that loop transformations such as loop unrolling and variable expansion can be used to create opportunities for subword parallelism, and presents a novel approach to recognizing and extracting subword parallelism based on the Cost Subgraph (CSG). The approach is evaluated on the Transport Triggered Architecture (TTA), a customizable processor architecture that is particularly suitable for tailoring hardware resources to the requirements of the application. In our experiment, 63.58% of loops and 85.64% of instructions in these loops can exploit subword parallelism. The results indicate that our method can expose a significant amount of the available subword parallelism.

- Software and Languages | Pp. 946-957
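The abstract above says that loop unrolling can create opportunities for subword parallelism. The following is a minimal sketch of that general idea only, not of the paper's Cost Subgraph method: a byte-wise addition loop is unrolled by four so that four 8-bit elements share one 32-bit word and are added with a single word-level operation. All function names here are hypothetical, and the 8-bit lane width and wraparound arithmetic are assumptions made for the illustration.

```python
# Illustrative sketch: four 8-bit lanes packed into one 32-bit word.
# Names, lane width and wraparound semantics are assumptions, not the paper's.
LANE_HIGH_BITS = 0x80808080  # top bit of each 8-bit lane
LANE_LOW_BITS = 0x7F7F7F7F   # low seven bits of each 8-bit lane


def pack4(values):
    """Pack four 8-bit values into one word, first value in the lowest lane."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def add_u8x4(a, b):
    """Add four 8-bit lanes at once, modulo 256, with no carry between lanes."""
    # Adding only the low seven bits of each lane cannot overflow the lane.
    low = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    # The top bit of each lane is restored with XOR, which drops its carry-out.
    return low ^ ((a ^ b) & LANE_HIGH_BITS)


def add_scalar(xs, ys):
    """Original loop: one 8-bit addition per iteration."""
    return [(x + y) & 0xFF for x, y in zip(xs, ys)]


def add_unrolled(xs, ys):
    """Loop unrolled by four: one word-level addition per four elements."""
    if len(xs) != len(ys) or len(xs) % 4 != 0:
        raise ValueError("inputs must have equal length, a multiple of 4")
    out = []
    for i in range(0, len(xs), 4):
        word = add_u8x4(pack4(xs[i:i + 4]), pack4(ys[i:i + 4]))
        out.extend(unpack4(word))
    return out
```

Both loops compute the same result; the unrolled version simply exposes four independent additions per iteration that a subword-capable datapath could execute as one instruction.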

High Performance 3D Convolution for Protein Docking on IBM Blue Gene

Akira Nukada; Yuichiro Hourai; Akira Nishida; Yutaka Akiyama

We have developed a high performance 3D convolution library for Protein Docking on IBM Blue Gene. The algorithm is designed to exploit slight locality of memory access in 3D-FFT by making full use of a cache memory structure. The 1D-FFT used in the 3D convolution is optimized for PowerPC 440 FP2 processors. The number of SIMOMD instructions is minimized by simultaneous computation of two 1D-FFTs. The high performance 3D convolution library achieves up to 2.16 Gflops (38.6% of peak) per node. The total performance of a shape complementarity search is estimated at 7 Tflops with the 4-rack Blue Gene system (4096 nodes).

- Software and Languages | Pp. 958-969

KSEQ: A New Scalable Synchronous I/O Multiplexing Mechanism for Event-Driven Applications

Hongtao Xia; Weiping Sun; Jingli Zhou; Yunhua Huang; Jifeng Yu

The performance of event-driven network applications, such as Web servers and proxies, is influenced by the scalability and efficiency of the synchronous I/O multiplexing mechanism. Research shows that an event-based mechanism can ensure scalability, and that using kernel-user shared memory to avoid system calls can greatly reduce system overhead. Until now, however, no solution has combined these two features, because of synchronization problems. This paper attempts to design an event notification mechanism for event-driven network applications that uses kernel-user shared event queues (KSEQ) to achieve both good scalability and low system overhead. KSEQ works somewhat like a double buffer: both the application and the kernel can write the shared data structures without the help of synchronization system calls. Experiments show that the Squid proxy server using this mechanism achieves shorter response times than with other mechanisms.

- Software and Languages | Pp. 970-981
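The KSEQ abstract above compares the shared event queue to a double buffer. A toy, single-process sketch of a double-buffered event queue might look like the following. It is an assumption-laden illustration of the double-buffer idea only: the class and method names are hypothetical, and it does not model real kernel-user shared memory, concurrency, or the paper's actual data structures.

```python
class DoubleBufferedEventQueue:
    """Toy model: the producer fills one buffer while the consumer drains the other.

    Hypothetical names; this is not the KSEQ interface from the paper.
    """

    def __init__(self):
        self._buffers = ([], [])
        self._active = 0  # index of the buffer the producer currently writes to

    def post(self, event):
        """Producer side (the kernel, in the paper's setting) appends an event."""
        self._buffers[self._active].append(event)

    def drain(self):
        """Consumer side (the application) swaps buffers and takes the full one."""
        ready = self._active
        # After the swap, new events land in the other buffer.
        self._active = 1 - ready
        events = list(self._buffers[ready])
        self._buffers[ready].clear()
        return events
```

In this model the application never reads the buffer that is currently being written, which is the property that lets a double buffer avoid locking on every event.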

A Synchronous Mode MPI Implementation on the Cell BE Architecture

Murali Krishna; Arun Kumar; Naresh Jayam; Ganapathy Senthilkumar; Pallav K. Baruah; Raghunath Sharma; Shakti Kapoor; Ashok Srinivasan

The Cell Broadband Engine shows much promise in high performance computing applications. The Cell is a heterogeneous multi-core processor, with the bulk of the computational work load meant to be borne by eight co-processors called SPEs. Each SPE operates on a distinct 256 KB local store, and all the SPEs also have access to a shared 512 MB to 2 GB main memory through DMA. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. We have provided an implementation of core features of MPI for the Cell to help deal with this. This implementation views each SPE as a node for an MPI process, with the local store used as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, using the rendezvous protocol, which makes MPI communication for long messages efficient. We further present experimental results on the Cell hardware, where it demonstrates good performance, such as throughput up to 6.01 GB/s and latency as low as 0.65 μs on the pingpong test. This demonstrates that it is possible to efficiently implement MPI calls even on the simple SPE cores.

- Software and Languages | Pp. 982-991
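The abstract above relies on the rendezvous protocol for synchronous mode communication. As a rough sketch of how a rendezvous handshake generally works, and not of the authors' Cell implementation, the following uses Python threads in place of SPEs: the sender first announces the message size, waits until the receiver has a destination buffer ready, and only then copies the payload. The `RendezvousChannel` class and its methods are hypothetical names chosen for this illustration.

```python
import queue
import threading


class RendezvousChannel:
    """Toy rendezvous handshake between one sender thread and one receiver thread.

    Hypothetical API for illustration; not an MPI or Cell SDK interface.
    """

    def __init__(self):
        self._requests = queue.Queue()  # "request to send" envelopes

    def ssend(self, payload):
        """Synchronous send: returns only after the receiver has the data."""
        ready = queue.Queue(maxsize=1)  # receiver's "clear to send" reply
        done = threading.Event()
        # Step 1: announce the message size, without sending the payload.
        self._requests.put((len(payload), ready, done))
        # Step 2: block until the receiver has posted a destination buffer.
        destination = ready.get()
        # Step 3: copy straight into the receiver's buffer, no intermediate copy.
        destination[:len(payload)] = payload
        done.set()

    def recv(self):
        """Matching receive: allocates the buffer, then waits for the copy."""
        size, ready, done = self._requests.get()
        buffer = bytearray(size)
        ready.put(buffer)  # tell the sender where to write
        done.wait()
        return bytes(buffer)
```

Because the payload moves only after the receive is matched, no eager buffering of long messages is needed, which is the usual reason the rendezvous protocol is preferred for large transfers.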