Publications catalog - books


Parallel and Distributed Processing and Applications: 5th International Symposium, ISPA 2007 Niagara Falls, Canada, August 29-31, 2007 Proceedings

Ivan Stojmenovic ; Ruppa K. Thulasiram ; Laurence T. Yang ; Weijia Jia ; Minyi Guo ; Rodrigo Fernandes de Mello (eds.)

In conference: 5th International Symposium on Parallel and Distributed Processing and Applications (ISPA). Niagara Falls, ON, Canada. August 28, 2007 - September 1, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Algorithm Analysis and Problem Complexity; Computer Communication Networks; Information Systems Applications (incl. Internet); System Performance and Evaluation; Software Engineering

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74741-3

Electronic ISBN

978-3-540-74742-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments

Farshad Khunjush; Nikitas J. Dimopoulos

The main contributors to message delivery latency in message passing environments are the copying operations needed to transfer and bind a received message to the consuming process/thread. To reduce this copying overhead, we introduce architectural extensions comprising a specialized network cache and instructions. In this work, we study the possible overhead and cache pollution introduced through the operating system and the communications stack as exemplified by Linux, TCP/IP and M-VIA. We introduce this overhead in our simulation environment and study its effects on our proposed extensions. Ultimately, we have been able to compare the performance achieved by an application running on a system incorporating our extensions with the performance of the same application running on a standard system. The results show that our proposed approach can improve the performance of MPI applications by 10% to 20%.

- Architectures and Systems | Pp. 208-222

Virtual Distro Dispatcher: A Costless Distributed Virtual Environment from Trashware

Flavio Bertini; D. Davide Lamanna; Roberto Baldoni

Obsolete hardware can be effectively reused through intelligent software optimization, which is possible only when source code is available. Virtual Distro Dispatcher (VDD) is a system that produces virtual machines on a central server and projects them on a number of costless physical terminals. VDD is the result of an extreme software optimisation based on virtualization and terminal servers. VDD creates and projects Linux distros that are completely customizable and different from each other. They are virtual desktop machines that can be used for testing or development and are completely controllable directly from each terminal. Memory consumption has been strongly reduced without sacrificing performance. The test results encourage further research towards clustering.

- Architectures and Systems | Pp. 223-234

A Parallel Infrastructure on Dynamic EPIC SMT and Its Speculation Optimization

Qingying Deng; Minxuan Zhang; Jiang Jiang

SMT (simultaneous multithreading) processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of the synergy between compiler and hardware. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. Control and data speculation are effective ways to improve instruction-level parallelism. In this paper, we present our efforts to design and implement a parallel environment that includes an optimizing, portable parallel compiler, OpenUH, and an SMT architecture, EDSMT, based on IA-64. Its speculation is also reexamined.

- Architectures and Systems | Pp. 235-244

An SRP Target Mode to Improve Read Performance of SRP-Based IB-SANs

Zhiying Jiang; Jin He; Jizhong Han; Xigui Wang; Yonghao Zhou; Xubin He

The SCSI RDMA Protocol (SRP) is used to build high-performance Storage Area Networks (SANs) over InfiniBand, or SRP-based IB-SANs for short. I/O read performance is critical for many read-dominant applications, such as multimedia, remote sensing, and data backup. However, if I/O accesses focus on a specific storage device of an IB-SAN, the local I/O performance of that single device can become the bottleneck, leaving the network underutilized. In this paper, we propose an SRP target mode called Target Disk Cache Assisted (tDCA) mode, which exploits the file read-ahead feature and page cache of Linux to tackle the performance gap between storage devices and the network. Experimental results show that this strategy significantly increases read throughput for both random reads with good locality and sequential reads.

- Architectures and Systems | Pp. 245-255

An FPGA Design to Achieve Fast and Accurate Results for Molecular Dynamics Simulations

Eunjung Cho; Anu G. Bourgeois; Feng Tan

A Molecular Dynamics (MD) system is defined by the position and momentum of particles and their interactions. The dynamics of a system can be evaluated as an N-body problem, and the simulation is continued until the energy reaches equilibrium. Thus, solving the dynamics numerically and evaluating the interactions is computationally expensive even for a small number of particles in the system. We focus on long-ranged interactions, since their calculation time is O(N^2) for an N-particle system. There are many existing algorithms aimed at reducing the calculation time of MD simulations. The Multigrid (MG) method [1] reduces the O(N^2) calculation time to O(N) time while still achieving reasonable accuracy. Another approach to achieving much faster calculation is to run MD simulations on special-purpose processors and customized hardware with ASICs or FPGAs. In this paper, we design and implement an FPGA-based MD simulator with an efficient MG method.

- Architectures and Systems | Pp. 256-267
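
As a rough illustration of why long-ranged interactions dominate the cost, the sketch below evaluates a Coulomb-like potential by direct summation, visiting each of the N(N-1)/2 particle pairs once. It is not the paper's FPGA design or its Multigrid method; the function name, the reduced units (the Coulomb constant is omitted), and the sample values are assumptions made for illustration only.

```python
import math
from itertools import combinations


def direct_potential_energy(positions, charges):
    """Sum q_i * q_j / r_ij over every unordered pair (hypothetical helper).

    Returns the energy in reduced units and the number of pairs visited,
    which is N * (N - 1) / 2 and therefore grows quadratically with N.
    """
    energy = 0.0
    pair_count = 0
    for (i, r_i), (j, r_j) in combinations(enumerate(positions), 2):
        distance = math.dist(r_i, r_j)  # Euclidean distance, Python 3.8+
        energy += charges[i] * charges[j] / distance
        pair_count += 1
    return energy, pair_count


# Three particles give three pairs; ten particles give 45 pairs.
sample_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
sample_charges = [1.0, 1.0, -1.0]
sample_energy, sample_pairs = direct_potential_energy(sample_positions, sample_charges)
```

Doubling the particle count roughly quadruples the number of pair evaluations, which is the cost that hierarchical methods such as Multigrid aim to avoid.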

Performance and Complexity Analysis of Credit-Based End-to-End Flow Control in Network-on-Chip

Seongmin Noh; Daehyun Kim; Vu-Duc Ngo; Hae-Wook Choi

Network-on-Chip is an alternative paradigm that improves communication bandwidth compared to bus-based communication, but its performance degrades if there is no effective flow control method. Heterogeneous networks with very slow processing elements (PEs) especially need a flow control mechanism at the transport layer to prevent excessive packet injection. In this paper, a credit-based end-to-end flow control (CB-EEFC) is implemented to control the network latency at high traffic loads. Simulation in mesh networks shows improved latency and a 0.5% to 3% decrease in saturated throughput with the CB-EEFC method. RTL gate-level simulation shows that a network interface using CB-EEFC brings about a 31.4% increase in complexity compared to a network interface without CB-EEFC.

- Architectures and Systems | Pp. 268-277
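
The general idea of credit-based end-to-end flow control can be sketched as a toy cycle-level model: a sender may inject a packet only while it holds a credit, and the receiving PE returns one credit each time it consumes a packet. This is a minimal sketch of the generic technique, not the CB-EEFC implementation from the paper; the class names, the one-packet-per-cycle sender, and the fixed service period of the slow PE are all assumptions.

```python
from collections import deque


class CreditSender:
    """Injects one packet per cycle, but only while it holds a credit."""

    def __init__(self, credits):
        self.credits = credits
        self.sent = 0

    def try_send(self, receiver):
        if self.credits == 0:
            return False  # stalled: the receiver has no free buffer slot
        self.credits -= 1
        receiver.buffer.append(self.sent)
        self.sent += 1
        return True

    def return_credit(self):
        self.credits += 1


class SlowReceiver:
    """A slow PE that consumes one packet every `service_period` cycles."""

    def __init__(self, service_period):
        self.buffer = deque()
        self.service_period = service_period
        self.consumed = 0

    def tick(self, cycle, sender):
        if self.buffer and cycle % self.service_period == 0:
            self.buffer.popleft()
            self.consumed += 1
            sender.return_credit()  # end-to-end credit goes back to the source


def simulate(cycles, credits, service_period):
    """Return (packets sent, packets consumed, peak receiver buffer occupancy)."""
    sender = CreditSender(credits)
    receiver = SlowReceiver(service_period)
    peak_occupancy = 0
    for cycle in range(1, cycles + 1):
        sender.try_send(receiver)
        peak_occupancy = max(peak_occupancy, len(receiver.buffer))
        receiver.tick(cycle, sender)
    return sender.sent, receiver.consumed, peak_occupancy


limited = simulate(cycles=100, credits=4, service_period=5)
unlimited = simulate(cycles=100, credits=1000, service_period=5)
```

With four credits the receiver's buffer never holds more than four packets, whereas the effectively uncredited run keeps injecting and lets the backlog grow for the whole simulation.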

An QoS Aware Mapping of Cores Onto NoC Architectures

Huy-Nam Nguyen; Vu-Duc Ngo; Younghwan Bae; Hanjin Cho; Hae-Wook Choi

Network-on-chip (NoC) is being proposed as a scalable and reusable communication platform for future SoC applications. The NoC somewhat resembles a parallel computer network. However, NoC design must satisfy latency, power consumption, and area constraints. Network latency is closely related to throughput and power consumption. Moreover, the IPs and the network are heterogeneous. Hence, a given mapping of IPs onto a given architecture produces particular values of network latency and power consumption, and changing the mapping scheme changes these values significantly. Maximizing the system’s throughput also increases the network latency, while minimizing the network latency decreases the throughput. In this paper, we present a mapping scheme that strikes a compromise between throughput maximization and latency minimization. This sub-optimal mapping is found using a spanning tree search algorithm. The experimental architecture used here is a mesh-based topology. We use NS2 to simulate and calculate the system throughput, and the system power consumption is calculated using the Orion model.

- Architectures and Systems | Pp. 278-288

Latency Optimization for NoC Design of H.264 Decoder Based on Self-similar Traffic Modeling

Vu-Duc Ngo; June-Young Chang; Younghwan Bae; Hanjin Cho; Hae-Wook Choi

In this article, we present an analytical method to evaluate the latency of the NoC design of an H.264 decoder based on self-similar traffic models of all 12 IPs. The traffic models are generated using the superposition of four 2-state Markov Modulated Poisson Processes (MMPPs) and the real traced data transactions between IPs. An optimization engine is used to automatically allocate IPs to the desired routers to achieve minimal latency.

- Architectures and Systems | Pp. 289-302

Hardware Implementation of Common Protocol Interface for a Network-Based Multiprocessor

Arata Shinozaki; Mitsunori Kubo; Takayuki Nakatomi; Baoliu Ye; Minyi Guo

Our research project “UMP-PJ” has proposed the UMP Network Architecture for next-generation computing infrastructure, in which the network nodes coordinate with each other. We have conducted research on the basic architecture of the UMP Network and shown its usefulness in previous papers. We defined a Processing Element (PE) comprising a PE Wrapper and a PE Core. The PE Wrapper is a common network interface for the nodes of the UMP Network, and the PE Core is an application-specific function module. This paper evaluates the hardware implementation of the PE. In particular, the PE Wrapper is key to satisfying the scalability and flexibility requirements of the UMP Network Architecture. The experimental model successfully processed a JPEG encoding application with a PE implemented on an FPGA board in a PC, in conjunction with other software PEs. Experimental results demonstrate that the hardware-implemented PE Wrapper causes no system bottleneck or redundant processing. This implies that the UMP Network Architecture is suitable for hardware implementation.

- Architectures and Systems | Pp. 303-313

A Distributed Hebb Neural Network for Network Anomaly Detection

Daxin Tian; Yanheng Liu; Bin Li

One of the most challenging problems in anomaly detection is to develop scalable algorithms capable of dealing with large audit data, network traffic data, or alert data. In this paper, a distributed neural network based on the Hebb rule is presented to improve the speed and scalability of inductive learning. The speed is improved by randomly splitting a large data set into disjoint subsets and presenting each subset to an independent neural network; these networks can be trained in a distributed fashion, each in parallel. The analysis of the completeness and risk bounds of competitive Hebb learning proves that the distributed Hebb neural network avoids the loss of accuracy relative to running a single algorithm on the entire data set. The experiments are performed on the KDD’99 data set, which is a standard intrusion detection benchmark. Comparisons with other approaches on the same benchmark demonstrate the effectiveness and applicability of the proposed method.

- Datamining and Databases | Pp. 314-325
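
The data-parallel scheme described in this abstract (a random split into disjoint subsets, one independent learner per subset) can be sketched as follows. The learner here is a plain winner-take-all competitive update used as a stand-in, not the paper's Hebb network; the function names, parameters, and toy data are assumptions, and the thread pool merely stands in for separate machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_disjoint(samples, n_parts, seed=0):
    """Shuffle the samples, then deal them into n_parts disjoint subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def train_competitive(subset, n_units=2, rate=0.5):
    """Stand-in learner: move the closest unit toward each sample."""
    units = [list(x) for x in subset[:n_units]]  # initialise from the data
    for x in subset:
        winner = min(
            units,
            key=lambda w: sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)),
        )
        for i, x_i in enumerate(x):
            winner[i] += rate * (x_i - winner[i])
    return units


def train_distributed(samples, n_parts):
    """Train one independent model per disjoint subset, concurrently."""
    parts = split_disjoint(samples, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        models = list(pool.map(train_competitive, parts))
    return parts, models


toy_data = [(float(i), float(i % 3)) for i in range(12)]
toy_parts, toy_models = train_distributed(toy_data, n_parts=3)
```

Because the subsets are disjoint and the learners share no state, each one could equally run on a different node; how the resulting models are combined at detection time is left open here.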