Publications catalog - books

Algorithms and Architectures for Parallel Processing: 7th International Conference, ICA3PP 2007, Hangzhou, China, June 11-14, 2007. Proceedings

Hai Jin ; Omer F. Rana ; Yi Pan ; Viktor K. Prasanna (eds.)

Conference: 7th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP). Hangzhou, China. June 11, 2007 - June 14, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity; Simulation and Modeling

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-72904-4

Electronic ISBN

978-3-540-72905-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/978-3-540-72905-1_10

An Implementation of Parallel Eigenvalue Computation Using Dual-Level Hybrid Parallelism

Yonghua Zhao; Xuebin Chi; Qiang Cheng

This paper describes a hybrid two-level parallel method using MPI/OpenMP for computing the eigenvalues of dense symmetric matrices on clusters of SMPs. The eigenvalue computation is based on both the Householder tridiagonalization method and a divide-and-conquer algorithm for the tridiagonal eigenproblem. In the hybrid parallel design, we take a coarse-grain approach to OpenMP shared-memory parallelization, which keeps BLAS-3 operations in the tridiagonalization. Moreover, dynamic work sharing is used in the divide-and-conquer algorithm for the tridiagonal eigenproblem. As a result, the amount of synchronization is also reduced, which could have an effect on the load balance. In addition, we analyze the communication overhead of hybrid MPI/OpenMP versus pure MPI. An experimental analysis on the Deepcomp6800 shows that the hybrid algorithm achieves good scalability.

- Track 1: Parallel Algorithms | Pp. 107-119
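
The Householder tridiagonalization step named in the abstract above can be illustrated with a minimal serial sketch. This is not the authors' implementation: it is unblocked, uses no MPI, OpenMP or BLAS-3 calls, and the function name `householder_tridiagonalize` is hypothetical. It only shows how successive Householder reflections reduce a dense symmetric matrix to tridiagonal form while preserving its eigenvalues.

```python
import math

def householder_tridiagonalize(a):
    """Return a tridiagonal matrix similar to the symmetric matrix `a`.

    Serial, unblocked illustration only (assumption: `a` is a square,
    symmetric list of lists of floats).
    """
    n = len(a)
    t = [row[:] for row in a]          # work on a copy
    for k in range(n - 2):
        # Part of column k that lies below the diagonal.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                   # column is already reduced
        # Sign chosen to avoid cancellation when forming v.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm_v for vi in v]
        m = len(v)
        # Left-multiply by H = I - 2 v v^T (acts on rows k+1..n-1).
        for j in range(n):
            dot = sum(v[i] * t[k + 1 + i][j] for i in range(m))
            for i in range(m):
                t[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Right-multiply by H (acts on columns k+1..n-1).
        for i in range(n):
            dot = sum(t[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                t[i][k + 1 + j] -= 2.0 * dot * v[j]
    return t

A = [[4.0, 1.0, -2.0, 2.0],
     [1.0, 2.0, 0.0, 1.0],
     [-2.0, 0.0, 3.0, -2.0],
     [2.0, 1.0, -2.0, -1.0]]
T = householder_tridiagonalize(A)
```

Because each step is a similarity transformation, `T` has the same eigenvalues as `A`; a tridiagonal eigensolver such as the divide-and-conquer algorithm mentioned in the abstract would then operate on `T`.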

doi: 10.1007/978-3-540-72905-1_11

An Improved Algorithm for Alhusaini’s Algorithm in Heterogeneous Distributed Systems

Jianwei Liao; Jianqiao Yu

Among mapping algorithms for applications in HDC, Alhusaini’s method is one of the most important. However, through experiments and analysis we find some weaknesses in Alhusaini’s method. We therefore propose a two-phase algorithm, called the 2-phases dynamic resource co-allocation algorithm (2PDRCA), based on Alhusaini’s method. The first phase only generates the data that will be used in the second phase. The second phase selects a set of independent tasks and allocates them according to the weight of each task. The simulation results show that the method is effective and solves problems such as the low efficiency of Alhusaini’s method in communication-intensive applications.

- Track 1: Parallel Algorithms | Pp. 120-130

doi: 10.1007/978-3-540-72905-1_12

Fuzzy-Grey Prediction Based Dynamic Failure Detector for Distributed Systems

Dong Tian; Shuyu Chen; Taiping Mao

Fuzzy logic and grey theory, combined with an adaptive heartbeat mechanism, are integrated to implement an adaptive failure detector for distributed systems. A GM(1,1) unified-dimensional new message model, which only needs a small volume of sample data, is used to predict heartbeat arrival time dynamically. Since prediction error is inevitable, a two-input (residual ratio and message loss rate), one-output (compensation value) fuzzy controller is designed to learn how to compensate for the output from the grey model, and the roughly determined fuzzy rule base is tuned by a reward-punishment learning principle. Experimental results show the availability and validity of the failure detector in detail.

- Track 2: Parallel Architecture | Pp. 131-141
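
The grey prediction step mentioned in the abstract above can be sketched with the textbook GM(1,1) model. This is a minimal illustration under stated assumptions, not the authors' detector: the function names, the window size of 5, and the choice to predict heartbeat inter-arrival intervals (in milliseconds) are all hypothetical, samples are assumed positive, and the fuzzy compensation stage is omitted entirely.

```python
import math

def gm11_predict_next(samples):
    """Predict the next value of a short positive series with GM(1,1)."""
    n = len(samples)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four samples")
    # 1. Accumulated generating series x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for s in samples:
        total += s
        x1.append(total)
    # 2. Background values z(k) = mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = samples[1:]
    # 3. Least-squares fit of x0(k) = -a * z(k) + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(zi * zi for zi in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    if abs(a) < 1e-12:
        return b                       # series is essentially constant
    # 4. Difference of the fitted accumulated series at steps n+1 and n.
    c = samples[0] - b / a
    return c * (math.exp(-a * n) - math.exp(-a * (n - 1)))

def next_heartbeat_interval(intervals, window=5):
    """Apply GM(1,1) to the most recent `window` samples only."""
    return gm11_predict_next(list(intervals)[-window:])

# Hypothetical inter-arrival intervals growing by 10% per heartbeat.
history = [9.0, 10.0, 11.0, 12.1, 13.31, 14.641]
prediction = next_heartbeat_interval(history)
```

Keeping a fixed-size window and dropping the oldest sample as each new one arrives is one common reading of an equal-dimension model; whether it matches the paper's exact scheme is an assumption.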

doi: 10.1007/978-3-540-72905-1_13

A Two-Level Directory Organization Solution for CC-NUMA Systems

Guoteng Pan; Qiang Dou; Lunguo Xie

Currently, directory-based cache coherence protocols are widely adopted in DSM systems. However, as system size scales, directory-based protocols also face scalability problems. Based on an analysis of the factors that affect the scalability of directory protocols, we propose a two-level directory organization solution based on a directory cache. Simulation results show that this directory organization can efficiently reduce the storage space occupied by directory information, enabling good scalability for the implementation of the protocol while taking system performance into account.

- Track 2: Parallel Architecture | Pp. 142-152

doi: 10.1007/978-3-540-72905-1_14

A Framework of Software Component Adaptation

Xiong Xie; Weishi Zhang

Software component adaptation is a difficult problem in component-based software development. In this paper, we focus on a framework for component adaptation in which several adaptations are involved. The framework is described as a finite automaton that has only one initial state and only one final state. Using formal and informal methods, we describe the precondition, the post-condition and the process of the different component adaptations involved in the whole adaptation process. There may be several mismatches between the component and the requirements of the application. To execute adaptation successfully, the system includes a plan that stores all adaptation types in order. Finally, future work and limitations of the framework are discussed.

- Track 2: Parallel Architecture | Pp. 153-164

doi: 10.1007/978-3-540-72905-1_15

A Parallel Infrastructure on Dynamic EPIC SMT

Qingying Deng; Minxuan Zhang; Jiang Jiang

There are only three real “dimensions” to processor performance increases beyond Moore’s law: clock frequency, superscalar instruction issue, and multiprocessing. The first two have been pushed to their logical limits and we must focus on multiprocessing. SMT (simultaneous multithreading) [2] and CMP (chip multiprocessing) [1] are two architectural approaches to exploit thread-level parallelism using available on-chip resources. SMT processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of the synergy between compiler and hardware. In this paper, we present our efforts to design and implement a parallel environment, which includes an optimizing, portable parallel compiler, OpenUH, and an SMT architecture, EDSMT, based on IA-64. The performance is evaluated using the NAS parallel benchmarks.

- Track 2: Parallel Architecture | Pp. 165-176

doi: 10.1007/978-3-540-72905-1_16

The Thread Migration Mechanism of DSM-PEPE

Federico Meza; Cristian Ruz

In this paper we present the thread migration mechanism of DSM-PEPE, a multithreaded distributed shared memory system. DSM systems like DSM-PEPE provide a parallel environment to harness the available computing power of computer networks. DSM systems offer a virtual shared memory space on top of a distributed-memory multicomputer, featuring the scalability and low cost of a multicomputer, and the ease of programming of a shared-memory multiprocessor.

DSM systems rely on data migration to make data available to running threads. The thread migration mechanism of DSM-PEPE was designed as an alternative to this data migration paradigm. Threads are allowed to migrate from one node to another, as needed by the computation. We show by experimentation the feasibility of the thread migration mechanism of DSM-PEPE as an alternative to improve application performance by enhancing spatial locality.

- Track 2: Parallel Architecture | Pp. 177-187

doi: 10.1007/978-3-540-72905-1_17

EH*RS: A High-Availability Scalable Distributed Data Structure

Xueping Ren; Xianghua Xu

EH*RS is a new high-availability Scalable Distributed Data Structure (SDDS). The file structure and the search performance of EH*RS are basically those of EH*. It achieves high availability through record grouping and Reed-Solomon erasure-correcting coding. EH*RS keeps all data available despite the unavailability of any k ≥ 1 servers by storing additional information: the parity information. The value of k grows transparently with the file, to prevent reliability from declining. The storage overhead for high availability is small. The example shows that an EH*RS file performs as expected. Finally, the EH*RS scheme provides new perspectives for data-intensive applications (DBMSs), including the emerging ones of grids and of P2P computing.

- Track 2: Parallel Architecture | Pp. 188-197
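
The idea of keeping a record group recoverable through stored parity, as described in the abstract above, can be sketched in its simplest form. This example deliberately swaps in plain XOR parity, which tolerates only a single unavailable record per group, in place of the Reed-Solomon coding the paper uses for k ≥ 1 failures. The function names and the sample records are hypothetical.

```python
def xor_parity(records):
    """Compute one parity record for a group of byte-string records.

    Shorter records are treated as if padded with zero bytes.
    """
    size = max(len(r) for r in records)
    parity = bytearray(size)
    for record in records:
        for i, byte in enumerate(record):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(surviving, parity, length):
    """Rebuild the single missing record of a group.

    `surviving` holds every other record of the group and `length` is
    the original length of the missing record (assumed to be known,
    for example from metadata kept alongside the parity record).
    """
    rebuilt = bytearray(parity)
    for record in surviving:
        for i, byte in enumerate(record):
            rebuilt[i] ^= byte
    return bytes(rebuilt[:length])

# Hypothetical record group spread over three data servers.
group = [b"alpha", b"bravo!", b"eta"]
parity = xor_parity(group)

# The server holding the second record becomes unavailable.
restored = recover_missing([group[0], group[2]], parity, len(group[1]))
```

Tolerating more than one simultaneous failure requires several independent parity records per group, which is where an erasure code such as Reed-Solomon comes in.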

doi: 10.1007/978-3-540-72905-1_18

Optimizing Stream Organization to Improve the Performance of Scientific Computing Applications on the Stream Processor

Ying Zhang; Gen Li; Xuejun Yang; Kun Zeng

It is very important to organize streams well so that stream programs can take effective advantage of the parallel computing and memory system of the stream processor, especially for scientific stream programs. In this paper, after analyzing typical scientific programs, we present and characterize two methods to optimize the stream organization: stream reusing and stream transpose. Several representative scientific stream programs, with and without our optimizations, are run on a typical stream processor simulator. Simulation results show that these methods can greatly improve scientific stream program performance.

- Track 2: Parallel Architecture | Pp. 198-209

doi: 10.1007/978-3-540-72905-1_19

A Parallel Architecture for Motion Estimation and DCT Computation in MPEG-2 Encoder

Jian Huang; Hao Li

This paper presents a parallel architecture that can simultaneously perform block-matching motion estimation (ME) and discrete cosine transform (DCT). Because DCT and ME are both processed block by block, it is preferable to put them in one module for resource sharing. Simulations performed using Simulink demonstrate that the parallel architecture improves running time by 18.6% compared to the conventional sequential architecture.

- Track 2: Parallel Architecture | Pp. 210-221
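
Block-matching motion estimation, one of the two operations named in the abstract above, can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is a generic software illustration, not the paper's hardware architecture: the function names, the 8x8 test frame, the 2x2 block size and the search radius are all assumptions chosen to keep the example small.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, n, radius):
    """Return (dx, dy, cost) of the best match within +/- radius pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Hypothetical 8x8 reference frame with a distinct value per pixel.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
# Current frame: reference content moved 2 pixels left and 1 pixel up,
# with zeros where no source pixel exists.
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]

motion = full_search(cur, ref, bx=2, by=2, n=2, radius=2)
```

Both this search and a block DCT walk the frame one block at a time, which is the access pattern the abstract cites as the reason for placing the two operations in one module.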