Publications catalog - books
Algorithms and Architectures for Parallel Processing: 7th International Conference, ICA3PP 2007, Hangzhou, China, June 11-14, 2007. Proceedings
Hai Jin ; Omer F. Rana ; Yi Pan ; Viktor K. Prasanna (eds.)
In conference: 7th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP). Hangzhou, China. June 11, 2007 - June 14, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Computer System Implementation; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity; Simulation and Modeling
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-72904-4
Electronic ISBN
978-3-540-72905-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
An Implementation of Parallel Eigenvalue Computation Using Dual-Level Hybrid Parallelism
Yonghua Zhao; Xuebin Chi; Qiang Cheng
This paper describes a hybrid two-level parallel method using MPI/OpenMP for computing the eigenvalues of dense symmetric matrices on clusters of SMPs. The eigenvalue computation is based on both the Householder tridiagonalization method and a divide-and-conquer algorithm for the tridiagonal eigenproblem. In the hybrid parallel design, we take a coarse-grained approach to OpenMP shared-memory parallelization, which preserves the BLAS-3 operations in tridiagonalization. Moreover, dynamic work sharing is used in the divide-and-conquer algorithm for the tridiagonal eigenproblem. As a result, the amount of synchronization is also reduced, and these changes can affect the load balance. In addition, we analyze the communication overhead of hybrid MPI/OpenMP compared with pure MPI. An experimental analysis on the Deepcomp6800 shows that the hybrid algorithm scales well.
- Track 1: Parallel Algorithms | Pp. 107-119
An Improved Algorithm for Alhusaini’s Algorithm in Heterogeneous Distributed Systems
Jianwei Liao; Jianqiao Yu
Among mapping algorithms for applications in HDC, Alhusaini’s method is one of the most important. However, through experiments and analysis we find some weaknesses in Alhusaini’s method. We therefore propose a two-phase algorithm based on Alhusaini’s method, called the 2-phases dynamic resource co-allocation algorithm (2PDRCA). The first phase only generates the data that will be used in the second phase. The second phase selects a set of independent tasks and allocates them according to the weight of each task. The simulation results show that the method is effective and solves problems such as the low efficiency of Alhusaini’s method in communication-intensive applications.
- Track 1: Parallel Algorithms | Pp. 120-130
Fuzzy-Grey Prediction Based Dynamic Failure Detector for Distributed Systems
Dong Tian; Shuyu Chen; Taiping Mao
Fuzzy logic and grey theory, combined with an adaptive heartbeat mechanism, are integrated to implement an adaptive failure detector for distributed systems. A GM(1,1) unified-dimensional new message model, which needs only a small volume of sample data, is used to predict heartbeat arrival time dynamically. Since prediction error is inevitable, a two-input (residual ratio and message loss rate), one-output (compensation value) fuzzy controller is designed to learn how to compensate for the output of the grey model, and the roughly determined fuzzy rule base is tuned by a reward-punishment learning principle. Detailed experimental results demonstrate the availability and validity of the failure detector.
- Track 2: Parallel Architecture | Pp. 131-141
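The GM(1,1) prediction step mentioned in the failure detector entry above can be sketched in a few lines. This is a minimal illustration of the standard textbook GM(1,1) grey model, not the authors' implementation: the function name `gm11_predict`, the four-sample minimum, and the constant-series fallback are assumptions made here, and the fuzzy compensation stage is not shown.

```python
import math


def gm11_predict(series):
    """Forecast the next value of a positive series with a GM(1,1) grey model.

    Hypothetical helper for illustration; `series` would be a short sliding
    window of recent heartbeat arrival times (an assumption of this sketch).
    """
    n = len(series)
    if n < 4:
        raise ValueError("this sketch expects at least 4 samples")

    # Accumulated generating operation (1-AGO): running sums of the raw data.
    acc = []
    total = 0.0
    for x in series:
        total += x
        acc.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # Least-squares fit of the grey equation x0(k) = -a * z(k) + b.
    mz = sum(z) / len(z)
    my = sum(y) / len(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    slope = szy / szz
    a = -slope
    b = my - slope * mz

    # Degenerate case: no growth, so the model reduces to a constant.
    if abs(a) < 1e-12:
        return b

    def acc_hat(k):
        # Fitted accumulated series; k = 0 corresponds to the first sample.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # The forecast is the difference of two consecutive accumulated values.
    return acc_hat(n) - acc_hat(n - 1)
```

In a detector, the caller would drop the oldest sample and append the newest before each call, then add a safety margin or compensation value to the forecast before using it as a timeout.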
A Two-Level Directory Organization Solution for CC-NUMA Systems
Guoteng Pan; Qiang Dou; Lunguo Xie
Currently, directory-based cache coherence protocols are widely adopted in DSM systems. However, as system size grows, directory-based protocols also face scalability problems. Based on an analysis of the factors that affect the scalability of directory protocols, we propose a two-level directory organization based on a directory cache. Simulation results show that this directory organization efficiently reduces the storage space occupied by directory information, enabling good scalability for the implementation of the protocol while taking system performance into account.
- Track 2: Parallel Architecture | Pp. 142-152
A Framework of Software Component Adaptation
Xiong Xie; Weishi Zhang
Software component adaptation is a difficult problem in component-based software development. In this paper, we focus on a framework for component adaptation in which several adaptations are involved. The framework is described as a finite automaton with exactly one initial state and one final state. Using formal and informal methods, we describe the precondition, the postcondition, and the process of each component adaptation involved in the overall adaptation process. There may be several mismatches between the component and the requirements of the application. To execute adaptation successfully, the system uses a plan that stores all adaptation types in order. Finally, future work and the limitations of the framework are discussed.
- Track 2: Parallel Architecture | Pp. 153-164
A Parallel Infrastructure on Dynamic EPIC SMT
Qingying Deng; Minxuan Zhang; Jiang Jiang
There are only three real “dimensions” to processor performance increases beyond Moore’s law: clock frequency, superscalar instruction issue, and multiprocessing. The first two have been pushed to their logical limits, so we must focus on multiprocessing. SMT (simultaneous multithreading) [2] and CMP (chip multiprocessing) [1] are two architectural approaches to exploiting thread-level parallelism using available on-chip resources. SMT processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of the synergy between compiler and hardware. In this paper, we present our efforts to design and implement a parallel environment that includes an optimizing, portable parallel compiler, OpenUH, and an SMT architecture, EDSMT, based on IA-64. The performance is evaluated using the NAS parallel benchmarks.
- Track 2: Parallel Architecture | Pp. 165-176
The Thread Migration Mechanism of DSM-PEPE
Federico Meza; Cristian Ruz
In this paper we present the thread migration mechanism of DSM-PEPE, a multithreaded distributed shared memory system. DSM systems like DSM-PEPE provide a parallel environment to harness the available computing power of computer networks. DSM systems offer a virtual shared memory space on top of a distributed-memory multicomputer, featuring the scalability and low cost of a multicomputer, and the ease of programming of a shared-memory multiprocessor.
DSM systems rely on data migration to make data available to running threads. The thread migration mechanism of DSM-PEPE was designed as an alternative to this data migration paradigm. Threads are allowed to migrate from one node to another, as needed by the computation. We show by experimentation the feasibility of the thread migration mechanism of DSM-PEPE as an alternative way to improve application performance by enhancing spatial locality.
- Track 2: Parallel Architecture | Pp. 177-187
EH*RS: A High-Availability Scalable Distributed Data Structure
Xueping Ren; Xianghua Xu
EH*RS is a new high-availability Scalable Distributed Data Structure (SDDS). The file structure and the search performance of EH*RS are basically those of EH*. It achieves high availability through record grouping and Reed-Solomon erasure-correcting coding. By storing additional parity information, EH*RS keeps all data available despite the unavailability of any k ≥ 1 servers. The value of k grows transparently with the file to prevent reliability from declining. The storage overhead for high availability is small. The example shows that an EH*RS file performs as expected. Finally, the EH*RS scheme provides new perspectives for data-intensive applications (DBMSs), including emerging ones in grid and P2P computing.
- Track 2: Parallel Architecture | Pp. 188-197
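The idea of protecting a record group with parity, as in the EH*RS entry above, can be illustrated with the simplest possible case. The sketch below uses plain XOR parity, which tolerates the loss of exactly one record per group (k = 1). It is deliberately not Reed-Solomon coding, which is what the abstract names and what is needed for k > 1. The function names and the fixed-size records are assumptions of this illustration.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(records):
    """Parity record for a group: the XOR of all equally sized data records."""
    return reduce(xor_bytes, records)


def recover(records, parity, lost_index):
    """Rebuild the record at lost_index from the survivors and the parity.

    The entry at lost_index is never read, so it may be None.
    """
    survivors = [r for i, r in enumerate(records) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


# Usage: three records stored on three servers, parity on a fourth.
group = [b"alpha___", b"bravo___", b"charlie_"]
parity = make_parity(group)
damaged = [group[0], None, group[2]]  # the second server is unavailable
restored = recover(damaged, parity, 1)
```

XOR is its own inverse, so XOR-ing the parity with every surviving record leaves exactly the missing one.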
Optimizing Stream Organization to Improve the Performance of Scientific Computing Applications on the Stream Processor
Ying Zhang; Gen Li; Xuejun Yang; Kun Zeng
It is very important to organize streams well so that stream programs, especially scientific ones, can effectively exploit the parallel computing resources and memory system of the stream processor. In this paper, after analyzing typical scientific programs, we present and characterize two methods to optimize the stream organization: stream reusing and stream transpose. Several representative scientific stream programs, with and without our optimizations, are run on a simulator of a typical stream processor. Simulation results show that these methods can greatly improve the performance of scientific stream programs.
- Track 2: Parallel Architecture | Pp. 198-209
A Parallel Architecture for Motion Estimation and DCT Computation in MPEG-2 Encoder
Jian Huang; Hao Li
This paper presents a parallel architecture that can simultaneously perform block-matching motion estimation (ME) and discrete cosine transform (DCT). Because DCT and ME are both processed block by block, it is preferable to put them in one module for resource sharing. Simulation results obtained with Simulink show that the parallel architecture improves running time by 18.6% compared with the conventional sequential architecture.
- Track 2: Parallel Architecture | Pp. 210-221
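Block-matching motion estimation, named in the entry above, can be sketched in software as an exhaustive search that minimizes the sum of absolute differences (SAD). This is a generic full-search illustration, not the hardware architecture the paper describes. The function names, the 4x4 block size, and the search radius are assumptions chosen to keep the example small.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between two n-by-n blocks.

    The block in `ref` starts at (rx, ry); the block in `cur` at (cx, cy).
    Frames are lists of rows, indexed as frame[y][x].
    """
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
    return total


def full_search(ref, cur, cx, cy, n=4, radius=2):
    """Find the displacement (dx, dy) in `ref` that best matches cur's block.

    Returns (dx, dy, cost) for the candidate with the lowest SAD; candidates
    that would fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Usage: a 12x12 ramp image whose content moves by 2 pixels in x and 1 in y.
ref_frame = [[16 * y + x for x in range(12)] for y in range(12)]
cur_frame = [[ref_frame[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)]
             for y in range(12)]
vector = full_search(ref_frame, cur_frame, 4, 4)
```

Both this search and a block DCT walk the frame one block at a time, which is the shared access pattern the abstract gives as the reason for placing the two in one module.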