Publications catalog - books

High Performance Computing for Computational Science: VECPAR 2006: 7th International Conference, Rio de Janeiro, Brazil, June 10-13, 2006, Revised Selected and Invited Papers

Michel Daydé ; José M. L. M. Palma ; Álvaro L. G. A. Coutinho ; Esther Pacitti ; João Correia Lopes (eds.)

Conference: 7th International Conference on High Performance Computing for Computational Science (VECPAR). Rio de Janeiro, Brazil. June 10, 2006 - June 13, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer System Implementation; Software Engineering/Programming and Operating Systems; Theory of Computation; Computer Communication Networks; Mathematics of Computing

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-71350-0

Electronic ISBN

978-3-540-71351-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Combinatorial Scientific Computing: The Enabling Power of Discrete Algorithms in Computational Science

Bruce Hendrickson; Alex Pothen

Combinatorial algorithms have long played a crucial, albeit under-recognized role in scientific computing. This impact ranges well beyond the familiar applications of graph algorithms in sparse matrices to include mesh generation, optimization, computational biology and chemistry, data analysis and parallelization. Trends in science and in computing suggest strongly that the importance of discrete algorithms in computational science will continue to grow. This paper reviews some of these many past successes and highlights emerging areas of promise and opportunity.

- Chapter 3: Numerical Methods | Pp. 260-280

Improving the Numerical Simulation of an Airflow Problem with the BlockCGSI Algorithm

C. Balsa; M. Braza; M. Daydé; J. Palma; D. Ruiz

Partial spectral information associated with the smallest eigenvalues can be used to improve the solution of successive linear systems of equations, namely in the simulation of time-dependent partial differential equations, where at each time step there are several systems with the same spectral properties to be solved. We propose to perform a partial spectral decomposition with the BlockCGSI algorithm in the first time step, and exploit this information to improve the convergence of the Conjugate Gradient algorithm in the solution of the following linear systems. We briefly describe the BlockCGSI algorithm, which is a combination of the block Conjugate Gradient (blockCG) with the Inverse Subspace Iteration. Then, we validate the accelerating strategy in the simulation of the flow around an airplane wing, where the Conjugate Gradient is accelerated through the deflation of the starting residual.

- Chapter 3: Numerical Methods | Pp. 281-291

EdgePack: A Parallel Vertex and Node Reordering Package for Optimizing Edge-Based Computations in Unstructured Grids

Marcos Martins; Renato Elias; Alvaro Coutinho

A new and simple method is proposed to choose the best data configuration in terms of processing phase time according to previous probing of edge-based matrix-vector products for codes using iterative solvers in unstructured grid problems. This method is realized as a suite of routines named EdgePack, acting during both the pre-solution and solution phases, based on data locality optimization techniques and variations of the matrix-vector product algorithm. Results demonstrate the great flexibility and simplicity of this method, which is suitable for distributed memory platforms in which different data configurations can coexist.

- Chapter 3: Numerical Methods | Pp. 292-304

Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment

Satoshi Ohshima; Kenji Kise; Takahiro Katagiri; Toshitsugu Yuba

GPUs for numerical computations are becoming an attractive alternative in research. In this paper, we propose a new parallel processing environment for matrix multiplications by using both CPUs and GPUs. The execution time of matrix multiplications can be decreased to 40.1% by our method, compared with the faster of the CPU-only and GPU-only cases. Our method performs well when matrix sizes are large.

- Chapter 3: Numerical Methods | Pp. 305-318

Robust Two-Level Lower-Order Preconditioners for a Higher-Order Stokes Discretization with Highly Discontinuous Viscosities

Duilio Conceição; Paulo Goldfeld; Marcus Sarkis

The main goal of this paper is to present new robust and scalable preconditioned conjugate gradient algorithms for solving Stokes equations with large viscosity jumps across subregion interfaces and discretized on non-structured meshes. The proposed algorithms do not require the construction of a coarse mesh and avoid expensive communications between coarse and fine levels. The algorithms belong to the family of preconditioners based on non-overlapping decomposition of subregions known as balancing domain decomposition methods. The local problems employ two-level element-wise/subdomain-wise direct factorizations to reduce the size and the cost of the local Dirichlet and Neumann Stokes solvers. The Stokes coarse problem is based on subdomain constant pressures and on connected subdomain interface flux functions and rigid body motions. This guarantees scalability and solvability of the local Neumann problems. Estimates on the condition numbers and numerical experiments based on a parallel implementation for unstructured meshes are also discussed.

- Chapter 3: Numerical Methods | Pp. 319-333

The Impact of Parallel Programming Models on the Performance of Iterative Linear Solvers for Finite Element Applications

Kengo Nakajima

Parallel iterative linear solvers for unstructured grids in FEM applications, originally developed for the Earth Simulator (ES), are ported to various types of parallel computer. The performance of flat MPI and hybrid parallel programming models is compared for the ES, Hitachi SR8000, IBM SP-3 and IBM p5-model 595 supercomputers. The effects of coloring and of different storage methods for coefficient matrices are evaluated in various types of application. Performance for more than 10 processors is estimated using measured data for up to 10 processors.

- Chapter 3: Numerical Methods | Pp. 334-348

Efficient Parallel Algorithm for Constructing a Unit Triangular Matrix with Prescribed Singular Values

Georgina Flores-Becerra; Victor M. Garcia; Antonio M. Vidal

The problem tackled in this paper is the parallel construction of a unit triangular matrix with prescribed singular values, when these fulfill Weyl’s conditions [9]; this is a particular case of the Inverse Singular Value Problem. A sequential algorithm for this problem was proposed in [10] by Kosowsky and Smoktunowicz. In this paper, parallel versions of this algorithm are described, both for shared memory and distributed memory architectures. The proposed parallel implementation is better suited to the shared memory paradigm, as the numerical experiments confirm: the shared memory version reaches an efficiency over 90% and substantially reduces the execution times compared with the sequential algorithm.

- Chapter 3: Numerical Methods | Pp. 349-362

A Rewriting System for the Vectorization of Signal Transforms

Franz Franchetti; Yevgen Voronenko; Markus Püschel

We present a rewriting system that automatically vectorizes signal transform algorithms at a high level of abstraction. The input to the system is a transform algorithm given as a formula in the well-known Kronecker product formalism. The output is a “vectorized” formula, which means it consists exclusively of constructs that can be directly mapped into short vector code. This approach obviates compiler vectorization, which is known to be limited in this domain. We included the formula vectorization into the Spiral program generator for signal transforms, which enables us to generate vectorized code and further optimize for the memory hierarchy through search over alternative algorithms. Benchmarks for the discrete Fourier transform (DFT) show that our generated floating-point code is competitive with and that our fixed-point code clearly outperforms the best available libraries.

- Chapter 3: Numerical Methods | Pp. 363-377

High Order Fourier-Spectral Solutions to Self Adjoint Elliptic Equations

Moshe Israeli; Alexander Sherman

We develop a High Order Fourier solver for nonseparable, self-adjoint elliptic equations with variable (diffusion) coefficients. The solution of an auxiliary constant coefficient equation serves in a transformation of the dependent variable. The result is a “modified Helmholtz” elliptic equation with almost constant coefficients. The small deviations from constancy are treated as correction terms. We developed a highly accurate, fast, Fourier-spectral algorithm to solve such constant coefficient equations. A small number of correction steps is required in order to achieve very high accuracy. This is achieved by optimization of the coefficients in the auxiliary equation. For given coefficients the approximation error becomes smaller as the domain decreases. A highly parallelizable hierarchical procedure allows a decomposition into smaller sub-domains where the solution is efficiently computed. This step is followed by hierarchical matching to reconstruct the global solution. Numerical experiments illustrate the high accuracy of the approach even at coarse resolutions.

- Chapter 3: Numerical Methods | Pp. 378-390

Multiresolution Simulations Using Particles

Michael Bergdorf; Petros Koumoutsakos

We present novel multiresolution particle methods with extended dynamic adaptivity in areas where increased resolution is required. In the framework of smooth particle methods we present two adaptive approaches: one based on globally adaptive mappings and one employing a wavelet-based multiresolution analysis to guide the allocation of computational elements. Preliminary results are presented from the application of these methods to problems involving the development of sharp vorticity gradients. The present particle methods are employed in large scale parallel computer architectures demonstrating a high degree of parallelization and enabling state of the art large scale simulations of continuum systems using particles.

- Chapter 3: Numerical Methods | Pp. 391-402