Publications catalog - books
High Performance Computing: HiPC 2007: 14th International Conference, Goa, India, December 18-21, 2007. Proceedings
Srinivas Aluru; Manish Parashar; Ramamurthy Badrinath; Viktor K. Prasanna (eds.)
Conference: 14th International Conference on High-Performance Computing (HiPC), Goa, India, December 18-21, 2007
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Processor Architectures; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Algorithm Analysis and Problem Complexity; Computation by Abstract Devices; Mathematics of Computing
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-77219-4
Electronic ISBN
978-3-540-77220-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Publication rights information
© Springer-Verlag Berlin Heidelberg 2007
Subject coverage
Table of contents
The Future Is Parallel But It May Not Be Easy
Michael J. Flynn
Processor performance scaling by improving clock frequency has now hit power limits. The new emphasis on multi-core architectures comes about from the failure of frequency scaling, not because of breakthroughs in parallel programming or architecture. Progress in the automatic compilation of serial programs into multi-tasked ones has been slow. A look at parallel projects of the past illustrates problems in performance and programmability. Solving these problems requires both an understanding of underlying issues such as parallelizing control structures and a way of dealing with the memory bottleneck. For many applications, performance comes at the price of programmability, and reliability comes at the price of performance.
- Keynote Addresses (Abstracts) | Pp. 1-1
Petaflop/s, Seriously
David Keyes
Sustained floating-point rates on real applications, as tracked by the Gordon Bell Prize, have increased by over five orders of magnitude from 1988, when 1 Gigaflop/s was reported on a structural simulation, to 2006, when 200 Teraflop/s were reported on a molecular dynamics simulation. Various versions of Moore’s Law over the same interval provide only two to three orders of magnitude of improvement for an individual processor; the remaining factor comes from concurrency, which is of order 100,000 for the BlueGene/L computer, the platform of choice for the majority of recent Bell Prize finalists. As the semiconductor industry begins to slip relative to its own roadmap for silicon-based logic and memory, concurrency will play an increasing role in attaining the next order of magnitude, to arrive at the long-awaited milepost of 1 Petaflop/s sustained on a practical application, which should occur around 2009. Simulations based on Eulerian formulations of partial differential equations can be among the first applications to take advantage of petascale capabilities, but not the way most are presently being pursued. Only weak scaling can get around the fundamental limitation expressed in Amdahl’s Law and only optimal implicit formulations can get around another limitation on scaling that is an immediate consequence of Courant-Friedrichs-Lewy stability theory under weak scaling of a PDE. Many PDE-based applications and other lattice-based applications with petascale roadmaps, such as quantum chromodynamics, will likely be forced to adopt optimal implicit solvers. However, even this narrow path to petascale simulation is made treacherous by the imperative of dynamic adaptivity, which drives us to consider algorithms and queueing policies that are less synchronous than those in common use today. Drawing on the SCaLeS report (2003-04), the latest ITRS roadmap, some back-of-the-envelope estimates, and numerical experiences with PDE-based codes on recently available platforms, we will attempt to project the pathway to Petaflop/s for representative applications.
- Keynote Addresses (Abstracts) | Pp. 2-3
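The contrast the abstract above draws between Amdahl's Law and weak scaling can be made concrete with the standard speedup formulas, where s denotes the serial fraction of the work and P the number of processors. These are the textbook forms (the weak-scaling expression is the usual Gustafson-style one) and are given here as background rather than taken from the talk itself.
```latex
% Strong scaling (Amdahl's Law): the serial fraction s caps the achievable speedup.
S_{\text{strong}}(P) = \frac{1}{s + \frac{1-s}{P}} \;\le\; \frac{1}{s}

% Weak scaling (Gustafson): the problem size grows with P, so the scaled
% speedup grows nearly linearly with the processor count.
S_{\text{weak}}(P) = s + (1 - s)\,P
```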
High Performance Data Mining - Application for Discovery of Patterns in the Global Climate System
Vipin Kumar
Advances in technology and high-throughput experimental techniques have resulted in the availability of large data sets in commercial enterprises and in a wide variety of scientific and engineering disciplines. Data sets in the terabyte range are not uncommon today and are expected to reach petabytes in the near future for many application domains in science, engineering, business, bioinformatics, and medicine. This has created an unprecedented opportunity to develop automated data-driven techniques for extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data. This talk will provide an overview of data mining research in our group on understanding patterns in the global climate system, and of the computational challenges in addressing them.
- Keynote Addresses (Abstracts) | Pp. 4-4
The Transformation Hierarchy in the Era of Multi-core
Yale Patt
The transformation hierarchy is the name I have given to the mechanism that converts problems stated in natural language (English, Spanish, Hindi, Japanese, etc.) to the electronic circuits of the computer that actually does the work of producing a solution. The problem is first transformed from a natural language description into an algorithm, and then to a program in some mechanical language, then compiled to the ISA of the particular processor, which is implemented in a microarchitecture, built out of circuits. At each step of the transformation hierarchy, there are choices. These choices enable one to optimize the process to accommodate some optimization criterion. Usually, that criterion is microprocessor performance. Up to now, optimizations have been done mostly within each of the layers, with artificial barriers in place between the layers. It has not been the case (with a few exceptions) that knowledge at one layer has been leveraged to impact optimization of other layers. I submit that, with the current growth rate of semiconductor technology, this luxury of operating within a transformation layer will no longer be the common case. This growth rate (now more than a billion transistors on a chip are possible) has ushered in the era of the chip multiprocessor. That is, we are entering Phase II of Microprocessor Performance Improvement, where improvements will come from breaking the barriers that separate the transformation layers. In this talk, I will suggest some of the ways in which this will be done.
- Keynote Addresses (Abstracts) | Pp. 5-5
Web Search: Bridging Information Retrieval and Microeconomic Modeling
Prabhakar Raghavan
Web search has come to dominate our consciousness as a convenience we take for granted, as a medium for connecting advertisers and buyers, and as a fast-growing revenue source for the companies that provide this service. Following a brief overview of the state of the art and how we got there, this talk covers a spectrum of technical challenges arising in web search, ranging from spam detection to auction mechanisms.
- Keynote Addresses (Abstracts) | Pp. 6-6
Distributed Ranked Search
Vijay Gopalakrishnan; Ruggero Morselli; Bobby Bhattacharjee; Pete Keleher; Aravind Srinivasan
P2P deployments are a natural infrastructure for building distributed search networks. Proposed systems support locating and retrieving all results, but lack the information necessary to rank them. Users, however, are primarily interested in the most relevant results, not necessarily all possible results.
Using random sampling, we extend a class of well-known information retrieval ranking algorithms so that they can be applied in this decentralized setting. We analyze the overhead of our approach, and quantify how our system scales with an increasing number of documents, system size, document-to-node mapping (uniform versus non-uniform), and types of queries (rare versus popular terms). Our analysis and simulations show that (a) these extensions are efficient and scale with little overhead to large systems, and (b) the accuracy of the results obtained using distributed ranking is comparable to that of a centralized implementation.
- Plenary Session - Best Paper | Pp. 7-20
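As a rough sketch of the idea in the abstract above, assuming a TF-IDF-style ranking function (one instance of the "well-known" ranking algorithms referred to) and a hypothetical per-node summary with `num_docs` and `doc_freq` fields: random sampling of nodes yields an estimate of each query term's global document frequency, which every node can then use to score its own documents on a comparable scale. This illustrates the technique only and is not the authors' actual protocol.
```python
import math
import random

def sample_global_idf(nodes, term, sample_size=10):
    """Estimate a term's inverse document frequency by querying a random
    sample of nodes instead of the whole network (illustrative only)."""
    sample = random.sample(nodes, min(sample_size, len(nodes)))
    docs_seen = sum(n["num_docs"] for n in sample)
    docs_with_term = sum(n["doc_freq"].get(term, 0) for n in sample)
    # Scale the sampled counts up to the estimated size of the whole network.
    scale = len(nodes) / len(sample)
    est_total_docs = max(1.0, docs_seen * scale)
    est_df = max(1.0, docs_with_term * scale)
    return math.log(est_total_docs / est_df)

def rank_local_docs(local_docs, query_terms, nodes):
    """Score this node's documents with TF-IDF using sampled IDF values, so
    results from different nodes can be merged without a global index."""
    idf = {t: sample_global_idf(nodes, t) for t in query_terms}
    scores = {
        doc_id: sum(tf.get(t, 0) * idf[t] for t in query_terms)
        for doc_id, tf in local_docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```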
ROW-FS: A User-Level Virtualized Redirect-on-Write Distributed File System for Wide Area Applications
Vineet Chadha; Renato J. Figueiredo
We propose a virtualization approach to implement redirect-on-write capabilities that overlay a traditional distributed file system. The redirect-on-write distributed file system (ROW-FS) is implemented via a user-level proxy that is able to selectively steer Network File System (NFS) RPC calls to one of two servers: a “main” read-only server, and a “shadow” read-write server. By employing virtualization by means of a user-level proxy and using the de-facto standard NFS protocol, ROW-FS can be mounted as an NFS file system by existing, unmodified clients from a variety of platforms, and requires no changes to existing kernels. Its primary application is in supporting wide-area computing environments, where ROW-FS can provide improved performance and fault-tolerance (file system modifications can be check-pointed along with application state). Results show that benchmark applications including Linux kernel compilation and instantiation of virtual machines across wide-area networks achieve substantially better performance with ROW-FS as compared to NFS.
- Session I - Applications on I/O and FPGAs | Pp. 21-34
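A minimal sketch of the redirect-on-write idea from the abstract above, assuming a toy read/write server interface rather than the actual NFS RPC machinery: the proxy sends every write to the read-write shadow server, remembers which paths have been modified, and serves reads for unmodified paths from the read-only main server. The class and method names are hypothetical.
```python
class RowFsProxySketch:
    """Toy redirect-on-write proxy: reads come from the read-only "main"
    server until a path is written, after which the "shadow" copy wins."""

    def __init__(self, main_server, shadow_server):
        self.main = main_server          # read-only server
        self.shadow = shadow_server      # read-write server
        self.shadowed = set()            # paths whose current data lives on the shadow

    def read(self, path):
        server = self.shadow if path in self.shadowed else self.main
        return server.read(path)

    def write(self, path, data):
        if path not in self.shadowed:
            # Copy-on-first-write: seed the shadow copy from the main server.
            try:
                self.shadow.write(path, self.main.read(path))
            except FileNotFoundError:
                pass                     # brand-new file, nothing to copy
            self.shadowed.add(path)
        return self.shadow.write(path, data)
```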
No More Energy-Performance Trade-Off: A New Data Placement Strategy for RAID-Structured Storage Systems
Tao Xie; Yao Sun
Many real-world applications like Video-On-Demand (VOD) and Web servers require prompt responses to access requests. However, with an explosive increase in data volume and the emergence of faster disks with higher power requirements, the energy consumption of disk-based storage systems has become a salient issue. To achieve energy conservation and prompt responses simultaneously, in this paper we propose a novel energy-saving data placement strategy, called Striping-based Energy-Aware (SEA), which can be applied to RAID-structured storage systems to noticeably save energy while providing quick responses. Further, we implement two SEA-powered RAID-based data placement algorithms, SEA0 and SEA5, by incorporating the SEA strategy into RAID-0 and RAID-5, respectively. Extensive experimental results demonstrate that, compared with three well-known data placement algorithms Greedy, SP, and HP, SEA0 and SEA5 reduce mean response time on average by at least 52.15% and 48.04%, while saving energy on average by no less than 10.12% and 9.35%, respectively.
- Session I - Applications on I/O and FPGAs | Pp. 35-46
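The abstract above does not describe how SEA0 and SEA5 actually place data, so the following is only a generic illustration of the intuition behind popularity-skew-aware placement: concentrate frequently accessed files on a small set of busy disks so the remaining disks see little traffic and can stay in low-power states. The function and its parameters are hypothetical and should not be read as the SEA strategy itself.
```python
def popularity_aware_placement(files, disks, hot_fraction=0.2):
    """Generic sketch (not the paper's SEA0/SEA5 algorithms): stripe the most
    popular files over a few always-active "hot" disks and push the long tail
    onto "cold" disks that can then spend most of their time in low-power mode.

    `files` is a list of (name, access_rate) pairs; `disks` is a list of disk ids.
    """
    ranked = sorted(files, key=lambda f: f[1], reverse=True)
    n_hot = max(1, int(len(disks) * hot_fraction))
    hot_disks = disks[:n_hot]
    cold_disks = disks[n_hot:] or hot_disks          # fall back if there are no cold disks
    n_hot_files = int(len(ranked) * hot_fraction)
    placement = {}
    for i, (name, _rate) in enumerate(ranked):
        target = hot_disks if i < n_hot_files else cold_disks
        placement[name] = target[i % len(target)]    # round-robin striping within the group
    return placement
```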
Reducing the I/O Volume in an Out-of-Core Sparse Multifrontal Solver
Emmanuel Agullo; Abdou Guermouche; Jean-Yves L’Excellent
High performance sparse direct solvers are often a method of choice in various simulation problems. However, they require a large amount of memory compared to iterative methods. In this context, out-of-core solvers must be employed, where disks are used when the storage requirements are too large with respect to the physical memory available. In this paper, we study how to minimize the I/O requirements of the multifrontal method, a particular direct method to solve large-scale problems efficiently. Experiments on large real-life problems also show that the volume of I/O obtained when minimizing the storage requirement can be significantly reduced by applying algorithms designed to reduce the I/O volume.
- Session I - Applications on I/O and FPGAs | Pp. 47-58
Experiments with a Parallel External Memory System
Mohammad R. Nikseresht; David A. Hutchinson; Anil Maheshwari
The theory of bulk-synchronous parallel computing has produced a large number of attractive algorithms, which are provably optimal in some sense, but typically require that the aggregate random access memory (RAM) of the processors be sufficient to hold the entire data set of the parallel problem instance. In this work we investigate the performance of parallel algorithms for problem instances that are extremely large relative to the available RAM. We describe a system, the Parallel External Memory System (PEMS), which allows existing parallel programs designed for a large number of processors without disks to be adapted easily to smaller, realistic numbers of processors, each with its own disk system. Our experiments with PEMS show that this approach is practical and promising, and that the run times scale predictably with the number of processors and with the problem size.
- Session I - Applications on I/O and FPGAs | Pp. 59-70
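A crude sketch of the general simulation idea in the abstract above, assuming a hypothetical `superstep(vp, state) -> (new_state, messages)` callback supplied by the application: each physical processor impersonates many virtual BSP processors by paging their states to and from its local disk between supersteps, which is why programs written for a large number of memory-resident processors can run on a few disk-equipped ones. This illustrates the concept only and is not the PEMS implementation.
```python
import os
import pickle

def run_superstep_out_of_core(context_dir, superstep, num_virtual):
    """One physical processor plays the role of `num_virtual` BSP processors:
    it loads each virtual processor's state from disk, runs one superstep,
    and writes the state back, so the aggregate state never has to fit in RAM."""
    outgoing = []                                   # messages produced this superstep
    for vp in range(num_virtual):
        path = os.path.join(context_dir, f"vp{vp}.ctx")
        with open(path, "rb") as f:                 # swap this virtual processor in
            state = pickle.load(f)
        state, messages = superstep(vp, state)      # user-supplied computation
        outgoing.extend(messages)
        with open(path, "wb") as f:                 # swap it back out to disk
            pickle.dump(state, f)
    return outgoing                                 # delivered before the next superstep
```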