Publications catalog - books
High Performance Computing: HiPC 2006: 13th International Conference, Bangalore, India, December 18-21, 2006, Proceedings
Yves Robert; Manish Parashar; Ramamurthy Badrinath; Viktor K. Prasanna (eds.)
Conference: 13th International Conference on High-Performance Computing (HiPC), Bangalore, India, December 18-21, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Processor Architectures; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Algorithm Analysis and Problem Complexity; Computation by Abstract Devices; Mathematics of Computing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-68039-0
Electronic ISBN
978-3-540-68040-6
Publisher
Springer Nature
Country of publication
Reino Unido
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11945918_11
Minimizing Average Response Time for Scheduling Stochastic Workload in Heterogeneous Computational Grids
Jie Hu; Raymond Klefstad
Scheduling stochastic workloads is a difficult task. We analyze the minimum average response time of computational grids composed of nodes with multiple processors when stochastic workloads are scheduled onto them, and we propose an algorithm that achieves this minimum. We compare the minimum average response time with the average response time obtained under load-balancing scheduling in a range of cases. Specifically, we analyze how differences in processor speed, the number of processors per node, and the utilization rate of the grid affect the gap between the two scheduling strategies. This analysis provides a deeper understanding of the average response time of grids, which will allow us to design more efficient algorithms for Grid workload scheduling.
Keywords: Load Balance; Utilization Rate; Service Rate; System Load; Grid Environment.
- Session I – Scheduling and Load Balancing | Pp. 47-59
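As a rough illustration of the comparison described in the abstract above (a sketch, not the authors' algorithm), the following snippet assumes each grid node behaves as an M/M/c queue and contrasts a capacity-proportional (load-balancing) split of a stochastic arrival stream with the best split found by brute force; the size of the gap depends on processor speeds and utilization, as the abstract discusses.

```python
import math

def erlang_c(c, a):
    """Probability that an arriving job must wait in an M/M/c queue
    (c servers, offered load a = lambda / mu)."""
    tail = a ** c / (math.factorial(c) * (1.0 - a / c))
    return tail / (sum(a ** k / math.factorial(k) for k in range(c)) + tail)

def mean_response_time(lam, mu, c):
    """Mean response time (waiting + service) of one M/M/c grid node."""
    if lam >= c * mu:
        return float("inf")
    return erlang_c(c, lam / mu) / (c * mu - lam) + 1.0 / mu

# Two heterogeneous nodes with equal total capacity: 4 slow vs. 2 fast processors.
nodes = [(4, 1.0), (2, 2.0)]   # (processors per node, per-processor service rate)
lam_total = 5.0                # total stochastic job arrival rate

def grid_response_time(frac_to_node0):
    lams = [lam_total * frac_to_node0, lam_total * (1.0 - frac_to_node0)]
    return sum(l / lam_total * mean_response_time(l, mu, c)
               for l, (c, mu) in zip(lams, nodes))

balanced = grid_response_time(0.5)   # capacity-proportional "load balancing" split
best = min(grid_response_time(f / 1000.0) for f in range(1, 1000))
print(f"load-balanced split: {balanced:.3f}   best split found: {best:.3f}")
```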
doi: 10.1007/11945918_12
Advanced Reservation-Based Scheduling of Task Graphs on Clusters
Anthony Sulistio; Wolfram Schiffmann; Rajkumar Buyya
A Task Graph (TG) is a model of a parallel program that consists of many subtasks that can be executed simultaneously on different processing elements. Subtasks exchange data via an interconnection network, and the dependencies between subtasks are described by a Directed Acyclic Graph. Unfortunately, due to these characteristics, scheduling a TG requires dedicated or uninterruptible resources. Moreover, scheduling a TG by itself results in low resource utilization because of the dependencies among its subtasks. To address these problems, we propose a scheduling approach for TGs that uses advance reservation in a cluster environment. In addition, to improve resource utilization, we propose interweaving one or more TGs within the same reservation block and/or backfilling with independent jobs.
Keywords: Edge Length; Test Bench; Parallel Program; Task Graph; Total Completion Time.
- Session I – Scheduling and Load Balancing | Pp. 60-71
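As a minimal sketch of the underlying idea (not the paper's algorithm), the snippet below list-schedules a toy task graph onto processors reserved in advance; the resulting makespan indicates how long the reservation block must be, and the idle gaps in the schedule are what interweaving other TGs or backfilling independent jobs would fill.

```python
from collections import deque

def list_schedule(tasks, edges, num_procs):
    """Greedy list scheduling of a task graph (DAG) onto identical processors
    inside one advance-reservation block starting at time 0."""
    preds = {t: [] for t in tasks}
    succs = {t: [] for t in tasks}
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1
    ready = deque(t for t in tasks if indeg[t] == 0)
    proc_free = [0.0] * num_procs          # when each reserved processor frees up
    finish, schedule = {}, {}
    while ready:
        t = ready.popleft()
        est = max((finish[p] for p in preds[t]), default=0.0)   # wait for inputs
        proc = min(range(num_procs), key=lambda i: max(proc_free[i], est))
        start = max(proc_free[proc], est)
        finish[t] = start + tasks[t]
        proc_free[proc] = finish[t]
        schedule[t] = (proc, start, finish[t])
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return schedule, max(finish.values())

# Toy task graph: A feeds B and C, which both feed D (durations in time units).
tasks = {"A": 2, "B": 3, "C": 4, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
sched, makespan = list_schedule(tasks, edges, num_procs=2)
print(sched)
print("reservation length needed:", makespan)  # idle gaps could host other TGs or backfilled jobs
```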
doi: 10.1007/11945918_13
Estimation Based Load Balancing Algorithm for Data-Intensive Heterogeneous Grid Environments
Ruchir Shah; Bharadwaj Veeravalli; Manoj Misra
Grid computing holds great promise for effectively sharing geographically distributed heterogeneous resources to solve large-scale, complex scientific problems. One of the distinct characteristics of a Grid system is resource heterogeneity, and effective use of the Grid requires an approach to managing the heterogeneity of the involved resources, which may include computers, data, networks, etc. In this paper, we propose a decentralized, adaptive load balancing algorithm for heterogeneous Grid environments. Our algorithm estimates different system parameters (such as job arrival rate, CPU processing rate, and load at each processor) and performs load balancing by considering all of the relevant criteria. Simulation results demonstrate that our algorithm outperforms conventional approaches in heterogeneous environments and when communication overhead is significant.
Keywords: Grid systems; Heterogeneous environment; Load balancing; Average response time; Communication overhead; Migration.
- Session I – Scheduling and Load Balancing | Pp. 72-83
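The following sketch (hypothetical names, not the paper's exact scheme) shows the kind of estimation the abstract refers to: each node keeps moving-average estimates of its job arrival and service rates, derives an approximate response time, and migrates work only when a remote node still looks faster after paying the communication overhead.

```python
class NodeEstimator:
    """Per-node load estimator: EWMA of job inter-arrival and service times,
    plus an M/M/1-style response-time estimate derived from them."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.avg_interarrival = None
        self.avg_service = None

    def _ewma(self, old, sample):
        return sample if old is None else (1 - self.alpha) * old + self.alpha * sample

    def observe_arrival(self, interarrival_time):
        self.avg_interarrival = self._ewma(self.avg_interarrival, interarrival_time)

    def observe_completion(self, service_time):
        self.avg_service = self._ewma(self.avg_service, service_time)

    def est_response_time(self):
        lam = 1.0 / self.avg_interarrival   # estimated arrival rate
        mu = 1.0 / self.avg_service         # estimated processing rate
        return float("inf") if lam >= mu else 1.0 / (mu - lam)

def should_migrate(local, remote, transfer_cost):
    # Migrate only if the remote node looks faster even after paying the
    # communication overhead of moving the job and its data.
    return remote.est_response_time() + transfer_cost < local.est_response_time()

local, remote = NodeEstimator(), NodeEstimator()
for dt in (1.0, 0.9, 1.1):   # local node: jobs arrive about once per second
    local.observe_arrival(dt)
for st in (0.8, 0.9):        # ...and take roughly 0.85 s each to serve
    local.observe_completion(st)
for dt in (3.0, 2.8):        # remote node is lightly loaded and faster
    remote.observe_arrival(dt)
for st in (0.5, 0.6):
    remote.observe_completion(st)
print(should_migrate(local, remote, transfer_cost=0.5))
```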
doi: 10.1007/11945918_14
Improving the Flexibility of the Deficit Table Scheduler
Raúl Martínez; Francisco J. Alfaro; José L. Sánchez
A key component of networks with Quality of Service (QoS) support is the egress link scheduler. Table-based schedulers are simple to implement and can offer good latency bounds, and some of the latest network technologies, such as Advanced Switching and InfiniBand, define one of these schedulers in their specifications. However, these schedulers do not work properly with variable packet sizes and face the problem of bounding the bandwidth and latency assignments. We have proposed a new table-based scheduler, the Deficit Table (DTable) scheduler, that works properly with variable packet sizes, together with a configuration methodology that partially decouples the bandwidth and latency assignments. In this paper we propose a method to improve the flexibility of the decoupling methodology, and we compare the latency performance of this strategy with two well-known scheduling algorithms: Self-Clocked Weighted Fair Queuing (SCFQ) and Deficit Round Robin (DRR).
- Session I – Scheduling and Load Balancing | Pp. 84-97
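A DRR-style sketch of what a table-based deficit scheduler looks like (the DTable configuration of entries, quanta, and latency bounds is in the paper; the values below are illustrative): each table entry grants its flow a byte budget, and a flow transmits only while the head-of-line packet fits within its accumulated deficit, which is how variable packet sizes are handled.

```python
from collections import deque

def dtable_schedule(table, queues, rounds=1):
    """DRR-style sketch of a table-based deficit scheduler.

    table:  list of (flow_id, quantum_bytes) entries; the number and position
            of a flow's entries control its bandwidth share and latency.
    queues: {flow_id: deque of packet sizes in bytes}.
    Yields (flow_id, packet_size) in transmission order."""
    deficit = {fid: 0 for fid, _ in table}
    for _ in range(rounds):
        for fid, quantum in table:
            q = queues[fid]
            if not q:
                deficit[fid] = 0        # an idle flow does not hoard credit
                continue
            deficit[fid] += quantum      # this table entry's byte budget
            while q and q[0] <= deficit[fid]:
                pkt = q.popleft()
                deficit[fid] -= pkt
                yield fid, pkt

queues = {"A": deque([1500, 1500]), "B": deque([300, 300, 300])}
table = [("A", 1000), ("B", 400), ("A", 1000), ("B", 400)]  # interleaved entries
# B's small packets interleave with A's large ones across the table entries.
print(list(dtable_schedule(table, queues, rounds=2)))
```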
doi: 10.1007/11945918_15
A Cache-Pinning Strategy for Improving Generational Garbage Collection
Vimal K. Reddy; Richard K. Sawyer; Edward F. Gehringer
In generational garbage collection, the youngest generation of the heap is traversed frequently during collection. Due to the randomness of this traversal, memory access patterns are unpredictable, and cache performance becomes crucial to garbage-collection efficiency. Our proposal for improving the cache performance of garbage collection is to “pin” the youngest generation (sometimes called the nursery) in the cache, converting all nursery accesses into cache hits. To make the nursery fit inside the cache, we reduce its size, and, to prevent its eviction from the cache, we configure the operating system’s page-fault handler to disallow any page allocation that would cause cache conflicts with the nursery. We evaluated our scheme on a copying-style generational garbage collector using IBM VisualAge Smalltalk and the Jikes Research Virtual Machine. Our simulation results indicate that the increased frequency of garbage collection due to the smaller nursery is outweighed by the gains from converting all nursery accesses into cache hits.
Keywords: Garbage Collection; Pause Time; Garbage Collector; Page Frame; Cache Conflict.
- Session II – Architectures | Pp. 98-110
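A simplified page-coloring model of the pinning idea (a sketch assuming a physically indexed cache, not the authors' implementation): the cache colors covering the nursery's pages are reserved, and the page allocator refuses any other physical page that maps to one of those colors, so nothing else can conflict with, and evict, the nursery.

```python
def page_color(phys_page, cache_bytes, ways, page_bytes=4096):
    """Cache color of a physical page: which group of cache sets it maps to
    in a physically indexed cache."""
    colors = cache_bytes // (ways * page_bytes)   # distinct page colors per way
    return phys_page % colors

def build_allocation_filter(nursery_pages, cache_bytes, ways, page_bytes=4096):
    """Return a predicate saying whether a free physical page may be handed out
    without conflicting with the pinned nursery (a simplified stand-in for the
    page-fault-handler policy the abstract describes)."""
    reserved = {page_color(p, cache_bytes, ways, page_bytes) for p in nursery_pages}
    def allowed(phys_page):
        return page_color(phys_page, cache_bytes, ways, page_bytes) not in reserved
    return allowed

# Example: a 256 KB, 8-way cache with 4 KB pages has 8 colors; the nursery
# occupies physical pages 0 and 1, so colors 0 and 1 are off-limits to others.
allowed = build_allocation_filter([0, 1], cache_bytes=256 * 1024, ways=8)
print([p for p in range(16) if allowed(p)])   # -> [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15]
```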
doi: 10.1007/11945918_16
A Realizable Distributed Ion-Trap Quantum Computer
Darshan D. Thaker; Tzvetan S. Metodi; Frederic T. Chong
Recent advances in trapped-ion technology have rapidly accelerated efforts to construct a near-term, scalable quantum computer. Micro-machined electrodes in silicon are expected to trap hundreds of ions, each representing a quantum bit, on a single chip. We find, however, that scalable systems must be composed of multiple chips, and we explore inter-chip communication technologies. Specifically, we explore the parallelization of modular exponentiation, by far the dominant portion of Shor’s algorithm, on multi-chip ion-trap systems with photon-mediated communication between chips. Shor’s algorithm, which factors the product of two primes in polynomial time on a quantum computer, has strong implications for public-key cryptography and has been the driving application behind much of the research in quantum computing. Parallelizing the algorithm is necessary to obtain tractable execution times on large problems. Our results indicate that a 1024-bit RSA key can be factored in 13 days given a multi-chip system of 4300 ion-trap chips, each 10 by 10 centimeters in area.
- Session II – Architectures | Pp. 111-122
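For reference, modular exponentiation, named in the abstract as the dominant kernel of Shor’s algorithm, reduces to a chain of modular multiplications via square-and-multiply; the classical version below (not the quantum circuit) shows the structure that the paper's parallelization targets.

```python
def modexp(base, exponent, modulus):
    """Classical square-and-multiply modular exponentiation.

    Shor's algorithm performs the same a^x mod N computation reversibly over a
    superposition of exponents; the chain of modular multiplications below is
    the work that gets distributed across ion-trap chips."""
    result, base = 1, base % modulus
    while exponent:
        if exponent & 1:                    # multiply step for this exponent bit
            result = (result * base) % modulus
        base = (base * base) % modulus      # squaring step
        exponent >>= 1
    return result

assert modexp(7, 2 ** 10 + 3, 15) == pow(7, 2 ** 10 + 3, 15)
print(modexp(7, 2 ** 10 + 3, 15))           # toy sizes; RSA keys use 1024-bit moduli
```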
doi: 10.1007/11945918_17
Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns
Ravishankar Rao; Justin Wenck; Diana Franklin; Rajeevan Amirtharajah; Venkatesh Akella
On-chip caches in modern processors account for a sizable fraction of dynamic and leakage power. Much of this power is wasted, required only because the memory cells farthest from the sense amplifiers must discharge a large capacitance on the bitlines. We reduce this capacitance by segmenting the memory cells along the bitlines and turning off the segmenters to reduce the overall bitline capacitance. The success of this cache relies on accessing segments near the sense amplifiers much more often than remote segments. We show that the access pattern to the first-level data and instruction caches is extremely skewed: only a small set of cache lines is accessed frequently. We exploit this non-uniform access pattern by mapping the frequently accessed cache lines closer to the sense amplifiers; these lines are isolated by segmenting circuits on the bitlines and hence dissipate less power when accessed. Modifications to the address decoder enable dynamic re-mapping of cache lines to segments. In this paper, we explore the design space of segmenting the level-one data and instruction caches. The instruction and data caches show potential power savings of 10% and 6%, respectively, on the subset of benchmarks simulated.
- Session II – Architectures | Pp. 123-134
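A back-of-the-envelope model of why the remapping pays off (illustrative numbers only, not the paper's circuit-level results): if reading a line in segment s discharges s + 1 segments of bitline capacitance, then placing the few hot lines in the segment nearest the sense amplifiers cuts the energy of a skewed access stream.

```python
def bitline_energy(access_counts, line_to_segment, energy_per_segment=1.0):
    """Toy energy model: reading a line in segment s (0 = nearest the sense
    amplifiers) discharges s + 1 segments' worth of bitline capacitance."""
    return sum(count * (line_to_segment[line] + 1) * energy_per_segment
               for line, count in access_counts.items())

# Skewed access pattern over 8 cache lines; 4 bitline segments of 2 lines each.
access_counts = {0: 20, 1: 10, 2: 800, 3: 5, 4: 30, 5: 900, 6: 40, 7: 5}
naive_map = {line: line // 2 for line in range(8)}               # address-order placement
hot_first = sorted(access_counts, key=access_counts.get, reverse=True)
remapped = {line: rank // 2 for rank, line in enumerate(hot_first)}  # hot lines nearest

e_naive = bitline_energy(access_counts, naive_map)
e_remap = bitline_energy(access_counts, remapped)
print(f"energy: naive={e_naive:.0f}  remapped={e_remap:.0f}  "
      f"savings={1 - e_remap / e_naive:.1%}")
```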
doi: 10.1007/11945918_18
Trade-Offs in Transient Fault Recovery Schemes for Redundant Multithreaded Processors
Joseph Sharkey; Nayef Abu-Ghazeleh; Dmitry Ponomarev; Kanad Ghose; Aneesh Aggarwal
CMOS downscaling trends, manifested in smaller transistor feature sizes and lower supply voltages, make microprocessors more and more vulnerable to transient errors with each new technology generation. One architectural approach to detecting and recovering from such errors is to execute two copies of the same program and compare the results. While comparing only the store instructions is sufficient for error detection, register values also have to be compared to support fault recovery. In this paper, we propose novel checkpoint-assisted mechanisms for efficient fault recovery that dramatically reduce the number of register values that must be compared to detect soft errors, and we perform a comprehensive investigation of these and other existing recovery schemes from the standpoints of performance, power, and design complexity.
Keywords: Register File; Cache Line; Soft Error; Transient Fault; Main Thread.
- Session II – Architectures | Pp. 135-147
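A toy model of the detect-and-recover loop the abstract builds on (a sketch, not the proposed checkpoint mechanism): two copies of a small register-level program run redundantly, the values they would store are compared for detection, and a mismatch restores a register checkpoint and re-executes.

```python
import copy

def run_redundant(program, regs, inject_fault=False):
    """Execute a tiny register program twice (leading + trailing copy),
    compare the values each copy would store, and roll back to a
    checkpoint on mismatch."""
    checkpoint = copy.deepcopy(regs)              # architectural register checkpoint
    while True:
        lead, trail = copy.deepcopy(regs), copy.deepcopy(regs)
        lead_stores, trail_stores = [], []
        for i, (dst, a, b) in enumerate(program):
            for ctx, stores in ((lead, lead_stores), (trail, trail_stores)):
                ctx[dst] = ctx[a] + ctx[b]
                stores.append(ctx[dst])           # value that would reach memory
            if inject_fault and i == 1:
                lead[dst] ^= 1                    # transient bit flip in one copy
                lead_stores[-1] ^= 1
        if lead_stores == trail_stores:           # detection: compare store streams
            return lead                           # results agree -> commit
        regs = copy.deepcopy(checkpoint)          # recovery: restore checkpoint, retry
        inject_fault = False                      # a transient fault does not recur

program = [("r2", "r0", "r1"), ("r3", "r2", "r1"), ("r4", "r3", "r2")]
print(run_redundant(program, {"r0": 1, "r1": 2}, inject_fault=True))
```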
doi: 10.1007/11945918_19
Supporting Speculative Multithreading on Simultaneous Multithreaded Processors
Venkatesan Packirisamy; Shengyue Wang; Antonia Zhai; Wei-Chung Hsu; Pen-Chung Yew
Speculative multithreading is a technique that has been used to improve single-thread performance. Speculative multithreading architectures for chip multiprocessors (CMPs) have been studied extensively, but there have been relatively few studies on the design of speculative multithreading for simultaneous multithreading (SMT) processors. Current SMT-based designs, IMT [9] and DMT [2], use the load/store queue (LSQ) to perform dependence checking. Since the size of the LSQ is limited, this design is suitable only for small threads. In this paper we present novel cache-based architectural support for speculative simultaneous multithreading that can efficiently handle larger threads. In our architecture, the associativity of the cache is used to buffer speculative values. Our 4-thread architecture achieves about 15% speedup compared with equivalent superscalar processors and about 3% speedup on average over the LSQ-based architectures, with less complex hardware. Our scheme also performs 14% better than the LSQ-based scheme for larger threads.
Keywords: Cache Line; Speculative Load; Chip Multiprocessor; Speculative Thread; Dependence Violation.
- Session II – Architectures | Pp. 148-158
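A minimal sketch of buffering speculative values alongside committed ones (a hypothetical structure, not the paper's design): the speculative copy lives in a separate "way", speculative loads are recorded in a read set, and a store from the less-speculative thread to an address already read speculatively triggers a dependence violation and a squash.

```python
class SpeculativeCache:
    """Sketch: committed values in one way, speculative values in another;
    a write by the non-speculative thread to an address the speculative
    thread already read is a dependence violation."""
    def __init__(self):
        self.committed = {}      # addr -> architecturally visible value
        self.speculative = {}    # addr -> buffered speculative value
        self.spec_read_set = set()

    def spec_load(self, addr):
        self.spec_read_set.add(addr)
        return self.speculative.get(addr, self.committed.get(addr, 0))

    def spec_store(self, addr, value):
        self.speculative[addr] = value       # not visible until commit

    def nonspec_store(self, addr, value):
        self.committed[addr] = value
        if addr in self.spec_read_set:       # speculative thread read a stale value
            self.squash()

    def squash(self):
        self.speculative.clear()
        self.spec_read_set.clear()
        print("dependence violation: speculative thread squashed")

    def commit(self):
        self.committed.update(self.speculative)
        self.speculative.clear()
        self.spec_read_set.clear()

c = SpeculativeCache()
c.nonspec_store(0x100, 1)
print(c.spec_load(0x100))    # speculative thread reads 1
c.nonspec_store(0x100, 2)    # earlier thread writes after that read -> squash
```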
doi: 10.1007/11945918_20
Dynamic Internet Congestion with Bursts
Stefan Schmid; Roger Wattenhofer
This paper studies throughput maximization in networks with dynamically changing congestion. First, we give a new and simple analysis of an existing model in which the bandwidth available to a flow varies multiplicatively over time. The main contribution, however, is the introduction of a novel model of dynamics based on concepts from network calculus. This model features a limited form of amortization: after quiet times in which the available bandwidth was roughly constant, the congestion may change more abruptly. We present a competitive algorithm for this model and also derive a lower bound.
Keywords: Congestion Control; Competitive Ratio; Online Algorithm; Transport Layer; Arrival Curve.
- Session III – Network and Distributed Algorithms | Pp. 159-170
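A toy simulation of the multiplicative-change setting mentioned first in the abstract (a sketch of the model class, not the paper's algorithm or its bounds): the available bandwidth changes by at most a factor mu per step, the sender only learns whether its chosen rate fit, and a clairvoyant optimum is used to compute an empirical competitive ratio.

```python
import random

def simulate(steps=1000, mu=1.5, seed=0):
    """Available bandwidth u(t) drifts multiplicatively by at most a factor mu
    per step; a probing sender increases its rate on success and backs off on
    failure, and is compared against a clairvoyant sender that always uses u(t)."""
    rng = random.Random(seed)
    u, x = 100.0, 100.0          # available bandwidth, online sending rate
    online, optimal = 0.0, 0.0
    for _ in range(steps):
        u *= mu ** rng.uniform(-1, 1)   # bandwidth changes by at most a factor mu
        if x <= u:
            online += x          # packets got through: probe higher next time
            x *= mu
        else:
            x /= mu              # congestion signal: back off
        optimal += u             # clairvoyant sender transmits exactly u(t)
    return optimal / online      # empirical competitive ratio for this run

print(f"empirical competitive ratio over the run: {simulate():.2f}")
```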