Publications catalog - books
High Performance Computing: HiPC 2006: 13th International Conference, Bangalore, India, December 18-21, 2006, Proceedings
Yves Robert; Manish Parashar; Ramamurthy Badrinath; Viktor K. Prasanna (eds.)
Conference: 13th International Conference on High-Performance Computing (HiPC). Bangalore, India. December 18-21, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Processor Architectures; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Algorithm Analysis and Problem Complexity; Computation by Abstract Devices; Mathematics of Computing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-68039-0
Electronic ISBN
978-3-540-68040-6
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11945918_31
Receive Side Coalescing for Accelerating TCP/IP Processing
Srihari Makineni; Ravi Iyer; Partha Sarangam; Donald Newell; Li Zhao; Ramesh Illikkal; Jaideep Moses
With rapid advancements in Ethernet technology, Ethernet speeds have increased 10-fold, from 1 to 10 Gbps, in a period of 2-3 years. This sudden increase has outpaced the rate at which processor and memory speeds have been increasing, raising concerns that TCP/IP processing will not scale to these levels. As a result, applications running on commercial servers will not be able to take advantage of the increased Ethernet bandwidth. This has led to a flurry of activity in industry and academia focused on finding ways to scale TCP/IP processing up to 10 Gbps and beyond. In this paper, we propose a novel technique called "Receive Side Coalescing" (RSC) that increases TCP/IP processing efficiency significantly. RSC allows NICs to identify packets that belong to the same TCP/IP flow and coalesce them into a single large packet. As a result, the TCP/IP stack has to process fewer packets, reducing per-packet processing costs. The NIC can coalesce packets during the interrupt moderation interval, so packet latency is not affected. We have collected packet traces and analyzed them to determine how much coalescing is possible in different scenarios. Our analysis shows that a reduction of about 50% in the number of packets is possible. We have prototyped RSC on Windows and Linux to understand the benefits, and the results show that savings of 2-7% in CPU utilization are possible at 1 Gbps speeds. Projection models developed to estimate processing costs at 10 Gbps show that RSC can save up to 20% of the CPU.
Keywords: Receive Side Coalescing; RSC; TOE; TCP/IP acceleration; de-fragmentation; receive offload.
- Session V – Network Services | Pp. 289-300
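A minimal sketch of the coalescing idea described in the RSC abstract above, not the authors' NIC implementation: packets of the same TCP flow whose sequence numbers are contiguous are merged into one larger segment before being handed to the stack. The `Packet` fields and the flow key are illustrative assumptions.

```python
from collections import namedtuple

# Hypothetical packet record: flow identity plus TCP sequence information.
Packet = namedtuple("Packet", "src dst sport dport seq payload_len")

def coalesce(packets):
    """Merge consecutive packets of the same flow whose sequence numbers
    are contiguous, mimicking receive side coalescing."""
    coalesced = []
    open_segments = {}  # flow key -> index of the pending coalesced segment

    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport)
        idx = open_segments.get(key)
        if idx is not None:
            seg = coalesced[idx]
            # Contiguous in sequence space: extend the pending segment.
            if p.seq == seg.seq + seg.payload_len:
                coalesced[idx] = seg._replace(
                    payload_len=seg.payload_len + p.payload_len)
                continue
        # New flow or out-of-order packet: start a new segment.
        coalesced.append(p)
        open_segments[key] = len(coalesced) - 1

    return coalesced

# Example: three in-order packets of one flow collapse into a single segment.
trace = [Packet("10.0.0.1", "10.0.0.2", 5000, 80, seq, 1460)
         for seq in (0, 1460, 2920)]
print(len(coalesce(trace)))  # -> 1
```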
doi: 10.1007/11945918_32
Minimizing Metadata Access Latency in Wide Area Networked File Systems
Jian Liang; Aniruddha Bohra; Hui Zhang; Samrat Ganguly; Rauf Izmailov
Traditional network file systems, like NFS, do not extend to the wide area due to low bandwidth and high network latency. We present WireFS, a Wide Area File System, which enables delegation of metadata management to nodes at client sites (homes). The home of a file stores the most recent copy of the file, serializes all updates, and streams updates to the central file server. WireFS uses access history to migrate the home of a file to the client site which accesses the file most frequently. We formulate the home migration problem as an integer programming problem and present two algorithms: a dynamic programming approach to find the optimal solution, and a non-optimal but more efficient greedy algorithm. We show through extensive simulations that even in the WAN setting, access latency over WireFS is comparable to NFS performance in the LAN setting; the migration overhead is also marginal.
Keywords: Exponential Weighted Moving Average; Access Latency; Home Node; Cooperative Cache; Metadata Management.
- Session V – Network Services | Pp. 301-312
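A toy sketch of the greedy home-migration variant mentioned in the WireFS abstract above. The access-count bookkeeping, the cost model, and the migration threshold are assumptions for illustration, not the paper's exact formulation.

```python
def greedy_home_migration(access_counts, current_home, migration_cost):
    """Move a file's home to the site that accesses it most, if the expected
    reduction in remote (WAN) accesses outweighs the migration cost.

    access_counts:  dict mapping client site -> number of accesses to the file.
    current_home:   site currently serializing the file's updates.
    migration_cost: penalty, in the same 'access units', charged for moving.
    """
    best_site = max(access_counts, key=access_counts.get)
    if best_site == current_home:
        return current_home

    # Accesses from the home site are local; all other accesses pay WAN latency.
    remote_now = sum(c for s, c in access_counts.items() if s != current_home)
    remote_after = sum(c for s, c in access_counts.items() if s != best_site)

    saving = remote_now - remote_after
    return best_site if saving > migration_cost else current_home

# Example: site "B" dominates the access history, so the home migrates to "B".
counts = {"A": 3, "B": 40, "C": 5}
print(greedy_home_migration(counts, current_home="A", migration_cost=10))  # -> B
```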
doi: 10.1007/11945918_33
Connecting Pervasive Frameworks Through Mediation
Florence T. Balagtas; Cedric Angelo M. Festin
Context information helps an application decide what to do in order to adapt to its user's needs. To ease the development of ubiquitous applications, there has been increased research in the design and development of pervasive computing frameworks. Although these frameworks help application developers create ubiquitous applications easily, interoperability has been a problem because of the different representations of context information and the different protocols used. This research attempts to solve this problem by creating a Context Information Mediator (CIM) which serves as a translation gateway between applications created using different frameworks. To test our system, we developed two versions of an inventory system application that keeps track of items inside a building. The idea is to let these applications communicate with each other and share information through the CIM.
Keywords: Context Information; Data Size; Pervasive Computing; Client Application; Translation Time.
- Session V – Network Services | Pp. 313-325
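The mediation idea from the abstract above can be illustrated with a toy translator between two context schemas. The frameworks, field names, and mapping tables below are invented purely for illustration; they are not CIM's actual vocabulary.

```python
# Hypothetical context record produced by one pervasive framework ("A").
framework_a_context = {"loc": "Room 101", "temp_c": 23.5, "ts": 1166400000}

# Mapping tables: how each framework names the same piece of context.
A_TO_CANONICAL = {"loc": "location", "temp_c": "temperature_celsius", "ts": "timestamp"}
CANONICAL_TO_B = {"location": "position", "temperature_celsius": "tempC", "timestamp": "time"}

def mediate(context, to_canonical, from_canonical):
    """Translate a context record into another framework's vocabulary by
    passing it through a canonical intermediate representation."""
    canonical = {to_canonical[k]: v for k, v in context.items() if k in to_canonical}
    return {from_canonical[k]: v for k, v in canonical.items() if k in from_canonical}

print(mediate(framework_a_context, A_TO_CANONICAL, CANONICAL_TO_B))
# -> {'position': 'Room 101', 'tempC': 23.5, 'time': 1166400000}
```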
doi: 10.1007/11945918_34
Error Resilient Video Streaming for Heterogeneous Networks
Divyashikha Sethia; Huzur Saran
We consider the problem of video streaming for a critical private web cast to a medium-sized audience with heterogeneous nodes having different bandwidths and reliabilities. The nodes can distribute video in a peer-to-peer manner by forming a multicast tree at the application level. A majority of the nodes in the network have low bandwidth and low reliability and can only receive the video stream. A simulation model has been implemented to compare a single video streaming scheme with error resilience schemes based on stream replication and Multiple Description Coding (MDC) [6][7]. Results indicate that the MDC error resilience scheme provides lower average outage, better video quality and better network utilization as the packet loss percentage and node failure probability increase. We discuss the significance of path diversity in multiple multicast trees for error resilience, and the number of multicast trees.
Keywords: Video Streaming; Error Resilience; Multiple Description Coding (MDC); Path Diversity.
- Session V – Network Services | Pp. 326-337
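A small simulation in the spirit of the comparison above, under the simplifying assumption that each MDC description travels over an independent multicast tree and is lost with probability p; an outage then occurs only when every description is lost, versus whenever the single stream is lost. The loss model and numbers are illustrative, not the paper's.

```python
import random

def outage_fraction(loss_prob, descriptions, trials=100_000, rng=random.Random(1)):
    """Fraction of intervals with no usable video, assuming each description
    travels an independent path and is lost with probability loss_prob."""
    outages = 0
    for _ in range(trials):
        if all(rng.random() < loss_prob for _ in range(descriptions)):
            outages += 1
    return outages / trials

p = 0.1
print(f"single stream outage: {outage_fraction(p, 1):.3f}")  # roughly 0.100
print(f"two-description MDC:  {outage_fraction(p, 2):.3f}")  # roughly 0.010
```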
doi: 10.1007/11945918_35
Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport
Joseph Antony; Pete P. Janes; Alistair P. Rendell
Modern shared memory multiprocessor systems commonly have non-uniform memory access (NUMA) with asymmetric memory bandwidth and latency characteristics. Operating systems now provide application programming interfaces that allow the user to perform specific thread and memory placement. To date, however, there have been relatively few detailed assessments of the importance of memory/thread placement for complex applications. This paper outlines a framework for performing memory and thread placement experiments on Solaris and Linux. Thread binding and location-specific memory allocation, and their verification, are discussed and contrasted. Using the framework, the performance characteristics of serial versions of lmbench, Stream and various BLAS libraries (ATLAS, GOTO, ACML on Opteron/Linux and Sunperf on Opteron, UltraSPARC/Solaris) are measured on two different hardware platforms (UltraSPARC/FirePlane and Opteron/HyperTransport). A simple model describing performance as a function of memory distribution is proposed and assessed for both the Opteron and UltraSPARC.
Keywords: Application Program Interface; Memory Bandwidth; Memory Allocation; Data Quantity; Virtual Address.
- Session VI – Applications | Pp. 338-352
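The kind of simple placement model the abstract above alludes to can be expressed as a weighted average of local and remote access costs. The linear form and the latency numbers below are placeholder assumptions, not measurements or the model from the paper.

```python
def effective_latency(local_fraction, local_ns, remote_ns):
    """Model average memory latency as a mix of local and remote accesses,
    weighted by the fraction of a thread's data placed on its own NUMA node."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Illustrative numbers only: 90 ns local vs 140 ns one-hop remote access.
for f in (1.0, 0.5, 0.0):
    print(f"{int(f * 100):3d}% local -> {effective_latency(f, 90, 140):.0f} ns")
```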
doi: 10.1007/11945918_36
Low Power Scheduling of DAGs to Minimize Finish Times
Sanjeev Baskiyar; Kiran Kumar Palli
We propose a scheduling algorithm named Low Power Heterogeneous Makespan (LPHM) that attempts to minimize makespan as well as power consumption in the execution of any directed acyclic task graph on heterogeneous processors. We combine the techniques of Heterogeneous Earliest Finish Time (HEFT) [9] and voltage scaling [4]. The processors used for execution are assumed to be continuously voltage scalable within their range of operation. After initial scheduling for minimum makespan, the processors are voltage-scaled down to reduce power consumption whenever there is idle time. This voltage scaling is performed without violating the precedence relationships among tasks. The simulation results show power savings of 22% over HEFT with no increase in makespan.
Keywords: DAG; Power; Scheduling; Makespan; Heterogeneous; Voltage Scaling.
- Session VI – Applications | Pp. 353-362
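A sketch of the slack-reclamation step described above: once a makespan-minimizing schedule is fixed, a task with idle time before the point at which it must finish can be slowed down without moving any successor. The task parameters and the power model (voltage assumed to scale linearly with frequency, so dynamic power scales as f³ and per-task energy as f²) are simplifying assumptions, not LPHM's exact formulation.

```python
def reclaim_slack(exec_time, finish, latest_allowed_finish, f_nominal=1.0):
    """Scale down a scheduled task's frequency so it finishes exactly at
    latest_allowed_finish instead of finish, if slack exists.

    Returns (new_frequency, dynamic_energy_ratio) under the assumption that
    voltage scales linearly with frequency: power ~ V^2 * f ~ f^3, and since
    execution time stretches as 1/f, energy per task ~ f^2.
    """
    slack = latest_allowed_finish - finish
    if slack <= 0:
        return f_nominal, 1.0
    stretched_time = exec_time + slack
    f_new = f_nominal * exec_time / stretched_time
    energy_ratio = (f_new / f_nominal) ** 2
    return f_new, energy_ratio

# Example: a 10 ms task whose successors do not start until 5 ms after it ends.
f, e = reclaim_slack(exec_time=10.0, finish=30.0, latest_allowed_finish=35.0)
print(f"run at {f:.2f}x frequency, {e * 100:.0f}% of nominal dynamic energy")
# -> run at 0.67x frequency, 44% of nominal dynamic energy
```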
doi: 10.1007/11945918_37
GPU-ClustalW: Using Graphics Hardware to Accelerate Multiple Sequence Alignment
Weiguo Liu; Bertil Schmidt; Gerrit Voss; Wolfgang Müller-Wittig
Molecular biologists frequently compute multiple sequence alignments (MSAs) to identify similar regions in protein families. However, aligning hundreds of sequences with popular MSA tools such as ClustalW requires several hours on sequential computers. Due to the rapid growth of biological sequence databases, biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to reduce this runtime using graphics processing units (GPUs). To derive an efficient mapping onto this type of architecture, we have reformulated the computationally most expensive part of ClustalW in terms of computer graphics primitives. This results in a high-speed implementation with significant runtime savings on a commodity graphics card.
- Session VI – Applications | Pp. 363-374
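The expensive part of ClustalW is dominated by pairwise dynamic programming alignments, whose anti-diagonals are mutually independent and therefore map well to data-parallel hardware such as GPUs. Below is a small NumPy sketch of that anti-diagonal (wavefront) evaluation; it uses a simple match/mismatch/gap scoring scheme chosen for illustration, not ClustalW's actual parameters or the authors' graphics-primitive mapping.

```python
import numpy as np

def alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score computed anti-diagonal by anti-diagonal; every
    cell on a diagonal depends only on the two previous diagonals, which is
    the data parallelism a GPU implementation exploits."""
    m, n = len(a), len(b)
    H = np.zeros((m + 1, n + 1), dtype=np.int32)
    H[:, 0] = gap * np.arange(m + 1)
    H[0, :] = gap * np.arange(n + 1)
    ca = np.frombuffer(a.encode(), dtype=np.uint8)
    cb = np.frombuffer(b.encode(), dtype=np.uint8)

    for d in range(2, m + n + 1):              # one wavefront per anti-diagonal
        i = np.arange(max(1, d - n), min(m, d - 1) + 1)
        j = d - i
        sub = np.where(ca[i - 1] == cb[j - 1], match, mismatch)
        H[i, j] = np.maximum.reduce([H[i - 1, j - 1] + sub,
                                     H[i - 1, j] + gap,
                                     H[i, j - 1] + gap])
    return int(H[m, n])

print(alignment_score("GATTACA", "GCATGCA"))
```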
doi: 10.1007/11945918_38
Load Balanced Block Lanczos Algorithm over GF(2) for Factorization of Large Keys
Wontae Hwang; Dongseung Kim
Researchers use the NFS (Number Field Sieve) method with the Lanczos algorithm to analyze large RSA keys. The NFS method includes the integer factorization process and null-space computation over huge sparse matrices. Parallel processing is indispensable, since sequential computation requires weeks (even months) of CPU time on supercomputers even for 150-digit RSA keys. This paper presents details of an improved block Lanczos algorithm based on previous implementations [4,10]. It includes a new load balancing scheme that partitions the matrix such that the numbers of nonzero components in the submatrices become equal. Experimentally, a speedup of up to 6 and a maximum efficiency of 0.74 have been achieved using an 8-node cluster with a Myrinet interconnect.
Keywords: parallel/cluster computing; cryptology; RSA key; load balancing; sparse matrix.
- Session VI – Applications | Pp. 375-386
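The balancing criterion stated in the abstract above, giving each node roughly the same number of nonzeros rather than the same number of rows, can be sketched as a simple greedy split over contiguous row blocks. This is only an illustration of the criterion, not the paper's partitioner.

```python
def partition_rows_by_nonzeros(row_nnz, num_nodes):
    """Assign contiguous row blocks to nodes so that each block holds roughly
    total_nonzeros / num_nodes nonzero entries."""
    total = sum(row_nnz)
    target = total / num_nodes
    blocks, start = [], 0
    acc = 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # Close the current block once it has reached its cumulative share,
        # leaving the remaining rows for the later nodes.
        if acc >= target * (len(blocks) + 1) and len(blocks) < num_nodes - 1:
            blocks.append((start, i + 1))
            start = i + 1
    blocks.append((start, len(row_nnz)))
    return blocks

# Example: a skewed sparsity pattern split across 3 nodes.
row_nnz = [1, 1, 1, 1, 20, 20, 2, 2, 2, 10]
for lo, hi in partition_rows_by_nonzeros(row_nnz, 3):
    print(f"rows {lo}:{hi} -> {sum(row_nnz[lo:hi])} nonzeros")
```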
doi: 10.1007/11945918_39
Parallel Support Graph Preconditioners
Meiqiu Wang; Vivek Sarin
Support graph preconditioning is a relatively new technique that has gained attention in recent years. Unlike incomplete factorization-based preconditioning, this is a robust technique whose performance is not affected significantly by domain characteristics such as anisotropy and inhomogeneity. A major limitation of this technique is that it is applicable to symmetric diagonally dominant M-matrices only. In this paper, we outline an extension of the technique to symmetric positive definite matrices arising from finite element discretization of elliptic problems. An added advantage of our approach is the inherent parallelism that can be exploited to develop efficient parallel preconditioners. Our method allows trade-off between the preconditioner’s parallelism and the rate of convergence of the iterative solver. In contrast, efforts to parallelize incomplete factorization-based preconditioners often result in much slower convergence. Numerical results show that our preconditioner achieves good parallel speedup on distributed memory multiprocessors such as Beowulf workstation clusters.
Keywords: Iterative methods; preconditioning; support graphs; parallel computing.
- Session VI – Applications | Pp. 387-398
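For the symmetric diagonally dominant case that the abstract above starts from, the classic support-graph construction keeps only a spanning tree of the matrix graph and factors that much sparser matrix exactly. The sketch below, assuming SciPy is available, builds such a maximum-weight spanning-tree preconditioner and uses it inside CG on a toy 2-D Laplacian; it illustrates the basic idea only, not the authors' finite element extension or their parallel construction.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.linalg import LinearOperator, cg, splu

def spanning_tree_preconditioner(A):
    """Keep a maximum-weight spanning tree of A's graph (plus A's diagonal),
    factor it exactly, and return it as a preconditioner for CG."""
    offdiag = sp.triu(A, k=1).tocsr()
    # Maximum-weight tree of |a_ij| via a minimum spanning tree on negated weights.
    tree = minimum_spanning_tree(-abs(offdiag))
    pattern = tree.astype(bool)            # pattern of the edges kept by the tree
    kept = offdiag.multiply(pattern)       # original weights on tree edges only
    B = kept + kept.T + sp.diags(A.diagonal())
    lu = splu(B.tocsc())
    return LinearOperator(A.shape, matvec=lu.solve)

# Toy SDD system: a 2-D Laplacian on a 20 x 20 grid.
n = 20
I = sp.identity(n)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsr()
b = np.ones(A.shape[0])

x, info = cg(A, b, M=spanning_tree_preconditioner(A), atol=1e-8)
print("converged" if info == 0 else f"cg returned {info}")
```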
doi: 10.1007/11945918_40
Group Based Routing in Disconnected Ad Hoc Networks
Markose Thomas; Arobinda Gupta; Srinivasan Keshav
In this paper, we propose a routing protocol for disconnected ad hoc networks where most nodes tend to move about in groups. To the best of our knowledge, no routing protocol for disconnected ad hoc networks has previously been designed with the group patterns formed by node movement in mind. Our protocol works by identifying groups using an efficient distributed group membership protocol, and then routing at the group level rather than at the node level. The protocol is designed so that existing concepts of routing in disconnected ad hoc networks can be extended to work at the group level. Initial simulations across a broad spectrum of parameters suggest that our protocol performs better in terms of delivery ratio and latency than traditional approaches like AODV [1], and also than disconnected routing approaches like the 2-Hop routing protocol [2].
Keywords: Destination Node; Delivery Ratio; Group Leader; Mobility Model; Communication Range.
- Session VII – Ad-Hoc Networks | Pp. 399-410
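A toy sketch of the first step the protocol above relies on, identifying groups of nodes that move together, here done centrally as connected components of a proximity graph via union-find. The real protocol is distributed, so this only illustrates the grouping criterion; node names, positions, and the communication range are made up.

```python
from itertools import combinations

def identify_groups(positions, comm_range):
    """Group nodes whose proximity graph (edges between nodes within
    comm_range of each other) is connected, using union-find."""
    parent = {n: n for n in positions}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for a, b in combinations(positions, 2):
        (xa, ya), (xb, yb) = positions[a], positions[b]
        if (xa - xb) ** 2 + (ya - yb) ** 2 <= comm_range ** 2:
            parent[find(a)] = find(b)       # union the two components

    groups = {}
    for n in positions:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

# Two clusters of nodes moving together, about 100 m apart, 30 m radio range.
positions = {"n1": (0, 0), "n2": (10, 5), "n3": (20, 0),
             "n4": (100, 0), "n5": (110, 5)}
print(identify_groups(positions, comm_range=30))
# -> [['n1', 'n2', 'n3'], ['n4', 'n5']]
```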