Publications catalog - books



Open Access Title

Supercomputing Frontiers

Part of: Theoretical Computer Science and General Issues

Conference: 4th Asian Conference on Supercomputing Frontiers (SCFA), Singapore, Singapore, March 26, 2018 - March 29, 2018

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

artificial intelligence; big data; cloud computing; communication; computer architecture; computer science; computer systems; data management; databases; hardware; High-Performance Computing (HPC); information management; map-reduce; processors; programming languages; semantics; wireless telecommunication systems

Availability

Detected institution | Publication year | Browse / Download / Request
Not required | 2018 | Directory of Open Access Books (open access)
Not required | 2018 | SpringerLink (open access)

Information

Resource type:

books

Print ISBN

978-3-319-69952-3

Electronic ISBN

978-3-319-69953-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Table of contents

Erratum to: Machine Learning Predictions for Underestimation of Job Runtime on HPC System

Jian Guo; Akihiro Nomura; Ryan Barton; Haoyu Zhang; Satoshi Matsuoka

Laboratory medicine, along with the airline industry, has a long history of utilising quality management systems. It took until 1999 for The Joint Accreditation Committee of the International Society for Cellular Therapy (ISCT) and the European Group for Blood and Marrow Transplantation (EBMT), known as JACIE, to be established as an accreditation system in the field of haematopoietic stem cell transplantation (HSCT). The aim was to create a standardised system of accreditation, officially recognised across Europe and based on the accreditation standards established by the US-based Foundation for the Accreditation of Cellular Therapy (FACT).

Since the concept of JACIE was originally launched, many European centres have applied for initial accreditation, with other centres gaining reaccreditation for the second or third time. Transplant units outside of Europe have also accepted the importance of the JACIE Standards, with units in South Africa, Singapore and Saudi Arabia gaining accreditation.

There is evidence that both donor and patient care have improved within the accredited centres (Passweg et al., Bone Marrow Transpl 47:906–923, 2012; Demiriz IS, Tekgunduz E, Altuntas F, What is the most appropriate source for hematopoietic stem cell transplantation? Peripheral stem cell/bone marrow/cord blood, Bone Marrow Res 2012:Article ID 834040, online). However, there is a lack of published evidence demonstrating that this improvement directly results from better nursing care. Therefore, the authors conducted a survey of nursing members of the European Blood and Marrow Transplantation Nurses Group (EBMT NG) to identify how nurses working in the area of HSCT felt that JACIE impacted on the care they delivered, and the general implications of JACIE for nurses.

Pp. E1-E2

HHVSF: A Framework to Accelerate Drug-Based High-Throughput Virtual Screening on High-Performance Computers

Pin Chen; Xin Yan; Jiahui Li; Yunfei Du; Jun Xu

The High-performance High-throughput Virtual Screening Framework (HHVSF) has been developed to accelerate High-Throughput Virtual Screening (HTVS) on high-performance computers. Task management and data management are the two core components of HHVSF. Fine-grained computing resources are configured to support serial or threaded applications. Each task fetches its input file from the database through a preemptive algorithm, and failed tasks can be detected and corrected. The NoSQL database MongoDB is used as the data repository engine. Data is moved between the RAMDISK on the computing nodes and the database. Data analysis is carried out after the computing process, and the results are stored in the database. Among the most popular molecular docking and molecular structure similarity packages, Autodock_vina (ADV) and WEGA were chosen to carry out the experiments. Results show that when ADV was used for molecular docking, 10 million molecules were screened and analyzed in 22.31 h with 16000 cores, and the throughput reached up to 1324 molecules per second, averaging 145 molecules per second during the steady-running process. For WEGA, 958 million conformations were screened and analyzed in 34.12 min with 4000 cores, with throughput reaching up to 9448 molecules per second and averaging 6430 molecules per second.

- Big Data | Pp. 3-17
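
The task-dispatch pattern described in this abstract, in which each worker atomically claims a pending task from the database so no two workers process the same input, can be sketched with MongoDB's atomic find-and-modify operation. This is a minimal sketch only; the database, collection, and field names are illustrative assumptions, not taken from HHVSF.

```python
# Minimal sketch of a preemptive task queue on MongoDB (not the actual HHVSF code).
# Each worker atomically claims one pending docking task so no two workers get the same input.
from pymongo import MongoClient, ReturnDocument

client = MongoClient("mongodb://localhost:27017")   # assumed local MongoDB instance
tasks = client["htvs"]["tasks"]                      # hypothetical database/collection names

def claim_next_task(worker_id):
    """Atomically mark one pending task as running and return it (None if the queue is empty)."""
    return tasks.find_one_and_update(
        {"status": "pending"},
        {"$set": {"status": "running", "worker": worker_id}},
        return_document=ReturnDocument.AFTER,
    )

def finish_task(task_id, result):
    """Store the analysis result and mark the task as done."""
    tasks.update_one({"_id": task_id}, {"$set": {"status": "done", "result": result}})

def requeue_failed(worker_id):
    """Return tasks from a crashed worker to the pending state so they can be retried."""
    tasks.update_many({"status": "running", "worker": worker_id},
                      {"$set": {"status": "pending"}, "$unset": {"worker": ""}})
```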

HBasechainDB – A Scalable Blockchain Framework on Hadoop Ecosystem

Manuj Subhankar Sahoo; Pallav Kumar Baruah

After the introduction of Bitcoin, blockchain has made its way into numerous applications and been adopted by various communities. A number of implementations exist today, providing platforms to carry on business with ease. However, the scalability of blockchain still remains an issue. Moreover, none of these frameworks can claim the ability to handle Big Data and support analytics, which is an important and integral facet of the current world of business. We propose HBasechainDB, a scalable blockchain-based, tamper-proof Big Data store for distributed computing. HBasechainDB adds the blockchain characteristics of immutability and decentralization to the HBase database in the Hadoop ecosystem. Linear scaling is achieved by pushing computation to the data nodes. HBasechainDB comes with the inherent property of efficient Big Data processing, as it is built on the Hadoop ecosystem. HBasechainDB also makes adoption of blockchain very easy for organizations whose business logic already exists on the Hadoop ecosystem. HBasechainDB can be used as a tamper-proof, decentralized, distributed Big Data store.

- Big Data | Pp. 18-29
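
The immutability claimed in this abstract comes from hash chaining: each block commits to the hash of its predecessor, so altering any stored record invalidates every later block. Below is a minimal, generic sketch of that idea in plain Python; it is not HBasechainDB's HBase-backed implementation.

```python
# Generic hash-chain sketch illustrating blockchain-style tamper evidence
# (illustrative only; HBasechainDB stores blocks in HBase tables, not a Python list).
import hashlib
import json

def block_hash(block_body):
    """Deterministic SHA-256 over the block contents."""
    payload = json.dumps(block_body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev, "transactions": transactions}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block

def verify_chain(chain):
    """Recompute every hash and check the links; any edit to old data breaks the chain."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["hash"] != block_hash(body):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
append_block(chain, [{"from": "a", "to": "b", "amount": 5}])
append_block(chain, [{"from": "b", "to": "c", "amount": 2}])
assert verify_chain(chain)
chain[0]["transactions"][0]["amount"] = 500   # tampering with old data...
assert not verify_chain(chain)                # ...is detected
```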

DETOUR: A Large-Scale Non-blocking Optical Data Center Fabric

Jinzhen Bao; Dezun Dong; Baokang Zhao

Optical data center networks (DCNs) are attracting growing interest due to their technical strengths compared to traditional electrical switching networks, such as effectively eliminating the potential hotspots caused by over-subscription. However, evolving traffic with high fan-out and varied patterns poses new challenges to optical DCNs. Prior solutions either struggle to support high fan-out communication at large scale or suffer from limited connectivity and low performance.

In this paper we propose DETOUR, a large-scale non-blocking optical switching data center fabric. DETOUR is composed of optical circuit switches (OCSes) connected in a 2D-Torus topology. It supports up to 729 racks and 69K+ ports, with each OCS carrying 96 wavelengths. DETOUR utilizes a broadcast-and-select mechanism and enables signals to be optically forwarded along either dimension. Moreover, it achieves non-blocking operation by recursively adjusting conflicting links between the diagonal forwarding OCSes. Our extensive evaluation results show that DETOUR delivers performance comparable to a non-blocking optical switching fabric. It achieves up to 2.14x higher throughput, and reduces flow completion times (FCT) by 34% and energy consumption by 21% compared with state-of-the-art designs.

- Big Data | Pp. 30-50
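
The headline numbers in this abstract are consistent with a 27 x 27 torus of OCSes (27^2 = 729) with 96 wavelengths per switch (729 x 96 = 69,984, i.e. 69K+ ports). The short sketch below works through that arithmetic and shows neighbor addressing in a 2D-Torus, under the assumption of a square 27 x 27 layout, which the abstract itself does not spell out.

```python
# Back-of-the-envelope check of the DETOUR scale figures, assuming a square 2D-Torus
# of 27 x 27 optical circuit switches (one per rack) with 96 wavelengths each.
SIDE = 27
WAVELENGTHS = 96

racks = SIDE * SIDE                 # 729 racks
ports = racks * WAVELENGTHS         # 69,984 ports -> "69K+"
print(racks, ports)

def torus_neighbors(x, y, side=SIDE):
    """The four neighbors of OCS (x, y) in a 2D-Torus; wrap-around gives every node degree 4."""
    return [((x + 1) % side, y), ((x - 1) % side, y),
            (x, (y + 1) % side), (x, (y - 1) % side)]

print(torus_neighbors(0, 0))        # a corner node wraps around to the opposite sides
```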

Querying Large Scientific Data Sets with Adaptable IO System ADIOS

Junmin Gu; Scott Klasky; Norbert Podhorszki; Ji Qiang; Kesheng Wu

When working with a large dataset, a relatively small fraction of the data records are of interest in each analysis operation. For example, while examining a billion-particle dataset from an accelerator model, the scientists might focus on the few thousand fastest particles, or on the particle farthest from the beam center. In general, this type of selective data access is challenging because the selected data records could be anywhere in the dataset and require a significant amount of time to locate and retrieve. In this paper, we report our experience of addressing this data access challenge with the Adaptable IO System ADIOS. More specifically, we design a query interface for ADIOS to allow arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach.

Our work relies heavily on the data processing feature of ADIOS to allow user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing, and to perform other intricate analyses while the data is in memory.

- Big Data | Pp. 51-69
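
The selective-access problem described in this abstract is commonly attacked with lightweight per-block metadata: if each data block records the minimum and maximum of a variable, whole blocks can be skipped before any record is read. The sketch below shows that general idea in plain NumPy; it is a generic illustration with made-up variable names, not the ADIOS query interface.

```python
# Generic min/max block index for range queries (illustrative; not the ADIOS API).
import numpy as np

rng = np.random.default_rng(0)
energy = rng.normal(1.0, 0.3, size=1_000_000)     # stand-in for a particle variable
BLOCK = 10_000
blocks = energy.reshape(-1, BLOCK)                 # 100 blocks of 10,000 records
block_min = blocks.min(axis=1)                     # per-block metadata, built once
block_max = blocks.max(axis=1)

def query_range(lo, hi):
    """Return values in [lo, hi], scanning only blocks whose [min, max] overlaps the range."""
    candidates = np.nonzero((block_max >= lo) & (block_min <= hi))[0]
    hits = [blocks[b][(blocks[b] >= lo) & (blocks[b] <= hi)] for b in candidates]
    return np.concatenate(hits) if hits else np.empty(0)

fast = query_range(2.0, np.inf)                    # e.g. "the fastest particles"
print(len(fast), "records found after scanning",
      np.count_nonzero(block_max >= 2.0), "of", len(blocks), "blocks")
```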

On the Performance of Spark on HPC Systems: Towards a Complete Picture

Orcun Yildiz; Shadi Ibrahim

Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been increasingly used by many companies and research labs to facilitate large-scale data analysis. However, with the growing needs of users and the growing size of data, commodity-based infrastructure will strain under the heavy weight of Big Data. On the other hand, HPC systems offer a rich set of opportunities for Big Data processing. As first steps toward Big Data processing on HPC systems, several research efforts have been devoted to understanding the performance of Big Data applications on these systems. Yet HPC-specific performance considerations have not been fully investigated. In this work, we conduct an experimental campaign to provide a clearer understanding of the performance of Spark, the in-memory data processing framework, on HPC systems. We ran Spark using representative Big Data workloads on the Grid’5000 testbed to evaluate how latency, contention, and the file system configuration can influence application performance. We discuss the implications of our findings and draw attention to new ways (e.g., burst buffers) to improve the performance of Spark on HPC systems.

- Big Data | Pp. 70-89

Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Peng Cheng; Yutong Lu; Yunfei Du; Zhiguang Chen

With the rapid development of Big Data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support Big Data analytics workloads. However, due to the different workload characteristics and optimization objectives of the system architectures, migrating data-intensive applications to HPC systems that are geared towards traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware. We characterize the impact of the storage architecture on data-intensive MapReduce workloads when using Lustre as the underlying file system. Based on our characterization and findings of the performance behaviors, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the impact of the metadata bottleneck. The evaluation of these optimization techniques shows up to a 17% performance benefit for data-intensive workloads.

- Big Data | Pp. 90-106
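
A file metadata cache layer of the kind proposed in this abstract works by answering repeated metadata lookups (size, existence, modification time) from memory instead of hitting the parallel file system's metadata service every time. The following is a minimal, generic sketch of that idea; the class, its TTL policy, and the use of os.stat are illustrative assumptions, not the paper's middleware.

```python
# Generic file-metadata cache sketch (illustrative; not the Tianhe-2 middleware).
# Repeated stat() calls for the same path are served from memory, easing pressure
# on the file system's metadata service.
import os
import time

class MetadataCache:
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._cache = {}                       # path -> (timestamp, os.stat_result)

    def stat(self, path):
        now = time.monotonic()
        entry = self._cache.get(path)
        if entry and now - entry[0] < self.ttl:
            return entry[1]                    # cache hit: no metadata round trip
        result = os.stat(path)                 # cache miss: fall through to the file system
        self._cache[path] = (now, result)
        return result

    def invalidate(self, path):
        """Drop a stale entry, e.g. after the file has been rewritten by a reducer."""
        self._cache.pop(path, None)

cache = MetadataCache()
size = cache.stat(__file__).st_size           # first call hits the file system
size_again = cache.stat(__file__).st_size     # second call is served from the cache
```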

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use

Kazuaki Matsumura; Mitsuhisa Sato; Taisuke Boku; Artur Podobas; Satoshi Matsuoka

Graphics Processing Units (GPUs) perform the majority of computations in state-of-the-art supercomputers. Programming these GPUs is often assisted by a programming model such as (amongst others) the directive-driven OpenACC. Unfortunately, OpenACC (and other similar models) is incapable of automatically targeting and distributing work across several GPUs, which decreases productivity and forces needless manual labor upon programmers. We propose a method that enables OpenACC applications to target multiple GPUs. Workload distribution, data transfer and inter-GPU communication (including modern GPU-to-GPU links) are automatically and transparently handled by our compiler with no user intervention and no changes to the program code. Our method leverages existing OpenMP and OpenACC backends, ensuring easy integration into existing HPC infrastructure. Empirically, we quantify the performance gains and losses of our data coherence method compared to similar approaches, and also show that our approach can compete with the performance of hand-written MPI code.

- GPU/FPGA | Pp. 109-127
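
The kind of transformation this abstract describes boils down to splitting a parallel loop's iteration space across devices and keeping the boundary (halo) regions coherent between them. The NumPy sketch below mimics one update step of that scheme on plain arrays standing in for per-GPU buffers; it is a conceptual illustration of the distribution idea only, not MACC's generated code or the OpenACC API.

```python
# Conceptual sketch of splitting a 1D stencil across N "devices" with halo cells
# (NumPy arrays stand in for per-GPU buffers; not MACC output or OpenACC code).
import numpy as np

def split_with_halos(x, n_dev):
    """Partition x into n_dev chunks, each padded with one halo cell on either side."""
    bounds = np.linspace(0, len(x), n_dev + 1, dtype=int)
    chunks = [x[max(lo - 1, 0):min(hi + 1, len(x))].copy()
              for lo, hi in zip(bounds, bounds[1:])]
    return chunks, bounds

def stencil_step(x):
    """3-point average on interior points; the two end points stay fixed."""
    y = x.copy()
    y[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3.0
    return y

x = np.random.default_rng(1).random(1000)
chunks, bounds = split_with_halos(x, n_dev=4)
chunks = [stencil_step(c) for c in chunks]           # each "device" updates its own chunk

# Reassemble the interior points, dropping the halo cell each chunk carried.
parts = []
for i, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
    left_halo = 1 if lo > 0 else 0
    parts.append(chunks[i][left_halo:left_halo + (hi - lo)])
result = np.concatenate(parts)

assert np.allclose(result, stencil_step(x))          # matches the single-device result
```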

Acceleration of Wind Simulation Using Locally Mesh-Refined Lattice Boltzmann Method on GPU-Rich Supercomputers

Naoyuki Onodera; Yasuhiro Idomura

A real-time simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Since airflows in large cities are turbulent, with Reynolds numbers of several million, large-scale CFD simulations are needed. We developed a CFD code based on the adaptive mesh-refined Lattice Boltzmann Method (AMR-LBM). The AMR method places fine grids only in the regions where they are needed, so that a high-resolution analysis can be realized while still covering the global simulation area. The code was developed on the GPU-rich supercomputer TSUBAME3.0 at Tokyo Tech, and the GPU kernel functions were tuned to achieve high performance on the Pascal GPU architecture. The code is validated against a wind tunnel experiment released by the National Institute of Advanced Industrial Science and Technology in Japan. Thanks to the AMR method, the total number of grid points is reduced to less than 10% of that of a fine uniform grid system. Weak scaling performance from 1 node to 36 nodes is examined. The GPUs (NVIDIA TESLA P100) achieved more than 10 times higher per-node performance than the CPUs (Broadwell).

- GPU/FPGA | Pp. 128-145
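
For readers unfamiliar with the method, the core of a lattice Boltzmann solver is a collide-and-stream update of particle distribution functions on a fixed lattice. The NumPy sketch below shows one BGK collision and streaming step on a uniform D2Q9 lattice; it is a textbook illustration only, not the authors' adaptive-mesh or GPU implementation.

```python
# Textbook D2Q9 lattice Boltzmann step (uniform grid, BGK collision, periodic streaming).
# Illustrative only; the paper's AMR-LBM code refines the mesh locally and runs on GPUs.
import numpy as np

c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])          # D2Q9 lattice velocities
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)                 # corresponding weights

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    rho = f.sum(axis=0)                                      # macroscopic density
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho         # macroscopic velocity
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f = f + (equilibrium(rho, ux, uy) - f) / tau             # BGK collision
    for i in range(9):                                       # streaming with periodic wrap
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    return f

nx = ny = 64
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
for _ in range(100):
    f = lbm_step(f)
print(f.sum())                                               # total mass is conserved
```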

Architecture of an FPGA-Based Heterogeneous System for Code-Search Problems

Yuki Hiradate; Hasitha Muthumala Waidyasooriya; Masanori Hariyama; Masaaki Harada

Code search problems refer to searching for a particular bit pattern that satisfies given constraints. Obtaining such codes is very important in fields such as data encoding, error correction, cryptography, etc. Unfortunately, the search time increases exponentially with the number of bits in the code, and finding large codes typically requires many months of computation. On the other hand, the search method consists mostly of 1-bit computations, so reconfigurable hardware such as FPGAs (field programmable gate arrays) can be used to obtain a massive degree of parallelism. In this paper, we propose a heterogeneous system with a CPU and an FPGA to speed up code search problems. According to the evaluation, we obtain over an 86 times speed-up compared to a typical CPU-based implementation for the extremal doubly even self-dual code search problem of length 128.

- GPU/FPGA | Pp. 146-155
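
The constraints behind the length-128 search mentioned in this abstract are checked with exactly the kind of 1-bit arithmetic that maps well to FPGAs: parities of inner products and Hamming weights modulo 4. The sketch below verifies the doubly even and self-dual properties for a small, well-known example, the [8, 4] extended Hamming code; it illustrates the constraint checks only, not the authors' search algorithm or hardware design.

```python
# Bit-level checks for "doubly even" (all codeword weights divisible by 4) and
# "self-dual" (the code equals its dual) on a small example: the [8, 4] extended
# Hamming code. Illustrative only; the paper searches length-128 codes on an FPGA.
from itertools import product

# Generator matrix rows as 8-bit integers (one bit per coordinate).
G = [0b10000111,
     0b01001011,
     0b00101101,
     0b00011110]
N = 8

def weight(x):
    return bin(x).count("1")                   # Hamming weight via popcount

def codewords(gen):
    """All 2^k codewords: XOR (GF(2) sum) of every subset of generator rows."""
    words = []
    for coeffs in product([0, 1], repeat=len(gen)):
        cw = 0
        for bit, row in zip(coeffs, gen):
            if bit:
                cw ^= row
        words.append(cw)
    return words

# Doubly even: every codeword weight is a multiple of 4.
assert all(weight(cw) % 4 == 0 for cw in codewords(G))

# Self-dual: the code is self-orthogonal (all pairwise inner products are 0 mod 2)
# and its dimension k equals n/2.
assert all(weight(a & b) % 2 == 0 for a in G for b in G)
assert len(G) == N // 2
print("The [8, 4] extended Hamming code is doubly even and self-dual.")
```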