Extreme Big Data Examples

Presentation transcript:

Extreme Big Data Examples
Hitoshi Sato (Artificial Intelligence Research Center, AIST)
Performance Evaluation of Big Data Processing Infrastructure for Future Large-scale Many-core Processor Environments

Extreme Big Data Examples
- Large-scale graphs. Graph500: a benchmark for big data processing on supercomputers. BFS in 0.45 sec on a Kronecker graph with 1 trillion vertices and 16 trillion edges, using 82,944 nodes (663,552 processes).
- AI / machine learning. Deep learning training: 96 GPUs are used for a single DNN model on the ImageNet dataset with a node-distributed asynchronous machine learning implementation; 1,146 GPUs are used in total for the training, reaching 1 TFLOPS (single precision) per GPU for the cost-derivative parts.
- AIST AI infrastructure: AAIC AI cloud (~8.4 P AI-Flops), ABCI AI-IDC (130 P AI-Flops and beyond).

Towards HPC and Big Data Convergence
Motivation: how to utilize many-core environments to accelerate extreme big data applications such as large-scale graphs and AI.
- How to overcome memory capacity limitations?
- How to offload bandwidth-oblivious operations onto low-throughput memory volumes?
Trends:
- Available memory capacity per core is dropping, as efficient bandwidth is achieved by increasing the parallelism, heterogeneity, and density of processors (e.g., multi-core CPUs, many-core accelerators) toward the post-Moore era.
- Memory/storage architectures are deepening with NVM (non-volatile memory) and SCM (storage class memory) technologies, e.g., Flash, MCDRAM, PCM, STT-MRAM, ReRAM, HMC, etc.
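Graph500 ranks systems by TEPS (traversed edges per second), i.e., graph edges divided by BFS time. A quick back-of-the-envelope check using the slide's rounded numbers (the official metric counts the edges of the input graph, so this is an approximation):

```python
# Graph500 performance metric: TEPS = edges / bfs_time.
# Rough check with the slide's rounded figures.
edges = 16e12      # ~16 trillion edges in the Kronecker graph
bfs_time = 0.45    # ~0.45 s per BFS
teps = edges / bfs_time
print(f"~{teps / 1e12:.1f} TeraTEPS")  # ~35.6 TeraTEPS
```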
Out-of-core BFS for Graph500 / Scalable Large-scale Sorting
- The effectiveness of large-scale sorting on supercomputers with massive numbers of throughput processors remains unclear: which operations are accelerated, and which bottlenecked, by GPUs in large-scale sorting?
- Handling data overflow on hierarchical memory volumes may also be required.
- Approach: based on sample sorting, offloading the most time-consuming local sort onto GPUs, alongside the network communications.
- Open questions: how to apply similar techniques to many-core environments; performance improvement from sorting, prefix-sum, and other data-parallel operations.

GPU MapReduce for Scalable Supercomputers
- A software framework for large-scale supercomputers with many-core accelerators and burst-buffer NVMs.
- Abstraction for the deep memory hierarchy (GPU, CPU, NVM); object-oriented (C++); out-of-core GPU data management; optimized data formats for many-core accelerators.
- Weak scaling on K20X-based TSUBAME2.5: a MapReduce-based PageRank application on an RMAT graph (Graph500) reaches 2.81 giga edges/sec on 3,072 GPUs (17.2 bn vertices, 274 bn edges), where the graph exceeds the total GPU memory capacity; 2.10x speedup (3 GPUs vs. 2 CPUs).
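The sample-sort pipeline above can be sketched on a single node as follows. This is a minimal illustration, not the authors' implementation: the function name, `oversample` parameter, and splitter-selection details are assumptions. In the distributed version, each bucket would live on a different node after an all-to-all exchange, and the per-bucket local sort is the time-consuming step offloaded to GPUs.

```python
import bisect
import random

def sample_sort(data, n_parts, oversample=8):
    """Single-node sketch of sample sort (illustrative, hypothetical API)."""
    # 1. Sample the input and pick n_parts - 1 splitters.
    sample = sorted(random.choices(data, k=n_parts * oversample))
    splitters = sample[oversample::oversample][: n_parts - 1]
    # 2. Partition: bucket index = number of splitters <= element
    #    (stands in for the distributed all-to-all key exchange).
    buckets = [[] for _ in range(n_parts)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # 3. Local sort of each bucket -- the step offloaded to GPUs
    #    in the slide's approach; here a plain CPU sort stands in.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because every element in bucket i is no larger than any element in bucket i+1, concatenating the locally sorted buckets yields a globally sorted sequence, which is what lets the expensive local sorts run independently on the accelerators.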