HPC User Forum 2012 Panel on Potential Disruptive Technologies Emerging Parallel Programming Approaches Guang R. Gao Founder ET International.


1 HPC User Forum 2012 Panel on Potential Disruptive Technologies Emerging Parallel Programming Approaches Guang R. Gao Founder ET International Inc Newark, Delaware USA

2 Who is ETI?
From the “Cool Vendors” report by Gartner (April 17, 2012):
[ET International, Newark, Delaware. Analysis by Carl Claunch.
Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively. …]

3 Motivation
Many-core is coming
  Current paradigms don't have the expressive power to harness concurrency
Hardware is getting more heterogeneous
  Current hybrid programming techniques (OpenMP+MPI+OpenCL) are not maintainable: too complicated
Caches are disappearing or becoming non-coherent
  Distributed memory is everywhere, and at different levels
Fine-grained power management
  Use what you need and turn off/down the rest
Failure is the norm
  Resilience must be baked into the whole stack (application, compiler, runtime, hardware)
Increasing application computation/data irregularity
  Static scheduling can no longer properly load balance
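The last point on this slide, that static scheduling cannot load balance irregular work, can be illustrated with a small sketch. It is not taken from the talk: the task costs, the worker count, and the function names `static_makespan` and `dynamic_makespan` are assumptions chosen for illustration. It compares an up-front block partition with a scheme in which each idle worker pulls the next task.

```python
import heapq

def static_makespan(costs, workers):
    """Block-partition tasks up front; return the finish time of the slowest worker."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def dynamic_makespan(costs, workers):
    """Whichever worker becomes idle first takes the next task (greedy list scheduling)."""
    finish = [0] * workers  # min-heap of the times at which each worker is free
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)

# Hypothetical irregular workload: four expensive tasks clustered at the front.
costs = [8, 8, 8, 8] + [1] * 12

print(static_makespan(costs, 4))   # one worker receives all four expensive tasks
print(dynamic_makespan(costs, 4))  # expensive tasks spread across the workers
```

With this made-up workload the static partition finishes in 32 time units and the dynamic one in 11, which equals the total work (44) divided by four workers. Real runtimes also pay scheduling overhead, which this sketch ignores.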

4 ETI Vision
We need new “Execution Models”!
Leverage ETI’s deep and growing IP position based on 25+ years of applied R&D expertise and $20M+ in R&D software engineering and development (e.g. extensive system software base for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, Adapteva, etc.)
Provide high-performance SWARM software solutions to our OEMs, partners and direct customers
Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including:
  Big Compute (Private HPC Cloud) systems
  Big Data HPC systems
  HPC embedded appliances
  etc.

5 Execution Paradigm Comparisons
[Figure: two timelines, one for MPI, OpenMP, OpenCL and one for SWARM, showing active threads versus waiting over time.]
MPI, OpenMP, OpenCL: Communicating Sequential Processes; Bulk Synchronous; Message Passing
SWARM: Asynchronous Event-Driven Tasks; Dependencies; Resources; Active Messages; Control Migration

6 SWARM Execution Overview
[Diagram: SWARM execution cycle. Start at Enabled Tasks and work clockwise.]
Task states: Tasks with Unsatisfied Dependencies; Enabled Tasks
Resource states: Available Resources; Resources in Use (CPU, GPU)
Transitions: Dependencies satisfied; Tasks enabled; Tasks mapped to resources; Resources allocated; Resources released
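The cycle on this slide (dependencies satisfied, tasks enabled, tasks mapped to resources, resources released) can be sketched as a dependency-counting scheduler. This is a minimal illustration, not the SWARM API: the function `run_tasks`, the task names, and the resource names are hypothetical, and execution is simplified to synchronous rounds, whereas the slides describe SWARM as asynchronous and event-driven. Every prerequisite is assumed to appear as a key in `deps`.

```python
from collections import deque

def run_tasks(deps, resources):
    """Run a task graph in rounds.

    deps maps each task to the set of tasks it depends on.
    Returns a list of rounds; each round is a list of (task, resource) pairs.
    """
    remaining = {task: len(prereqs) for task, prereqs in deps.items()}
    dependents = {task: [] for task in deps}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    # Tasks with no unsatisfied dependencies start out enabled.
    enabled = deque(sorted(t for t, n in remaining.items() if n == 0))
    schedule = []
    while enabled:
        # Map enabled tasks onto the available resources (one task each).
        batch = [(enabled.popleft(), r) for r in resources[:len(enabled)]]
        schedule.append(batch)
        # Tasks finish and release their resources; each completion
        # satisfies one dependency of every dependent task.
        for task, _ in batch:
            for child in dependents[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    enabled.append(child)  # child is now enabled
    return schedule

# Hypothetical diamond-shaped task graph and two resources.
deps = {"load": set(), "fft": {"load"}, "filter": {"load"}, "merge": {"fft", "filter"}}
for round_no, batch in enumerate(run_tasks(deps, ["cpu0", "gpu0"])):
    print(round_no, batch)
```

In this example `fft` and `filter` become enabled together once `load` completes and run side by side on the two resources; `merge` is enabled only after both have finished.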

7 Case Studies of Fine-Grain Execution Models
Static Dataflow Model (1970s - )
EARTH Model ( )
TNT Model and Cyclops-64 ( )
Codelet Model under Intel-led DARPA/UHPC

8 DARPA/Intel Runnemede Program
[Diagram: program overview. Goal: 1000X energy reduction. Courtesy of The Intel DARPA UHPC Team.]
Diagram labels:
  Heterogeneous, Tightly-Coupled Simple Architecture
  System Management & Concurrency; Assured Operation
  Event driven codelets; Self-aware introspection; Code and data motion
  <10% overhead; Checkpoint with Flash/CPM; Security Through Sandboxing
  CPU; Resiliency; Execution Model; HW/SW Co-Design; Interconnect Fabric
  Productivity; Application Efficiency; Data Movement
  Model-based; Goal-oriented; Self-morphing
  Heterogeneous & tapered; Large local memory
  Memory: Overhauled DRAM µArch; Resilient memory
Our Collaborators: ET International, Inc.; University of Illinois

9 Progress & Proof Points To-Date

10 Barnes-Hut SWARM vs OpenMP
[Chart: Barnes-Hut. Series: Ideal, SWARM, OpenMP.]

11 SWARM/MPI Performance Comparison
Consistent speed-up from 2X to 14.5X

12 Cholesky Decomposition (SWARM vs MKL/ScaLAPACK)

13 Summary and Acknowledgements
Summary (productivity observation)
  N-Body: 1 man-day, 3X
  G-500: man-month, up to 14X
  Cholesky: 2 man-weeks, 1.5X
  NOTE: the baseline is the performance of optimized code
Acknowledgements
  Our Sponsors
  Our Collaborators and Colleagues
  My Host
  Others

14 Cholesky Profiles
[Figure: execution profiles. Panels: SWARM, OpenMP.]

