Battle of the Accelerator Stars Pavan Balaji Computer Scientist Group lead, Programming Models and Runtime Systems Argonne National Laboratory


1 Battle of the Accelerator Stars
Pavan Balaji
Computer Scientist
Group lead, Programming Models and Runtime Systems
Argonne National Laboratory
balaji@mcs.anl.gov
http://www.mcs.anl.gov/~balaji

2 Accelerators != GPUs
Pavan Balaji, Argonne National Laboratory. P2S2 Workshop Panel (09/10/2012)
• Anything that is built to speed up a specific type of computation (i.e., not general-purpose computation) is an accelerator
  – A vector instruction unit is an accelerator
  – The double and quad floating-point units on BG/P and BG/Q are accelerators
  – An H.264 media decoder sitting on a processor die is an accelerator
  – There is no such thing as a “general purpose” accelerator
• GPUs are one form of accelerator, but not the only one

3 Divergence in Accelerator Computing?
• Divergence == Increasing difference
• No. There are a lot of different models of accelerator computing today
  – NVIDIA/AMD GPUs, FPGAs, AMD’s fused chip architectures, Intel MIC architecture, Intel Xeon, Blue Gene (yes, they are accelerators too)
  – Broadly classified into:
    · Decoupled processing/Decoupled memory (think GPUs)
    · Coupled processing/Decoupled memory (think AMD Fusion)
    · Coupled processing/Coupled memory (think Intel Xeon/MIC, BG/Q)
• But the trend is not towards increasing difference, but rather towards convergence
  – Vendors and researchers are trying out different options to see what works and what does not
You have to kiss many frogs before you can find your prince!

4 Who will be the last man standing?
• GPUs: Decoupled processing, Decoupled memory
• Fused Processors (e.g., AMD Fusion): Coupled processing, Decoupled memory
• General Purpose Processors with Accelerator Extensions (e.g., Xeon, MIC, BG/P, BG/Q): Coupled processing, Coupled memory
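The three contenders above can be written down as a small table keyed on the two coupling axes. The following is only an illustrative sketch: the class name `AcceleratorClass`, the field names, and the rule that data staging is needed exactly when memory is decoupled are assumptions made for this example (the rule is read off the pros and cons on the later slides), not something defined in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorClass:
    """One row of the taxonomy: how processing and memory are coupled."""
    name: str
    coupled_processing: bool
    coupled_memory: bool
    examples: tuple

    @property
    def needs_data_staging(self) -> bool:
        # Assumption for this sketch: staging is required whenever the
        # accelerator has its own (decoupled) memory.
        return not self.coupled_memory


TAXONOMY = (
    AcceleratorClass("GPUs", False, False, ("NVIDIA/AMD GPUs",)),
    AcceleratorClass("Fused processors", True, False, ("AMD Fusion",)),
    AcceleratorClass(
        "General-purpose processors with accelerator extensions",
        True, True, ("Xeon", "MIC", "BG/P", "BG/Q"),
    ),
)


def staging_free(taxonomy=TAXONOMY):
    """Names of the classes that avoid data staging under the rule above."""
    return [c.name for c in taxonomy if not c.needs_data_staging]


if __name__ == "__main__":
    for c in TAXONOMY:
        print(f"{c.name}: staging needed = {c.needs_data_staging}")
```

Under this rule only the coupled-processing/coupled-memory class avoids staging.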

5 Quantum mechanical interactions are near-sighted (Walter Kohn)
[Figure: range of interactions between particles; axes: distance vs. interaction strength]
• Traditional quantum chemistry studies lie within the nearsighted range, where interactions are dense
• Future quantum chemistry studies expose both short- and long-range interactions
Note that the figures are phenomenological. Quantum chemistry methods treat correlation using a variety of approaches and have different short-/long-range cutoffs.
Courtesy Jeff Hammond, Argonne National Laboratory

6 Wind Turbine and Flight Blade Designs
• Blades are getting larger with every new design
  – With larger blades, the additional lift or torque generated comes from the outer regions of the blade
  – Air flow from the far-out regions of the blade has lower computational intensity, making the computation more “sparse”

7 Decoupled Processing/Decoupled Memory (GPUs)
• Pros:
  – A separate processing unit can be custom built for acceleration
  – Faster memory; better designed memory and memory controllers for acceleration
• Cons:
  – Decoupled from the main processing unit
[Figure: regular CPU cores (control unit, ALU, cache, DRAM) beside GPU cores; verdict graphic]

8 Coupled Processing/Decoupled Memory
• Pros:
  – Improved coupling of the processing units and memory allows for much faster synchronization
  – Separate memory allows for better optimized memory and memory controllers
• Cons:
  – The need for data staging does not disappear
[Figure: two layouts of CPU and GPU with separate CPU memory and GPU memory; verdict graphic]

9 General Purpose Processors with Accelerator Extensions
• Pros:
  – Very fine-grained synchronization (no memory synchronization required; processing synchronization still needed for power constraints)
• Cons:
  – Unified memory means that specialization is not possible (either in memory or in memory controllers)
  – Single-die memory constraints
[Figure: example architectures (Intel: MIC; IBM: BG/Q; Tilera: GX; Godson T; Intel: SCC; Dally: Echelon; Chien: 10x10), labeled “Power Constrained Memory Consistency” and “Extreme Specialization and Power Management”]

10 Towards On-chip Instruction-level Heterogeneity
• Vector units were a form of instruction-level heterogeneity
  – Some instructions use vector hardware, some don’t
  – Vector instruction units processed the same data that other units processed
• Synchronization requirements
  – No memory staging requirements
  – Theoretically, accelerator units can fit into the same instruction pipeline as general-purpose processing
• But there are some practical constraints
  – The amount of acceleration is so high that not all hardware can be turned on at the same time (dark silicon with power gating will lead the way)
    · So synchronization is not absent, but much more fine-grained (10s of cycles)
  – Compilers (with help from users: OpenMP, OpenACC) will have to do some work to coalesce hardware power-gating
[Verdict graphic]
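To make "coalescing hardware power-gating" concrete, here is a toy sketch, not a description of any real compiler pass. It assumes a hypothetical model in which only one functional unit is powered at a time, every switch between units costs one power-gate transition, and all instructions are mutually independent so they may be reordered freely. The unit names and both functions are invented for this illustration.

```python
from itertools import groupby


def count_power_transitions(schedule):
    """Count how often the active unit changes along an instruction schedule.

    `schedule` is a list of unit names, one per instruction, in issue order.
    """
    runs = sum(1 for _ in groupby(schedule))  # consecutive runs of one unit
    return max(runs - 1, 0)


def coalesce(schedule):
    """Group instructions by unit, keeping units in first-appearance order.

    Assumption: instructions are independent, so reordering is always legal.
    A real compiler would have to respect data dependences.
    """
    first_seen = {}
    for unit in schedule:
        first_seen.setdefault(unit, len(first_seen))
    # sorted() is stable, so instructions of one unit keep their relative order
    return sorted(schedule, key=lambda unit: first_seen[unit])


if __name__ == "__main__":
    interleaved = ["scalar", "vector", "scalar", "vector", "scalar", "vector"]
    print(count_power_transitions(interleaved))            # 5 switches
    print(count_power_transitions(coalesce(interleaved)))  # 1 switch
```

In this toy model the interleaved schedule toggles units five times, while the coalesced one toggles once. That reduction is the kind of saving the slide expects compilers to pursue, with hints from the user.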

11 Summary
• Accelerators are of different kinds; GPUs are just one example
• Decoupled memory accelerators do not have much of a chance to survive because of data staging requirements
  – Fundamentally ill-suited for sparse/fine-grained computations
  – Caveat: LINPACK is not a fine-grained computation, so the Top500 might still boast a GPU-like machine
• Fine-grained instruction-level heterogeneity is required
  – Many architectures are already going in that direction
  – BG/Q and Intel MIC’s planned roadmap are in that direction
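The staging argument and the LINPACK caveat can be illustrated with a back-of-the-envelope model, sketched below under stated assumptions. The peak rate `ACCEL_FLOPS` and link bandwidth `LINK_BW` are hypothetical round numbers, not measurements of any machine. The model also assumes that all operands cross the link once, that compute runs at peak, that transfer and compute do not overlap, and that sparse column indices take 4 bytes.

```python
# Hypothetical machine parameters (assumptions for illustration only)
ACCEL_FLOPS = 1e12   # accelerator peak, flop/s
LINK_BW = 8e9        # host <-> accelerator link, bytes/s


def staging_fraction(flops, bytes_moved,
                     flops_per_s=ACCEL_FLOPS, link_bytes_per_s=LINK_BW):
    """Fraction of total time spent staging data, with no overlap assumed."""
    compute_s = flops / flops_per_s
    staging_s = bytes_moved / link_bytes_per_s
    return staging_s / (staging_s + compute_s)


def dense_matmul(n, word=8):
    """C = A * B for n x n matrices: 2*n^3 flops, three matrices on the link."""
    return 2 * n**3, 3 * n * n * word


def sparse_matvec(nnz, n, word=8, index=4):
    """y = A * x with nnz nonzeros: 2*nnz flops.

    Values and column indices of A, plus the vectors x and y, cross the link.
    """
    return 2 * nnz, nnz * (word + index) + 2 * n * word


if __name__ == "__main__":
    dense = staging_fraction(*dense_matmul(8192))
    sparse = staging_fraction(*sparse_matvec(nnz=10_000_000, n=1_000_000))
    print(f"dense matmul:  {dense:.1%} of time spent staging")
    print(f"sparse matvec: {sparse:.1%} of time spent staging")
```

With these assumed numbers the dense kernel spends a minority of its time on staging and the share shrinks as n grows, while the sparse kernel is dominated by staging. This matches the slide's point that a LINPACK-style benchmark can hide the cost that hurts sparse, fine-grained workloads.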

