Designing a CCA Framework for Accelerators
Khushbu Agarwal, Daniel Chavarría, Manoj Krishnan, Ian Gorton


1 Designing a CCA Framework for Accelerators
Khushbu Agarwal, Daniel Chavarría, Manoj Krishnan, Ian Gorton

2 Motivation
- Emerging hybrid applications & architectures:
  - Many codes & kernels being ported to hybrid CPU/GPU clusters
  - “Production”-level applications not yet (?)
- Other accelerated systems being deployed/developed:
  - FPGA-accelerated systems
  - Traditional Convey systems
  - Cray XMT system: x86 front-end nodes + multithreaded compute nodes

3 Motivation (cont.)

4 Motivating Applications
- Molecular dynamics:
  - GPU-based kernels being developed at PNNL
  - Evolving into a more sophisticated/complete hybrid MD application
- CCSD(T):
  - GPU-based kernels also being developed
- Cray XMT-based power grid contingency analysis application:
  - Dynamic power flow-based topology analysis on multithreaded processors to determine non-critical elements
  - Numerical solution of critical contingency cases on an x86-based cluster

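5 Motivating Applications (cont.)
[Figure: “N-x” contingency analysis. Dynamic topology analysis identifies non-critical elements, reducing the $C_N^x$ candidate cases to $C_{N_b}^x$. For the Western Power Grid at “N-5”: $C_{17000}^{5} \approx 10^{20}$.]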
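To make the scale of the “N-5” case concrete, the binomial count can be evaluated directly. The sketch below uses the 17000 elements and x = 5 from the slide; the figure of 2000 remaining critical elements is purely hypothetical and only illustrates why pruning non-critical elements matters. The exact value comes out at about 1.2 × 10^19, somewhat below the slide's rounded 10^20, and either way far beyond exhaustive enumeration.

```python
import math

# Values taken from the slide: "N-5" on a grid of roughly 17000 elements.
x = 5
n_elements = 17000

# Exact number of distinct x-element outage combinations, C(n, x).
cases = math.comb(n_elements, x)
print(f"C({n_elements}, {x}) = {cases:.3e}")

# Hypothetical number, NOT from the slides: suppose topology analysis
# leaves only 2000 critical elements to combine.
n_critical = 2000
reduced = math.comb(n_critical, x)
print(f"C({n_critical}, {x}) = {reduced:.3e}")
print(f"reduction factor ~ {cases / reduced:.1e}")
```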

6 Objective
Provide a device-independent interface for accessing accelerators and executing computations on them, e.g.:
- Graphics processing units (GPUs)
- Field-programmable gate arrays (FPGAs)
- Cray XMT

7 Proposed SIDL Framework
- Initialize(): initializes the connection between the accelerator and the local host.
- opaque AllocMem(): allocates memory on the accelerator and returns a pointer to it.
- opaque ReadMem(opaque m): synchronous; reads data from the accelerator and copies it to a local buffer.
- WriteMem(opaque m, opaque data): synchronous; copies data from a local buffer to accelerator memory.
- Exec(): synchronous; executes the computation kernel on the accelerator.
- int AsyncExec(): asynchronous; executes the computation kernel on the accelerator and returns a handle.
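The synchronous calls above could be used roughly as follows. This is a minimal Python sketch rather than SIDL or the authors' implementation: the class names, the `nbytes` argument to `alloc_mem`, and the idea of binding the kernel at construction time are all assumptions, since the slides do not say how the size or the kernel is specified. The "accelerator" here is simulated in host memory.

```python
from abc import ABC, abstractmethod


class Accelerator(ABC):
    """Device-independent interface mirroring the synchronous slide methods."""

    @abstractmethod
    def initialize(self) -> None: ...

    @abstractmethod
    def alloc_mem(self, nbytes: int) -> int: ...  # nbytes is an assumption

    @abstractmethod
    def write_mem(self, m: int, data: bytes) -> None: ...

    @abstractmethod
    def read_mem(self, m: int) -> bytes: ...

    @abstractmethod
    def execute(self) -> None: ...


class HostSimAccelerator(Accelerator):
    """Hypothetical stand-in backend: 'device memory' is a dict on the host."""

    def __init__(self, kernel):
        self._kernel = kernel  # callable applied in place to every buffer
        self._mem = {}
        self._next = 1
        self._ready = False

    def initialize(self):
        self._ready = True

    def alloc_mem(self, nbytes):
        if not self._ready:
            raise RuntimeError("call initialize() first")
        m = self._next
        self._next += 1
        self._mem[m] = bytearray(nbytes)
        return m  # opaque handle standing in for a device pointer

    def write_mem(self, m, data):
        if len(data) > len(self._mem[m]):
            raise ValueError("data larger than allocation")
        self._mem[m][:len(data)] = data

    def read_mem(self, m):
        return bytes(self._mem[m])

    def execute(self):
        for buf in self._mem.values():
            self._kernel(buf)


def add_one(buf):
    # Toy kernel: increment every byte, wrapping at 256.
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


acc = HostSimAccelerator(add_one)
acc.initialize()
m = acc.alloc_mem(4)
acc.write_mem(m, bytes([1, 2, 3, 4]))
acc.execute()
result = acc.read_mem(m)
print(list(result))
```

Client code depends only on `Accelerator`, so a GPU, FPGA, or Cray XMT backend could in principle be swapped in without changing the calling sequence.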

8 Proposed SIDL Framework (cont.)
- AsyncWriteMem(opaque m, opaque data): asynchronous; issues a request to copy data from a local buffer to accelerator memory.
- opaque AsyncReadMem(opaque m): asynchronous; issues a request to read data from the accelerator and copy it to a local buffer.
- Wait(): waits for all asynchronous data transfer operations to complete.
- WaitExec(int handle): waits for the asynchronous execution identified by the handle to complete.
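One way to picture the asynchronous calls is a host-side simulation built on a thread pool. Everything in this Python sketch is an assumption made for illustration: the class name, the single-worker pool (which forces operations to run in issue order, something the slides do not specify), and the toy kernel. It only shows how `Wait()` covers data transfers while `WaitExec(handle)` covers a specific kernel launch.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncHostSim:
    """Hypothetical host-side simulation of the asynchronous calls."""

    def __init__(self, kernel):
        self._kernel = kernel
        self._mem = {}
        # One worker keeps operations in issue order (an assumption).
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._transfers = []  # pending async read/write futures
        self._execs = {}      # handle -> future for async_exec
        self._next_handle = 0

    def alloc_mem(self, nbytes):
        m = len(self._mem) + 1
        self._mem[m] = bytearray(nbytes)
        return m

    def async_write_mem(self, m, data):
        self._transfers.append(self._pool.submit(self._copy_in, m, bytes(data)))

    def async_read_mem(self, m):
        out = bytearray(len(self._mem[m]))
        self._transfers.append(self._pool.submit(self._copy_out, m, out))
        return out  # contents are valid only after wait()

    def wait(self):
        for fut in self._transfers:
            fut.result()  # blocks; re-raises any transfer error
        self._transfers.clear()

    def async_exec(self):
        handle = self._next_handle
        self._next_handle += 1
        self._execs[handle] = self._pool.submit(self._run)
        return handle

    def wait_exec(self, handle):
        self._execs.pop(handle).result()

    def close(self):
        self._pool.shutdown()

    def _copy_in(self, m, data):
        self._mem[m][:len(data)] = data

    def _copy_out(self, m, out):
        out[:] = self._mem[m]

    def _run(self):
        for buf in self._mem.values():
            self._kernel(buf)


def double(buf):
    # Toy kernel: double every byte, wrapping at 256.
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


acc = AsyncHostSim(double)
m = acc.alloc_mem(3)
acc.async_write_mem(m, bytes([1, 2, 3]))  # returns immediately
h = acc.async_exec()                      # returns a handle
acc.wait_exec(h)                          # kernel has finished
out = acc.async_read_mem(m)               # request issued, not yet complete
acc.wait()                                # all transfers have finished
acc.close()
print(list(out))
```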

9 Implementation Challenges
- The framework needs to differentiate between on-board and “farther away” accelerators.
- Init will need to set up a connection between the host and the accelerator.
- How do we start the execution of a kernel (and move data) on an accelerator?
  - Heavily system- and platform-dependent
  - Good to hide behind a component interface!
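A rough illustration of hiding the on-board versus “farther away” distinction behind a single initialization call is sketched below in Python. The locality labels, the `Connection` record, and the transport strings are all invented for this example; a real backend would open a device context or a network connection where this sketch only records which path was taken.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Connection:
    locality: str   # "onboard" or "remote" (labels invented for this sketch)
    address: str    # device index or host name
    transport: str  # illustrative description of how data would move


def _open_onboard(address: str) -> Connection:
    # An on-board device (e.g. a GPU in the same node) needs no network setup.
    return Connection("onboard", address, "local bus")


def _open_remote(address: str) -> Connection:
    # A "farther away" accelerator (e.g. compute nodes behind front-end
    # nodes) would need a network connection; here we only record that.
    return Connection("remote", address, "network")


_OPENERS: Dict[str, Callable[[str], Connection]] = {
    "onboard": _open_onboard,
    "remote": _open_remote,
}


def initialize(locality: str, address: str) -> Connection:
    """Single entry point; callers never see the platform-specific path."""
    try:
        opener = _OPENERS[locality]
    except KeyError:
        raise ValueError(f"unknown accelerator locality: {locality!r}") from None
    return opener(address)


gpu = initialize("onboard", "0")
xmt = initialize("remote", "xmt-frontend")  # hypothetical host name
print(gpu.transport, "|", xmt.transport)
```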

