System Coordination Library (SCL) Framework Vikas Aggarwal Rafael Garcia Abraham Sanchez Philips Shih.

1 System Coordination Library (SCL) Framework
Vikas Aggarwal, Rafael Garcia, Abraham Sanchez, Philips Shih

2 Challenges & Problems
- FPGAs and other devices (e.g. Cell and GPUs) are gaining popularity as accelerators
  - Lack of direct coordination among devices precludes their use as peers in massively parallel machines
- Development support for large-scale applications is lacking
  - Device design languages for FPGAs are migrating towards true HLLs
  - Missing piece: a System-level Coordination Library, as an extension to HLLs
- Complete lack of interoperability; several IDEs and devices are gaining popularity in smaller domains
  - Standardization of communication and compatibility among different devices is highly desirable to capture a larger user base
- Lack of transition from the Formulation phase to the Design phase

3 Proposed Solution
- Design a System Coordination Library to facilitate coordination among a heterogeneous set of devices
  - Provide a familiar coordination/communication interface to parallel program developers by employing MPI-like interfaces
- Standardize coordination primitives across different technologies
  - Provide a higher level of abstraction for communication
  - Allow applications to be more portable across changing platforms; software life cycles are generally longer than those of the corresponding hardware
  - Provide communication based on the relevant communication infrastructure; build communication from the bottom up, employing existing work such as MPI, GenAPI, etc.
- Provide a transition from the Formulation phase to the Design phase
  - Allow parallel programs to be expressed as task graphs
  - Provide a framework to auto-generate communication infrastructure based on the mapping of tasks to different devices

4 DARPA Study - Quick Glance
- Device design languages for FPGA devices are migrating upward in abstraction towards true HLLs
- Missing piece in the Design layer is a System Coordination Library, as an extension to HLLs
[Figure: development flow in four layers (F, D, T, E)]
- Formulation: strategic design abstraction; prediction, tradeoff analysis
- Design: system coordination language; device design languages; library reuse (modules, cores)
- Translation
- Execution
[Figure labels: AccelDSP, Carte C, Impulse C, VHDL, Gedae, etc.; FPGA devices (e.g. Stratix-II/III, Virtex-4/5), x86, Cell, etc.]

5 Programming Modularity
[Figure: a task-to-resource mapping listing each task with its target and IDE]
    Num Micro-tasks :
    Task 1 : foo_scl
    Task 2 : bar_scl
    Target: FPGA
    IDE: Handel-C
    ...
[Figure: alternative targets: FPGA_Virtex5 with IDE Accel; FPGA_Stratix3 with IDE Impulse; Cell with IDE Gedae]
- Changing 20 task mappings means 40 line changes in SCL
- MPI would require rewriting the entire program

6 Bigger Picture
- Formulation enables abstract modeling of algorithms
  - Allows decomposition of apps into constituent tasks
  - Allows automated performance prediction for a particular algorithm decomposition
- Missing components
  - Multi-FPGA applications still present a major development bottleneck
  - Automated grouping and mapping of tasks onto resources provides tremendous benefits; several techniques have reaped the benefits of automated DSE in conventional computing
  - Bridging the Formulation and Design phases: providing an automatically generated framework for communication between tasks
[Figure panels: example RCML model of a conceptual application; corresponding task graph of the application; suggested mapping of tasks onto resources; auto-generation of communication infrastructure using the mapping information]

7 Why not MPI?
- MPI is not meant for future heterogeneous architectures
  - A more resource-specific, rather than generic, approach is required in such cases
- Tasks developed in different IDEs need to be able to talk to each other
  - Facilitates programming modularity
- SCL leverages MPI concepts where applicable
  - Sends/receives in an SCL task definition will look very familiar to MPI developers
- Bridging from the Formulation phase: SCL allows developers to leverage efforts from the prior stage

9 Basic Definitions
Programming Model
- SCL Task: finest unit of computation in SCL
- Task definition code: implements the computational part of a task in a DDL
- Task graph: describes the tasks and the communication between them
- Mapping: maps tasks onto devices
Architectural Model
- SCL Device: finest granularity of computational resource that can execute one or more tasks and has a unique address within a platform
- SCL Platform/Node: a set of SCL-compliant devices connected together by some underlying topology into a single uniquely addressable entity in the system
- SCL System: a set of platforms connected together by some underlying topology
- SCL Resource graph: maintains information about all devices and platforms in the system, with their interconnections
[Figure: example tasks FFT, H(f), IFFT]

10 Coordination Using SCL
- Intra-device-level coordination: coordination between tasks within a single device
  - Two tasks mapped to a single FPGA, or to two SPEs of a single Cell
- Intra-platform-level coordination: coordination between tasks on different devices on a single platform
  - Coordination between a Nallatech board and its host processor
- System-level coordination: coordination between tasks mapped onto different platforms
  - A Nallatech board communicating with a PS3 and a Gidel board
- SCL compliance: required to support coordination at the above levels of the hierarchy
- A device is SCL compliant if
  - it can support communication between multiple tasks mapped onto the same device,
  - and it provides some mechanism for specifying communication with the platform
- A platform is SCL compliant if
  - it is composed of SCL-compliant devices,
  - and it can support communication between tasks running on different SCL-compliant devices within the platform,
  - and it provides some mechanism for specifying communication external to the platform
- A system is SCL compliant if
  - it is composed of SCL-compliant platforms,
  - and it can support communication between multiple SCL-compliant platforms
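
The compliance rules on this slide are recursive, so they can be sketched as three small predicates. This is only an illustration under stated assumptions: the struct and field names below are hypothetical and do not come from SCL itself.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical capability flags; the names are illustrative, not part of SCL.
struct Device {
    bool intra_device_comm;   // tasks mapped onto this device can communicate
    bool platform_interface;  // mechanism for specifying communication with the platform
};

struct Platform {
    std::vector<Device> devices;
    bool inter_device_comm;   // tasks on different devices in the platform can communicate
    bool external_interface;  // mechanism for communication external to the platform
};

struct System {
    std::vector<Platform> platforms;
    bool inter_platform_comm; // platforms can communicate with each other
};

inline bool is_compliant(const Device& d) {
    return d.intra_device_comm && d.platform_interface;
}

// Note: a platform with no devices passes the "all devices" check vacuously.
inline bool is_compliant(const Platform& p) {
    return std::all_of(p.devices.begin(), p.devices.end(),
                       [](const Device& d) { return is_compliant(d); })
           && p.inter_device_comm && p.external_interface;
}

inline bool is_compliant(const System& s) {
    return std::all_of(s.platforms.begin(), s.platforms.end(),
                       [](const Platform& p) { return is_compliant(p); })
           && s.inter_platform_comm;
}
```

A single non-compliant device makes its platform, and therefore the whole system, non-compliant.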

11 Communication using Hierarchy
- Hierarchical addressing
  - Each platform has a unique “platform address” in the system
  - Each device has a unique “device address” in its platform and hence in the system
- Use of addresses to build the communication structure
  - SCL Resource graph: contains knowledge of the SCL-compliant resources available in the system in a hierarchical manner
  - The SCL parser will use information from the graph to find appropriate communication routines
  - Communication constructs will be auto-stitched into the task definition code
- Given a task graph of the application and a resource graph for the system, a mapping of tasks onto devices is required to run the application
[Figure: a system of platforms P1 to P5 joined by an interconnect; each platform contains devices (D1, D2, D3) labeled C, F, Cell, or GPU]
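
One way to picture hierarchical addressing is a two-part address from which the coordination level between two tasks follows directly. The sketch below assumes 16-bit platform and device fields; the type and function names are hypothetical and are not taken from SCL.

```cpp
#include <cstdint>

// Hypothetical two-level address: a device is unique within its platform,
// so the (platform, device) pair is unique within the system.
struct SclAddress {
    std::uint16_t platform;
    std::uint16_t device;
};

enum class Level { IntraDevice, IntraPlatform, System };

// Classify which coordination level connects two mapped tasks.
inline Level coordination_level(const SclAddress& a, const SclAddress& b) {
    if (a.platform != b.platform) return Level::System;
    if (a.device != b.device) return Level::IntraPlatform;
    return Level::IntraDevice;
}
```

A parser could use a classification like this to choose between communication routines once the mapping is known.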

12 Quick Peek: Example
- Defines the application as a task graph
- Defines communication between tasks as edges in the task graph
[Figure: task A (generate random numbers) connected by edge1 to task B (process numbers)]

systemApp.scl (architecture-independent system-level coordination):
    Edge edge1;
    Task random ( Out out1 ) { edge1 = out1; }
    Task process ( In in1 ) { in1 = edge1; }

tasks.map (task-to-resource mapping):
    Num Micro-tasks :
    Task 1 : random   Target: x86   IDE: C++   Address:   Library:
    Task 2 : process  Target: FPGA  IDE: Handel-C  Address: ...

random.cpp (architecture-dependent IDE):
    SCL_Init( … );
    for (unsigned i=0; i < 100; i++) {
        int x = rand();
        scl_send( "out1", &x, … );
    }

process.handelC / process.impulseC (architecture-dependent IDE):
    SCL_Init( … );
    int acc=0;
    for (unsigned i=0; i < 100; i++) {
        int temp;
        scl_receive( "in1", &temp, … );
        acc += temp;
    }
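
To make the data flow of this example concrete, here is a single-process simulation of the two tasks. It is a sketch under stated assumptions, not the SCL API: `MockRuntime`, `bind`, `send`, and `receive` are hypothetical stand-ins, the elided arguments of `scl_send`/`scl_receive` are not modeled, and the loop index replaces `rand()` so the result is repeatable.

```cpp
#include <map>
#include <queue>
#include <string>

// Hypothetical in-process stand-in for the generated communication layer.
// Each edge is a FIFO queue; ports are bound to edges as in the task graph.
class MockRuntime {
public:
    void bind(const std::string& port, const std::string& edge) {
        port_to_edge_[port] = edge;
    }

    void send(const std::string& port, int value) {
        edges_[port_to_edge_.at(port)].push(value);
    }

    // Returns false when nothing is queued; a real blocking receive would wait.
    bool receive(const std::string& port, int& out) {
        std::queue<int>& q = edges_[port_to_edge_.at(port)];
        if (q.empty()) return false;
        out = q.front();
        q.pop();
        return true;
    }

private:
    std::map<std::string, std::string> port_to_edge_;
    std::map<std::string, std::queue<int>> edges_;
};

// Runs both tasks of the example in one process and returns the accumulated sum.
inline int run_example() {
    MockRuntime rt;
    rt.bind("out1", "edge1");  // Task random:  edge1 = out1
    rt.bind("in1", "edge1");   // Task process: in1 = edge1

    for (unsigned i = 0; i < 100; i++) {   // "random" task (deterministic here)
        rt.send("out1", static_cast<int>(i));
    }
    int acc = 0;
    for (unsigned i = 0; i < 100; i++) {   // "process" task
        int temp = 0;
        if (rt.receive("in1", temp)) acc += temp;
    }
    return acc;
}
```

Neither task names its peer; both refer only to their own port, and the binding to `edge1` connects them.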

13 Compilation Process
- Step 1: Parse the task graph in the “.scl” file
  - Gather information about “communication edges” from the .scl file
  - Definitions for “SCL_” functions will be populated with one entry for each edge at a later stage
  - In the future, a script could also add partially auto-generated functionality for legacy code in existing languages
- Step 2: Read the “.map” file
  - The parser extracts the mappings of the various tasks from the .map file
  - Definitions of “SCL_” functions are auto-generated based on this mapping information
- Step 3: Build tasks in their native build environments
  - References to SCL functions are linked to the definitions generated in the previous step
- A run-time service is responsible for spawning tasks (this could be a manual process in the beginning)
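
Steps 1 and 2 can be sketched as a function that combines the edges from the task graph with the task-to-target mapping and produces one entry per edge. This is a minimal illustration, assuming the parsing has already happened; the `Edge` struct, `generate_stubs`, and the transport names it prints are placeholders, not output of the actual SCL code generator.

```cpp
#include <map>
#include <string>
#include <vector>

struct Edge {
    std::string name;      // e.g. "edge1"
    std::string producer;  // task that writes to the edge
    std::string consumer;  // task that reads from the edge
};

// edges:       result of parsing the .scl file (step 1)
// task_target: task -> target, as read from the .map file (step 2)
// Returns one descriptive line per edge; transport names are placeholders.
inline std::vector<std::string> generate_stubs(
        const std::vector<Edge>& edges,
        const std::map<std::string, std::string>& task_target) {
    std::vector<std::string> out;
    for (const Edge& e : edges) {
        const std::string& src = task_target.at(e.producer);
        const std::string& dst = task_target.at(e.consumer);
        std::string transport =
            (src == dst) ? std::string("local") : src + "_to_" + dst;
        out.push_back(e.name + ": " + e.producer + " -> " + e.consumer +
                      " via " + transport);
    }
    return out;
}
```

Remapping a task changes only the `task_target` input; the edge list from the task graph stays the same.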

14 Basic Coordination Primitives
- Identify baseline functions to support basic communication in the initial phase
- Identify necessary static and run-time parameters
- Focus on synchronous blocking communication based on message passing (the dominant mode of communication in MPI)
- Consider other modes wherever applicable to facilitate efficient data transfer
  - Shared memory constructs for data movement within a platform
  - Streaming communication model for systems capable of supporting this mode
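
Synchronous blocking message passing can be illustrated with a small rendezvous channel between two threads. This is a generic sketch using the C++ standard library, assuming exactly one sender and one receiver per channel; `BlockingChannel` and `sum_over_channel` are hypothetical names and not SCL primitives.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Synchronous channel for one sender and one receiver:
// send() returns only after receive() has taken the value.
template <typename T>
class BlockingChannel {
public:
    void send(const T& value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });  // wait until the slot is free
        value_ = value;
        full_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !full_; });  // block until the receiver took it
    }

    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });   // block until a value arrives
        T out = std::move(value_);
        full_ = false;
        cv_.notify_all();
        return out;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    T value_{};
    bool full_ = false;
};

// Sends 0..n-1 from a second thread and sums them on the calling thread.
inline int sum_over_channel(int n) {
    BlockingChannel<int> ch;
    std::thread producer([&ch, n] {
        for (int i = 0; i < n; ++i) ch.send(i);
    });
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += ch.receive();
    producer.join();
    return acc;
}
```

Because each `send` waits for the matching `receive`, the two loops advance in lockstep, which is the behavior the slide singles out for the initial phase.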

15 Challenges
- Mapping from tasks to devices requires static, compile-time behavior
  - The number of processes and the communication are statically defined at compilation
  - Is it over-restrictive? The majority of applications follow a well-behaved structure, and static task graphs are a well-studied problem
- Re-compilation is required in most cases when the mapping or the number of tasks changes; explore ways to minimize such situations
  - Allow changing the task graph by changing parameters in the .scl file in acceptable cases
  - Provide loops to accommodate a variable number of tasks in the graph
- The system should allow post-compile-time scaling on homogeneous nodes

17 SCL Parser Requirements
- Basic grammar to define the SCL task graph language
  - SCL_FILE, SCL_CONSTRUCT, ARITH_OP, EDGE_ASSIGNMENT, EXPR, EDGE_DECLARATION, PORT_TYPE, TASK_HEADER, TASK, EDGE_TYPE, LOOP, TASK_DEFINITION, LOOP_EXPR
- Build an abstract syntax tree and extract edge and task information
- Generate platform-specific code that implements the specified communication behavior

18 SCL Parser Design
- SCL Parser reads the task graph definition
  - Finds all tasks
  - Determines communication
- SCL Code Generator reads the .map file
  - Determines resource mapping
  - Implements SCL calls in native platform code

19 Eclipse
- Using the Eclipse environment to develop the SCL parser
- Compatible with other HPCSA tools
  - Allows easier integration with other tools/entry points: RCML, PTP
- Portable across most operating systems
  - Windows, Linux, Mac OS X
- Graphical editing environment
- Easy plug-in based integration

20 Eclipse-based framework for developing Domain-Specific Languages (DSLs)
- DSL: a small specialized language used to raise the abstraction level of software
  - Removes extraneous programming details
  - Provides for simplified specification
- Features
  - Allows specification of the grammar and creates a parser
  - Generates a complete Eclipse text editor: syntax coloring, syntax checking / error markers, code completion, navigation, folding, outline, find references

21 SCL Environment
[Screenshot panels: Console, Text Editor, Project Files, Outline view]

22 Graphviz
- Converts textual descriptions of graphs into diagrams
- Aids in design and verification of task graphs
  - A textual description is automatically derived from the user’s design and converted into the Graphviz language

    digraph edge_map {
        P1 -> C1 [ label = "E1" ];
        P2 -> P1 [ label = "E2" ];
        G1 -> P1 [ label = "E3" ];
    }
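
Deriving the Graphviz text from a list of edges takes only a few lines. The following is a minimal sketch, assuming task and edge names are plain identifiers that need no quoting or escaping in DOT; `GraphEdge` and `to_dot` are hypothetical names, not part of the SCL tooling.

```cpp
#include <string>
#include <vector>

struct GraphEdge {
    std::string from;   // producing task
    std::string to;     // consuming task
    std::string label;  // edge name shown on the arrow
};

// Emits a DOT digraph with one labeled arrow per edge.
inline std::string to_dot(const std::string& graph_name,
                          const std::vector<GraphEdge>& edges) {
    std::string dot = "digraph " + graph_name + " {\n";
    for (const GraphEdge& e : edges) {
        dot += "  " + e.from + " -> " + e.to +
               " [ label = \"" + e.label + "\" ];\n";
    }
    dot += "}\n";
    return dot;
}
```

The returned string can be written to a file and rendered with the Graphviz `dot` tool.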

23 Simple SCL example
- Installation
  - Download the self-extracting SCL plugin and extract it into the Eclipse plug-in directory
- Project setup
  - Open Eclipse -> File -> New Project -> Xtext DSL Wizards -> SCL Project
- Project specification
  - Describe the SCL task graph in the model.scl file
  - Create and specify the model.map file
- Task graph parse & code generation
  - Run the .oaw file
- Verification
  - View the Graphviz diagram and verify proper task graph description
- Compilation & execution
  - Compile the task definition code and execute the application

24 Proof of Concept: Building First App
- Initial emphasis: SCL coordinating computing on two different platforms selected from a heterogeneous suite (FPGA, CPU, GPU, etc.)
  - Feature FPGA as superior device technology
  - Multi-FPGA platform: Gidel board with a host CPU
- Development environments
  - Impulse C, VHDL for FPGA
  - C++ for processors
- Multi-FPGA platform applications
  - Target tracking application using a multi-FPGA design

25 Target tracking: Task Graph
[Figure: task graph with nodes C1, F1, F2, F3, F4 and edges CF1, CF2, E1, E2, E3, BE1]

C1:
    edge CF1, CF2 ;
    task C1 ( output out1, input in1 ) {
        in1 = CF2 ;
        CF1 = out1 ;
    }

F1:
    edge E1 ;
    bedge BE1 ;
    task F1 ( output out1, output out2, input in1, input in2 ) {
        in1 = CF1 ;
        in2 = E1 ;
        CF2 = out1 ;
        BE1 = out2 ;
    }

F2/F3:
    edge E2, E3 ;
    taskId t[2] ;
    loop(i=2; i<=3; i++) (
        t[$i] = $i ;
        task F$i( output out1, input in1, input in2 ) {
            in1 = BE1 ;
            in2 = E$i ;
            E$(i-1) = out1 ;
        }
    )
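
The `loop` construct on this slide stamps out one task per value of `i`, substituting `$i` and `$(i-1)` into task and edge names. The sketch below mimics that substitution for the loop body shown; it is an illustration only, with `ExpandedTask` and `expand_loop` as hypothetical names, and it says nothing about how the real SCL parser implements expansion.

```cpp
#include <string>
#include <vector>

struct ExpandedTask {
    std::string name;  // F$i
    std::string in1;   // always the broadcast edge BE1
    std::string in2;   // E$i
    std::string out1;  // E$(i-1)
};

// Expands loop(i = first; i <= last; i++) into concrete tasks.
inline std::vector<ExpandedTask> expand_loop(int first, int last) {
    std::vector<ExpandedTask> tasks;
    for (int i = first; i <= last; ++i) {
        tasks.push_back({"F" + std::to_string(i), "BE1",
                         "E" + std::to_string(i),
                         "E" + std::to_string(i - 1)});
    }
    return tasks;
}
```

For the bounds on the slide, `expand_loop(2, 3)` yields F2 (reads E2, writes E1) and F3 (reads E3, writes E2), so each task feeds the one before it in the chain.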

26 Gidel Board Architecture
- Data
  - Register
  - DDRII Memory
- Communication paths
  - PCI Bus -> Local Bus (CPU)
  - Neighbor Bus (unicast, FPGA)
  - Main Bus (broadcast, FPGA)
- Assumptions
  - Send and receive block until they have finished communicating
  - CPU and FPGA each execute only one communication process at a time
  - A single task executes on each FPGA

27 CPU to FPGA: Memory to Register
- CPU Send: move data from the working space to a “Send Buffer”
- Communication: structures in software and hardware move the data to its homologous “Receive Buffer Register”
- FPGA Receive: move the data from the “Receive Buffer Register” to working registers
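
The three stages on this slide (send, communication, receive) can be modeled in software as copies between buffers. This is a toy model under stated assumptions: plain vectors stand in for host memory and FPGA registers, and `Link`, `cpu_send`, `transfer`, and `fpga_receive` are hypothetical names that do not correspond to any Gidel or SCL API.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of one CPU -> FPGA link with a buffer on each side.
struct Link {
    std::vector<std::uint32_t> send_buffer;     // CPU-side "Send Buffer"
    std::vector<std::uint32_t> receive_buffer;  // stands in for the "Receive Buffer Register"
};

// Stage 1 (CPU send): copy from the working space into the send buffer.
inline void cpu_send(Link& link, const std::vector<std::uint32_t>& working_space) {
    link.send_buffer = working_space;
}

// Stage 2 (communication): move the data to the matching receive buffer.
inline void transfer(Link& link) {
    link.receive_buffer = std::move(link.send_buffer);
    link.send_buffer.clear();
}

// Stage 3 (FPGA receive): move from the receive buffer into working registers.
inline std::vector<std::uint32_t> fpga_receive(Link& link) {
    std::vector<std::uint32_t> working_registers = std::move(link.receive_buffer);
    link.receive_buffer.clear();
    return working_registers;
}
```

Keeping the stages separate mirrors the slide: the sending and receiving tasks only touch their own buffer, and the middle stage is the part that depends on the platform.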

28 FPGA to CPU: Register to Memory
- FPGA Send: move data from the working register to its “Send Buffer Register”
- Communication: structures in hardware and software transfer the data to its homologous “Receive Buffer” in CPU main memory
- CPU Receive: move data from the “Receive Buffer” to the right working space in main memory

29 FPGA-FPGA: Register-Register
- FPGA Send: move data from the working register to its “Send Buffer Register”
- Communication: hardware structures transfer the data to its homologous “Receive Buffer Register” on the other core
- FPGA Receive: move data from the “Receive Buffer Register” to the right working register

30 FPGA-FPGA: Register-Register Broadcast
- FPGA Send: move data from the working register to the “Send Buffer Register”
- Communication: hardware transfers the data to the homologous “Receive Buffer Register” on the other core
- FPGA Receive: move data from the “Receive Buffer Register” to the working register

31 CPU-FPGA: Memory-Memory Transfer to FPGA fixed memory location
- CPU Send: move data from the working space in memory to the “Send Buffer”
- Communication: software and hardware structures move the data to the fixed memory location in FPGA memory
- FPGA Receive: no data movement, just a handshake and blocking until the process is completed

32 FPGA-CPU: Memory-Memory Transfer from FPGA fixed memory location
- FPGA Send: no data movement, just a handshake and blocking until the process is completed
- Communication: software and hardware structures move the data from the fixed location in FPGA memory to the buffer
- CPU Receive: move data from the “Receive Buffer” to the working space in main memory

33 CPU to FPGA, Memory to Memory: FPGA Receive
- Two control signals
  - FPGA is ready for communication
  - CPU has completed communication
- Combine the signals with XNOR
  - Avoids signal reset and extra communication
  - Equal values mean the communication is complete
  - Different values mean the FPGA is ready to communicate
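
The XNOR scheme works because each side only ever toggles its own signal, so no reset is needed between transfers. The sketch below models that in software. Which side toggles which signal is an assumption made for this illustration, and `Handshake` and its members are hypothetical names.

```cpp
// Two toggle signals; neither is ever reset.
// Assumption: the FPGA toggles fpga_ready, the CPU toggles cpu_done.
struct Handshake {
    bool fpga_ready = false;  // toggled by the FPGA when it is ready to communicate
    bool cpu_done = false;    // toggled by the CPU when it has finished the transfer

    // XNOR of the two signals: true when they are equal (communication complete).
    bool complete() const { return !(fpga_ready ^ cpu_done); }

    void fpga_request() { fpga_ready = !fpga_ready; }  // values now differ: ready
    void cpu_finish() { cpu_done = !cpu_done; }        // values equal again: complete
};
```

After one transfer both signals are high, after the next both are low; in either case they are equal, so the same test covers every round.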

35 Demo Slides

