
1 Research Overview: Parallel and Associative Computing Group and the ASC Processor Group
PACL and ASC Processor Research Overview, November 18, 2005
Dr. Johnnie Baker, Dr. Robert Walker, and Dr. Jerry Potter (Emeritus)
Michael Scherger, Wittaya Chantamas, Hong Wang, Sabegh Singh Virdi, Shannon Steinfadt, Kevin Schaffer
Department of Computer Science, Kent State University, Kent, Ohio

2 [Diagram: research areas of the two groups]
Parallel and Associative Research Group
  - Associative Models of Computation
  - Parallel Runtime Environments
  - Parallel and Associative System Software
  - Parallel and Associative Applications
  - Associative and Parallel Algorithms
ASC Processor Research Group
  - FPGA-Based ASC Processor
  - MASC Processor
  - Structure Codes, ASC-centric Implementations
  - Pipelined ASC w/ Reconfigurable Network
  - Multithreaded ASC Processor

3 Presentation Outline
- Short Overview of Associative Models
  - The Single Instruction Stream ASC Model
  - The Multiple-Instruction Stream MASC Model
- Architectural Modeling and Runtime Environments
  - MASC Runtime Environments – Michael Scherger
  - Supporting Multiple Instruction Streams using the Manager-Worker Paradigm – Wittaya Chantamas
- ASC Processor Design
  - Scalable Pipelined ASC Processor with Reconfigurable PE Network to Support MASC – Hong Wang


5 Associative Models of Computation
- Associative Computer: A SIMD computer with certain additional hardware features.
  - Features can be supported (less efficiently) in software by a traditional SIMD computer
  - The name “associative” is due to its ability to locate items in the memory of PEs by content rather than location.
- Uses associative features to simulate an associative memory
- The ASC model (for ASsociative Computing) identifies the properties assumed for an associative computer.

6 The Associative Computing (ASC) Model
[Figure: the ASC model; surviving label: cells]

7 Associative Properties of the ASC Model
- Broadcast data in constant time
- Constant time global reduction of
  - Boolean values using AND/OR
  - Integer values using MAX/MIN
- Constant time associative search
- Responder processing
  - An IS can detect if a data test is satisfied by any of its cells in constant time (i.e., any-responders)
  - An IS can select one arbitrary responder in constant time (i.e., pick-one)
- Above properties supported in hardware with broadcast and reduction networks
- References:
  - M. Jin, J. Baker, and K. Batcher, Timings of Associative Operations on the MASC model, Workshop of Massively Parallel Processing, IPDPS ’01.
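The operations listed above can be sketched in software. This is a minimal illustration only: the class and method names (`Cell`, `InstructionStream`, `search`, `pick_one`, and so on) are hypothetical and are not part of the ASC language or hardware. A sequential simulation like this takes time proportional to the number of cells; the constant-time behavior on the slide comes from the broadcast and reduction networks in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One cell: a PE plus its local memory (modeled here as a dict)."""
    memory: dict = field(default_factory=dict)


class InstructionStream:
    """Hypothetical software stand-in for an IS and the cells listening to it."""

    def __init__(self, cells):
        self.cells = cells

    def broadcast(self, name, value):
        """Broadcast: every cell receives the same value."""
        for cell in self.cells:
            cell.memory[name] = value

    def search(self, predicate):
        """Associative search: each cell tests its own memory by content."""
        return [bool(predicate(cell.memory)) for cell in self.cells]

    @staticmethod
    def any_responders(flags):
        """Global OR reduction over the responder flags."""
        return any(flags)

    @staticmethod
    def pick_one(flags):
        """Select one responder (here simply the first); None if there is none."""
        for index, flag in enumerate(flags):
            if flag:
                return index
        return None

    def reduce_max(self, name, flags):
        """Global MAX reduction over the responders' values for one field."""
        values = [c.memory[name] for c, f in zip(self.cells, flags) if f]
        return max(values) if values else None
```

A typical use is to search by content, check for responders, and then either reduce over them or pick one to process individually.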

8 The MASC Model

9 The MASC Model
- MASC (Multiple ASC) extends the ASC model to multiple instruction streams
- Multiple SIMD model with more than one Instruction Stream (IS)
- Each IS can execute a separate data-parallel task
  - These threads execute to completion without interaction or interruption
- Dynamically reconfigurable
  - Each cell listens to only one IS
  - Cells can switch ISs, based on a data test
  - Cells can switch between being active, inactive, or idle
- Each IS together with its cells satisfies the ASC model
- Job/functional parallelism is used to control the ISs

10 Website for Papers
http://www.cs.kent.edu/~parallel
Follow pointer to “papers”


12 MASC Runtime Environment
- Designed extensions to the existing ASC instruction set to support multiple instruction streams
  - ISGEN compiler extension
  - Reference: Scherger, Michael, Jerry Potter, and Johnnie Baker, “Multiple Instruction Stream Control for an Associative Model of Parallel Computation”, Proc. of the 16th International Parallel and Distributed Processing Symposium (Workshop in Massively Parallel Processing), April 2003.
- Developed a prototype MASC runtime environment using a cluster (proof of concept for multiple instruction streams)

13 Parallel if-then-else with Instruction Stream Commands

14 Shape Example

15 Runtime Environment
IS 0: compare shape == circle; non-responders -> IS 1; compute circle area
IS 1: noop; compare shape == rect; non-responders -> IS 2; compute rectangle area; noop; listen IS 0
IS 2: noop; compare shape == triangle; compute triangle area; listen IS 1
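The shape example can be traced with a small sequential simulation. This is a sketch under stated assumptions: the cell fields (`r`, `w`, `h`, `b`), the `listen` field, and the function name are hypothetical, and the three streams are processed one after another here, whereas in MASC each IS would compute its area while the next IS is already testing the non-responders.

```python
import math

# Hypothetical area formulas, one per shape handled by an instruction stream.
AREA = {
    "circle": lambda c: math.pi * c["r"] ** 2,
    "rect": lambda c: c["w"] * c["h"],
    "triangle": lambda c: 0.5 * c["b"] * c["h"],
}

# Stream k tests for STREAM_SHAPE[k]; non-responders switch to stream k + 1.
STREAM_SHAPE = ["circle", "rect", "triangle"]


def run_shape_example(cells):
    """Compute areas in place; return (stream, listener count) per stream."""
    for cell in cells:
        cell["listen"] = 0  # initially every cell listens to IS 0
    trace = []
    for k, shape in enumerate(STREAM_SHAPE):
        listeners = [c for c in cells if c["listen"] == k]
        for cell in listeners:
            if cell["shape"] == shape:          # responder: compute the area
                cell["area"] = AREA[shape](cell)
            elif k + 1 < len(STREAM_SHAPE):     # non-responder: switch streams
                cell["listen"] = k + 1
        trace.append((k, len(listeners)))
    for cell in cells:
        cell["listen"] = 0  # "listen IS 1", then "listen IS 0": cells rejoin IS 0
    return trace
```

The returned trace shows how the set of listening cells shrinks as each stream peels off its responders.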


17 Outline
- A review of the MASC Computational Model using the manager/worker paradigm and a work pool of tasks
- Design and implementation of a MASC back-end compiler for the ASC language (an ongoing project)
- An overview of the MASC emulator (the next project)

18 MASC Computational Model
- Two types of ISs
  - one manager IS: fork and join tasks, manage work pool
  - a few worker ISs: execute tasks
- A work pool of tasks
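The manager/worker arrangement can be sketched as a step-by-step simulation. Everything named here (`WorkerIS`, `manager_run`, tasks as `(label, steps)` pairs with at least one step each) is an assumption made for illustration, not the group's implementation.

```python
from collections import deque


class WorkerIS:
    """Hypothetical worker instruction stream that runs one task at a time."""

    def __init__(self, name):
        self.name = name
        self.task = None  # (label, remaining_steps) or None when idle

    def busy(self):
        return self.task is not None

    def step(self):
        """Advance the current task one step; return its label when it ends."""
        label, remaining = self.task
        remaining -= 1
        if remaining == 0:
            self.task = None
            return label
        self.task = (label, remaining)
        return None


def manager_run(tasks, n_workers):
    """Manager IS: fork tasks from the work pool to idle workers, join results."""
    pool = deque(tasks)  # the work pool of (label, steps) tasks
    workers = [WorkerIS(f"IS{i + 1}") for i in range(n_workers)]
    finished, steps = [], 0
    while pool or any(w.busy() for w in workers):
        for w in workers:                      # fork: hand out pending tasks
            if not w.busy() and pool:
                w.task = pool.popleft()
        for w in workers:                      # workers execute one step
            if w.busy():
                done = w.step()
                if done is not None:
                    finished.append((done, w.name))  # join
        steps += 1
    return finished, steps
```

With two workers and tasks of 3, 2, and 1 steps, the third task is picked up by whichever worker frees first, so the whole pool drains in 3 steps instead of 6.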


20 MASC Directive
- Concurrent data-parallel execution of the different paths in a branch can be achieved by using the directive /*.masc fork */
- The user has tight control
  - Not all paths in branches will be executed concurrently
  - Only those in branches with directives will
- Treated as a comment by the ASC compiler (appears in the .lst file, not in the .iob file)
- No new ASC compiler is needed to run an ASC program on a MASC system
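Because the directive is an ordinary comment, a back-end tool can look for it in the source or listing text. The sketch below is an assumption-laden illustration: it assumes the directive immediately precedes the `if` it applies to and is matched case-insensitively, and the function name is hypothetical. It is not the group's back-end compiler.

```python
import re

# Assumption: the directive comment directly precedes the `if` it applies to.
FORK_DIRECTIVE = re.compile(r"/\*\s*\.masc\s+fork\s*\*/\s*if\b", re.IGNORECASE)


def count_forked_branches(source: str) -> int:
    """Count the branches marked for concurrent execution."""
    return len(FORK_DIRECTIVE.findall(source))
```

Branches preceded by any other comment are not counted, which mirrors the point that only branches carrying the directive are forked.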

21
  main test
  int parallel b[$], c[$], d[$];
  logical parallel BCD[$];
  associate b[$], c[$], d[$] with BCD[$];
  read b[$] c[$] d[$] in BCD[$];
  b[$] = c[$] + 2;
  c[$] = d[$] - 3;
  /* will be no fork here */
  if (b[$] .lt. c[$]) then
    b[$] = c[$];
    d[$] = 4;
  else
    c[$] = b[$];
    b[$] = d[$];
  endif;
  c[$] = d[$];
  d[$] = c[$];
  end;
Structure codes: M100 0000, W110 0000, M111 0000
Excerpt (W1100000 is a structure code):
  .MI_BEGIN W1100000
  beg_of_stmt 1c00 6 0
  beg_read 5a00 SYSOT BCD B,C,D,
  …
  beg_of_stmt 1c00 20 0
  mvpa_ 4812 C D
  .MI_END W1100000

22
  main test
  int parallel b[$], c[$], d[$];
  logical parallel BCD[$];
  associate b[$], c[$], d[$] with BCD[$];
  read b[$] c[$] d[$] in BCD[$];
  b[$] = c[$] + 2;
  c[$] = d[$] - 3;
  /*.MASC FORK */
  if (b[$] .lt. c[$]) then
    b[$] = c[$];
    d[$] = 4;
  else
    c[$] = b[$];
    b[$] = d[$];
  endif;
  c[$] = d[$];
  d[$] = c[$];
  end;
Structure codes: M100 0000, W110 0000, M111 0000, W111 1000, W111 2000, W111 X100, M111 X110
Excerpt (W1112000 is a structure code):
  .MI_BEGIN W1112000
  beg_of_stmt 1c00 16 0
  mvpa_ 4812 B C
  beg_of_stmt 1c00 17 0
  mvpa_ 4812 D B
  .MI_END W1112000


24 A MASC Emulator
- Software that emulates the exact behavior of MASC hardware on a PC
- Thus allows an ASC program to run on a PC as if it were running on a MASC system
- A modified version of the existing ASC emulator with built-in performance monitoring
- The manager/worker paradigm and work pool idea will be implemented in the emulator
MASC runtime system


26 Outline of Talk
- ASC Processor (Work Mostly Complete)
  - Pipelined Architecture
  - Reconfigurable PE Interconnection Network
  - Processor and Network Performance
- MASC Architecture (Work in Progress)
  - Implementation of Task Manager and Instruction Stream
  - Sample Code
  - Architecture and Sample Execution
- Conclusion

27 ASC Processor’s Pipelined Architecture
- We have implemented a pipelined SIMD Associative (ASC) Processor using Altera FPGAs
- Five single-clock-cycle pipeline stages are split between the SIMD Control Unit (CU) and the PEs
  - In the Control Unit
    - Instruction Fetch (IF)
    - Part of Instruction Decode (ID)
  - In the Scalar PE (SPE) and in each Parallel PE (PPE)
    - Rest of Instruction Decode (ID)
    - Execute (EX)
    - Memory Access (MEM)
    - Data Write Back (WB)
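The overlap that the five single-cycle stages allow can be illustrated with an idealized schedule. This sketch assumes one instruction issued per cycle and no stalls or hazards, which the slides do not discuss; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(n_instructions):
    """Map each cycle to the (instruction, stage) pairs active in that cycle.

    Idealized: instruction i enters IF in cycle i + 1 and never stalls.
    """
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s + 1, []).append((i, stage))
    return schedule


def total_cycles(n_instructions):
    """Cycles to finish n instructions in the ideal case: stages + n - 1."""
    return len(STAGES) + n_instructions - 1 if n_instructions > 0 else 0
```

Under these assumptions ten instructions complete in 14 cycles rather than the 50 an unpipelined five-step design would need.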

28 Pipelined ASC Processor with Reconfigurable Interconnection Network
[Block diagram. Labels: Control Unit (CU), Instruction Memory, IF/ID Latch, Decoder, Sequential PE (SPE), Parallel PE (PPE) Array, Register File, ID/EX Latch, EX/MEM Latch, MEM/WB Latch, Data Memory, Immediate Data, Broadcast Register Data]

29 Processing Element (PE)
[Block diagram. Labels: Register File, Data Switch, Comparator, ID/EX Latch, Mask, EX/MEM Latch, MEM/WB Latch, Data Memory, MUX]
- Comparator implements associative search: pushes ‘1’ onto the top of the mask stack for responders, ‘0’ otherwise
- A ‘0’ on top of the mask stack disables the ID/EX Latch
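The mask-stack behavior can be modeled in a few lines. This is a sketch only: the `PE` class, its methods, the initial `1` on the stack, and the rule that an already disabled PE pushes `0` are assumptions made for illustration, not details taken from the hardware design.

```python
class PE:
    """Hypothetical model of one PE's mask stack."""

    def __init__(self, value):
        self.value = value
        self.mask_stack = [1]  # assumption: every PE starts enabled

    def enabled(self):
        """The top of the mask stack gates the PE (stands in for ID/EX Latch)."""
        return self.mask_stack[-1] == 1

    def search(self, predicate):
        """Comparator: push 1 for a responder, 0 otherwise."""
        responder = self.enabled() and bool(predicate(self.value))
        self.mask_stack.append(1 if responder else 0)

    def end_search(self):
        """Leave the masked region and restore the previous mask."""
        self.mask_stack.pop()

    def execute(self, op):
        """Apply op only when the top of the mask stack is 1."""
        if self.enabled():
            self.value = op(self.value)
```

Using a stack rather than a single bit is what lets masked regions nest: each search narrows the active set, and each pop restores the set that was active before it.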

30 Pipelined ASC Processor’s Performance
- Our pipelined ASC Processor has been implemented on an Altera APEX20KC1000 FPGA with 70 8-bit PEs
  - Other 8-bit processor cores implemented on this FPGA / speed grade have clock speeds ranging from 30 to 106 MHz, typically 60-68 MHz
- Our pipelined ASC Processor has a clock speed of 56.4 MHz, comparable with these other processors
  - With the 5-stage pipeline, our ASC Processor can approach a peak performance of 300 MHz

31 Reconfigurable PE Interconnection Network
- Our pipelined ASC Processor also has a reconfigurable PE interconnection network
- Reconfigurable PE network allows arbitrary PEs in the PE Array to be connected, without the restriction of physical adjacency, via
  - Linear array (currently implemented), or
  - 2D mesh (to be implemented soon)
- Each PE in the PE Array can
  - Choose to stay in the PE interconnection network, or
  - Choose to stay out of the PE interconnection network, so that it is bypassed by any inter-PE communication
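The effect of bypassing on the linear array can be sketched as follows. The function names and the boolean `in_network` list are hypothetical; the sketch only shows how logical neighbors are determined once some PEs drop out, not how the hardware routes signals.

```python
def logical_neighbors(in_network):
    """Return {pe: (left, right)} for PEs that stay in the linear array.

    in_network[i] is True when PE i stays in the network. Bypassed PEs are
    skipped, so neighbors need not be physically adjacent.
    """
    members = [i for i, stay in enumerate(in_network) if stay]
    neighbors = {}
    for pos, pe in enumerate(members):
        left = members[pos - 1] if pos > 0 else None
        right = members[pos + 1] if pos + 1 < len(members) else None
        neighbors[pe] = (left, right)
    return neighbors


def shift_right(values, in_network):
    """Each in-network PE receives the value of its left logical neighbor."""
    result = list(values)
    for pe, (left, _right) in logical_neighbors(in_network).items():
        if left is not None:
            result[pe] = values[left]
    return result
```

With PEs 1, 2, and 5 bypassed, PE 3 receives its data directly from PE 0 even though two physical PEs sit between them.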


33 Reconfigurable Network Implementation
[Block diagram. Labels: Data Switch, Register File, Register Data (from SPE), Immediate Data (from CU), Left Neighbor, Right Neighbor, Top of Mask Stack, Comparator & ID/EX Latch]
- Data switch
  - Passes register, broadcast, and immediate data to the PE and to its two neighbors
  - Routes data from the PE’s neighbors to its EX stage
- Reconfigurable network: supports Bypass Mode to remove the PE non-responders from the network
  - Will be needed by MASC Processor

34 ASC Processor’s Network Performance
- Performance of the ASC Processor degrades as the number of PEs is increased with Bypass Mode present
  - Due to the long path from the first PE to the last PE in the PE array
- A 4-PE ASC Processor requires 2152 LEs and runs at 56.4 MHz with Bypass Mode present
  - When the number of PEs is increased to 50, the clock frequency drops to 22 MHz
- In the future we hope to reduce this delay using a pipelined or other multi-hop architecture


36 [State diagrams. Labels: Task Manager, IDLE, Task_Allocation, Wait_For_IS, Join; Instruction Stream, Call_TM, Task_Execution, IDLE]

37 MASC PE Structure
[Block diagram. Labels: PE, IS_TM_Chooser, IS1, IS2, TM1, TM2, ID Register]

38 [State diagrams. Labels: Task Manager, IDLE, Task_Allocation, Wait_For_IS, Join; Instruction Stream, Call_TM, Task_Execution, IDLE; TM ID, IS ID]

39 Assembly Code Example
  .
  101 Parallel_Select_Start Mem(110)
  102 Pcase Condition1 Mem(104)
  103 Pcase Condition2 Mem(107)
  104 Case1
  105 …
  106 Parallel_Case_End
  107 Case 2
  108 …
  109 Parallel_Case_End
  110 Parallel_Select_End (note: this does not trigger JOIN; lack of tasks does)

40 [Diagram: Task Managers TM0, TM1, TM2; Instruction Streams IS0, IS1, IS2; PEs PE0 through PE5]

41 Originally all PEs listen to IS0

42 When Parallel Select is met, the Task Manager takes over the PEs
  101 Parallel_Select_Start Mem(110)

43 TM then calls IS0 to perform the 1st task
  102 Pcase Condition1 Mem(104)
  104 Case1
  105 …

44 TM then calls IS1 to perform the 2nd task
  2nd task:
  103 Pcase Condition2 Mem(107)
  107 Case 2
  108 …
  1st task:
  102 Pcase Condition1 Mem(104)
  104 Case1
  105 …

45 2nd task finishes and gives control back to TM
  2nd task:
  107 Case 2
  108 …
  109 Parallel_Case_End
  1st task:
  102 Pcase Condition1 Mem(104)
  104 Case1
  105 …

46 1st task finishes and gives control back to TM
  104 Case1
  105 …
  106 Parallel_Case_End

47 Control is back to the last finished IS, which is IS0
  110 Parallel_Select_End

48 IS1 encounters a nested parallel select

49 TM1 allocates the two tasks to IS1 and IS2
  A = 2
  C = A
  B = A
  Common Register
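The sample execution on slides 42 to 47 can be replayed with a small event simulation. All names here (`run_parallel_select`, the `durations` table, the log strings) are hypothetical, and task lengths are invented so that the 2nd task finishes first, as on the slides. The loop ends when no tasks remain, which matches the note that lack of tasks, not Parallel_Select_End, triggers the join.

```python
from collections import deque


def run_parallel_select(cases, durations, idle_streams):
    """Replay a Task Manager dispatching Pcase tasks to instruction streams.

    cases: case labels in Pcase order; durations: label -> steps (assumed);
    idle_streams: names of the ISs the TM may call. Returns (log, resuming IS).
    """
    if cases and not idle_streams:
        raise ValueError("at least one instruction stream is required")
    pool, idle = deque(cases), deque(idle_streams)
    running = {}  # IS name -> [case label, remaining steps]
    log = ["TM takes over PEs"]
    last_finished = None
    while pool or running:
        while pool and idle:                      # TM allocates pending tasks
            stream, label = idle.popleft(), pool.popleft()
            running[stream] = [label, durations[label]]
            log.append(f"TM calls {stream} for {label}")
        for stream in list(running):              # every busy IS runs one step
            running[stream][1] -= 1
            if running[stream][1] == 0:           # Parallel_Case_End reached
                label = running.pop(stream)[0]
                idle.append(stream)
                last_finished = stream
                log.append(f"{label} ends on {stream}")
    log.append(f"{last_finished} continues after Parallel_Select_End")
    return log, last_finished
```

Running it with Case1 longer than Case2 reproduces the order shown above: IS1 returns first, IS0 returns last, and control resumes on IS0.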

50 Conclusion
- We have implemented a SIMD associative ASC Processor (on an FPGA) that combines the parallelism of SIMD architectures with the search capabilities of associative computing
  - Performance is improved by adding a 5-stage pipeline, split between the Control Unit and the PEs
  - Additional functionality is provided by a reconfigurable PE interconnection network
- Future work will include
  - Support for multiple Control Units (in progress)
  - Performance improvement to support more efficient broadcast to a large number of PEs

