Closely-Coupled Timing-Directed Partitioning in HAsim Michael Pellauer † Murali Vijayaraghavan †, Michael Adler ‡, Arvind †, Joel.


1 Closely-Coupled Timing-Directed Partitioning in HAsim Michael Pellauer † pellauer@csail.mit.edu Murali Vijayaraghavan †, Michael Adler ‡, Arvind †, Joel Emer †‡ † MIT CS and AI Lab Computation Structures Group ‡ Intel Corporation VSSAD Group To Appear In: ISPASS 2008

2 Motivation
- We want to simulate target platforms quickly
- We also want to construct simulators quickly
- Partitioned simulators are a known technique from traditional performance models:
  - Simplifies timing model
  - Amortizes functional model design effort over many models
  - Functional Partition can be extremely FPGA-optimized
[Diagram: Timing Partition and Functional Partition joined by an "Interaction" link; labels: ISA, Off-chip communication, Micro-architecture, Resource contention, Dependencies]

3 Different Partitioning Schemes
As categorized by Mauer, Hill and Wood (source: [MAUER 2002], ACM SIGMETRICS)
- We believe that a timing-directed solution will ultimately lead to the best performance
- Both partitions on the FPGA

4 Functional Partition in Software Asim
- Get Instruction (at a given Address)
- Get Dependencies
- Get Instruction Results
- Read Memory *
- Speculatively Write Memory * (locally visible)
- Commit or Abort instruction
- Write Memory * (globally visible)
* Optional depending on instruction type
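The operation list on this slide can be read as an interface that a timing model drives one call at a time. The following is a minimal sketch of that reading, not the Asim or HAsim implementation: the class name, method names, the tuple-based toy instruction format, and the store buffer are all assumptions made for illustration.

```python
class ToyFunctionalPartition:
    """Hypothetical software stand-in for the operations listed on the slide."""

    def __init__(self, program):
        self.program = program     # address -> (opcode, dest, sources, immediate); toy format
        self.regs = {}             # architectural register values
        self.memory = {}           # globally visible memory
        self.store_buffer = {}     # token -> (addr, value): locally visible stores
        self.committed = set()     # tokens that have been committed

    def get_instruction(self, addr):
        return self.program[addr]

    def get_dependencies(self, inst):
        return list(inst[2])       # source registers of the toy instruction

    def get_results(self, inst):
        _, _, sources, imm = inst
        return sum(self.regs.get(r, 0) for r in sources) + imm

    def read_memory(self, addr):
        # Locally visible view: the newest pending store wins over global memory.
        for pending_addr, value in reversed(list(self.store_buffer.values())):
            if pending_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def speculative_write(self, token, addr, value):
        self.store_buffer[token] = (addr, value)

    def commit(self, token):
        self.committed.add(token)

    def abort(self, token):
        self.store_buffer.pop(token, None)   # discard the speculative store

    def global_write(self, token):
        if token not in self.committed:
            raise ValueError("only committed stores may become globally visible")
        addr, value = self.store_buffer.pop(token)
        self.memory[addr] = value


# A timing model decides *when* each step happens; here they simply run back to back.
fp = ToyFunctionalPartition({0: ("store", None, ("r1",), 4)})
fp.regs["r1"] = 10
inst = fp.get_instruction(0)
deps = fp.get_dependencies(inst)
value = fp.get_results(inst)
fp.speculative_write(token=0, addr=100, value=value)
fp.commit(0)
fp.global_write(0)
```

Keeping speculative stores in a separate buffer is one simple way to separate "locally visible" from "globally visible" writes; an aborted instruction then only needs its buffer entry dropped.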

5 Execution in Phases
Phase sequences shown: F D X R C; F D X W C W; F D X C; F D X R A; F D X X C W
The Emer Assertion: All data dependencies can be represented via these phases

6 Detailed Example: 3 Different Timing Models Executing the same instruction sequence:

7 Functional Partition in Hardware?
Requirements
- Support these operations in hardware
- Allow for out-of-order execution, speculation, rollback
Challenges
- Minimize operation execution times
- Pipeline wherever possible
- Tradeoff between BRAM/multiport RAMs
- Race conditions due to extreme parallelism

8 Functional Partition As Pipeline
Conveys concept well, but poor performance
[Diagram: Timing Model driving a Functional Partition drawn as pipeline stages Token Gen, Fet, Dec, Exe, Mem, LCom, GCom, with Register State (RegFile) and Memory State]

9 Implementation: Large Scoreboards in BRAM
- Series of tables in BRAM
- Store information about each in-flight instruction
- Tables are indexed by "token"
- Also used by the timing partition to refer to each instruction
- New operation "getToken" to allocate a space in the tables
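The token-indexed tables described on this slide can be pictured as parallel arrays that share one index. The sketch below is a software analogy only, under the assumption of a fixed pool of tokens recycled through a free list; the class name, the chosen table columns, and the `release` method are hypothetical and do not come from the slides.

```python
from collections import deque


class TokenScoreboard:
    """Hypothetical token-indexed tables; in hardware each list would be a BRAM."""

    def __init__(self, num_tokens):
        self.free = deque(range(num_tokens))    # tokens not currently in flight
        self.instruction = [None] * num_tokens  # one table per kind of information,
        self.result = [None] * num_tokens       # all indexed by the same token

    def get_token(self):
        """Allocate a slot in every table; None means no slot is free."""
        if not self.free:
            return None
        return self.free.popleft()

    def release(self, token):
        """Clear the token's rows and recycle it (after commit or abort)."""
        self.instruction[token] = None
        self.result[token] = None
        self.free.append(token)


board = TokenScoreboard(num_tokens=2)
t0 = board.get_token()
t1 = board.get_token()
board.instruction[t0] = "add r1, r2, r3"   # both partitions refer to the instruction by t0
stalled = board.get_token()                # no free slot: the caller would have to wait
board.release(t0)
t2 = board.get_token()                     # the released slot is handed out again
```

Because the token is just an index, the timing partition can name an in-flight instruction without holding any of its state.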

10 Implementing the Operations See paper for details (also extra slides)

11 Assessment: Three Timing Models
- Unpipelined Target
- 5-Stage Pipeline
- MIPS R10K-like out-of-order superscalar

12 Assessment: Target Performance Targets have idealized memory hierarchy

13 Assessment: Simulator Performance Some correspondence between target and functional partition is very helpful

14 Assessment: Reuse and Physical Stats
Where is functionality implemented:
  Columns: Design | IMem | Program Counter | Branch Predictor | Scoreboard/ROB | Reg File | Maptable/Freelist | ALU | DMem | Store Buffer | Snapshots/Rollback
  Rows: Functional Partition, Unpipelined, 5-Stage, Out-of-Order
  [Cell contents not preserved in this transcript apart from two "N/A" entries, one each in the Unpipelined and 5-Stage rows]
FPGA usage (Virtex-II Pro 70, using ISE 8.1i):
                          Unpipelined   5-Stage      Out-of-Order
  FPGA Slices             6599 (20%)    9220 (28%)   22,873 (69%)
  Clock Speed             98.8 MHz      96.9 MHz     95.0 MHz
  Average FMR             41.1          7.49         15.6
  Simulation Rate         2.4 MHz       14 MHz       6 MHz
  Average Simulator IPS   2.4 MIPS      5.1 MIPS     4.7 MIPS
  Block RAMs: 18 (5%), 25 (7%) [only two values are preserved; their columns are unclear]
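As a rough cross-check of the physical stats, the simulation rate can be estimated from the clock speed and the FMR. This sketch assumes FMR means FPGA cycles per simulated model cycle (the "FPGA-to-Model Cycle Ratio" named in the summary) and reads the flattened FMR figures as 41.1, 7.49 and 15.6; the function name and the dictionary layout are illustrative only.

```python
def simulation_rate_mhz(fpga_clock_mhz: float, fmr: float) -> float:
    """Estimated model cycles per microsecond: FPGA clock divided by FMR."""
    if fmr <= 0:
        raise ValueError("FMR must be positive")
    return fpga_clock_mhz / fmr


# (clock speed in MHz, average FMR) as read from the slide's table.
designs = {
    "Unpipelined": (98.8, 41.1),
    "5-Stage": (96.9, 7.49),
    "Out-of-Order": (95.0, 15.6),
}

rates = {name: simulation_rate_mhz(clock, fmr) for name, (clock, fmr) in designs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} MHz")
```

Under this assumption the Unpipelined and Out-of-Order estimates (about 2.40 and 6.09 MHz) agree with the listed 2.4 MHz and 6 MHz, while the 5-Stage estimate (about 12.9 MHz) is lower than the listed 14 MHz, so the slide's rates may have been averaged in a different way.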

15 Future Work: Simulating Multicores
Scheme 1: Duplicate both partitions
[Diagram: Timing Models A, B, C, D, each paired with its own Func Reg + Datapath, all connected to one Functional Memory State; annotation: "Interaction occurs here"]
Scheme 2: Cluster Timing Partitions
[Diagram: Timing Models A, B, C, D sharing one Functional Reg State + Datapath and one Functional Memory State; annotation: "Interaction still occurs here"]
- Use a context ID to reference all state lookups
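The note about using a context ID for all state lookups can be illustrated with a small sketch. It assumes that register state is kept per context while memory state is shared between contexts; the class and method names are hypothetical and this is not how HAsim itself is written.

```python
class MultiplexedFunctionalState:
    """Hypothetical shared functional state serving several timing models."""

    def __init__(self, num_contexts, num_regs=32):
        # One register file per context, selected by context ID on every access.
        self.regs = [[0] * num_regs for _ in range(num_contexts)]
        # A single memory shared by all contexts (where the cores interact).
        self.memory = {}

    def read_reg(self, ctx, reg):
        return self.regs[ctx][reg]

    def write_reg(self, ctx, reg, value):
        self.regs[ctx][reg] = value

    def load(self, ctx, addr):
        # ctx is accepted for a uniform interface; memory is shared in this toy.
        return self.memory.get(addr, 0)

    def store(self, ctx, addr, value):
        self.memory[addr] = value


state = MultiplexedFunctionalState(num_contexts=4)
state.write_reg(ctx=0, reg=5, value=42)   # private to context 0
state.store(ctx=0, addr=0x100, value=7)   # visible to every context
seen_by_other_core = state.load(ctx=1, addr=0x100)
```

Tagging each lookup with a context lets one datapath stand in for several cores, at the cost of serializing their accesses to it.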

16 Future Work: Simulating Multicores
Scheme 3: Perform multiplexing of timing models themselves
- Leverage HAsim A-Ports in Timing Model
- Out of scope of today's talk
[Diagram: Timing Models A, B, C, D multiplexed onto one Functional Reg State + Datapath and one Functional Memory State; annotation: "Interaction still occurs here"]
- Use a context ID to reference all state lookups

17 Future Work: Unifying with the UT-FAST model
- UT-FAST is Functional-First
- This can be unified into Timing-Directed
- Just do "execute-at-fetch"
[Diagram labels: Func Partition, Timing Partition, Emulator, Ø, FPGA, functional emulator running in software, execution stream, resteer]

18 Summary
- Described a scheme for closely-coupled timing-directed partitioning
- Both partitions are suitable for on-FPGA implementation
- Demonstrated such a scheme's benefits:
  - Very Good Reuse, Very Good Area/Clock Speed
  - Good FPGA-to-Model Cycle Ratio
  - Caveat: Assuming some correspondence between timing model and functional partitions (recall the unpipelined target)
- We plan to extend this using contexts for hardware multiplexing [Chung 07]
- Future: rare complex operations (such as syscalls) could be done in software using virtual channels

19 Questions? pellauer@csail.mit.edu

20 Extra Slides pellauer@csail.mit.edu

21 Functional Partition Fetch

22 Functional Partition Decode

23 Functional Partition Execute

24 Functional Partition Back End

25 Timing Model: Unpipelined

26 5-Stage Pipeline Timing Model

27 Out-Of-Order Superscalar Timing Model

