
Transforming a FAST simulator into RTL implementation
Nikhil A. Patil & Derek Chiou
FAST Research group, University of Texas at Austin


1 Transforming a FAST simulator into RTL implementation
Nikhil A. Patil & Derek Chiou
FAST Research group, University of Texas at Austin

2 Outline
Research Goal
Motivation
Quick introduction to FAST
Going from FAST to RTL
  – Data-path
  – Microcode Compiler
  – Golden Models
  – Optimizing to single-cycle
Benefits
Conclusions

3 Research Goal
Simplify the design, development, and verification of computer systems
Significantly reduce overall architecture, RTL, verification, and software effort
Eliminate wasted work; enable code reuse

4 Motivation
Information duplication in traditional design flow
Diagram labels: Architectural Simulator RTL Verification Low Accuracy Software Simulator Compiler Synthesis Flow Software

5 Pre-silicon S-RTL Bugs in Pentium 4
Source: Bob Bentley, “Validating the Intel® Pentium® 4 Microprocessor”, DAC

6 Vision of an ideal design flow
Diagram labels: Architectural & Micro-architectural Specification; Architectural Simulator; RTL; Verification; Software
Shared specification reduces information duplication

7 Vision of an ideal design flow
Single central source (“code-base”) for all of the following:
  – Architectural studies
  – Micro-architectural tuning
  – RTL implementation
  – RTL-level power modeling
  – RTL verification
  – Software development
Note: For now, we don’t address anything beyond synthesizable RTL (physical design, etc.)

8 Overview of FAST

9 Points to note about FAST
The functional model (FM) is ISA-specific, but micro-architecture agnostic
  – The trace sent from the FM to the timing model (TM) is ISA-specific, not micro-architecture specific; e.g., x86 opcode, not x86 microcode
The TM implements a (potentially inaccurate) microcode table to “decode” the meaning of the trace
  – For a simpler ISA, the table is an identity mapping
Currently, our FM can model x86 and PowerPC targets
The TM is written in Bluespec SystemVerilog
The TM is composed of modules connected with FAST Connectors, which manage latency, throughput, and buffering (built upon the theory of Asim A-Ports)
The FAST methodology itself does not introduce any inherent inaccuracies; all inaccuracies are due to lower-fidelity models (or bugs)
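The microcode-table idea on this slide can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the class `MicrocodeTable`, the string opcodes, and the micro-op names are all hypothetical and do not come from the FAST code base (where the TM is written in Bluespec SystemVerilog). An opcode with no table entry decodes to itself, which is one way to read "identity mapping" for a simpler ISA.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical representation: a decoded instruction is a list of micro-op names.
using MicroOps = std::vector<std::string>;

// Hypothetical microcode table: maps an ISA-level opcode from the trace
// to the micro-ops that the timing model will account for.
class MicrocodeTable {
public:
    void add(const std::string& opcode, MicroOps uops) {
        table_[opcode] = std::move(uops);
    }

    // Look up an opcode. An opcode without an entry maps to a single
    // micro-op with the same name (identity mapping).
    MicroOps decode(const std::string& opcode) const {
        auto it = table_.find(opcode);
        if (it != table_.end()) return it->second;
        return MicroOps{opcode};
    }

private:
    std::unordered_map<std::string, MicroOps> table_;
};
```

With such a table, a complex opcode might be registered as several micro-ops while simple opcodes need no entry at all.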

10 Vision for FAST
The single central codebase will consist of the following three sub-modules:
  – ISA simulator (C/C++)
  – Micro-op definition (C/C++)
  – Micro-architectural definition (Bluespec/C)
Note that the information contained in each is mutually exclusive
  – Eliminates the possibility of inconsistency

11 From FAST to RTL
Add data-paths to the timing model
  – ALU, cache data-stores, forwarding paths
Magically move the ISA from the FM to TM
Detach trace-buffers; use internal data-path
For every TM module, improve fidelity
  – At 100% fidelity, we have a golden model
For every TM module, improve the host/target-cycle ratio
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to FAST connector

12 Caveats
Fidelity of the simulation models is transferred to the implementation
Depending on the model fidelity, it may or may not be possible to run actual software on the implementation
Use software that uses only the subset of features supported with 100% fidelity; e.g.:
  – Self-modifying code
  – Unaligned accesses

13 From FAST to RTL
Add Data-path
Add Functionality
Detach trace-buffers
Improve fidelity
Improve host performance

14 Data-path
Assuming a sufficiently high-fidelity model:
Adding the data-path does not change the module interfaces significantly
It is simple enough to do manually (TASK)
This process can sometimes unearth fidelity bugs in the simulator; e.g., not accounting for the limited number of ports on a register file
The data-path can be trivially removed for simulation flows
The data-path is also needed for power modeling of certain modules

    // Define the DATA_PATH macro at compile time to build with the data-path.
    `ifdef DATA_PATH
    typedef Bit#(32) Data_t;   // real data, for RTL and power modeling
    `else
    typedef Bit#(0) Data_t;    // zero-width, data-path removed for simulation
    `endif

    typedef struct {
        Bool   write;
        Addr_t addr;
        Data_t data;
    } DCacheReq_t deriving (Bits, Eq);

15 Functionality
ISA simulation (in FM) can be summarized as:
  – Fetch: fetches instructions, advancing the PC
      Modeled in the TM already (with very high fidelity)
  – Decode: identifies an instruction with a function
      Not modeled in the TM at all
      Can be written manually or auto-generated (TASK)
  – Execute: calls the function
      Corresponds to target microcode and data-path
      Microcode needs to be made 100% accurate (TASK)
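The fetch/decode/execute summary above can be made concrete with a minimal C++ sketch. The three-instruction toy ISA, the `Cpu` and `Insn` structs, and every function name here are assumptions made for illustration; they are not x86, PowerPC, or any part of the FAST functional model. The point is only that decode selects a function and execute calls it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy ISA for illustration only (hypothetical).
enum class Op : std::uint8_t { LoadImm, Add, Halt };

struct Insn {
    Op op;
    std::uint8_t rd;   // destination register index (assumed < 4)
    std::uint8_t rs;   // source register index (assumed < 4)
    std::int32_t imm;  // immediate operand
};

struct Cpu {
    std::size_t pc = 0;
    std::int32_t regs[4] = {0, 0, 0, 0};
    bool halted = false;
};

// Execute functions: one per instruction. In the flow described on the
// slide, these correspond to target microcode plus data-path.
void exec_load_imm(Cpu& c, const Insn& i) { c.regs[i.rd] = i.imm; }
void exec_add(Cpu& c, const Insn& i) { c.regs[i.rd] += c.regs[i.rs]; }
void exec_halt(Cpu& c, const Insn&) { c.halted = true; }

using ExecFn = void (*)(Cpu&, const Insn&);

// Decode: identify an instruction with a function.
ExecFn decode(const Insn& i) {
    switch (i.op) {
        case Op::LoadImm: return exec_load_imm;
        case Op::Add:     return exec_add;
        case Op::Halt:    return exec_halt;
    }
    return exec_halt;  // unreachable for valid opcodes
}

// Run until Halt or until the PC runs off the end of the program.
Cpu run(const std::vector<Insn>& program) {
    Cpu c;
    while (!c.halted && c.pc < program.size()) {
        const Insn& insn = program[c.pc++];  // fetch, advancing PC
        ExecFn fn = decode(insn);            // decode
        fn(c, insn);                         // execute
    }
    return c;
}
```

In this sketch only `decode` and the `exec_*` functions would be new to a timing model that already models fetch.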

16 Microcode Compiler
The Microcode Compiler (MCC) maps each instruction onto one or more micro-ops
Takes two software (C/C++) simulators as its input:
  – ISA simulator (currently, bochs)
  – Micro-op simulator
Compiles the specification of each instruction/micro-op into a data-flow graph
Uses exhaustive search to statically map instruction execution onto one or more micro-ops based on a cost table
In case of a failure, says why a mapping is not possible
Work in progress
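The exhaustive, cost-driven search described above might look roughly like the following C++ sketch. It is not the MCC: the slides say the MCC matches data-flow graphs, whereas this sketch substitutes a much simpler check that compares behaviour on a handful of sample states. The `State`, `MicroOp`, and `Mapping` types, the function `map_instruction`, and the length bound `max_len` are all hypothetical. It also reports only whether a mapping was found, not why one is impossible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical machine state with two registers.
struct State { std::uint32_t a; std::uint32_t b; };

using Semantics = std::function<State(State)>;

struct MicroOp {
    std::string name;
    int cost;         // entry in a (hypothetical) cost table
    Semantics apply;  // what the micro-op does to the state
};

struct Mapping {
    bool found = false;
    int cost = 0;
    std::vector<std::string> uops;
};

// Does running the micro-op sequence match the instruction on every sample?
// (Sampling is a stand-in for the data-flow-graph comparison in the slides.)
static bool matches(const Semantics& insn,
                    const std::vector<const MicroOp*>& seq,
                    const std::vector<State>& samples) {
    for (State s : samples) {
        State want = insn(s);
        State got = s;
        for (const MicroOp* u : seq) got = u->apply(got);
        if (got.a != want.a || got.b != want.b) return false;
    }
    return true;
}

// Depth-first enumeration of every sequence up to max_len micro-ops.
static void search(const Semantics& insn, const std::vector<MicroOp>& uops,
                   const std::vector<State>& samples, std::size_t max_len,
                   std::vector<const MicroOp*>& seq, int cost, Mapping& best) {
    if (!seq.empty() && (!best.found || cost < best.cost) &&
        matches(insn, seq, samples)) {
        best.found = true;
        best.cost = cost;
        best.uops.clear();
        for (const MicroOp* u : seq) best.uops.push_back(u->name);
    }
    if (seq.size() == max_len) return;
    for (const MicroOp& u : uops) {
        seq.push_back(&u);
        search(insn, uops, samples, max_len, seq, cost + u.cost, best);
        seq.pop_back();
    }
}

// Return the cheapest matching sequence, or a Mapping with found == false.
Mapping map_instruction(const Semantics& insn, const std::vector<MicroOp>& uops,
                        const std::vector<State>& samples, std::size_t max_len) {
    Mapping best;
    std::vector<const MicroOp*> seq;
    search(insn, uops, samples, max_len, seq, 0, best);
    return best;
}
```

The search space grows exponentially with `max_len`, which is tolerable here because the mapping is computed statically, once per instruction.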

17 From FAST to RTL
Add Data-path √
Add Functionality √
Detach trace-buffers
For every TM module, improve fidelity
  – At 100% fidelity, we have a golden model
For every TM module, improve the host/target-cycle ratio
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to FAST connector

18 Golden models
A 100% cycle-accurate model
May still take multiple FPGA cycles to model a single target cycle
It is in fact a legitimate implementation
Serves as a golden reference model for the next step (optimization) as well as for writing and debugging verification suites
Traditionally, verification teams have written golden models from the architectural specs
Likely to use FPGA structures efficiently

19 Optimizing to single-cycle
Automatic transformation of modules may be possible for some simple modules using algorithms to
  – Unroll a “loop” in hardware
  – Collapse a multi-state FSM into a single state
Can Bluespec help here?
Manual optimization is certainly feasible
Currently, FAST Connectors don’t allow this optimization (TASK)
  – Connector interface cannot support modules that take exactly 1 host cycle for every target cycle
  – Work in progress
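To illustrate what unrolling a "loop" means here, the following C++ sketch models one hypothetical target module two ways: a multi-host-cycle version that consumes one operand per host cycle (a small FSM), and an unrolled version that produces the same result in a single step. The module (a four-input sum), the constant `kPorts`, and both names are invented for this example and are written in C++ rather than Bluespec. The multi-cycle version plays the role of the golden model against which the single-cycle version is checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: the target module reads 4 operands in one target cycle.
constexpr std::size_t kPorts = 4;
using Inputs = std::array<std::uint32_t, kPorts>;

// Multi-host-cycle model: handles one operand per host cycle, so one
// target cycle costs kPorts host cycles.
class MultiCycleSum {
public:
    // Advance one host cycle. Returns true when a target cycle completes.
    bool host_tick(const Inputs& in) {
        acc_ += in[step_];
        if (++step_ < kPorts) return false;
        result_ = acc_;  // latch the target-cycle result
        acc_ = 0;        // reset the FSM for the next target cycle
        step_ = 0;
        return true;
    }
    std::uint32_t result() const { return result_; }

private:
    std::size_t step_ = 0;
    std::uint32_t acc_ = 0;
    std::uint32_t result_ = 0;
};

// Unrolled version: the loop is flattened into one combinational
// expression, so one host cycle equals one target cycle.
std::uint32_t single_cycle_sum(const Inputs& in) {
    return in[0] + in[1] + in[2] + in[3];
}
```

Driving both versions with the same inputs and comparing results at each target-cycle boundary is the kind of check a golden model enables.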

20 From FAST to RTL
Add Data-path √
Add Functionality √
Detach trace-buffers √
For every TM module, improve fidelity √
  – At 100% fidelity, we have a golden model
For every TM module, improve the host/target-cycle ratio √
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to FAST connector

21 Alternative path
Design the original TM modules as 1-host-cycle implementations
Automatically convert to n-host-cycle for the simulator
  – Using Bluespec?
Without automatic conversion, we would end up with RTL before the FAST simulator!
  – Almost like prototyping

22 Potential benefits
Provides a way to verify FAST simulators
Golden models can be generated for the verification teams
  – Verify resulting implementation
Provide working implementation to RTL designers
  – Replace one component at a time
  – Provides a test-rig
  – Runs software
Improves communication between teams
Eliminates SIM-RTL calibration
Potentially faster than the simulator
  – Early versions can be made available to software team

23 Conclusions
This technology provides a way to use a “single codebase” to meet a variety of needs, from simulation to implementation to verification.
The single central codebase will consist of the following three sub-modules:
  – ISA simulator (C/C++)
  – Micro-op definition (C/C++)
  – Micro-architectural definition (Bluespec/C)
