RAMP Gold: Architecture and Timing Model Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson, Krste Asanović Parallel Computing.

1 RAMP Gold: Architecture and Timing Model
Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson, Krste Asanović
Parallel Computing Laboratory
University of California, Berkeley

2 RAMP Gold Overview
Tiled CMP simulator
ISA: SPARC V8
– (ARM/Thumb-2 later?)
Split timing and function (both on FPGA)
Host-multithreaded
Runs on V5LX110T (XUP)
[Figure labels: Par Lab InfiniCore; Functional Model Pipeline, Arch State; Timing Model Pipeline, Timing State]

3 RAMP Gold Target Machine
[Figure: SPARC V8 cores (… 64 cores), each with I$ and D$, a shared L2$ / interconnect, and DRAM]

4 RAMP Gold v1 Target Features
64 single-issue, in-order SPARC V8 processors
– Simple, 5-stage pipeline
– FPU
Cache timing model
– Configurable size, line size, associativity, miss penalty, shared/private
– Change parameters without resynthesis

5 RAMP Gold Architecture
Mapping the target machine directly to an FPGA is inefficient
Solution: split timing and functionality + multithreading
– The timing logic decides how many target cycles an instruction sequence should take
– Simulating the functionality of an instruction might take multiple host cycles

6 Function/Timing Split Advantages
Flexibility
– Can configure target at runtime
– Synthesize design once, change target model parameters at will
Efficient FPGA resource usage
– Example 1: model a 2-cycle FPU in 10 host cycles
– Example 2: model a 16MB L2$ using only 256KB host BRAM to store tags/metadata
Enables multithreading
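Example 1 above can be sketched in a few lines of Python. This is only an illustration of keeping target time and host time in separate counters, not the RAMP Gold hardware: the class and field names (`TimingParams`, `SplitSimulator`, `HOST_COST`) and the 1-cycle integer latency are assumptions made for the sketch. Only the 2-cycle target FPU and its 10 host cycles come from the slide.

```python
from dataclasses import dataclass


@dataclass
class TimingParams:
    """Target-side latencies, changeable at run time (hypothetical names)."""
    int_op_cycles: int = 1   # assumed value
    fpu_op_cycles: int = 2   # target FPU latency from Example 1


# Host cycles the functional model spends per operation.
# The 10-cycle FPU figure is from Example 1; the integer cost is assumed.
HOST_COST = {"int": 1, "fp": 10}


class SplitSimulator:
    """Counts target cycles (timing model) and host cycles (functional model)."""

    def __init__(self, params):
        self.params = params
        self.target_cycles = 0
        self.host_cycles = 0

    def execute(self, op_kind):
        # Functional side: how long the host takes to compute the result.
        self.host_cycles += HOST_COST[op_kind]
        # Timing side: how long the target machine would have taken.
        if op_kind == "fp":
            self.target_cycles += self.params.fpu_op_cycles
        else:
            self.target_cycles += self.params.int_op_cycles


sim = SplitSimulator(TimingParams())
for op in ["int", "fp", "int"]:
    sim.execute(op)
print(sim.target_cycles, sim.host_cycles)  # 4 12

# Reconfigure the target at run time: no change to the functional side.
sim.params.fpu_op_cycles = 4
sim.execute("fp")
print(sim.target_cycles, sim.host_cycles)  # 8 22
```

Changing `fpu_op_cycles` alters only the reported target time. The host cost of the operation stays the same, which is the sense in which the design is synthesized once and reparameterized afterwards.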

7 Split Timing and Function
Functional model executes ISA correctly
Timing model determines how long a program takes to run
[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, L1 D$ FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]

8 Split Timing and Function
Functional model executes ISA correctly
Timing model determines how long a program takes to run
[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]

9 TM + FM from 30,000 ft
[Figure: CPU Timing Model, L1 D$ Timing Model, Memory Timing Model, CPU Functional Model, and Memory Functional Model. Signals: instruction; ld/st address; store data; load data; stall; instruction complete]

10 TM + FM from 3,000 ft
[Figure labels: CPU TM; IF, CTRL, DEC, EX, MEM, WB; CPU FM; TM1, TM2; L1 D$ TM; Memory Timing Model; Memory Functional Model. Signals: instruction; ld/st address, store data; load data; stall; instruction complete]

11 Example: Target Load Miss
[Figure labels: CPU TM; IF, CTRL, DEC, EX, MEM, WB; CPU FM; TM1, TM2; L1 D$ TM; Memory Timing Model; Memory Functional Model. Signals: instruction; ld/st address, store data; load data; stall; instruction complete. The diagram is annotated with numbered steps 1 through 7]

12 Timing-Driven Host Pipeline
[Figure labels: TS, IF, DE, EX, MEM1, MEM2, WB; TM1, TM2, TM3; L1 D$ TM; Store Buffer; Load Result Buffer; TARGET MEMORY TM/FM; CPU/D$ Timing Model; CPU Functional Model; {TID, INST}; {TID, ADDR}; T0, T1, T2; ADD, LD, ST, LD, ADD]
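The figure tags each instruction with a thread ID ({TID, INST}), so one host pipeline is shared by several target threads. A minimal sketch of that idea follows. The scheduling policy is an assumption: the slide does not state one, and plain round-robin is used here only because it is the simplest choice. The function name `interleave` and the sample programs are hypothetical.

```python
def interleave(programs, num_host_cycles):
    """Issue at most one instruction per host cycle, rotating over target threads.

    programs: one instruction list per target thread.
    Returns the issued (tid, instruction) pairs in host-cycle order.
    Round-robin selection is an assumption made for this sketch.
    """
    pcs = [0] * len(programs)   # per-thread position in its program
    trace = []
    tid = 0
    for _ in range(num_host_cycles):
        prog = programs[tid]
        if pcs[tid] < len(prog):
            trace.append((tid, prog[pcs[tid]]))
            pcs[tid] += 1
        # A finished thread leaves its slot empty (a bubble) in this sketch.
        tid = (tid + 1) % len(programs)
    return trace


# Three hypothetical target threads T0, T1, T2.
programs = [["ADD", "LD"], ["LD", "ADD"], ["ST", "ADD"]]
trace = interleave(programs, 6)
print(trace)
# [(0, 'ADD'), (1, 'LD'), (2, 'ST'), (0, 'LD'), (1, 'ADD'), (2, 'ADD')]
```

Because consecutive host cycles belong to different target threads, a slow operation in one thread does not have to hold up the others.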

13 Cache Modeling
The cache model maintains tag, state, protocol bits internally
Whenever the functional model issues a memory operation, the cache model determines how many target cycles to stall
[Figure: address split into tag, index, offset; the index selects stored (tag, state) entries, one comparator (=) per way of associativity, producing hit/miss]
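The behavior described on this slide can be sketched as a tags-only cache model: it stores no data, only enough metadata to decide hit or miss and return a stall count. This is an illustrative sketch, not the RAMP Gold RTL. The class name `CacheTimingModel`, the LRU replacement policy, and the 20-cycle miss penalty are assumptions; the slides do not specify them. State and protocol bits are omitted for brevity.

```python
class CacheTimingModel:
    """Set-associative cache timing model that keeps tags only (no data)."""

    def __init__(self, size_bytes, line_bytes, ways, miss_penalty):
        self.line_bytes = line_bytes
        self.ways = ways
        self.miss_penalty = miss_penalty          # target cycles to stall on a miss
        self.num_sets = size_bytes // (line_bytes * ways)
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Return the number of target cycles to stall for this address."""
        line = addr // self.line_bytes            # drop the offset bits
        index = line % self.num_sets              # select the set
        tag = line // self.num_sets               # remaining high bits
        tags = self.sets[index]
        if tag in tags:                           # hit: refresh LRU position
            tags.remove(tag)
            tags.append(tag)
            return 0
        if len(tags) == self.ways:                # miss in a full set: evict LRU
            tags.pop(0)
        tags.append(tag)
        return self.miss_penalty


# 32KB, 2-way, 64B lines; the miss penalty of 20 is an assumed value.
l1 = CacheTimingModel(size_bytes=32 * 1024, line_bytes=64, ways=2, miss_penalty=20)
stalls = [l1.access(a) for a in (0x1000, 0x1000, 0x5000, 0x9000, 0x1000)]
print(stalls)  # [20, 0, 20, 20, 20]
```

The three addresses map to the same set, so the third distinct line evicts the first in a 2-way cache, and the final access to `0x1000` misses again. All four constructor arguments are ordinary run-time values, in keeping with the goal of changing cache parameters without resynthesis.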

14 Multithreaded, Pipelined Cache TM
[Figure: Address and Index inputs; stored (tag, state) entries; tag comparators (=); hit? output]

15 Quick & Dirty Validation
32KB, 2-way L1 D$, 64B lines
256KB, 4-way L2$, 64B lines

16 Status
Functional + simple timing model work in HW
– Running real programs (e.g. SPLASH2)
Near-term future work
– Move from the current “functional-first + stall” configuration to the timing-driven one described here
– More interesting memory system timing model
– Functional potpourri (FDIV, MMU, …)

17 DEMO
Run OCEAN with different L1 D$ parameters

18 Questions? Thank you!

