1
Out-of-Order Superscalar
Ankit Sethia, Daya Shanker, Gaurav Chadha, Kuldeep Singh
2
Basic Design
- Out-of-order (T3), 2-way superscalar
- Reservation station (RS) entries: 16
- Reorder buffer (ROB) entries: 64 (also tested with 8)
- Physical register file (PRF) entries: 64
- ALUs: 2
- Multipliers: 2
- SystemVerilog was used for the design process and was helpful in designing.
- We had just 5 synthesis runs.
3
Advanced Features
- 2-way superscalar
- Instruction prefetcher
- Stride prefetcher
- Return address stack (RAS)
- Load-store queue (4 loads, 4 stores)
- BTB, local branch predictor
- Non-blocking D-cache
Attempted Features
- Unconditional branch resolution in the IF stage
4
LSQ
- Out-of-order load launch, once dependencies on preceding stores are resolved.
- Data is forwarded from the store queue to the load structure.
- The load structure is not a queue.
- An auxiliary load queue holds outstanding loads.
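The load-launch rule on the LSQ slide can be illustrated with a small behavioral model. This is only a sketch in Python, not the team's SystemVerilog. The `Store` fields, the function name, and the exact policy are assumptions: a load stalls until every preceding store has a known address, then takes data from the youngest preceding store to the same address.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Store:
    seq: int                    # program-order sequence number (hypothetical field)
    addr: Optional[int] = None  # None until the store address is computed
    data: Optional[int] = None  # None until the store data is available


def try_launch_load(load_seq: int, load_addr: int,
                    store_queue: List[Store]) -> Tuple[str, Optional[int]]:
    """Return ("stall", None), ("forward", data) or ("cache", None)."""
    older = [s for s in store_queue if s.seq < load_seq]
    # Dependency resolution: every preceding store needs a known address.
    if any(s.addr is None for s in older):
        return ("stall", None)
    # Forward from the youngest preceding store that writes the same address.
    matches = [s for s in older if s.addr == load_addr]
    if matches:
        youngest = max(matches, key=lambda s: s.seq)
        if youngest.data is None:
            return ("stall", None)  # address matches but data is not ready yet
        return ("forward", youngest.data)
    # No conflict with any preceding store: the load may go to the D-cache.
    return ("cache", None)
```

A load that returns `"cache"` here would then be tracked as an outstanding load until the D-cache responds.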
5
DCache
- Handles hit-under-miss and miss-under-miss.
- Can support 16 outstanding load requests.
- Internal priority: eviction first, then the current request, then outstanding misses.
- Has the highest priority among requests to memory.
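The priority order stated on the DCache slide can be written as a fixed-priority selector. The sketch below is illustrative Python; the function name, argument names, and return strings are hypothetical and do not come from the original design.

```python
from typing import Optional

MAX_OUTSTANDING_LOADS = 16  # limit stated on the slide


def select_dcache_action(eviction_pending: bool,
                         current_request: bool,
                         outstanding_misses: int) -> Optional[str]:
    """Pick what the D-cache services this cycle, highest priority first."""
    if not 0 <= outstanding_misses <= MAX_OUTSTANDING_LOADS:
        raise ValueError("outstanding_misses out of range")
    if eviction_pending:
        return "eviction"
    if current_request:
        return "current_request"
    if outstanding_misses > 0:
        return "outstanding_miss"
    return None  # nothing to do this cycle
```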
6
Features Contd.
- Heavy instruction prefetching: 60 at the maximum; this value was varied a lot.
- BTB / branch predictor: 2-bit local branch predictor.
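A 2-bit predictor of the kind named on this slide is commonly built from saturating counters indexed by the branch PC. The following Python sketch shows that general scheme. The table size, the indexing function, and the initial counter state are assumptions, not details from the slides.

```python
class TwoBitLocalPredictor:
    """Per-PC 2-bit saturating counters: 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, entries: int = 64):  # table size is an assumption
        self.entries = entries
        self.counters = [1] * entries       # start weakly not-taken (assumption)

    def _index(self, pc: int) -> int:
        # Drop the two low bits, assuming 4-byte aligned instructions.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because a counter must move two steps to cross from strongly taken to not taken, a single mispredicted iteration (for example a loop exit) does not flip the prediction.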
7
Features Contd.
- Unconditional branch resolution in the IF stage: calculate the next PC for br/bsr in the IF stage.
- RAS: we ran into many difficulties implementing the RAS.
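For reference, the basic behavior of a return address stack can be sketched as follows. This is a generic Python model, not the team's implementation. The depth, the overflow policy (drop the oldest entry), and the 4-byte instruction size in the usage note are assumptions.

```python
from typing import List, Optional


class ReturnAddressStack:
    def __init__(self, depth: int = 8):  # depth is an assumption
        self.depth = depth
        self.stack: List[int] = []

    def push(self, return_addr: int) -> None:
        """Called when a subroutine call is fetched."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)            # overflow: discard the oldest entry
        self.stack.append(return_addr)

    def pop(self) -> Optional[int]:
        """Called when a return is fetched; None means no prediction."""
        return self.stack.pop() if self.stack else None
```

On a call at address `pc`, the fetch stage would push `pc + 4`; on a return it would pop and use the result as the predicted next PC. Squashed speculative calls and returns leave the stack out of sync unless it is repaired, and this simple model does not handle that case.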
8
Stride Prefetcher
- A data prefetching mechanism that prefetches data for stride-based access patterns.
- Can handle up to four loads.
- Keeps a table of four non-stride loads that may be present.
- 3rd-highest priority among requests to memory.
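The stride detection described on this slide can be sketched as a small table keyed by load PC. This Python model is only an illustration. The confirmation rule (the same non-zero stride seen twice in a row), the oldest-first replacement, and all names are assumptions, and the separate table of non-stride loads is not modeled.

```python
from collections import OrderedDict
from typing import Optional


class StridePrefetcher:
    def __init__(self, max_loads: int = 4):  # four tracked loads, per the slide
        self.max_loads = max_loads
        self.table = OrderedDict()           # load PC -> (last_addr, last_stride)

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record a load; return an address to prefetch, or None."""
        if pc not in self.table:
            if len(self.table) == self.max_loads:
                self.table.popitem(last=False)  # evict the oldest tracked load
            self.table[pc] = (addr, 0)
            return None
        last_addr, last_stride = self.table[pc]
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        # Prefetch only when the same non-zero stride repeats.
        if stride != 0 and stride == last_stride:
            return addr + stride
        return None
```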
9
Results
- Final clock period after synthesis: 6.7 ns
- All 33 benchmarks passed in simulation and synthesis
- CPI ranges from 0.59 to 5.00
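To put these figures in context, the clock period and CPI range can be converted to a clock frequency and an average time per instruction. The short Python calculation below uses only the numbers reported on the slide; the function names are illustrative.

```python
CLOCK_PERIOD_NS = 6.7               # from the slide
CPI_BEST, CPI_WORST = 0.59, 5.00    # from the slide


def clock_frequency_mhz(period_ns: float) -> float:
    # 1 / (period in ns) gives GHz; multiply by 1000 for MHz.
    return 1000.0 / period_ns


def time_per_instruction_ns(cpi: float, period_ns: float) -> float:
    # Average time per instruction = cycles per instruction * cycle time.
    return cpi * period_ns


print(round(clock_frequency_mhz(CLOCK_PERIOD_NS), 1))
print(round(time_per_instruction_ns(CPI_BEST, CLOCK_PERIOD_NS), 2))
print(round(time_per_instruction_ns(CPI_WORST, CLOCK_PERIOD_NS), 2))
```

This works out to roughly 149 MHz, with an average of about 4 ns per instruction on the best benchmark and 33.5 ns on the worst.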
10
Results
11
Interesting Bugs
- In the I-cache, the input address does not change but the data has changed, so fetching stops.
- Eviction during a branch squash: same reason (reset).
- A speculative load with an invalid address gets a continuous NACK from the cache controller.
12
Suggestions
- SystemVerilog was really helpful:
  - always_comb (no inferred latches) and always_ff.
  - Don't worry about wire vs. reg; use the logic type.
  - Structures, multidimensional arrays, literal assignment.
- Queues are used a lot (ROB, LSQ, D-cache, prefetcher), so a robust queue could be provided beforehand.
- We faced problems with bottom-up synthesis; this could be covered in a tutorial section.
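As an illustration of the "robust queue" suggestion, here is a minimal circular-buffer FIFO written as a Python behavioral model. It mirrors the head/tail/count structure a hardware queue would typically use. The class and its interface are hypothetical and would still need to be translated to SystemVerilog.

```python
from typing import Any, List, Optional


class CircularQueue:
    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.entries: List[Optional[Any]] = [None] * size
        self.head = 0    # index of the oldest entry
        self.tail = 0    # index where the next entry is written
        self.count = 0   # number of valid entries

    def full(self) -> bool:
        return self.count == self.size

    def empty(self) -> bool:
        return self.count == 0

    def enqueue(self, item: Any) -> bool:
        """Add at the tail; return False (and change nothing) if full."""
        if self.full():
            return False
        self.entries[self.tail] = item
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return True

    def dequeue(self) -> Optional[Any]:
        """Remove from the head; return None if empty."""
        if self.empty():
            return None
        item = self.entries[self.head]
        self.entries[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return item
```

Keeping an explicit `count` avoids the usual ambiguity where `head == tail` could mean either full or empty.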