2 Generic Superscalar Processor Models
[Figure: two pipeline block diagrams]
- Issue queue based: Fetch, Rename, Wakeup/select (schedule), Regfile, FU with bypass and D-cache (execute), commit
- Reservation based (already studied): Fetch, Rename, ROB/Reg, Wakeup/select (schedule), FU with bypass and D-cache (execute), commit
Revised from Palacharla PhD thesis, 1998
3 Issue Queue Based Pipeline
Fetch -> Rename -> Issue -> Reg-read -> Execute -> Writeback/Commit
Core structure: register mapping table
- Rename: translate architectural registers into physical registers
- Issue: send instructions out to register read and then execution
- Commit: process mis-predictions/exceptions, update the register renaming
Why study? Used in Alpha 21264, MIPS R10000, Intel P4
4 Compare Reservation Station and Issue Queue
Pipeline stage sequence
1. RS: IF -> REN -> REG/ROB -> SCHD -> …
2. IQ: IF -> REN -> SCHD -> REG -> …
Mapping table vs. status table
1. RS: Status table selects either the architectural register or a ROB entry
2. IQ: Always renames to a physical register
Register file
1. RS: Architectural register file stores the architectural state
2. IQ: Physical register file; no architectural register file! The mapping table determines the architectural state
5 Compare Reservation Station and Issue Queue
Reservation station / issue queue entry
1. RS: busy, fu, op, Qj, Qk, Vj, Vk
2. IQ: busy, fu, op, Pj, Pk, ReadyJ, ReadyK
ROB
1. RS: Stores register values
2. IQ: No register contents
Pros and cons of IQ
- Good: no copying between ROB and register file
- Good: efficient use of registers
- Bad: complex mapping table design
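The issue queue entry fields listed above (busy, op, Pj, Pk, ReadyJ, ReadyK) can be illustrated with a minimal wakeup/select sketch. This is an assumption-laden toy, not the logic of any real processor: the names `IQEntry`, `wakeup`, and `select` are hypothetical, the `fu` field is omitted, and selection is simply oldest-first.

```python
from dataclasses import dataclass

@dataclass
class IQEntry:
    op: str
    pj: str                # physical register tag of source J
    pk: str                # physical register tag of source K
    ready_j: bool = False
    ready_k: bool = False
    busy: bool = True      # entry still waiting in the queue

def wakeup(queue, completed_tag):
    """Broadcast a completed destination tag; matching sources become ready."""
    for e in queue:
        if e.busy:
            e.ready_j = e.ready_j or e.pj == completed_tag
            e.ready_k = e.ready_k or e.pk == completed_tag

def select(queue):
    """Pick the oldest entry with both sources ready (hypothetical policy)."""
    for e in queue:        # the list is kept in program order
        if e.busy and e.ready_j and e.ready_k:
            e.busy = False
            return e
    return None

queue = [
    IQEntry("ADD", "P32", "P1", ready_k=True),
    IQEntry("SW", "P33", "P1", ready_k=True),
]
wakeup(queue, "P32")       # the load producing P32 completes
issued = select(queue)
print(issued.op if issued else None)
```

Note that the entries hold only tags and ready bits, never values; operands are read from the physical register file after issue, which is the key difference from the Vj/Vk fields of a reservation station.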
6 Register Mapping Table
- Records the mapping from virtual (architectural) registers to physical registers
- The mapping is stored in RAM or CAM memories
Arch reg (virtual)    Phy reg
R1  =>  P3
R2  =>  P10
R3  =>  P6
R4  =>  P8
R5  =>  P12
…
7 Register Renaming Examples
Loop:  LW   R2, 0(R1)
       ADD  R2, R2, 1
       SW   R2, 0(R1)
       ADD  R1, R1, 4
       BNE  R2, R3, LOOP
LW returns 100, R1=1000
Renamed dynamic instructions:
       …
       BNE  P2, P3, LOOP
       LW   P32, 0(P1)
       ADD  P33, P32, 1
       SW   P33, 0(P1)
       ADD  P34, P1, 4
       BNE  P33, P3, LOOP
       …
Assume that when the first BNE is renamed, R1-R31 are mapped to P1-P31 and P32-P127 are free
The first BNE may be predicted either correctly or not
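The renaming of one loop iteration can be reproduced with a short sketch. It assumes the starting state given on the slide (R1-R31 mapped to P1-P31, P32-P127 free); the function `rename`, the tuple encoding of instructions, and the in-order free list are illustrative assumptions, and immediates and offsets are left out.

```python
from collections import deque

def rename(instrs, map_table, free_list):
    """Rename (op, dest, srcs) tuples; dest is None for stores and branches."""
    renamed = []
    for op, dest, srcs in instrs:
        # Sources read the current mapping before the destination is remapped.
        phys_srcs = [map_table[s] for s in srcs]
        phys_dest = None
        if dest is not None:
            phys_dest = free_list.popleft()   # allocate a free physical register
            map_table[dest] = phys_dest
        renamed.append((op, phys_dest, phys_srcs))
    return renamed

# Assumption from the slide: R1-R31 map to P1-P31 and P32-P127 are free.
map_table = {f"R{i}": f"P{i}" for i in range(1, 32)}
free_list = deque(f"P{i}" for i in range(32, 128))

loop_body = [
    ("LW",  "R2", ["R1"]),
    ("ADD", "R2", ["R2"]),
    ("SW",  None, ["R2", "R1"]),
    ("ADD", "R1", ["R1"]),
    ("BNE", None, ["R2", "R3"]),
]

for op, dest, srcs in rename(loop_body, dict(map_table), deque(free_list)):
    print(op, dest, srcs)
```

Reading the sources before updating the destination is what makes `ADD R2, R2, 1` consume the load's register while producing a fresh one.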
9 Commit and Rollback
[Figure: mapping table snapshots, from the commit point (oldest) to the rename point (youngest)]
R1 => P1   R2 => P2   R3 => P3  R4 => P4  R5 => P5 …
R1 => P1   R2 => P32  R3 => P3  R4 => P4  R5 => P5 …
R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
R1 => P34  R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
Physical register values: P1=4000 P2=200 … P32=100 P33=? P34=4004
- Commit successful: make the next mapping status the committed mapping status; free the previous physical register
- Mis-prediction/exception: flush the pipeline, flush the following mappings
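The commit and flush rules above can be sketched as a small state machine. This is a simplified model under stated assumptions: `RenameState` and its methods are hypothetical names, the register counts are tiny for readability, and a flush discards every uncommitted rename rather than only those after a particular branch.

```python
from collections import deque

class RenameState:
    def __init__(self, num_arch=5, num_phys=8):
        self.spec_map = {f"R{i}": f"P{i}" for i in range(1, num_arch + 1)}
        self.committed_map = dict(self.spec_map)
        self.free_list = deque(f"P{i}" for i in range(num_arch + 1, num_phys + 1))
        self.in_flight = deque()   # (arch, new_phys, old_phys) in program order

    def rename_dest(self, arch):
        new_phys = self.free_list.popleft()
        self.in_flight.append((arch, new_phys, self.spec_map[arch]))
        self.spec_map[arch] = new_phys
        return new_phys

    def commit_oldest(self):
        arch, new_phys, old_phys = self.in_flight.popleft()
        self.committed_map[arch] = new_phys   # next mapping becomes committed
        self.free_list.append(old_phys)       # previous physical register is freed

    def flush(self):
        # Discard all uncommitted mappings and reclaim their registers.
        while self.in_flight:
            _, new_phys, _ = self.in_flight.pop()
            self.free_list.append(new_phys)
        self.spec_map = dict(self.committed_map)

state = RenameState()
state.rename_dest("R2")    # e.g. a load writing R2
state.rename_dest("R2")    # e.g. an add writing R2
state.commit_oldest()      # the load commits
state.flush()              # mis-prediction: the add's mapping is dropped
print(state.spec_map["R2"], list(state.free_list))
```

The old physical register is freed only at commit because, until then, a rollback may need its value again.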
10 Program Execution Correctness
- Do only committed instructions write to registers and memory?
  Yes, from the programmer’s viewpoint: only committed instructions’ register outputs become visible
- Is correct data flow maintained, i.e. does a child instruction always use the values from its parents?
  Yes, in renamed form, and this is not affected by speculative execution
- Does each register/memory location receive the value of the last write?
  Yes, from the programmer’s viewpoint: the architectural mapping status is updated in program order
Note that memory correctness is not affected
11 Mapping Table Design – MIPS R10000
RAM-based structure:
- The current mapping is saved automatically, in parallel, for each branch at rename
- On mis-prediction: restore the previous mapping immediately, flush the pipeline, restart fetch at the alternative PC
- On commit of a branch instruction: make the corresponding mapping the committed one
- Stall if the branch stack is full
[Figure: mapping tables (current mapping, mappings after Br4, Br3, Br2, Br1, committed mapping) beside a branch stack holding Alternative PC4, PC3, PC2, PC1]
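The branch stack behaviour listed above can be sketched as follows. The class `BranchStack` and its method names are hypothetical; the default depth of four mirrors the four entries drawn in the figure, and a software copy of a dictionary stands in for the hardware's parallel save of the whole table.

```python
class BranchStack:
    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []   # oldest branch first: (saved_map, alternative_pc)

    def full(self):
        return len(self.entries) == self.depth

    def push(self, current_map, alternative_pc):
        """Checkpoint the mapping when a branch is renamed."""
        if self.full():
            raise RuntimeError("branch stack full: rename must stall")
        self.entries.append((dict(current_map), alternative_pc))
        return len(self.entries) - 1

    def mispredict(self, index):
        """Restore the checkpoint of the mispredicted branch and drop younger ones."""
        saved_map, alt_pc = self.entries[index]
        del self.entries[index:]
        return dict(saved_map), alt_pc

    def commit_oldest(self):
        """The oldest branch commits; its checkpoint is the new committed mapping."""
        saved_map, _ = self.entries.pop(0)
        return saved_map

stack = BranchStack()
mapping = {"R1": "P1", "R2": "P2"}
br1 = stack.push(mapping, alternative_pc=0x100)
mapping["R2"] = "P32"                     # speculative rename after the branch
mapping, fetch_pc = stack.mispredict(br1)
print(mapping, hex(fetch_pc))
```

Recovery cost is independent of how many instructions were renamed after the branch, which is why the slide calls it immediate.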
12 Mapping Table Design – MIPS R10000
How about precise exceptions?
- Cannot preserve the mapping status for every instruction
- Solution: record each change of mapping in the ROB
- ROB entry contains: destination architectural register, renamed physical register, old renamed physical register
- On exception: roll back the mapping one instruction at a time, four instructions per cycle
- Slow, but how frequent are exceptions?
- Note that branch mis-prediction has fast recovery
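The ROB-based rollback can be sketched as a walk from the youngest entry back to the faulting instruction. The function `rollback`, the tuple layout, and the returned cycle count are illustrative assumptions; only the three ROB fields and the rate of four instructions per cycle come from the slide.

```python
def rollback(map_table, rob, fault_index, free_list, per_cycle=4):
    """Undo the renames of rob[fault_index:], youngest first.

    Each ROB entry is (dest_arch, new_phys, old_phys); dest_arch is None for
    instructions without a register destination. Returns the cycles needed.
    """
    undone = 0
    while len(rob) > fault_index:
        arch, new_phys, old_phys = rob.pop()
        if arch is not None:
            map_table[arch] = old_phys     # restore the previous mapping
            free_list.append(new_phys)     # the squashed register is free again
        undone += 1
    return -(-undone // per_cycle)         # ceiling division

map_table = {"R1": "P34", "R2": "P33"}
rob = [
    ("R2", "P32", "P2"),     # LW
    ("R2", "P33", "P32"),    # ADD
    (None, None, None),      # SW
    ("R1", "P34", "P1"),     # ADD
]
free_list = []
cycles = rollback(map_table, rob, fault_index=1, free_list=free_list)
print(map_table, free_list, cycles)
```

Walking youngest-first matters when the same architectural register is renamed several times: the last restore applied is the oldest one, which leaves the mapping as it was just before the faulting instruction.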
13 Mapping Table Design – Alpha 21264
CAM structure
- Associative search on the architectural register index; outputs the physical register index (through an encoder)
- One column represents one mapping, allocated at rename to each instruction with a register output
- One pair of valid bits changes per destination renaming
- Fast recovery even on exceptions
[Figure: CAM array with columns p0, p1, p2, … pk, each holding an architectural register number; rows of valid bits for the current mapping and the committed mapping; output is "match and valid"]
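A minimal software model of the CAM organization, assuming one entry per physical register and a single valid-bit vector that can be saved and restored. `CamMapTable` and its methods are hypothetical names, and the linear search stands in for a parallel associative match.

```python
class CamMapTable:
    def __init__(self, num_phys, initial):
        # One CAM entry per physical register: the architectural register
        # number it holds, plus a valid bit for the current mapping.
        self.arch = [None] * num_phys
        self.valid = [False] * num_phys
        for arch_reg, phys in initial.items():
            self.arch[phys] = arch_reg
            self.valid[phys] = True

    def lookup(self, arch_reg):
        """Associative search: the entry that matches and is valid."""
        for phys, (a, v) in enumerate(zip(self.arch, self.valid)):
            if v and a == arch_reg:
                return phys
        raise KeyError(arch_reg)

    def rename(self, arch_reg, new_phys):
        """Exactly one pair of valid bits changes per destination renaming."""
        self.valid[self.lookup(arch_reg)] = False
        self.arch[new_phys] = arch_reg
        self.valid[new_phys] = True

    def checkpoint(self):
        return list(self.valid)

    def restore(self, saved_valid):
        self.valid = list(saved_valid)

cam = CamMapTable(8, {1: 1, 2: 2, 3: 3})   # R1->P1, R2->P2, R3->P3
saved = cam.checkpoint()
cam.rename(2, 5)
cam.rename(2, 6)
print(cam.lookup(2))
cam.restore(saved)
print(cam.lookup(2))
```

Restoring only the valid bits works in this sketch because an overwritten entry keeps its architectural register number; the model assumes a physical register is not reallocated while a checkpoint that references it is still live.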
14 Multiple Issue Pipelines
Each pipeline stage accepts k instructions: a k-issue processor
- Alpha 21264: 4-issue
- MIPS R10000: 4-issue
- Intel P4: 3-issue
Memory structures must have a number of ports proportional to the issue width!
What if the k instructions at rename have dependences among them? Dependence check logic is needed!
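15 Dependence Check Logic
- Any change to the first renaming?
- What is the change to the second one?
- The third and fourth ones?
[Figure: mapping table read by four instructions (Rs0 Rt0 Rd0, Rs1 Rt1 Rd1, Rs2 Rt2 Rd2, Rs3 Rt3 Rd3), producing a pair of source physical registers per instruction and destinations Pd0, Pd1, Pd2, Pd3; no dependence check yet]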
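One way to answer the questions above is to rename the whole group against the old table and then override any source that matches an older destination in the same group. The sketch below assumes every instruction has two sources and one destination; `rename_group` and its argument layout are hypothetical.

```python
def rename_group(group, map_table, free_regs):
    """Rename k instructions (rs, rt, rd) presented in the same cycle."""
    # Step 1: all table reads happen in parallel against the pre-group mapping.
    looked_up = [(map_table[rs], map_table[rt]) for rs, rt, _ in group]
    new_dests = [free_regs[i] for i in range(len(group))]

    renamed = []
    for i, (rs, rt, _) in enumerate(group):
        ps, pt = looked_up[i]
        # Step 2: dependence check against every older instruction in the
        # group; scanning oldest to youngest lets the nearest producer win.
        for j in range(i):
            if group[j][2] == rs:
                ps = new_dests[j]
            if group[j][2] == rt:
                pt = new_dests[j]
        renamed.append((ps, pt, new_dests[i]))

    # Step 3: update the table; for a repeated destination the youngest wins.
    for (_, _, rd), pd in zip(group, new_dests):
        map_table[rd] = pd
    return renamed

table = {"R1": "P1", "R2": "P2", "R3": "P3", "R4": "P4"}
group = [
    ("R1", "R2", "R3"),
    ("R3", "R1", "R3"),   # reads R3 written by instruction 0
    ("R3", "R3", "R4"),   # reads R3 written by instruction 1
    ("R4", "R2", "R1"),   # reads R4 written by instruction 2
]
for entry in rename_group(group, table, ["P32", "P33", "P34", "P35"]):
    print(entry)
```

The first instruction never changes, the second needs one comparison per source, and the i-th needs i-1, so the comparator count grows quadratically with the issue width k.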