
1 COMPUTER ARCHITECTURE
Assoc. Prof. Stasys Maciulevičius, Computer Dept.
stasys.maciulevicius@ktu.lt

2 Instruction execution
The computer executes sequences of instructions I1, I2, I3, ..., In. Every instruction Ii consists of several steps, or phases, which can be described as follows:
F – instruction fetch
D – instruction decoding
O – operand fetch
X – operation execution
W – result storing
Of course, the partitioning can be different (it depends on the processor).
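A minimal sketch of these phases, assuming the five-stage F-D-O-X-W model above (the Phase enum and PHASE_ORDER names are illustrative, not from the slides):

  from enum import Enum

  class Phase(Enum):
      F = "instruction fetch"
      D = "instruction decoding"
      O = "operand fetch"
      X = "operation execution"
      W = "result storing"

  # Every instruction passes through the phases in this fixed order.
  PHASE_ORDER = [Phase.F, Phase.D, Phase.O, Phase.X, Phase.W]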

3 Sequential execution
In the case of sequential execution, the (i+1)-th instruction starts only after the i-th instruction has finished:
[Timing diagram: instructions executed one after another, each passing through F D O X W]
The phases have different durations.

4 Pipeline
Pipelined execution of instructions requires the pipeline to run at a steady rhythm:
[Timing diagram: overlapped instructions, each occupying a different phase F D O X W in the same clock]
Duration of a stage (phase): τ = max(tF, tD, tO, tX, tW)
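A small worked example of this formula (the stage times and instruction count below are assumed for illustration, not taken from the slides): the clock period is the longest stage, sequential execution of n instructions takes n * (tF + tD + tO + tX + tW), and a k-stage pipeline finishes in (k + n - 1) * τ.

  # Assumed stage times in ns (illustrative values only)
  t = {"F": 2, "D": 1, "O": 2, "X": 3, "W": 2}

  tau = max(t.values())          # pipeline clock period: max(tF, tD, tO, tX, tW)
  n = 100                        # number of instructions
  k = len(t)                     # number of pipeline stages

  t_sequential = n * sum(t.values())      # each instruction runs all phases alone
  t_pipelined  = (k + n - 1) * tau        # fill the pipeline once, then one result per clock

  print(tau, t_sequential, t_pipelined)   # 3, 1000, 312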

5 Pipeline
Execution of the (i+1)-th instruction then starts one step later than the i-th:
[Timing diagram: instructions i, i+1, i+2, i+3, each shifted by one stage relative to the previous one, all passing through F D O X W]

6 Pipeline implementation
Pipelined execution of instructions requires correct transfer of information between the stages:
[Diagram: stage circuits separated by latches, with data flowing from stage to stage and a common clock driving the latches]
The latches between the stages may in principle be left out; the pipeline design then becomes more complex, but the pipeline can be clocked faster.
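A minimal sketch of the latch idea under the five-stage model (the list-of-latches representation is an illustration, not a hardware description): on every clock edge each latch passes its contents to the next one and the first latch accepts a newly fetched instruction.

  STAGES = ["F", "D", "O", "X", "W"]

  def clock_tick(latches, next_instruction):
      # Shift each stage's contents into the following latch (from the last
      # stage backwards), then load the newly fetched instruction.
      for i in range(len(latches) - 1, 0, -1):
          latches[i] = latches[i - 1]
      latches[0] = next_instruction
      return latches

  latches = [None] * len(STAGES)            # one latch in front of each stage
  for instr in ["i1", "i2", "i3", "i4", "i5", "i6"]:
      latches = clock_tick(latches, instr)
      print(dict(zip(STAGES, latches)))     # which instruction is in which stage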

7 Example of a pipeline
A 4-stage pipeline can look as in this picture:
[Diagram: the PC supplies the instruction address; the instruction ADD R3, R2, R1 is fetched; the register file delivers the values of R1 and R2 (OF stage); the ALU computes the result (X stage); the result is written back to R3 (W stage); all stages are driven by a common clock]

8 PowerPC pipelines
[Diagram: instructions come from the cache into an eight-entry instruction queue IQ-7 ... IQ-0 (IQ-0 performs IU decode); from there they go to the IU pipeline (IU buffer, IU execute, write), the FPU pipeline (FPU buffer, FPU decoding, FPU execute 1, FPU execute 2, write), the BPU (decode/execute, write) and the load/store unit]

9 PowerPC pipeline – IU
[Timing diagram: instructions 0 add, 1 and, 2 mul, 3 cmp, 4 add moving through IQ-1, IQ-0 (decoding), the IU buffer, IU execution and writing; the legend distinguishes decoding, execution, writing, waiting in the IQ and waiting in the IU buffer]

10 PowerPC pipeline – IU
Decoding an IU instruction takes one cycle.
Decoding is followed by execution of the operation in the integer pipeline.
The mul instruction needs 5 cycles to execute, so cmp cannot be executed in the 5th cycle; it drops into the IU buffer and stays there until the functional unit becomes free.
As a result, the add instruction is held in the decoding stage.

11 Pipeline hazards
In reality the pipeline does not work as perfectly as depicted above. There are typically three types of hazards:
a structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time
a data hazard refers to a situation where an instruction needs the result of a previous instruction as an operand
a control hazard occurs when the processor executes a branch or jump operation; the pipeline must then be refilled from the target address

12 Data hazards
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine:

                      1  2  3  4  5  6  7  8  9
  ADD R1, R2, R3      F  D  O  X  W
  SUB R4, R5, R1         F  D  O  X  W
  AND R6, R1, R7            F  D  O  X  W
  OR  R8, R1, R9               F  D  O  X  W
  XOR R10, R1, R11                F  D  O  X  W

ADD writes R1 only in cycle 5 (its W stage), while the following instructions want to read R1 earlier.
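A sketch of how the read-after-write dependences in this table can be found (the (mnemonic, destination, sources) tuple encoding is an assumption made for illustration; whether each dependence actually causes a stall depends on the issue distance and on bypassing):

  program = [
      ("ADD", "R1",  ("R2", "R3")),
      ("SUB", "R4",  ("R5", "R1")),
      ("AND", "R6",  ("R1", "R7")),
      ("OR",  "R8",  ("R1", "R9")),
      ("XOR", "R10", ("R1", "R11")),
  ]

  # A RAW (read-after-write) dependence exists when an instruction reads a
  # register that an earlier instruction writes.
  for i, (op, dst, _) in enumerate(program):
      for j in range(i + 1, len(program)):
          op2, _, srcs2 = program[j]
          if dst in srcs2:
              print(f"{op2} (instr {j}) reads {dst} written by {op} (instr {i})")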

13 Data hazards
Suppose we have these two instructions:
  add r1, r2, r3   ; r1 := r2 + r3
  sub r4, r1, r5   ; r4 := r1 - r5
[Diagram: add passes through F D O X and writes r1 only in its W stage, while sub already wants to read r1 in its O stage]
A similar situation occurs in this case:
  ld  r1, a        ; r1 := MEM[a]
  add r4, r1, r5   ; r4 := r1 + r5

14 Data hazards
Data hazards can be eliminated using:
Software tools:
– inserting NOPs
– changing the order of instructions
Hardware tools:
– stalling the pipeline
– adding special data lines (bypassing)
[Diagram: the producer writes r1 in its W stage and the value is passed to the consumer's O stage]
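A sketch of the first software fix, inserting NOPs (a minimal illustration, assuming a five-stage pipeline without bypassing in which a consumer must issue at least GAP = 2 instructions after the producer of its operand; the exact distance depends on the register-file design):

  NOP = ("NOP", None, ())
  GAP = 2   # assumed minimum issue distance producer -> consumer without bypassing

  def insert_nops(program):
      # Insert NOPs so that no instruction reads a register written by one of
      # the GAP most recently issued instructions.
      out = []
      for op, dst, srcs in program:
          while any(prev_dst is not None and prev_dst in srcs
                    for _, prev_dst, _ in out[-GAP:]):
              out.append(NOP)
          out.append((op, dst, srcs))
      return out

  fixed = insert_nops([
      ("add", "r1", ("r2", "r3")),
      ("sub", "r4", ("r1", "r5")),
  ])
  for instr in fixed:
      print(instr[0])          # add, NOP, NOP, sub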

15 Data hazards – Bypassing
[Diagram: the ALU operands come through multiplexers that select either register-file values or bypassed data; one bypass path feeds the ALU result back from the result buffer ("bypass for result"), another brings loaded data from main memory over the data bus ("bypass for data load")]
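A sketch of the bypass decision for one ALU operand (the result_buffer and load_buffer names stand for the two bypass sources in the figure and are illustrative, not a real interface):

  def select_operand(reg, register_file, result_buffer, load_buffer):
      # Pick the freshest value of register `reg` for the ALU input: take the
      # bypassed ALU result or just-loaded data if either targets `reg`;
      # otherwise read the register file.
      if result_buffer is not None and result_buffer[0] == reg:
          return result_buffer[1]        # bypass for result
      if load_buffer is not None and load_buffer[0] == reg:
          return load_buffer[1]          # bypass for data load
      return register_file[reg]

  regs = {"r1": 0, "r2": 7, "r5": 3}
  # add has just produced 42 for r1 but has not yet written it back:
  print(select_operand("r1", regs, ("r1", 42), None))   # 42, taken via the bypass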

16 Control hazards
Control hazards can cause a greater performance loss for the pipeline than data hazards.
When a branch is executed, it may or may not change the PC (program counter) to something other than its current value plus 4.
If a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not taken.
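A minimal sketch of that taken / not-taken choice, assuming 4-byte instructions as on the slide (the function and argument names are illustrative):

  def next_pc(pc, is_branch, condition_true, target):
      # A taken branch redirects the PC to its target address;
      # otherwise the PC simply advances to the next instruction (pc + 4).
      if is_branch and condition_true:
          return target
      return pc + 4

  print(hex(next_pc(0x1000, True, True,  0x2000)))   # 0x2000 -> taken
  print(hex(next_pc(0x1000, True, False, 0x2000)))   # 0x1004 -> not taken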

17 Control hazards
Branches and jumps:
[Timing diagram: the branch computes the new PC in its X stage; the following instructions i+1 ... i+4 are stalled and only resume F D O X W once the target address is known]
After a branch is recognized, the pipeline is stalled until the branch target address has been calculated.

18 Control hazards
What can be done to reduce the possible time losses?
Find out as early as possible whether the branch is taken
Calculate the new value of the PC as early as possible
Measures to reduce the delay time:
Using branch prediction (a small sketch follows below)
Changing the instruction order
Using multithreading
Using buffers for storing unused instructions
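Branch prediction is the first measure above; a common scheme (a general illustration, not something specified in this lecture) is one 2-bit saturating counter per branch:

  class TwoBitPredictor:
      # 2-bit saturating counter: states 0,1 predict not taken, 2,3 predict taken.
      def __init__(self):
          self.counters = {}          # branch address -> counter state

      def predict(self, pc):
          return self.counters.get(pc, 1) >= 2

      def update(self, pc, taken):
          c = self.counters.get(pc, 1)
          self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

  bp = TwoBitPredictor()
  for outcome in [True, True, False, True]:    # actual branch behaviour
      print(bp.predict(0x40), outcome)         # prediction vs. reality
      bp.update(0x40, outcome)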

19 Superpipelining
Superpipelining simply refers to pipelining that uses a longer pipeline (with more stages) than "regular" pipelining.
In theory, a design with more stages, each doing less work, can be scaled to a higher clock frequency.
However, this depends a lot on other design characteristics, and it isn't true by default that a processor claiming superpipelining is "better".

20 Superpipeline
The pipeline rhythm can also be achieved in another way: the longer stages are split into sub-stages (F1/F2, X1/X2, W1/W2), while short stages such as D and O stay whole.
[Diagram: one instruction passing through the sub-stages F1 F2 D O X1 X2 W1 W2]
Duration of a stage (phase): τ = max(tX/2, tD)
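A worked example of this formula with assumed stage times (illustrative values only): splitting the long stages roughly halves the clock period.

  # Assumed stage times in ns (illustrative)
  t = {"F": 2, "D": 1, "O": 1, "X": 3, "W": 2}

  tau_plain = max(t.values())                       # ordinary pipeline clock: 3 ns
  # Superpipeline: F, X and W are each split into two sub-stages
  sub_stages = [t["F"] / 2, t["D"], t["O"], t["X"] / 2, t["W"] / 2]
  tau_super = max(sub_stages)                       # here max(tX/2, tD) = 1.5 ns

  print(tau_plain, tau_super)                       # 3 1.5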

21 Superpipeline
Such a superpipeline looks like this:
[Timing diagram: six consecutive instructions, each passing through the sub-stages F1 F2 D O X1 X2 W1 W2, each starting one (shorter) clock after the previous one]

22 Superpipeline in Pentium II
Stages: IFU1, IFU2, IFU3, ID1, ID2, RAT, ROB, DIS, EX, RET1, RET2
IFU – Instruction Fetch Unit
ID – Instruction Decode
RAT – Register Allocator
ROB – Reorder Buffer
DIS – Dispatcher
EX – Execute Stage
RET – Retire Unit

23 Haswell pipeline
The Haswell pipeline is shown on the next two slides:
The first part of the pipeline - the Front End
The second part of the pipeline - the Back End; this part is usually presented as the Haswell Execution Engine

24 Haswell pipeline – Front End [diagram]

25 Haswell pipeline – Back End (Execution Engine) [diagram]

