1
Wackiness
Algorithm A: Generate 200,000 random values 0-255. Add up all values >= 128.
Algorithm B: Same as A, but sort the values before adding them up.
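A minimal Python sketch of the two algorithms, assuming a fixed seed and a helper named `sum_large` (both hypothetical, chosen for illustration). It only shows that the two variants do the same work and reach the same total. The speed difference this slide hints at comes from how predictable the `v >= 128` branch is, and it is usually measured in compiled code; interpreter overhead in Python will likely hide most of it.

```python
import random


def sum_large(values):
    """Add up all values >= 128; the `if` is the data-dependent branch."""
    total = 0
    for v in values:
        if v >= 128:  # hard to predict on random data, easy on sorted data
            total += v
    return total


random.seed(0)  # assumption: fixed seed so the run is repeatable
values = [random.randint(0, 255) for _ in range(200_000)]

total_a = sum_large(values)          # Algorithm A: values in random order
total_b = sum_large(sorted(values))  # Algorithm B: same values, sorted first
```

Both totals are equal; only the order in which the branch outcomes arrive differs.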
2
Pipelining Pt2
3
Pipelining Limits
In theory: n times speedup for an n-stage pipeline
But: only if all stages are balanced, and only if the pipeline can be kept full
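The limit can be checked with a little arithmetic. The sketch below assumes every stage takes exactly one cycle (balanced stages) and uses hypothetical function names; `stall_cycles` stands in for any cycles in which the pipeline is not kept full.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction runs all stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages, stall_cycles=0):
    """First instruction fills the pipe, then one finishes per cycle, plus stalls."""
    return num_stages + (num_instructions - 1) + stall_cycles


def speedup(num_instructions, num_stages, stall_cycles=0):
    return unpipelined_cycles(num_instructions, num_stages) / pipelined_cycles(
        num_instructions, num_stages, stall_cycles
    )


ideal = speedup(1000, 5)                       # approaches 5 but never reaches it
with_stalls = speedup(1000, 5, stall_cycles=200)  # bubbles pull it further down
```

With many instructions the speedup approaches the stage count; every stall cycle moves it further away.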
4
Hazards
Hazard: Situation preventing the next instruction from continuing in the pipeline
Structural: Resource (shared hardware) conflict
Data: Needed data not ready
Control: Correct action depends on an earlier instruction
5
Branch
Unconditional branch in a perfect world: skip instructions 3 and 4, no bubble
6
Branch Timing
We don't know it is a branch until ID
7
Branch Timing Branch address not available until after EX
8
Branch Real Timing
Branch destination calculated at T4
Can't start the instruction until T5
Need to insert NOP bubble
9
Branch Real Timing
If we can forward the address from EX to IF, we can start x at T4
10
Branch Real Timing
Branch destination calculated at T4
Already started running instruction 3
Need ability to ignore started instruction
Still a bubble: ignored instruction instead of a NOP
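The timing on these slides can be reproduced with a small calculation. This is a sketch under stated assumptions: a classic five-stage pipeline (IF, ID, EX, MEM, WB), the branch fetched at T2 so that it reaches EX at T4, and hypothetical function names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed classic 5-stage pipeline


def target_fetch_cycle(branch_fetch_cycle, forward_ex_to_if):
    """Cycle in which the branch target x can be fetched."""
    # The destination is known once the branch is in EX.
    ex_cycle = branch_fetch_cycle + STAGES.index("EX")
    # With EX-to-IF forwarding the fetch uses the address in that same cycle;
    # without it the fetch has to wait one more cycle.
    return ex_cycle if forward_ex_to_if else ex_cycle + 1


def wasted_fetch_slots(branch_fetch_cycle, forward_ex_to_if):
    """Fetch slots lost compared with the perfect world (target right after the branch)."""
    ideal_cycle = branch_fetch_cycle + 1
    return target_fetch_cycle(branch_fetch_cycle, forward_ex_to_if) - ideal_cycle


no_forwarding = target_fetch_cycle(2, forward_ex_to_if=False)   # T5
with_forwarding = target_fetch_cycle(2, forward_ex_to_if=True)  # T4
```

Under these assumptions forwarding leaves one lost slot, which is the instruction that was already started and has to be ignored.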
11
Conditional Branch
A conditional branch has two possibilities: not taken or taken
12
Solving Conditional Branch
Option 1: Stall until we know
Cases: Not taken, Taken
13
Solving Conditional Branch
Option 2: Prediction
Predict Not Taken & Is Not Taken
Predict Not Taken & Is Taken
14
Predicting Taken
Calculating the branch destination in time to use it in the next cycle = more hardware
15
Solving Conditional Branch
Option 2: Prediction
Predict Taken & Is Not Taken
Predict Taken & Is Taken
16
Branch Prediction Penalty
In our CPU:
Predict correct = 0 cycle penalty
Predict wrong = 1 cycle penalty
Longer pipeline:
No way to decode before next fetch
Bigger penalty for miss
Penalty for any taken branch
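The effect of the penalty on average cycles per instruction can be estimated with a one-line formula. The sketch below uses a hypothetical `effective_cpi` helper, and the branch fraction, miss rate and 15-cycle penalty are made-up illustrative numbers, not measurements of any CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average CPI once misprediction bubbles are added to the base CPI."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative assumptions: 20% of instructions are branches, 10% are mispredicted.
short_pipeline = effective_cpi(1.0, 0.20, 0.10, penalty_cycles=1)
long_pipeline = effective_cpi(1.0, 0.20, 0.10, penalty_cycles=15)
```

The same miss rate costs far more on the longer pipeline because each miss throws away more in-flight work.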
17
Static Branch Prediction
Static prediction: Hardcoded assumptions
If the branch goes backwards, it is a loop: assume we take the branch
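A static rule like this needs only the two addresses. In the sketch below the function name is hypothetical, and predicting forward branches as not taken is an added assumption (the common "backward taken, forward not taken" scheme); the slide itself only states the backward case.

```python
def predict_taken_static(branch_address, target_address):
    """Static rule: a branch to a lower address is treated as a loop and predicted taken."""
    # Assumption: forward branches are predicted not taken.
    return target_address < branch_address


loop_branch = predict_taken_static(branch_address=0x120, target_address=0x100)
skip_branch = predict_taken_static(branch_address=0x120, target_address=0x140)
```

Here `loop_branch` is predicted taken and `skip_branch` is predicted not taken.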
18
Dynamic Branch Prediction
Dynamic prediction: Predict based on runtime behavior
More hardware: Branch prediction buffer (aka branch history table)
Indexed by recent branch instruction addresses
Stores outcome (taken/not taken)
To execute a branch:
Check table, expect the same outcome
Start fetching from fall-through or target
If wrong, flush pipeline and flip prediction
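The table described above can be sketched as a small class. Everything concrete here is an assumption for illustration: the class and method names, the 16-entry size, 4-byte instructions for the index, and an initial prediction of not taken.

```python
class BranchHistoryTable:
    """One-bit branch prediction buffer: remembers the last outcome per table slot."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.taken = [False] * num_entries  # assumed initial state: not taken

    def _index(self, branch_address):
        # Assumption: 4-byte instructions, so drop the low 2 bits, then wrap.
        return (branch_address >> 2) % self.num_entries

    def predict(self, branch_address):
        """Check the table and expect the same outcome as last time."""
        return self.taken[self._index(branch_address)]

    def execute(self, branch_address, was_taken):
        """Return True if the prediction was right; store the real outcome either way."""
        predicted = self.predict(branch_address)
        # On a miss this store is the "flip prediction" step.
        self.taken[self._index(branch_address)] = was_taken
        return predicted == was_taken


bht = BranchHistoryTable()
results = [bht.execute(0x40, outcome) for outcome in [True, True, True, False]]
```

For this outcome sequence the first and last predictions miss and the two in the middle hit. A real predictor would also flush the wrongly fetched instructions, which this sketch does not model.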
19
Prediction
1-bit history (taken / not taken) may not be optimal
Example, nested loop: the inner CBZ is mispredicted on the last iteration and again on the next first iteration
20
Prediction
2-bit history avoids that issue: a single wrong outcome does not flip the prediction
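The difference between the two schemes shows up when counting misses for an inner loop branch that is run many times. This is a sketch under assumptions: the branch is modeled as taken on every iteration except the last (a loop-closing branch; a CBZ at the top of the loop would have the mirrored pattern), both predictors start at not taken, and all names are hypothetical.

```python
def inner_loop_outcomes(outer_iterations, inner_iterations):
    """Outcomes of the inner loop branch: taken to repeat, not taken to exit."""
    one_pass = [True] * (inner_iterations - 1) + [False]
    return one_pass * outer_iterations


def count_misses_1bit(outcomes):
    prediction = False  # assumed initial prediction: not taken
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # always predict whatever happened last
    return misses


def count_misses_2bit(outcomes):
    counter = 0  # saturating counter: 0-1 predict not taken, 2-3 predict taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


outcomes = inner_loop_outcomes(outer_iterations=10, inner_iterations=10)
misses_1bit = count_misses_1bit(outcomes)  # 2 misses per pass of the inner loop
misses_2bit = count_misses_2bit(outcomes)  # 1 miss per pass after warm-up
```

With these inputs the 1-bit scheme misses 20 times out of 100 branches and the 2-bit scheme 12 times, 3 of which are in the first pass while the counter warms up.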
21
Real Stuff Is it worth it?
22
Real Stuff Is it worth it?
23
Pipelining worth it? Yes… to a point
24
ARM Pipelines Early ARM Pipeline: ARM v6 pipeline
25
Modern Pipeline Cortex A53 : ARMv8
26
Modern Pipeline Cortex A53 : Pipeline stalls basically double CPI
27
Fun Fact: Why Loads Have +8 in Address
LDR calculates the location as: currentLocation (PC) + immediate
By the time it executes, PC will be 8 greater
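The +8 can be checked with plain address arithmetic. The sketch assumes classic 32-bit ARM state with 4-byte instructions, where reading PC yields the address of the current instruction plus 8; the function names and example addresses are hypothetical.

```python
PC_READ_OFFSET = 8  # assumption: 32-bit ARM state, PC reads two instructions ahead


def ldr_pc_relative_address(instruction_address, immediate):
    """Address a PC-relative LDR actually loads from."""
    pc_as_read = instruction_address + PC_READ_OFFSET
    return pc_as_read + immediate


def immediate_for(instruction_address, data_address):
    """Immediate that has to be encoded to reach data_address."""
    return data_address - (instruction_address + PC_READ_OFFSET)


loaded_from = ldr_pc_relative_address(0x1000, 0x10)  # 0x1018, not 0x1010
encoded = immediate_for(0x1000, 0x1018)              # 0x10
```

The encoded immediate is therefore 8 smaller than the plain distance between the instruction and its data.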
28
Intel Pipelines
29
Intel i7 Branch Performance
A few mispredictions can have large impact:
30
Intel vs AMD Part of Intel's IPC advantage: Branch prediction
AMD claiming major advances in new architecture: