Wackiness
Algorithm A: Generate 200,000 random values 0-255, then add up all values >= 128
Algorithm B: Sort the values first, then add up all values >= 128 the same way

Pipelining Pt2

Pipelining Limits
In theory: n times speedup for an n-stage pipeline
But only if all stages are balanced
And only if the pipeline can be kept full
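Both conditions can be seen in a simple timing model. The sketch below assumes no pipeline-register overhead and no hazards: the clock period is set by the slowest stage, and the pipeline needs n cycles to fill before it completes one instruction per cycle. The function name `pipeline_speedup` and the stage times are hypothetical.

```python
def pipeline_speedup(stage_times: list[float], num_instructions: int) -> float:
    """Speedup of an n-stage pipeline over an unpipelined datapath.

    Simplified model (assumption): no pipeline-register overhead, no hazards.
    Clock period = slowest stage; the first instruction takes n cycles to
    fill the pipeline, then one instruction completes per cycle.
    """
    n = len(stage_times)
    unpipelined = num_instructions * sum(stage_times)
    pipelined = (n + num_instructions - 1) * max(stage_times)
    return unpipelined / pipelined


# Five balanced 200 ps stages, many instructions: close to 5x
print(round(pipeline_speedup([200] * 5, 1_000_000), 3))                  # 5.0
# Same total work, unbalanced stages: the 300 ps stage caps the speedup
print(round(pipeline_speedup([100, 200, 300, 200, 200], 1_000_000), 3))  # 3.333
# A single instruction never fills the pipeline: no speedup at all
print(pipeline_speedup([200] * 5, 1))                                     # 1.0
```

With unbalanced stages the limit is n times the average stage time divided by the slowest stage time, which only reaches n when every stage takes the same time.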

Hazards
Hazard: Situation preventing the next instruction from continuing in the pipeline
Structural: Resource (shared hardware) conflict
Data: Needed data not ready
Control: Correct action depends on an earlier instruction
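A data hazard can be spotted by comparing each instruction's source registers with the destination registers of the instructions just before it. This is a minimal sketch; the `Instr` record and `data_hazards` helper are hypothetical, and `distance=2` assumes the classic 5-stage pipeline without forwarding, where the register file is written in the first half of a cycle and read in the second, so only the next two instructions read a stale value.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None (e.g. a store)
    srcs: tuple[str, ...]     # registers read


def data_hazards(program: list[Instr], distance: int = 2) -> list[tuple[int, int, str]]:
    """Return (producer, consumer, register) for each read-after-write pair
    where the consumer is at most `distance` instructions after the producer."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + distance, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # register overwritten: later readers depend on j instead
    return hazards


program = [
    Instr("ADD X1, X2, X3", "X1", ("X2", "X3")),
    Instr("SUB X4, X1, X5", "X4", ("X1", "X5")),
    Instr("AND X6, X1, X7", "X6", ("X1", "X7")),
    Instr("ORR X8, X9, X10", "X8", ("X9", "X10")),
]
print(data_hazards(program))  # [(0, 1, 'X1'), (0, 2, 'X1')]
```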

Branch
Unconditional branch in a perfect world:
Skip instructions 3 and 4, no bubble

Branch Timing
Don’t know it is a branch until ID

Branch Timing
Branch address not available until after EX

Branch Real Timing
Branch destination calculated at T4
Can’t start the instruction until T5: need to insert a NOP bubble

Branch Real Timing
If we can forward the address from EX to IF, we can start x at T4

Branch Real Timing
Branch destination calculated at T4
Already started running instruction 3: need the ability to ignore a started instruction
Still a bubble: an ignored instruction instead of a NOP

Conditional Branch
Conditional branch has two possibilities: taken or not taken

Solving Conditional Branch
Option 1: Stall until we know
Not taken
Taken

Option 2: Prediction
Predict Not Taken & Is Not Taken
Predict Not Taken & Is Taken

Predicting Taken
Calculating the branch destination in time to use it in the next cycle = more hardware:

Option 2: Prediction
Predict Taken & Is Not Taken
Predict Taken & Is Taken

Branch Prediction Penalty
In our CPU:
Predict correct = 0-cycle penalty
Predict wrong = 1-cycle penalty
Longer pipeline:
No way to decode before the next fetch
Bigger penalty for a miss
Penalty for any taken branch
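The cost of mispredictions shows up directly in CPI: every miss adds the penalty in bubbles. A minimal sketch, assuming an otherwise ideal base CPI of 1 and modeling only the misprediction penalty (not the extra penalty a longer pipeline may charge for every taken branch); the branch mix, miss rate, and the 15-cycle penalty are hypothetical numbers chosen for illustration.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once branch mispredictions are included.

    Each mispredicted branch adds `penalty_cycles` bubbles on top of base_cpi.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 10% of those mispredicted
print(round(effective_cpi(1.0, 0.20, 0.10, 1), 3))   # 1.02  (our CPU: 1-cycle penalty)
print(round(effective_cpi(1.0, 0.20, 0.10, 15), 3))  # 1.3   (hypothetical longer pipeline)
```

The same miss rate costs 2% in the short pipeline but 30% in the long one, which is why deeper pipelines need better predictors.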

Static Branch Prediction
Static prediction: Hardcoded assumptions
If a branch goes backwards, assume it is a loop and that we take the branch
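The backwards-branch rule needs only the branch address and its target. A minimal sketch: the slide only states the backward case, so predicting forward branches as not taken is an assumption here (the usual "backward taken, forward not taken" pairing), and the function names and addresses are hypothetical.

```python
def predict_static(branch_pc: int, target_pc: int) -> bool:
    """Backward branch (target below the branch): assume a loop, predict taken.
    Forward branch: predict not taken (assumption, not stated on the slide)."""
    return target_pc < branch_pc


def static_accuracy(branch_pc: int, target_pc: int, outcomes: list[bool]) -> float:
    """Fraction of actual outcomes (True = taken) the fixed prediction gets right."""
    prediction = predict_static(branch_pc, target_pc)
    correct = sum(1 for taken in outcomes if taken == prediction)
    return correct / len(outcomes)


# A loop whose closing branch at 0x120 jumps back to 0x100: taken 99 times, then exits
loop_outcomes = [True] * 99 + [False]
print(static_accuracy(0x120, 0x100, loop_outcomes))  # 0.99
```

The prediction never changes, so no table is needed; the price is that any branch that does not follow the rule is mispredicted every time.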

Dynamic Branch Prediction
Dynamic prediction: Predict based on runtime behavior
More hardware: branch prediction buffer (aka branch history table)
Indexed by recent branch instruction addresses
Stores outcome (taken/not taken)
To execute a branch:
Check the table, expect the same outcome
Start fetching from the fall-through or the target
If wrong, flush the pipeline and flip the prediction
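The buffer described above can be sketched as a small array of taken/not-taken bits indexed by low-order address bits. This is a minimal sketch: the class name, the 10 index bits, the initial "not taken" state, and dropping the low 2 address bits (assuming 4-byte instructions) are all assumptions.

```python
class BranchHistoryTable:
    """1-bit branch prediction buffer: one taken/not-taken bit per entry,
    indexed by low-order bits of the branch instruction's address."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.bits = [False] * (1 << index_bits)  # all entries start "not taken"

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (4-byte instructions, assumed), keep index_bits bits
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        """Expect the same outcome as last time: fetch target if True, else fall-through."""
        return self.bits[self._index(pc)]

    def update(self, pc: int, taken: bool) -> bool:
        """Record the real outcome; return True if the prediction was wrong
        (the pipeline would be flushed)."""
        i = self._index(pc)
        wrong = self.bits[i] != taken
        self.bits[i] = taken  # on a miss this flips the stored prediction
        return wrong


bht = BranchHistoryTable()
misses = sum(bht.update(0x400, taken) for taken in [True, True, True, False, True])
print(misses)               # 3
print(bht.predict(0x400))   # True
```

Because only low address bits select the entry, two branches whose addresses differ only in higher bits share one entry (aliasing) and can disturb each other's predictions.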

Prediction
1-bit history (taken / not taken) may not be optimal
Ex: In a nested loop, the inner CBZ is mispredicted on its last iteration and again on the first iteration of its next run
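Counting the misses makes the problem concrete. This sketch models the inner loop's conditional branch as a loop-closing branch, taken on every inner iteration except the last (an assumption; a CBZ exit test at the top of the loop gives the mirror-image pattern and the same miss count). The helper names and loop sizes are hypothetical.

```python
def inner_loop_outcomes(outer_iterations: int, inner_iterations: int) -> list[bool]:
    """Outcome sequence (True = taken) of the inner loop's branch, modeled as
    taken on every inner iteration except the last, which falls through."""
    outcomes: list[bool] = []
    for _ in range(outer_iterations):
        outcomes.extend([True] * (inner_iterations - 1) + [False])
    return outcomes


def one_bit_mispredictions(outcomes: list[bool], initial: bool = True) -> int:
    """Count misses for a single 1-bit history entry (predict = last outcome)."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1 bit: only the most recent outcome is remembered
    return misses


outcomes = inner_loop_outcomes(outer_iterations=10, inner_iterations=4)
print(one_bit_mispredictions(outcomes))  # 19: 1 on the first pass, then 2 per pass
```

After the first pass, the exit leaves the bit at "not taken", so the next pass starts with a miss before ending with one.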

Prediction
2-bit history avoids that issue
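One common 2-bit scheme is a saturating counter: the prediction only flips after two outcomes in a row go against it. The slide does not say which 2-bit state machine it uses, so the saturating counter, the initial "strongly taken" state, and the helper names are assumptions. The nested-loop model is the same as for the 1-bit case: the inner branch is taken on every iteration except the last.

```python
def inner_loop_outcomes(outer_iterations: int, inner_iterations: int) -> list[bool]:
    """Outcome sequence (True = taken) of the inner loop's branch, modeled as
    taken on every inner iteration except the last, which falls through."""
    outcomes: list[bool] = []
    for _ in range(outer_iterations):
        outcomes.extend([True] * (inner_iterations - 1) + [False])
    return outcomes


def two_bit_mispredictions(outcomes: list[bool], initial: int = 3) -> int:
    """Count misses for one 2-bit saturating counter.

    States 0-1 predict not taken, 2-3 predict taken. Each outcome moves the
    counter one step toward it, so a single surprise does not flip the prediction.
    """
    counter = initial
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


outcomes = inner_loop_outcomes(outer_iterations=10, inner_iterations=4)
print(two_bit_mispredictions(outcomes))  # 10: only the last inner iteration misses
```

The exit moves the counter from 3 to 2, which still predicts taken, so the first iteration of the next pass is predicted correctly.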

Real Stuff
Is it worth it?

Pipelining Worth It?
Yes… to a point

ARM Pipelines
Early ARM pipeline
ARMv6 pipeline

Modern Pipeline
Cortex-A53: ARMv8

Modern Pipeline
Cortex-A53: pipeline stalls roughly double the CPI

Fun Fact: Why Loads Have +8 in Address
LDR calculates the location as: currentLocation (PC) + immediate
By the time it executes, PC will be 8 greater
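In 32-bit ARM (A32) state, reading PC gives the current instruction's address plus 8, a leftover of the original three-stage fetch/decode/execute pipeline where fetch had moved two 4-byte instructions ahead. So the immediate in a PC-relative LDR must be measured from that later PC. A minimal sketch of the arithmetic; the helper names and addresses are hypothetical, and immediate-range and encoding limits are ignored.

```python
PC_READ_AHEAD = 8  # A32 state: PC reads as this instruction's address + 8


def ldr_pc_offset(instr_addr: int, literal_addr: int) -> int:
    """Immediate to encode in `LDR Rt, [PC, #imm]` so the load hits literal_addr."""
    return literal_addr - (instr_addr + PC_READ_AHEAD)


def ldr_pc_target(instr_addr: int, imm: int) -> int:
    """Address the load actually reads, given the encoded immediate."""
    return instr_addr + PC_READ_AHEAD + imm


imm = ldr_pc_offset(0x8000, 0x8010)
print(imm)                               # 8
print(hex(ldr_pc_target(0x8000, imm)))   # 0x8010
```

A literal 16 bytes past the LDR is therefore encoded with an offset of only 8.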

Intel Pipelines

Intel i7 Branch Performance
A few mispredictions can have a large impact:

Intel vs AMD
Part of Intel's IPC advantage: branch prediction
AMD claiming major advances in its new architecture:

Wackiness
Algorithm A:
Algorithm B:
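Algorithms A and B compute the same sum; the only difference is whether the `v >= 128` branch sees the values in random or sorted order. This sketch feeds that branch's outcomes through a single 2-bit saturating counter and counts mispredictions instead of timing anything (CPython's interpreter overhead would hide the effect). The counter type, its initial state, and the seed are assumptions, and a real compiler may turn this branch into a branchless select, in which case the two algorithms can run at the same speed.

```python
import random


def count_mispredictions(values: list[int], threshold: int = 128) -> int:
    """Run `if v >= threshold` through one 2-bit saturating counter
    (states 0-1 predict not taken, 2-3 predict taken) and count wrong guesses."""
    counter = 0
    misses = 0
    for v in values:
        taken = v >= threshold
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


rng = random.Random(42)
values = [rng.randrange(256) for _ in range(200_000)]  # 200,000 values 0-255

print("Algorithm A (unsorted):", count_mispredictions(values))        # roughly half miss
print("Algorithm B (sorted):  ", count_mispredictions(sorted(values)))  # 2
```

On random data the outcome is a coin flip, so no history helps; once sorted, the branch is not taken for the whole first half and taken for the whole second half, costing only the two misses at the switch.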
