Processor Level Parallelism 2. How We Got Here Developments in PC CPUs.


1 Processor Level Parallelism 2

2 How We Got Here Developments in PC CPUs

3 Development Single Core

4 Development Single Core with Multithreading – 2002 Pentium 4 / Xeon

5 Development Multi Processor – Multiple processors coexisting in system – PC space in ~1995

6 Development Multi Core – Multiple CPUs on one chip – PC space in ~2005

7 Power Density Prediction circa 2000 Core 2 Adapted from UC Berkeley "The Beauty and Joy of Computing"

8 Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005

9 Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"

10 Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"

11 Development Modern Complexity – Many cores – Private / Shared cache levels

12 Homogeneous Multicore i7 : Homogeneous multicore – 4 identical cores on one chip – separate L2 cache, shared L3

13 Heterogeneous Multicore Different cores for different jobs – Standard CPU – Low Power CPU – Graphics – Video

14 Coprocessors Coprocessor : Assists main CPU with some part of work

15 Coprocessors Graphics Card : floating point specialized – 100s-1000s of SIMD cores – i7 ~ 100 gigaflops – Kepler GPU ~ 1300 gigaflops

16 CUDA Compute Unified Device Architecture – Programming model for general purpose work on GPU hardware – Streaming Multiprocessors each with 16-48 CUDA cores

17 CUDA Designed for thousands of threads – Broken into "warps" of 32 threads – Entire warp runs on an SM in lock step – Branch divergence cuts speed
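
The lock-step behavior above can be sketched with a toy cost model (this is an illustrative simplification, not real CUDA semantics): when lanes of a warp disagree on a branch, the SM executes both paths serially with inactive lanes masked off, so a divergent warp pays the cost of both sides.

```python
# Hypothetical warp-divergence cost model: a warp of 32 threads runs in
# lock step, so if any lane takes a branch path, the whole warp spends
# cycles executing that path (inactive lanes are masked off).
WARP_SIZE = 32

def warp_cycles(take_if, if_cost, else_cost):
    """Cycles for one warp, given which lanes take the 'if' path."""
    lanes_if = sum(take_if)
    lanes_else = WARP_SIZE - lanes_if
    cycles = 0
    if lanes_if:             # at least one lane on the if-path
        cycles += if_cost
    if lanes_else:           # at least one lane on the else-path
        cycles += else_cost
    return cycles

# Uniform warp: all lanes agree, only one path is executed.
uniform = warp_cycles([True] * WARP_SIZE, if_cost=10, else_cost=40)           # 10
# Divergent warp: both paths execute serially -> 10 + 40 cycles.
diverged = warp_cycles([True] * 16 + [False] * 16, if_cost=10, else_cost=40)  # 50
```

Under this model a half-and-half split is just as expensive as any other divergent split: the warp pays for every path that at least one lane takes.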

18 Other Coprocessors CPUs used to have floating point coprocessors – Intel 80386 & 80387 Audio cards Crypto – SSL encryption for servers

19 Parallelism & Memory

20 Multiprocessing & Memory Memory demo…

21 Memory Access Multiple processes accessing the same memory interact – concurrent updates of +10 and +1 may add 10, 1, or 11 to x depending on interleaving
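
The lost-update outcome above can be reproduced by enumerating interleavings of two unsynchronized read-modify-write sequences (a minimal sketch; the step model and names are illustrative, not from the slides):

```python
from itertools import permutations

# Each updater does: read x, then write back (its read value + delta).
# Enumerating all interleavings of two updaters (+10 and +1) shows
# which final values of x are possible without synchronization.
def run(schedule, deltas):
    x = 0
    local = {}
    for tid, step in schedule:
        if step == "read":
            local[tid] = x                  # read-modify-write: read step
        else:
            x = local[tid] + deltas[tid]    # write back a possibly stale value
    return x

steps = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
results = set()
for order in permutations(range(4)):
    sched = [steps[i] for i in order]
    # Keep only orderings where each thread reads before it writes.
    if sched.index((0, "read")) < sched.index((0, "write")) and \
       sched.index((1, "read")) < sched.index((1, "write")):
        results.add(run(sched, {0: 10, 1: 1}))

print(sorted(results))   # [1, 10, 11]
```

When both writes land after both reads, one update overwrites the other: x ends at 1 or 10 instead of 11.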

22 UMA Uniform Memory Access – Every processor sees every memory using same addresses – Same access time for any CPU to any memory word

23 NUMA Non Uniform Memory Access – Single memory address space visible to all CPUs – Some memory local Fast – Some memory remote Accessed in same way, but slower

24 NUMA & Cache Memory problems compounded by cache X = 10

25 NUMA & Cache Memory problems compounded by cache X = 10 X = 15

26 Cache Coherence Cores need to "snoop" other cores' reads Cores need to broadcast their writes

27 MESI MESI : Cache Coherence Protocol – Modified I have this cached and I have changed it – Exclusive I have this cached and unmodified and am the only one with it – Shared I and another both have this cached – Invalid I do not have this cached
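
The four states and the transitions described on the next slides can be sketched as a toy transition table (a simplified model with made-up event names, not the full protocol):

```python
# Toy MESI model: transitions for one cache line in one core, keyed by
# (current_state, event). Events are local reads/writes and snooped
# (remote) reads/writes from another core.
MESI = {
    ("I", "local_read_shared"): "S",  # another cache supplied the line
    ("I", "local_read_alone"):  "E",  # fetched from memory, sole holder
    ("I", "local_write"):       "M",
    ("E", "local_write"):       "M",  # silent upgrade, no bus traffic needed
    ("E", "remote_read"):       "S",
    ("S", "local_write"):       "M",  # must broadcast an invalidate
    ("S", "remote_write"):      "I",
    ("M", "remote_read"):       "S",  # write back modified value, then share
    ("M", "remote_write"):      "I",  # write back modified value, then invalidate
}

def next_state(state, event):
    """Return the new state; unlisted (state, event) pairs leave it unchanged."""
    return MESI.get((state, event), state)

# Slide 30's sample: CPU 2 broadcasts a write, so CPU 1's copy goes
# from whatever it was to Invalid.
print(next_state("M", "remote_write"))   # I
```

The comments on the M-state rows correspond to slide 31: a core holding the only modified copy must write it back to memory before another core's read or write can proceed.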

28 State Change Changes based on OWN actions – e.g., my read fulfilled by another cache

29 State Change Changes based on OTHERS' actions – I have the only modified copy of this… Write it out to memory and make the other core wait

30 State Change Sample CPU 2 broadcasts write message… CPU 1 invalidates

31 State Change Sample – CPU 2 snoops a read… has to write its modified value to memory – CPU 2 snoops a write… has to write its modified value to memory

32 Parallelism Bad News

33 Parallel Speedup In Theory: N cores = N times speedup

34 Issues Not every part of a problem scales well – Parallel : can run at same time – Serial : must run one at a time in order

35 Amdahl’s Law In Practice: Amdahl's law applied to N processors on a task where P is the parallel portion: Speedup = 1 / ((1 − P) + P/N)

36 Amdahl’s Law If 60% of a job can be made parallel and we use 2 processors: Speedup = 1 / (0.4 + 0.6/2) ≈ 1.43x faster with 2 than 1
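
The calculation above, and the hard ceiling it implies, can be checked with a few lines (the function name is ours):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with n processors when fraction p is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Slide 36's numbers: 60% parallel, 2 processors -> ~1.43x.
print(round(amdahl_speedup(0.6, 2), 2))    # 1.43
# Upper bound as n grows without limit: 1 / (1 - p) = 2.5x,
# no matter how many cores you add.
print(round(1.0 / (1.0 - 0.6), 1))         # 2.5
```

Even with 1000 processors, a 60%-parallel job tops out just under 2.5x: the serial 40% dominates, which is the "ouch" of the later slides.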

37 Speedup Issues : Amdahl’s Law Applications can almost never be completely parallelized; some serial code remains [chart: execution time vs. number of cores, split into parallel and serial portions]

38 Speedup Issues : Amdahl’s Law Serial portion becomes the limiting factor [chart: execution time vs. number of cores, split into parallel and serial portions]

39 Ouch More processors only help with high % of parallelized code

40 Amdahl's Law is Optimistic Each new processor means more – Load balancing – Scheduling – Communication – Etc…

41 Parallel Algorithms Some problems are highly parallel, others are not

