Processor Level Parallelism 1

1 Processor Level Parallelism 1

2 Parallelism Levels
Levels at which we can attack parallelism:

3 Bit-Level Parallelism
Circuits process bits in parallel.

4 Instruction-Level Parallelism
At the organization level, the processor may process multiple instructions in parallel.

5 Higher Levels
Thread Level: ability to run multiple simultaneous streams of instructions. Task Level: ability to run parts of a program on different chips. Application Level: run separate jobs on different machines.

6 Process vs Thread
Process: a running program. It has its own memory space and at least one thread.

7 Multitasking
Done on a single core running multiple programs. The OS handles switching between them, in "large" chunks of time. The cache is effectively flushed on each switch.

8 Process vs Thread
Thread: an instruction sequence with its own registers and stack. It shares memory with the other threads in its process.
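The process/thread split can be illustrated with a short sketch. Several threads in one process update a single counter that lives in the process's shared memory. Each thread's loop variable is private, because every thread runs `worker()` in its own stack frame. The names `run_workers` and `worker` and the default counts are hypothetical and not from the slides.

```python
import threading


def run_workers(num_threads=4, increments=10_000):
    """Start several threads in one process that all update the same counter."""
    # One object in the process's memory space, visible to every thread.
    shared = {"count": 0}
    lock = threading.Lock()

    def worker():
        # Each thread runs worker() in its own stack frame, so `i` is private to it.
        for i in range(increments):
            with lock:  # serialize the read-modify-write on the shared counter
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return shared["count"]


if __name__ == "__main__":
    print(run_workers())  # 40000: 4 threads x 10,000 increments on one shared counter
```

In CPython the global interpreter lock stops these threads from executing Python bytecode truly in parallel. The sketch therefore demonstrates the shared-memory model, not a speedup.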

9 Threaded Code Demo…

10 Resource Usage
Four threads running in a 4-wide pipeline. The pipeline can't always fill all 4 issue slots: there are bubbles from memory accesses, page faults, etc. [Figure: issue slots over time]

11 Multithreading
Alternate or combine threads to maximize use of the processor. Switches happen on a finer timescale than OS multitasking, and the cache contents are maintained. Hardware required: multiple register sets, and tracking of the "owner" thread of each instruction in the pipeline.

12 Multithreading: Coarse-Grained Multithreading
Each thread runs for a number of cycles at a time. The pipeline must be drained before switching threads.

13 Multithreading: Single Pipeline, Coarse-Grained
Assumption: 1 cycle to retire after a stall. [Figure: threads to run and the single pipeline over time]

14 Multithreading: Dual Pipeline, Coarse-Grained
Assumption: 1 cycle to retire after a stall. [Figure: threads to run and the dual pipeline over time]
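The coarse-grained policy can be sketched on a single pipeline under a toy model that is an assumption, not the slides' own. Each thread is a list of per-instruction latencies, and a thread's next instruction may issue no earlier than the previous one's issue cycle plus its latency. One instruction issues per cycle. Switching threads costs a hypothetical `switch_penalty` cycles to drain the pipeline. The dual-pipeline case is not modelled.

```python
def coarse_grained(threads, switch_penalty=1):
    """Simulate coarse-grained multithreading on one single-issue pipeline.

    threads[t][i] is the latency of instruction i of thread t: the thread's
    next instruction may issue no earlier than (issue cycle + latency).
    Returns (total_cycles, finish, switches), where finish[t] is the cycle
    in which thread t issued its last instruction.
    """
    n = len(threads)
    pc = [0] * n       # index of each thread's next instruction
    ready = [0] * n    # earliest cycle each thread's next instruction may issue
    finish = [None] * n
    cur, cycle, switches = 0, 0, 0

    def unfinished(t):
        return pc[t] < len(threads[t])

    while any(unfinished(t) for t in range(n)):
        if unfinished(cur) and ready[cur] <= cycle:
            # Keep running the current thread for as long as it can issue.
            ready[cur] = cycle + threads[cur][pc[cur]]
            pc[cur] += 1
            if not unfinished(cur):
                finish[cur] = cycle
            cycle += 1
            continue
        # Current thread stalled or finished: look for another thread, round-robin.
        others = [(cur + k) % n for k in range(1, n)]
        candidates = [t for t in others if unfinished(t) and ready[t] <= cycle]
        if not candidates and not unfinished(cur):
            candidates = [t for t in others if unfinished(t)]
        if candidates:
            cur = candidates[0]
            cycle += switch_penalty  # drain the pipeline before the new thread starts
            switches += 1
        else:
            cycle += 1  # no other thread can use the pipeline: a bubble
    return cycle, finish, switches


if __name__ == "__main__":
    a = [1, 1, 4, 1]  # hypothetical thread A: its 3rd instruction has a 4-cycle latency
    b = [1, 1, 1]     # hypothetical thread B: no stalls
    print(coarse_grained([a, b], switch_penalty=1))  # (9, [8, 6], 2)
    print(coarse_grained([a, b], switch_penalty=3))  # (13, [12, 8], 2)
```

Raising the drain cost from 1 to 3 cycles adds 4 cycles to the run, because both switches pay it. This is why coarse-grained designs only switch on stalls long enough to hide that cost.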

15 Latency vs Throughput
Multithreading favors throughput over latency: the combined work finishes sooner, but an individual thread may take longer to complete.

16 Multithreading: Fine-Grained Multithreading
Hardware can switch to a new thread each cycle without draining the pipeline.

17 Multithreading: Single Pipeline, Fine-Grained
Assumption: switches every cycle. [Figure: threads to run and the single pipeline over time]

18 Multithreading: Dual Pipeline, Fine-Grained
Assumption: switches every cycle. [Figure: threads to run and the dual pipeline over time]
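Fine-grained switching can be sketched under the same assumed toy model. Each thread is a list of per-instruction latencies, one instruction issues per cycle, and only a single pipeline is modelled. Each cycle the hardware moves round-robin to the next thread whose next instruction is ready, so no drain cost is paid. `fine_grained` is a hypothetical name.

```python
def fine_grained(threads):
    """Simulate fine-grained multithreading: switch threads every cycle, skip stalled ones.

    threads[t][i] is the latency of instruction i of thread t: the thread's
    next instruction may issue no earlier than (issue cycle + latency).
    Returns (trace, finish): trace[c] is the (thread, instruction) issued in
    cycle c, or None for a bubble; finish[t] is thread t's last issue cycle.
    """
    n = len(threads)
    pc = [0] * n       # index of each thread's next instruction
    ready = [0] * n    # earliest cycle each thread's next instruction may issue
    finish = [None] * n
    trace = []
    last = n - 1       # so that thread 0 gets the first turn
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        issued = None
        for k in range(1, n + 1):  # round-robin, starting after the last thread issued
            t = (last + k) % n
            if pc[t] < len(threads[t]) and ready[t] <= cycle:
                issued = t
                break
        if issued is None:
            trace.append(None)  # bubble: every remaining thread is waiting
        else:
            i = pc[issued]
            trace.append((issued, i))
            ready[issued] = cycle + threads[issued][i]
            pc[issued] += 1
            if pc[issued] == len(threads[issued]):
                finish[issued] = cycle
            last = issued
        cycle += 1
    return trace, finish


if __name__ == "__main__":
    a = [1, 1, 4, 1]  # hypothetical thread A: its 3rd instruction has a 4-cycle latency
    b = [1, 1, 1]     # hypothetical thread B: no stalls
    trace, finish = fine_grained([a, b])
    print(trace)   # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), None, None, (0, 3)]
    print(finish)  # [8, 5]
    print(fine_grained([a])[1])  # [6]: thread A on its own
```

Run alone, thread A issues its last instruction in cycle 6. Interleaved with B, it finishes in cycle 8. Yet both threads complete in 9 cycles, rather than the 7 + 3 = 10 they would need back to back. That is the throughput-over-latency trade-off from slide 15.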

19 SMT
SMT: Simultaneous Multithreading (Intel's implementation is marketed as Hyper-Threading). Issues ops from multiple threads in one cycle. [Figure: issue slots over time]

20 Multithreading: SMT
Try to start the next thread early if there is spare pipeline capacity. [Figure: threads to run over time] C gets to jump in early because B2 is not ready.

21 Multithreading: SMT
Otherwise, switch as in fine-grained multithreading. [Figure: threads to run over time] C gets a full turn; A is up next.

22 Multithreading: SMT
Still constrained by load delays. [Figure: threads to run over time] C5 and B3 are not ready until cycle 8; A7 is not ready until cycle 9.
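SMT can be sketched by letting a `width`-wide pipeline fill its issue slots from several ready threads in the same cycle, while load delays still hold individual threads back. The model is a simplification, not the exact policy in the slide figures. It reuses the assumed latency-list format: a thread's next instruction may issue no earlier than the previous issue cycle plus its latency. It also assumes each thread's instructions form a dependency chain, so one thread issues at most one instruction per cycle. The round-robin priority rotates every cycle. `smt` is a hypothetical name.

```python
def smt(threads, width=2):
    """Simulate SMT: up to `width` instructions per cycle, drawn from different ready threads.

    threads[t][i] is the latency of instruction i of thread t: the thread's
    next instruction may issue no earlier than (issue cycle + latency).
    Returns (trace, finish): trace[c] lists the (thread, instruction) pairs
    issued in cycle c; finish[t] is thread t's last issue cycle.
    """
    n = len(threads)
    pc = [0] * n       # index of each thread's next instruction
    ready = [0] * n    # earliest cycle each thread's next instruction may issue
    finish = [None] * n
    trace = []
    start = 0          # round-robin priority pointer, rotated every cycle
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        slots = []
        for k in range(n):
            if len(slots) == width:
                break  # every issue slot in this cycle is taken
            t = (start + k) % n
            if pc[t] < len(threads[t]) and ready[t] <= cycle:
                i = pc[t]
                slots.append((t, i))
                ready[t] = cycle + threads[t][i]
                pc[t] += 1
                if pc[t] == len(threads[t]):
                    finish[t] = cycle
        trace.append(slots)  # an empty list means the whole cycle was a bubble
        start = (start + 1) % n
        cycle += 1
    return trace, finish


if __name__ == "__main__":
    width = 2
    threads = [[1, 1, 4, 1], [1, 1, 1], [1, 1]]  # hypothetical threads A, B, C
    trace, finish = smt(threads, width=width)
    for cycle, slots in enumerate(trace):
        print(cycle, slots)
    issued = sum(len(s) for s in trace)
    print(f"{issued} of {width * len(trace)} issue slots used")  # 9 of 16 issue slots used
```

The empty cycles at the end come from thread A's 4-cycle latency, with no other thread left to fill the slots. This reflects the slide's point that SMT is still constrained by load delays.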

23 SMT Challenges
Resources must be duplicated or split between threads. Split them too thin and performance suffers; duplicate everything and you aren't maximizing use of the hardware.

24 Intel vs AMD Variations on SMT

25 Processor Level Parallelism Styles

26 Processor Parallelism
Processor Parallelism: run multiple instruction streams simultaneously.

27 Flynn's Taxonomy
Categorization of architectures based on the number of simultaneous instructions and the number of simultaneous data items.

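28 Flynn's Taxonomy
[Table: single instruction with single data = SISD; single instruction with multiple data = SIMD; multiple instruction with single data = MISD; multiple instruction with multiple data = MIMD]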

29 SISD
SISD: Single Instruction – Single Data. One instruction operates on one piece of data. May be pipelined or superscalar.

30 SIMD
SIMD: Single Instruction – Multiple Data. One instruction operates on multiple pieces of data.
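The SISD/SIMD contrast can be made concrete by counting instructions for an element-wise add in a toy model. SISD issues one add per element pair. SIMD issues one "vector add" per group of `lanes` elements. The function names and the 4-lane default are hypothetical, and each loop iteration stands in for one machine instruction.

```python
def sisd_add(a, b):
    """SISD: one add instruction per element pair. Returns (result, instructions issued)."""
    result = []
    instructions = 0
    for x, y in zip(a, b):
        result.append(x + y)   # one instruction, one piece of data
        instructions += 1
    return result, instructions


def simd_add(a, b, lanes=4):
    """SIMD: one instruction adds a whole group of `lanes` element pairs."""
    result = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # A single instruction operates on every lane of this chunk.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    a, b = list(range(10)), [10] * 10
    print(sisd_add(a, b)[1])  # 10 instructions
    print(simd_add(a, b)[1])  # 3 instructions: lanes 0-3, 4-7, and a partial 8-9
```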

31 SIMD Roots: ILLIAC IV
One instruction issued to 64 processing units.

32 SIMD Roots: Cray-1
Vector processor: one instruction applied to all elements of a vector register.

33 Modern SIMD
x86 processors: SSE units (Streaming SIMD Extensions). They operate on special 128-bit registers, treated as 4 32-bit chunks, 2 64-bit chunks, or 16 8-bit chunks.
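The lane layout can be illustrated in Python with `struct`. The sketch views the same 16 bytes as 4 32-bit, 2 64-bit, or 16 8-bit unsigned lanes. It also shows a packed add in which each lane wraps around independently, in the spirit of SSE2 integer instructions such as PADDB/PADDD/PADDQ. This models the register layout only. Real SSE code would use compiler intrinsics or assembly, and the helper names here are hypothetical.

```python
import struct

REG_BYTES = 16  # one 128-bit SSE (XMM) register


def lane_bits(fmt):
    # '<' selects standard sizes: 'B' = 8-bit, 'I' = 32-bit, 'Q' = 64-bit lanes.
    return 8 * struct.calcsize("<" + fmt)


def as_lanes(reg, fmt):
    """View 16 raw bytes as unsigned lanes (little-endian, as on x86)."""
    count = REG_BYTES * 8 // lane_bits(fmt)
    return list(struct.unpack("<" + fmt * count, reg))


def from_lanes(values, fmt):
    """Pack lane values back into 16 raw bytes."""
    reg = struct.pack("<" + fmt * len(values), *values)
    if len(reg) != REG_BYTES:
        raise ValueError("lanes must fill exactly 128 bits")
    return reg


def packed_add(reg_a, reg_b, fmt):
    """Add every lane at once; each lane wraps independently (no carry between lanes)."""
    mask = (1 << lane_bits(fmt)) - 1
    sums = [(x + y) & mask for x, y in zip(as_lanes(reg_a, fmt), as_lanes(reg_b, fmt))]
    return from_lanes(sums, fmt)


if __name__ == "__main__":
    a = from_lanes([1, 2, 3, 0xFFFFFFFF], "I")   # 4 x 32-bit lanes
    b = from_lanes([10, 20, 30, 1], "I")
    print(as_lanes(packed_add(a, b, "I"), "I"))  # [11, 22, 33, 0]: lane 3 wrapped
    print(len(as_lanes(a, "B")), len(as_lanes(a, "Q")))  # 16 2
```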

34 MISD
MISD: Multiple Instruction – Single Data. One piece of data is processed by multiple instructions. Rare. Example: the Space Shuttle, where five processors handle the fly-by-wire input and vote on the result.
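The voting scheme in the Space Shuttle example can be sketched as follows. The same input reading goes to several redundant units, and a strict majority decides the output, so a single faulty unit is outvoted. The unit functions, `majority_vote`, and `redundant_compute` are hypothetical illustrations, not the Shuttle's actual flight software.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value a strict majority of units agree on, or None if there is none."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def redundant_compute(reading, units):
    """Feed the same input (single data) to every unit and vote on their outputs."""
    outputs = [unit(reading) for unit in units]
    return majority_vote(outputs), outputs


if __name__ == "__main__":
    def control_law(x):  # hypothetical computation each healthy unit performs
        return 2 * x + 1

    def faulty(x):       # hypothetical unit with a fault
        return 2 * x

    units = [control_law] * 4 + [faulty]
    print(redundant_compute(5, units))  # (11, [11, 11, 11, 11, 10])
```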

35 MIMD
MIMD: Multiple Instruction – Multiple Data. Multiple pieces of data, multiple instruction streams.

36 MIMD
MIMD: Multiple Instruction – Multiple Data. Examples: multi-core processors, supercomputers, computational grids.
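The MIMD programming model can be sketched with Python's `concurrent.futures`. Three different functions (separate instruction streams) each run on their own input (separate data) at the same time. The three job functions are hypothetical stand-ins for independent programs.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(data):       # instruction stream 1
    return sum(data) % 256


def longest_word(text):   # instruction stream 2
    return max(text.split(), key=len)


def count_primes(limit):  # instruction stream 3: primes in [2, limit]
    return sum(
        1
        for n in range(2, limit + 1)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run_mimd():
    # Each job pairs a different program with a different piece of data.
    jobs = [
        (checksum, bytes(range(10))),
        (longest_word, "multiple instruction multiple data"),
        (count_primes, 30),
    ]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in jobs]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_mimd())  # [45, 'instruction', 10]
```

On real MIMD hardware each stream would run on its own core or node. In CPython, threads share one interpreter lock, so `ProcessPoolExecutor` would be the closer match when true parallel execution matters.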

37 Coupling and Topologies
MIMD systems differ in how the nodes are connected and in how much memory is shared.

38 BlueGene

39 BG/P Full system : 72 x 32 x 32 torus of nodes
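The torus topology can be sketched using the slide's 72 × 32 × 32 dimensions. Every node, including one at the "corner", has exactly six neighbours, because links wrap around in each dimension. The hop-distance function assumes minimal routing along each ring, which is an illustrative assumption: actual BG/P routing is not modelled, and the coordinates are hypothetical.

```python
import math

DIMS = (72, 32, 32)  # BG/P full-system torus dimensions, as given on the slide


def node_count(dims=DIMS):
    return math.prod(dims)


def torus_neighbors(coord, dims=DIMS):
    """The six nearest neighbours of a node; links wrap around at every edge."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            neighbor = list(coord)
            neighbor[axis] = (neighbor[axis] + step) % size
            result.append(tuple(neighbor))
    return result


def hop_distance(a, b, dims=DIMS):
    """Minimum hops between two nodes, going whichever way round each ring is shorter."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y, size in zip(a, b, dims))


if __name__ == "__main__":
    print(node_count())                            # 73728
    print(torus_neighbors((0, 0, 0)))              # a corner node still has six neighbours
    print(hop_distance((0, 0, 0), (71, 31, 16)))   # 18: wrap-around puts x=71 one hop away
```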

40 COW Cluster of Workstations

