
1. Large systems

2. Why parallelism (1)
• Fundamental laws of nature:
  - example: channel widths are becoming so small that quantum properties will determine device behavior
  - signal propagation time gets worse as channel widths shrink

3. Why parallelism (2)
[Figure; source: IEEE Computer, September 1997]

4. Why parallelism (3)
• Engineering constraints
  - the phase-transition time of a component is a good measure of the maximum obtainable computing speed
    » example: optical or superconducting devices can switch in 10^-12 s
    » optimistic suggestion: 1 TIPS (Tera Instructions Per Second, 10^12) is possible
  - however, we must actually calculate something
    » assume an instruction needs 10 phase transitions: 0.1 TIPS

5. Why parallelism (4)
But what about memory? It takes light approximately 16 picoseconds to cross 0.5 cm, yielding a possible execution rate of 60 GIPS. In silicon, however, signals travel about 10 times slower, resulting in 6 GIPS.
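A quick check of the slide's arithmetic (a back-of-the-envelope sketch, assuming signals travel at the speed of light, c ≈ 3 × 10^10 cm/s):

$$t = \frac{0.5\ \mathrm{cm}}{3 \times 10^{10}\ \mathrm{cm/s}} \approx 16.7\ \mathrm{ps}, \qquad \frac{1}{t} \approx 6 \times 10^{10}\ \mathrm{s}^{-1} = 60\ \mathrm{GIPS}.$$

At one tenth of that speed in silicon, the rate drops to roughly 6 GIPS.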

6. Why parallelism (5)
• The speed of a sequential computer is limited to a few GIPS
• Improvements come from:
  - exploiting locality through caches
  - using parallelism
    » multiple functional units (instruction-level parallelism)
    » multiple CPUs (parallel processing)

7. Classification
• Single Instruction, Single Data (SISD)
  - conventional system
• Single Instruction, Multiple Data (SIMD)
  - one instruction on multiple data objects
• Multiple Instruction, Multiple Data (MIMD)
  - multiple instruction streams on multiple data streams
• Multiple Instruction, Single Data (MISD)
  - ?????

8. SIMD processors
[Diagram: an Instruction Issuing Unit broadcasts one instruction (here INCR) to an array of PEs (Processing Elements); each PE applies it to its own data: 567 -> 568, 652 -> 653, ...]

9. UMA architecture
[Diagram: processors P1, P2, ..., Pm connected through an interconnection network to memory modules M1, M2, ..., Mk, which together form a single address space 0..N]
Uniform Memory Access computer

10. NUMA architecture
[Diagram: processors P1, P2, ..., Pm connected through an interconnection network to memory modules M1, M2, ..., Mm, which still form a single address space 0..N]
Non-Uniform Memory Access computer; realized in hardware or in software (distributed shared memory)

11. Distributed-memory architecture
[Diagram: processors P1, P2, ..., Pm with private memories M1, M2, ..., Mm, connected through an interconnection network; each memory has its own separate address space, each numbered from 0]

12. Interconnection Networks
• Important parameters:
  - diameter: maximal distance between processors
  - degree: maximal number of connections per processor
  - total number of connections
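For reference, the standard values of these parameters for some of the networks on the following slides (N processors; textbook results, not taken from the slides):

Network               Diameter      Degree     Connections
Ring                  floor(N/2)    2          N
2D mesh (√N x √N)     2(√N - 1)     4          2√N(√N - 1)
Hypercube (N = 2^n)   n = log2 N    log2 N     n·2^(n-1)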

13. Multiple bus
[Diagram: (multiple) bus structures, with processors and memories attached to Bus 1 and Bus 2]

14. Local networks
• Ethernet
  - based on collision detection
  - on a collision, back off and retry at a random later time
  - speed 10-100 Mb/s
• Token ring
  - based on a token circulating on a ring
  - possession of the token allows putting a message on the ring

15. Crossbar
[Diagram: crossbar interconnection network, requiring N^2 switches]

16. Multi-stage (1)
[Diagram: processors P0, P1, ... connected through a switching network of three stages (stage 1, stage 2, stage 3)]

17. Multi-stage (2)
• Multistage networks
• Example: shuffle or Omega network
• Every processor is identified by a three-bit number (in general, an n-bit number)
• A message from one processor to another contains the identifier of the destination
• In every stage, inspect one bit of the destination:
  - if 0: take the upper output
  - if 1: take the lower output
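The slide's routing rule is easy to trace in code. Below is a minimal C sketch (not from the slides) of destination-tag routing in an omega network, assuming the standard perfect-shuffle wiring between stages; route prints the line a message occupies after each stage.

#include <stdio.h>

/* Left-rotate the n-bit line number by one position: the perfect
   shuffle that wires the stages of an omega network together. */
static unsigned shuffle(unsigned x, int n) {
    return ((x << 1) | (x >> (n - 1))) & ((1u << n) - 1);
}

/* Destination-tag routing among N = 2^n endpoints: at stage i the 2x2
   switch inspects bit (n-1-i) of the destination and sends the message
   to its upper output (0) or lower output (1). */
void route(unsigned src, unsigned dst, int n) {
    unsigned line = src;
    printf("line %u", line);
    for (int stage = 0; stage < n; stage++) {
        line = shuffle(line, n);                 /* inter-stage wiring   */
        int bit = (dst >> (n - 1 - stage)) & 1;  /* next destination bit */
        line = (line & ~1u) | (unsigned)bit;     /* 0: upper, 1: lower   */
        printf(" -> stage %d: %u", stage + 1, line);
    }
    printf("\n");  /* line now equals dst */
}

int main(void) {
    route(2, 5, 3);  /* 8 endpoints, 3 stages: 010 -> 101 */
    return 0;
}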

18. Multi-stage (3)
• Properties (let N = 2^n processing elements):
  - number of stages: n = log2 N
  - number of switches per stage: N/2
  - total number of (2x2) switches: N(log2 N)/2
• Not every pair of connections can be realized simultaneously

19. Hypercubes (1)
[Diagram: a 2-cube with nodes 00, 01, 10, 11 and a 3-cube with nodes 000, 001, 010, 011, 100, 101, 110, 111]
• An n-dimensional hypercube has n·2^(n-1) connections and a maximum distance of n hops
• Connected PEs differ in exactly 1 bit
• Routing (example: 000 -> 111):
  - scan the address bits from right to left
  - if a bit differs, send to the neighbor that differs in that same bit
  - repeat until the destination is reached
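This routing rule (often called e-cube routing) is only a few lines of C. A minimal sketch, not from the slides:

#include <stdio.h>

/* E-cube routing in an n-dimensional hypercube: scan the address bits
   from right to left; whenever the current node and the destination
   differ in a bit, hop to the neighbor that differs in exactly that
   bit. The route takes at most n hops. */
void route(unsigned src, unsigned dst, int n) {
    unsigned node = src;
    printf("%u", node);
    for (int bit = 0; bit < n; bit++) {
        if (((node ^ dst) >> bit) & 1) {  /* does this bit differ?  */
            node ^= 1u << bit;            /* send to that neighbor  */
            printf(" -> %u", node);
        }
    }
    printf("\n");  /* node == dst */
}

int main(void) {
    route(0, 7, 3);  /* the slide's example: 000 -> 001 -> 011 -> 111 */
    return 0;
}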

20. Hypercubes (2)
• Question: what is the average distance between two nodes in a hypercube?

21. Hypercubes (3)
• Answer:
  - take a specific node (the situation is the same for all of them)
  - there is 1 node at distance 0 and 1 node at distance n; average n/2
  - there are n nodes at distance 1 (one bit differs) and n nodes at distance n-1 (all but one bit differ); average n/2
  - similarly for distances k and n-k
  - so the overall average distance is n/2
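The same answer follows in one line from a standard binomial identity (not on the slide): there are $\binom{n}{k}$ nodes at distance $k$, so

$$\frac{1}{2^n}\sum_{k=0}^{n} k\binom{n}{k} = \frac{n \cdot 2^{n-1}}{2^n} = \frac{n}{2}.$$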

22. Mesh
[Diagram: 2D mesh] A mesh has a constant number of connections per node.

23. Torus
[Diagram: 2D torus, a mesh with wrap-around connections]

24. Tree
[Diagram: tree network]

25. Fat Tree
[Diagram: fat tree, with link capacity increasing toward the root]

26. Memory organization (1)
[Diagram, UMA organization: each node consists of a processor with a secondary cache and a network interface attached to the network; all memory is reached through the network]

27. Memory organization (2)
[Diagram, NUMA organization: each node consists of a processor with a secondary cache, a local memory, and a network interface attached to the network]

28. Cache Coherence
• Problem: caches in a multiprocessor hold copies of the same variable; the copies must be kept identical
• Cache coherence: all copies of a shared variable have the same value
• Solutions:
  - write through to shared memory and to all caches
  - invalidate the entry in all other caches on a write
• Snoopy caches: PEs observe writes on the bus and update or invalidate their own copy
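A toy model of the write-through-plus-invalidate scheme for one shared variable: a C sketch (hypothetical, not from the slides) in which each PE keeps a private copy with a valid bit, and a write by any PE invalidates the copies the others hold.

#include <stdio.h>
#include <stdbool.h>

#define NPE 4
int memory = 0;               /* the shared-memory copy     */
int cache[NPE];               /* each PE's cached copy      */
bool valid[NPE] = { false };  /* is the cached copy usable? */

int pe_read(int pe) {
    if (!valid[pe]) {         /* miss: re-fetch from memory */
        cache[pe] = memory;
        valid[pe] = true;
    }
    return cache[pe];
}

void pe_write(int pe, int value) {
    cache[pe] = value;
    valid[pe] = true;
    memory = value;           /* write through to shared memory */
    for (int other = 0; other < NPE; other++)
        if (other != pe)
            valid[other] = false;  /* snooped write: invalidate */
}

int main(void) {
    pe_write(0, 42);
    printf("PE1 reads %d\n", pe_read(1));  /* 42: copies stay coherent */
    return 0;
}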

29. Parallelism

PARBEGIN
  task_1;
  task_2;
  ...
  task_n;
PAREND

[Diagram: control forks into parallel tasks at parbegin and joins again at parend]
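In C, the PARBEGIN/PAREND construct maps naturally onto thread creation and joining. A minimal sketch with POSIX threads (an assumption; the slides do not prescribe a thread library):

#include <pthread.h>
#include <stdio.h>

void *task_1(void *arg) { printf("task_1\n"); return NULL; }
void *task_2(void *arg) { printf("task_2\n"); return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task_1, NULL);  /* PARBEGIN: fork tasks */
    pthread_create(&t2, NULL, task_2, NULL);
    pthread_join(t1, NULL);                   /* PAREND: wait for all */
    pthread_join(t2, NULL);
    return 0;
}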

30. Shared variables (1)
[Diagram: tasks T1 and T2 both execute STW R2, SUM(0), storing into the shared variable SUM in shared memory]

31. Shared variables (2)
• Suppose processors 1 and 2 both execute:

  LW  A, R0    /* A is a variable in */
  ADD R1, R0   /* main memory        */
  STW R0, A

• Initially:
  - A = 100
  - R1 in processor 1 is 20
  - R1 in processor 2 is 40
• What is the final value of A? 120, 140, 160?
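The slide's scenario written out in C with POSIX threads (a sketch; the three statements mirror the LW/ADD/STW sequence). If the two load-add-store sequences do not overlap, A ends up as 160; if they interleave, one update is lost and A becomes 120 or 140.

#include <pthread.h>
#include <stdio.h>

int A = 100;  /* shared variable in main memory */

void *add(void *arg) {
    int increment = *(int *)arg;
    int r0 = A;           /* LW  A, R0  */
    r0 = r0 + increment;  /* ADD R1, R0 */
    A = r0;               /* STW R0, A  */
    return NULL;
}

int main(void) {
    pthread_t p1, p2;
    int r1_of_p1 = 20, r1_of_p2 = 40;
    pthread_create(&p1, NULL, add, &r1_of_p1);
    pthread_create(&p2, NULL, add, &r1_of_p2);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("A = %d\n", A);  /* usually 160, but 120 or 140 are possible */
    return 0;
}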

32. Shared variables (3)
• So there is a need for mutual exclusion: different components of the same program need exclusive access to a data structure to ensure consistent values
• This occurs in many situations:
  - access to shared variables
  - access to a printer
• A solution: a single instruction (T&S, test-and-set) that
  - tests whether somebody else is accessing the variable
  - if not, indicates that the variable is now being accessed

33. Shared variables (4)
Both tasks T1 and T2 guard their update of the shared variable SUM with a lock variable LOCK in shared memory:

crit: T&S LOCK, crit
      ...
      STW R2, SUM(0)
      ...
      CLR LOCK
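C11 exposes exactly this primitive as atomic_flag, whose test-and-set operation plays the role of the T&S instruction. A sketch of the slide's spin-lock pattern:

#include <stdatomic.h>
#include <stdio.h>

atomic_flag LOCK = ATOMIC_FLAG_INIT;
int SUM = 0;

void add_to_sum(int value) {
    while (atomic_flag_test_and_set(&LOCK))
        ;                        /* crit: T&S LOCK, crit (spin) */
    SUM += value;                /* STW R2, SUM(0)              */
    atomic_flag_clear(&LOCK);    /* CLR LOCK                    */
}

int main(void) {
    add_to_sum(5);
    printf("SUM = %d\n", SUM);
    return 0;
}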

34. Example program
• Compute the dot product of two vectors with:
  - a sequential program
  - two tasks with shared memory
  - two tasks with distributed memory, using messages
• Primitives in the parallel programs:
  - create_thread() (create a process)
  - mypid() (who am I?)

35. Sequential program

integer array a[1..N], b[1..N]
integer dot_product

dot_product := 0;
do_dot(a,b)
print dot_product

procedure do_dot(integer array x[1..N], integer array y[1..N])
  for k := 1 to N
    dot_product := dot_product + x[k]*y[k]
  end

36. Shared memory program (1)

shared integer array a[1..N], b[1..N]
shared integer dot_product
shared lock dot_product_lock
shared barrier done

dot_product := 0;
create_thread(do_dot, a, b)
do_dot(a,b)
print dot_product

[Diagram: threads id=0 and id=1 both add into the shared dot_product and synchronize at a barrier]

37. Shared memory program (2)

procedure do_dot(integer array x[1..N], integer array y[1..N])
  private integer id
  id := mypid();
  for k := (id*N/2)+1 to (id+1)*N/2
    lock(dot_product_lock)
    dot_product := dot_product + x[k]*y[k]
    unlock(dot_product_lock)
  end
  barrier(done)
end

38. Shared memory program (3)

procedure do_dot(integer array x[1..N], integer array y[1..N])
  private integer id, local_dot_product
  id := mypid();
  local_dot_product := 0;
  for k := (id*N/2)+1 to (id+1)*N/2
    local_dot_product := local_dot_product + x[k]*y[k]
  end
  lock(dot_product_lock)
  dot_product := dot_product + local_dot_product
  unlock(dot_product_lock)
  barrier(done)
end
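For comparison, this second version translated to C with POSIX threads: a sketch assuming two threads, where the main thread acts as thread 0 and pthread_join takes the place of the barrier. The array contents are made up for the example.

#include <pthread.h>
#include <stdio.h>

#define N 8
#define NTHREADS 2

int a[N] = {1,2,3,4,5,6,7,8}, b[N] = {1,1,1,1,1,1,1,1};
int dot_product = 0;
pthread_mutex_t dot_product_lock = PTHREAD_MUTEX_INITIALIZER;

void *do_dot(void *arg) {
    int id = *(int *)arg;
    int local_dot_product = 0;
    for (int k = id * N / NTHREADS; k < (id + 1) * N / NTHREADS; k++)
        local_dot_product += a[k] * b[k];
    pthread_mutex_lock(&dot_product_lock);   /* lock taken only once */
    dot_product += local_dot_product;
    pthread_mutex_unlock(&dot_product_lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    int id0 = 0, id1 = 1;
    pthread_create(&t, NULL, do_dot, &id1);  /* create_thread(do_dot,...) */
    do_dot(&id0);                            /* main runs as thread 0     */
    pthread_join(t, NULL);                   /* stands in for the barrier */
    printf("dot_product = %d\n", dot_product);
    return 0;
}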

39. Shared memory program (4)
[Diagram: threads id=0 and id=1 each accumulate a private local_dot_product, add it once to the shared dot_product, and synchronize at the barrier]

40. Message passing program (1)

integer array a[1..N/2], temp_a[1..N/2], b[1..N/2], temp_b[1..N/2]
integer dot_product, id, temp

id := mypid()
if (id = 0) then
  send(temp_a[1..N/2], 1);
  send(temp_b[1..N/2], 1);
else
  receive(a[1..N/2], 0);
  receive(b[1..N/2], 0);
end
...

41. Message passing program (2)

...
dot_product := 0;
do_dot(a,b)
if (id = 1)
  send(dot_product, 0)
else
  receive(temp, 1)
  dot_product := dot_product + temp
  print dot_product
end
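The same program in C using MPI point-to-point calls (a sketch for exactly two processes; MPI is an assumption, the slides use abstract send/receive). Rank 0 owns the data, ships the second halves of a and b to rank 1, both compute a local dot product, and rank 1 sends its partial result back.

#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char **argv) {
    int a[N/2], b[N/2], id, temp, dot_product = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);          /* id := mypid() */
    if (id == 0) {
        int temp_a[N/2], temp_b[N/2];
        for (int k = 0; k < N/2; k++) {          /* rank 0 owns the data */
            a[k] = k + 1;   temp_a[k] = k + 1 + N/2;
            b[k] = 1;       temp_b[k] = 1;
        }
        MPI_Send(temp_a, N/2, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(temp_b, N/2, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(a, N/2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b, N/2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    for (int k = 0; k < N/2; k++)                /* local dot product */
        dot_product += a[k] * b[k];
    if (id == 1) {
        MPI_Send(&dot_product, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&temp, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        dot_product += temp;
        printf("dot_product = %d\n", dot_product);
    }
    MPI_Finalize();
    return 0;
}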

42. Message passing program (3)
[Diagram: processes id=0 and id=1 each compute a local dot product on their half of the data; id=1 sends its partial result to id=0, which combines them into the final result]

