Distributed and Parallel Processing


1 Distributed and Parallel Processing
George Wells

2 Terminology (cont.)

3 Flynn’s Taxonomy
Single Instruction stream, Single Data stream (SISD) = serial computer
Single Instruction stream, Multiple Data stream (SIMD) = processor arrays / vector processors / GPU
Multiple Instruction stream, Single Data stream (MISD)
Multiple Instruction stream, Multiple Data stream (MIMD) = multiprocessors

4 Terminology
Middleware
  “the software layer that lies between the operating system and the applications on each side”
  Connectivity software
  Functional set of APIs

5 Terminology
Data Access
  DBMSs house data and manage data access
  Allow disparate data sources to be viewed in a consistent way
  Database middleware – data passing

6 Terminology
MOM - Message Oriented Middleware
  Resides between applications and network infrastructure
  Refers to the process of distributing data and control through the exchange of messages
  Includes message passing and queueing models
  Asynchronous and synchronous communications
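The queueing model above can be illustrated with a minimal sketch. It assumes an in-process `queue.Queue` as a stand-in for a middleware-managed queue; the names `consumer` and `run_demo` and the message strings are hypothetical, and a real MOM product would place the queue between separate applications across a network.

```python
import queue
import threading

def consumer(mailbox, results):
    """Receive messages until the sentinel None arrives."""
    while True:
        msg = mailbox.get()          # blocks until a message is available
        if msg is None:              # sentinel: no more messages
            break
        results.append(msg.upper())  # placeholder for real processing

def run_demo():
    mailbox = queue.Queue()          # stand-in for a middleware queue (assumption)
    results = []
    worker = threading.Thread(target=consumer, args=(mailbox, results))
    worker.start()
    for msg in ["order", "invoice", "receipt"]:
        mailbox.put(msg)             # asynchronous send: returns without waiting for the receiver
    mailbox.put(None)
    worker.join()                    # wait for the consumer to drain the queue
    return results
```

The sender never waits for the receiver to process a message, which is the asynchronous case; a synchronous exchange would additionally block the sender until a reply arrived.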

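7 Granularity
The term grain is used to indicate the amount of computation performed between synchronisations:
  Coarse grain: a large amount of computation between synchronisations
  Fine grain: a small amount of computation between synchronisations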

8 Communication : Computation Ratio
Important performance characteristic when communication is explicit (e.g. message passing)
Related to grain size
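One way to see the link to grain size is a small worked example. The sketch below assumes a hypothetical workload not taken from the slides: each processor owns a square block of a 2-D grid, updates every cell once per step, and exchanges one value per boundary cell with its four neighbours.

```python
def comm_to_comp_ratio(block_side):
    """Communication : computation ratio for one step of the assumed workload.

    Assumption (illustrative only): computation is one update per cell
    (block_side ** 2), communication is one value per boundary cell on each
    of the four edges (4 * block_side).
    """
    computation = block_side ** 2
    communication = 4 * block_side
    return communication / computation
```

Under these assumptions the ratio is 4 / block_side, so `comm_to_comp_ratio(4)` gives 1.0 and `comm_to_comp_ratio(100)` gives 0.04: the coarser the grain, the smaller the share of time spent communicating.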

9 Hardware Models
The RAM (Random Access Machine) model provides a useful abstraction
  We can reason about performance of algorithms, etc.
Can we create a similar model for parallel systems?

10 PRAM
Parallel Random Access Machine
Multiple processing units connected to a shared memory unit
Instructions executed in lock-step
  Simplifies synchronisation
Multiple simultaneous accesses to one memory location
  Differing approaches: disallowed; must all write same value; one (randomly selected) succeeds; etc.
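The differing approaches to simultaneous writes can be sketched as a small resolver for a single lock-step write to one memory cell. The function name and the policy labels `"exclusive"`, `"common"` and `"arbitrary"` are chosen here for illustration and do not come from the slides.

```python
import random

def resolve_writes(values, policy, rng=None):
    """Return the value stored after one PRAM step writes `values` to one cell.

    values: the values that the processors attempt to write in this step.
    policy: "exclusive" (simultaneous writes disallowed),
            "common"    (all writers must write the same value),
            "arbitrary" (one randomly selected writer succeeds).
    Raises ValueError when the step is illegal under the policy.
    """
    if not values:
        raise ValueError("no processor wrote to the cell")
    if policy == "exclusive":
        if len(values) > 1:
            raise ValueError("simultaneous writes are disallowed")
        return values[0]
    if policy == "common":
        if any(v != values[0] for v in values):
            raise ValueError("writers disagree on the value")
        return values[0]
    if policy == "arbitrary":
        rng = rng or random.Random()
        return rng.choice(values)   # one writer, chosen at random, succeeds
    raise ValueError("unknown policy: " + policy)
```

For example, `resolve_writes([7, 7, 7], "common")` returns 7, while `resolve_writes([1, 2], "exclusive")` raises `ValueError`.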

11 Problem
PRAM does not adequately model memory behaviour
  Assumes all memory accesses take unit time
  Overhead of enforcing consistency grows with number of processors

12 CTA
Candidate Type Architecture
Distinguishes between local and non-local memory accesses
Multiple processors connected by some form of “network”

13 CTA
[Figure: PC and processors P0, P1, P2, ..., Pm attached to an interconnection network. Each node consists of a processor, memory and a NIC with n network connections (1 <= n <= 6).]

14 CTA
Data references can be
  Local (unit cost)
  Non-local (λ, non-local memory latency – multiple of local cost)
Models for non-local access
  Shared memory
    High hardware cost, poor scalability
  1-sided communication
    One processor “gets” and “puts” non-local data; requires synchronisation
  Message passing
    Explicit “send” and “receive” required
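The cost model for data references can be written down directly. This is a minimal sketch: the function name and the sample numbers are hypothetical, and it counts only reference costs, ignoring any overlap of communication with computation.

```python
def reference_cost(local_refs, non_local_refs, lam):
    """Total memory-reference cost under the CTA model.

    Each local reference costs 1 unit; each non-local reference costs
    lam units, where lam is the non-local latency as a multiple of the
    local cost.
    """
    return local_refs * 1 + non_local_refs * lam
```

With an assumed λ of 100, a computation making 990 local and 10 non-local references costs 990 + 10 × 100 = 1990 units, so 1% of the references account for roughly half the cost.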

15 Processor Topologies
Criteria to measure effectiveness in implementing parallel algorithms
  Diameter of network = largest distance between 2 nodes
  Bisection width = minimum no. of edges to be removed to split the network into two halves
  No. of edges per node
  Maximum edge length

16 Processor Topologies
Ideal Organisation:
  Low diameter - lower bound on complexity for algorithms that require communication between arbitrary nodes
  High bisection width - in algorithms with large amounts of data movement, the size of the data divided by the bisection width puts a lower bound on complexity
  No. of edges per node constant, independent of network size - scalability
  Max edge length constant - scalability
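Two of these criteria, diameter and edges per node, can be checked by brute force on small networks. The sketch below builds adjacency lists for a simple 2-D mesh (no wrap-around) and a hypercube and measures them with breadth-first search; all function names are illustrative, and bisection width and edge length are not computed.

```python
from collections import deque

def mesh(rows, cols):
    """Adjacency lists for a simple 2-D mesh without wrap-around links."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj

def hypercube(dim):
    """Adjacency lists for a hypercube: neighbours differ in exactly one bit."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(2 ** dim)}

def eccentricity(adj, start):
    """Largest shortest-path distance from start, found by breadth-first search."""
    dist = {start: 0}
    todo = deque([start])
    while todo:
        node = todo.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                todo.append(nxt)
    return max(dist.values())

def diameter(adj):
    """Largest distance between any two nodes."""
    return max(eccentricity(adj, node) for node in adj)

def max_degree(adj):
    """Largest number of edges at any node."""
    return max(len(nbrs) for nbrs in adj.values())
```

For 16 nodes, `diameter(mesh(4, 4))` is 6 and `diameter(hypercube(4))` is 4, with a maximum of 4 edges per node in both. The mesh keeps that edge count as it grows, whereas the hypercube's edges per node increase with its dimension.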

17 Processor Topologies
Mesh
Pyramid
Shuffle-Exchange
Butterfly
Hypercube / Cube-connected
Cube-connected Cycles
Others: Binary Tree; Hypertree; de Bruijn Network; minimum path

18 Simple 2-D Mesh

19 Wrap-around Mesh

20 Toroidal Wrap-around Mesh

21 Pyramid
Attempt to combine advantages of mesh networks and tree networks
A pyramid of size p is a 4-ary tree of height log_4 p
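The height formula can be checked with a short sketch. It assumes that "size p" means the number of nodes in the base mesh and that p is a power of 4; the function name is hypothetical.

```python
def pyramid_levels(p):
    """Node count at each level of a pyramid whose base holds p nodes.

    Assumes p is a power of 4. Level 0 is the apex; each level below has
    four times as many nodes, so the height is log_4 p.
    """
    if p < 1:
        raise ValueError("p must be positive")
    height = 0
    size = 1
    while size < p:
        size *= 4
        height += 1
    if size != p:
        raise ValueError("p must be a power of 4")
    return [4 ** depth for depth in range(height + 1)]
```

For p = 16 this gives levels of 1, 4 and 16 nodes: height 2 = log_4 16 and 21 nodes in total, which matches (4p - 1) / 3.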

22 Shuffle-Exchange Network
Solid arrows = shuffle connections
Dashed arrows = exchange connections
Shuffle-exchange network: used for Discrete Fourier Transforms and sorting bitonic sequences
Necklace of i = nodes which a data item (starting at position i) traverses in response to shuffles
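The two connection types and the necklace can be sketched for a network of 2^k nodes, using the usual definition in which a shuffle is a left cyclic rotation of the k-bit node address and an exchange flips the lowest bit. The function names are illustrative.

```python
def shuffle(i, k):
    """Shuffle connection: rotate the k-bit address of node i left by one bit."""
    mask = (1 << k) - 1
    return ((i << 1) | (i >> (k - 1))) & mask

def exchange(i):
    """Exchange connection: flip the least significant address bit."""
    return i ^ 1

def necklace(i, k):
    """Nodes visited by a data item starting at node i under repeated shuffles."""
    visited = [i]
    j = shuffle(i, k)
    while j != i:
        visited.append(j)
        j = shuffle(j, k)
    return visited
```

With k = 3 (8 nodes), `necklace(1, 3)` is `[1, 2, 4]` and `necklace(3, 3)` is `[3, 6, 5]`, while nodes 0 and 7 each form a necklace of length 1.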

23 Butterfly

24 Hypercube

25 Cube Connected Cycles


