Basic Linear Algebra Subroutines (BLAS) – 3 levels of operations


1 Basic Linear Algebra Subroutines (BLAS) – 3 levels of operations
Memory hierarchy efficiently exploited by higher level BLAS

BLAS                      Operation                       Memory refs.   Flops   Flops / Memory refs.
Level-1 (vector)          y = y + ax, z = y·x             3n             2n      2/3
Level-2 (matrix-vector)   y = y + Ax, A = A + α x y^T     n^2            2n^2    2
Level-3 (matrix-matrix)   C = C + AB                      4n^2           2n^3    n/2
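The flops-per-memory-reference column on slide 1 follows directly from the two count columns. The sketch below is only an illustration: it assumes the leading-order counts listed on the slide, and the function name `blas_counts` is hypothetical, not part of any BLAS library.

```python
def blas_counts(level, n):
    """Return (memory refs, flops, flops per memory ref) for one BLAS level.

    Counts are the leading-order terms from slide 1; n is the vector
    length (Level-1) or the matrix dimension (Level-2 and Level-3).
    """
    if level == 1:        # y = y + a*x
        mem, flops = 3 * n, 2 * n
    elif level == 2:      # y = y + A*x
        mem, flops = n * n, 2 * n * n
    elif level == 3:      # C = C + A*B
        mem, flops = 4 * n * n, 2 * n ** 3
    else:
        raise ValueError("level must be 1, 2, or 3")
    return mem, flops, flops / mem


for level in (1, 2, 3):
    mem, flops, ratio = blas_counts(level, 1000)
    print(f"Level-{level}: {mem} refs, {flops} flops, ratio {ratio:.3f}")
```

For n = 1000 the Level-3 ratio is 500 flops per memory reference, against 2/3 for Level-1. Only the Level-3 ratio grows with n, which is why higher-level BLAS can keep data in fast memory while doing useful work.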

2 Fourier Transform The Fourier transform is widely used for designing filters. You can design systems that reject high-frequency noise and retain only the low-frequency components. This is natural to describe in the frequency domain. Important properties of the Fourier transform are: 1. Linearity and time shifts 2. Differentiation 3. Convolution
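The convolution property is what makes filtering natural in the frequency domain: convolving a signal with a filter corresponds to multiplying their transforms. The sketch below checks this numerically. It uses the discrete Fourier transform with circular convolution as a stand-in for the continuous transform on the slide, which is an assumption. The sample signal and the two-point averaging filter are made up for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


x = [1.0, 2.0, 0.0, -1.0]   # example signal (illustrative values)
h = [0.5, 0.5, 0.0, 0.0]    # two-point moving average, a crude low-pass filter

# Convolution property: DFT(x * h) equals DFT(x) times DFT(h), elementwise.
lhs = dft(circular_convolve(x, h))
rhs = [a * b for a, b in zip(dft(x), dft(h))]
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_err)
```

The printed difference should be at the level of floating-point rounding error.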

3 A Simple Model for Parallel Processing Parallel Random Access Machine (PRAM) model –a number of processors can all access –a large shared memory –all processors are synchronized –all processors run the same program. Each processor has a unique id, pid, and may be instructed to do different things depending on its pid
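The model can be sketched as a small sequential simulation. This is a toy under stated assumptions, not a standard PRAM implementation: the helper `pram_step` and the example program `shift_right` are hypothetical names, and synchrony is modelled by letting every processor read a snapshot of the old memory before any write is committed.

```python
def pram_step(memory, num_procs, program):
    """Run one synchronous step: all reads see the old memory, then writes commit."""
    snapshot = list(memory)          # every processor reads the same old state
    writes = {}
    for pid in range(num_procs):     # sequential stand-in for parallel execution
        result = program(pid, snapshot)
        if result is not None:       # None means the processor is idle this step
            addr, value = result
            writes[addr] = value
    for addr, value in writes.items():
        memory[addr] = value
    return memory


def shift_right(pid, mem):
    """Same program on every processor; the behaviour depends on pid."""
    if pid == 0:
        return (0, 0)                # processor 0 fills the vacated cell
    return (pid, mem[pid - 1])       # processor pid copies its left neighbour


shared = [10, 20, 30, 40]
pram_step(shared, len(shared), shift_right)
print(shared)   # [0, 10, 20, 30]
```

Without the snapshot, a sequential loop would overwrite cells before their neighbours had read them. The synchronization assumption in the model rules this out.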

4 Interconnection Networks Uses of interconnection networks –Connect processors to shared memory –Connect processors to each other Interconnection media types –Shared medium –Switched medium Different interconnection networks define different parallel machines. The properties of the interconnection network influence the type of algorithm used on a given machine, because they affect how data is routed.

5 Switch Network Topologies View switched network as a graph –Vertices = processors or switches –Edges = communication paths Two kinds of topologies –Direct –Indirect

6 Terminology for Evaluating Switch Topologies We need to evaluate four characteristics of a network to understand how effectively parallel algorithms can be implemented on a machine with that network. These are –The diameter –The bisection width –The edges per node –The constant edge length We’ll define these and see how they affect algorithm choice. Then we will investigate several different topologies and see how these characteristics are evaluated.

7 Terminology for Evaluating Switch Topologies Diameter – Largest distance between two switch nodes. –A low diameter is desirable –It puts a lower bound on the complexity of parallel algorithms that require communication between arbitrary pairs of nodes.
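For a small network the diameter can be computed directly by breadth-first search from every node. The sketch below is an illustration under one assumption: every vertex is treated as a switch node, as in a direct topology. The helper names `diameter`, `hypercube`, and `mesh_2d` are hypothetical.

```python
from collections import deque


def diameter(adj):
    """Largest shortest-path distance over all node pairs (BFS from every node)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def hypercube(d):
    """Adjacency lists of a d-dimensional hypercube: neighbours differ in one bit."""
    return {u: [u ^ (1 << b) for b in range(d)] for u in range(2 ** d)}


def mesh_2d(rows, cols):
    """Adjacency lists of a rows x cols mesh without wraparound links."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            adj[(r, c)] = [(a, b) for a, b in nbrs if 0 <= a < rows and 0 <= b < cols]
    return adj


print(diameter(hypercube(4)))   # 16 nodes, diameter 4
print(diameter(mesh_2d(4, 4)))  # 16 nodes, diameter 6
```

With the same 16 nodes, the hypercube has the lower diameter, so an algorithm that needs communication between arbitrary pairs faces a smaller lower bound on it.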

8 Terminology for Evaluating Switch Topologies Bisection width – The minimum number of edges between switch nodes that must be removed in order to divide the network into two halves (to within one node, if the number of processors is odd). High bisection width is desirable. In algorithms requiring large amounts of data movement, the size of the data set divided by the bisection width puts a lower bound on the complexity of an algorithm. Actually proving what the bisection width of a network is can be quite difficult.
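For very small networks the bisection width can be found by brute force, trying every split into two halves and counting the edges that cross. The sketch below is exponential in the number of nodes and is meant only to make the definition concrete. It assumes every vertex is a switch node, and the helper names are hypothetical.

```python
from itertools import combinations


def bisection_width(adj):
    """Minimum number of edges crossing any split into two (near-)equal halves."""
    nodes = list(adj)
    half = len(nodes) // 2
    best = None
    for subset in combinations(nodes, half):
        side = set(subset)
        # Count edges with exactly one endpoint inside `side`.
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        if best is None or cut < best:
            best = cut
    return best


def linear(n):
    """Adjacency lists of a linear network (a path) on n nodes."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def hypercube(d):
    """Adjacency lists of a d-dimensional hypercube."""
    return {u: [u ^ (1 << b) for b in range(d)] for u in range(2 ** d)}


print(bisection_width(linear(8)), bisection_width(hypercube(3)))   # 1 4
```

Both networks have 8 nodes, but the data-movement bound from the slide (data size divided by bisection width) is four times larger on the linear network than on the hypercube.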

9 Evaluating Switch Topologies Many have been proposed and analyzed. We will consider several well-known ones: –2-D mesh –linear network –binary tree –hypertree –butterfly –hypercube –shuffle-exchange Those highlighted in yellow on the original slide have been used in commercial parallel computers.

10 PRAM [Parallel Random Access Machine] A PRAM is composed of: –P processors, each with its own unmodifiable program. –A single shared memory composed of a sequence of words, each capable of containing an arbitrary integer. –A read-only input tape. –A write-only output tape. The PRAM model is a synchronous, MIMD, shared address space parallel computer. (Introduced by Fortune and Wyllie, 1978)

11 PRAM model of computation p processors, each with local memory Synchronous operation Shared memory reads and writes Each processor has a unique id in range 1-p

12 Characteristics At each unit of time, a processor is either active or idle (depending on id) All processors execute same program At each time step, all processors execute same instruction on different data (“data-parallel”) Focuses on concurrency only
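The active-or-idle behaviour can be illustrated with a parallel sum. In the sketch below every processor runs the same instruction, and its id decides whether it takes part in a given step. The function name `pram_sum` is hypothetical, the input length is assumed to be a power of two, and the inner loop over active processors is a sequential stand-in for one synchronous step.

```python
def pram_sum(values):
    """Sum n = 2^k values in log2(n) synchronous steps.

    Returns (total, number of active processors at each step).
    """
    mem = list(values)
    n = len(mem)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    active_per_step = []
    stride = 1
    while stride < n:
        snapshot = list(mem)          # all reads happen before any write
        # A processor is active only if its id is a multiple of 2 * stride.
        active = [pid for pid in range(n) if pid % (2 * stride) == 0]
        for pid in active:            # same instruction, different data
            mem[pid] = snapshot[pid] + snapshot[pid + stride]
        active_per_step.append(len(active))
        stride *= 2
    return mem[0], active_per_step


total, active_counts = pram_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, active_counts)   # 36 [4, 2, 1]
```

The number of active processors halves at every step. Processors whose id does not qualify are idle for that step.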

13 Why study PRAM algorithms? Well-developed body of literature on design and analysis of such algorithms Baseline model of concurrency Explicit model –Specify operations at each step –Scheduling of operations on processors Robust design paradigm

14 Designing PRAM algorithms Balanced trees Pointer jumping Euler tours Divide and conquer Symmetry breaking...

15 Balanced trees Key idea: Build a balanced binary tree on the input data, then sweep the tree up and down. The “tree” is not a data structure; it is often a control structure.
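One well-known instance of the up-and-down sweep is the work-efficient exclusive prefix sum. It is chosen here as an illustration, since the slides do not name a specific algorithm. In the sketch the tree exists only as the pattern of array indices touched at each level, which is the sense in which it is a control structure. The function name `exclusive_scan` is hypothetical and the input length is assumed to be a power of two.

```python
def exclusive_scan(values):
    """Exclusive prefix sums via an up-sweep and a down-sweep over an implicit tree."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"

    # Up-sweep: each level adds a left subtree total into its parent.
    # Updates within one level touch disjoint cells, so on a PRAM
    # each level would be a single parallel step.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: push prefix totals from the root back to the leaves.
    a[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]      # left child receives the parent's prefix
            a[i] += left              # right child adds the left subtree total
        stride //= 2
    return a


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```

Each sweep takes log2(n) levels, so with enough processors the whole scan needs on the order of log n parallel steps.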

