
Basic Linear Algebra Subroutines (BLAS) – 3 levels of operations. The memory hierarchy is exploited more efficiently by the higher-level BLAS:

BLAS level                Example operations            Memory refs.   Flops    Flops / memory ref.
Level-1 (vector)          y = y + ax;  z = y·x          3n             2n       2/3
Level-2 (matrix-vector)   y = y + Ax;  A = A + (alpha)xy^T   n^2       2n^2     2
Level-3 (matrix-matrix)   C = C + AB                    4n^2           2n^3     n/2
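The ratios in the table can be checked with a few lines of arithmetic. The sketch below (illustrative names, not part of any BLAS library) computes flops per memory reference for each level, using the operation counts from the table:

```python
# Arithmetic intensity (flops per memory reference) of the three BLAS
# levels, from the counts in the table above. Names are illustrative.

def blas_intensity(n):
    """Return {level: flops / memory_refs} for problem size n."""
    levels = {
        # Level-1 axpy (y = y + a*x): read x and y, write y -> 3n refs, 2n flops
        "level1": (3 * n, 2 * n),
        # Level-2 gemv (y = y + A*x): reading A dominates -> ~n^2 refs, 2n^2 flops
        "level2": (n * n, 2 * n * n),
        # Level-3 gemm (C = C + A*B): read A, B, C, write C -> 4n^2 refs, 2n^3 flops
        "level3": (4 * n * n, 2 * n ** 3),
    }
    return {name: flops / refs for name, (refs, flops) in levels.items()}

if __name__ == "__main__":
    for name, ratio in blas_intensity(1000).items():
        print(name, ratio)
```

The key point of the table is visible immediately: Level-1 and Level-2 intensities are constant, while Level-3 grows as n/2, which is why blocked matrix-matrix kernels can keep data in cache and run near peak.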

Fourier Transform The Fourier transform is widely used for designing filters. You can design systems that reject high-frequency noise and retain just the low-frequency components. This is natural to describe in the frequency domain. Important properties of the Fourier transform are: 1. Linearity and time shifts 2. Differentiation 3. Convolution
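A minimal sketch of frequency-domain filtering as described above: transform the signal, zero the bins above a cutoff, and transform back. This uses a plain O(n^2) DFT in pure Python purely for illustration (real code would use an FFT library); the `cutoff` index is an assumed parameter, not from the slides.

```python
import cmath

def dft(x):
    # Discrete Fourier transform, directly from the definition.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Inverse DFT with the 1/n normalization.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def lowpass(x, cutoff):
    """Zero out frequency bins above `cutoff` (and their mirror images)."""
    X = dft(x)
    n = len(X)
    for k in range(n):
        freq = min(k, n - k)      # distance from DC, honoring DFT symmetry
        if freq > cutoff:
            X[k] = 0
    return [v.real for v in idft(X)]
```

For example, `lowpass([1.0, -1.0] * 4, 1)` removes the highest-frequency (alternating) component, returning a signal that is zero to within round-off.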

A Simple Model for Parallel Processing Parallel Random Access Machine (PRAM) model – a number of processors, all of which can access a large shared memory – all processors are synchronized – all processors run the same program – each processor has a unique id, pid, and may be instructed to do different things depending on its pid

Interconnection Networks Uses of interconnection networks – Connect processors to shared memory – Connect processors to each other Interconnection media types – Shared medium – Switched medium Different interconnection networks define different parallel machines. The properties of the interconnection network influence the type of algorithm used on a given machine, since they affect how data is routed.

Switch Network Topologies View switched network as a graph –Vertices = processors or switches –Edges = communication paths Two kinds of topologies –Direct –Indirect

Terminology for Evaluating Switch Topologies We need to evaluate four characteristics of a network in order to understand how effectively a machine with that network can run efficient parallel algorithms. These are – the diameter – the bisection width – the edges per node – the constant edge length We’ll define these and see how they affect algorithm choice. Then we will investigate several different topologies and see how these characteristics are evaluated.

Terminology for Evaluating Switch Topologies Diameter – the largest distance between two switch nodes. – A low diameter is desirable. – It puts a lower bound on the complexity of parallel algorithms that require communication between arbitrary pairs of nodes.

Terminology for Evaluating Switch Topologies Bisection width – the minimum number of edges between switch nodes that must be removed in order to divide the network into two halves (to within one node, if the number of processors is odd). A high bisection width is desirable. In algorithms requiring large amounts of data movement, the size of the data set divided by the bisection width puts a lower bound on the complexity of the algorithm. Actually proving the bisection width of a network can be quite difficult.
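The definition above can be turned directly into a brute-force check for toy networks: try every balanced split of the nodes and count the fewest crossing edges. This is exponential in the node count, so it is only a sketch for small examples, not a practical tool.

```python
from itertools import combinations

def bisection_width(nodes, edges):
    """Minimum number of edges crossing any split into two equal halves.

    `edges` is a list of (u, v) pairs; brute force, toy sizes only.
    """
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for side in combinations(nodes, half):
        side = set(side)
        crossing = sum(1 for u, v in edges if (u in side) != (v in side))
        best = crossing if best is None else min(best, crossing)
    return best

# A 4-node ring: any balanced cut severs exactly 2 edges.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(bisection_width(range(4), ring))  # 2
```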

Evaluating Switch Topologies Many topologies have been proposed and analyzed. We will consider several well-known ones: – 2-D mesh – linear network – binary tree – hypertree – butterfly – hypercube – shuffle-exchange Several of these have been used in commercial parallel computers.
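For one of the listed topologies, the characteristics above are easy to verify computationally. The sketch below checks the diameter of a d-dimensional hypercube by breadth-first search: nodes are the integers 0..2^d - 1, neighbors differ in exactly one bit, and the diameter comes out to d (each differing bit must be flipped once).

```python
from collections import deque

def hypercube_diameter(d):
    """Diameter of the d-dimensional hypercube, by BFS from every node."""
    n = 1 << d

    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for b in range(d):
                v = u ^ (1 << b)      # flip one bit -> hypercube neighbor
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())

    return max(bfs(s) for s in range(n))

print(hypercube_diameter(4))  # 4
```

The same graph-construction trick (XOR with a single bit) is what makes hypercube routing algorithms like bit-fixing so simple to state.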

PRAM [Parallel Random Access Machine] A PRAM is composed of: – P processors, each with its own unmodifiable program. – A single shared memory composed of a sequence of words, each capable of containing an arbitrary integer. – A read-only input tape. – A write-only output tape. The PRAM model is a synchronous, MIMD, shared-address-space parallel computer. (Introduced by Fortune and Wyllie, 1978.)

PRAM model of computation – p processors, each with local memory – Synchronous operation – Shared memory reads and writes – Each processor has a unique id in the range 1..p

Characteristics At each unit of time, a processor is either active or idle (depending on its id). All processors execute the same program. At each time step, all processors execute the same instruction on different data (“data-parallel”). The model focuses on concurrency only.
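A toy illustration of these characteristics, simulated sequentially: p "processors" all run the same step, which ones are active at a given time depends only on their pid, and all of them read and write one shared memory. The broadcast-by-doubling program here is a standard PRAM warm-up, chosen for illustration.

```python
def pram_broadcast(p, value):
    """Broadcast `value` from processor 0 to shared cells 0..p-1.

    Recursive doubling: after k synchronous steps, 2^k cells hold the value.
    """
    shared = [None] * p
    shared[0] = value
    step = 1
    while step < p:
        # One synchronous time step: every active processor copies in lockstep.
        for pid in range(p):                     # parallel on a real PRAM
            if pid < step and pid + step < p:    # active/idle decided by pid
                shared[pid + step] = shared[pid]
        step *= 2
    return shared

print(pram_broadcast(8, 42))  # [42, 42, 42, 42, 42, 42, 42, 42]
```

Note the log p step count: this is the kind of bound PRAM analysis is designed to expose, independent of any interconnection network.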

Why study PRAM algorithms? Well-developed body of literature on design and analysis of such algorithms Baseline model of concurrency Explicit model –Specify operations at each step –Scheduling of operations on processors Robust design paradigm

Designing PRAM algorithms Balanced trees Pointer jumping Euler tours Divide and conquer Symmetry breaking...

Balanced trees Key idea: build a balanced binary tree on the input data, then sweep the tree up and down. The “tree” is not a data structure here – it is often a control structure.
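The up-and-down sweep above can be sketched as a work-efficient exclusive prefix sum: an up-sweep accumulates subtree sums, then a down-sweep distributes prefixes back down. Each inner loop is one synchronous PRAM step, simulated here sequentially; the input length is assumed to be a power of two for simplicity.

```python
def exclusive_scan(a):
    """Exclusive prefix sums of `a` via balanced-tree up/down sweeps.

    len(a) must be a power of two.
    """
    x = list(a)
    n = len(x)
    # Up-sweep: each internal tree node accumulates its subtree's sum.
    d = 1
    while d < n:
        for i in range(d * 2 - 1, n, d * 2):   # one parallel step on a PRAM
            x[i] += x[i - d]
        d *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(d * 2 - 1, n, d * 2):   # one parallel step on a PRAM
            x[i - d], x[i] = x[i], x[i] + x[i - d]
        d //= 2
    return x

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both sweeps take O(log n) parallel steps and O(n) total work, which is why the balanced-tree pattern is the standard opening example in PRAM algorithm design.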