Chapter 2: Abstract Machine Models. Lectured by: Nguyễn Đức Thái. Prepared by: Thoại Nam.


Khoa Khoa Học và Công Nghệ Máy Tính – Trường Đại Học Bách Khoa

Parallel Computer Models (1)
 A parallel machine model (also known as a programming model, type architecture, conceptual model, or idealized model) is an abstract parallel computer from the programmer's viewpoint, analogous to the von Neumann model for sequential computing.
 The abstraction need not imply any structural information, such as the number of processors or the interprocessor communication structure, but it should implicitly capture the relative costs of parallel computation.
 Every parallel computer has a native model that closely reflects its own architecture.

Parallel Computer Models (2)
 Five semantic attributes
–Homogeneity: how alike the processors of a parallel computer behave
–Synchrony: how tightly synchronised the processes are
–Interaction mechanism: how parallel processes interact
–Address space: the set of memory locations accessible by a process
–Memory model: how shared memory and access conflicts are handled

Parallel Computer Models (3)
 Several performance attributes
–Machine size: number of processors
–Clock rate: speed of processors (MHz)
–Workload: number of computation operations (Mflop)
–Speedup, efficiency, utilization
–Startup time, …

Abstract Machine Models
 An abstract machine model is mainly used in the design and analysis of parallel algorithms, without worrying about the details of physical machines.
 Three abstract machine models:
–PRAM
–BSP
–Phase Parallel

RAM
 RAM (random access machine): a sequential model with a program, a location counter, a memory of registers r0, r1, r2, r3, …, a read-only input tape x1, x2, …, xn, and a write-only output tape. [Figure omitted.]

PRAM (1)
 Parallel random-access machine: a control unit and processors P1, P2, …, Pn, each with its own private memory, all connected to a shared global memory through an interconnection network. [Figure omitted.]

PRAM (2)
 A control unit
 An unbounded set of processors, each with its own private memory and a unique index
 Input stored in global memory; initially a single active processing element
 Step: (1) read a value from a single private/global memory location, (2) perform a RAM operation, (3) write into a single private/global memory location
 During a computation step, a processor may activate another processor
 All active, enabled processors must execute the same instruction (albeit on different memory locations)
 Computation terminates when the last processor halts

PRAM (3)
 Definition: the cost of a PRAM computation is the product of the parallel time complexity and the number of processors used.
Ex: a PRAM algorithm with time complexity O(log p) using p processors has cost O(p log p).
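As a small numeric check (an illustrative Python sketch; `pram_cost` is an assumed helper name, not from the slides):

```python
import math

def pram_cost(parallel_time, processors):
    """Cost of a PRAM computation = parallel time x number of processors used."""
    return parallel_time * processors

# The slide's example: time O(log p) on p processors gives cost O(p log p).
p = 1024
cost = pram_cost(math.log2(p), p)   # log2(1024) = 10, so 10 * 1024 = 10240
```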

Time Complexity Problem
 Time complexity of a PRAM algorithm is often expressed in big-O notation
 Machine size n is usually small in existing parallel computers
 Ex:
–Three PRAM algorithms A, B and C have time complexities of 7n, (n log n)/4, and n log log n.
–Big-O notation: A (O(n)) < C (O(n log log n)) < B (O(n log n))
–Machines with no more than 1024 processors: log n ≤ log 1024 = 10 and log log n ≤ log log 1024 < 4, and thus: B < C < A
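The slide's point, that the asymptotic ordering can reverse at realistic machine sizes, can be checked numerically. The sketch below (Python, assuming base-2 logarithms as on the slide) evaluates the three operation counts at n = 1024:

```python
import math

# Operation counts of the three example algorithms (logs base 2, as on the slide):
def a(n): return 7 * n                         # A: 7n           -> O(n)
def b(n): return n * math.log2(n) / 4          # B: (n log n)/4  -> O(n log n)
def c(n): return n * math.log2(math.log2(n))   # C: n log log n  -> O(n log log n)

n = 1024                    # log n = 10, log log n < 4
# On a machine of at most 1024 processors, the asymptotic order A < C < B
# flips to B < C < A:
assert b(n) < c(n) < a(n)   # 2560 < ~3402 < 7168
```

The asymptotic ordering only asserts itself for enormous n (here, C overtakes A only once log log n > 7, i.e. n > 2^128), which is the slide's warning about trusting big-O on small machines.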

Conflict Resolution Schemes (1)
 PRAM execution can result in simultaneous access to the same location in shared memory.
–Exclusive Read (ER): no two processors can simultaneously read the same memory location.
–Exclusive Write (EW): no two processors can simultaneously write to the same memory location.
–Concurrent Read (CR): processors can simultaneously read the same memory location.
–Concurrent Write (CW): processors can simultaneously write to the same memory location, using some conflict resolution scheme.

Conflict Resolution Schemes (2)
 Common/Identical CRCW
–All processors writing to the same memory location must write the same value.
–The software must ensure that different values are not written.
 Arbitrary CRCW
–Different values may be written to the same memory location, and an arbitrary one succeeds.
 Priority CRCW
–An index is associated with each processor; when more than one write occurs, the lowest-numbered processor succeeds.
–The hardware must resolve any conflicts.
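A minimal sketch of the three CRCW write-resolution schemes (illustrative Python, not a real PRAM simulator; the function name and the request representation are assumptions):

```python
import random

def crcw_write(requests, scheme):
    """Resolve simultaneous writes by several processors to one shared cell.

    requests: list of (processor_index, value) pairs aimed at the same cell.
    scheme: 'common', 'arbitrary', or 'priority' (the three slide variants).
    """
    if scheme == "common":
        values = {value for _, value in requests}
        if len(values) > 1:                  # the software must prevent this case
            raise ValueError("Common CRCW requires identical values")
        return values.pop()
    if scheme == "arbitrary":
        return random.choice(requests)[1]    # any single write may succeed
    if scheme == "priority":
        return min(requests)[1]              # lowest-numbered processor wins
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, `crcw_write([(3, 9), (1, 7)], "priority")` returns 7, because processor 1 outranks processor 3.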

PRAM Algorithm
 Begin with a single active processor
 Two phases:
–A sufficient number of processors are activated
–The activated processors perform the computation in parallel
 ⌈log p⌉ activation steps are needed for p processors to become active
 The number of active processors can be doubled by executing a single instruction
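The doubling activation scheme above can be sketched as follows (illustrative Python; the function name is an assumption):

```python
import math

def activation_steps(p):
    """Steps needed to activate p processors, doubling the active set each step."""
    active, steps = 1, 0        # computation starts with a single active processor
    while active < p:
        active *= 2             # each active processor activates one more
        steps += 1
    return steps                # equals ceil(log2(p))

# 1000 processors become active in ceil(log2(1000)) = 10 doubling steps.
assert activation_steps(1000) == math.ceil(math.log2(1000))
```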

Parallel Reduction (1)

Parallel Reduction (2) (EREW PRAM algorithm in Figure 2-7, page 32, book [1])
Ex: SUM(EREW)
Initial condition: list of n ≥ 1 elements stored in A[0..(n-1)]
Final condition: sum of elements stored in A[0]
Global variables: n, A[0..(n-1)], j

begin
  spawn (P0, P1, …, P⌈n/2⌉-1)
  for all Pi where 0 ≤ i ≤ ⌈n/2⌉-1 do
    for j ← 0 to ⌈log n⌉ - 1 do
      if i mod 2^j = 0 and 2i + 2^j < n then
        A[2i] ← A[2i] + A[2i + 2^j]
      endif
    endfor
  endfor
end
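The EREW SUM algorithm above can be simulated sequentially with the same index arithmetic (an illustrative Python sketch; here each value of j stands for one synchronized parallel step, and the loop order is swapped relative to the pseudocode):

```python
import math

def erew_sum(A):
    """Sequential simulation of the EREW PRAM SUM reduction.

    Within one step j, the active processors P_i read and write disjoint
    cells (EREW), so simulating them one after another gives the same result.
    """
    n = len(A)
    procs = (n + 1) // 2                     # ceil(n/2) processors P_0..P_{procs-1}
    for j in range(math.ceil(math.log2(n))): # ceil(log n) synchronized steps
        for i in range(procs):               # conceptually in parallel
            if i % (2 ** j) == 0 and 2 * i + 2 ** j < n:
                A[2 * i] += A[2 * i + 2 ** j]
    return A[0]
```

For example, `erew_sum([1, 2, 3, 4, 5, 6, 7, 8])` returns 36 after ⌈log 8⌉ = 3 parallel steps.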

BSP – Bulk Synchronous Parallel
 BSP Model
–Proposed by Leslie Valiant of Harvard University
–Developed by W. F. McColl of Oxford University
[Figure: nodes, each a processor/memory pair (computation cost w), connected by a communication network (parameter g) with barrier synchronization (parameter l).]

BSP Model
 A set of n nodes (processor/memory pairs)
 Communication network
–Point-to-point, message passing (or shared variable)
 Barrier synchronizing facility
–All nodes or a subset
 Distributed memory architecture

BSP Programs
 A BSP program:
–n processes, each residing on a node
–Executing a strict sequence of supersteps
–In each superstep, a process executes:
 Computation operations: w cycles
 Communication: gh cycles
 Barrier synchronization: l cycles

Three Parameters
 The basic time unit is a cycle (or time step)
 w parameter
–Maximum computation time within each superstep
–A computation operation takes at most w cycles.
 g parameter
–Number of cycles to communicate a unit message when all processors are involved in communication; reflects network bandwidth
–g = (total number of local operations performed by all processors in one second) / (total number of words delivered by the communication network in one second)
–h: the h-relation coefficient (the maximum number of messages sent or received by any node in a superstep)
–A communication operation takes gh cycles.
 l parameter
–Barrier synchronization takes l cycles.

A Figure of BSP Programs
[Figure: processes P1–P4 execute superstep 1 and superstep 2; each superstep consists of computation, then communication, then a barrier.]

Time Complexity of BSP Algorithms
 Execution time of a superstep:
–Sequential execution of the computation, communication, and synchronization operations: w + gh + l
–Overlapping the computation, communication, and synchronization operations: max{w, gh, l}
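The two superstep costings can be written down directly (Python sketch; the parameter values in the usage lines are assumptions chosen for illustration, not from the text):

```python
def superstep_time(w, g, h, l, overlapped=False):
    """Cycles for one BSP superstep under the two costings above.

    w: max computation cycles, g: cycles per unit message,
    h: messages per node (an h-relation), l: barrier cost.
    """
    return max(w, g * h, l) if overlapped else w + g * h + l

# Total program time is the sum over the strict sequence of supersteps.
# Illustrative values: g = 4, l = 100, two supersteps with (w, h) pairs:
total = sum(superstep_time(w, 4, h, 100) for (w, h) in [(500, 20), (300, 50)])
# (500 + 80 + 100) + (300 + 200 + 100) = 680 + 600 = 1280 cycles
```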

Phase Parallel
 Proposed by Kai Hwang & Zhiwei Xu
 Similar to BSP:
–A parallel program is a sequence of phases
–The next phase cannot begin until all operations in the current phase have finished
–Three types of phases:
 Parallelism phase: the overhead work involved in process management, such as process creation and grouping for parallel processing
 Computation phase: local computation (data are available)
 Interaction phase: communication, synchronization, or aggregation (e.g., reduction and scan)
 Different computation phases may execute different workloads at different speeds.