Parallel Processing - introduction
 Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely true.
 Instruction pipelining (micro-instruction-level parallelism)
 Superscalar organization (instruction-level parallelism)

Parallelism in a Uniprocessor System
 Multiple functional units
 Pipelining within the CPU
 Overlapped CPU and I/O operations
 Use of a hierarchical memory system
 Multiprogramming and time sharing
 Use of a hierarchical bus system (balancing of subsystem bandwidths)

Pipelining Strategy
 Instruction pipelining is similar to an assembly line in an industrial plant: divide the task into subtasks, each of which can be executed concurrently by specialized hardware.
 The instruction cycle has a number of stages:
  Fetch instruction (FI)
  Decode instruction (DI)
  Calculate operand effective addresses (CO)
  Fetch operands (FO)
  Execute instruction (EI)
  Write operand (WO)
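As a rough illustration (a minimal sketch, not taken from the slides), the following C program prints the timing diagram for an ideal six-stage pipeline: with equal stage durations and no hazards, instruction i occupies stage s in cycle i + s, so nine instructions complete in 9 + 6 - 1 = 14 cycles instead of 54.

#include <stdio.h>

/* Idealized 6-stage instruction pipeline (FI DI CO FO EI WO):
 * instruction i occupies stage s during cycle i + s, assuming equal
 * stage durations and no hazards. Prints a simple timing diagram. */
int main(void) {
    const char *stages[] = {"FI", "DI", "CO", "FO", "EI", "WO"};
    const int n_stages = 6, n_instr = 9;
    const int n_cycles = n_instr + n_stages - 1;    /* 14 cycles total */

    printf("cycle:");
    for (int c = 0; c < n_cycles; c++) printf(" %3d", c + 1);
    printf("\n");

    for (int i = 0; i < n_instr; i++) {
        printf("  I%d: ", i + 1);
        for (int c = 0; c < n_cycles; c++) {
            int s = c - i;                          /* stage occupied this cycle */
            if (s >= 0 && s < n_stages) printf("  %s", stages[s]);
            else                        printf("   .");
        }
        printf("\n");
    }
    return 0;
}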

Timing Diagram for Instruction Pipeline Operation

Assumptions
 Each instruction goes through all six stages of the pipeline, each stage taking an equal duration
 All of the stages can be performed in parallel (no resource conflicts)

Improved Performance
 But not doubled:
  Fetch is usually shorter than execution
  Any jump or branch means that prefetched instructions are not the required instructions
 Add more stages to improve performance
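To make "improved but not multiplied by the number of stages" concrete, the usual timing model (an assumption consistent with the ideal diagram above, not stated on the slides) is: with k stages of equal duration τ and n instructions,

$$
T_{\text{serial}} = n\,k\,\tau, \qquad
T_{\text{pipelined}} = \bigl[k + (n-1)\bigr]\,\tau, \qquad
\text{Speedup} = \frac{n\,k}{k + n - 1}.
$$

For k = 6 and n = 9 this gives 54/14 ≈ 3.9, not 6; branches and unequal stage times reduce it further.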

Pipeline Hazards
 The pipeline, or some portion of the pipeline, must stall; also called a pipeline bubble
 Types of hazards:
  Resource
  Data
  Control

Resource Hazards
 Two (or more) instructions in the pipeline need the same resource
 They are executed serially rather than in parallel for part of the pipeline
 Example: an operand read or write cannot be performed in parallel with an instruction fetch
 One solution: increase available resources
  Multiple main memory ports
  Multiple ALUs

Resource Hazard Diagram

Data Hazards
 Conflict in access of an operand location
 Two instructions are to be executed in sequence, and both access a particular memory or register operand
 In a pipeline, the operand value could be updated so as to produce a different result than strict sequential execution
 E.g. x86 machine instruction sequence:
  ADD EAX, EBX  /* EAX = EAX + EBX */
  SUB ECX, EAX  /* ECX = ECX - EAX */

Data Hazard Diagram

Control Hazard
 Also known as branch hazard
 Brings instructions into the pipeline that must subsequently be discarded

The Effect of a Conditional Branch on Instruction Pipeline Operation

Superscalar Organization
 There are multiple execution units within a single processor
 May execute multiple instructions from the same program in parallel
 Ability to execute instructions in different pipelines independently and concurrently
 Allows instructions to be executed in an order different from the program order
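A small illustrative C fragment of the kind a superscalar processor can exploit (the variables and values are hypothetical, chosen only to show the dependence pattern): the first four assignments are mutually independent, so a core with several execution units can issue them in the same cycle, whereas the chain at the end must issue serially.

#include <stdio.h>

int main(void) {
    int b = 1, c = 2, e = 3, f = 4, h = 5, i = 6, k = 7, l = 8;

    /* Independent operations: a superscalar core with multiple ALUs can
     * issue these in the same cycle, possibly out of program order. */
    int a = b + c;
    int d = e + f;
    int g = h * i;
    int j = k * l;

    /* Dependent chain: each result feeds the next, so these must complete
     * one after another no matter how many execution units exist. */
    int x = a + d;
    int y = x + g;
    int z = y + j;

    printf("%d %d %d %d -> %d\n", a, d, g, j, z);
    return 0;
}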

General Superscalar Organization

Types of Parallel Processor Systems
 Single instruction, single data stream - SISD
 Single instruction, multiple data stream - SIMD
 Multiple instruction, single data stream - MISD
 Multiple instruction, multiple data stream - MIMD

Single Instruction, Single Data Stream - SISD
 Single processor
 Single instruction stream
 Data stored in a single memory
 Example: uniprocessor

Parallel Organizations - SISD
 CU: Control unit
 IS: Instruction stream
 PU: Processing unit
 DS: Data stream
 MU: Memory unit
 LM: Local memory

Single Instruction, Multiple Data Stream - SIMD
 A single machine instruction controls the simultaneous execution of a number of processing elements
 Each processing element has an associated data memory
 Each instruction is executed on a different set of data by the different processors
 Examples: vector and array processors
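As a loose software analogy (an illustrative sketch, not from the slides, relying on the compiler-specific vector extensions of GCC/Clang), a single C expression below maps to one packed SIMD add that operates on all four data elements at once, element by element.

#include <stdio.h>

/* SIMD analogy: one operation applied to many data elements.
 * GCC/Clang vector extensions let one expression compile to a single
 * packed add (e.g. one SSE/NEON instruction for 4 ints). */
typedef int v4si __attribute__((vector_size(16)));   /* 4 x 32-bit ints */

int main(void) {
    v4si a = {1, 2, 3, 4};
    v4si b = {10, 20, 30, 40};
    v4si c = a + b;               /* element by element: {11, 22, 33, 44} */

    for (int i = 0; i < 4; i++)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}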

Parallel Organizations - SIMD

Multiple Instruction, Single Data Stream - MISD
 A sequence of data is transmitted to a set of processors
 Each processor executes a different instruction sequence
 Has never been implemented

Multiple Instruction, Multiple Data Stream - MIMD
 A set of processors simultaneously executes different instruction sequences on different sets of data
 Examples:
  SMPs (symmetric multiprocessors)
  Clusters
  NUMA (nonuniform memory access) systems
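For a concrete shared-memory flavour of MIMD (an illustrative C/pthreads sketch, not from the slides; the worker functions and values are made up), two threads below stand in for two processors: each executes its own instruction sequence on its own data while sharing one address space, as in an SMP.

#include <stdio.h>
#include <pthread.h>

/* MIMD sketch: two threads (processors) run different instruction
 * sequences on different data, sharing one address space.
 * Compile with: cc -pthread mimd.c */
static long sum_to(long n)     { long s = 0; for (long i = 1; i <= n; i++) s += i; return s; }
static long product_to(long n) { long p = 1; for (long i = 1; i <= n; i++) p *= i; return p; }

static void *worker_sum(void *arg)  { *(long *)arg = sum_to(100);    return NULL; }
static void *worker_prod(void *arg) { *(long *)arg = product_to(10); return NULL; }

int main(void) {
    pthread_t t1, t2;
    long sum = 0, prod = 0;

    pthread_create(&t1, NULL, worker_sum,  &sum);   /* instruction stream 1, data set 1 */
    pthread_create(&t2, NULL, worker_prod, &prod);  /* instruction stream 2, data set 2 */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("sum = %ld, product = %ld\n", sum, prod);
    return 0;
}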

Parallel Organizations - MIMD Shared Memory

Parallel Organizations - MIMD Distributed Memory

Taxonomy of Parallel Processor Architectures

Symmetric Multiprocessor Organization

SMP Advantages
 Performance: if some work can be done in parallel
 Availability: since all processors can perform the same functions, failure of a single processor does not halt the system
 Incremental growth: users can enhance performance by adding additional processors
 Scaling: vendors can offer a range of products based on the number of processors

Multicore Organization
 Main design variables:
  Number of core processors on the chip
  Number of levels of cache on the chip
  Amount of shared cache
 Examples: (a) ARM11 MPCore, (b) AMD Opteron, (c) Intel Core Duo, (d) Intel Core i7

Multicore Organization Alternatives

Performance: Amdahl’s Law
 Potential speedup of a program using multiple processors
 Conclusions:
  Code needs to be parallelizable
  Speedup is bounded
 Task dependent:
  Servers gain by maintaining multiple connections on multiple processors
  Databases can be split into parallel tasks

Amdahl’s Law Formula
 For a program running on a single processor:
  Fraction f of the code is infinitely parallelizable; fraction (1 - f) is inherently serial
  T is the total execution time of the program on a single processor
  N is the number of processors that fully exploit the parallel portions of the code
 Conclusions:
  When f is small, adding parallel processors has little effect
  As N → ∞, speedup is bounded by 1/(1 - f)
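The formula itself, in the standard form implied by these definitions:

$$
\text{Speedup} = \frac{T}{(1-f)\,T + \dfrac{f\,T}{N}}
              = \frac{1}{(1-f) + \dfrac{f}{N}}
\;\longrightarrow\; \frac{1}{1-f} \quad \text{as } N \to \infty.
$$

For example, with f = 0.9 and N = 8, the speedup is 1 / (0.1 + 0.9/8) ≈ 4.7, well short of 8.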

RQ: 12.5 P: 12.5, 12.8 RQ: 17.1, 17.3