Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 May 2, 2006 Session 29.



Contents: Group work, Exams, Assignments, Project Presentations, Literature Search, Lectures

Put-it-all-together: Memory System Design; Pipeline Design Techniques; Multiprocessors; Shared Memory Systems; Message Passing Systems; Multiprocessor Systems-on-Chips; Network Computing

Put-it-all-together: Memory System Design

Memory Hierarchy: CPU Registers, Cache, Main Memory, Secondary Storage. [Figure: the levels compared by latency, bandwidth, speed, and cost per bit]

Pentium IV two-level cache [Figure: Processor, Cache Level 1 (L1), Cache Level 2 (L2), Main Memory]

Placement Policies: how to map memory blocks (lines) to cache block frames (line frames). Direct Mapping; Fully Associative; Set Associative. [Figure: memory blocks (lines) mapped to cache block frames (line frames)]

Direct Mapping [Figure: memory blocks mapped to tagged cache block frames. Address fields: Tag (5 bits), Block frame (7 bits), Word (4 bits)]

Example: Fully Associative [Figure: memory blocks mapped to tagged cache block frames; the tag is 12 bits. Address fields: Tag (12 bits), Word (4 bits)]

Example: Set Associative [Figure: memory blocks mapped to tagged cache block frames grouped into sets, starting at Set 0; the tag is 7 bits. Address fields: Tag (7 bits), Set (5 bits), Word]
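The three placement policies differ only in how an address is cut into fields. The sketch below illustrates that split. It assumes a 16-bit address and a 4-bit word field in every layout; those widths are inferred from the slide figures rather than stated on them, and the function and constant names are made up for this example.

```python
def split_address(address, tag_bits, index_bits, word_bits):
    """Split an address into (tag, index, word), with the tag in the high bits.

    index is the block-frame number (direct mapping) or the set number
    (set associative); with index_bits == 0 the layout is fully associative.
    """
    total_bits = tag_bits + index_bits + word_bits
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit in the given field widths")
    word = address & ((1 << word_bits) - 1)
    index = (address >> word_bits) & ((1 << index_bits) - 1)
    tag = address >> (word_bits + index_bits)
    return tag, index, word


# Assumed 16-bit layouts, written as (tag_bits, index_bits, word_bits).
DIRECT = (5, 7, 4)
FULLY_ASSOCIATIVE = (12, 0, 4)
SET_ASSOCIATIVE = (7, 5, 4)

if __name__ == "__main__":
    address = 0xABCD
    for name, layout in [("direct", DIRECT),
                         ("fully associative", FULLY_ASSOCIATIVE),
                         ("set associative", SET_ASSOCIATIVE)]:
        print(name, split_address(address, *layout))
```

The same address lands in exactly one block frame under direct mapping, in any frame under fully associative mapping (the whole block number is the tag), and in any frame of one set under set-associative mapping.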

Put-it-all-together: Pipeline Design Techniques

Pipeline [Figure: a task divided into sub-tasks 1, 2, ..., n; a pipeline with stages 1, 2, ..., n fed by a stream of tasks]

5 tasks on a 4-stage pipeline [Figure: Task 1 through Task 5 plotted against time]

Speedup [Figure: pipeline of n stages, each taking time t, fed by a stream of m tasks]
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t
Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1)
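As a quick check of the formulas, here is a minimal sketch that evaluates them for the earlier case of 5 tasks on a 4-stage pipeline. The function names are illustrative only, and the model assumes every stage takes the same time t.

```python
def sequential_time(n, m, t):
    """Time to run m tasks of n sub-tasks each, one sub-task at a time."""
    return n * m * t


def pipelined_time(n, m, t):
    """First task takes n stage times; each later task finishes one stage time after."""
    return n * t + (m - 1) * t


def pipeline_speedup(n, m):
    """Speedup = T(Seq) / T(Pipe); the stage time t cancels out."""
    return (n * m) / (n + m - 1)


if __name__ == "__main__":
    n, m, t = 4, 5, 1
    print(sequential_time(n, m, t), pipelined_time(n, m, t), pipeline_speedup(n, m))
```

For n = 4 and m = 5 the times are 20 and 8 stage delays, a speedup of 2.5. As m grows, the speedup approaches n, the number of stages.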

Linear Pipeline
- Processing stages are linearly connected
- Perform a fixed function
- Synchronous pipeline
  - Clocked latches between Stage i and Stage i+1
  - Equal delays in all stages
- Asynchronous pipeline (handshaking)

Reservation Table [Figure: rows S1, S2, S3, S4 against time, one X per stage]

Non-Linear Pipelines
- Variable functions
- Feed-forward
- Feedback

3 stages and 2 functions [Figure: stages S1, S2, S3 with outputs X and Y]

Reservation Tables for X and Y [Figure: two tables with rows S1, S2, S3. Function X marks S1 three times, S2 twice, and S3 three times; function Y marks S1 twice, S2 once, and S3 three times]

State Diagram [Figure: state transition diagram for the pipeline]
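One common way to get from a reservation table to a state diagram is through forbidden latencies and a collision vector. The sketch below follows that textbook procedure; it is not taken from the slides. The column positions in `TABLE_X` are hypothetical, chosen only to give three marks for S1, two for S2, and three for S3, and all names are illustrative.

```python
def forbidden_latencies(table):
    """table maps stage name -> set of time steps marked in that row.

    A latency is forbidden if two marks in the same row are that far apart,
    because two initiations would then need the same stage at the same time.
    """
    forbidden = set()
    for steps in table.values():
        for a in steps:
            for b in steps:
                if b > a:
                    forbidden.add(b - a)
    return forbidden


def collision_vector(forbidden):
    """Encode forbidden latencies as an integer: bit i-1 is set for latency i."""
    vector = 0
    for latency in forbidden:
        vector |= 1 << (latency - 1)
    return vector


def state_diagram(initial, width):
    """Return {state: {latency: next_state}} for permitted latencies 1..width.

    Latencies larger than width are always permitted and lead back to the
    initial state, so they are not listed.
    """
    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        moves = {}
        for latency in range(1, width + 1):
            if not state & (1 << (latency - 1)):
                next_state = (state >> latency) | initial
                moves[latency] = next_state
                pending.append(next_state)
        diagram[state] = moves
    return diagram


# Hypothetical reservation table: time steps are 0-based column indices.
TABLE_X = {"S1": {0, 5, 7}, "S2": {1, 3}, "S3": {2, 4, 6}}

if __name__ == "__main__":
    forbidden = forbidden_latencies(TABLE_X)
    initial = collision_vector(forbidden)
    for state, moves in state_diagram(initial, max(forbidden)).items():
        print(format(state, "07b"), {k: format(v, "07b") for k, v in moves.items()})
```

Each state is a bit pattern of the latencies that would currently collide; issuing a new task after a permitted latency shifts the pattern and ORs in the initial collision vector.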

Put-it-all-together: Multiprocessors; Shared Memory Systems; Message Passing Systems; Multiprocessor Systems-on-Chips; Network Computing

Types of Parallelism: Flynn's Taxonomy
                              Single Data Stream      Multiple Data Stream
Single Instruction Stream     SISD (Uniprocessors)    SIMD (Array Processors, Vector)
Multiple Instruction Stream   MISD                    MIMD (Multiprocessors, Multicomputers)

Amdahl's Law [Figure: a trip from A to B with a 200-mile segment and a 20-hour segment that must be walked. Options for the 200 miles: Walk, 4 miles/hour; Bike, 10 miles/hour; Car-1, 50 miles/hour; Car, … miles/hour; Car, … miles/hour]
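The travel analogy can be worked through numerically. This sketch assumes the 20-hour walking segment is fixed and only the 200-mile segment gets faster with a better vehicle; it uses only the three speeds that are legible on the slide, and the names are illustrative.

```python
FIXED_WALK_HOURS = 20.0   # the segment that must be walked (the "serial" part)
DISTANCE_MILES = 200.0    # the segment that can be sped up (the "parallel" part)


def trip_hours(speed_mph):
    """Total trip time when the 200-mile segment is covered at speed_mph."""
    return FIXED_WALK_HOURS + DISTANCE_MILES / speed_mph


SPEEDS_MPH = {"walk": 4.0, "bike": 10.0, "car-1": 50.0}

if __name__ == "__main__":
    baseline = trip_hours(SPEEDS_MPH["walk"])
    for name, speed in SPEEDS_MPH.items():
        hours = trip_hours(speed)
        print(f"{name}: {hours:.0f} hours, speedup {baseline / hours:.2f}")
    # Even an infinitely fast vehicle cannot beat the fixed walking segment.
    print("speedup limit:", baseline / FIXED_WALK_HOURS)
```

Under these assumptions the trip takes 70, 40, and 24 hours, and no vehicle can push the speedup past 70 / 20 = 3.5, which is the point of Amdahl's law.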

Amdahl's Law [Figure: speedup versus % serial (10% to 99%) for 4 CPUs, 16 CPUs, and 1000 CPUs]
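The curves in such a plot come from the standard form of Amdahl's law, speedup = 1 / (f + (1 - f) / p), where f is the serial fraction and p the number of CPUs. The slide does not show the formula itself, so treat the following as a minimal sketch of the usual statement; the function name is illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed-size speedup: the serial part is not sped up, the rest scales by p."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    for percent_serial in (10, 20, 30, 40, 50, 60, 70, 80, 90, 99):
        f = percent_serial / 100
        row = [f"{amdahl_speedup(f, p):8.2f}" for p in (4, 16, 1000)]
        print(f"{percent_serial:3d}% serial:", *row)
```

With a 10% serial fraction, even 1000 CPUs stay below a speedup of 10, because the bound is 1 / f.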

Gustafson-Barsis Law (1988)
- Gordon Bell Prize
- Overcoming the conceptual barrier established by Amdahl's law
- Scale the problem to the size of the parallel system
- No fixed-size problem

Amdahl vs. Gustafson-Barsis [Figure: speedup versus % serial (up to 99%), with one curve for Gustafson-Barsis and one for Amdahl]
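To compare the two laws numerically, the sketch below uses their usual textbook forms, which the slides do not spell out: Amdahl's fixed-size speedup 1 / (f + (1 - f) / p) and the Gustafson-Barsis scaled speedup p - f (p - 1). Here f is the serial fraction (measured on the parallel run for Gustafson-Barsis) and p the number of processors; the function names are illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed problem size: the serial part limits the speedup to 1 / f."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction, processors):
    """Scaled problem size: the parallel part grows with the machine."""
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    p = 16
    for percent_serial in (10, 20, 30, 40, 50, 60, 70, 80, 90, 99):
        f = percent_serial / 100
        print(f"{percent_serial:3d}% serial: "
              f"Amdahl {amdahl_speedup(f, p):6.2f}, "
              f"Gustafson-Barsis {gustafson_speedup(f, p):6.2f}")
```

For 16 processors and a 10% serial fraction, the fixed-size speedup is 6.4 while the scaled speedup is 14.5. The gap reflects the different question each law asks, not a contradiction between them.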

SIMD Systems [Figure: a von Neumann computer (processor and memory) attached to an array of processor-memory (P, M) pairs joined by some interconnection network]. One control unit. Lockstep. All Ps do the same or nothing.

MIMD Shared Memory Systems [Figure: processors (P) with caches (C) connected through interconnection networks to memory modules (M) that form the global memory]. One global memory. Cache coherence. All Ps have equal access to memory.

Cache Coherent NUMA [Figure: nodes, each with memory (M), cache (C), and processor (P), joined by an interconnection network]. Each P has part of the shared memory. Non-uniform memory access.

MIMD Distributed Memory Systems [Figure: processor-memory (P, M) pairs joined by interconnection networks; labels S and LAN/WAN]. No shared memory. Message passing. Topology.

Cluster Architecture [Figure: nodes, each with M, C, P, I/O, and OS, joined by an interconnection network, together with middleware and a programming environment]. Home cluster.

Grids: Dependable, consistent, pervasive, and inexpensive access to high-end computing. Geographically distributed platforms. [Figure: Internet]

Multi-core: Gate delay does not reduce much. The frequency and performance of each core are the same as, or a little less than, those of the previous generation. [Figure: Technology Generation N compared with Technology Generation N+1]

From HT to Many-Core [Figure: increasing HW threads, from HT through the Multi-core Era (scalar and parallel applications) to the Many-core Era (massively parallel applications)]. Intel predicts 100s of cores on a chip in 2015.

Four Eras Beyond 2000
Parallelism Level: Processor level | Machine level (In box) | LAN level | WAN level | Chip level
Architecture: Vector | SMP / MPP | Cluster | Grid | Multi-Core
Threads: One | Multiple
Interconnection Network: None | Bus, switch, mesh, hypercube | Ethernet, Switch | Internet | On Chip
System: Custom | Commodity | Combination | SoC
Programming: Vector Fortran | C*, C-Linda, Occam, many others | PVM, MPI, HPF, … | MPI, OpenMP, … | ?

Degree of Coupling [Figure: SIMD, SMP, CC-NUMA, DMPC, Cluster, Grid, and "On Chip!" placed along three scales: degree of coupling (loose to tight), supported grain sizes (fine to coarse), and communication speed (slow to fast); grouped as SIMD and MIMD, the latter split into shared memory and distributed memory]

Good Luck to You!!!