CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson
Copyright © 2005, 2006 Dr. Ken Hoganson
CS8625-June-22-2006: Class Will Start Momentarily…
Homework & Midterm Review

Balance Point
The basis for the argument against “putting all your (speedup) eggs in one basket” is Amdahl’s Law. Note the balance point in the denominator, where both terms are equal. Increasing N (the number of processors) beyond this point can at best halve the denominator, and therefore at best double the speedup.
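
For reference, the formula the slide refers to, written out (α is the parallel fraction of the workload and N the number of processors; this is the standard statement of Amdahl’s Law, nothing beyond what the slides use):

```latex
S(N) = \frac{1}{(1-\alpha) + \frac{\alpha}{N}},
\qquad \text{balance point: } 1-\alpha = \frac{\alpha}{N}
```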

Balance Point Heuristic
Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
Solved for N:  N = α / (1 − α)
Solved for α:  α = N / (N + 1)
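
A minimal Python sketch of the two heuristic formulas (the function names are mine, for illustration):

```python
def balance_n(alpha):
    """Processor count at which the serial term (1 - alpha) equals the
    parallel term alpha / N in Amdahl's denominator: N = alpha / (1 - alpha)."""
    return alpha / (1 - alpha)

def balance_alpha(n):
    """Parallel fraction that puts n processors at the balance point:
    alpha = n / (n + 1)."""
    return n / (n + 1)

print(balance_n(0.90))    # 9.0, matching the example on the next slide
print(balance_alpha(32))  # 0.9696..., i.e. about 97% parallel
```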

Balance Point Example
Parallel Fraction = 90% (10% in serial):

N     α/N        1−α    Speedup
2     0.450      0.10   1/(0.450 + 0.10) = 1.82
4     0.225      0.10   1/(0.225 + 0.10) = 3.08
8     0.1125     0.10   1/(0.1125 + 0.10) = 4.71
16    0.05625    0.10   1/(0.05625 + 0.10) = 6.40
32    0.028125   0.10   1/(0.028125 + 0.10) = 7.80
64    0.0140625  0.10   1/(0.0140625 + 0.10) = 8.77
∞     0          0.10   1/(0 + 0.10) = 10

Solved for N:  N = α / (1 − α) = 0.90 / 0.10 = 9, with Speedup = 5
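
The table is easy to regenerate; a small Python check of the example (a sketch, computing speedup per Amdahl’s Law with alpha = 0.90):

```python
def speedup(alpha, n):
    """Amdahl's Law: 1 / ((1 - alpha) + alpha / n)."""
    return 1.0 / ((1.0 - alpha) + alpha / n)

alpha = 0.90
for n in (2, 4, 8, 16, 32, 64):
    print(f"N={n:2d}  alpha/N={alpha / n:.7f}  speedup={speedup(alpha, n):.2f}")
print(f"N=inf  speedup={1.0 / (1.0 - alpha):.2f}")  # upper bound as N grows
```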

Example
Example: A workload has an average α of 94%. How many processors can reasonably be applied to speed up this workload?
Solved for N:  N = α / (1 − α)
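
Applying the formula (a worked step; the slide leaves the answer as an exercise):

```latex
N = \frac{0.94}{1 - 0.94} = \frac{0.94}{0.06} \approx 15.7,
\quad \text{so about 16 processors.}
```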

Example
Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
Solved for α:  α = N / (N + 1)
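
Again applying the formula (a worked step, not from the slide itself):

```latex
\alpha = \frac{32}{32 + 1} = \frac{32}{33} \approx 0.97
```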

Multi-Bus Multiprocessors
Shared-memory multiprocessors are very fast:
–Low latency to memory on the bus
–Low communication overhead through shared memory
Scalability problems:
–The length of the bus slows signals (roughly 0.75 times the speed of light)
–Contention for the bus reduces performance
–Cache is required to reduce contention
[Diagram: CPUs and memory connected by a shared bus]

Bus Contention
Multiple devices (processors, etc.) compete for access to a bus. Only one device can use the bus at a time, limiting performance and scalability.
P(at least one blocked request) = P(two or more requests) = 1 − P(zero requests) − P(exactly one request)
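
A minimal sketch of this model in Python, assuming each of the N processors independently requests the bus with probability r in a given cycle (the function name and the independence assumption are mine, not from the slides):

```python
def p_blocked(n, r):
    """Probability that two or more of n processors request the bus in the
    same cycle (so at least one request is blocked), with independent
    per-processor, per-cycle request probability r."""
    p_zero = (1 - r) ** n                # no processor requests the bus
    p_one = n * r * (1 - r) ** (n - 1)   # exactly one processor requests
    return 1 - p_zero - p_one

# Representative values with r = 0.10:
for n in (4, 8, 16):
    print(f"N={n:2d}: P(blocked) = {p_blocked(n, 0.10):.4f}")
# N= 4: 0.0523,  N= 8: 0.1869,  N=16: 0.4853
```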

Performance degrades as requests are blocked. Resubmitted blocked requests degrade performance even further than shown above.
[Table of r, (1−r)^N, N·r·(1−r)^(N−1), and the blocked fraction, with columns for N = 4, 8, and 16; the numeric entries did not survive transcription.]
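
As a usage note on the sketch above: with a hypothetical request rate of r = 0.5 (illustrative only), p_blocked(4, 0.5) ≈ 0.69 and p_blocked(8, 0.5) ≈ 0.96, so without something to reduce r a shared bus saturates quickly as N grows.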

Clearly, the probability that a processor’s access to a shared bus will be denied increases with both:
–The number of processors sharing the bus
–The probability that a processor will need access to the bus
What can be done? What is the “universal band-aid” for performance problems?

If cache greatly reduces accesses to memory, then the blocking rate on the bus is much lower.
[Table of r, (1−r)^N, N·r·(1−r)^(N−1), and the blocked fraction for N = 4, 8, and 16, recomputed with the cache-reduced request rate; the numeric entries did not survive transcription.]
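
Continuing the earlier sketch: if caching cuts the per-cycle bus request rate from 0.5 to 0.1 (illustrative values), p_blocked(8, r) falls from about 0.96 to about 0.19.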

Two approaches to improving the performance of shared-memory/bus machines:
–Invest in large amounts, and multiple levels, of cache, plus a connection network that allows the caches to synchronize their contents.
–Invest in multiple buses and independently accessible blocks of memory.
Combining both may be the best strategy.

Homework
Your project is to explore the effect of interconnection-network contention on the performance of a shared-memory, bus-based multiprocessor. You will do some calculations, use the HPPAS simulator, and write a couple-page report to turn in.

Task 1
For a machine whose processors include on-chip cache yielding a 90% cache hit rate, determine the maximum number of processors that can share a single bus while still maintaining at least 98% acceptance of requests. Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt. Your results should “bracket” the maximum.

Task 1
Task 1: Use the formulas in the table to find the maximum N.

                                      N=4   N=8   N=16   N=?
r (request rate 10%; hit rate 0.90)
(1−r)^N
N·r·(1−r)^(N−1)
Blocked = 1 − P(0 requests) − P(1 request)
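
A sketch of the bracketing search Task 1 describes, reusing the blocking model from the contention slides. It assumes “98% acceptance” means a blocked probability of at most 2%, which is one plausible reading, not necessarily the intended one:

```python
def p_blocked(n, r):
    """P(two or more simultaneous bus requests) for n processors, each
    requesting the bus independently with per-cycle probability r."""
    return 1 - (1 - r) ** n - n * r * (1 - r) ** (n - 1)

r = 0.10  # a 90% cache hit rate keeps 9 of 10 accesses off the bus
n = 1
while p_blocked(n + 1, r) <= 0.02:  # grow N while acceptance stays >= 98%
    n += 1
print(f"Largest N meeting the threshold (under this reading): {n}")
print(f"Bracket: P(blocked) at N={n} is {p_blocked(n, r):.4f}; "
      f"at N={n + 1} it is {p_blocked(n + 1, r):.4f}")
```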

Task 2
Use the maximum number of processors from Task 1, and Amdahl’s Law at the balance point, to figure out what workload parallel fraction yields a balance in the denominator. Determine the theoretical speedup that will be obtained.
Solved for α:  α = N / (N + 1)
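
A step worth making explicit: at the balance point both denominator terms equal 1/(N+1), so the theoretical speedup Task 2 asks for has a closed form (plain algebra on Amdahl’s Law):

```latex
S = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}
  = \frac{1}{2\,(1-\alpha)}
  = \frac{N+1}{2}
```

For instance, the N = 9, α = 0.90 example above gives S = 5, matching the table.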

Task 3
Use the data values developed so far to run the HPPAS simulation system. Record the speedup obtained from the simulator. If it differs markedly from the theoretical value, check all the settings, rerun the simulation, and explain any variation from the theoretical expected value. Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.

Dates
The current plan: the midterm will be made available on Friday, June 23. The due date will be July 10 (after the conference and after the July 4th weekend).
Conference week:
–Complete the homework: due July 3, submitted by e-mail.
–Work on the midterm exam.
–No class lecture on June 27 and 29. No class on July 4.
–Next live class is Thursday, July 6.

Topic Overview
Overview of topics for the exam:
Five parallel levels
Problems to be solved for parallelism
Limitations to parallel speedup
Amdahl’s Law: theory, implications
Limiting factors in realizing parallel performance
Pipelines and their performance issues
Flynn’s classification
SIMD architectures
SIMD algorithms
Elementary analysis of algorithms
MIMD: multiprocessors and multicomputers
Balance point and heuristic (from Amdahl’s Law)
Bus contention and analysis of a single shared bus
Use of the online HPPAS tool
Specific multiprocessor clustered architectures:
–Compaq
–DASH
–Dell Blade Cluster

End of Lecture
End of Today’s Lecture.

This slide left intentionally blank.