EECE 571R (Spring 2009) Massively parallel/distributed platforms Matei Ripeanu matei at ece.ubc.ca

Contact Info
matei at ece.ubc.ca
Office: KAIS 4033
Office hours: by appointment (email me)
Course page:

EECE 571R: Course Goals
Primary:
– Gain an understanding of the fundamental issues that affect the design of large-scale systems (massively parallel / massively distributed)
– Survey the main current research themes
– Gain experience with distributed-systems research (research on federated systems, networks)
Secondary:
– By studying a set of outstanding papers, build knowledge of how to do & present research
– Learn how to read papers & evaluate ideas

What I’ll Assume You Know
Basic Internet architecture: IP, TCP, DNS, HTTP
Basic principles of distributed computing:
– Asynchrony (cannot distinguish between communication failures and latency)
– Incomplete & inconsistent global-state knowledge (cannot know everything correctly)
– Failures happen (in large systems, even rare failures of individual components aggregate to high failure rates)
If there are things that don’t make sense, ask!

Outline
Case study (and project ideas): Amdahl’s Law in the Multicore Era
Administrative: course schedule/organization

Amdahl’s Law in the Multicore Era
Acknowledgement: some slides borrowed from presentations by Mark D. Hill, David Patterson, Jim Larus, Saman Amarasinghe, Richard Brunner, and Luddy Harrison

Moore’s Laws

Technology & Moore’s Law
Transistor: 1947
Integrated Circuit (a.k.a. Chip): 1958
Moore’s Law (1965): # transistors per chip doubles every two years (or 18 months)

Architects & Another Moore’s Law
Microprocessor, ~2000: tens of millions of transistors
Popular Moore’s Law: processor (core) performance doubles every two years

Multicore Chip (a.k.a. Chip Multiprocessors)
Why multicore?
– Power → simpler structures
– Memory → concurrent accesses to tolerate off-chip latency
– Wires → intra-core wires are shorter
– Complexity → divide & conquer
But: more cores, NOT faster cores. Will effective chip performance keep doubling every two years?
(Figure: a chip with eight 4-way cores, 2006)

Virtuous Cycle, circa 1950 – 2005
World-Wide Software Market (per IDC): $212b (2005) → $310b (2010)
The cycle: increased processor performance → larger, more feature-full software → larger development teams → higher-level languages & abstractions → slower programs → (demand for) increased processor performance → …

Virtuous Cycle, 2005 – ???
World-Wide Software Market: $212b (2005) → ?
The same cycle (increased processor performance → larger, more feature-full software → larger development teams → higher-level languages & abstractions → slower programs) no longer turns on single-core gains. GAME OVER — NEXT LEVEL? Thread-level parallelism & multicore chips.

Summary: A Corollary to Amdahl’s Law
Develop a simple model of multicore hardware:
– Complements Amdahl’s software model
– Fixed chip resources for cores
– Core performance improves sub-linearly with resources
Shows the need for research to:
– Increase parallelism (are you surprised?)
– Increase core performance (especially for larger chips)
– Refine asymmetric designs (e.g., one core enhanced)
– Refine dynamically harnessing cores for serial performance

Outline
Recall Amdahl’s Law
A Model of Multicore Hardware
Symmetric Multicore Chips
Asymmetric Multicore Chips
Dynamic Multicore Chips
Caveats & Wrap Up

Amdahl’s Law Begins with a Simple Software Assumption (Limit Argument)
– Fraction F of execution time is perfectly parallelizable
– No overhead for scheduling, synchronization, communication, etc.
– Fraction 1 – F is completely serial
Time on 1 core = (1 – F) / 1 + F / 1 = 1
Time on N cores = (1 – F) / 1 + F / N

Amdahl’s Law [1967]
Amdahl’s Speedup = 1 / ( (1 – F) + F / N )
Implications:
– Attack the common case when introducing parallelization: when F is small, optimizations will have little effect.
– The aspects you ignore also limit speedup: as N approaches infinity, speedup is bounded by 1 / (1 – F).
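A minimal sketch of this formula in Python (the function name and the sample values are mine, for illustration):

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's Law: speedup on n cores of a program whose
    fraction f of execution time is perfectly parallelizable."""
    return 1.0 / ((1.0 - f) + f / n)

if __name__ == "__main__":
    # Speedup grows with n but saturates at the 1/(1 - f) bound.
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: n=16 -> {amdahl_speedup(f, 16):.2f}, "
              f"n=1024 -> {amdahl_speedup(f, 1024):.2f}, "
              f"bound = {1.0 / (1.0 - f):.1f}")
```

Even at n = 1024, f = 0.9 yields a speedup below 10; the serial fraction you ignored dominates.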

Amdahl’s Law [1967]
Amdahl’s Speedup = 1 / ( (1 – F) + F / N )
Discussion: can you ever obtain super-linear speedups?

Amdahl’s Law [1967]
Amdahl’s Speedup = 1 / ( (1 – F) + F / N )
For mainframes, Amdahl expected 1 – F = 35%:
– For 4 processors, speedup = 2 (1 / (0.35 + 0.65/4) ≈ 1.95)
– For infinite processors, speedup < 3 (1 / 0.35 ≈ 2.9)
– Therefore, stay with mainframes with one/few processors
Q: Does it make sense to design massively multicore chips? What kind?

Designing Multicore Chips Is Hard
Designers must confront single-core design options:
– Instruction fetch, wakeup, select
– Execution unit configuration & operand bypass
– Load/store queue(s) & data cache
– Checkpoint, log, runahead, commit
As well as additional design degrees of freedom:
– How many cores? How big is each?
– Shared caches: how many levels? How many banks?
– Memory interface: how many banks?
– On-chip interconnect: bus or switched?

Want a Simple Multicore Hardware Model to Complement Amdahl’s Simple Software Model
(1) Chip hardware roughly partitioned into:
– (I) Multiple cores (with L1 caches)
– (II) The Rest (L2/L3 cache banks, interconnect, pads, etc.)
– Assume: changing core size/number does NOT change The Rest
(2) Resources for multiple cores are bounded:
– Bound of N resources per chip for cores
– Due to area, power, cost ($$$), or multiple factors
– Bound = power? (but the pictures here use area)

Want a Simple Multicore Hardware Model, cont.
(3) Architects can improve single-core performance using more of the bounded resource.
A simple base core:
– Consumes 1 Base Core Equivalent (BCE) of resources
– Provides performance normalized to 1
An enhanced core (in the same process generation):
– Consumes R BCEs
– Provides performance Perf(R)
What does the function Perf(R) look like?

More on Enhanced Cores (performance Perf(R), consuming R BCEs of resources)
If Perf(R) > R → always enhance the core: it cost-effectively speeds up both sequential & parallel work. Therefore, the equations assume Perf(R) < R.
Graphs assume Perf(R) = square root of R:
– 2x performance for 4 BCEs, 3x for 9 BCEs, etc.
– Why? Models diminishing returns with “no coefficients”
How to speed up the enhanced core?
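A sketch of this resource/performance model in Python (Perf(R) = sqrt(R) is the slides’ illustrative choice; the function name is mine):

```python
import math

def perf(r: float) -> float:
    """Performance of a core built from r BCEs, normalized to a
    1-BCE base core. Perf(R) = sqrt(R) models diminishing returns
    from pouring more resources into a single core."""
    return math.sqrt(r)

if __name__ == "__main__":
    for r in (1, 4, 9, 16):
        # Perf(R) < R for R > 1: a big core gains less serial speed
        # than the parallel throughput of the base cores it replaces.
        print(f"R={r:2d} BCEs -> Perf={perf(r):.1f} (vs. {r} base cores)")
```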

Outline
Recall Amdahl’s Law
A Model of Multicore Hardware
Symmetric Multicore Chips
Asymmetric Multicore Chips
Dynamic Multicore Chips
Caveats & Wrap Up

How Many (Symmetric) Cores per Chip?
– Each chip is bounded to N BCEs (for all cores)
– Each core consumes R BCEs
– Assume symmetric multicore = all cores identical
– Therefore, N/R cores per chip: (N/R)*R = N
For an N = 16 BCE chip: sixteen 1-BCE cores, four 4-BCE cores, or one 16-BCE core

Performance of Symmetric Multicore Chips
– The serial fraction 1 – F uses 1 core at rate Perf(R): serial time = (1 – F) / Perf(R)
– The parallel fraction F uses N/R cores at rate Perf(R) each: parallel time = F / (Perf(R) * (N/R)) = F*R / (Perf(R)*N)
Therefore, w.r.t. one base core:
Symmetric Speedup = 1 / ( (1 – F)/Perf(R) + F*R / (Perf(R)*N) )
Enhanced cores speed up both the serial & parallel fractions. Implications? (See the sketch below.)
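A sketch of the symmetric speedup in Python, with a sweep to find the best core size R for a given F and N (function names and the printed cases are mine; Perf(R) = sqrt(R) as above):

```python
import math

def perf(r: float) -> float:
    return math.sqrt(r)  # the model's illustrative Perf(R)

def symmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of an n-BCE chip built as n/r identical r-BCE cores."""
    serial = (1.0 - f) / perf(r)
    parallel = f * r / (perf(r) * n)
    return 1.0 / (serial + parallel)

def best_symmetric(f: float, n: int) -> int:
    # Consider core sizes that divide the budget evenly, as on the slides.
    candidates = [r for r in range(1, n + 1) if n % r == 0]
    return max(candidates, key=lambda r: symmetric_speedup(f, n, r))

if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        r = best_symmetric(f, 16)
        print(f"F={f}, N=16: best R={r:2d}, cores={16 // r:2d}, "
              f"speedup={symmetric_speedup(f, 16, r):.1f}")
```

For F = 0.5 a single 16-BCE core wins (speedup 4); at F = 0.9 eight 2-BCE cores win (speedup ≈ 6.7), exactly the points called out on the next slides.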

Symmetric Multicore Chip, N = 16 BCEs
F = 0.5: optimal R = 16, cores = 1, speedup = 4 ( = 1 / (0.5/4 + 0.5*16/(4*16)) )
Need to increase parallelism to make multicore optimal!
(Figure: speedup vs. R for 16-, 8-, 4-, 2-, and 1-core chip configurations)

Symmetric Multicore Chip, N = 16 BCEs
At F = 0.9, multicore is optimal, but speedup is limited: R = 2, cores = 8, speedup = 6.7 (vs. F = 0.5: R = 16, cores = 1, speedup = 4)
Need to obtain even more parallelism!

Symmetric Multicore Chip, N = 16 BCEs
As F → 1: R = 1, cores = 16, speedup → 16
F matters: Amdahl’s Law applies to multicore chips. Researchers should target parallelism F first.

Symmetric Multicore Chip, N = 16 BCEs
Recall F = 0.9: R = 2, cores = 8, speedup = 6.7
As Moore’s Law enables N to grow from 16 to 256 BCEs: more core enhancements? More cores? Or both?

Symmetric Multicore Chip, N = 256 BCEs
MORE CORES! F → 1: R = 1 (vs. 1), cores = 256 (vs. 16), speedup = 204 (vs. 16)
CORE ENHANCEMENTS! F = 0.9: R = 28 (vs. 2), cores = 9 (vs. 8), speedup = 26.7 (vs. 6.7)
CORE ENHANCEMENTS & MORE CORES! F = 0.99: R = 3 (vs. 1), cores = 85 (vs. 16), speedup = 80 (vs. 13.9)
As Moore’s Law increases N, enhanced core designs are often needed. Researchers should target single-core performance too.

Outline
Recall Amdahl’s Law
A Model of Multicore Hardware
Symmetric Multicore Chips
Asymmetric Multicore Chips
Dynamic Multicore Chips
Caveats & Wrap Up

Asymmetric (Heterogeneous) Multicore Chips
Symmetric multicore required all cores to be equal. Why not enhance some (but not all) cores?
For Amdahl’s simple software assumptions:
– One enhanced core
– The others are base cores
How? (The model ignores the design cost of an asymmetric design.)
How does this affect our hardware model?

How Many Cores per Asymmetric Chip?
– Each chip is bounded to N BCEs (for all cores)
– One R-BCE core leaves N – R BCEs
– Use the N – R BCEs for N – R base cores
– Therefore, 1 + N – R cores per chip
For an N = 16 BCE chip: symmetric: four 4-BCE cores; asymmetric: one 4-BCE core & twelve 1-BCE base cores

Performance of Asymmetric Multicore Chips
– The serial fraction 1 – F is unchanged: serial time = (1 – F) / Perf(R)
– The parallel fraction F uses one core at rate Perf(R) plus N – R cores at rate 1: parallel time = F / (Perf(R) + N – R)
Therefore, w.r.t. one base core:
Asymmetric Speedup = 1 / ( (1 – F)/Perf(R) + F / (Perf(R) + N – R) )
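The same sketch extended to the asymmetric design (function name is mine; Perf(R) = sqrt(R) as before):

```python
import math

def perf(r: float) -> float:
    return math.sqrt(r)

def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of an n-BCE chip with one r-BCE enhanced core plus
    n - r base cores; the enhanced core also joins the parallel phase."""
    serial = (1.0 - f) / perf(r)
    parallel = f / (perf(r) + n - r)
    return 1.0 / (serial + parallel)

if __name__ == "__main__":
    n, f = 256, 0.99
    best_r = max(range(1, n + 1), key=lambda r: asymmetric_speedup(f, n, r))
    print(f"F={f}, N={n}: best R={best_r}, cores={1 + n - best_r}, "
          f"speedup={asymmetric_speedup(f, n, best_r):.0f}")
```

This lands on the slide values shown below: speedup ≈ 166 at F = 0.99, N = 256, with 216 cores (R = 41 and R = 42 are nearly tied in this model).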

Asymmetric Multicore Chip, N = 256 BCEs
Number of cores = 1 (enhanced) + N – R (base)
How do asymmetric & symmetric speedups compare?
(Figure: speedup vs. R for 256-, 253-, 241-, 193-, and 1-core configurations)

Recall: Symmetric Multicore Chip, N = 256 BCEs: F = 0.9, R = 28, cores = 9, speedup = 26.7

Asymmetric Multicore Chip, N = 256 BCEs
F = 0.9: R = 118 (vs. 28), cores = 139 (vs. 9), speedup = 65.6 (vs. 26.7)
F = 0.99: R = 41 (vs. 3), cores = 216 (vs. 85), speedup = 166 (vs. 80)
Asymmetric offers greater speedup potential than symmetric, and as Moore’s Law increases N, asymmetric gets better. Some researchers should target developing asymmetric multicores.

Outline
Recall Amdahl’s Law
A Model of Multicore Hardware
Symmetric Multicore Chips
Asymmetric Multicore Chips
Dynamic Multicore Chips
Caveats & Wrap Up

Dynamic Multicore Chips
Why NOT have your cake and eat it too?
– N base cores for best parallel performance
– Harness R cores together for serial performance
How? DYNAMICALLY harness cores together: run in parallel mode on the base cores, switch to sequential mode for serial sections.
How would one model this chip?

Performance of Dynamic Multicore Chips
N base cores, where R of them can be harnessed together:
– The serial fraction 1 – F uses R BCEs at rate Perf(R): serial time = (1 – F) / Perf(R)
– The parallel fraction F uses N base cores at rate 1 each: parallel time = F / N
Therefore, w.r.t. one base core:
Dynamic Speedup = 1 / ( (1 – F)/Perf(R) + F / N )
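And the dynamic design (again a sketch with names of my choosing; Perf(R) = sqrt(R)). Since a larger R only speeds the serial phase here and costs nothing in the parallel phase, the best choice is always R = N:

```python
import math

def perf(r: float) -> float:
    return math.sqrt(r)

def dynamic_speedup(f: float, n: int, r: int) -> float:
    """Speedup of an n-BCE chip of base cores, r of which can be
    dynamically fused into one Perf(r) core for the serial phase;
    the parallel phase always uses all n base cores."""
    serial = (1.0 - f) / perf(r)
    parallel = f / n
    return 1.0 / (serial + parallel)

if __name__ == "__main__":
    n, f = 256, 0.99
    # A larger R only helps the serial phase, so R = N is best.
    print(f"F={f}, N={n}: R={n}, speedup={dynamic_speedup(f, n, n):.0f}")
```

This gives speedup ≈ 223 at R = N = 256, vs. 166 for the best asymmetric chip.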

Recall: Asymmetric Multicore Chip, N = 256 BCEs: F = 0.99, R = 41, cores = 216, speedup = 166
What happens with a dynamic chip?

Dynamic Multicore Chip, N = 256 BCEs
F = 0.99: R = 256 (vs. 41), cores = 256 (vs. 216), speedup = 223 (vs. 166). Note: #cores is always N = 256.
Dynamic offers greater speedup potential than asymmetric. Researchers should target dynamically harnessing cores.

Outline
Recall Amdahl’s Law
A Model of Multicore Hardware
Symmetric Multicore Chips
Asymmetric Multicore Chips
Dynamic Multicore Chips
Caveats & Wrap Up

Three Multicore Amdahl’s Laws
Symmetric Speedup = 1 / ( (1 – F)/Perf(R) + F*R / (Perf(R)*N) )
– sequential section: 1 enhanced core; parallel section: N/R enhanced cores
Asymmetric Speedup = 1 / ( (1 – F)/Perf(R) + F / (Perf(R) + N – R) )
– sequential section: 1 enhanced core; parallel section: 1 enhanced & N – R base cores
Dynamic Speedup = 1 / ( (1 – F)/Perf(R) + F / N )
– sequential section: 1 enhanced core; parallel section: N base cores
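Putting the three models side by side (a sketch; all names are mine, Perf(R) = sqrt(R), and the sweep treats any integer R as admissible):

```python
import math

def perf(r):
    return math.sqrt(r)

def symmetric(f, n, r):
    return 1 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric(f, n, r):
    return 1 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def dynamic(f, n, r):
    return 1 / ((1 - f) / perf(r) + f / n)

if __name__ == "__main__":
    n, f = 256, 0.99
    for name, model in (("symmetric", symmetric),
                        ("asymmetric", asymmetric),
                        ("dynamic", dynamic)):
        r = max(range(1, n + 1), key=lambda r: model(f, n, r))  # best core size
        print(f"{name:10s}: best R={r:3d}, speedup={model(f, n, r):.0f}")
```

At F = 0.99 and N = 256 this orders the designs as the slides do: symmetric ≈ 80, asymmetric ≈ 166, dynamic ≈ 223.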

Software Model Charges, 1 of 2
– Serial fraction not totally serial → can extend the software model to tree algorithms, etc.
– Parallel fraction not totally parallel → can extend for varying or bounded parallelism
– Serial/parallel fraction may change → can extend for weak scaling [Gustafson, CACM’88]: run a larger, more parallel problem in constant time

Software Model Charges, 2 of 2
– Synchronization, communication, scheduling effects? → can extend for overheads and imbalance
– Software challenges are worse for asymmetric multicore → can extend for asymmetric scheduling, etc.
– Software challenges are greater still for dynamic multicore → can extend to model these overheads

Hardware Model Charges
– Naïve to consider total resources for cores fixed → can extend the hardware model to capture how core changes affect ‘The Rest’
– Naïve to bound cores by one resource (esp. area) → can extend for a Pareto-optimal mix of area, power, complexity, reliability, …
– Naïve to ignore off-chip bandwidth limits & the benefits of last-level caching → can extend to model these
– Naïve to use performance = square root of resources → can extend, as the equations can use any function

Just because the model is simple does NOT mean the conclusions are wrong.
Prediction: while the truth is more complex, these basic observations will hold.

Dynamic Multicore Chip, N = 1024 BCEs
F → 1, R → 1024, cores → 1024, speedup → 1024!
NOT possible today. NOT possible EVER, unless we dream & act.

Summary so far: A Corollary to Amdahl’s Law
Developed a simple model of multicore hardware:
– Complements Amdahl’s software model
– Fixed chip resources for cores
– Core performance improves sub-linearly with resources
Showed the need for research to:
– Increase parallelism (are you surprised?)
– Increase core performance (especially for larger chips)
– Refine asymmetric designs (e.g., one core enhanced)
– Refine dynamically harnessing cores for serial performance

What about clusters / clouds / massively distributed platforms?

Amdahl’s Law for Multi-Processor / Distributed Systems
Begins with a simple software assumption (limit argument):
– Fraction F of execution time is perfectly parallelizable
– No overhead for scheduling, synchronization, communication, etc.
– Fraction 1 – F is completely serial
Amdahl’s Speedup = 1 / ( (1 – F) + F / N )

Clusters / Distributed systems

This Class: Weekly Schedule (tentative)

Course Organization/Syllabus/etc.

Administrivia: Course Structure
Paper discussion (most classes) & lectures
Student projects:
– Aim high! Have fun! It’s a class project, not your PhD!
– Teams of up to 3 students
– Project presentations at the end of the term

Administrivia: Grading
Paper reviews: 35%
Class participation: 10%
Discussion leading: 10%
Project: 45%

Administrivia: Paper Reviewing (1)
Goals:
– Think about what you read
– Expand your knowledge beyond the papers that are assigned
– Get used to writing paper reviews
Reviews are due by midnight the day before class. Be professional in your writing.
Keep an eye on writing style:
– Clarity
– Beware of traps: learn to use them in writing and detect them in reading
– Detect (and stay away from) trivial claims, e.g., a first sentence of the Introduction like “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

Administrivia: Paper Reviewing (2)
Follow the form provided when relevant:
– State the main contribution of the paper.
– Critique the main contribution: rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution), and explain your rating in a sentence or two.
– Rate how convincing the methodology is: do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)

Administrivia: Paper Reviewing (3)
– What is the most important limitation of the approach?
– What are the three strongest and/or most interesting ideas in the paper?
– What are the three most striking weaknesses of the paper?
– Name three questions that you would like to ask the authors.
– Detail an interesting extension to the work not mentioned in the future-work section.
– Optional: comments on the paper that you’d like to see discussed in class.

Administrivia: Discussion Leading
Come prepared!
– Prepare a discussion outline
– Prepare questions: “what if”s, unclear aspects of the proposed solution, …
– Similar ideas in different contexts
– Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews.
Main goals:
– Keep the discussion flowing
– Keep the discussion relevant
– Engage everybody (I’ll have an eye on this, too)

Administrivia: Projects
Combine with your research if relevant to the class. Get approval from all instructors if you overlap final projects:
– Don’t sell the same piece of work twice
– You can get more than twice as many results with less than twice as much work
Aim high!
– Put in one extra month and get a publication out of it. It is doable!
– Try ideas that you postponed out of fear: it’s just a class, not your PhD.

Administrivia: Project Deadlines (tentative)
– 2nd week: 3-slide idea presentation: “What is the research question you aim to answer?”
– 4th week: 2-page project proposal
– 6th week: 3-page related-work survey (know the relevant work in your problem area; if an implementation project, list tools and similar projects; expand the proposal)
– 8th week: 4-page midterm report due (have a clear image of what’s possible/doable; report preliminary results)
– [See schedule]: in-class project presentation (demo, if appropriate)
– Last week of the exam session: 10-page write-up

Next Class (Mon, 19/01)
Note room change: KAIS
Discussion of:
– Project ideas
– Papers
To do: subscribe to the mailing list. Volunteers needed as discussion leaders for next week’s class.

Questions?

Backup Slides

Cost-Effective Parallel Computing
Isn’t Speedup(P) < P inefficient? (P = number of processors) If only throughput matters, why not use P separate computers instead?
But much of a computer’s cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95].
Let Costup(P) = Cost(P) / Cost(1). Parallel computing is cost-effective when Speedup(P) > Costup(P).
E.g., for an SGI PowerChallenge with 500MB of memory: Costup(32) = 8.6
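A sketch of that decision rule (the cost split in costup() and the sample speedup are hypothetical assumptions; only Costup(32) = 8.6 comes from the slide):

```python
def costup(p: int, base_cost: float, cpu_cost: float) -> float:
    """Costup(P) = Cost(P)/Cost(1) under a hypothetical cost model:
    one shared base (memory, I/O, chassis) plus P processors."""
    return (base_cost + p * cpu_cost) / (base_cost + cpu_cost)

def cost_effective(speedup_p: float, costup_p: float) -> bool:
    # Wood & Hill's rule: parallelism pays off when speedup
    # exceeds the relative cost of the larger machine.
    return speedup_p > costup_p

if __name__ == "__main__":
    # Hypothetical machine: $40k shared base + $5k per processor.
    print(f"Costup(32) = {costup(32, base_cost=40_000, cpu_cost=5_000):.1f}")
    # With the slide's SGI figure, Costup(32) = 8.6: even a modest
    # Speedup(32) = 12 is cost-effective although 12 << 32.
    print(cost_effective(speedup_p=12.0, costup_p=8.6))  # True
```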

Symmetric Multicore Chip, N = 16 BCEs

Symmetric Multicore Chip, N = 256 BCEs

Symmetric Multicore Chip, N = 1024 BCEs

Asymmetric Multicore Chip, N = 16 BCEs

Asymmetric Multicore Chip, N = 256 BCEs

Asymmetric Multicore Chip, N = 1024 BCEs

Dynamic Multicore Chip, N = 16 BCEs

Dynamic Multicore Chip, N = 256 BCEs

Dynamic Multicore Chip, N = 1024 BCEs