“Amdahl's Law in the Multicore Era”
Mark Hill and Mike Marty, University of Wisconsin
IEEE Computer, July 2008
Presented by Dan Sorin
Slide 2: Introduction (ECE 259 / CPS 221)
Multicore is here; architects need to cope
Time to revisit Amdahl’s Law:
Speedup = 1 / [(1 − f) + f/s]
–f = fraction of computation that is parallel
–s = speedup on the parallel fraction
Goal of the paper is to gain insights
–Not actually a “research paper”, per se
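The slide's formula is easy to sanity-check in a few lines of Python (a sketch; the function name is mine, and f and s follow the slide's definitions):

```python
def amdahl_speedup(f, s):
    """Classic Amdahl's Law: f is the parallel fraction of the
    computation, s is the speedup achieved on that fraction."""
    return 1.0 / ((1.0 - f) + f / s)

# Even with a huge parallel speedup, a 10% serial part caps
# overall speedup near 1 / (1 - f) = 10x.
print(amdahl_speedup(0.9, 1e9))
```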
Slide 3: System Model & Assumptions
Chip contains a fixed number, say N, of “base core equivalents” (BCEs)
Can construct more powerful cores by fusing BCEs
–Performance of a core is a function of the number of BCEs it uses
–Perf(1) < Perf(R) < R
–In the paper, assume Perf(R) = sqrt(R)
–Why doesn’t Perf(R) = R?
Homogeneous vs. heterogeneous cores
–Homogeneous: N/R cores per chip
–Heterogeneous: 1 + (N − R) cores per chip
Rest of the paper ignores/abstracts many issues
–Shared caches (L2 and beyond), interconnection network
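The slide's cost model can be sketched directly (the function names are mine; the sqrt performance curve is the paper's stated assumption):

```python
import math

def perf(r):
    # Paper's assumption: fusing r BCEs into one core gives sqrt(r)
    # times the performance of a single BCE -- sublinear, reflecting
    # diminishing returns from extracting more ILP.
    return math.sqrt(r)

def homogeneous_core_count(n, r):
    # n BCEs on the chip, every core built from r BCEs
    return n // r

def heterogeneous_core_count(n, r):
    # one big core of r BCEs plus (n - r) single-BCE cores
    return 1 + (n - r)

print(perf(16))                        # 4.0: a 16-BCE core is only 4x a BCE
print(homogeneous_core_count(64, 16))  # 4 identical cores
print(heterogeneous_core_count(64, 16))  # 1 big + 48 small = 49 cores
```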
Slide 4: Homogeneous Cores
Reminder: N/R cores per chip
Data in Figures 2a & 2b shows:
–Speedups are often depressingly low, especially for large R
–Even for large values of f, speedups are low
What’s the intuition behind these results?
–For small R, the chip performs poorly on sequential code
–For large R, the chip performs poorly on parallel code
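In the paper's symmetric-chip model (the quantity plotted in Figures 2a & 2b), sequential code runs on one r-BCE core and parallel code runs on all n/r such cores. A sketch (function names are mine):

```python
import math

def perf(r):
    return math.sqrt(r)  # paper's Perf(R) = sqrt(R) assumption

def speedup_homogeneous(f, n, r):
    """Hill-Marty symmetric-multicore speedup: the serial fraction
    runs on a single r-BCE core; the parallel fraction is spread
    across all n/r such cores."""
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))

# Small r hurts sequential code, large r hurts parallel code
# (here n = 256 BCEs, f = 0.99).
for r in (1, 16, 256):
    print(r, round(speedup_homogeneous(0.99, 256, r), 1))
```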
Slide 5: Heterogeneous Cores
Reminder: 1 big core + (N − R) minimal cores per chip
Data in Figures 2c & 2d shows:
–Speedups are much better than for homogeneous cores
–But still not doing great on parallel code
What’s the intuition behind these results?
–For large f, the chip can’t make good use of the big core
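For the asymmetric chip of Figures 2c & 2d, the paper lets parallel code use the big r-BCE core and the n − r base cores together. A sketch under the same assumptions (names are mine):

```python
import math

def perf(r):
    return math.sqrt(r)  # paper's Perf(R) = sqrt(R) assumption

def speedup_heterogeneous(f, n, r):
    """Hill-Marty asymmetric speedup: the serial fraction runs on the
    one big r-BCE core; the parallel fraction runs on that core plus
    the (n - r) single-BCE cores."""
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))

# Much better than the symmetric chip at the same design point
# (n = 256 BCEs, f = 0.99, big core built from 16 BCEs).
print(round(speedup_heterogeneous(0.99, 256, 16), 1))
```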
Slide 6: Somewhat Obvious Next Step
If homogeneous isn’t great and heterogeneous isn’t always great, can we dynamically adjust to the workload?
Assign more BCEs to the big core when running sequential code
–When running parallel code, there’s no need for the big core
Data in Figures 2e and 2f show:
–Yup, this was a good idea (best of both worlds)
Is this realistic, though?
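In the dynamic model of Figures 2e and 2f, the chip fuses r BCEs into one big core for sequential code, then reverts to all n base cores for parallel code. A sketch (names are mine):

```python
import math

def perf(r):
    return math.sqrt(r)  # paper's Perf(R) = sqrt(R) assumption

def speedup_dynamic(f, n, r):
    """Hill-Marty dynamic speedup: the serial fraction runs on a fused
    r-BCE core; for the parallel fraction the chip reconfigures back
    into all n base cores -- best of both worlds."""
    return 1.0 / ((1.0 - f) / perf(r) + f / n)

# With all n BCEs available to both phases, fusing everything for the
# serial phase (r = n) is strictly best in this model.
print(round(speedup_dynamic(0.99, 256, 256), 1))
```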
Slide 7: Conclusions
Just because the world is now multicore, we can’t forget about single-core performance
–Aside: an interesting observation from a traditionally MP group
Cost-effectiveness matters
–Sqrt(R) may seem bad, but may actually be fine
Amdahl is still correct – we’re limited by f
Dynamic provisioning of resources, if possible, is important
Slide 8: Questions/Concerns
Is this model too simplistic to be insightful?
–Abstractions can be good, but can also be misleading
–For example, this paper focuses on cores, when the real action is in the memory system and interconnection network
–Concrete example: more cores require more off-chip memory bandwidth; having more cores than you can feed isn’t going to help you
Are the overheads of dynamic reconfiguration going to outweigh its benefits?
–The Core Fusion paper does this, but it isn’t cheap or easy
What if a breakthrough in technology (e.g., from Prof. Dwyer’s research) removes the power wall?
–Do we go back to big uniprocessors?