Exploring Core Designs for Chip Multiprocessors


1 Exploring Core Designs for Chip Multiprocessors
Allison Holloway, Matthew Allen

2 Outline
Motivation
Hypotheses
Methodology
Results
Conclusions

3 Motivation
What should the core of a CMP look like?
Workloads: commercial, scientific
OOO wide-issue superscalar?
Tradeoffs: Performance, Power, Area, Complexity

4 Hypotheses
Commercial workloads will not benefit much from OOO / wide-issue
Scientific workloads will benefit significantly from OOO / wide-issue
OOO & wide-issue will be less beneficial for larger scale systems
Augmenting an in-order processor with non-blocking caches will close OOO gap

5 Methodology
Simulator: Multifacet, Ruby, Opal (OOO)
In-order processor model
Looked at Simics functional: not comparable
Restrict Opal to in-order issue
Register renaming not removed
Limitations:
Can't recompile code for scheduling
Does not model UltraSPARC issue rules

6 Methodology
Workloads
Commercial: Apache, SPECjbb, OLTP, Zeus
Scientific: Barnes-Hut, Ocean
Issues
No 4-processor simulation
No cache warmup files

7 Methodology
Baseline configuration used
ROB, instruction window, and number of functional units halved for the 2-wide processor
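The halving rule on this slide can be sketched as a small configuration helper. The baseline table itself did not survive in this transcript, so the numbers below are placeholders chosen only for illustration, and the names `CoreConfig` and `halve_for_2wide` are hypothetical, not part of the simulator.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical container for the core parameters named on the slide."""
    issue_width: int
    rob_entries: int
    window_entries: int
    functional_units: int


def halve_for_2wide(base: CoreConfig) -> CoreConfig:
    """Derive a 2-wide configuration by halving ROB, window, and FU counts."""
    return replace(
        base,
        issue_width=2,
        rob_entries=base.rob_entries // 2,
        window_entries=base.window_entries // 2,
        functional_units=base.functional_units // 2,
    )


# Placeholder baseline values (assumed; NOT taken from the presentation).
base = CoreConfig(issue_width=4, rob_entries=64, window_entries=32, functional_units=4)
two_wide = halve_for_2wide(base)
print(two_wide)
```

Scaling the supporting structures along with the issue width keeps the 2-wide core from being compared with an oversized window it could not realistically use.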

8 Results
OOO vs. in-order provides more performance benefit than widening issue from 2 to 4
Tolerating cache misses is the key
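A first-order model gives some intuition for why tolerating misses can matter more than issue width. This is only a sketch: it assumes misses are independent and perfectly overlappable in groups, and the instruction count, miss count, and penalty are made-up values, not measurements from the study. The function name `estimated_cycles` is likewise illustrative.

```python
import math


def estimated_cycles(n_instructions, issue_width, n_misses, miss_penalty, overlap):
    """First-order estimate: issue time plus exposed miss stalls.

    overlap is how many independent misses are serviced at once;
    overlap=1 models a core that stalls on every miss.
    """
    if overlap < 1:
        raise ValueError("overlap must be >= 1")
    issue_cycles = math.ceil(n_instructions / issue_width)
    exposed_miss_groups = math.ceil(n_misses / overlap)
    return issue_cycles + exposed_miss_groups * miss_penalty


# Hypothetical workload: 10,000 instructions, 100 misses, 200-cycle penalty.
N, MISSES, PENALTY = 10_000, 100, 200

blocking_2wide = estimated_cycles(N, 2, MISSES, PENALTY, overlap=1)  # 25000
blocking_4wide = estimated_cycles(N, 4, MISSES, PENALTY, overlap=1)  # 22500
overlap_2wide = estimated_cycles(N, 2, MISSES, PENALTY, overlap=4)   # 10000

print("widening 2 -> 4:", blocking_2wide / blocking_4wide)
print("overlapping misses:", blocking_2wide / overlap_2wide)
```

With these assumed numbers, doubling the issue width barely helps because miss stalls dominate, while overlapping four misses at a time cuts the estimate by more than half.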

9

10 Results
Hypothesis 1: Commercial workloads will not benefit much from OOO / wide-issue
~30% speedup
Hypothesis 2: Scientific workloads will benefit significantly from OOO / wide-issue
~60% speedup
Commercial workloads DO benefit from OOO, but not as much as scientific.
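To put the two speedup figures on the same footing, they can be converted into relative runtimes. The sketch below assumes "N% speedup" means the baseline runtime divided by the new runtime equals 1 + N/100; the slides do not state which definition was used, and `relative_runtime` is a hypothetical helper.

```python
def relative_runtime(speedup_percent):
    """Runtime of the faster design as a fraction of the baseline runtime.

    Assumes speedup = baseline_time / new_time = 1 + speedup_percent / 100.
    """
    return 1.0 / (1.0 + speedup_percent / 100.0)


commercial = relative_runtime(30)  # roughly 0.77 of baseline runtime
scientific = relative_runtime(60)  # 0.625 of baseline runtime

print(f"commercial: {commercial:.3f}, scientific: {scientific:.3f}")
```

Under that assumption, the commercial workloads would finish in roughly three quarters of the in-order time, and the scientific ones in under two thirds.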

11

12 Results
Hypothesis 3: OOO & wide-issue will be less beneficial for larger scale systems
True, BUT
Workloads don't scale above 8 processors (except Apache)

13

14

15 (Non) Results
Hypothesis 4: Augmenting an in-order processor with non-blocking caches will close OOO gap
Simulations still running!

16 Future Work
Analyze performance trade-offs
vs. power?
vs. area?
4-processor runs (if possible)
Vary # of MSHRs

17 Conclusions
Out-of-order provides substantial benefit over in-order, even for commercial workloads
Other methods for tolerating/reducing cache misses may be effective
Diminishing returns for larger systems, but workloads don't scale well
Need to consider power and area constraints

