1
Exploring Core Designs for Chip Multiprocessors
Allison Holloway Matthew Allen
2
Outline
  Motivation
  Hypotheses
  Methodology
  Results
  Conclusions
3
Motivation
  What should the core of a CMP look like?
  Workloads: commercial, scientific
  Out-of-order (OOO), wide-issue superscalar?
  Tradeoffs: performance, power, area, complexity
4
Hypotheses
  1. Commercial workloads will not benefit much from OOO / wide-issue
  2. Scientific workloads will benefit significantly from OOO / wide-issue
  3. OOO & wide-issue will be less beneficial for larger-scale systems
  4. Augmenting an in-order processor with non-blocking caches will close the OOO gap
5
Methodology
  Simulator: Multifacet, Ruby, Opal (OOO)
  In-order processor model
    Looked at Simics functional – not comparable
    Restrict Opal to in-order issue
    Register renaming not removed
  Limitations:
    Can’t recompile code for scheduling
    Does not model UltraSPARC issue rules
6
Methodology
  Workloads
    Commercial: Apache, SPECjbb, OLTP, Zeus
    Scientific: Barnes-Hut, Ocean
  Issues
    No 4-processor simulation
    No cache warmup files
7
Methodology
  Baseline configuration used
  Reorder buffer (ROB), instruction window, and number of functional units halved for the 2-wide processor
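The halving rule above can be sketched as a small configuration helper. This is only an illustration: the baseline numbers below are hypothetical placeholders (the baseline values are not preserved in this transcript), and `CoreConfig` and `halve_for_narrow_core` are names invented here, not part of the simulator.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical container for the core parameters named on the slide."""
    issue_width: int
    rob_entries: int
    window_entries: int
    functional_units: int


def halve_for_narrow_core(base: CoreConfig) -> CoreConfig:
    """Derive a narrower core by halving issue width, ROB, window, and functional units."""
    return replace(
        base,
        issue_width=base.issue_width // 2,
        rob_entries=base.rob_entries // 2,
        window_entries=base.window_entries // 2,
        functional_units=base.functional_units // 2,
    )


# Placeholder 4-wide baseline; these values are assumptions, not the study's.
baseline_4wide = CoreConfig(issue_width=4, rob_entries=64, window_entries=32, functional_units=4)
narrow_2wide = halve_for_narrow_core(baseline_4wide)
print(narrow_2wide)
```

Scaling the structures together with the issue width keeps the 2-wide and 4-wide cores roughly balanced, so a comparison between them reflects issue width rather than an undersized or oversized window.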
8
Results
  Moving from in-order to OOO provides more performance benefit than widening issue from 2 to 4
  Tolerating cache misses is the key
10
Results
  Hypothesis 1: Commercial workloads will not benefit much from OOO / wide-issue
    ~30% speedup
  Hypothesis 2: Scientific workloads will benefit significantly from OOO / wide-issue
    ~60% speedup
  Commercial workloads DO benefit from OOO, but not as much as scientific workloads.
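For reference, a percent speedup such as the figures above is conventionally computed from the run times of two configurations on the same work. The sketch below assumes the metric is execution cycles (the slide does not say which metric was used), and the cycle counts are hypothetical values chosen only to match the rough ~30% and ~60% figures.

```python
def speedup_percent(baseline_cycles: float, improved_cycles: float) -> float:
    """Percent speedup of the improved design over the baseline for the same work."""
    if baseline_cycles <= 0 or improved_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return (baseline_cycles / improved_cycles - 1.0) * 100.0


# Hypothetical cycle counts, not measured data from the study.
commercial = speedup_percent(baseline_cycles=1300, improved_cycles=1000)
scientific = speedup_percent(baseline_cycles=1600, improved_cycles=1000)
print(round(commercial), round(scientific))
```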
12
Results
  Hypothesis 3: OOO & wide-issue will be less beneficial for larger-scale systems
    True, BUT workloads don’t scale above 8 processors (except Apache)
15
(Non) Results
  Hypothesis 4: Augmenting an in-order processor with non-blocking caches will close the OOO gap
    Simulations still running!
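To illustrate why non-blocking caches might help an in-order core, here is a toy cycle-count model. It is not the Opal/Ruby model and says nothing about the actual results: it assumes one instruction issues per cycle, a fixed miss latency, and fully independent misses (a real in-order core would also stall on the first use of a missing load). The function names and the example miss pattern are made up for this sketch.

```python
def blocking_cycles(miss_flags, miss_latency):
    """Blocking cache: every miss stalls the core for the full miss latency."""
    return sum(1 + (miss_latency if is_miss else 0) for is_miss in miss_flags)


def nonblocking_cycles(miss_flags, miss_latency, num_mshrs):
    """Non-blocking cache: the core keeps issuing and stalls only when all MSHRs are busy."""
    if num_mshrs < 1:
        raise ValueError("need at least one MSHR")
    cycle = 0
    outstanding = []  # completion times of in-flight misses
    for is_miss in miss_flags:
        # Free any MSHRs whose misses have already returned.
        outstanding = [t for t in outstanding if t > cycle]
        if is_miss:
            if len(outstanding) >= num_mshrs:
                # All MSHRs busy: wait until the earliest miss returns.
                cycle = min(outstanding)
                outstanding = [t for t in outstanding if t > cycle]
            outstanding.append(cycle + miss_latency)
        cycle += 1
    # Execution ends once the last instruction issued and every miss returned.
    return max([cycle] + outstanding)


# Hypothetical instruction stream: True marks a load that misses in the cache.
pattern = [True, True, False, False, True, True]
for mshrs in (1, 2, 4):
    print(mshrs, nonblocking_cycles(pattern, miss_latency=100, num_mshrs=mshrs))
print("blocking", blocking_cycles(pattern, miss_latency=100))
```

In this model, adding MSHRs lets more misses overlap, which is the same effect an OOO core gets from its instruction window. Because dependences are ignored, the numbers are an optimistic bound on what a non-blocking in-order core could achieve.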
16
Future Work
  Analyze performance trade-offs
    vs. power?
    vs. area?
  4-processor runs (if possible)
  Vary number of MSHRs
17
Conclusions
  Out-of-order provides substantial benefit over in-order, even for commercial workloads
  Other methods for tolerating/reducing cache misses may be effective
  Diminishing returns for larger systems, but workloads don’t scale well
  Need to consider power and area constraints