Exploring Core Designs for Chip Multiprocessors

Allison Holloway, Matthew Allen

Outline
  Motivation
  Hypotheses
  Methodology
  Results
  Conclusions

Motivation
  What should the core of a CMP look like?
  Workloads: commercial, scientific
  OOO, wide-issue superscalar?
  Tradeoffs: performance, power, area, complexity

Hypotheses
  1. Commercial workloads will not benefit much from OOO / wide-issue
  2. Scientific workloads will benefit significantly from OOO / wide-issue
  3. OOO and wide-issue will be less beneficial for larger-scale systems
  4. Augmenting an in-order processor with non-blocking caches will close the OOO gap

Methodology
  Simulator: Multifacet, Ruby, Opal (OOO)
  In-order processor model
    Looked at Simics functional (not comparable)
    Restrict Opal to in-order issue
    Register renaming not removed
  Limitations
    Can’t recompile code for scheduling
    Does not model UltraSPARC issue rules

Methodology
  Workloads
    Commercial: Apache, SPECjbb, OLTP, Zeus
    Scientific: Barnes-Hut, Ocean
  Issues
    No 4-processor simulation
    No cache warmup files

Methodology
  Baseline configuration used
  ROB, instruction window, and number of functional units halved for the 2-wide processor
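The halving rule on this slide can be sketched in a few lines. This is only an illustration: the `CoreConfig` fields, the `halve_for_two_wide` helper, and every number below are hypothetical, since the baseline table itself did not survive in this transcript.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreConfig:
    """Sizing knobs named on the slide; all field names are illustrative."""
    issue_width: int
    rob_entries: int
    window_entries: int
    functional_units: int


def halve_for_two_wide(base: CoreConfig) -> CoreConfig:
    """Derive the narrower configuration by halving the baseline resources."""
    return replace(
        base,
        issue_width=base.issue_width // 2,
        rob_entries=base.rob_entries // 2,
        window_entries=base.window_entries // 2,
        functional_units=base.functional_units // 2,
    )


# Placeholder numbers only: the actual baseline values are not in this transcript.
baseline = CoreConfig(issue_width=4, rob_entries=64, window_entries=32, functional_units=4)
two_wide = halve_for_two_wide(baseline)
print(two_wide)
```

Scaling the ROB, window, and functional units together with the issue width keeps the 2-wide and 4-wide cores balanced, so the comparison isolates width rather than one oversized structure.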

Results
  OOO vs. in-order provides more performance benefit than widening issue from 2 to 4
  Tolerating cache misses is the key

Results
  Hypothesis 1: Commercial workloads will not benefit much from OOO / wide-issue
    ~30% speedup
  Hypothesis 2: Scientific workloads will benefit significantly from OOO / wide-issue
    ~60% speedup
  Commercial workloads DO benefit from OOO, but not as much as scientific.
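For readers who want the arithmetic behind a figure such as "~30% speedup", here is a minimal sketch. It assumes the slide's percentages are ratios of runtimes (baseline over improved); the transcript does not say how they were computed, and the runtimes below are made up for illustration.

```python
def speedup(baseline_time: float, improved_time: float) -> float:
    """Classic speedup: how many times faster the improved run is."""
    if baseline_time <= 0 or improved_time <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_time / improved_time


def percent_speedup(baseline_time: float, improved_time: float) -> float:
    """Speedup expressed as a percentage gain over the baseline."""
    return (speedup(baseline_time, improved_time) - 1.0) * 100.0


# Hypothetical runtimes (arbitrary units), not measurements from the talk.
commercial = percent_speedup(130.0, 100.0)
scientific = percent_speedup(160.0, 100.0)
print(round(commercial), round(scientific))
```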

Results
  Hypothesis 3: OOO and wide-issue will be less beneficial for larger-scale systems
  True, BUT
    Workloads don’t scale above 8 processors (except Apache)

(Non) Results
  Hypothesis 4: Augmenting an in-order processor with non-blocking caches will close the OOO gap
  Simulations still running!
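The intuition behind Hypothesis 4 can be illustrated with a toy model. It is not the simulated system: it assumes a burst of independent misses with a fixed latency, and that a cache with `mshrs` miss status holding registers can keep that many misses in flight at once (a blocking cache corresponds to `mshrs=1`). The function name and all numbers are hypothetical.

```python
import math


def miss_stall_cycles(num_misses: int, miss_latency: int, mshrs: int) -> int:
    """Toy estimate of stall cycles for a burst of independent cache misses.

    Misses are serviced in batches of up to `mshrs` overlapping requests,
    so the burst costs ceil(num_misses / mshrs) full miss latencies.
    """
    if mshrs < 1:
        raise ValueError("need at least one MSHR")
    if num_misses < 0 or miss_latency < 0:
        raise ValueError("counts and latencies must be non-negative")
    return math.ceil(num_misses / mshrs) * miss_latency


# Hypothetical burst: 8 independent misses, 200 cycles each.
blocking = miss_stall_cycles(8, 200, mshrs=1)      # misses fully serialized
non_blocking = miss_stall_cycles(8, 200, mshrs=4)  # up to 4 misses overlap
print(blocking, non_blocking)
```

In a real in-order core the benefit is smaller than this model suggests, because the pipeline still stalls at the first instruction that uses the missing data; dependent misses cannot overlap at all.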

Future Work
  Analyze performance trade-offs
    vs. power?
    vs. area?
  4-processor runs (if possible)
  Vary number of MSHRs

Conclusions
  Out-of-order provides substantial benefit over in-order, even for commercial workloads
  Other methods for tolerating/reducing cache misses may be effective
  Diminishing returns for larger systems, but workloads don’t scale well
  Need to consider power and area constraints
