Core-Selectability in Chip-Multiprocessors Hashem H. Najaf-abadi Niket K. Choudhary Eric Rotenberg.


1 Core-Selectability in Chip-Multiprocessors. Hashem H. Najaf-abadi, Niket K. Choudhary, Eric Rotenberg

2 Dividing the Design: A definition. Processing Cores; All levels of cache; Interconnect; Ports to Memory and IO

3 What this Talk is About: How to improve the performance of a CMP by improving the processing. The interconnect is not fully utilized by all workloads (if it is, there’s nothing to gain here). The gain comes from enabling exploitation of the full potential of the interconnect.

4 The Provisioning Factor: Balance in provisioned resources. [Diagram label: need ports to the interconnect] If the same interconnect is enough for a quad-core, then it was over-provisioned for a dual-core.

5 The Provisioning Factor: Balance in provisioned resources. If the design is well provisioned with the same interconnect, then it must have been over-provisioned in the baseline. [Diagram label: some technique that boosts general performance]

6 The Underutilization Factor: The interconnect is not fully utilized by all applications. Workloads that depend the most on the interconnect have a louder say in what constitutes a well-provisioned design.

7 The One-size-fits-all Factor: A single solution has limited performance. Trade-offs: RISC v. CISC; wide v. narrow issuing; deep v. shallow pipelining; large v. small issue queue. Changing these trade-offs will improve performance for some workloads and degrade it for others. He’s not much for a conversation. But if he was, it would be a conversation about saving you execution time.

8 The Shrinking Factor: Progressively less die area for the cores; better return on increasing the interconnection resources.

9 The Shrinking Factor Progressively less die area for the cores

10 The Shrinking Factor: Progressively less die area for the cores. [Figure: chart with a percentage axis (10% to 100%) and a year axis (1990, 1995, 2000, 2005, 2010). Processors shown: Intel 8086, Intel 8088, Intel 80286, Intel386, Intel 486DX, Intel Pentium, Intel Pentium III, Intel Pentium IV, Intel Core Duo, IBM Power3, IBM Power4, IBM Power5, IBM Power6, Niagara-1, Niagara-2.]

11 The Diversity Factor: Can provide diversity in the core designs. [Figure: Program 1 and Program 2 on a single core design, optimized for all workloads]

12 The Diversity Factor: Can provide diversity in the core designs. [Figure: Code 1 and Code 2 on heterogeneous cores, optimized for workload]

13 Core-Selectability. [Figure: Program 1 and Program 2 with core-selectability, optimized for workload]

14 Selectability

15 Recap. Factors: Provisioning Factor; One-size-fits-all Factor; Shrinking Factor; Underutilization Factor; Diversity Factor. Core-Selectability; Port Sharing. Can reduce verification effort by splitting up the workload space; can improve performance without increasing power density; results in a homogeneous design.

16 Core-Selectability Remains homogeneous at a high level CMP

17 Empirical Evaluation: Based on FabScalar, a library of synthesized implementations of different configurations of the microarchitectural units of a contemporary superscalar processor.

18 The selection of cores

                 Core-U   Core-A   Core-B
FETCH STAGES     4        3        5
DECODE STAGES    1        1        1
RETIRE STAGES    2        2        2
ISSUE WIDTH      3        2        5
ROB SIZE         512      1024     512
IWINDOW SIZE     64       128      32

Clock period: 0.6ns
[Chart: normalized exec. time]
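The core parameters on slide 18 can be written down as a small data structure, which makes the idea of picking one core per program concrete. The sketch below is illustrative only: the slides do not state how a core is selected, so `select_core` assumes the core with the lowest measured execution time is chosen, and the execution times in `hypothetical_times` are made-up placeholders, not results from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Parameters of one core design, as listed on slide 18."""
    name: str
    fetch_stages: int
    decode_stages: int
    retire_stages: int
    issue_width: int
    rob_size: int
    iwindow_size: int


# Values transcribed from the slide 18 table.
CORES = [
    CoreConfig("Core-U", 4, 1, 2, 3, 512, 64),
    CoreConfig("Core-A", 3, 1, 2, 2, 1024, 128),
    CoreConfig("Core-B", 5, 1, 2, 5, 512, 32),
]


def select_core(exec_time_by_core):
    """Return the name of the core with the lowest execution time.

    Assumption: selection is by lowest measured execution time for the
    program at hand. The slides do not specify the selection policy.
    """
    return min(exec_time_by_core, key=exec_time_by_core.get)


# Hypothetical normalized execution times for one program.
# These numbers are placeholders, NOT data from the presentation.
hypothetical_times = {"Core-U": 1.00, "Core-A": 0.92, "Core-B": 1.07}

best = select_core(hypothetical_times)
print(best)
```

With real per-benchmark measurements in place of the placeholder dictionary, the same function would be called once per program.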

19 On Individual Benchmarks [Chart: normalized execution time]

20 The Effect of Selectability [Chart: normalized exec. time]

21 Under Different Task Arrival Patterns [Figure: average task turnaround time for (a) normal traffic and (b) bursty traffic]

22 Overhead of Reconfigurability

Issue-Q size   Wakeup Delay   Select Delay   Wake & Select Delay   Reconfig. Delay
16             0.55ns         0.54ns         1.09ns                1.55ns
32             0.63ns         0.59ns         1.38ns                1.89ns
64             0.67ns         0.65ns         1.62ns                2.10ns
128            0.82ns         0.76ns         2.00ns                2.30ns
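One way to read the slide 22 numbers is as a relative overhead per issue-queue size. The sketch below assumes that "Reconfig. Delay" is the delay of a reconfigurable version of the same issue queue and that "Wake & Select Delay" is the fixed-size baseline; the slide does not spell this out, so treat the comparison as an interpretation rather than the authors' stated metric.

```python
# Rows from slide 22:
# (issue queue size, wakeup, select, wake & select, reconfig.) delays in ns.
ROWS = [
    (16, 0.55, 0.54, 1.09, 1.55),
    (32, 0.63, 0.59, 1.38, 1.89),
    (64, 0.67, 0.65, 1.62, 2.10),
    (128, 0.82, 0.76, 2.00, 2.30),
]


def reconfig_overhead(wake_and_select_ns, reconfig_ns):
    """Relative extra delay of the reconfigurable design.

    Assumption: the wake & select delay is the baseline being compared
    against; this pairing is not stated explicitly on the slide.
    """
    return (reconfig_ns - wake_and_select_ns) / wake_and_select_ns


# Overhead in percent, keyed by issue queue size.
overheads = {
    size: reconfig_overhead(wake_select, reconfig) * 100.0
    for size, _wakeup, _select, wake_select, reconfig in ROWS
}

for size, percent in overheads.items():
    print(f"issue queue {size:>3}: {percent:.1f}% extra delay")
```

Under this reading the overhead is roughly 42% at 16 entries and falls to about 15% at 128 entries.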

23 Implementation of Port Sharing [Diagram labels: L1 Data Cache, core-selection, Core A, Core B, extra switching, extra wire (100fF)] 26ps added propagation delay
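To put the 26ps figure in proportion, it can be compared with a clock period. The sketch below assumes the 0.6ns clock period listed on slide 18 applies to the cores that share the port; the slides do not make that link themselves, so the result is only a rough sense of scale.

```python
ADDED_DELAY_PS = 26.0    # added propagation delay from slide 23
CLOCK_PERIOD_PS = 600.0  # assumption: the 0.6ns clock period from slide 18


def delay_fraction(added_ps, period_ps):
    """Fraction of one clock period consumed by the added delay."""
    return added_ps / period_ps


fraction = delay_fraction(ADDED_DELAY_PS, CLOCK_PERIOD_PS)
print(f"{fraction:.1%}")  # prints 4.3%
```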

24 Overhead of Reconfigurability With reconfigurability, change is implemented within a core – with complex coupling between pipeline stages. With Core-Selectability, change is implemented at the core level – with less complex coupling between core and interconnect.

25 Thank you It’s as if he knows you like to save execution time.

