Published by Doris Hicks. Modified over 9 years ago.
1
Dynamically Heterogeneous Cores Through 3D Resource Pooling
Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, Dean M. Tullsen
Speaker: Houman Homayoun, National Science Foundation CI Fellow, University of California San Diego
Copyright © 2012 Houman Homayoun
2
Why Heterogeneity?
Existing general-purpose CMP designs use only homogeneous cores.
A general-purpose, one-size-fits-all core is not necessarily the most efficient.
One processor optimized for each application!
[Figure: Core 1 and Core 2]
3
Static vs. Dynamic Heterogeneity
Prior proposals (e.g., Kumar 2003) propose static heterogeneity.
  This increases the chance of finding an appropriate core.
  It does not guarantee a perfect match.
Others have proposed solutions for dynamic heterogeneity (Core Fusion, TFlex).
  Because sharing resources at a fine granularity is difficult, they enable only coarse-grain sharing.
  The choice is between big (combined) cores and small cores.
4
Outline: Resource Pooling, Why 3D?, Design Solutions, Adaptive Policies, Results, Conclusion
5
Application Resource Utilization
6
Application Resource Utilization
[Figure: utilization of the ROB, LDSQ, RF, and IQ]
7
Application Resource Utilization: Dual-Core Machine
[Figure: utilization of the ROB, LDSQ, RF, and IQ for Application 1 and Application 2, with underutilized resources marked]
8
Dynamic Heterogeneity Through Resource Pooling
[Figure: the register files and ROBs of Core 1 and Core 2]
9
Outline: Need for Heterogeneity, Why 3D?, Design Solutions, Adaptive Policies, Results, Conclusion
10
Why Not Share in 2D?
Long wire delay makes sharing inefficient in 2D.
[Figure labels: Demanding; 500 psec; 5 nsec]
11
Our Solution: 3D
12
Our Solution: 3D
Fast interconnection network: as fast as a few picoseconds (three orders of magnitude smaller than in 2D).
Minimizes the communication latency: 5 psec in 3D versus 5000 psec in 2D.
A principal advantage: no change to the fundamental pipeline design of 2D architectures, yet the design still exploits 3D to provide greater energy proportionality and core customization.
14
Stackable Structures for Resource Pooling
Performance-bottleneck and power-hungry resources:
  Reorder buffer and register file (SRAM)
  Instruction queue and load/store queue (CAM+SRAM)
Our goal: share units across multiple cores with minimal impact on the design spec (latency, number of ports, and power).
Use a previously proposed modular design:
  Each partition is a self-standing, independently usable unit.
  The design is effective in reducing power and access delay.
[Figure: a register file split into independent partitions, Part 1 through Part 4]
15
Example of Resource Sharing
[Figure: the register file in Core 0 and the register file in Core 1, connected through a decoder, a MUX, and TSVs; one partition is marked free]
Additional logic decides whether a partition is empty.
Additional logic routes the signal to the right partition.
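The two pieces of additional logic on this slide can be sketched in software. This is only an illustrative model, not the hardware described in the talk: the class names, the 32-entry partition size, the `live_entries` counter, and the rule that a core's partitions are indexed in a fixed order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Partition:
    home_core: int            # layer (core) the partition physically sits on
    owner: Optional[int]      # core currently using it; None means free
    live_entries: int = 0     # hypothetical count of entries holding live values

    def is_empty(self) -> bool:
        # Stands in for the "is this partition empty?" logic.
        return self.live_entries == 0


class PooledRegisterFile:
    def __init__(self, num_cores: int = 2, parts_per_core: int = 4,
                 part_size: int = 32) -> None:
        self.part_size = part_size
        self.partitions: List[Partition] = [
            Partition(home_core=c, owner=c)
            for c in range(num_cores) for _ in range(parts_per_core)
        ]

    def release(self, core: int) -> bool:
        """Free one empty partition owned by `core`, if there is one."""
        for part in reversed(self.partitions):
            if part.owner == core and part.is_empty():
                part.owner = None
                return True
        return False

    def acquire(self, core: int) -> bool:
        """Hand the first free partition to `core`, if there is one."""
        for part in self.partitions:
            if part.owner is None:
                part.owner = core
                return True
        return False

    def route(self, core: int, index: int) -> Tuple[Partition, int, bool]:
        """Decode a register index into (partition, offset, crosses_layer)."""
        part_no, offset = divmod(index, self.part_size)
        owned = [p for p in self.partitions if p.owner == core]
        if index < 0 or part_no >= len(owned):
            raise IndexError("register index outside this core's partitions")
        part = owned[part_no]
        # crosses_layer is True when the access would travel over a TSV.
        return part, offset, part.home_core != core
```

For instance, after `rf.release(1)` and `rf.acquire(0)`, a call to `rf.route(0, 4 * 32 + 3)` lands in a partition whose `home_core` is 1, which is the case where the access leaves Core 0's own layer.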
17
Adaptive Policies for Resource Pooling
Several issues need to be considered:
  Ownership
  Fast releasing
  Fast reallocation
  Cycle-by-cycle adaptation
  Starvation prevention
A simple adaptive policy specification (MinMax policy):
  Set limits on how far each core's resources can grow (MAX) or shrink (MIN).
  Use a free list.
  Use central arbitration.
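A minimal sketch of how a MinMax policy with a free list and a central arbiter might behave, under stated assumptions: the slides do not give the arbitration order, so this example grants requests in fixed core order, expresses demand as a number of partitions per core, and ignores the requirement that a partition be drained before it is released. The class and method names are hypothetical.

```python
from typing import List


class MinMaxArbiter:
    """Central arbiter: per-core partition lists plus one shared free list."""

    def __init__(self, num_cores: int, parts_per_core: int,
                 min_parts: int, max_parts: int) -> None:
        if not 1 <= min_parts <= parts_per_core <= max_parts:
            raise ValueError("need 1 <= MIN <= initial size <= MAX")
        self.min_parts = min_parts
        self.max_parts = max_parts
        ids = iter(range(num_cores * parts_per_core))
        self.owned: List[List[int]] = [
            [next(ids) for _ in range(parts_per_core)] for _ in range(num_cores)
        ]
        self.free_list: List[int] = []

    def release(self, core: int) -> bool:
        # A core never shrinks below MIN, which is what prevents starvation.
        if len(self.owned[core]) <= self.min_parts:
            return False
        self.free_list.append(self.owned[core].pop())
        return True

    def request(self, core: int) -> bool:
        # A core never grows beyond MAX, and only takes what is on the free list.
        if len(self.owned[core]) >= self.max_parts or not self.free_list:
            return False
        self.owned[core].append(self.free_list.pop())
        return True

    def step(self, demand: List[int]) -> List[int]:
        """One arbitration cycle: release surplus first, then grant requests."""
        for core, want in enumerate(demand):
            while len(self.owned[core]) > want and self.release(core):
                pass
        for core, want in enumerate(demand):  # assumed fixed-priority order
            while len(self.owned[core]) < want and self.request(core):
                pass
        return [len(parts) for parts in self.owned]
```

With four cores of four partitions each, MIN = 2 and MAX = 10, a cycle in which only the first core has demand leaves it holding 10 partitions while the other three each keep their MIN of 2.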
18
MinMax Policy Example
[Figure: an arbitration unit and a free list manage the register file partitions of Core 1 through Core 4, which run Application 1 through Application 4; the MIN limit is marked]
23
Baseline Architecture
Processor model:
  High-end architecture: four OoO cores with an issue width of 4
  Medium-end architecture: four OoO cores with an issue width of 2
3D floorplans (different performance, flexibility, and temperature tradeoffs):
  (1) Conventional (thermal-optimized design)
  (2) Proposed (performance-optimized design)
[Figure: floorplans (1) and (2)]
24
Evaluation
Workloads: 1 thread, 2 threads, and 4 threads.
Metrics: power, performance, temperature, and energy-delay.
[Figure: Core 1 through Core 4 and the links between them, with active and idle cores marked]
25
Single-Thread Performance
Standard SPEC2K and SPEC2006 benchmarks.
Single benchmark (3 out of 4 cores are idle).
[Figure axis: Speed Up]
26
Multi-Thread Performance
2Thr: 2 idle cores plus underutilized resources in the active cores.
4Thr: no idle cores, only underutilized resources.
Gains are dramatic when some cores are idle.
[Figure axis: Normalized Weighted Speedup (%)]
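The slides do not define the metric on the axis. The sketch below assumes the commonly used definition of weighted speedup (the sum, over threads, of each thread's IPC when co-running divided by its IPC when running alone) and assumes that "normalized" means the percentage gain over a baseline configuration. The function names and the IPC values in the usage note are made up for illustration.

```python
from typing import Sequence


def weighted_speedup(ipc_shared: Sequence[float],
                     ipc_alone: Sequence[float]) -> float:
    """Sum of per-thread IPC ratios: co-running IPC over stand-alone IPC."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one stand-alone IPC per thread")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def normalized_gain_percent(ws_pooled: float, ws_baseline: float) -> float:
    """Percentage gain of the pooled design over the baseline (assumed normalization)."""
    return 100.0 * (ws_pooled / ws_baseline - 1.0)
```

As a usage example with invented numbers: if two threads have stand-alone IPCs of 2.0 and 1.0, a baseline run at IPCs of 1.0 and 0.5 gives a weighted speedup of 1.0, a pooled run at 1.5 and 0.5 gives 1.25, and `normalized_gain_percent(1.25, 1.0)` reports a 25% gain.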
27
Medium-End vs. High-End
Resource pooling makes the medium-end core significantly more competitive with the high-end core.
[Figure: Normalized Weighted Speedup (%) with 0, 2, and 3 idle cores; labels: 28%, 14%, only 3%!; resource sharing increases with the number of idle cores]
28
Power
Pooling pays a small price in power because of the enhanced throughput.
There are large speedups on low-IPC threads and a high average speedup, but a smaller increase in total instruction throughput and thus a smaller increase in power.
[Figure: power (Watt); labels: 3X, 4X]
29
Temperature
Interestingly, the temperature of the medium-end resource-pooling core is comparable to that of the high-end core.
[Figure axis: temperature (Celsius)]
30
Efficiency
Even at equal temperature, the more modest cores have a significant advantage in energy efficiency, measured in MIPS²/W (MIPS²/W is the inverse of the energy-delay product).
[Figure labels: Normalized; 2X]
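The relation between MIPS²/W and the energy-delay product can be checked with a short derivation. The symbols are introduced here for illustration and are not from the slides: a program of N instructions runs for t seconds at an average power of P watts. Under those assumptions the two quantities are inversely proportional for a fixed instruction count:

```latex
\begin{align*}
E &= P\,t, \\
\mathrm{EDP} &= E\,t = P\,t^{2}, \\
\mathrm{MIPS} &= \frac{N}{10^{6}\,t}, \\
\frac{\mathrm{MIPS}^{2}}{\mathrm{W}} &= \frac{N^{2}}{10^{12}\,t^{2}\,P}
  = \frac{N^{2}}{10^{12}} \cdot \frac{1}{\mathrm{EDP}}.
\end{align*}
```

So when the same workload is compared across designs, a 2X higher MIPS²/W corresponds to half the energy-delay product.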
31
Conclusions
Homogeneous cores are inherently inefficient for a diverse workload; cores are typically overprovisioned as a result.
3D stacking of cores enables fine-grain sharing (pooling) of resources that is not possible in 2D designs.
Our dynamically heterogeneous 3D architecture allows the processor to construct the right core for each application dynamically, maximizing energy efficiency.
Our 3D pooling architecture:
  Leverages our experience in 2D pipeline design, yet still gains significant benefit from 3D.
  Adapts to the specific demands of an application within a few cycles.
  Reduces reliance on overprovisioned cores, instead grabbing larger resources only when needed.