ECE 510
Brendan Crowley
Paper Review
October 31, 2006
“Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures”
Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen
Presentation Overview
- Introduction
- The Architecture
- Modeling the Architecture
- Results
- Critical Analysis / Conclusion
Introduction
Background
- Processors continue to gain speed and transistor count as transistor sizes decrease
- This leads to increased power consumption, which causes problems:
  - Heat dissipation
  - Chip failure
  - Reduced battery life
- Designers are always searching for new ways to decrease power consumption
Introduction (2)
- Most work on reducing power consumption falls into one of two categories:
  - Voltage and frequency scaling
  - “Gating” – the ability to turn portions of the core on and off
- Some designs have used multiple identical (homogeneous) cores
- Others have paired processors with co-processors that run a different instruction set
Introduction (3)
The Main Idea
- Different software applications have different resource requirements
- This leads the authors to believe that core diversity is of greater value than uniformity
- The proposed design is therefore a single-ISA heterogeneous multi-core architecture
- Each core runs the same instruction set, but has different abilities and performance characteristics
The Architecture
- One approach is to take a family of previously designed cores, modify their interfaces, and combine them on one die
- Each core executes the same instruction set but contains different resources, and therefore achieves different performance and energy efficiency on the same application
The Architecture (2)
- The operating system determines the application's requirements and decides which core is best to use (i.e., which core will be the most energy efficient); see the sketch below
- To accommodate a wide variety of applications, the cores should cover a wide range of performance
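To make the selection step concrete, here is a minimal C sketch of one way an OS scheduler could rank cores. The `core_profile` fields and the `pick_core` heuristic are illustrative assumptions, not the paper's actual mechanism; the per-core IPS and power estimates would have to come from sampling or a model.

```c
#include <stddef.h>

/* Hypothetical per-core profile for the currently running application.
 * The numbers are assumed to come from runtime sampling or a power model;
 * nothing here is specified by the paper. */
struct core_profile {
    const char *name;   /* e.g. "EV8-", "EV6", "R4700" */
    double est_ips;     /* estimated instructions per second on this core */
    double est_watts;   /* estimated power draw on this core */
};

/* Pick the core with the best estimated IPS^2/W (higher is better),
 * i.e. the lowest estimated energy-delay product. A real policy would
 * also enforce a performance floor relative to the fastest core. */
static size_t pick_core(const struct core_profile *cores, size_t n)
{
    size_t best = 0;
    double best_metric = -1.0;

    for (size_t i = 0; i < n; i++) {
        double m = cores[i].est_ips * cores[i].est_ips / cores[i].est_watts;
        if (m > best_metric) {
            best_metric = m;
            best = i;
        }
    }
    return best;
}
```

In practice the estimates would be refreshed periodically, and the policy would also honor a performance constraint like the 50% floor described in the results section.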
The Architecture (3)
The authors chose a 5-core design, using existing cores with a few changes:
- A hypothetical single-threaded version of the EV8 (Alpha 21464), which they call the “EV8-”
- MIPS R4700
- EV4 (Alpha 21064)
- EV5 (Alpha 21164)
- EV6 (Alpha 21264)
The Architecture (4)
Assumptions
- Each core has a private L1 data and instruction cache
- All cores share an L2 cache, phase-locked-loop circuitry, and pins
- Implemented in 0.10 micron technology
- One application running at a time (one thread running)
The Architecture (5)
Relative core sizes
The Architecture (6)
- Different parts of a program may require different resources
- To take full advantage of the core diversity, it is necessary to switch between cores in the middle of program execution
- This is done at operating system timeslice intervals, when user state has already been saved to memory
- If the OS decides to switch cores, the data is saved to the shared L2 cache, where the next core can retrieve it (see the sketch below)
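A minimal sketch of the switch sequence just described, written as C pseudocode; the hardware/firmware hooks (`flush_pipeline`, `writeback_dirty_l1`, `power_up`, `power_down`, `resume_thread_on`) are placeholder names, not real APIs from the paper or any operating system.

```c
/* Placeholder hardware/firmware hooks -- illustrative names only. */
void flush_pipeline(int core);
void writeback_dirty_l1(int core);   /* dirty L1 lines go to the shared L2 */
void power_up(int core);
void power_down(int core);
void resume_thread_on(int core);

/* Sketch of a core switch at an OS timeslice boundary. User-visible register
 * state is assumed to already be in memory via the normal timer-interrupt
 * path, so only the cache and power steps are shown. */
void switch_core(int from_core, int to_core)
{
    flush_pipeline(from_core);      /* drain in-flight instructions         */
    writeback_dirty_l1(from_core);  /* make the thread's data visible in L2 */
    power_up(to_core);              /* incoming core starts with cold L1s   */
    resume_thread_on(to_core);      /* next timeslice runs on the new core  */
    power_down(from_core);          /* avoid leakage and switching power    */
}
```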
The Architecture (7)
- The authors assume the unused cores are powered down, avoiding both static leakage and dynamic switching power
- This means time must be spent powering cores back up
- Experimental results show that this doesn't affect performance when core-switching is done at OS timer intervals, even with pessimistic assumptions about power-up time and software overhead
Modeling the Architecture
- Data on the EV8 was based on predictions and reported data
- Data on the other cores was taken from published literature
- All Alpha cores are assumed to run at 2.1 GHz (given the assumed 0.10 micron process), and the R4700 at 1 GHz
Modeling the Architecture (2)
All architectures were modeled as accurately as possible on a highly detailed instruction-level simulator, using the configurations in the table below
Modeling the Architecture (3)
- The table below shows the area and peak power statistics of the cores
- Areas were taken from die photos
- Total die area is approximately 400 mm²
Modeling the Architecture (4)
- Benchmark execution was simulated using SMTSIM
- The simulator was modified to model a multi-core processor with a shared L2 cache
- A single thread is assumed to run on one core at a time
- Switching cores requires flushing the active core's pipeline and writing its dirty L1 cache lines back to the L2 cache (a rough cost estimate follows below)
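A back-of-the-envelope check of why this overhead is small at timer granularity; every constant below is a made-up placeholder rather than a value reported in the paper.

```c
#include <stdio.h>

int main(void)
{
    /* All constants are illustrative placeholders, not figures from the paper. */
    const double clock_hz        = 2.1e9;  /* assumed Alpha core clock          */
    const double flush_cycles    = 100.0;  /* drain the pipeline                */
    const double dirty_lines     = 512.0;  /* dirty L1 lines to write back      */
    const double cycles_per_line = 20.0;   /* L1 -> shared L2 writeback latency */
    const double timeslice_s     = 0.010;  /* 10 ms OS timeslice                */

    double switch_s = (flush_cycles + dirty_lines * cycles_per_line) / clock_hz;
    printf("switch cost: %.2f us (%.4f%% of one timeslice)\n",
           switch_s * 1e6, 100.0 * switch_s / timeslice_s);
    return 0;
}
```

With placeholder numbers like these, the switch costs only a few microseconds per multi-millisecond timeslice; even adding a generous core power-up delay leaves the overhead a tiny fraction, which matches the authors' observation that switching at OS timer intervals does not hurt performance.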
Results
- The following figure shows results for the SPEC application applu
- The Y-axis metric, IPS²/W, is essentially the inverse of the energy-delay product (see the derivation below)
- Constraint: never choose a core that sacrifices more than 50% performance relative to the EV8- over an interval
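A short derivation of why IPS²/W is (up to a constant) the inverse of the energy-delay product, for a program of fixed instruction count N running at power W:

```latex
\[
\text{delay} = \frac{N}{\mathrm{IPS}}, \qquad
\text{energy} = W \cdot \text{delay} = \frac{W\,N}{\mathrm{IPS}}
\]
\[
\mathrm{EDP} = \text{energy} \times \text{delay} = \frac{W\,N^{2}}{\mathrm{IPS}^{2}}
\quad\Longrightarrow\quad
\frac{\mathrm{IPS}^{2}}{W} = \frac{N^{2}}{\mathrm{EDP}} \;\propto\; \frac{1}{\mathrm{EDP}}
\]
```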
Results (2)
Results (3)
- Compared to a single-core architecture, this design could ideally reduce the energy-delay product by 74% (see the arithmetic check below)
  - A combination of 25% performance loss and 81% energy savings
- The constraint could be changed to achieve greater energy-delay savings (sacrificing performance, of course)
- Another design point gives 36% energy savings with only 4% performance loss
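Checking the 74% figure against its stated components: 81% energy savings means energy drops to 0.19 of baseline, and 25% performance loss means delay grows by a factor of 1/0.75, so

```latex
\[
\frac{\mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{old}}}
  = \frac{E_{\text{new}}}{E_{\text{old}}} \times \frac{D_{\text{new}}}{D_{\text{old}}}
  = 0.19 \times \frac{1}{0.75} \approx 0.25
\]
```

i.e. roughly a 74–75% reduction, consistent with the figure above.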
Results (4)
- Other metrics besides the energy-delay product could be optimized, depending on the design goals (a few common choices are listed below)
- Different power and performance tradeoffs can be made simply by changing the core switching algorithm (no need to change the hardware)
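For reference, the usual efficiency metrics such a policy could target all take an IPS-per-watt form (for a fixed instruction count); which of these beyond the energy-delay product the authors actually evaluate is not stated on these slides:

```latex
\[
\frac{1}{\text{energy}} \;\propto\; \frac{\mathrm{IPS}}{W}, \qquad
\frac{1}{\text{energy}\times\text{delay}} \;\propto\; \frac{\mathrm{IPS}^{2}}{W}, \qquad
\frac{1}{\text{energy}\times\text{delay}^{2}} \;\propto\; \frac{\mathrm{IPS}^{3}}{W}
\]
```

Maximizing a higher power of IPS weights performance more heavily, so swapping the objective in the switching policy shifts the power/performance tradeoff without any hardware change.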
Critical Analysis / Conclusion
- Many assumptions are made about things like frequency scaling, power consumption of the cores, etc.
- The paper only reports results for one benchmark application
- Multiple cores/threads running at the same time would likely be used in practice
  - How would this affect core switching complexity and latency?
Critical Analysis / Conclusion (2)
- This technique seems very promising
- Homogeneous multi-core chips are already on the market
- There is potential for significant energy savings