© 2012 IBM Corporation Barcelona Supercomputing Center MICRO 2012 Tuesday, December 4, 2012 Systematic Energy Characterization of CMP/SMT Processor Systems.

1 Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks
R. Bertran*+, A. Buyuktosunoglu*, M. Gupta*, M. Gonzalez+, P. Bose*
*IBM T.J. Watson Research Center, +Barcelona Supercomputing Center
MICRO 2012, Tuesday, December 4, 2012

2 Why do we need micro-benchmarks?
What is the maximum power consumption? Any performance bug? Any reliability issues? …
Micro-benchmarks!
- Time consuming and tedious
  - Error-prone task, trial-and-error process
  - Several micro-benchmarks are required
- Deep expertise limited to few designers
  - Detailed knowledge of the underlying architecture is required
AUTOMATED SOLUTION NEEDED!

3 MicroProbe: a micro-benchmark generation framework

4 MicroProbe Workflow
[Diagram: User -> MicroProbe Framework -> micro-benchmarks -> external tools]
Inputs:
- Micro-benchmark generation policy, for example:
  - Endless loop, 50% INT, 50% FP
  - Endless loop for each instruction of the ISA
  - Max power stressmark
- Architecture definition files
Outputs:
- Micro-benchmarks
External tools: real platforms, simulators, models

5 MicroProbe: Distinguishing Features
Features of MicroProbe, with their status in previous works in parentheses:
- ISA queries (previous works: manual)
  - Instruction type
  - Operand length, binary codification, etc.
- Micro-architecture queries (previous works: manual)
  - Functional unit, latency, throughput, energy per instruction, average instruction power, etc.
- Micro-architecture models (previous works: no)
  - Set-associative cache model
- Code generation (previous works: no)
  - Skeleton and instruction definition passes, memory modeling pass, branch modeling pass, ILP definition pass
  - Configurable passes
- Design space exploration
  - Integrated (previous works: no)
  - GA-based search
  - Exhaustive search (previous works: manual)
  - Customizable search (previous works: manual)

6 MicroProbe Usage and Design Overview
Research idea -> micro-benchmark generation policies (user-defined scripts), for example:
- Loop stressing the floating point unit
- Sequence of loads hitting 50% L1 and 50% L2
- Generate a stressmark for each functional unit of the architecture
- Search for the sequence of 2 loads and 2 integer operations with maximum IPC
MicroProbe Framework (Python API):
- Architecture module: ISA definitions, micro-architecture definitions, micro-architecture analytical models, properties (automatic bootstrap process)
- Code generation module: micro-benchmark synthesizer, passes
- Design space exploration module: search drivers
Output: micro-benchmark, used with external tools
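The policy "sequence of loads hitting 50% L1 and 50% L2" depends on a set-associative cache model. The following is a minimal sketch of that idea, not MicroProbe's actual API: `SetAssocCache` and `half_hit_stream` are hypothetical names, LRU replacement is assumed, and the 8-way, 128-byte-line geometry is an assumption (only the 32KB L1 size comes from the slides). L1 misses are assumed to be served by L2 because the working set is only a few lines.

```python
from collections import OrderedDict


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, size_bytes, ways, line_bytes):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered dict per set; insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (and fill the line)."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used
        cache_set[line] = None
        return False


def half_hit_stream(cache, length):
    """Alternate a resident line (set 0) with lines thrashing set 1."""
    same_set_stride = cache.num_sets * cache.line_bytes
    resident = 0
    # ways + 1 lines mapping to one set always miss under LRU.
    thrash = [cache.line_bytes + i * same_set_stride
              for i in range(cache.ways + 1)]
    stream = []
    for i in range(length // 2):
        stream.append(resident)
        stream.append(thrash[i % len(thrash)])
    return stream


l1 = SetAssocCache(size_bytes=32 * 1024, ways=8, line_bytes=128)
addresses = half_hit_stream(l1, 1000)
hits = sum(l1.access(a) for a in addresses)
```

Apart from the single cold miss on the resident line, every other access in this stream is an L1 hit, so the hit ratio approaches 50% as the stream grows.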

7 Max-power Stressmark Generation
Use MicroProbe to generate a max-power stressmark:
- Characterize energy per instruction (EPI) and IPC (Architecture Module)
- Select N instructions with max (IPC * EPI)
- Form a basic endless loop (e.g. 4K) using selected instructions (Code Generation Module)
- Generate micro-benchmarks with different orders of the selected N instructions
- Evaluate using Design Space Exploration Module
- Pick the highest power micro-benchmark
Example loops:
  Loop: … mulldo lxvw4x xvnmsubmdp … mulldo xvnmsubmdp lxvw4x
  Loop: … mulldo lxvw4x mulldo xvnmsubmdp lxvw4x xvnmsubmdp …
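The selection and search steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MicroProbe code: the function names are hypothetical, the instruction names and EPI/IPC values are placeholders rather than measurements, only plain permutations are explored (the slide's loops also repeat instructions), and `toy_power` stands in for running each candidate loop on real hardware.

```python
from itertools import permutations


def select_instructions(profile, n):
    """Pick the n instructions with the largest EPI * IPC product.

    profile maps an instruction name to an (epi, ipc) tuple.
    """
    ranked = sorted(profile,
                    key=lambda name: profile[name][0] * profile[name][1],
                    reverse=True)
    return ranked[:n]


def candidate_loops(instructions):
    """Enumerate every ordering of the selected instructions."""
    return [list(order) for order in permutations(instructions)]


def pick_max_power(loops, measure_power):
    """Return the loop body with the highest measured power."""
    return max(loops, key=measure_power)


# Placeholder characterization data (invented values, not measurements).
profile = {
    "int_mul": (2.0, 1.0),
    "vec_fma": (3.0, 1.0),
    "vec_load": (2.5, 1.0),
    "nop": (0.5, 2.0),
    "add": (0.8, 2.0),
}


def toy_power(loop):
    """Stand-in for a hardware measurement; order-sensitive on purpose."""
    return sum((pos + 1) * profile[name][0] * profile[name][1]
               for pos, name in enumerate(loop))


selected = select_instructions(profile, 3)
loops = candidate_loops(selected)
best = pick_max_power(loops, toy_power)
```

In a real flow, the measurement function would build the endless loop, run it on the target platform, and return the sampled power, which is the step the design space exploration module automates.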

8 CASE STUDIES
MicroProbe: A Micro-benchmark Generation Framework

9 Experimental Methodology
- Platform:
  - Processor: POWER7 @ 3GHz, 8-core, 4-way SMT; 32KB L1, 256KB L2 and 4MB L3 per core
  - Memory: 32 GB DDR3 SDRAM @ 800MHz
  - OS: RHEL 5.7 + Linux 3.0.1
  - EnergyScale architecture: power measurements in milliwatts, sampling interval as short as 1 ms
- In-house software collects power and performance counter traces [C. Lefurgy et al, IBM]

10 Case Study 1: EPI Characterization
- Large differences in EPI across instructions stressing different micro-architecture components
- Large differences in EPI across instructions stressing the same micro-architecture components at the same rate (IPC)
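The slides do not say how EPI is derived from the measurements. One common way, shown here as a sketch under assumptions, is to divide the power attributed to a single-instruction loop by its instruction rate (IPC times clock frequency). The function name and the input values are hypothetical, and whether idle power is subtracted first is left to the caller.

```python
def energy_per_instruction_nj(power_watts, ipc, freq_hz):
    """Return energy per instruction in nanojoules.

    power_watts: power attributed to the micro-benchmark loop
                 (assumption: the caller decides whether idle power
                 has already been subtracted).
    ipc:         measured instructions per cycle.
    freq_hz:     core clock frequency in hertz.
    """
    if ipc <= 0 or freq_hz <= 0:
        raise ValueError("ipc and freq_hz must be positive")
    instructions_per_second = ipc * freq_hz
    joules_per_instruction = power_watts / instructions_per_second
    return joules_per_instruction * 1e9


# Placeholder inputs: 6 W attributed to the loop, IPC of 2, 3 GHz clock.
example_epi = energy_per_instruction_nj(6.0, 2.0, 3.0e9)
```

Under this definition, two instructions that run on the same unit at the same IPC but draw different power necessarily have different EPI, which is the second observation on the slide.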

11 Case Study 2: Max-power Stressmark Generation
Approaches compared:
- Use a computationally intensive kernel (DAXPY)
- Use complex instructions accessing different functional units with high IPC (expert manual)
- Generate all possible combinations of complex instructions stressing different units (expert DSE)
- Use MicroProbe
Selected instructions: mullw, xvmaddadp, lxvd2x
  Loop: … mullw xvmaddadp lxvd2x …
  Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x …
  Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x xvmaddadp …
MicroProbe heuristic: Max(EPI * IPC)
Selected instructions: mulldo, xvnmsubmdp, lxvw4x (MicroProbe loops)

12 Max-power Stressmark Generation

13 Case Study 3: Counter-based Processor Power Model
Bottom-up power modeling method (steps 1, 2, 3)
Training micro-benchmark sets:
- Functional-unit micro-benchmarks, CMP1-SMT1
- Random micro-benchmarks, CMP1-SMT1
- Random micro-benchmarks, CMP1-SMT2/4
- Random micro-benchmarks, CMP1/8-SMT2/4
Derived components:
- Dynamic power, f(PMCs)
- Intercept SMT1 and intercept SMT2-4, giving the SMT effect
- Linear regression f(CMP), giving the CMP effect and the uncore power
Model: dynamic power f(PMCs), SMT effect (SMT enabled), CMP effect (# cores), uncore power
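To make the decomposition concrete, here is a minimal sketch that evaluates such a model once its coefficients are known. It assumes the four components combine additively per active core, which the slide suggests but does not spell out. The function name, the counter names `fxu` and `vsu`, and all coefficient values are hypothetical placeholders.

```python
def predict_power(uncore_w, cmp_w_per_core, smt_w_per_core,
                  weights, core_rates, smt_enabled):
    """Evaluate a bottom-up, counter-based power model (in watts).

    uncore_w:        constant uncore power.
    cmp_w_per_core:  CMP effect added for each active core.
    smt_w_per_core:  SMT effect added per core when SMT is enabled.
    weights:         counter name -> watts per unit of event rate.
    core_rates:      one dict per active core, counter name -> event rate.
    smt_enabled:     True when cores run in SMT2/SMT4 mode.
    """
    total = uncore_w
    for rates in core_rates:
        total += cmp_w_per_core
        if smt_enabled:
            total += smt_w_per_core
        # Dynamic power: linear function of the performance counters.
        total += sum(weights[name] * rate for name, rate in rates.items())
    return total


# Placeholder coefficients and counter rates (invented values).
weights = {"fxu": 4.0, "vsu": 6.0}
two_cores = [{"fxu": 0.5, "vsu": 0.25}, {"fxu": 0.5, "vsu": 0.25}]
with_smt = predict_power(10.0, 2.0, 0.5, weights, two_cores, True)
without_smt = predict_power(10.0, 2.0, 0.5, weights, two_cores, False)
```

Keeping the uncore, CMP, SMT, and dynamic terms separate is what allows each one to be fitted from a different training set, as the bottom-up method describes.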

14 Counter-based Processor Power Model Validation
- Within acceptable error margins: < 4% on average

15 Counter-based Processor Power Model Validation on Corner Cases
- Models trained using non-micro-architecture-aware training sets show high errors and variability
- Models trained using the micro-architecture-aware training set show acceptable error margins: < 5% on average

16 Conclusions
- MicroProbe is a productive micro-benchmark generation framework
  - Adaptive and flexible
  - Includes micro-architecture semantics
  - Integrates design space exploration
- Presented three case studies:
  - Instruction-based EPI characterization
  - Automated max-power stressmark generation
  - CMP/SMT-aware bottom-up counter-based processor power model

17 QUESTIONS?
MicroProbe: A Micro-benchmark Generation Framework

