
Fast Energy Evaluation of Embedded Applications for Many-core Systems Felipe Rosa, Luciano Ost, Thiago Raupp, Fernando Moraes, Ricardo Reis.


1 Fast Energy Evaluation of Embedded Applications for Many-core Systems Felipe Rosa, Luciano Ost, Thiago Raupp, Fernando Moraes, Ricardo Reis.

2 Outline
1. Introduction
2. Open Virtual Platforms
3. Proposed Energy Model
4. Exploration in Large Scale Systems
5. Conclusion and Future Work

3 1. Introduction
• High-performance many-core systems are a reality: up to 256 cores are available today.
• What about software development?

4 1. Introduction
• Software development challenges include:
  - Defining inter-processor communication protocol stacks
  - OS porting and analysis
  - Porting parallel programming models
  - Driver development
• Software development costs are increasing (source: IBS 2013).
• Simulation is the key tool for many-core research.

5
• Full-system simulators are one good option: virtual platforms that emulate hardware behaviour, making software believe that it is running on real physical hardware.
• They support concurrent HW and SW development, improving time-to-market.
• Examples of such simulators:

| Simulator | Accuracy | Supported processor architectures | License | Active support |
|---|---|---|---|---|
| Simics | Functionally accurate | Alpha, ARM, MIPS, PowerPC, SPARC, and x86 | Private | Yes |
| PTLsim | Cycle-accurate | x86 | Open | No |
| SimpleScalar | Cycle-accurate | Alpha, ARM, PowerPC, and x86 | Open | No |
| GEM5 | Cycle-accurate | Alpha, ARM, MIPS, PowerPC, SPARC, and x86 | Open | Yes |
| QEMU | Instruction-accurate | ARM, MicroBlaze, MIPS, PowerPC, SPARC, x86, and others | Open | Yes |
| OVPsim | Instruction-accurate | Alpha, ARC, ARM, MIPS, PowerPC, MicroBlaze, and others | Open and Private | Yes |

• Why the OVP simulator?

6 2. Open Virtual Platform - OVP
• Large number of supported processor architectures (ISAs), e.g., MIPS, ARM, x86, PowerPC
• Simulation speeds of up to 2,200 MIPS, relying on just-in-time (JIT) dynamic binary translation
• Complete development environment with APIs
• Open-source license
• Extensive documentation and an active forum
• Powerful debug environment
• Limitation: OVPsim is instruction-accurate only, which results in inaccurate software performance estimation
• Contribution: a fast and accurate energy model, proposed and integrated into OVPsim

7 2. Open Virtual Platform - OVP
[Figure from Davidmann and Graham 2014, annotated with the location of the proposed model (callback)]
• The software stack is separated from the simulator.
• Target instructions are translated to host-machine binary code.

8 3. Energy Model
• Characteristics:
  - Instruction-driven energy model
  - Developed on top of the OVP APIs
  - Run-time-based approach
  - ISA-based approach
• Advantages:
  - Avoids large memory usage
  - No trace files are required
  - The model is transparent to the software engineer: no pre- or post-processing of the application/software is required
  - The approach can be applied to other processor architectures
• The model is called Watchdog.

9 3. Energy Model
• Instruction energy-cost information is not available, so a characterization phase is needed:
  - Instructions are organized into groups according to the similarity of their energy cost
  - This reduces complexity/computation in both the characterization and simulation phases
• Reference CPU: PLASMA core
  - MIPS architecture, 3-stage pipeline, 100 MHz
  - 65 nm low-power library from STMicroelectronics
• Static and dynamic energy obtained with Cadence tools

10 3. Energy Model - Watchdog
1. The parser module disassembles the binary code and identifies the instruction to be executed.
2. The identified instruction is used as a hash-table key to determine which class it belongs to.
3. The energy cost is computed and the instruction is executed on the CPU.
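The three Watchdog steps can be sketched in Python as a run-time accumulator. This is a minimal illustration, not the authors' implementation: the class energies are the per-instruction values from the characterization table in this deck, while the mnemonic-to-class mapping, the `Watchdog` class and its `on_instruction` method are hypothetical. In OVPsim the method would be driven by a per-instruction callback, which is not modelled here.

```python
from dataclasses import dataclass, field

# Per-instruction energy (nJ) of each class, from the PLASMA characterization table.
CLASS_ENERGY_NJ = {
    "Arithmetic": 0.0636528098,
    "Jump": 0.0606728873,
    "Load-Store": 0.0879146476,
    "Logical": 0.0440743815,
    "Move": 0.0382132593,
    "NOP": 0.0210703733,
    "Shift": 0.0376247494,
}

# HYPOTHETICAL mnemonic-to-class mapping for a few MIPS instructions;
# the grouping actually used by Watchdog is not listed in the slides.
INSTRUCTION_CLASS = {
    "addu": "Arithmetic", "addiu": "Arithmetic", "subu": "Arithmetic",
    "j": "Jump", "jal": "Jump", "jr": "Jump", "beq": "Jump", "bne": "Jump",
    "lw": "Load-Store", "sw": "Load-Store", "lb": "Load-Store", "sb": "Load-Store",
    "and": "Logical", "or": "Logical", "xor": "Logical", "andi": "Logical",
    "mfhi": "Move", "mflo": "Move",
    "nop": "NOP",
    "sll": "Shift", "srl": "Shift", "sra": "Shift",
}


@dataclass
class Watchdog:
    """Accumulates energy at run time, one instruction at a time (no trace file)."""
    energy_nj: float = 0.0
    counts: dict = field(default_factory=dict)

    def on_instruction(self, disassembly: str) -> None:
        # Step 1: take the mnemonic from the disassembled instruction text.
        mnemonic = disassembly.split()[0].lower()
        # Step 2: hash-table lookup of the instruction class
        # (unknown mnemonics raise KeyError in this sketch).
        cls = INSTRUCTION_CLASS[mnemonic]
        # Step 3: add the class energy cost; the simulator then executes the instruction.
        self.energy_nj += CLASS_ENERGY_NJ[cls]
        self.counts[cls] = self.counts.get(cls, 0) + 1


if __name__ == "__main__":
    wd = Watchdog()
    for insn in ["addiu $sp,$sp,-8", "sw $ra,4($sp)", "jal 0x400100", "nop"]:
        wd.on_instruction(insn)
    print(f"{wd.energy_nj:.4f} nJ", wd.counts)
```

Because the cost is accumulated as each instruction is seen, memory use stays constant regardless of program length, which matches the "no trace files" advantage listed for the model.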

11 3. Energy Model – Characterization Flow
Energy model creation:
1. Benchmark conception
2. Activity measurement
3. Power acquisition
4. Energy calculation

| Group | Power (mW) | Exec. time (µs) | Energy (nJ) | # of instructions | Energy per instruction (nJ) |
|---|---|---|---|---|---|
| Arithmetic | 6.456 | 342.755 | 2212.826 | 34764 | 0.0636528098 |
| Jump | 6.046 | 102.600 | 620.320 | 10224 | 0.0606728873 |
| Load-Store | 4.094 | 1042.800 | 4269.223 | 48561 | 0.0879146476 |
| Logical | 4.469 | 349.735 | 1562.966 | 35462 | 0.0440743815 |
| Move | 3.129 | 480.725 | 1504.189 | 39363 | 0.0382132593 |
| NOP | 2.141 | 257.155 | 550.569 | 26130 | 0.0210703733 |
| Shift | 3.824 | 298.735 | 1142.363 | 30362 | 0.0376247494 |

• What about accuracy?
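The energy column of the table follows from E = P × t, and the last column from dividing that energy by the instruction count. The sketch below recomputes both from the measured power, time and instruction count; the function name is illustrative, and the check is only that the arithmetic reproduces the reported values.

```python
# Characterization measurements for the PLASMA reference CPU, as reported on the slide:
# (group, average power in mW, execution time in us, instructions executed)
MEASUREMENTS = [
    ("Arithmetic", 6.456, 342.755, 34764),
    ("Jump", 6.046, 102.600, 10224),
    ("Load-Store", 4.094, 1042.800, 48561),
    ("Logical", 4.469, 349.735, 35462),
    ("Move", 3.129, 480.725, 39363),
    ("NOP", 2.141, 257.155, 26130),
    ("Shift", 3.824, 298.735, 30362),
]


def energy_per_instruction(power_mw: float, time_us: float, n_instructions: int):
    """Return (total energy in nJ, energy per instruction in nJ).

    mW * us = 1e-3 W * 1e-6 s = 1e-9 J, so the product is already in nJ.
    """
    energy_nj = power_mw * time_us
    return energy_nj, energy_nj / n_instructions


if __name__ == "__main__":
    for group, p, t, n in MEASUREMENTS:
        e, epi = energy_per_instruction(p, t, n)
        print(f"{group:<10} {e:9.3f} nJ  {epi:.10f} nJ/inst")
```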

12 3. Energy Model – Experimental Setup
• Benchmarks: 19 applications from different research domains, from Mälardalen WCET and other benchmarks created in-house
• Model estimates compared with a gate-level implementation (PLASMA)

| # | Name | Suite |
|---|---|---|
| A | BFSH | In-house production |
| B | BinarySearch | Mälardalen WCET |
| C | BitManipulation | Mälardalen WCET |
| D | Bubble | Mälardalen WCET |
| E | Counts | Mälardalen WCET |
| F | Crc | In-house production |
| G | Edn | Mälardalen WCET |
| H | Expint | Mälardalen WCET |
| I | Factorial | In-house production |
| J | Fft | Mälardalen WCET |
| K | Fib | In-house production |
| L | Hanoi | In-house production |
| M | Harm | In-house production |
| N | InsertSort | Mälardalen WCET |
| O | MatrixInver | Mälardalen WCET |
| P | Mdc | In-house production |
| Q | Peakspeed | Imperas |
| R | Ud | Mälardalen WCET |
| S | Usqrt | Mälardalen WCET |

13 3. Energy Model – Accuracy Evaluation
• Benchmarks: 19 applications from different research domains, from Mälardalen WCET and other benchmarks created in-house
• Model estimates compared with gate-level simulation
• Mismatch is below 6% in 15 of the 19 benchmarks
[Figure: energy mismatch per benchmark; annotated values 0.01 and 8.56]
• What about the speedup gain?
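The slide does not spell out how the mismatch is defined. Assuming it is the absolute relative error of the Watchdog estimate with respect to the gate-level energy of the same benchmark, the metric and the "below 6%" count can be sketched as follows (both function names are hypothetical):

```python
def mismatch_percent(estimated_nj: float, reference_nj: float) -> float:
    """Relative mismatch (%) of an energy estimate against the gate-level reference."""
    if reference_nj <= 0:
        raise ValueError("reference energy must be positive")
    return abs(estimated_nj - reference_nj) / reference_nj * 100.0


def count_below(mismatches, threshold_percent: float = 6.0) -> int:
    """Number of benchmarks whose mismatch is strictly below the threshold."""
    return sum(1 for m in mismatches if m < threshold_percent)
```

Feeding `count_below` the 19 per-benchmark mismatches would yield the "15 out of 19" figure under this assumed definition.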

14 3. Energy Model – Achieved Speedup
• Each benchmark's Watchdog estimation time is compared with its gate-level simulation time.
• Achieved speedup ranges from 461.96 to 1577.16; the mean relative gain is 1118.
• The larger the application code, the higher the relative gain.
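Speedup is read here as the ratio between gate-level simulation time and Watchdog estimation time for the same benchmark, and the mean relative gain as an arithmetic mean over benchmarks; both readings are assumptions. A minimal sketch (the function name is hypothetical):

```python
from statistics import mean


def speedup_summary(gate_level_s, watchdog_s):
    """Return (min, max, mean) speedup of Watchdog estimation over gate-level simulation.

    Both arguments are per-benchmark execution times in the same unit and order.
    """
    if len(gate_level_s) != len(watchdog_s) or not gate_level_s:
        raise ValueError("need one Watchdog time per gate-level time")
    ratios = [g / w for g, w in zip(gate_level_s, watchdog_s)]
    return min(ratios), max(ratios), mean(ratios)
```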

15 3. Energy Model – Scalability
• Scenario exploration with up to 1,000 CPUs
• Each CPU has its own associated Watchdog instance
• Around 1.8 MIPS
[Figure series: Improvement; Watchdog Model]

16 4. Exploration in Large Scale Systems
• The proposed instruction-driven energy model was integrated into the NoC-based MPSoC model proposed in [Mandelli et al. 2013].
• Case study: evaluating the cost of the mapping process
  - Mapping heuristics: Nearest Neighbor (NN), First Free (FF), and LECDN
  - 8x8 MPSoC organized in 4x4 clusters
  - Only the heuristic algorithm was observed
  - 5 application instances: 4 partial MPEG decoders with 5 tasks each and 1 DTW with 10 tasks

17 5. Conclusion and Future Work
• Inclusion of fast and accurate energy models into OVPsim
• Extensive evaluation of both models on several benchmarks, compared against gate-level simulation
• The approach is ISA/CPU-oriented, so it is transparent to the software engineer
• Programmers can use the same simulator for both fast simulation and accurate software performance evaluation
• Limitation of our approach: processors with caches are not yet considered

18 5. Conclusion and Future Work
• Consider memory-access power cost: calibrate our model using NVSim/CACTI
• Port the proposed model to the OVPsim morphing phase
• Improve overall model accuracy:
  - Evaluate load and store patterns
  - Improve the estimation for division and multiplication algorithms
• Support complex processor architectures, such as out-of-order or superpipelined designs

19 Questions?


