
1 Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation
Shikai Li, Sunghyun Park, Scott Mahlke
Compilers Creating Custom Processors (CCCP) Research Group
EECS Department at the University of Michigan

2 Problem
The end of Dennard Scaling and the upcoming end of Moore’s Law.
VS
Data explosion and emerging compute-intensive applications in deep learning, data mining, etc.
Quantum Tunneling through the memory wall, K,Gpmez.

3 Opportunity
Compute-intensive applications in various domains are error-tolerant.
-- Machine Learning
-- Data Mining
-- Image Processing
-- Video Processing
-- Gaming
Figure: an example of image quality under increasing quality loss (Quality: 100%, 95%, 90%, 85%). From M. Samadi et al., “SAGE: Self-Tuning Approximation for Graphics Engines”

4 Solution
Rising Computation Demands + Emerging Error-Tolerant Applications
Approximate Computing: trade off output accuracy for performance improvement or energy reduction.

5 Approximate Computing
-- Reduce the amount of computation
-- Replace accurate computation with fuzzy computation
-- Perform computation without correctness guarantees

6 Previous Work
Hardware
Neural Network Accelerator
-- ASIC (H. Esmaeilzadeh, MICRO 2012)
-- Analog (R. St Amant, ISCA 2014)
-- FPGA (T. Moreau, HPCA 2015)
-- GPU (A. Yazdanbakhsh, MICRO 2015)
Approximate Value Prediction
-- J. S. Miguel, MICRO 2014
-- A. Yazdanbakhsh, TACO 2016
Cache And Memory System
-- Doppelgänger Cache (J. S. Miguel, MICRO 2015)
-- Bunker Cache (J. S. Miguel, MICRO 2016)
-- Concise Loads & Stores (A. Jain, MICRO 2016)
Approximate Operation and Storage
-- CPU (H. Esmaeilzadeh, ASPLOS 2012)
-- GPU (D. Wong, HPCA 2016)
Software
Programmer-Assisted Framework
-- Green (W. Baek, PLDI 2010)
-- EnerJ (A. Sampson, PLDI 2011)
Automatic Framework for GPU
-- SAGE (M. Samadi, MICRO 2013)
-- Paraprox (M. Samadi, ASPLOS 2013)
Unleash Parallelism
-- QuickStep (S. Misailovic, TECS 2013)
-- Helix-Up (S. Campanoni, CGO 2015)
Approximation Dynamism
-- M. A. Laurenzano, PLDI 2016
-- S. Mitra, CGO 2017
Task Skipping and Loop Perforation
-- M. Rinard, et al. SC 2016, MIT Tech Report 2009, SAS 2011, FSE 2011

7 Loop Perforation
Loops are transformed to periodically skip subsets of their iterations.
(Figure labels: Periodically, Entirely)
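A minimal sketch of the idea on this slide, assuming a simple averaging loop. The function names and the `rate` parameter are illustrative and do not come from the slides.

```python
def mean_exact(values):
    """Reference loop: visits every element."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)


def mean_perforated(values, rate=2):
    """Perforated loop: executes only every `rate`-th iteration.

    `rate` is a hypothetical knob; rate=2 skips half of the iterations.
    """
    total = 0.0
    executed = 0
    for i in range(0, len(values), rate):  # iterations in between are skipped
        total += values[i]
        executed += 1
    return total / executed


data = [float(x) for x in range(100)]
exact = mean_exact(data)
approx = mean_perforated(data, rate=2)
relative_error = abs(approx - exact) / abs(exact)
```

On this input the perforated loop does half the work and the mean shifts only slightly, which is the accuracy-for-speed trade the talk builds on.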

8 Skipping Different Instructions
Skipping different instructions has a different influence on accuracy.
Figure: final output errors caused by skipping a single instruction at rate 2 inside the kernel loop of Hotspot from Rodinia (labels: Data, Addr, Mem, Cond)

9 Skipping Different Iterations
Skipping different iterations has a different influence on accuracy.
Figure: final output errors caused by skipping a single iteration inside a kernel loop of Bodytrack from PARSEC (x-axis: Iteration ID)

10 Optimized Loop Perforation
-- Traditional Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Selective Instruction Loop Perforation
-- Selective Dynamic Loop Perforation

11 System Overview

12 Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management

13 Selective Instruction Loop Perforation
Loops are transformed to skip a subset of instructions in each iteration.
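As a rough illustration, the sketch below perforates only one expensive instruction inside the loop body while the rest of each iteration still runs. Reusing the stale value on skipped iterations is an assumption of this sketch, and the kernel itself is made up.

```python
import math


def kernel_exact(xs):
    """Every iteration runs both the expensive and the cheap instruction."""
    out = []
    for x in xs:
        w = math.exp(-x * x)      # expensive instruction
        out.append(w * x + 1.0)   # cheap instruction, always needed
    return out


def kernel_selective(xs, rate=2):
    """Only the expensive instruction is perforated; the rest still runs.

    On skipped iterations the stale value of `w` from the last executed
    iteration is reused (an assumption of this sketch).
    """
    out = []
    w = 0.0
    for i, x in enumerate(xs):
        if i % rate == 0:
            w = math.exp(-x * x)  # executed once every `rate` iterations
        out.append(w * x + 1.0)   # every output element is still produced
    return out


xs = [i / 100.0 for i in range(200)]
exact = kernel_exact(xs)
approx = kernel_selective(xs, rate=2)
max_abs_error = max(abs(a - e) for a, e in zip(approx, exact))
```

Unlike whole-iteration perforation, every output element is still written, so the error comes only from the stale intermediate value.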

14 Selective Perforation Methodology
-- Instruction Level Selective Perforation
-- Load Based Selective Perforation
-- Store Based Selective Perforation

15 Instruction Level Selective Perforation
-- Selection Stage
-- Expansion Stage
-- Transformation Stage

16-24 Selection Stage
1. Selection Stage
-- Performance Impact
-- Program Corruption
-- Output Error
Good temporal data similarity: … 101, 102, 103, 104, 105, 106, 107, 108, 109 …
Bad temporal data similarity: … 100, 200, 100, 300, 200, 500, 200, 300, 500 …
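The two value sequences shown on the Selection Stage slides can be compared numerically. The slides do not say how temporal similarity is measured, so the metric below (average relative change between consecutive values) is purely an assumption for illustration, and it expects nonzero values.

```python
def mean_relative_step(values):
    """Average relative change between consecutive values (lower = more similar)."""
    steps = [abs(b - a) / abs(a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)


# Sequences copied from the slide.
good = [101, 102, 103, 104, 105, 106, 107, 108, 109]
bad = [100, 200, 100, 300, 200, 500, 200, 300, 500]

good_score = mean_relative_step(good)
bad_score = mean_relative_step(bad)
```

Under this metric the first sequence changes by under 1% per step, so reusing a previous value in place of a skipped instruction would introduce little error; the second sequence changes by a large fraction at every step.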

25-30 Expansion Stage
2. Expansion Stage
Perforate more instructions without additional output error:
-- Instructions that only use results of perforated instructions or loop invariants.
-- Instructions whose results are only used by perforated instructions.
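The two expansion rules can be read as a fixed-point computation over the loop body's data dependences. The sketch below is one possible reading, not the authors' implementation: the dictionary-based representation and the instruction names are hypothetical, and the sketch additionally requires at least one perforated operand (rule 1) and at least one consumer (rule 2) so that neither rule applies vacuously.

```python
def expand(selected, operands, invariants):
    """Grow the perforated set until neither expansion rule applies.

    operands: dict mapping each instruction to the names it reads.
    invariants: names that do not change across loop iterations.
    """
    users = {name: set() for name in operands}
    for inst, ops in operands.items():
        for op in ops:
            if op in users:
                users[op].add(inst)

    perforated = set(selected)
    changed = True
    while changed:
        changed = False
        for inst, ops in operands.items():
            if inst in perforated:
                continue
            # Rule 1: reads only perforated results or loop invariants.
            rule1 = any(op in perforated for op in ops) and all(
                op in perforated or op in invariants for op in ops
            )
            # Rule 2: every consumer of the result is already perforated.
            rule2 = bool(users[inst]) and users[inst] <= perforated
            if rule1 or rule2:
                perforated.add(inst)
                changed = True
    return perforated


# Hypothetical loop body: addr = base + i; t = load(addr); u = t * scale; acc += u
operands = {
    "addr": ["base", "i"],
    "t": ["addr"],
    "u": ["t", "scale"],
    "acc": ["acc_prev", "u"],
}
result = expand({"t"}, operands, invariants={"base", "scale"})
```

Starting from the selected load `t`, the address computation and the multiply are pulled in, while the accumulation stays because it also reads a value that changes every iteration.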

31-34 Transformation Stage
3. Transformation Stage
Reduce control divergence overhead with compiler optimization:
-- Instruction Re-ordering
-- Loop Unswitching
-- Loop Unrolling
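To show why these transformations matter, the sketch below contrasts a perforated loop that tests a skip condition in every iteration with a version unrolled by the skip rate, where the condition disappears. It is written in Python only to show the loop shape; the real transformation happens in the compiler, and the kernel here is invented.

```python
import math


def naive(xs, rate=2):
    """Perforation guarded by a branch that is evaluated in every iteration."""
    out, w = [], 0.0
    for i, x in enumerate(xs):
        if i % rate == 0:          # per-iteration control overhead
            w = math.sqrt(x)
        out.append(w + x)
    return out


def unrolled(xs, rate=2):
    """Same result with the guard removed by unrolling the loop by `rate`."""
    out, w = [], 0.0
    for start in range(0, len(xs), rate):
        chunk = xs[start:start + rate]
        w = math.sqrt(chunk[0])    # copy 0: perforated instruction executes
        out.append(w + chunk[0])
        for x in chunk[1:]:        # copies 1..rate-1: instruction is absent
            out.append(w + x)
    return out
```

Both functions produce identical output; the second simply never evaluates the skip test.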

35 Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management

36 Dynamic Iteration Loop Perforation
Loops are transformed to skip a flexible subset of iterations during program execution.

37 Dynamic Perforation Methodology
-- Dynamic Perforation Rate
-- Dynamic Start Point

38 Dynamic Rate
Adapt approximation aggressiveness by changing skip rates under different circumstances during program execution.

39 Active Function Call Based Dynamic Rate
Loop executions tend to have different accuracy impacts during different function calls.

40 Active Loop Iteration Based Dynamic Rate
Loop executions tend to have different accuracy impacts during different “outer-loop” iterations.
(Figure x-axis: Iteration ID)
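One way to picture a dynamic rate keyed on the active outer-loop iteration is a small lookup schedule. The schedule values and function names below are made up for illustration; the slides do not state which outer iterations tolerate more aggressive skipping.

```python
def rate_for(outer_iter, schedule):
    """Look up the skip rate for an outer-loop iteration.

    schedule: list of (first_outer_iter, rate) pairs sorted by first_outer_iter;
    the values used below are made up for illustration.
    """
    rate = 1
    for first, r in schedule:
        if outer_iter >= first:
            rate = r
    return rate


def run(data, outer_iters, schedule):
    """Return how many inner iterations execute in each outer iteration."""
    executed = []
    for outer in range(outer_iters):
        rate = rate_for(outer, schedule)
        count = 0
        for i in range(0, len(data), rate):   # inner loop perforated at `rate`
            count += 1                         # stand-in for the real loop body
        executed.append(count)
    return executed


# Hypothetical profile: outer iterations 0-1 use rate 4, 2-3 use rate 2, later ones rate 1.
schedule = [(0, 4), (2, 2), (4, 1)]
executed = run(list(range(16)), outer_iters=6, schedule=schedule)
```

The same lookup could be keyed on the active function call instead of the outer-loop index.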

41 Dynamic Start
Coverage: guarantees that each iteration is executed at least once.
Fairness: gives each iteration an equal chance of being executed.
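A rotating start offset is one simple scheme that satisfies both properties; the slides do not specify the mechanism, so the sketch below is an assumption. With skip rate `rate`, invocation `k` of the loop begins at iteration `k % rate`.

```python
from collections import Counter


def perforated_indices(n, rate, start):
    """Iterations executed by one loop invocation that begins at `start`."""
    return range(start % rate, n, rate)


def simulate(n, rate, invocations):
    """Count how often each iteration runs when the start point rotates."""
    counts = Counter()
    for call in range(invocations):
        counts.update(perforated_indices(n, rate, start=call))
    return counts


counts = simulate(n=10, rate=3, invocations=6)
```

After any `rate` consecutive invocations every iteration has run exactly once, whereas a fixed start of 0 would only ever execute iterations 0, 3, 6 and 9.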

42 Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management

43 Runtime Error Management
A calibration-based aggressiveness adjustment mechanism to perform error management at runtime.
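A calibration step of this kind might look like the sketch below: occasionally run both the exact and the perforated loop on the same input, measure the error, and nudge the skip rate. The thresholds, the step size of 1 and the averaging kernel are all assumptions, not details from the talk.

```python
def adjust_rate(rate, measured_error, budget, max_rate=8):
    """One calibration step: back off when over budget, push harder when well under."""
    if measured_error > budget and rate > 1:
        return rate - 1
    if measured_error < 0.5 * budget and rate < max_rate:
        return rate + 1
    return rate


def calibrate(data, rate, budget):
    """Run the exact and the perforated loop on the same input and compare."""
    exact = sum(data) / len(data)
    sample = data[::rate]
    approx = sum(sample) / len(sample)
    error = abs(approx - exact) / abs(exact)
    return adjust_rate(rate, error, budget), error


# Alternating data: perforation at rate 2 is badly biased, so the rate backs off.
noisy_rate, noisy_error = calibrate([1.0, 3.0] * 50, rate=2, budget=0.10)

# Smooth data: the error is far below the budget, so the rate increases.
smooth = [100.0 + 0.01 * i for i in range(100)]
smooth_rate, smooth_error = calibrate(smooth, rate=2, budget=0.10)
```

Calibration runs cost extra time because the exact loop is executed as well, so in practice they would be triggered only occasionally.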

44 Evaluation
Evaluation Benchmark:
-- 7 benchmarks from PARSEC
-- 2 additional benchmarks from Rodinia
Error Metric:
-- Most error metrics are based on relative mean error
-- Clustering applications use the NMI score as the error metric
Evaluation Platform:
-- LLVM 4.0, Clang 4.0, -O3
-- Ubuntu 16.04
-- Intel Skylake i GHz
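The slides do not define relative mean error precisely. One common reading, taken here as an assumption, is the mean of the element-wise relative differences between the exact and the approximate output; the `eps` guard against division by zero is also an addition of this sketch.

```python
def relative_mean_error(exact, approx, eps=1e-12):
    """Mean of element-wise relative errors between two output vectors."""
    if len(exact) != len(approx):
        raise ValueError("outputs must have the same length")
    total = 0.0
    for e, a in zip(exact, approx):
        total += abs(a - e) / max(abs(e), eps)
    return total / len(exact)


error = relative_mean_error([10.0, 20.0, 40.0], [11.0, 20.0, 38.0])
within_budget = error <= 0.10   # compare against a 10% error budget
```

Here the three elements are off by 10%, 0% and 5%, so the metric averages them and the result sits inside a 10% budget.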

45 Selective & Dynamic Perforation
Selective Dynamic Loop Perforation Speedup with Different Error Budgets (left: 5%, right: 10%)
-- 5% error budget: average speedup improved from 1.47x to 2.89x
-- 10% error budget: average speedup improved from 1.93x to 4.07x

46 Selective / Dynamic Loop Perforation
Selective Loop Perforation Speedup with an Error Budget of 10%
-- Average speedup 2.62x, compared to 4.07x for Selective Dynamic Loop Perforation
Dynamic Loop Perforation Speedup with an Error Budget of 10%
-- Average speedup 2.91x, compared to 4.07x for Selective Dynamic Loop Perforation

47 Conclusion
Motivation:
-- Space Limitation in Loop Perforation
-- Time Limitation in Loop Perforation
Methodology:
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
Evaluation:
-- Average Speedup 1.93x -> 4.07x

48 Q & A

49 Thank you!

