
1 Learning A Better Compiler
Predicting Unroll Factors using Supervised Classification,
and Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

2 Predicting Unroll Factors
Loop unrolling is sensitive to the unroll factor
Current solution: expert design
–Difficult: hand-tuned heuristics
–Must be rewritten frequently
Predict parameters with machine learning instead
–Easy: data collection takes ~1 week, with no human time
–Algorithm does not change with the compiler

3 Loop Unrolling
Combines multiple iterations into a single loop body
Fewer iterations → less branching
Allows other transformations:
–Exposes adjacent memory locations
–Allows instruction reordering across iterations
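
For illustration (not from the slides), unrolling a simple summation loop by a factor of 4 in C might look like this:

    /* A loop summing n elements, before unrolling: one add and one
       branch per iteration. */
    double sum_orig(const double *a, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    /* The same loop unrolled by a factor of 4: one branch per four adds,
       adjacent memory accesses exposed, and independent adds that the
       scheduler can reorder. The second loop handles leftover iterations. */
    double sum_unrolled4(const double *a, int n) {
        double sum = 0.0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < n; i++)
            sum += a[i];
        return sum;
    }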

4 Unroll Factors
How many iterations to combine?
Too few?
–Provides little benefit
Too many?
–Increased cache pressure
–Increased live ranges → register pressure

5 Optimal Unroll Factors

6 Classification Problems
Input: a vector of features
–E.g. nest depth, # of branches, # of ops
Output: a class
–E.g. unroll factor, 1–8
No prior knowledge required about:
–Meaning of features/classes
–Relevance of features
–Relationships between features

7 Nearest Neighbors
Paper describes a kernel density estimator
All dimensions normalized to [0,1]
Given a test point p:
–Consider training points “close” to p (within a fixed distance, e.g. 0.3)
–Majority vote among the qualifying training points
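
A minimal sketch of this fixed-radius majority vote, in C, assuming 38 features normalized to [0,1] and unroll-factor classes 1–8 (not the paper's actual implementation):

    #include <math.h>

    #define NFEAT 38   /* number of loop features */

    /* Euclidean distance between two feature vectors normalized to [0,1]. */
    static double dist(const double *a, const double *b) {
        double s = 0.0;
        for (int d = 0; d < NFEAT; d++)
            s += (a[d] - b[d]) * (a[d] - b[d]);
        return sqrt(s);
    }

    /* Majority vote among training points within `radius` of test point p.
       labels[] holds unroll factors 1..8; returns `fallback` if no
       training point is close enough. */
    int predict_unroll(const double p[NFEAT],
                       const double train[][NFEAT], const int labels[],
                       int ntrain, double radius, int fallback) {
        int votes[9] = {0};
        for (int i = 0; i < ntrain; i++)
            if (dist(p, train[i]) <= radius)
                votes[labels[i]]++;
        int best = fallback, best_votes = 0;
        for (int c = 1; c <= 8; c++)
            if (votes[c] > best_votes) { best_votes = votes[c]; best = c; }
        return best;
    }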

8 Nearest Neighbors

9 Support Vector Machine
Assume two classes (easily generalized)
Transform the data
–Make the classes linearly separable
Find the line that maximizes the separating margin
For a test point:
–Perform the same transformation
–Classify based on the learned line
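
For the two-class case, the learned separator reduces to a sign test on a dot product; a small sketch, assuming the transformation has already been applied to x and that the weights w and bias b were produced by training:

    /* Linear decision function for a two-class SVM: classify a (possibly
       transformed) feature vector x by which side of the learned
       hyperplane w.x + b = 0 it lies on. */
    int svm_classify(const double *x, const double *w, double b, int nfeat) {
        double score = b;
        for (int d = 0; d < nfeat; d++)
            score += w[d] * x[d];
        return score >= 0.0 ? 1 : -1;   /* +1 / -1 class labels */
    }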

10

11 Maximal Margin

12 Non-Linear SVM

13 Some Features
# operands
Live range size
Critical path length
# operations
Known tripcount
# floating point ops
Loop nest level
# branches
# memory ops
Instruction fan-in in DAG
# instructions
Language: C, Fortran
# implicit instructions
…and more (38 total)
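
These per-loop features could be gathered into a record such as the following illustrative C struct (field names are assumptions, not the paper's):

    /* Illustrative record holding a subset of the 38 per-loop features
       that feed the classifier. */
    struct loop_features {
        int num_operands;
        int num_operations;
        int num_branches;
        int num_memory_ops;
        int num_fp_ops;
        int num_instructions;
        int loop_nest_level;
        int critical_path_length;
        int tripcount_known;     /* 1 if the trip count is known at compile time */
        int language_is_c;       /* 1 for C, 0 for Fortran */
    };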

14 Results: No Software Parallelism

15 Results: With Software Parallelism

16 Big Idea: Easy Maintenance
Performance improvements are modest
–Sometimes worse, sometimes much better
–Usually little change
Requires no re-tuning when the compiler changes
–Gathering data takes ~1 week, with no human time
General mechanism
–Can be applied to all parameters
–No model of the system needed
Can be applied to new transformations where expert knowledge is unavailable

17 Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

18 Dynamic Voltage Control
Monitor the system
When activity is low, reduce power
–Also reduces computational capacity
–May need more energy if the work takes longer

19 Multiple Clock Domains
Adjust separate components independently
Better performance/power
–E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance
More complex DVM policy

20 Motivation
Applications go through phases
Frequencies/voltages should change too
Focus on the core and L2 cache
–They consume a large fraction of total power
The best policy may change over time
–On battery: conserve power
–Plugged in: maximize performance

21 Learning a DVM Policy
Compiler automatically instruments the code
–Inserts sampling code to record performance counters
–Instrumentation is used only to gather training data
Use machine learning to create the policy
Implement the policy in a microcontroller
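
A rough sketch of the quantities the inserted sampling code would compute each interval from raw performance-counter deltas (the struct and function here are illustrative, not the actual instrumentation; how counters are read is platform-specific and omitted):

    /* One per-interval sample derived from hardware performance counters. */
    struct dvm_sample {
        double cpi;    /* clock cycles per instruction      */
        double l2pi;   /* L2 cache accesses per instruction */
        double mpi;    /* memory accesses per instruction   */
    };

    struct dvm_sample make_sample(unsigned long cycles, unsigned long insts,
                                  unsigned long l2_accesses,
                                  unsigned long mem_accesses) {
        struct dvm_sample s;
        s.cpi  = (double)cycles       / (double)insts;
        s.l2pi = (double)l2_accesses  / (double)insts;
        s.mpi  = (double)mem_accesses / (double)insts;
        return s;
    }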

22 ML Parameters
Features
–Clock cycles per instruction
–L2 accesses per instruction
–Memory accesses per instruction
Select voltages to minimize:
–Total energy
–Energy × delay

23 Machine Learning Algorithm
Automatically learn a set of if-then rules
–E.g.: If (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz
Compact, expressive
Can be implemented in hardware
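
Such a rule set could be represented and evaluated roughly as follows; the fields, thresholds, and frequencies are placeholders, not the learned policy:

    /* One learned if-then rule and a first-match evaluator. */
    struct rule {
        double l2pi_min;      /* fire when L2 accesses/instruction >= this */
        double cpi_max;       /* ...and cycles/instruction <= this         */
        double f_cache_ghz;   /* L2 cache frequency to select              */
    };

    double select_cache_freq(const struct rule *rules, int nrules,
                             double l2pi, double cpi, double default_ghz) {
        for (int i = 0; i < nrules; i++)
            if (l2pi >= rules[i].l2pi_min && cpi <= rules[i].cpi_max)
                return rules[i].f_cache_ghz;
        return default_ghz;   /* no rule matched */
    }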

24

25 Results
Compared to independently managing the core and L2:
–Saves 22% on average, 46% at most
Learns effective rules from a few features
Compiler modifications instrument the code
Policy learned offline
Policy implemented in a microcontroller

26 Conclusion
Machine learning derives models from data automatically
Allows easy maintenance of heuristics
Creates models that are more effective than hand-tuned ones

