
1 Learning A Better Compiler
Predicting Unroll Factors using Supervised Classification,
and Integrating CPU and L2 Cache Voltage Scaling using Machine Learning

2 Predicting Unroll Factors
Loop unrolling is sensitive to the unroll factor
Current solution: expert design
–Difficult: hand-tuned heuristics
–Must be rewritten frequently
Predict parameters with machine learning instead
–Easy: data collection takes ~1 week, with no human time
–Algorithm does not change with the compiler

3 Loop Unrolling
Combines multiple iterations into a single loop body
Fewer iterations → less branching
Allows other transformations:
–Exposes adjacent memory locations
–Allows instruction reordering across iterations
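
For illustration (not from the slides), unrolling a simple summation loop by a factor of 4 in C might look like this:

    /* A loop summing n elements, before unrolling: one add and one
       branch per iteration. */
    double sum_orig(const double *a, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    /* The same loop unrolled by a factor of 4: one branch per four adds,
       adjacent memory accesses exposed, and independent adds that the
       scheduler can reorder. The second loop handles leftover iterations. */
    double sum_unrolled4(const double *a, int n) {
        double sum = 0.0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < n; i++)
            sum += a[i];
        return sum;
    }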

4 Unroll Factors
How many iterations to combine?
Too few?
–Provides little benefit
Too many?
–Increased cache pressure
–Increased live ranges → register pressure

5 Optimal Unroll Factors

6 Classification Problems
Input: a vector of features
–E.g. nest depth, # of branches, # of ops
Output: a class
–E.g. unroll factor, 1–8
No prior knowledge required about:
–Meaning of features/classes
–Relevance of features
–Relationships between features

7 Nearest Neighbors
Paper describes a kernel density estimator
All dimensions normalized to [0,1]
Given a test point p:
–Consider training points “close” to p (within a fixed distance, e.g. 0.3)
–Majority vote among the qualifying training points
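
A minimal sketch of this fixed-radius majority vote, in C, assuming 38 features normalized to [0,1] and unroll-factor classes 1–8 (not the paper's actual implementation):

    #include <math.h>

    #define NFEAT 38   /* number of loop features */

    /* Euclidean distance between two feature vectors normalized to [0,1]. */
    static double dist(const double *a, const double *b) {
        double s = 0.0;
        for (int d = 0; d < NFEAT; d++)
            s += (a[d] - b[d]) * (a[d] - b[d]);
        return sqrt(s);
    }

    /* Majority vote among training points within `radius` of test point p.
       labels[] holds unroll factors 1..8; returns `fallback` if no
       training point is close enough. */
    int predict_unroll(const double p[NFEAT],
                       const double train[][NFEAT], const int labels[],
                       int ntrain, double radius, int fallback) {
        int votes[9] = {0};
        for (int i = 0; i < ntrain; i++)
            if (dist(p, train[i]) <= radius)
                votes[labels[i]]++;
        int best = fallback, best_votes = 0;
        for (int c = 1; c <= 8; c++)
            if (votes[c] > best_votes) { best_votes = votes[c]; best = c; }
        return best;
    }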

8 Nearest Neighbors

9 Support Vector Machine
Assume two classes (easily generalized)
Transform the data
–Make the classes linearly separable
Find the line that maximizes the separating margin
For a test point:
–Perform the same transformation
–Classify based on the learned line
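
For the two-class case, the learned separator reduces to a sign test on a dot product; a small sketch, assuming the transformation has already been applied to x and that the weights w and bias b were produced by training:

    /* Linear decision function for a two-class SVM: classify a (possibly
       transformed) feature vector x by which side of the learned
       hyperplane w.x + b = 0 it lies on. */
    int svm_classify(const double *x, const double *w, double b, int nfeat) {
        double score = b;
        for (int d = 0; d < nfeat; d++)
            score += w[d] * x[d];
        return score >= 0.0 ? 1 : -1;   /* +1 / -1 class labels */
    }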

10

11 Maximal Margin

12 Non-Linear SVM

13 Some Features
# operands
Live range size
Critical path length
# operations
Known tripcount
# floating point ops
Loop nest level
# branches
# memory ops
Instruction fan-in in DAG
# instructions
Language: C, Fortran
# implicit instructions
…and more (38 total)
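
These per-loop features could be gathered into a record such as the following illustrative C struct (field names are assumptions, not the paper's):

    /* Illustrative record holding a subset of the 38 per-loop features
       that feed the classifier. */
    struct loop_features {
        int num_operands;
        int num_operations;
        int num_branches;
        int num_memory_ops;
        int num_fp_ops;
        int num_instructions;
        int loop_nest_level;
        int critical_path_length;
        int tripcount_known;     /* 1 if the trip count is known at compile time */
        int language_is_c;       /* 1 for C, 0 for Fortran */
    };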

14 Results: No Software Parallelism

15 Results: With Software Parallelism

16 Big Idea: Easy Maintenance
Performance improvements are modest
–Sometimes worse, sometimes much better
–Usually little change
Requires no re-tuning when the compiler changes
–Gathering data takes ~1 week, with no human time
General mechanism
–Can be applied to all parameters
–No model of the system needed
Can be applied to new transformations where expert knowledge is unavailable

17 Integrated CPU and L2 Cache Voltage Scaling using Machine Learning

18 Dynamic Voltage Control
Monitor the system
When activity is low, reduce power
–Also reduces computational capacity
–May need more energy if the work takes longer

19 Multiple Clock Domains
Adjust separate components independently
Better performance/power
–E.g. a CPU-bound application may be able to decrease power to memory and cache without affecting performance
More complex DVM policy

20 Motivation
Applications go through phases
Frequencies/voltages should change too
Focus on the core and L2 cache
–They consume a large fraction of total power
The best policy may change over time
–On battery: conserve power
–Plugged in: maximize performance

21 Learning a DVM Policy
Compiler automatically instruments the code
–Inserts sampling code to record performance counters
–Instrumentation is used only to gather training data
Use machine learning to create the policy
Implement the policy in a microcontroller
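
A rough sketch of the quantities the inserted sampling code would compute each interval from raw performance-counter deltas (the struct and function here are illustrative, not the actual instrumentation; how counters are read is platform-specific and omitted):

    /* One per-interval sample derived from hardware performance counters. */
    struct dvm_sample {
        double cpi;    /* clock cycles per instruction      */
        double l2pi;   /* L2 cache accesses per instruction */
        double mpi;    /* memory accesses per instruction   */
    };

    struct dvm_sample make_sample(unsigned long cycles, unsigned long insts,
                                  unsigned long l2_accesses,
                                  unsigned long mem_accesses) {
        struct dvm_sample s;
        s.cpi  = (double)cycles       / (double)insts;
        s.l2pi = (double)l2_accesses  / (double)insts;
        s.mpi  = (double)mem_accesses / (double)insts;
        return s;
    }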

22 ML Parameters
Features
–Clock cycles per instruction
–L2 accesses per instruction
–Memory accesses per instruction
Select voltages to minimize:
–Total energy
–Energy × delay

23 Machine Learning Algorithm
Automatically learn a set of if-then rules
–E.g.: If (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz
Compact, expressive
Can be implemented in hardware
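
Such a rule set could be represented and evaluated roughly as follows; the fields, thresholds, and frequencies are placeholders, not the learned policy:

    /* One learned if-then rule and a first-match evaluator. */
    struct rule {
        double l2pi_min;      /* fire when L2 accesses/instruction >= this */
        double cpi_max;       /* ...and cycles/instruction <= this         */
        double f_cache_ghz;   /* L2 cache frequency to select              */
    };

    double select_cache_freq(const struct rule *rules, int nrules,
                             double l2pi, double cpi, double default_ghz) {
        for (int i = 0; i < nrules; i++)
            if (l2pi >= rules[i].l2pi_min && cpi <= rules[i].cpi_max)
                return rules[i].f_cache_ghz;
        return default_ghz;   /* no rule matched */
    }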

24

25 Results
Compared to independently managing the core and L2:
–Saves 22% on average, 46% at most
Learns effective rules from a few features
Compiler modifications instrument the code
Policy learned offline
Policy implemented in a microcontroller

26 Conclusion
Machine learning derives models from data automatically
Allows easy maintenance of heuristics
Creates models that are more effective than hand-tuned ones

