1
Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1, K. Yotov 2, K. Pingali 2 1 University of Illinois at Urbana-Champaign 2 Cornell University
2
Two approaches to code optimization:
Models
– E.g., calculate the best tile size for matrix multiplication (MM) as a function of cache size.
– Fast
– May be inaccurate
– No verification through feedback
Empirical Search
– E.g., execute and measure different versions of the MM code with different tile sizes.
– Slow
– Accurate because of feedback
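The contrast between the two approaches can be sketched as follows. The helper names and the simple capacity formula here are illustrative assumptions, not the model or search procedure from the paper:

```python
import math

def model_tile_size(l1_cache_bytes, elem_bytes=8):
    # Model: a closed-form guess, e.g. "one NB x NB block of doubles
    # should fit in L1". Fast, but only as good as its assumptions.
    return int(math.sqrt(l1_cache_bytes / elem_bytes))

def empirical_tile_size(measure, candidates):
    # Empirical search: measure(nb) runs and times the kernel for one
    # tile size; keep the candidate with the lowest measured cost.
    # Accurate (real feedback), but every candidate costs a full run.
    return min(candidates, key=measure)
```

In practice `measure` would wrap the tiled kernel with a timer; here any cost function with the same shape demonstrates the search.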
3
Hybrid Approach
Faster than empirical search; more accurate than the model:
– Use the model as a prior
– Use active sampling to minimize the amount of searching
4
Why is Speed Important?
– Adaptation may have to be applied at runtime, where running time is critical.
– Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator).
– Library routines can be used as a benchmark to evaluate alternative machine designs.
5
Problem: Matrix Multiplication
Tiling
– Improves the locality of references.
Cache blocking (NB): the matrix is decomposed into smaller subblocks of size NB x NB.
Matrix multiplication is an illustrative example for testing the hybrid approach.
Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
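A minimal sketch of cache blocking for matrix multiplication: the three loops over whole matrices become loops over NB x NB tiles, so each tile of B is reused while it is still resident in cache. This is a plain Python illustration of the standard technique, not the tuned kernel the talk benchmarks:

```python
def tiled_matmul(A, B, n, NB):
    """C = A @ B on n x n lists-of-lists, blocked into NB x NB tiles."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, NB):            # tile loops
        for kk in range(0, n, NB):
            for jj in range(0, n, NB):
                # intra-tile loops: work on one NB x NB block at a time
                for i in range(ii, min(ii + NB, n)):
                    for k in range(kk, min(kk + NB, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + NB, n)):
                            C[i][j] += a * B[k][j]
    return C
```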
6
Empirical Search: ATLAS
Try tiling parameters NB over a range of candidate values, in steps of 4.
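The sweep can be sketched as below. The range bounds are placeholders passed in by the caller (the slide's actual bounds are not reproduced here), and `run_mm` stands for executing the tiled kernel with a given NB:

```python
import time

def atlas_style_search(run_mm, nb_min, nb_max):
    """Time every NB in nb_min, nb_min+4, ..., nb_max; keep the fastest."""
    best_nb, best_t = None, float("inf")
    for nb in range(nb_min, nb_max + 1, 4):
        t0 = time.perf_counter()
        run_mm(nb)                         # execute the kernel with this NB
        t = time.perf_counter() - t0
        if t < best_t:
            best_nb, best_t = nb, t
    return best_nb
```

The cost of this exhaustive timing loop is exactly what the hybrid approach later tries to avoid.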
7
Model (Yotov et al.)
– Computes the NB that optimizes use of the L1 cache via a closed-form formula.
– Constructed by analyzing the memory access trace of the matrix multiplication code.
– Has been extended to optimize use of the L2 cache.
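To make the idea concrete, here is a toy capacity model in the same spirit, NOT the Yotov et al. formula (which is derived from the access trace and is more refined): require roughly three NB x NB tiles of doubles (blocks of A, B, and C) to fit in L1, then round down to the search grid:

```python
import math

def l1_blocking_factor(cache_bytes, elem_bytes=8):
    # Toy model (assumption, not the paper's formula):
    # 3 * NB^2 * elem_bytes <= cache_bytes
    nb = int(math.sqrt(cache_bytes / (3 * elem_bytes)))
    return nb - nb % 4  # align with the NB search grid of step 4
```

Such a formula is instantaneous to evaluate, which is exactly why a model-predicted NB makes a cheap prior even when it is not perfectly accurate.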
8
Model in Action
– Performance curve for varying NB.
– Vertical lines: model-predicted L1 and L2 blocking factors.
– Whether to tile for the L1 or the L2 cache depends on the architecture and the application.
9
Hybrid Approach
Model performance with a family of regression curves:
– Nonparametric regression: minimize the average error.
– Maximum-likelihood regression: place a distribution over regression curves and pick the most likely curve.
10
Regression (Bayesian)
Prior distribution π(curve) over regression curves:
– Makes regression curves with model-predicted maxima more likely.
Posterior distribution given the data (Bayes' rule):
– P(curve | data) = P(data | curve) π(curve) / P(data)
Pick the maximum a-posteriori curve:
– Picks curves with peaks in model-predicted locations when the data sample is small.
– Picks curves that fit the data best when the sample is large.
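A minimal sketch of MAP selection over a discrete curve family, assuming Gaussian measurement noise (the curve family, prior values, and noise model are illustrative assumptions; the paper's regression family is richer):

```python
import math

def map_curve(curves, prior, data, noise_sd=1.0):
    """Return the index of the maximum a-posteriori curve.

    curves: list of functions f(x); prior: matching prior probabilities
    (higher for curves peaking at the model's prediction);
    data: list of (x, y) measurements."""
    best_i, best_lp = None, -float("inf")
    for i, (f, p) in enumerate(zip(curves, prior)):
        lp = math.log(p)  # log prior
        for x, y in data:  # Gaussian log likelihood (up to a constant)
            lp += -((y - f(x)) ** 2) / (2 * noise_sd ** 2)
        if lp > best_lp:
            best_i, best_lp = i, lp
    return best_i
```

With no data the prior decides; with enough data the likelihood term dominates, which is the small-sample/large-sample behavior the slide describes.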
11
Active Sampling
Objectives:
1) Sample at lower tile sizes – takes less time.
2) Explore – don't oversample in the same region.
3) Get information about the dominant peak.
12
Solution: Potential Fields (Objectives 1, 2)
– Positive charge at the origin.
– Negative charges at previously sampled points.
– Sample at the point which minimizes the field.
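One plausible reading of this scheme can be sketched as below. The specific potential functions (linear attraction toward small NB, inverse-distance repulsion from sampled points) are my assumptions for illustration; the slide does not give the exact field definition:

```python
def next_sample(candidates, sampled, origin_weight=0.01):
    """Pick the next NB to measure: drawn toward small tile sizes
    (cheap to run), pushed away from points already sampled."""
    def field(x):
        attract = origin_weight * x  # objective 1: prefer small NB
        repel = sum(1.0 / (abs(x - s) + 1e-9) for s in sampled)  # objective 2
        return attract + repel
    return min((c for c in candidates if c not in sampled), key=field)
```

Each measurement adds a repulsive charge, so successive samples spread out over the range instead of clustering.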
13
Potential Fields (Objective 3)
– Positive charge in the region of the dominant peak.
How do we know which peak dominates?
– The distribution over regression curves can compute P(peak1 is located at x), P(peak2 is located at x), P(peak1 is of height h), P(peak2 is of height h).
– Hence, we can compute P(peak1 dominates peak2).
– Impose a positive charge in the region of each peak, proportional to its probability of domination.
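Given the distribution over curves, the domination probability can be estimated by Monte Carlo: draw curves from the posterior, read off the two peak heights for each draw, and count how often one exceeds the other. A minimal sketch (the paired-samples interface is an assumption about how the draws are represented):

```python
def prob_dominates(peak_heights_a, peak_heights_b):
    """Estimate P(peak A is higher than peak B) from paired draws of the
    two peak heights under the posterior over regression curves."""
    wins = sum(1 for ha, hb in zip(peak_heights_a, peak_heights_b) if ha > hb)
    return wins / len(peak_heights_a)
```

This probability then sets the magnitude of the positive charge placed in each peak's region.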
14
Results I – Regression Curves
15
Results II – Time, Performance

Performance (MFLOPS):
        Model    Hybrid   ATLAS
Sparc   376.66   851.04   832.63
SGI     499.81   553.15   505.4

Time (mins):
        Model    Hybrid   ATLAS
Sparc   0:00     3:12     8:59
SGI     0:00     14:02    59:00

Sparc – actual improvement due to the hybrid search for NB: ~10%.
SGI – improvement over both the model and ATLAS, due to choosing to tile for the L2 cache.
16
Results III – Library Performance
17
Conclusion
– The hybrid approach incorporates the model as a prior.
– Active sampling picks the most informative region to sample next.
– The result decreases the search time of the empirical search and improves on the model's performance.