
1 Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. Luca Della Toffola (ETH Zurich), Michael Pradel (TU Darmstadt), Thomas R. Gross (ETH Zurich). October 30th, 2015, OOPSLA 2015

2 MemoizeIt: an automatic dynamic analysis that finds memoization opportunities. Result: 9 new real-world memoization opportunities.

3 Apache POI – Issue 55611: a performance issue

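4 Apache POI – Issue 55611

    // In class DateUtil (slide excerpt; date_ptrn is a precompiled java.util.regex.Pattern)
    public boolean isADateFormat(int idx, String format) {
        StringBuilder sb = new StringBuilder(format.length());
        // Iterate over the input string; sb is still empty here, so sb.length() would be 0
        for (int i = 0; i < format.length(); i++) {
            // Modify format and write to sb
        }
        String f = sb.toString();
        // Process f using date pattern matching
        return date_ptrn.matcher(f).matches();
    }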

5 Apache POI – Issue 55611 (same code as slide 4). Java profiler: ranked 10 (189), 4000 calls, but no additional information about the bottleneck.

6 Apache POI – Issue 55611 (same code as slide 4). Research tools: the symptoms they look for are not there*: no nested loops, no memory bloat. * [Nistor, ICSE13], [Xu, OOPSLA12]

7 Apache POI – Issue 55611 (same code as slide 4). Observation: many calls have the same input and output values! Input: parameters + accessed fields. Output: returned value. Example calls: (0, "m/d/yy") → true; (1, "h:mm") → false. Memoization?

8 Apache POI – Issue 55611 (same code as slide 4). Purity analysis? Too conservative: the method has side effects. Ignore side effects!

9 Apache POI – Issue 55611 (same code as slide 4). MemoizeIt finds calls with the same input and output values and ranks this method 1st. Memoization!

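10 Apache POI – Issue 55611: single-entry instance cache, up to 25% speed-up!

    boolean cache_value;
    int cache_key1;
    String cache_key2;    // null until the first call completes

    public boolean isADateFormatSlow(int idx, String format) {
        // Slow isADateFormat code
    }

    public boolean isADateFormat(int idx, String format) {
        // Hit: same inputs as the previous call, so reuse its result
        if (cache_key2 != null && cache_key1 == idx && cache_key2.equals(format)) {
            return cache_value;
        }
        // Miss: compute the result, then update the cache keys and value
        boolean result = isADateFormatSlow(idx, format);
        cache_key1 = idx;
        cache_key2 = format;
        cache_value = result;
        return result;
    }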

11 MemoizeIt – Contributions
1. Automatic analysis to find memoization opportunities
2. Suggested fix configurations for candidate methods

12 MemoizeIt – Contributions (as on slide 11). Challenge: inputs can be complex objects on the heap, e.g. boolean DateUtil.isADateFormat(int idx, MyClass format).

13 MemoizeIt – Contributions (as on slide 11). The challenge is addressed iteratively: MemoizeIt == Memoization + Iterative.

14 MemoizeIt: Program + profiling input → CPU-time profiling → initial method candidates. Methods are filtered by:
1. Number of executions
2. Average execution time
3. Relative execution time
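The filtering step above can be sketched as follows. This is a minimal illustration, not MemoizeIt's code: the slide names the three criteria but not their thresholds, so `MethodProfile`, `InitialCandidateFilter`, and the three threshold constants are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-method summary produced by a CPU-time profiler.
record MethodProfile(String name, long calls, double totalMillis) {
    double avgMillis() { return totalMillis / calls; }
}

final class InitialCandidateFilter {
    // Hypothetical thresholds; the real values are not given on the slide.
    static final long MIN_CALLS = 2;                 // 1. number of executions
    static final double MIN_AVG_MILLIS = 0.005;      // 2. average execution time
    static final double MIN_RELATIVE_TIME = 0.01;    // 3. relative execution time

    static List<String> filter(List<MethodProfile> profiles, double programMillis) {
        List<String> candidates = new ArrayList<>();
        for (MethodProfile p : profiles) {
            boolean calledRepeatedly = p.calls() >= MIN_CALLS;
            boolean nonTrivial = p.avgMillis() >= MIN_AVG_MILLIS;
            boolean significant = p.totalMillis() / programMillis >= MIN_RELATIVE_TIME;
            if (calledRepeatedly && nonTrivial && significant) {
                candidates.add(p.name());
            }
        }
        return candidates;
    }
}
```

A method called only once cannot benefit from memoization, and a method that is very cheap per call or contributes little total time is unlikely to yield a measurable speed-up, which is why all three criteria must hold in this sketch.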

15 MemoizeIt: Program + profiling input → CPU-time profiling → input-output profiling

16 Input-Output Profiling
Input: parameters + accessed fields. Output: returned value. Together they form an input-output tuple (T).
1. For each call of a candidate method (anywhere in the call tree rooted at main),
2. trace the method's input-output values, e.g. for boolean DateUtil.isADateFormat(int idx, String format): T1 = (0, "m/d/yy") → true, T2 = (1, "h:mm") → false;
3. select method candidates: multiplicity(T1) = 3, multiplicity(T2) = 2. Repeated input-output → memoization opportunity.
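Counting tuple multiplicities, as in step 3 above, can be sketched like this. It assumes the inputs have already been turned into comparable values (for heap objects, see the flattening on slides 19 to 22); `InputOutputTuple` and `InputOutputProfile` are hypothetical names, not MemoizeIt's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One traced call: input (parameters + accessed fields) and output (returned value).
record InputOutputTuple(List<Object> input, Object output) {}

final class InputOutputProfile {
    private final Map<InputOutputTuple, Integer> counts = new HashMap<>();
    private int totalCalls = 0;

    // Called once per traced call of the candidate method.
    void trace(List<Object> input, Object output) {
        totalCalls++;
        counts.merge(new InputOutputTuple(input, output), 1, Integer::sum);
    }

    int multiplicity(List<Object> input, Object output) {
        return counts.getOrDefault(new InputOutputTuple(input, output), 0);
    }

    // The method stays a candidate only if some tuple occurs more than once.
    boolean hasRepeatedInputOutput() {
        return counts.values().stream().anyMatch(m -> m > 1);
    }

    int calls() { return totalCalls; }

    int distinctTuples() { return counts.size(); }
}
```

Tracing three calls with (0, "m/d/yy") → true and two with (1, "h:mm") → false reproduces the slide's multiplicities of 3 and 2. The record's generated `equals`/`hashCode` compare the input lists element by element, which is what makes identical calls collapse into one map entry.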

17 Challenge – Complex Objects: boolean DateUtil.isADateFormat(int idx, MyClass format)

18 Challenge – Complex Objects: are MyClass {x: 45, y: 1, z: → B, a: …} and MyClass {x: 45, y: 0, z: → B, a: …} equal? This requires structural and content equivalence.

19 Challenge – Complex Objects: flat(object) = (MyClass 1, [45, 1, (B 1, [...])]) for MyClass {x: 45, y: 1, z: → B, a: …}

20 Challenge – Complex Objects: the object graph reachable on the heap can be large. Can't keep everything!

21 Challenge – Complex Objects: compare ref 1 → MyClass {x: 45, y: 0, z: → B, a: …} and ref 2 → MyClass {x: 45, y: 1, z: → B, a: …} at depth = 1, depth = 2, and so on. Equal? Exhaustive traversal is expensive!

22 Solution – Iterative Profiling: the same comparison at depth = 1, then depth = 2. An iterative approach can analyze programs with complex structures.
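Depth-bounded flattening and comparison can be sketched as below. This uses a hypothetical, simplified heap model (`HeapObject`, `Flattener`) rather than MemoizeIt's real representation: it prints type names without the numbering shown on slide 19 and ignores aliasing between references. The two example inputs are illustrative and differ only inside the nested B object.

```java
// Hypothetical, simplified heap model (not MemoizeIt's representation):
// an object is a type name plus the values of its fields, in declaration order.
final class HeapObject {
    final String type;
    final Object[] fields; // boxed primitives, Strings, nested HeapObjects, or null

    HeapObject(String type, Object... fields) {
        this.type = type;
        this.fields = fields.clone();
    }
}

final class Flattener {
    // Flattens a value, expanding nested objects at most `depth` levels deep.
    // Objects beyond the bound are summarized by their type, e.g. "(B, [...])".
    static String flat(Object value, int depth) {
        if (value instanceof String s) {
            return "\"" + s + "\""; // quote so "45" and 45 stay distinct
        }
        if (!(value instanceof HeapObject o)) {
            return String.valueOf(value); // boxed primitives and null
        }
        if (depth == 0) {
            return "(" + o.type + ", [...])";
        }
        StringBuilder sb = new StringBuilder("(").append(o.type).append(", [");
        for (int i = 0; i < o.fields.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(flat(o.fields[i], depth - 1));
        }
        return sb.append("])").toString();
    }

    // Two inputs are treated as equal at a given depth if their flattened forms match.
    static boolean equalUpTo(Object a, Object b, int depth) {
        return flat(a, depth).equals(flat(b, depth));
    }
}
```

For `new HeapObject("MyClass", 45, 1, new HeapObject("B", 7))` and the same object with `B` holding 8, the forms at depth 1 are both `(MyClass, [45, 1, (B, [...])])`, so the inputs look equal; at depth 2 they become `(B, [7])` versus `(B, [8])` and differ. This is why a shallow depth is cheap but may over-report repeated inputs, and why deeper iterations refine the result.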

23 MemoizeIt: Program + profiling input → CPU-time profiling → initial method candidates → input-output profiling, starting at d = 1 → filter method candidates. If max depth || time limit is reached: exit() and continue with candidate ranking and fix suggestions. Otherwise, with the new candidates: depth++ and repeat input-output profiling.
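The iterative loop above can be sketched as follows, assuming each profiling run is available as a callback. `IterativeProfiler` and the `profileAtDepth` hook are hypothetical; stopping early when no candidates remain is an added assumption not shown on the slide.

```java
import java.util.List;
import java.util.function.BiFunction;

final class IterativeProfiler {
    // profileAtDepth is a hypothetical hook standing in for one input-output
    // profiling run: given the current candidates and a depth d, it returns the
    // candidates that still show repeated input-output tuples at depth d.
    static List<String> run(List<String> initialCandidates,
                            BiFunction<List<String>, Integer, List<String>> profileAtDepth,
                            int maxDepth,
                            long timeLimitMillis) {
        long deadline = System.nanoTime() + timeLimitMillis * 1_000_000L;
        List<String> candidates = initialCandidates;
        int depth = 1; // d = 1
        while (true) {
            // Filter method candidates by profiling at the current depth.
            candidates = profileAtDepth.apply(candidates, depth);
            boolean outOfTime = System.nanoTime() - deadline >= 0;
            // Exit at max depth or time limit (and, as an assumption, when none remain).
            if (candidates.isEmpty() || depth >= maxDepth || outOfTime) {
                return candidates; // handed on to ranking and fix suggestions
            }
            depth++; // new candidates: repeat with a deeper traversal
        }
    }
}
```

Each iteration only re-profiles the methods that survived the previous, shallower one, so the expensive deep traversals are paid for a shrinking set of methods.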

24 MemoizeIt: Program + profiling input → CPU-time profiling → input-output profiling → ranking of candidates → ranked candidate methods. Ranking based on:
1. Estimated saved time
2. Estimated hit ratio
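One plausible way to compute and combine the two ranking criteria is sketched below. The formulas are assumptions, not taken from the slides: a perfect cache would answer every call except the first occurrence of each distinct tuple, so hit ratio = 1 − distinct tuples / calls, and saved time ≈ hit ratio × total time in the method. `CandidateStats` and `CandidateRanking` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-candidate summary from input-output profiling.
record CandidateStats(String method, int calls, int distinctTuples, double totalMillis) {
    // Assumed estimate: fraction of calls a perfect cache would answer.
    double hitRatio() { return 1.0 - (double) distinctTuples / calls; }

    // Assumed estimate: time spent in calls that a cache would have answered.
    double savedMillis() { return hitRatio() * totalMillis; }
}

final class CandidateRanking {
    // Highest estimated saved time first; ties broken by higher hit ratio.
    static List<CandidateStats> rank(List<CandidateStats> stats) {
        List<CandidateStats> ranked = new ArrayList<>(stats);
        ranked.sort(Comparator.comparingDouble(CandidateStats::savedMillis)
                .thenComparingDouble(CandidateStats::hitRatio)
                .reversed());
        return ranked;
    }
}
```

Under these assumptions, a method with 4000 calls but only 2 distinct tuples has a hit ratio of 0.9995, yet it can still rank below a slower method with a 0.5 hit ratio if the latter saves more absolute time.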

25 MemoizeIt: Program + profiling input → CPU-time profiling → input-output profiling → ranking of candidates → fix suggestions: an optimal cache configuration for each ranked candidate method. The suggested configuration is one of single instance, single global, multi instance, or multi global, plus whether the cache needs invalidation.
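For contrast with the single-entry instance cache on slide 10, here is a sketch of the "multi global" configuration: one cache shared by all instances that keeps many input-output pairs. `MultiGlobalCache` is hypothetical, and `isADateFormatSlow` below is a trivial stand-in, not Apache POI's actual logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class MultiGlobalCache {
    // Key for the multi-entry cache: all inputs of the memoized method.
    private record Key(int idx, String format) {}

    // "Global": one static cache shared by all instances.
    // "Multi": keeps every input-output pair, not only the most recent one.
    private static final Map<Key, Boolean> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger slowCalls = new AtomicInteger();

    // Stand-in for the expensive method (hypothetical, NOT POI's logic).
    static boolean isADateFormatSlow(int idx, String format) {
        slowCalls.incrementAndGet();
        return format.indexOf('d') >= 0 || format.indexOf('y') >= 0;
    }

    static boolean isADateFormat(int idx, String format) {
        // Compute only on a miss; later calls with equal inputs hit the cache.
        return CACHE.computeIfAbsent(new Key(idx, format),
                k -> isADateFormatSlow(k.idx(), k.format()));
    }

    // Needed only if the result can change when state read by the method changes.
    static void invalidate() {
        CACHE.clear();
    }
}
```

A multi-entry cache pays off when callers alternate between several inputs, which a single-entry cache would keep evicting; the cost is unbounded growth in this sketch, which a real fix might cap.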

26 Experimental Setup

| Program | Description |
|---|---|
| DaCapo 2006 MR2 | Benchmarks antlr, bloat, chart, fop, luindex, pmd |
| Checkstyle 5.6 | Source-code style checker |
| Soot ae0cec69c0 | Static program analysis / manipulation |
| Apache Tika 1.3 | Content analysis toolkit |
| Apache POI 3.9 | MS Office documents manipulation |

27 Evaluation – Research Question: Is MemoizeIt effective at finding new memoization opportunities?
1. Manually select realistic input
2. Execute MemoizeIt
3. Manually inspect methods
4. Implement MemoizeIt's suggestions
Timeout for profiling: 1 hour.

28 Evaluation – Results: 9 new opportunities, in DaCapo-antlr, DaCapo-bloat, DaCapo-fop, Soot, Apache-Tika, Apache-POI, and Checkstyle; 1 duplicate method shared by Apache-Tika and Apache-POI; 31 memoization opportunities. Is MemoizeIt effective at finding new memoization opportunities?

29 Evaluation – Results

| Program | Small workload [speed-up] | Large workload [speed-up] |
|---|---|---|
| DaCapo-antlr | 1.04 ± 0.03 | 1.05 ± 0.02 |
| DaCapo-bloat | 1.08 ± 0.03 | - |
| DaCapo-fop | 1.05 ± 0.01 | NA |
| Checkstyle | - | 9.95 ± 0.10 |
| Soot | 1.27 ± 0.03 | 12.93 ± 0.05 |
| Apache-Tika Excel | - | 1.25 ± 0.02 |
| Apache-Tika Jar | 1.09 ± 0.01 | 1.12 ± 0.02 |
| Apache-POI (1) | 1.11 ± 0.01 | 1.92 ± 0.01 |
| Apache-POI (2) | 1.07 ± 0.01 | 1.12 ± 0.01 |

30 Evaluation – Research Question: Is the iterative or the exhaustive approach more efficient?

31 Evaluation – Results: Is the iterative or the exhaustive approach more efficient? Analysis time in minutes, iterative / exhaustive (profiling timeout: 1 hour):
DaCapo-antlr: timeout
DaCapo-bloat: timeout
DaCapo-chart: 22
DaCapo-fop: 18 / timeout
DaCapo-luindex: 32 / timeout
DaCapo-pmd: timeout
Checkstyle: 6 / 22
Soot: timeout
Apache-Tika Excel: 58 / 56
Apache-Tika Jar: 41 / 35
Apache-POI: 23 / 37
Legend: iterative wins / exhaustive wins.

32 Related Work
Performance problems: detecting [Xu, OOPSLA12], [Zaparanuks, PLDI12]; understanding [Song, OOPSLA14], [Yu, ASPLOS14]; fixing [Nistor, ICSE15].
Compiler optimizations [Ding, CGO04], [Costa, CGO13], [St-Amour, OOPSLA12].
Incremental computations [Pugh, POPL89].
Other caching techniques [Ma, WWW15].

33 Conclusions: profiling of memoization opportunities; new real-world opportunities; relevant speed-ups; the iterative strategy is beneficial; MemoizeIt suggests cache configurations (single instance, single global, multi instance, multi global); the suggestions are easy to implement. Artifact evaluated: https://github.com/lucadt/memoizeit

