Cross-Module Optimization Thomas Lindgren


1 Cross-Module Optimization Thomas Lindgren

2 Overview OM - optimization manager – Erlang-to-Erlang optimizer (mostly) – ~20k lines of Erlang – intended to accelerate large applications The rest of this talk – What does OM do? – How well does it work?

3 OM overview [Figure: flow diagram of the OM pipeline. Box labels: Profiling code, Source code, Annotation trees, Training exec, Aggregation, Higher-order elimination, Apply open-coding, Outlining, Module splitting, Inlining, Simplification, Production exec, (Other modules)]

4 Profiling and annotation Instrument code with profiling counters – standard counters (per function clause, per call site, …) – which modules call each other, and how often – which functions are called at each apply Annotations saved as syntax trees + counters Post-training: read counters, decorate annotation trees, optimize the result
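A minimal sketch of what per-clause counter instrumentation could look like. The slides do not show OM's actual instrumentation; the module name, the ETS table `om_counters`, and the counter key shape are all assumptions made for illustration.

```erlang
-module(om_prof_sketch).
-export([start/0, hit/1, count/1, fact/1]).

%% Hypothetical counter store: one ETS row per profiling point.
start() ->
    ets:new(om_counters, [named_table, public, set]),
    ok.

%% Increment the counter for Key, creating it at 0 if it is missing.
hit(Key) ->
    ets:update_counter(om_counters, Key, 1, {Key, 0}),
    ok.

%% Read a counter back after the training run.
count(Key) ->
    case ets:lookup(om_counters, Key) of
        [{_, N}] -> N;
        [] -> 0
    end.

%% An instrumented function: each clause bumps its own counter first.
%% Key shape (assumed): {clause, Module, Function, Arity, ClauseIndex}.
fact(0) ->
    hit({clause, ?MODULE, fact, 1, 1}),
    1;
fact(N) when N > 0 ->
    hit({clause, ?MODULE, fact, 1, 2}),
    N * fact(N - 1).
```

After a training run, the counts identify which clauses are hot and which are cold, which is the information the later passes rely on.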

5 Per-module optimizations Higher-order elimination: replace lists:map, lists:foldl, and others with specialized functions where suitable Apply open-coding: replace apply with explicit (open-ended) switch Outlining: cold (seldom-executed) clauses are moved out-of-line Module splitting: cold code moved into new module

6 Higher-order elimination
Call:
    lists:map(fun(X) -> X+Y end, Xs)
is rewritten to
Call:
    lists_map_0(Xs,Y)
with the specialized function
    lists_map_0([X|A],Y) -> [X+Y|lists_map_0(A,Y)];
    lists_map_0([],Y) -> [].
(The equivalent is done for most functions in lists)
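The same idea applied by hand to lists:foldl, as a sketch. The module name and the function names `sum_scaled/2`, `sum_scaled_spec/2`, and `lists_foldl_0/3` are hypothetical; the slides only show the lists:map case.

```erlang
-module(hoe_sketch).
-export([sum_scaled/2, sum_scaled_spec/2]).

%% Original: higher-order call with a closure that captures K.
sum_scaled(Xs, K) ->
    lists:foldl(fun(X, Acc) -> X * K + Acc end, 0, Xs).

%% Specialized: the closure body is folded into a first-order loop,
%% and the captured variable K becomes an extra argument.
sum_scaled_spec(Xs, K) ->
    lists_foldl_0(Xs, 0, K).

lists_foldl_0([X | A], Acc, K) -> lists_foldl_0(A, X * K + Acc, K);
lists_foldl_0([], Acc, _K) -> Acc.
```

Both versions compute the same result; the specialized one avoids building a closure and calling through it on every element.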

7 Per-module optimizations Higher-order elimination: replace lists:map, lists:foldl, and others with specialized functions where suitable Apply open-coding: replace apply with explicit (open-ended) switch Outlining: cold (seldom-executed) clauses are moved out-of-line Module splitting: cold code moved into new module

8 Apply open-coding
apply(M,F,[A1,…,An])
Profiling reveals that certain {Mod,Func,Arity} tuples are most common
Switch on likely functions
Enables inlining of explicit call (e.g., m1:f1(A1,A2))
    case {M,F,length(As)} of
        {m1,f1,2} -> [A1,A2] = As, m1:f1(A1,A2);
        …
        _ -> apply(M,F,As)
    end
(most general case; optimization possible when arity known, when call is local, …)
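A runnable sketch of the open-coded switch. The "likely" targets here (`lists:reverse/1` and `erlang:max/2`) are assumptions standing in for whatever a real profile would report; the module and function names are hypothetical.

```erlang
-module(apply_sketch).
-export([call/3, call_open/3]).

%% Original: fully dynamic call.
call(M, F, As) ->
    apply(M, F, As).

%% Open-coded: test for the targets assumed to dominate the profile,
%% and fall back to apply/3 for everything else (the open-ended case).
call_open(M, F, As) ->
    case {M, F, length(As)} of
        {lists, reverse, 1} -> [A1] = As, lists:reverse(A1);
        {erlang, max, 2} -> [A1, A2] = As, erlang:max(A1, A2);
        _ -> apply(M, F, As)
    end.
```

The explicit calls in the first two clauses are ordinary remote calls, so later passes can treat them like any other call site.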

9 Per-module optimizations Higher-order elimination: replace lists:map, lists:foldl, and others with specialized functions where suitable Apply open-coding: replace apply with explicit (open-ended) switch Outlining: cold (seldom-executed) clauses are moved out-of-line Module splitting: cold code moved into new module

10 Outlining
Move cold function clauses, switch clauses, ... out-of-line
Reduces function size => more inlining possible
– outlining + inlining = (structured) partial inlining
Sometimes improves pattern matching code
Before:
    case read_file(FD,Len) of
        {error,closed} -> …;
        {error,prot} -> …;
        {ok,{incomplete,Data}} -> …;
        {ok,{complete,Data}} -> …;
        X -> ...
    end
After:
    case read_file(FD,Len) of
        {ok,{complete,Data}} -> …;
        Else -> 'OUTLINED'(Else)
    end
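A self-contained sketch of the outlined form. It assumes the `{ok,{complete,Data}}` clause is the hot one, as the slide suggests; the clause bodies, the module name, and the helper name `handle_cold/1` are invented for illustration, since the slide elides them.

```erlang
-module(outline_sketch).
-export([handle/1]).

%% Hot path: only the clause assumed to be common stays in line.
handle(Result) ->
    case Result of
        {ok, {complete, Data}} -> {done, Data};
        Else -> handle_cold(Else)
    end.

%% Cold path: the seldom-executed clauses, moved out of line.
%% Clause order matches the original switch, minus the hot clause.
handle_cold({error, closed}) -> closed;
handle_cold({error, prot}) -> protocol_error;
handle_cold({ok, {incomplete, Data}}) -> {more, Data};
handle_cold(Other) -> {unexpected, Other}.
```

Because `handle/1` is now small, it becomes a cheaper inlining candidate, which is the partial-inlining effect the slide describes.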

11 Per-module optimizations Higher-order elimination: replace lists:map, lists:foldl, and others with specialized functions where suitable Apply open-coding: replace apply with explicit (open-ended) switch Outlining: cold (seldom-executed) clauses are moved out-of-line Module splitting: cold code moved into new module

12 Module splitting Hot code retained in original module Cold functions moved into “cold module” – currently: duplicate entire original module Calls to cold functions re-routed to cold module – outlined function clauses often end up in cold module Benefit: reduces hot module size => more aggregation – drawback: total code size increases (unimportant?)

13 Aggregation Optimization across module boundaries – but in Erlang, any module can be replaced at any time (“hot code loading”) Merge optimized hot modules into aggregates – optimize each aggregate aggressively – but in Erlang you can replace any module at runtime – how to do it?

14 Hot code loading Remote calls m:f(X) logically do the following: – lookup module named m – lookup function named f/1 in the found module – call the found function A new version of m can be loaded at any time – but occurs seldom in practice (every month? week?) – (an aside: OTP further structures code replacement) we do not take advantage of this

15 Hot code loading (2) Inlining of remote calls is not possible – what if the inlined module subsequently changes? – worse, remote calls are very common Merging two modules into one is problematic – making remote calls into local calls changes behaviour – safe approach: speculate that code has not changed.

16 Hot code loading (3)
Remote call is rewritten into test + common-case local call + backup remote call
latest(m) can be implemented in linker
– initially, always true
– when new m loaded, becomes always false
The remote call
    m:f(X1,X2)
is rewritten to
    (case latest(m) of
        true -> local_f(X1,X2);
        false -> m:f(X1,X2)
    end)
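A sketch that simulates the latest(m) guard in plain Erlang. The slides place latest(m) in the linker; here it is approximated with a `persistent_term` flag (OTP 21.3 or later), which is an assumption of this example and not OM's mechanism. The functions `init/0` and `mark_reloaded/1` and the merged-in copy `local_reverse/2` are hypothetical.

```erlang
-module(latest_sketch).
-export([init/0, mark_reloaded/1, rev/1]).

%% Assume the merged-in copy of lists is current at start-up.
init() ->
    persistent_term:put({latest, lists}, true),
    ok.

%% Would be triggered when a new version of module M is loaded.
mark_reloaded(M) ->
    persistent_term:put({latest, M}, false),
    ok.

%% Stand-in for the linker-provided test; defaults to false (safe).
latest(M) ->
    persistent_term:get({latest, M}, false).

%% Guarded call: local copy in the common case, remote call as backup.
rev(Xs) ->
    case latest(lists) of
        true -> local_reverse(Xs, []);
        false -> lists:reverse(Xs)
    end.

%% Local copy of the callee, as it would exist inside an aggregate.
local_reverse([X | T], Acc) -> local_reverse(T, [X | Acc]);
local_reverse([], Acc) -> Acc.
```

Once the flag flips to false, every guarded call site falls back to the ordinary remote call, so the newly loaded code is used.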

17 Aggregation Merge modules that call each other often – use module-module call profile – remote calls are rewritten to use latest(m) – aggregation limited by size Widely-shared modules (e.g., lists) are engulfed – copy engulfed module into the calling module – necessary to enable high-quality aggregation without huge aggregates

18 Post-aggregation optimization Profile-guided inlining – consider call sites in order of importance (# calls) – total amount of inlining limited by code size increase – avoids pitfalls of static inlining: working on wrong code, too conservative for important sites Simplification of resulting code – dead function removal (occurs due to engulfing, inlining) – case-of-case, beta reduction,...
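One way to read "consider call sites in order of importance, limited by code size increase" is a greedy selection under a growth budget. The sketch below is an assumption about how such a plan could be computed, not OM's actual algorithm; the tuple shape `{SiteId, CallCount, CalleeSize}` and the size unit are hypothetical.

```erlang
-module(inline_plan_sketch).
-export([plan/2]).

%% Sites:  [{SiteId, CallCount, CalleeSize}] from the profile (assumed shape).
%% Budget: total code growth allowed, in the same unit as CalleeSize.
%% Returns the site ids chosen for inlining, most important first.
plan(Sites, Budget) ->
    Sorted = lists:sort(fun({_, C1, _}, {_, C2, _}) -> C1 >= C2 end, Sites),
    pick(Sorted, Budget, []).

%% Take a site if its callee still fits in the remaining budget.
pick([{Id, _Calls, Size} | Rest], Budget, Acc) when Size =< Budget ->
    pick(Rest, Budget - Size, [Id | Acc]);
%% Otherwise skip it and keep looking at less important sites.
pick([_TooBig | Rest], Budget, Acc) ->
    pick(Rest, Budget, Acc);
pick([], _Budget, Acc) ->
    lists:reverse(Acc).
```

For example, `inline_plan_sketch:plan([{a,10,5},{b,100,8},{c,50,4}], 12)` selects `b` and `c`: the two most-called sites use up the whole budget, so `a` is left as a normal call.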

19 Results Benchmarks: important subsystems of OTP, daily use – (decode1: protocol processing “inner loop”) – beam: beam compiler on lists.erl – gen_tcp: small messages over local socket – ldapv2: encoding and decoding LDAPv2 ASN.1 PDUs – mnesia: realtime database running simple pseudo-HLR Benchmark suite freely available from author

20 Results (2) Each benchmark compiled with OM – same input used for training and production – latest(m) simulated with cheap test Each benchmark run times for baseline and optimized – removed outliers for gen_tcp and mnesia to get more focussed speedup values

21 Results (3)

22 Conclusions Optimization across modules beneficial Profile-driven optimization practical and beneficial Future work: – try real applications (100s-1000s of modules) – more optimizations – tune optimizations – automate reprofiling/recompilation

