1 © Imperial College London IC-Parc William Penney Laboratory Towards Dynamic Instrumentation for Performance Optimisation Andy Cheadle

2 Overview
Motivation
Profiling Techniques
The Aspect Oriented Paradigm
Instrumentation Profiling in ECLiPSe
New ECLiPSe Tools
  – instrument
  – instprofile
  – modeanalyser
  – papi
Ongoing Work

3 Motivation
ECLiPSe Platform Developer
  – How does ECLiPSe perform on stock hardware?
      Cache utilisation (instruction / data)
      Emulator instruction profiling
      Evaluation of core runtime system services implementation
ECLiPSe Application Developer
  – How is my application performing?
      Where is the majority of the runtime spent?
      Why does my program spend so much time garbage collecting?
      More generally, what is the pattern of resource usage over program execution?
  – Optimising at search level is not enough
Ease of bottleneck identification is key to program optimisation!

4 Profiling Techniques
Sample based profiling
  – Measurement recorded at a (fixed) interval
  – Low overhead, non-intrusive
      Relatively coarse-grained
  – Indicates trend of resource usage over time
      Misses spikes in resource usage occurring between samples
      Hard to accurately ascribe measurement to code location / exact instruction
Instrumentation based profiling
  – Insert measurement code around mutator code fragments
  – Greater overhead: code insertion can be highly intrusive
      Granularity is determined by size of code fragment
  – Accurate profiling of code fragment
      Captures all events occurring within fragment
      Measurements ascribed to exact mutator code location by callsite identifier
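As a rough illustration of the instrumentation approach, measurement code can be wrapped around a goal using statistics/2, which a later slide names as the current source of metrics. This is a minimal sketch, not the instprofile implementation; measure/3 is a hypothetical helper name introduced here.

```prolog
% measure/3 is a hypothetical helper, not part of any ECLiPSe library.
% It records the change in one statistics/2 metric across a call to Goal.
measure(Metric, Goal, Delta) :-
    statistics(Metric, Before),   % reading taken at fragment entry
    call(Goal),                   % the mutator code fragment
    statistics(Metric, After),    % reading taken at fragment exit
    Delta is After - Before.
```

A query such as `?- measure(global_stack_used, length(L, 1000), Delta).` would bind Delta to the change in global stack usage across the call. The value can be negative if a garbage collection runs inside the fragment, and nothing is recorded when Goal fails, which suggests why separate fail and redo points are needed in a real tool.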

5 The Aspect Oriented Paradigm
Programming (AOP)
  – Software engineering to achieve separation of concerns
  – Targets cross-cutting concerns
  – Composition filters and aspects
Languages
  – Hyper/J, AspectJ
  – Domain specific languages: Template Haskell, MetaOCaml
  – Old hat to logic programmers: metaprogramming!
Applications
  – Policies of distributed computing (security, deployment)
  – Logging, tracing and error reporting
  – Legacy to OO code migration
  – Instrumentation based code profiling

6 Instrumentation Profiling in ECLiPSe
Challenges of the imperative world +...
  – Meta-called arguments, i.e. meta-predicates
  – Resatisfiability
  – Cut (!!!!)
Box Model of Execution: Call, Exit, Redo and Fail ports
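The four ports of the box model can be observed with a small wrapper written in plain Prolog. This is a sketch that assumes only call/1 and writeln/1; trace_ports/1 is a hypothetical name, and this is not how lib(instrument) inserts its own start, end, fail and redo points.

```prolog
% trace_ports/1 is a hypothetical helper that reports the four box-model ports.
trace_ports(Goal) :-
    (   writeln(call(Goal))           % Call port: first entry into the box
    ;   writeln(fail(Goal)), fail     % Fail port: no (more) solutions
    ),
    call(Goal),
    (   writeln(exit(Goal))           % Exit port: a solution was found
    ;   writeln(redo(Goal)), fail     % Redo port: backtracking into the box
    ).
```

For example, `?- trace_ports(member(X, [a, b])), X == b.` reports the call port, an exit with X = a, a redo, and then an exit with X = b. The wrapper leaves choicepoints behind and hides Goal from any surrounding cut, which hints at why resatisfiability and cut are listed as challenges.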

7 Kernel Enhancements
Fail events
Anonymous events
      event_create(+Goal, -EventHandle)
      [eclipse 1]: event_create(writeln('Goodbye cruel world!'), Event),
                   writeln('Hello world!'),
                   event(Event).
      Hello world!
      Goodbye cruel world!
      Event = 'EVENT'(16'503f0238)
      Yes (0.00s cpu)
Garbage collection of embedded handles
Timeout library
  – Supports nested timeouts (time-aware search)
  – timeout/3, timeout/7, call_timeout_safe/1
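A possible use of timeout/3 from the timeout library is sketched below. The argument order (goal, time limit in seconds, goal to run on expiry) is stated here as an assumption to check against the library documentation; slow_search/1 and bounded_search/1 are hypothetical names.

```prolog
:- lib(timeout).

% slow_search/1 is a hypothetical stand-in for a search that never finishes.
slow_search(N) :-
    N1 is N + 1,
    slow_search(N1).

% Run the search for at most one second; on expiry run the fallback goal.
bounded_search(Result) :-
    timeout((slow_search(0), Result = finished), 1, Result = timed_out).
```

Under these assumptions, `?- bounded_search(R).` should return after about a second with R = timed_out.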

8 :- lib(instrument)
Tool for instrumentation of predicate definitions with user-defined predicates
  – Similar concept to AO instrumentation, but aspects are specified as templates rather than through language constructs
      module:foo/n = itemplate with [..]
  – Arity 25! 19 of which define instrumentation points
  – clause, block, subgoal and call, each with *_start, *_end, *_fail, *_redo points
  – fact and inbetween are each instrumented by a single predicate
  – itemplate with [clause_start:(moda:clstart/2), clause_end:(modb:clend/2)]
  – clstart(SiteId, AuxVar) :- …
  – every_module is used as wildcard module qualifier
  – Fields may be specified as inherited from a global template

9 :- lib(instrument)
  – Meta-predicates can have templates specified for their arguments
      findall/3 = itemplate with […, meta_args:[_, ITemplateArg2, _], …]
  – The exclude field prevents instrumentation being applied to calls / subgoals within a specific predicate or by the global template
      instrument_recursive option of instrument/3
  – itemplate with […,result:(mod:iresult/5),…]
      Predicate called during pretty-printing to insert results into HTML
Instrumentation may be enabled and disabled at runtime (facilitates bottleneck search)
  – assert field specifies whether instrumentation is dynamic
  – Predicate calls made via extra level of indirection
  – Body of disabled instrumentation replaced by true
  – compile_term/1 invocation overhead at runtime
So far instrumentation has been passive!
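The idea of switching instrumentation on and off through an extra level of indirection can be sketched with a dynamic predicate whose body is swapped at runtime. This is an illustration only, using assert/1 and ECLiPSe's retract_all/1 rather than the compile_term/1 mechanism the slide mentions; probe/1, enable_probe/0, disable_probe/0 and work/2 are hypothetical names.

```prolog
% probe/1 is a hypothetical instrumentation hook, not part of lib(instrument).
:- dynamic probe/1.

% Instrumentation starts disabled: the probe body is just true.
probe(_Site) :- true.

% Replace the probe body at runtime to enable or disable measurement.
enable_probe :-
    retract_all(probe(_)),
    assert((probe(Site) :- writeln(hit(Site)))).

disable_probe :-
    retract_all(probe(_)),
    assert((probe(_) :- true)).

% User code reaches the measurement code only through the probe.
work(X, Y) :-
    probe(work/2),
    Y is X * 2.
```

After `enable_probe`, each call to work/2 reports its site before computing its result; after `disable_probe`, the call to probe/1 remains in place but does nothing, so the cost of disabled instrumentation is one extra call.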

10 :- lib(instrument)
Tool is also a compile-time code weaver!
  – itemplate with […,code_weaver:(mod:iweaver/6),…]
  – During compilation the iweaver is invoked and passed the block of code undergoing compilation
  – File: file undergoing compilation
  – Code: block of code being processed
  – Type: clause, head, body, fact, variable, conjunction, disjunction, conditional, goal
  – WeavedCode: code processed by iweaver for insertion
  – Mode: compile or print (pretty-printing)
  – Module

11 :- lib(instprofile)
Instrumentation / sampling based statistics profiler
  – The two complementary mechanisms create traces that are currently graphed and analysed offline
  – Current metrics available are those of statistics/2
      To be extended to user defined statistics and IC's ic_stat_get/1
Sampling profiler
  – Low-overhead sampling profiles indicate resource usage trends over time
  – Multiple enabled profiles with different time periods supported
  – Example usage: AndyE's Capacitated Shortest Path
      ?- statsample("MemoryProfile", 5, [global_stack_used, trail_stack_used, gc_number, gc_collected], 'memory.dat').
      ?- statsample_control("MemoryProfile", on).
      ?- go(cut, d_70_7, 'solution.ecl').
      ?- statsample_control("MemoryProfile", off).
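A sampling profiler of this kind could be approximated with ECLiPSe's timed events. The sketch below is an assumption about how such a sampler might look, not the statsample/4 implementation; sample_tick, take_sample/0, start_sampling/1 and stop_sampling/0 are hypothetical names, and the handler simply prints each reading instead of writing a trace file.

```prolog
% take_sample/0 is a hypothetical handler: one statistics/2 reading per tick.
take_sample :-
    statistics(global_stack_used, Used),
    writeln(global_stack_used = Used).

% Attach the handler to a hypothetical event name.
:- set_event_handler(sample_tick, take_sample/0).

% Raise sample_tick every Period seconds until sampling is stopped.
start_sampling(Period) :-
    event_after_every(sample_tick, Period).

stop_sampling :-
    cancel_after_event(sample_tick, _Cancelled).
```

Because the handler only runs once per period, anything that happens between two ticks goes unobserved, which is the coarse-grained behaviour described for sample based profiling.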

12 :- lib(instprofile)

13 :- lib(instprofile)

14 :- lib(instprofile)
Instrumentation based profiler
  – Accurate profiling of code fragments tied to callsite identifier
  – Higher overhead, more intrusive
  – clause, block, subgoal and call instrumentation points
      ?- statprofile('queens_gfc.pl', [global_stack_used, trail_stack_used]).
      ?- my_query(X, Y, Z).
      ?-
  – Delta values for the metrics across the code fragment can be recorded to file (open_delta_file/1, close_delta_file/1, delta_results:on)
  – Aggregate results can be dumped to a trace file using aggregate_result/1

15 :- lib(modeanalyser)
Instrumentation based mode analyser
Suggests mode/1 directives for predicate definitions
  – '++' ground, '+' nonvar, '-' fresh var, '?' unknown
Compiler generates compact and / or faster code
Static (compile-time) analyses are slower and not so capable in a constraint (coroutined) system
Note:
  – An incorrect mode specifier results in potentially incorrect or undefined behaviour
  – For the '-' mode specifier, the analyser cannot detect aliased variables (check manually)
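For readers unfamiliar with mode declarations, the sketch below shows what a suggested mode/1 directive might look like once added to a source file. sum_acc/3 is a hypothetical predicate chosen for illustration and does not come from the presentation.

```prolog
% sum_acc/3 is a hypothetical example predicate.
% Declared modes: ground list, ground accumulator, fresh output variable.
:- mode sum_acc(++, ++, -).

sum_acc([], Sum, Sum).
sum_acc([X|Xs], Acc, Sum) :-
    Acc1 is Acc + X,
    sum_acc(Xs, Acc1, Sum).
```

A call such as `?- sum_acc([1, 2, 3], 0, S).` respects the declaration. Calling it with an already bound or aliased third argument would violate the '-' mode, which is the case the slide asks to be checked manually.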

16 :- lib(modeanalyser)
      [eclipse 1]: mode_analyser:analyse('queens_gfc.pl').
      queens_gfc.pl compiled traceable bytes in 0.10 seconds
      [eclipse 2]: nqueens(8, Qs).
      Qs = [1, 5, 8, 6, 3, 7, 2, 4]
      Yes (0.00s cpu, solution 1, maybe more) ?
      ...
      [eclipse 5]: mode_analyser:result.
      nqueens(++, -)
      noattack(?, ?)
      safe(+)
      noattack(?, +, ++)
mode_analyser:result([verbose:on]) is very useful!

17 :- lib(papi)
PAPI is a specification of a cross-platform interface to hardware performance counters on modern microprocessors
  – Standard set of events for application performance analysis
  – Both high- and low-level sets of routines for accessing counters
  – L instruction and data cache statistics
  – Instruction and cycle counts (loads, stores, FPU, branches, etc.)
  – Microsecond timers
  – Per-process counters (from processor-wide registers)
Example use of high-level interface (L1 data cache)
      papi_start_counters([papi_l1_dch, papi_l1_dca], 2),
      garbage_collect,
      papi_stop_counters([L1DCH, L1DCA], 2)
      papi_read_counters([L1DCH, L1DCA], 2)
      papi_accum_counters([L1DCH, L1DCA], 2)

18 Ongoing Work
Instrumentation of included files and modules
Accurate cost model for instprofile and papi
Reduction in book-keeping overhead
  – Avoid box/unbox of value during aggregation of results
  – Reduce stack usage of fail event trail frames (via merging?)
Is profile strategy for recursive predicates sufficient?
  – Tail-recursion and last call optimisation must be preserved
Dynamic instrumentation engine
  – Enable / disable instrumentation at a specific callsite
  – Drive instrumentation through call graph to locate bottlenecks
Visualisation / graphing support

