1
OODL Runtime Optimizations
Jonathan Bachrach
MIT AI Lab
Feb 2001
2
Runtime Techniques
Assume we can only write system-code turbochargers
  – No sophisticated compiler available
  – Can only minimally perturb user code
3
Q: What are the Biggest Inefficiencies?
Imagine trying to get Proto to run faster
4
Hint: Most Popular Operations
5
Running Example
(dg + ((x ) (y ) => ))
(dm + ((x ) (y ) => ) (%ib (%i+ (%iu x) (%iu y))))
(dm + ((x ) (y ) => ) (%fb (%f+ (%fu x) (%fu y))))
(dm x2 ((x ) => ) (+ x x))
(dm x2 ((x ) => ) (+ x x))
6
A: What are the Biggest Inefficiencies?
Boxing
Method dispatch *
Type checks
Slot access
Object creation
* Today
7
Outline
Overview
Inline call caches
Table
Decision tree
Variations
Open Problems
8
Method Distributions
Distribution can be measured
  – At generic
  – At call site
Distribution can be
  – Monomorphic
  – Polymorphic
  – Megamorphic
Distribution can be
  – Peaked
  – Uniform
9
Expense of Dispatch
Problem: expensive if computed naively
  – Find applicable methods
  – Sort applicable methods
  – Call most applicable method
  – Three outcomes
      One most applicable method => ok
      No applicable methods => not understood error
      Many applicable methods => ambiguous error
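The naive procedure above can be sketched in Python. This is a minimal illustration, not the Proto runtime: the classes `Num`, `Int`, `Flo`, the `METHODS` list, and every function name are hypothetical, methods are stood in for by strings, and the sketch selects the single most specific applicable method directly instead of fully sorting the applicable ones.

```python
class NotUnderstood(Exception):
    """No method is applicable to the arguments."""


class Ambiguous(Exception):
    """No single method is most applicable."""


def is_applicable(signature, args):
    # A method applies when every argument is an instance of its specializer.
    return len(signature) == len(args) and all(
        isinstance(arg, cls) for arg, cls in zip(args, signature))


def at_least_as_specific(sig_a, sig_b):
    # sig_a is at least as specific as sig_b in every argument position.
    return all(issubclass(a, b) for a, b in zip(sig_a, sig_b))


def naive_dispatch(methods, args):
    # Step 1: find applicable methods.
    applicable = [(sig, fn) for sig, fn in methods if is_applicable(sig, args)]
    if not applicable:
        raise NotUnderstood(args)
    # Step 2: keep methods that are at least as specific as every applicable one.
    best = [(sig, fn) for sig, fn in applicable
            if all(at_least_as_specific(sig, other) for other, _ in applicable)]
    if len(best) != 1:
        raise Ambiguous(args)
    # Step 3: return the chosen method (a real runtime would call it).
    return best[0][1]


# Hypothetical stand-ins for the numeric types of the running example.
class Num: pass
class Int(Num): pass
class Flo(Num): pass

METHODS = [((Num, Num), "num+"), ((Int, Int), "int+"), ((Flo, Flo), "flo+")]
```

Every call repeats the applicability scan and the specificity comparison, which is the cost the caching and fast-mapping techniques in the rest of the lecture try to avoid.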
10
Mapping View of Dispatch
Dispatch can be thought of as a mapping from argument types to a method
  – (t1, t2, …, tn) => m
11
Solutions
Caching
Fast mapping
12
Table-based Approach
N-dimensional tables
  – Keys are concrete classes of actual arguments
  – Values are methods to call
  – Must address size explosion
  – Talk a bit about this later
Nested tables
  – Keys are concrete classes of actual arguments
  – Values are either other tables or methods to call
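As a rough sketch of the nested-table variant, the following Python uses one dictionary level per argument, keyed by the concrete class of that argument, and fills entries on demand. The function name `table_lookup` and the `slow_lookup` callback are assumptions made for illustration; a real implementation would store method objects and handle zero-argument calls.

```python
def table_lookup(table, args, slow_lookup):
    """Find the method for args in nested tables, building entries on a miss.

    table       -- nested dicts: one level per argument position
    args        -- the actual arguments (at least one)
    slow_lookup -- hypothetical fallback computing the method the slow way
    """
    classes = tuple(type(arg) for arg in args)
    node = table
    # Interior levels map a concrete class to another table.
    for cls in classes[:-1]:
        node = node.setdefault(cls, {})
    # The last level maps a concrete class to the method itself.
    last = classes[-1]
    if last not in node:
        node[last] = slow_lookup(args)
    return node[last]
```

Each argument costs one hash lookup and one indirection, and the table only grows for class combinations that actually occur at run time.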
13
Table Example One
14
Table Example Two
15
Table Example Three
16
Table-based Critique
Pros
  – Simple
  – Amenable to profile-guided reordering
Cons
  – Too many indirections
  – Very big: build it on demand; share subtables
  – Only works for class types: can use multiple tables
17
Engine Node Dispatch
Glenn Burke and myself at Harlequin, Inc. circa 1996-
  – Partial Dispatch: Optimizing Dynamically-Dispatched Multimethod Calls with Compile-Time Types and Runtime Feedback, 1998
Shared decision tree built out of executable engine nodes
Incrementally grows trees on demand upon miss
Engine nodes are executed to perform some action, typically tail calling another engine node and eventually tail calling the chosen method
Appropriate engine nodes can be utilized to handle monomorphic, polymorphic, and megamorphic discrimination cases, corresponding to single, linear, and table lookup
18
Engine Node Dispatch Picture Define method \+ (x ::, y :: ) … end; Seen (, ) and (, ) as inputs.
19
Engine Dispatch Critique
Pros:
  – Portable
  – Introspectable
  – Code shareable
Cons:
  – Data and code indirections
  – Sharing overhead
  – Hard to inline
  – Fewer partial evaluation opportunities
20
Lookup DAG
Input is argument values
Output is method or error
Lookup DAG is a decision tree with identical subtrees shared to save space
Each interior node has a set of outgoing class-labeled edges and is labeled with an expression
Each leaf node is labeled with a method which is either user specified, not-understood, or ambiguous
21
Lookup DAG Picture From Chambers and Chen OOPSLA-99
22
Lookup DAG Evaluation
Formals start bound to actuals
Evaluation starts from root
To evaluate an interior node
  – Evaluate its expression, yielding v
  – Then search its edges for the unique edge e whose label is the class of the result v; e's target node is then evaluated recursively
To evaluate a leaf node
  – Return its method
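The evaluation rule can be illustrated with a small Python sketch. The `Leaf` and `Interior` classes, the hand-built `ROOT` DAG, and the use of Python's `int` and `float` as argument classes are all assumptions for illustration; each edge here is labeled with a single exact class, and the DAG is assumed to have an edge for every class that can reach a node.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Leaf:
    method: str                      # stands in for a method object


@dataclass
class Interior:
    expr: Callable[[tuple], Any]     # expression evaluated over the actuals
    edges: Dict[type, Any] = field(default_factory=dict)  # class -> node


def evaluate(node, actuals):
    # Follow class-labeled edges from the root until a leaf is reached.
    while isinstance(node, Interior):
        v = node.expr(actuals)
        node = node.edges[type(v)]
    return node.method


# Hand-built DAG for a two-argument "+": the "num+" leaf is shared by two paths.
NUM = Leaf("num+")
ROOT = Interior(
    lambda actuals: actuals[0],
    {
        int: Interior(lambda a: a[1], {int: Leaf("int+"), float: NUM}),
        float: Interior(lambda a: a[1], {int: NUM, float: Leaf("flo+")}),
    },
)
```

The loop replaces the recursion described on the slide; because every step is a tail call, the two formulations visit the same nodes.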
23
Lookup DAG Evaluation Picture From Chambers and Chen OOPSLA-99
24
Lookup DAG Construction
function BuildLookupDag (DF: canonical dispatch function): lookup DAG =
  create empty lookup DAG G
  create empty table Memo
  cs: set of Case := Cases(DF)
  G.root := buildSubDag(cs, Exprs(cs))
  return G

function buildSubDag (cs: set of Case, es: set of Expr): node =
  n: node
  if (cs, es)->n in Memo then return n
  if empty?(es) then
    n := create leaf node in G
    n.method := computeTarget(cs)
  else
    n := create interior node in G
    expr: Expr := pickExpr(es, cs)
    n.expr := expr
    for each class in StaticClasses(expr) do
      cs': set of Case := targetCases(cs, expr, class)
      es': set of Expr := (es - {expr}) ∩ Exprs(cs')
      n': node := buildSubDag(cs', es')
      e: edge := create edge from n to n' in G
      e.class := class
    end for
  add (cs, es)->n to Memo
  return n

function computeTarget (cs: set of Case): Method =
  methods: set of Method := min<=(union of Methods(c) for c in cs)
  if |methods| = 0 then return m-not-understood
  if |methods| > 1 then return m-ambiguous
  return single element m of methods
25
Single Dispatch Binary Search Tree
Label classes with integers using an inorder walk, with the goal of making each class's subclasses form a contiguous range
Implement the Class => Target map as a binary search tree, balanced using execution frequency information
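A simplified Python sketch of the idea follows. It assumes a single-inheritance hierarchy, numbers classes with a depth-first preorder walk (a choice made for this sketch so that each class and its subclasses get consecutive ids), and uses a plain binary search over sorted range starts in place of a frequency-balanced tree. The classes `A` to `D` and all function names are hypothetical.

```python
import bisect


def number_classes(root):
    """Give each class an id so that a class and its subclasses are contiguous."""
    ids, spans = {}, {}

    def walk(cls):
        ids[cls] = start = len(ids)
        for sub in cls.__subclasses__():
            walk(sub)
        spans[cls] = (start, len(ids) - 1)   # id range covering cls and below

    walk(root)
    return ids, spans


def build_ranges(ids, targets):
    """Compress the class-id => method map into sorted range starts."""
    starts, methods = [], []
    for cls in sorted(ids, key=ids.get):
        # The most specific class in the MRO that defines a method wins.
        method = next((targets[c] for c in cls.__mro__ if c in targets), None)
        if not methods or methods[-1] != method:
            starts.append(ids[cls])
            methods.append(method)
    return starts, methods


def lookup(ids, starts, methods, obj):
    # Binary search for the range containing the receiver's class id.
    return methods[bisect.bisect_right(starts, ids[type(obj)]) - 1]


# Hypothetical hierarchy: B and D extend A, C extends B.
class A: pass
class B(A): pass
class C(B): pass
class D(A): pass

IDS, SPANS = number_classes(A)
STARTS, METHODS = build_ranges(IDS, {A: "a-method", B: "b-method"})
```

Because `B` and `C` occupy adjacent ids, the method defined on `B` needs only one range entry even though it serves two classes.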
26
Class Numbering
27
Binary Search Tree Picture From Chambers and Chen OOPSLA-99
28
Critique of Decision Tree
Pros
  – Efficient to construct and execute
  – Can incorporate profile information to bias execution
  – Amenable to on-demand construction
  – Amenable to partial evaluation and method inlining
  – Can easily incorporate static class information
  – Amenable to inlining into call-sites
  – Permits arbitrary predicates
  – Mixes linear, binary, and array lookups
  – Fast on modern CPUs
Cons
  – Requires code gen / compiler to produce best ones
29
Inline Call Caches
Assumption:
  – Method distribution is usually peaked and call-site specific
Each call-site has its own cache
Use call instruction as cache
  – Calls last taken method
  – Method prologue checks for correct arguments
  – Calls slow lookup on miss, which also patches call instruction
Deutsch and Schiffman, 1984
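Python cannot patch a call instruction, so the sketch below only models the behavior: each call site object remembers the last receiver class and method, and a miss runs the slow lookup and overwrites the cached pair. The `CallSite` class and its `slow_lookup` parameter are hypothetical, and the class check is done at the call site here, whereas the slide places it in the method prologue.

```python
class CallSite:
    """Models one call site with a single-entry (monomorphic) cache."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup   # hypothetical: class -> method
        self.cached_class = None
        self.cached_method = None
        self.misses = 0

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Miss: run the slow lookup and "patch" the site with the result.
            self.misses += 1
            self.cached_method = self.slow_lookup(cls)
            self.cached_class = cls
        # Hit path: one class comparison, then a direct call.
        return self.cached_method(receiver, *args)
```

A site that alternates between two receiver classes misses on every call, which is the weakness polymorphic inline caches address.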
30
Inline Caching Example One
31
Inline Caching Two
32
Inline Caching Three
33
Inline Caching Critique
Pros
  – Fast dispatch sequence for hit
  – Usually high hit rate (90-95% for Smalltalk)
Cons
  – Uses self-modifying code
  – Slow for misses
  – Depends on method distribution spike
  – Might be less beneficial for multimethods
34
Polymorphic Inline Caching
Handles polymorphically peaked distribution
Generate call-site specific dispatch stub
Hölzle et al., 1991
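A polymorphic inline cache can be modeled in Python as a per-site list of (class, method) pairs that is searched linearly and extended on a miss. This is only a behavioral sketch: a real system generates a machine-code stub, and the `PolymorphicCallSite` name, the `slow_lookup` parameter, and the `limit` of 4 entries are assumptions made here.

```python
class PolymorphicCallSite:
    """Models a call site whose dispatch stub tests several classes in turn."""

    def __init__(self, slow_lookup, limit=4):
        self.slow_lookup = slow_lookup   # hypothetical: class -> method
        self.limit = limit               # assumed cap on stub entries
        self.stub = []                   # (class, method) pairs, tested in order
        self.misses = 0

    def call(self, receiver, *args):
        cls = type(receiver)
        for cached_class, method in self.stub:
            if cached_class is cls:
                return method(receiver, *args)
        # Miss: look the method up and extend the stub while there is room.
        self.misses += 1
        method = self.slow_lookup(cls)
        if len(self.stub) < self.limit:
            self.stub.append((cls, method))
        return method(receiver, *args)
```

Once the stub is full, unseen classes fall through to the slow lookup on every call, which matches the critique that the scheme is slow for uniform distributions.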
35
Polymorphic Inline Caching Example One
36
Polymorphic Inline Caching Example Two
37
Polymorphic Inline Caching Example Three
38
Polymorphic Inline Cache Critique
Pros
  – Faster for multiple peaked distributions
Cons
  – Slow for uniform distribution
  – Requires runtime code generation
  – Doesn’t scale quite as well for multimethods and predicate types
39
Other Multimethod Approaches
Hash table indexed by N keys
  – Kiczales and Rodriguez 1989
Compressed N+1 dimensional dispatch table
  – Amiel et al. 1994
  – Pang et al. 1999
40
Variations
Inline method bodies into leaves of decision tree
Reorder decision tree based on method distributions
Fold slot access into dispatch
41
Open Problems
Feed static information into dynamic dispatch
Smaller
Faster
More adaptive
42
Readings
Deutsch and Schiffman 1984
Kiczales and Rodriguez 1989
Dussud 1989
Moon and Cypher 19??
Amiel et al. 1994
Pang et al. 1999
Hölzle and Ungar 1994
Chen and Turau 1994
Peter Lee, Advanced Language Implementation, 1991
43
Acknowledgements This lecture includes some material from Craig Chambers’ OOPSLA course on OO language implementation.
44
Assignment 3 Hint
Create methods with the following construction:
(dm make-method ((n ) (types ) (body ) => )
  (select n
    ((0) (fun () ...))
    ((1) (fun ((a0 (elt types 0))) ...))
    ((2) (fun ((a0 (elt types 0)) (a1 (elt types 1))) ...))
    ...))
45
Assignment 4
Write an associative dispatch cache
Use linear lookup
Include profile-guided reordering
Don’t need to handle singleton dispatch