
Chameleon: Automatic Selection of Collections. Ohad Shacham, Martin Vechev, Eran Yahav. Tel Aviv University; IBM T.J. Watson Research Center. Presented by: Yingyi Bu


1 Chameleon: Automatic Selection of Collections
Ohad Shacham, Martin Vechev, Eran Yahav
Tel Aviv University; IBM T.J. Watson Research Center
Presented by: Yingyi Bu

2 Collections
Collections are abstract data types with many implementations.
The implementations make different space/time tradeoffs.
An ill-suited choice of implementation can lead to runtime degradation and to space bloat (wasted space).
Implementations shown on the slide:
- Set: ArraySet, HashSet, LinkedSet, LazySet
- Map: ArrayMap, HashMap, LinkedMap, LazyMap
- List: ArrayList, LinkedList, LazyList

3 Collection Bloat
Collection bloat is unjustified space overhead for storing data in collections.
Example: List s = new ArrayList(); s.add(1);
s holds one element in a backing array of 10 slots, so the bloat for s is 9.

4 Collection Bloat
Collection bloat is a serious problem in practice: it has been observed to occupy 90% of the heap in real-world applications.
It is hard to detect and fix:
- Accumulation: death by a thousand cuts
- Correction: the bloat must be correlated with program code
How do we pick the right implementation? Minimize bloat, but without degrading running time.

5 Our Vision
The programmer declares the ADT to be used: Set s = new Set();
The programmer defines which metric to optimize, e.g. space-time.
The runtime automatically selects the implementation based on the metric:
- Online: detect how the application uses the Set
- Online: select an appropriate implementation of Set (ArraySet, HashSet, LinkedSet, …)

6 This Work
The programmer chooses the implementation to be used: Set s = new HashSet();
The programmer defines which metric to optimize: the space-time product, where space = bloat.
The runtime suggests an implementation based on the metric:
- Online: automatically detect how the application uses HashSet()
- Online: automatically suggest an alternative to HashSet()
- Offline: the programmer modifies the program accordingly, e.g. Set s = new ArraySet();
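To make the suggested replacement concrete, here is a minimal sketch of what an ArraySet might look like. ArraySet is not part of java.util; the class below is hypothetical, and so are its initial capacity of 4 and its doubling growth policy. It keeps elements in one compact array (no per-entry node objects, unlike HashSet) at the price of linear-time lookups, which is the kind of space/time tradeoff the suggestions exploit.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical ArraySet (not in java.util): one compact array, no per-entry
// nodes, linear-time contains/add/remove. Intended for small sets.
class ArraySet<E> extends AbstractSet<E> {
    private Object[] elems = new Object[4]; // assumed initial capacity
    private int size = 0;

    private int indexOf(Object o) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(elems[i], o)) return i;
        }
        return -1;
    }

    // Fill slot i with the last element; element order is not preserved.
    private void removeAt(int i) {
        elems[i] = elems[--size];
        elems[size] = null;
    }

    @Override
    public boolean contains(Object o) { return indexOf(o) >= 0; }

    @Override
    public boolean add(E e) {
        if (indexOf(e) >= 0) return false;
        if (size == elems.length) elems = Arrays.copyOf(elems, size * 2);
        elems[size++] = e;
        return true;
    }

    @Override
    public boolean remove(Object o) {
        int i = indexOf(o);
        if (i < 0) return false;
        removeAt(i);
        return true;
    }

    @Override
    public int size() { return size; }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int cursor = 0; // index of the next element to return
            private int last = -1;  // index returned by the latest next()

            @Override
            public boolean hasNext() { return cursor < size; }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (cursor >= size) throw new NoSuchElementException();
                last = cursor++;
                return (E) elems[last];
            }

            @Override
            public void remove() {
                if (last < 0) throw new IllegalStateException();
                removeAt(last);
                cursor = last; // the moved element now sits at 'last'; visit it next
                last = -1;
            }
        };
    }
}
```

Because the program declares s with the Set interface, the offline fix only touches the allocation site: Set<Integer> s = new ArraySet<>();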

7 How Can We Calculate Bloat?
Data structure bloat = Occupied Data - Used Data
Example: List s = new ArrayList(); s.add(1); the bloat for s is 9.
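The formula can be illustrated with a toy list that, like java.util.ArrayList, reserves 10 slots on its first insertion and then grows by half its capacity. SlotList is a hypothetical class, and it measures bloat in array slots as the slide does; a real collector would account in bytes, including object headers.

```java
import java.util.Arrays;

// Hypothetical growable list mirroring ArrayList's default sizing:
// 10 slots on the first add, then capacity grows by 50%.
class SlotList<E> {
    private Object[] elementData = new Object[0];
    private int size = 0;

    void add(E e) {
        if (size == elementData.length) {
            int newCap = Math.max(10, elementData.length + (elementData.length >> 1));
            elementData = Arrays.copyOf(elementData, newCap);
        }
        elementData[size++] = e;
    }

    int occupiedSlots() { return elementData.length; } // Occupied Data
    int usedSlots() { return size; }                    // Used Data

    // Bloat = Occupied Data - Used Data, in slots.
    int bloat() { return occupiedSlots() - usedSlots(); }
}
```

After one add the list occupies 10 slots and uses 1, giving the slide's bloat of 9; after 11 adds it occupies 15 slots, so the bloat drops to 4.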

8 How to Detect Collection Bloat?
Each collection maintains a field recording its used data, and the language runtime can determine the data actually occupied.
Bloat = Occupied Data - Used Data
Solution: the garbage collector computes bloat online
- It reads the used-data fields from the collections
- Low overhead: it can run online in production

9 Semantic Maps: How Collections Communicate Information to the GC
A semantic map records a collection class's size field and pointers to its actual data fields.
This makes supporting custom collections trivial.
(Figure: the GC reads used data and occupied data through each class's semantic map, e.g. ArrayList: int size (used), Object[] Array (occupied); HashMap: elementCount (used), elementData (occupied).)
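A rough sketch of how a collector might use semantic maps: each registered class names its used-data field and its occupied-data array, and a scan sums the difference over the objects it visits. SemanticMap, BloatScanner and MyBag are hypothetical; Chameleon does this inside IBM's GC, not through Java reflection, and the list passed to totalBloat only stands in for the heap. The sketch registers a custom collection rather than java.util.ArrayList because, since JDK 9, module encapsulation blocks reflective access to JDK-internal fields by default.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical semantic map: which field holds the used size and which
// array holds the occupied storage for one collection class.
final class SemanticMap {
    final String sizeField; // used data
    final String dataField; // occupied data (an Object[] field)

    SemanticMap(String sizeField, String dataField) {
        this.sizeField = sizeField;
        this.dataField = dataField;
    }
}

// Stand-in for the collector: looks up each object's semantic map and
// computes occupied - used, in array slots.
class BloatScanner {
    private final Map<Class<?>, SemanticMap> maps = new HashMap<>();

    void register(Class<?> cls, SemanticMap map) { maps.put(cls, map); }

    int bloatOf(Object obj) {
        SemanticMap m = maps.get(obj.getClass());
        if (m == null) return 0; // not a registered collection
        try {
            Field sizeF = obj.getClass().getDeclaredField(m.sizeField);
            Field dataF = obj.getClass().getDeclaredField(m.dataField);
            sizeF.setAccessible(true);
            dataF.setAccessible(true);
            int used = sizeF.getInt(obj);
            Object[] data = (Object[]) dataF.get(obj);
            int occupied = (data == null) ? 0 : data.length;
            return occupied - used;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("bad semantic map for " + obj.getClass(), e);
        }
    }

    // The "heap" here is just the list of objects the caller passes in.
    int totalBloat(List<?> heap) {
        int total = 0;
        for (Object o : heap) total += bloatOf(o);
        return total;
    }
}

// A custom collection (fixed capacity for brevity): supporting it
// only takes one register(...) call with its semantic map.
class MyBag {
    Object[] items = new Object[16];
    int count = 0;

    void add(Object o) { items[count++] = o; }
}
```

Registering MyBag with new SemanticMap("count", "items") is all the scanner needs; objects without a semantic map contribute no bloat.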

10 Example: Collections Bloat in TVLA


12 Example: Collections Bloat in TVLA (chart: lower bound for bloat)

13 Fixing Bloat
All bloat statistics must be correlated with program points.
This requires trace information.
Remember: we do not want to degrade running time.

14 Correlating Code and Bloat

    public final class ConcreteKAryPredicate extends ConcretePredicate {
        …
        public void modify() {
            …
            values = HashMapFactory.make(this.values);
        }
        …
    }

    public class GenericBlur extends Blur {
        …
        public void blur(TVS structure) {
            …
            Map invCanonicName = HashMapFactory.make(structure.nodes().size());
            …
        }
    }

    public class HashMapFactory {
        public static Map make(int size) {
            return new HashMap(size);
        }
    }

Aggregate bloat potential per allocation context (computed by the garbage collector):
Ctx1 40%, Ctx2 11%, Ctx3 5%, Ctx4 7%, Ctx5 5%, Ctx6 3%, Ctx7 7%, Ctx8 3%
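The per-context aggregation can be sketched as follows. ContextProfiler is hypothetical: it approximates an allocation context by the top DEPTH frames of the allocating call stack (DEPTH = 2 is an assumed value) and reports each context's share of total bloat. The slide does not say what its percentages are relative to, so "share of total bloat" is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: aggregate bloat per allocation context.
class ContextProfiler {
    static final int DEPTH = 2; // assumed number of frames per context

    private final Map<String, Long> bloatPerContext = new LinkedHashMap<>();

    // Builds a key like "pkg.Cls.method:line;pkg.Other.m:line" from the
    // caller's stack. Note: the JVM may omit frames from stack traces.
    static String currentContext() {
        StackTraceElement[] st = new Throwable().getStackTrace();
        StringBuilder sb = new StringBuilder();
        // st[0] is currentContext itself; start from its caller.
        for (int i = 1; i < st.length && i <= DEPTH; i++) {
            if (sb.length() > 0) sb.append(';');
            sb.append(st[i].getClassName()).append('.')
              .append(st[i].getMethodName()).append(':')
              .append(st[i].getLineNumber());
        }
        return sb.toString();
    }

    // Called once per live collection during a (simulated) heap scan.
    void record(String context, long occupied, long used) {
        bloatPerContext.merge(context, occupied - used, Long::sum);
    }

    // Percentage of total bloat attributable to each context.
    Map<String, Double> shares() {
        long total = 0;
        for (long b : bloatPerContext.values()) total += b;
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : bloatPerContext.entrySet()) {
            out.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return out;
    }
}
```

A factory such as HashMapFactory.make could call currentContext() at allocation time and keep the key alongside the new map, so that a later scan can attribute each map's bloat to that key.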

15 Trace Information
Track collection usage in the library:
- Distribution of operations
- Distribution of size
Aggregated per allocation context, e.g.:
ctx1: Size = 7, Get = 3, Add = 9, …
ctx2: Size = 1, Contains = 100, Insert = 1, …
ctx3: Size = 103, Contains = 10041, Insert = 140, Remove = 20, …
ctxi: …
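Trace statistics of this kind could be gathered by a thin wrapper that counts operations and samples the collection's size after each one. TracedMap and TraceStats are hypothetical names, only four operations are tracked, and the statistics are keyed by a context string the caller supplies (in practice it would identify the allocation site).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-context trace statistics: operation counts plus a size summary.
class TraceStats {
    enum Op { GET, PUT, CONTAINS, REMOVE }

    private final Map<Op, Long> ops = new EnumMap<>(Op.class);
    int maxSize = 0;
    private long sizeSum = 0;
    private long samples = 0;

    void onOp(Op op, int sizeAfter) {
        ops.merge(op, 1L, Long::sum);
        maxSize = Math.max(maxSize, sizeAfter);
        sizeSum += sizeAfter;
        samples++;
    }

    long count(Op op) { return ops.getOrDefault(op, 0L); }

    double avgSize() { return samples == 0 ? 0.0 : (double) sizeSum / samples; }
}

// Minimal traced map; all maps created for the same context share one TraceStats.
class TracedMap<K, V> {
    private static final Map<String, TraceStats> BY_CONTEXT = new HashMap<>();

    private final Map<K, V> inner = new HashMap<>();
    private final TraceStats stats;

    TracedMap(String context) {
        stats = BY_CONTEXT.computeIfAbsent(context, c -> new TraceStats());
    }

    static TraceStats statsFor(String context) { return BY_CONTEXT.get(context); }

    V get(K k) {
        V v = inner.get(k);
        stats.onOp(TraceStats.Op.GET, inner.size());
        return v;
    }

    V put(K k, V v) {
        V old = inner.put(k, v);
        stats.onOp(TraceStats.Op.PUT, inner.size());
        return old;
    }

    boolean containsKey(K k) {
        boolean r = inner.containsKey(k);
        stats.onOp(TraceStats.Op.CONTAINS, inner.size());
        return r;
    }

    V remove(K k) {
        V v = inner.remove(k);
        stats.onOp(TraceStats.Op.REMOVE, inner.size());
        return v;
    }
}
```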

16 But How to Choose the New Collection?
Rule engine with user-defined rules:
- Input: heap and trace statistics per context
- Output: suggested collection for that context
Rules are based on trace and heap information, e.g.:
HashMap: #contains < X ∧ CollmaxSize < Y → ArrayMap
HashMap: #contains Z → ArrayMap
HashMap: maxSize < X → ArrayMap
LinkedList: NoListOp → ArrayList
HashMap: (#contains Z) → ArrayMap
…
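A rule engine of this shape can be sketched as an ordered list of predicates over per-context summaries. All class names are hypothetical, "first matching rule wins" is an assumed policy, and any threshold values are placeholders, since the slide leaves X, Y and Z unspecified. The slide also does not define NoListOp; the sketch assumes it means that no operations favoring a linked list were observed, recorded in a hypothetical listOps count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical per-context summary combining heap and trace statistics.
final class ContextSummary {
    final String impl;   // current implementation, e.g. "HashMap"
    final long contains; // number of contains operations observed
    final int maxSize;   // largest size observed
    final long listOps;  // assumed: operations that favor a linked list

    ContextSummary(String impl, long contains, int maxSize, long listOps) {
        this.impl = impl;
        this.contains = contains;
        this.maxSize = maxSize;
        this.listOps = listOps;
    }
}

// One rule in the slide's "Impl: condition -> Replacement" style.
final class Rule {
    private final String impl;
    private final Predicate<ContextSummary> condition;
    private final String replacement;

    Rule(String impl, Predicate<ContextSummary> condition, String replacement) {
        this.impl = impl;
        this.condition = condition;
        this.replacement = replacement;
    }

    Optional<String> apply(ContextSummary s) {
        return s.impl.equals(impl) && condition.test(s)
                ? Optional.of(replacement) : Optional.empty();
    }
}

final class RuleEngine {
    private final List<Rule> rules = new ArrayList<>();

    RuleEngine add(Rule r) {
        rules.add(r);
        return this;
    }

    // First matching rule wins; empty means "keep the current implementation".
    Optional<String> suggest(ContextSummary s) {
        for (Rule r : rules) {
            Optional<String> out = r.apply(s);
            if (out.isPresent()) return out;
        }
        return Optional.empty();
    }
}
```

For chosen constants X and Y, the first rule on the slide would read new Rule("HashMap", s -> s.contains < X && s.maxSize < Y, "ArrayMap").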

17 Overall Picture
(Diagram components: Program, Semantic maps, Semantic Profiler, per-context statistics (e.g. ctx1: Size = 7, Get = 3, Add = 9, …; ctx2: Size = 1, Contains = 100, Insert = 1, …), Rules (e.g. HashMap: maxSize < X → ArrayMap; LinkedList: NoListOp → ArrayList), Rule Engine, Recommendations, Potential report.)

18 Correcting Collection Bloat: Typical Usage
Step 1: Profile for bloat without context (automatic)
- Low overhead; can run in production
- If a problem is detected, go to step 2
Step 2: Combine heap information with trace information per context (automatic)
- Can switch automatically from step 1 to step 2
- Higher overhead than step 1
- Prior to Chameleon, this was a manual (and very hard) step
Step 3: Suggest fixes to the user based on rules (automatic)
Step 4: The programmer applies the suggested fixes (manual)
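The automatic switch from step 1 to step 2 could be driven by the bloat totals the collector already computes. ProfilingController and its bloat-ratio trigger are assumptions for illustration; the slides do not say which criterion Chameleon uses to decide that a problem was detected.

```java
// Hypothetical controller for the two automatic profiling phases.
class ProfilingController {
    enum Phase { HEAP_ONLY, HEAP_AND_TRACE } // step 1, step 2

    private final double threshold; // assumed trigger: bloat / occupied
    private Phase phase = Phase.HEAP_ONLY;

    ProfilingController(double threshold) { this.threshold = threshold; }

    // Called after each GC cycle with the totals the collector computed.
    // Once step 2 is enabled it stays enabled.
    Phase afterGc(long occupied, long used) {
        if (phase == Phase.HEAP_ONLY && occupied > 0) {
            double ratio = (double) (occupied - used) / occupied;
            if (ratio > threshold) phase = Phase.HEAP_AND_TRACE;
        }
        return phase;
    }
}
```

Keeping step 2 switched on once triggered reflects its purpose: it gathers the per-context traces that step 3 needs, at a higher but temporary cost.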

19 Chameleon on TVLA
Sample recommendations:
1: HashMap:tvla...HashMapFactory:31; tvla.core.base.BaseTVS:50 replace with ArrayMap
…
4: ArrayList:BaseHashTVSSet:112; tvla...base.BaseHashTVSSet:60 set initial capacity
Statistics (column headers as shown on the slide: Potential, Operations, Size):
Max 15 26 7 7
Avg 11.33 6.31 4.8 4.8
Stddev 1.36 5.05 1.17 1.17

20 Implementation
Built on top of IBM's JVM:
- Modifications to the parallel mark-and-sweep GC; the changes are modular and readily applicable to other GCs
- Modifications to the collection libraries
Runtime overhead:
- Detection phase: negligible
- Correction phase: ~2x, due to the cost of obtaining the calling context; this could be reduced using probabilistic calling context (PCC) by Bond & McKinley
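For reference, PCC maintains a single integer value V and updates it at each call site cs with f(V, cs) = 3V + cs, so that distinct calling contexts map to distinct values with high probability. The sketch below only folds that function over an explicit call chain to show the arithmetic; real PCC has the compiler insert the update at call sites, and the initial value 0 and the call-site IDs used here are assumptions.

```java
// Sketch of the PCC value computation f(V, cs) = 3 * V + cs.
// Java int arithmetic wraps around, i.e. it is computed modulo 2^32.
final class Pcc {
    private Pcc() {}

    // One PCC step: combine the caller's context value with a call-site ID.
    static int update(int v, int callSite) {
        return 3 * v + callSite;
    }

    // Context value of a call chain listed outermost call site first,
    // starting from an assumed initial value of 0.
    static int contextValue(int... callSites) {
        int v = 0;
        for (int cs : callSites) {
            v = update(v, cs);
        }
        return v;
    }
}
```

The function is order-sensitive: contextValue(1, 2, 3) yields 18 while contextValue(3, 2, 1) yields 34, which is what lets a single integer stand in for a full stack walk when attributing allocations to contexts.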

21 Experimental Results – Memory

22 Experimental Results – Time

23 Related Work
Large volume of work on SETL:
- Automatic data structure selection in SETL [Schonberg et al., POPL'79]
- SETL representation sublanguage [Dewar et al., TOPLAS'79]
- …
Bloat:
- The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]

24 Summary
Collection selection is a real problem: a poor choice causes runtime penalties and bloat.
Chameleon integrates trace and heap information to choose a collection implementation based on predefined rules.
Using Chameleon, we reduced the footprint of several applications, never degrading running time and often improving it.
This is a first step towards automatic collection selection as part of the runtime system.

