Chameleon: Automatic Selection of Collections. Ohad Shacham, Martin Vechev, Eran Yahav (Tel Aviv University; IBM T.J. Watson Research Center). Presented by: Yingyi Bu.


Chameleon: Automatic Selection of Collections
Ohad Shacham, Martin Vechev, Eran Yahav
Tel Aviv University; IBM T.J. Watson Research Center
Presented by: Yingyi Bu

Collections
Abstract data types
Many implementations
Different space/time tradeoffs
A poorly matched selection might lead to:
  runtime degradation
  space bloat (wasted space)
Implementations shown per abstract type:
  Set: ArraySet, HashSet, LinkedSet, LazySet
  Map: ArrayMap, HashMap, LinkedMap, LazyMap
  List: ArrayList, LinkedList, LazyList

Collection Bloat
Collection bloat is unjustified space overhead for storing data in collections.
Example:
  List s = new ArrayList();
  s.add(1);
Bloat for s is 9 (assuming the default backing array of 10 slots, only one of which holds an element).

Collection Bloat
Collection bloat is a serious problem in practice
  Observed to occupy 90% of the heap in real-world applications
Hard to detect and fix
  Accumulation: death by a thousand cuts
  Correction: need to correlate bloat to program code
How to pick the right implementation?
  Minimize bloat
  But without degrading running time

Our Vision
Programmer declares the ADT to be used
  Set s = new Set();
Programmer defines what metric to optimize
  e.g. space-time
Runtime automatically selects implementation based on metric
  Online: detect application usage of Set
  Online: select appropriate implementation of Set (ArraySet, HashSet, LinkedSet, …)

This Work
Programmer defines the implementation to be used
  Set s = new HashSet();
Programmer defines what metric to optimize
  space-time product, where Space = Bloat
Runtime suggests implementation based on metric
  Online: automatically detect application usage of HashSet()
  Online: automatically suggest alternative to HashSet()
  Offline: programmer modifies program accordingly, e.g. Set s = new ArraySet();

How Can We Calculate Bloat?
Data structure bloat = Occupied Data - Used Data
Example:
  List s = new ArrayList();
  s.add(1);
Bloat for s is 9
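The arithmetic on this slide can be sketched with a toy array-backed list. The class name ToyArrayList and the fixed starting capacity of 10 are assumptions made for illustration; the slide itself only states that bloat is occupied data minus used data and that the example yields 9.

```java
import java.util.Arrays;

// Toy array-backed list (hypothetical, not the JDK class). The starting
// capacity of 10 is an assumption chosen to match the slide's result.
final class ToyArrayList {
    private Object[] elements = new Object[10]; // occupied slots
    private int size = 0;                        // used slots

    void add(Object value) {
        if (size == elements.length) {
            // Grow by doubling when the backing array is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    int usedSlots() { return size; }

    int occupiedSlots() { return elements.length; }

    // Bloat = occupied data - used data, counted in slots here.
    int bloat() { return occupiedSlots() - usedSlots(); }
}

class BloatDemo {
    public static void main(String[] args) {
        ToyArrayList s = new ToyArrayList();
        s.add(1);
        System.out.println("Bloat for s is " + s.bloat());
    }
}
```

Counting in slots rather than bytes keeps the sketch simple; a real measurement would weigh each slot by the reference size and add object headers.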

How to Detect Collection Bloat?
Each collection maintains a field for used data
Language runtime can find out actually occupied data
Bloat = Occupied Data - Used Data
Solution: Garbage Collector Computes Bloat Online
  Reads used data fields from collections
  Low overhead: can work online in production

Semantic Maps: How Collections Communicate Information to GC
Includes size and pointers to actual data fields
Allows for trivial support of custom collections
Figure (as far as recoverable): each collection class has a semantic map that tells the GC where to find used data and occupied data.
  ArrayList semantic map: fields "int size" and "Object[] Array"
  HashMap semantic map: fields "elementCount" and "elementData"
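One way to picture a semantic map is as a small descriptor naming the field that holds the element count and the field that holds the backing storage. The sketch below is an assumption-laden illustration in plain Java using reflection; the names SemanticMap and IntBag are hypothetical, and the actual system reads these fields inside the garbage collector rather than through reflection.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical descriptor: names the field with the element count (used data)
// and the field with the backing array (occupied data).
final class SemanticMap {
    private final String usedField;
    private final String dataField;

    SemanticMap(String usedField, String dataField) {
        this.usedField = usedField;
        this.dataField = dataField;
    }

    private static Object read(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(target); // primitive ints come back boxed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("map does not match " + target.getClass(), e);
        }
    }

    int used(Object collection) {
        return ((Number) read(collection, usedField)).intValue();
    }

    int occupied(Object collection) {
        return Array.getLength(read(collection, dataField));
    }

    int bloat(Object collection) {
        return occupied(collection) - used(collection);
    }
}

// A custom collection; supporting it only requires a new SemanticMap instance.
final class IntBag {
    private int count;
    private int[] slots = new int[8];

    void add(int v) {
        if (count == slots.length) {
            slots = Arrays.copyOf(slots, count * 2);
        }
        slots[count++] = v;
    }
}

class SemanticMapDemo {
    public static void main(String[] args) {
        SemanticMap map = new SemanticMap("count", "slots");
        IntBag bag = new IntBag();
        bag.add(5);
        System.out.println("used=" + map.used(bag) + " occupied=" + map.occupied(bag));
    }
}
```

Keeping the field names outside the collector is what makes custom collections cheap to support: the collector logic stays the same and only the descriptor changes.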

Example: Collections Bloat in TVLA

Example: Collections Bloat in TVLA
(Figure annotation: lower bound for bloat)

Fixing Bloat
Must correlate all bloat stats to program point
Need trace information
Remember: do not want to degrade time

Correlating Code and Bloat

  public final class ConcreteKAryPredicate extends ConcretePredicate {
    …
    public void modify() {
      …
      values = HashMapFactory.make(this.values);
    }
    …
  }

  public class GenericBlur extends Blur {
    …
    public void blur(TVS structure) {
      …
      Map invCanonicName = HashMapFactory.make(structure.nodes().size());
      …
    }
  }

  public class HashMapFactory {
    public static Map make(int size) {
      return new HashMap(size);
    }
  }

Bloat share per allocation context: Ctx1 40%, Ctx2 11%, Ctx3 5%, Ctx4 7%, Ctx5 5%, Ctx6 3%, Ctx7 7%, Ctx8 3%
Aggregate bloat potential per allocation context
Done by the garbage collector
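The per-context aggregation can be sketched as a running sum of bloat keyed by allocation context, reported as a share of the total. The class BloatByContext and its method names are hypothetical, and the numbers in the demo are made up for illustration; in the slides the garbage collector performs this aggregation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator: sums bloat per allocation context and reports
// each context's share of the total bloat.
final class BloatByContext {
    private final Map<String, Long> bloat = new HashMap<>();
    private long total = 0;

    // Record one collection instance found during a heap traversal.
    void add(String context, long occupied, long used) {
        long b = occupied - used;
        bloat.merge(context, b, Long::sum);
        total += b;
    }

    double sharePercent(String context) {
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * bloat.getOrDefault(context, 0L) / total;
    }
}

class BloatByContextDemo {
    public static void main(String[] args) {
        BloatByContext agg = new BloatByContext();
        agg.add("Ctx1", 50, 10); // illustrative numbers, not from the slides
        agg.add("Ctx2", 12, 2);
        System.out.println("Ctx1 share: " + agg.sharePercent("Ctx1") + "%");
    }
}
```

Because HashMapFactory.make is called from many places, the context key would need to include the caller (for example the call site in GenericBlur.blur) rather than just the allocation site inside the factory.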

Trace Information
Track collection usage in library:
  Distribution of operations
  Distribution of size
  Aggregated per allocation context
Example per-context statistics:
  ctx1: Size = 7, Get = 3, Add = 9, …
  ctx2: Size = 1, Contains = 100, Insert = 1, …
  ctx3: Size = 103, Contains = [value missing in source], Insert = 140, Remove = 20, …
  ctxi: …
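A minimal sketch of such per-context bookkeeping follows. The names TraceProfiler and ContextStats, and the choice to identify contexts and operations by strings, are assumptions for illustration; the slides only say that operation and size distributions are aggregated per allocation context inside the collection library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-context record: operation counts and the largest size seen.
final class ContextStats {
    private final Map<String, Long> opCounts = new HashMap<>();
    private int maxSize = 0;

    void recordOp(String op, int sizeAfter) {
        opCounts.merge(op, 1L, Long::sum);
        maxSize = Math.max(maxSize, sizeAfter);
    }

    long count(String op) {
        return opCounts.getOrDefault(op, 0L);
    }

    int maxSize() {
        return maxSize;
    }
}

// Hypothetical profiler that a collection library would call on each operation.
final class TraceProfiler {
    private final Map<String, ContextStats> byContext = new HashMap<>();

    void record(String allocationContext, String op, int sizeAfter) {
        byContext.computeIfAbsent(allocationContext, k -> new ContextStats())
                 .recordOp(op, sizeAfter);
    }

    ContextStats stats(String allocationContext) {
        return byContext.getOrDefault(allocationContext, new ContextStats());
    }
}

class TraceDemo {
    public static void main(String[] args) {
        TraceProfiler profiler = new TraceProfiler();
        profiler.record("ctx2", "insert", 1);
        for (int i = 0; i < 100; i++) {
            profiler.record("ctx2", "contains", 1);
        }
        ContextStats s = profiler.stats("ctx2");
        System.out.println("Size = " + s.maxSize()
                + " Contains = " + s.count("contains")
                + " Insert = " + s.count("insert"));
    }
}
```

The demo reproduces the ctx2 row from the slide (one insert, one hundred lookups, maximum size 1), the kind of profile where a hash-based map is likely oversized.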

But How to Choose the New Collection?
Rule Engine: user-defined rules
  Input: heap and trace statistics per context
  Output: suggested collection for that context
Rules based on trace and heap information:
  HashMap: #contains < X ∧ maxSize < Y → ArrayMap
  HashMap: #contains [comparison lost in extraction] Z → ArrayMap
  HashMap: maxSize < X → ArrayMap
  LinkedList: NoListOp → ArrayList
  …
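The rule format above can be sketched as a list of (source type, condition, suggestion) triples evaluated against per-context statistics. Everything named here (RuleEngine, ContextProfile, the threshold values 100 and 16 standing in for X and Y) is a hypothetical choice for illustration; the slides do not give concrete thresholds or an API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical per-context summary fed to the rules.
final class ContextProfile {
    final String currentType;
    final int maxSize;
    final long containsCount;
    final long listOpCount;

    ContextProfile(String currentType, int maxSize, long containsCount, long listOpCount) {
        this.currentType = currentType;
        this.maxSize = maxSize;
        this.containsCount = containsCount;
        this.listOpCount = listOpCount;
    }
}

final class Rule {
    final String sourceType;
    final Predicate<ContextProfile> condition;
    final String suggestion;

    Rule(String sourceType, Predicate<ContextProfile> condition, String suggestion) {
        this.sourceType = sourceType;
        this.condition = condition;
        this.suggestion = suggestion;
    }
}

final class RuleEngine {
    private final List<Rule> rules = new ArrayList<>();

    RuleEngine add(String sourceType, Predicate<ContextProfile> condition, String suggestion) {
        rules.add(new Rule(sourceType, condition, suggestion));
        return this;
    }

    // The first matching rule wins; no match means "keep the current collection".
    Optional<String> suggest(ContextProfile p) {
        for (Rule r : rules) {
            if (r.sourceType.equals(p.currentType) && r.condition.test(p)) {
                return Optional.of(r.suggestion);
            }
        }
        return Optional.empty();
    }
}

class RuleEngineDemo {
    public static void main(String[] args) {
        RuleEngine engine = new RuleEngine()
            // Thresholds are placeholders for the slide's X and Y.
            .add("HashMap", p -> p.containsCount < 100 && p.maxSize < 16, "ArrayMap")
            .add("LinkedList", p -> p.listOpCount == 0, "ArrayList");
        ContextProfile ctx1 = new ContextProfile("HashMap", 7, 3, 0);
        System.out.println(engine.suggest(ctx1).orElse("keep HashMap"));
    }
}
```

Keeping rules as data rather than hard-coded branches matches the slide's point that the rules are user-defined and can be tuned per application.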

Overall Picture
(Figure components: Program, Semantic maps, Semantic Profiler, per-context statistics such as "ctx1: Size = 7, Get = 3, Add = 9", Rules such as "HashMap: maxSize < X → ArrayMap" and "LinkedList: NoListOp → ArrayList", Rule Engine, Recommendations, Potential report)

Correcting Collection Bloat: Typical Usage
Step 1 (automatic): Profile for bloat without context
  Low overhead, can run in production
  If a problem is detected, go to step 2
Step 2 (automatic): Combine heap information with trace information per context
  Can switch automatically to step 2 from step 1
  Higher overhead than step 1
  Prior to Chameleon this was a manual (very hard) step
Step 3 (automatic): Suggest fixes to the user based on rules
Step 4 (manual): Programmer applies suggested fixes

Chameleon on TVLA
Sample recommendations:
  1: HashMap:tvla...HashMapFactory:31 ;tvla.core.base.BaseTVS:50 replace with ArrayMap
  …
  4: ArrayList:BaseHashTVSSet:112; tvla...base.BaseHashTVSSet:60 set initial capacity
(Figure: per-recommendation report with columns Potential, Operations, and Size with Max, Avg, Stddev)

Implementation
Built on top of IBM's JVM
Modifications to Parallel Mark and Sweep GC
  Modular changes, readily applicable to other GCs
Modifications to collection libraries
Runtime overhead
  Detection phase: negligible
  Correction phase: ~2x (due to cost of getting context)
  Can use PCC by Bond & McKinley

Experimental Results – Memory

Experimental Results – Time

Related Work
Large volume of work on SETL
  Automatic data structure selection in SETL [Schonberg et al., POPL'79]
  SETL representation sublanguage [Dewar et al., TOPLAS'79]
  …
Bloat
  The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]

Summary
Collection selection is a real problem
  Runtime penalty
  Bloat
Chameleon integrates trace and heap information for choosing a collection implementation based on predefined rules
Using Chameleon, the footprint of several applications was reduced
  Never degrading running time, often improving it
First step towards automatic collection selection as part of the runtime system