Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond


Drinking from Both Glasses: Combining Pessimistic and Optimistic Tracking of Cross-Thread Dependences. Man Cao, Minjia Zhang, Aritra Sengupta, Michael D. Bond. PPoPP 2016. Speaker note: optimistic tracking here means something different from the optimistic locking in the previous two talks.

Dynamic Analyses for Parallel Programs Error detection Data Race Detector Atomicity Violation Detector Programming model Transactional Memory Enforcement of Strong Memory Model Debugging Record & Replay Deterministic Execution Software only

Dynamic Analyses for Parallel Programs Bad performance! Error detection Data Race Detector Atomicity Violation Detector Programming model Transactional Memory Enforcement of Strong Memory Model Debugging Record & Replay Deterministic Execution Software only Bad performance in commodity systems; custom hardware can have good performance but is unrealistic

Dynamic Analyses for Parallel Programs Bad performance! Error detection Data Race Detector Atomicity Violation Detector Programming model Transactional Memory Enforcement of Strong Memory Model Debugging Record & Replay Deterministic Execution Software only Bad performance in commodity systems; custom hardware can have good performance but is unrealistic Difficulties?

Cross-thread dependences o.f = … … = o.f T1 T2 o.f is a shared memory location Crucial for dynamic analyses and systems

Cross-thread dependences Tracking cross-thread dependences o.f = … … = o.f T1 T2 o.f is a shared memory location Crucial for dynamic analyses and systems

Cross-thread dependences Tracking cross-thread dependences Detecting Controlling o.f = … … = o.f T1 T2 Crucial for dynamic analyses and systems To help understand the problem, suppose we are trying to build a dependence recorder.

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation Motivates combining PT and OT to build hybrid tracking Evaluation shows better performance from HT

Pessimistic Tracking Per-object metadata: o.state last writer/reader thread Per-variable vs per-object

Pessimistic Tracking Per-object metadata: o.state last writer/reader thread At each object access: rd/wr o.f

Analysis-specific work Pessimistic Tracking Per-object metadata: o.state last writer/reader thread At each object access: Check o.state Blue box: Instrumentation Analysis-specific work rd/wr o.f Update o.state

Analysis-specific work Pessimistic Tracking Per-object metadata: o.state last writer/reader thread At each object access: Check o.state atomicity is the key performance issue. Analysis-specific work Atomic rd/wr o.f Update o.state

Analysis-specific work Pessimistic Tracking Per-object metadata: o.state last writer/reader thread At each object access: Lock o.state Check o.state Create small critical section around each access Used by most existing work Pessimistically assumes accesses are involved in cross-thread dependences Analysis-specific work rd/wr o.f Update o.state Unlock o.state

Analysis-specific work Pessimistic Tracking Per-object metadata: o.state last writer/reader thread At each object access: Lock o.state Check o.state Special locked state bit pattern Analysis-specific work NOT program locks rd/wr o.f Update o.state Unlock o.state

Pessimistic Tracking Lock o.state … wr o.f Unlock o.state Lock o.state By using lock and unlock, analyses can soundly track cross-thread deps … rd o.f Unlock o.state

Synchronization and write metadata Pessimistic Tracking Synchronization and write metadata Lock o.state … wr o.f Unlock o.state Lock o.state Remote cache misses Serialize concurrent read … rd o.f Unlock o.state
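The lock / check / access / update / unlock sequence from the pessimistic tracking slides can be sketched in Java as follows. This is a minimal illustration, not the authors' instrumentation: the class name `PessTrackedObject`, the `LOCKED` encoding, and the `crossThreadDeps` counter (a stand-in for analysis-specific work) are all assumptions, and the sketch treats any change of the last-accessing thread as a dependence instead of distinguishing readers from writers.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of pessimistic tracking; names are illustrative only.
final class PessTrackedObject {
    static final long LOCKED = -1L;    // assumed "special locked state bit pattern"
    static final long NO_THREAD = 0L;  // no thread has accessed the object yet

    // o.state: id of the last accessing thread (ids must be positive), or LOCKED.
    final AtomicLong state = new AtomicLong(NO_THREAD);
    int f;                // the tracked field o.f
    int crossThreadDeps;  // stand-in for analysis-specific work; touched only while locked

    // Lock o.state: spin until the state is swapped from an unlocked value to LOCKED.
    private long lockState() {
        while (true) {
            long s = state.get();
            if (s != LOCKED && state.compareAndSet(s, LOCKED)) {
                return s;  // previous (unlocked) state
            }
            Thread.onSpinWait();
        }
    }

    void write(long tid, int value) {
        long prev = lockState();                                   // Lock o.state
        if (prev != NO_THREAD && prev != tid) crossThreadDeps++;   // Check o.state + analysis work
        f = value;                                                 // wr o.f
        state.set(tid);                                            // Update and unlock o.state
    }

    int read(long tid) {
        long prev = lockState();                                   // Lock o.state
        if (prev != NO_THREAD && prev != tid) crossThreadDeps++;   // Check o.state + analysis work
        int v = f;                                                 // rd o.f
        state.set(tid);                                            // Update and unlock o.state
        return v;
    }
}
```

Every access pays for one atomic operation and one metadata write even when no other thread ever touches the object, which is the cost the slide attributes to synchronization and remote cache misses.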

Performance of Pessimistic Tracking Alone 340% Describe experiment methodology later Low overhead for some programs

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation

Optimistic Tracking Biased reader-writer lock for o.state Most accesses are non-conflicting and cause no state change.

Optimistic Tracking Biased reader-writer lock for o.state Avoid synchronization for non-conflicting accesses Most accesses are non-conflicting and cause no state change.

Optimistic Tracking Biased reader-writer lock for o.state Avoid synchronization for non-conflicting accesses Heavyweight coordination for conflicting accesses Define conflicting access. Most accesses are non-conflicting and cause no state change, so this is a worthwhile tradeoff. Introduce coordination: an inter-thread communication/handshake protocol

Optimistic Tracking T1 T2 write check wr o.f o.state: WrExT1 Suppose we are trying to build a dependence recorder.

Optimistic Tracking T1 T2 write check wr o.f read check o.state: WrExT1 read check o.state: WrExT1 Suppose we are trying to build a dependence recorder.

Analysis-specific work Optimistic Tracking T1 T2 read check Analysis-specific work wr o.f write check o.state: WrExT1 o.state: WrExT1 Request Suppose we are trying to build a dependence recorder.

Analysis-specific work Optimistic Tracking T1 T2 safe point read check Analysis-specific work change state wr o.f write check rd o.f o.state: WrExT1 o.state: WrExT1 Request Suppose we are trying to build a dependence recorder. Response o.state ← RdExT2
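The request/response exchange on this slide can be sketched as a small Java simulation. It is an illustration under stated assumptions, not the Octet implementation: the classes `OptThread`, `OptState`, `OptTrackedObject` and `OptDemo` are hypothetical, explicit `safePoint()` calls stand in for VM safe points, and the sketch assumes at most one requester at a time (a real implementation would need an atomic intermediate state to handle concurrent requesters).

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of optimistic tracking; names are illustrative only.
final class OptThread {
    final String name;
    // Coordination requests sent to this thread by conflicting accessors.
    final ConcurrentLinkedQueue<CountDownLatch> requests = new ConcurrentLinkedQueue<>();
    OptThread(String name) { this.name = name; }

    // Safe point: the current owner responds to every pending request.
    void safePoint() {
        CountDownLatch r;
        while ((r = requests.poll()) != null) r.countDown();
    }
}

// Immutable o.state value such as WrExT1 or RdExT2.
final class OptState {
    final OptThread owner;
    final boolean writeExclusive;
    OptState(OptThread owner, boolean writeExclusive) {
        this.owner = owner;
        this.writeExclusive = writeExclusive;
    }
    @Override public String toString() { return (writeExclusive ? "WrEx" : "RdEx") + owner.name; }
}

final class OptTrackedObject {
    volatile OptState state;
    int f;
    OptTrackedObject(OptThread creator) { state = new OptState(creator, true); }

    void write(OptThread t, int v) { check(t, true); f = v; }
    int read(OptThread t) { check(t, false); return f; }

    private void check(OptThread t, boolean isWrite) {
        OptState s = state;
        if (s.owner == t) {
            // Fast path: the owner needs no coordination with other threads.
            if (isWrite && !s.writeExclusive) state = new OptState(t, true);
            return;
        }
        // Conflicting access: send a request and wait for the owner's response.
        CountDownLatch response = new CountDownLatch(1);
        s.owner.requests.add(response);
        try {
            response.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        state = new OptState(t, isWrite);  // change state after the response
    }
}

final class OptDemo {
    // T1 writes o.f, T2 reads it; T1 answers the request at a safe point.
    static String run() {
        OptThread t1 = new OptThread("T1"), t2 = new OptThread("T2");
        OptTrackedObject o = new OptTrackedObject(t1);
        o.write(t1, 42);                       // state stays WrExT1
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = o.read(t2));
        reader.start();
        while (reader.isAlive()) {             // T1 keeps reaching safe points
            t1.safePoint();
            Thread.onSpinWait();
        }
        return o.state + ":" + seen[0];
    }
}
```

Calling `OptDemo.run()` replays the slide's scenario: T1's write hits the fast path, while T2's read blocks until T1 responds at a safe point, after which the state becomes RdExT2.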

Performance of Optimistic Tracking Alone

Performance of Optimistic Tracking Alone 0.5% conflicting access, >100% overhead For most, < 0.1% accesses conflicting.

Cost of Different Tracking (in CPU cycles, averaged across all programs)
Pessimistic: 150
Optimistic, same state: 47
Optimistic, coordination: 9200
PT’s cost is similar for both conflicting and non-conflicting accesses.
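As a rough reading of these numbers, one can ask what fraction of conflicting accesses makes the two schemes cost the same per access. The sketch below assumes a simple expected-cost model that is not stated on the slides: every pessimistic access costs 150 cycles, and an optimistic access costs 47 cycles when it is non-conflicting and 9200 cycles when it needs coordination. The class `TrackingCost` and its method names are hypothetical.

```java
// Hypothetical expected-cost model built from the per-access cycle counts on the slide.
final class TrackingCost {
    static final double PESSIMISTIC = 150;      // cycles, any access
    static final double OPT_SAME_STATE = 47;    // cycles, non-conflicting access
    static final double OPT_COORDINATION = 9200; // cycles, conflicting access

    // Expected cycles per access under optimistic tracking when a fraction p conflicts.
    static double optimisticCost(double p) {
        return (1 - p) * OPT_SAME_STATE + p * OPT_COORDINATION;
    }

    // Conflict fraction at which optimistic and pessimistic tracking cost the same.
    static double breakEvenFraction() {
        return (PESSIMISTIC - OPT_SAME_STATE) / (OPT_COORDINATION - OPT_SAME_STATE);
    }
}
```

Under this model the break-even point is 103 / 9153, or roughly 1.1% conflicting accesses. It ignores everything except the averaged per-access costs, so it should be read only as an indication of why a small share of conflicting accesses can dominate the cost of optimistic tracking.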

Optimistic tracking performs best if there are few conflicting accesses.

Pessimistic tracking is cheaper for conflicting accesses.

Drink from both glasses? Goal Optimistic tracking for most non-conflicting accesses Pessimistic tracking for most conflicting accesses Sweet spot between PT and OT.

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation Hybrid State Model Adaptive Policy Soundly and efficiently combines PT and OT. Online profiling to decide which tracking should be used for each object.

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation Challenging! Hybrid State Model Adaptive Policy It is challenging to build a sound and efficient model on which other analyses can be built.

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation Hybrid State Model Deferred Unlocking Adaptive Policy

Pessimistic-Optimistic Mismatch Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f

Pessimistic-Optimistic Mismatch (#1) Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f Each access could use pessimistic or optimistic tracking, so a conditional unlock seems to be needed. OT behaves as if the state is always locked, transferring access privilege on demand. No unlock!

Pessimistic-Optimistic Mismatch (#1) Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f Each access could use pessimistic or optimistic tracking, so a conditional unlock seems to be needed. OT behaves as if the state is always locked, transferring access privilege on demand. Conditional unlock? No unlock!

Pessimistic-Optimistic Mismatch (#2) Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f Atomic Atomic Atomicity could be interrupted at points that are statically unpredictable For T2, atomicity till next safe point

Pessimistic-Optimistic Mismatch (#2) Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f Atomic Atomic Atomicity granularity *does* affect analyses, but just not going to explain it on this slide. Suppose dep recorder, need to maintain T.counter for mistic tracking, killing the benefit of optimistic tracking Atomicity granularity may affect specific analysis

Pessimistic-Optimistic Mismatch (#2) Pessimistic Tracking Optimistic Tracking … Lock o.state rd/wr o.f Unlock o.state safe point read check … change state wr o.f write check rd o.f Atomic Atomic Unable to improve OT’s performance with conditional unlock. Instrumentation overhead. Significant modification to specific analysis, which adds more overhead and kills the performance benefit from hybridization. Atomicity granularity may affect specific analysis Conditional unlock

Key Insights Coarsening atomicity granularity for pessimistic tracking

Key Insights Coarsening atomicity granularity for pessimistic tracking Program synchronization may hint at cross-thread dependences

Addressing Pessimistic-Optimistic Mismatch Defer unlocking of pessimistic state Till program synchronization release operation (PSRO) Lock release, thread fork

Addressing Pessimistic-Optimistic Mismatch Defer unlocking of pessimistic state Till program synchronization release operation (PSRO) Reader-writer locking Lock release, thread fork

Addressing Pessimistic-Optimistic Mismatch Defer unlocking of pessimistic state Till program synchronization release operation (PSRO) Reader-writer locking Fall back to coordination on contention when locking a state Lock release, thread fork

Deferred Unlocking Example 1 synchronized (m) { … } Lock o.state wr o.f Unlock all states synchronized (m) { … Lock o.state rd o.f

Deferred Unlocking Example 2 synchronized (m) { … } Lock o.state wr o.f Lock o.state safe point Avoids deadlock and ensures responsiveness With deferred unlocking, we build the hybrid state model. Unlock all states rd o.f
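The two deferred-unlocking examples can be sketched in Java as follows. This is a simplified illustration, not the authors' runtime: `DeferredThread`, `DeferredObject` and `programSyncRelease()` are hypothetical names, the sketch uses exclusive locking only (the slides mention reader-writer locking), and a failed lock attempt is merely reported to the caller, where the real system would fall back to coordination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical per-object pessimistic state: which thread currently holds it locked.
final class DeferredObject {
    final AtomicReference<DeferredThread> lockedBy = new AtomicReference<>();
}

// Hypothetical per-thread bookkeeping for deferred unlocking.
final class DeferredThread {
    final String name;
    // States this thread has locked and not yet unlocked; touched only by this thread.
    final List<DeferredObject> lockedStates = new ArrayList<>();
    DeferredThread(String name) { this.name = name; }

    // Returns true if the access may proceed. A false result stands for
    // contention, where the slides fall back to coordination.
    boolean access(DeferredObject o) {
        if (o.lockedBy.get() == this) return true;   // still locked from an earlier access
        if (o.lockedBy.compareAndSet(null, this)) {  // Lock o.state
            lockedStates.add(o);                     // remember it; do not unlock yet
            return true;
        }
        return false;
    }

    // Program synchronization release operation (e.g. lock release, thread fork):
    // "Unlock all states".
    void programSyncRelease() {
        for (DeferredObject o : lockedStates) o.lockedBy.set(null);
        lockedStates.clear();
    }
}
```

In Example 1 the second thread's access succeeds because the first thread has already passed its release operation. In Example 2 the second thread arrives earlier, `access` reports contention, and the owner would unlock its states when it responds at a safe point.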

Hybrid State Model Pessimistic locked and unlocked, optimistic Deferred unlocking Transition between pessimistic and optimistic Introduce adaptive policy


Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation Hybrid State Model Deferred Unlocking Adaptive Policy Motivates combining PT and OT to build hybrid tracking

Adaptive Policy Decide when to transition between pessimistic and optimistic states Formalize the problem, approximate the model. Aggregate profiling, limit study shows per-object profiling gets nearly all possible benefits

Adaptive Policy Decide when to transition between pessimistic and optimistic states Cost-benefit model Formalize the problem, approximate the model. Aggregate profiling, limit study shows per-object profiling gets nearly all possible benefits

Adaptive Policy Decide when to transition between pessimistic and optimistic states Cost-benefit model Boils down to counting state transitions Formalize the problem, approximate the model. Aggregate profiling, limit study shows per-object profiling gets nearly all possible benefits

Adaptive Policy Decide when to transition between pessimistic and optimistic states Cost-benefit model Boils down to counting state transitions Online profiling Per-object Simple yet effective Formalize the problem, approximate the model. Aggregate profiling, limit study shows per-object profiling gets nearly all possible benefits. A more advanced adaptive policy is orthogonal.
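A per-object profile that counts state transitions might look like the sketch below. The slides do not give the actual cost-benefit formula or cutoffs, so the thresholds `TO_PESSIMISTIC` and `TO_OPTIMISTIC`, the class `AdaptiveProfile`, and the rule of resetting counters on a mode switch are all assumptions made for illustration. The sketch is single-threaded and ignores how the counters themselves would be synchronized.

```java
// Hypothetical per-object online profile for choosing the tracking mode.
final class AdaptiveProfile {
    enum Mode { OPTIMISTIC, PESSIMISTIC }

    static final int TO_PESSIMISTIC = 4;   // assumed: conflicting transitions before switching
    static final int TO_OPTIMISTIC = 64;   // assumed: consecutive non-conflicting transitions before switching back

    Mode mode = Mode.OPTIMISTIC;
    int conflictingCount;      // conflicting transitions seen while optimistic
    int nonConflictingRun;     // consecutive non-conflicting transitions seen while pessimistic

    // Record one state transition for this object and return the mode to use next.
    Mode recordTransition(boolean conflicting) {
        if (mode == Mode.OPTIMISTIC) {
            if (conflicting && ++conflictingCount >= TO_PESSIMISTIC) {
                mode = Mode.PESSIMISTIC;   // coordination is being paid too often
                nonConflictingRun = 0;
            }
        } else {
            if (conflicting) {
                nonConflictingRun = 0;     // still contended; stay pessimistic
            } else if (++nonConflictingRun >= TO_OPTIMISTIC) {
                mode = Mode.OPTIMISTIC;    // object looks thread-local again
                conflictingCount = 0;
            }
        }
        return mode;
    }
}
```

The asymmetric thresholds reflect the cost table only loosely: a conflicting optimistic access is far more expensive than a pessimistic one, so a few conflicts justify switching, while switching back waits for a longer quiet period.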

Application of Hybrid Tracking Two dynamic analyses Hybrid dependence recorder and replayer (detect) Hybrid region serializability (RS) enforcer (control) Only recorder uses hybrid tracking

Application of Hybrid Tracking Two dynamic analyses Hybrid dependence recorder and replayer (detect) Hybrid region serializability (RS) enforcer (control) Deferred unlocking helps overcome key challenges! Due to mismatch

Outline Dynamic Analyses and Cross-thread Dependences Pessimistic Tracking Optimistic Tracking Our approach Hybrid Tracking Evaluation

Implementation Jikes RVM 3.1.3

Implementation Jikes RVM 3.1.3 Pessimistic tracking, optimistic tracking [Octet, Bond et al. OOPSLA’13] Optimistic recorder and replayer [Replay, Bond et al. PPPJ’15] Optimistic RS enforcer [EnfoRSer, Sengupta et al, ASPLOS’15]

Implementation Jikes RVM 3.1.3 Pessimistic tracking, optimistic tracking [Octet, Bond et al. OOPSLA’13] Optimistic recorder and replayer [Replay, Bond et al. PPPJ’15] Optimistic RS enforcer [EnfoRSer, Sengupta et al, ASPLOS’15] Hybrid tracking, hybrid recorder and replayer, hybrid RS enforcer publicly available DaCapo Benchmarks 2006 & 2009 SPEC JBB 2000 & 2005 32 cores (Intel Xeon E5 4620)

Performance of Tracking 28->22% DaCapo Benchmarks 2006 & 2009 SPEC JBB 2000 & 2005 32 cores (Intel Xeon E5 4620) 22%

Performance of Tracking Adds low overhead on top of optimistic tracking for low-conflict programs

Performance of Tracking Most programs are low-conflict 22%

Performance of Recorders and RS enforcers 46->41% The reduction is less significant because the recorders still need to record a similar number of cross-thread dependences Recorders RS enforcers

Additional Materials Please check the paper Stress Tests Complete State Transition Table Run-time Characteristics

Related work
Analyses that use pessimistic tracking: [FastTrack, Flanagan & Freund, PLDI’09] [Velodrome, Flanagan et al., PLDI’08] [Chimera, Lee et al., PLDI’12] [Lightweight Transactions, Harris & Fraser, OOPSLA’03] [DMP, Devietti et al., ASPLOS’09]
Analyses that use optimistic tracking: [Shasta, Scales et al., ASPLOS’96] [Object Race Detection, von Praun & Gross, OOPSLA’01] [DoubleChecker, Biswas et al., PLDI’14] [LarkTM, Zhang et al., PPoPP’15]
Adaptive mechanisms: [Adaptive Locks, Usui et al., PACT’09] [Strong Atomicity TM, Abadi et al., PPoPP’09] [Adaptive Lock Elision, Dice et al., SPAA’14] [Concurrency Control, Ziv et al., PLDI’15]
Existing adaptive approaches do not address the problems with dependence tracking.

Contributions Hybrid tracking combines pessimistic tracking and optimistic tracking effectively and efficiently Hybrid tracking achieves better overall performance Never significantly degrades performance Sometimes improves performance substantially Suitable for workloads with diverse communication patterns