Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang.

Slides:



Advertisements
Similar presentations
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Pallavi Joshi  Chang-Seo Park  Koushik Sen  Mayur Naik ‡  Par Lab, EECS, UC Berkeley‡
Advertisements

Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,
Michael Bond (Ohio State) Milind Kulkarni (Purdue)
Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang Ohio State Purdue.
Gwendolyn Voskuilen, Faraz Ahmad, and T. N. Vijaykumar Electrical & Computer Engineering ISCA 2010.
ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto
Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets Tayfun Elmas 1, Shaz Qadeer 2, Serdar Tasiran 1 1 Koç University, İstanbul,
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
Resurrector: A Tunable Object Lifetime Profiling Technique Guoqing Xu University of California, Irvine OOPSLA’13 Conference Talk 1.
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Asynchronous Assertions Eddie Aftandilian and Sam Guyer Tufts University Martin Vechev ETH Zurich and IBM Research Eran Yahav Technion.
SOS: Saving Time in Dynamic Race Detection with Stationary Analysis Du Li, Witawas Srisa-an, Matthew B. Dwyer.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
An Case for an Interleaving Constrained Shared-Memory Multi- Processor CS6260 Biao xiong, Srikanth Bala.
Cormac Flanagan and Stephen Freund PLDI 2009 Slides by Michelle Goodstein 07/26/10.
CS 582 / CMPE 481 Distributed Systems Concurrency Control.
Mayur Naik Alex Aiken John Whaley Stanford University Effective Static Race Detection for Java.
Transaction Management and Concurrency Control
Parallelizing Data Race Detection Benjamin Wester Facebook David Devecsery, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan.
DoublePlay: Parallelizing Sequential Logging and Replay Kaushik Veeraraghavan Dongyoon Lee, Benjamin Wester, Jessica Ouyang, Peter M. Chen, Jason Flinn,
Cormac Flanagan UC Santa Cruz Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs Jaeheon Yi UC Santa Cruz Stephen Freund.
/ PSWLAB Eraser: A Dynamic Data Race Detector for Multithreaded Programs By Stefan Savage et al 5 th Mar 2008 presented by Hong,Shin Eraser:
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
- 1 - Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan, Ann Arbor Chimera: Hybrid Program Analysis for Determinism * Chimera.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
P ath & E dge P rofiling Michael Bond, UT Austin Kathryn McKinley, UT Austin Continuous Presented by: Yingyi Bu.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.
Replication March 16, Replication What is Replication?  A technique for increasing availability, fault tolerance and sometimes, performance 
Compactly Representing Parallel Program Executions Ankit Goel Abhik Roychoudhury Tulika Mitra National University of Singapore.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University.
Breadcrumbs: Efficient Context Sensitivity for Dynamic Bug Detection Analyses Michael D. Bond University of Texas at Austin Graham Z. Baker Tufts / MIT.
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.
Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine Michael Bond Milind Kulkarni Man Cao Meisam Fathi Salmi.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Cache Coherence Protocols 1 Cache Coherence Protocols in Shared Memory Multiprocessors Mehmet Şenvar.
On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 12,
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
Michael Bond Katherine Coons Kathryn McKinley University of Texas at Austin.
Parallelism without Concurrency Charles E. Leiserson MIT.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
Virtual Application Profiler (VAPP) Problem – Increasing hardware complexity – Programmers need to understand interactions between architecture and their.
© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.
Computer Architecture Lab at Evangelos Vlachos, Michelle L. Goodstein, Michael A. Kozuch, Shimin Chen, Phillip B. Gibbons, Babak Falsafi and Todd C. Mowry.
On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan November 29,
9 1 Chapter 9_B Concurrency Control Database Systems: Design, Implementation, and Management, Rob and Coronel.
Aritra Sengupta, Man Cao, Michael D. Bond and Milind Kulkarni PPPJ 2015, Melbourne, Florida, USA Toward Efficient Strong Memory Model Support for the Java.
Soyeon Park, Shan Lu, Yuanyuan Zhou UIUC Reading Group by Theo.
Computer Science Lecture 13, page 1 CS677: Distributed OS Last Class: Canonical Problems Election algorithms –Bully algorithm –Ring algorithm Distributed.
Using Escape Analysis in Dynamic Data Race Detection Emma Harrington `15 Williams College
Chapter 13 Managing Transactions and Concurrency Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition.
FastTrack: Efficient and Precise Dynamic Race Detection [FlFr09] Cormac Flanagan and Stephen N. Freund GNU OS Lab. 23-Jun-16 Ok-kyoon Ha.
Computer Science and Engineering Parallelizing Feature Mining Using FREERIDE Leonid Glimcher P. 1 ipdps’04 Scaling and Parallelizing a Scientific Feature.
Prescient Memory: Exposing Weak Memory Model Behavior by Looking into the Future MAN CAO JAKE ROEMER ARITRA SENGUPTA MICHAEL D. BOND 1.
Optimistic Hybrid Analysis
Mihai Burcea, J. Gregory Steffan, Cristiana Amza
Lightweight Data Race Detection for Production Runs
On-Demand Dynamic Software Analysis
PHyTM: Persistent Hybrid Transactional Memory
Transaction Management and Concurrency Control
Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni
Automatic Detection of Extended Data-Race-Free Regions
Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond
Jipeng Huang, Michael D. Bond Ohio State University
Chapter 10 Transaction Management and Concurrency Control
Inlining and Devirtualization Hal Perkins Autumn 2011
Transactions in Distributed Systems
Rethinking Support for Region Conflict Exceptions
Presentation transcript:

Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang Michael D. Bond 1

Dynamic Analyses for Parallel Programs Data Race Detector, Record & Replay, Transactional Memory, Deterministic Execution, etc. Performance is usually bad! – several times slower Fundamental difficulties? 2

Cross-thread dependences Crucial for dynamic analyses and systems Capturing cross-thread dependences – Detecting e.g. data race detector, dependence recorder – Controlling e.g. transactional memory, deterministic execution 3 o.f = … … = o.f T1 T2

Typical approach Per-object metadata (state) – E.g. last writer/reader thread At each object access: – Check current state – Analysis-specific action – Update state if needed – Perform the access Atomically 4

Typical approach Per-object metadata (state) – E.g. last writer/reader thread At each object access: – Check current state – Analysis-specific action – Update state if needed – Perform the access Atomically 5 How to guarantee?

Pessimistic Synchronization Used by most existing work – Data Race Detector [FastTrack, Flanagan & Freund, 2009] – Atomicity Violation Detector [Velodrome, Flanagan et al., 2008] – Record & Replay [Instant Replay, LeBlanc et al., 1987] [Chimera, Lee et al., 2012] 6

Pessimistic Synchronization 7 LockMetadata()

Pessimistic Synchronization 8 LockMetadata() Check and compute new metadata

Pessimistic Synchronization 9 LockMetadata() Check and compute new metadata Analysis-specific actions

Pessimistic Synchronization 10 LockMetadata() Check and compute new metadata Program access Analysis-specific actions

Pessimistic Synchronization 11 LockMetadata() Check and compute new metadata Program access UnlockAndUpdateMetadata() Analysis-specific actions

Pessimistic Synchronization Synchronization on every access 6X slowdown on average 12 LockMetadata() Check and compute new metadata Program access UnlockAndUpdateMetadata() Analysis-specific actions

Optimistic Synchronization 13 Used to improve performance – Biased Locking [Lock Reservation, Kawachiya et al., 2002] [Bulk Rebiasing, Russell & Detlefs, 2006] – Distributed Memory System [Shasta, Scales et al. 1996] – Framework Support [Octet, Bond et al. 2013]

Optimistic Synchronization 14 Avoid synchronization for non-conflicting accesses Heavyweight coordination for conflicting accesses

Optimistic Synchronization (Cont.) T1 T2 wr o.f write check wr o.f write check 15

Optimistic Synchronization (Cont.) T1 T2 wr o.f write check read check wr o.f write check 16

Optimistic Synchronization (Cont.) T1 T2 wr o.f safe point write check read check Analysis-specific action read check Analysis-specific action wr o.f write check 17

Optimistic Synchronization (Cont.) T1 T2 wr o.f safe point write check read check Analysis-specific action change metadata read check Analysis-specific action change metadata wr o.f write check rd o.f 18

Optimistic Synchronization (Cont.) T1 T2 wr o.f safe point write check read check Analysis-specific action change metadata read check Analysis-specific action change metadata wr o.f write check rd o.f 19 26% on average with outliers – Expensive if there are many conflicting accesses

Optimistic synchronization performs best if there are few conflicting accesses. 20

Pessimistic synchronization is cheaper for conflicting accesses. 21

Drink from both glasses? Goal: – Optimistic sync. for most non-conflicting accesses – Pessimistic sync. for most conflicting accesses Our approach: – Hybrid state model – Adaptive policy – Support for detecting and controlling cross-thread dependences 22

Adaptive Policy Decides when to perform Pess → Opt and Opt → Pess transitions Cost—Benefit model – Formulates the problem Online profiling – Efficiently collects information and approximates the Cost-Benefit model 23

Cost—Benefit model 24

Evaluation Implementation – Jikes RVM Parallel programs – DaCapo Benchmarks 2006 & 2009 – SPEC JBB 2000 & 2005 Platform – 32 cores (AMD Opteron 6272) 25

Performance 26

Performance 27

Performance 28

Framework support Detecting cross-thread dependences – dependence recorder Key challenge – Identify the source location of a happens-before edge for a pessimistic conflicting transition – Current solution requires acquiring a lock and writing to remote thread’s log 29

Framework support (Cont.) Controlling cross-thread dependences – enforcing Region Serializability (in progress) Key challenge – Need to keep locking pessimistic objects until the end of a region Possible solution – Defer unlocking of pessimistic objects until program lock releases Helps dependence recorder Simplifies instrumentation 30

Framework support (Cont.) Controlling cross-thread dependences – enforcing Region Serializability (in progress) Key challenge – Need to keep locking pessimistic objects until the end of a region Possible solution – Defer unlocking of pessimistic objects until program lock releases Helps dependence recorder Simplifies instrumentation 31

Conclusion & Future work Hybrid, adaptive synchronization achieves better performance – never significantly degrades performance – sometimes improves performance substantially Future directions – Explore different adaptive policies (e.g. aggregate profiling) – Reduce instrumentation cost by deferring unlock operations of pessimistic synchronization – Apply to control cross-thread dependences 32