DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University.

Slides:



Advertisements
Similar presentations
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Pallavi Joshi  Chang-Seo Park  Koushik Sen  Mayur Naik ‡  Par Lab, EECS, UC Berkeley‡
Advertisements

The complexity of predicting atomicity violations Azadeh Farzan Univ of Toronto P. Madhusudan Univ of Illinois at Urbana Champaign.
Michael Bond (Ohio State) Milind Kulkarni (Purdue)
Effective Static Deadlock Detection
Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang Ohio State Purdue.
An Case for an Interleaving Constrained Shared-Memory Multi-Processor Jie Yu and Satish Narayanasamy University of Michigan.
ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto
Efficient, Context-Sensitive Dynamic Analysis via Calling Context Uptrees Jipeng Huang, Michael D. Bond Ohio State University.
Race Detection for Android Applications
Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets Tayfun Elmas 1, Shaz Qadeer 2, Serdar Tasiran 1 1 Koç University, İstanbul,
Verification of Multithreaded Object- Oriented Programs with Invariants Bart Jacobs, K. Rustan M. Leino, Wolfram Schulte.
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
Resurrector: A Tunable Object Lifetime Profiling Technique Guoqing Xu University of California, Irvine OOPSLA’13 Conference Talk 1.
/ PSWLAB Concurrent Bug Patterns and How to Test Them by Eitan Farchi, Yarden Nir, Shmuel Ur published in the proceedings of IPDPS’03 (PADTAD2003)
SOS: Saving Time in Dynamic Race Detection with Stationary Analysis Du Li, Witawas Srisa-an, Matthew B. Dwyer.
Atomicity in Multi-Threaded Programs Prachi Tiwari University of California, Santa Cruz CMPS 203 Programming Languages, Fall 2004.
CS 263 Course Project1 Survey: Type Systems for Race Detection and Atomicity Feng Zhou, 12/3/2003.
Atomicity Violation Detection Prof. Moonzoo Kim CS492B Analysis of Concurrent Programs.
Formalisms and Verification for Transactional Memories Vasu Singh EPFL Switzerland.
Mayur Naik Alex Aiken John Whaley Stanford University Effective Static Race Detection for Java.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Threads. Processes and Threads  Two characteristics of “processes” as considered so far: Unit of resource allocation Unit of dispatch  Characteristics.
Cormac Flanagan UC Santa Cruz Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs Jaeheon Yi UC Santa Cruz Stephen Freund.
RCDC SLIDES README Font Issues – To ensure that the RCDC logo appears correctly on all computers, it is represented with images in this presentation. This.
University of Michigan Electrical Engineering and Computer Science 1 Practical Lock/Unlock Pairing for Concurrent Programs Hyoun Kyu Cho 1, Yin Wang 2,
C. FlanaganType Systems for Multithreaded Software1 Cormac Flanagan UC Santa Cruz Stephen N. Freund Williams College Shaz Qadeer Microsoft Research.
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
Cosc 4740 Chapter 6, Part 3 Process Synchronization.
- 1 - Dongyoon Lee, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan, Ann Arbor Chimera: Hybrid Program Analysis for Determinism * Chimera.
School of Information Technologies Michael Cahill 1, Uwe Röhm and Alan Fekete School of IT, University of Sydney {mjc, roehm, Serializable.
Sutirtha Sanyal (Barcelona Supercomputing Center, Barcelona) Accelerating Hardware Transactional Memory (HTM) with Dynamic Filtering of Privatized Data.
Deadlock Detection Nov 26, 2012 CS 8803 FPL 1. Part I Static Deadlock Detection Reference: Effective Static Deadlock Detection [ICSE’09]
Eraser: A Dynamic Data Race Detector for Multithreaded Programs STEFAN SAVAGE, MICHAEL BURROWS, GREG NELSON, PATRICK SOBALVARRO, and THOMAS ANDERSON Ethan.
Pallavi Joshi* Mayur Naik † Koushik Sen* David Gay ‡ *UC Berkeley † Intel Labs Berkeley ‡ Google Inc.
1 Evaluating the Impact of Thread Escape Analysis on Memory Consistency Optimizations Chi-Leung Wong, Zehra Sura, Xing Fang, Kyungwoo Lee, Samuel P. Midkiff,
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.
Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine Michael Bond Milind Kulkarni Man Cao Meisam Fathi Salmi.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang.
Michael Bond Katherine Coons Kathryn McKinley University of Texas at Austin.
Bugs (part 1) CPS210 Spring Papers  Bugs as Deviant Behavior: A General Approach to Inferring Errors in System Code  Dawson Engler  Eraser: A.
Motivation  Parallel programming is difficult  Culprit: Non-determinism Interleaving of parallel threads But required to harness parallelism  Sequential.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
Detecting Atomicity Violations via Access Interleaving Invariants
Effective Static Deadlock Detection Mayur Naik* Chang-Seo Park +, Koushik Sen +, David Gay* *Intel Research, Berkeley + UC Berkeley.
Aditya Thakur Rathijit Sen Ben Liblit Shan Lu University of Wisconsin–Madison Workshop on Dynamic Analysis 2009 Cooperative Crug Isolation.
Grigore Rosu Founder, President and CEO Professor of Computer Science, University of Illinois
Effective Static Deadlock Detection Mayur Naik (Intel Research) Chang-Seo Park and Koushik Sen (UC Berkeley) David Gay (Intel Research)
AtomCaml: First-class Atomicity via Rollback Michael F. Ringenburg and Dan Grossman University of Washington International Conference on Functional Programming.
Aritra Sengupta, Man Cao, Michael D. Bond and Milind Kulkarni PPPJ 2015, Melbourne, Florida, USA Toward Efficient Strong Memory Model Support for the Java.
Demand-Driven Software Race Detection using Hardware Performance Counters Joseph L. Greathouse †, Zhiqiang Ma ‡, Matthew I. Frank ‡ Ramesh Peri ‡, Todd.
Using Escape Analysis in Dynamic Data Race Detection Emma Harrington `15 Williams College
FastTrack: Efficient and Precise Dynamic Race Detection [FlFr09] Cormac Flanagan and Stephen N. Freund GNU OS Lab. 23-Jun-16 Ok-kyoon Ha.
Prescient Memory: Exposing Weak Memory Model Behavior by Looking into the Future MAN CAO JAKE ROEMER ARITRA SENGUPTA MICHAEL D. BOND 1.
Presenter: Godmar Back
Mihai Burcea, J. Gregory Steffan, Cristiana Amza
Lightweight Data Race Detection for Production Runs
CSC 591/791 Reliable Software Systems
Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni
Effective Data-Race Detection for the Kernel
Lazy Diagnosis of In-Production Concurrency Bugs
Threads and Memory Models Hal Perkins Autumn 2011
Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond
Jipeng Huang, Michael D. Bond Ohio State University
Threads and Memory Models Hal Perkins Autumn 2009
Lecture 22: Consistency Models, TM
Rethinking Support for Region Conflict Exceptions
Presentation transcript:

DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University PLDI 2014

Impact of Concurrency Bugs

Northeastern blackout, 2003

Impact of Concurrency Bugs

Atomicity Violations ● Constitute 69% 1 of all non-deadlock concurrency bugs 1. S. Lu et al. Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics. In ASPLOS, 2008.

Atomicity ● Concurrency correctness property ● Synonymous with serializability o Program execution must be equivalent to some serial execution of the atomic regions

Atomicity Violation Example Thread 1Thread 2 void execute() { while (...) { prepareList(); processList(); resetList(); } void execute() { while (...) { prepareList(); processList(); resetList(); }

Atomicity Violation Example Thread 1Thread 2 void prepareList() { synchronized (l1) { list.add(new Object()); } void processList() { synchronized (l1) { Object head = list.get(0); } void resetList() { synchronized (l1) { list = null; }

Atomicity Violation Example Thread 1Thread 2 void prepareList() { synchronized (l1) { list.add(new Object()); } void processList() { synchronized (l1) { Object head = list.get(0); } void resetList() { synchronized (l1) { list = null; } Null pointer dereference Data-race-free program

Atomicity Violation Example Thread 1Thread 2 void execute() { while (...) { prepareList(); processList(); resetList(); } void execute() { while (...) { prepareList(); processList(); resetList(); } atomic

● Check for conflict serializability o Build a transactional dependence graph o Check for cycles ● Existing work o Velodrome, Flanagan et al., PLDI 2008 o Farzan and Parthasarathy, CAV 2008 Detecting Atomicity Violations

Transactional Dependence Graph wr o.f wr o.g wr o.f acq lock rel lock time Thread 1Thread 2Thread 3 transaction

Transactional Dependence Graph wr o.f wr o.g wr o.f acq lock rel lock time Thread 1Thread 2Thread 3 transaction

Cycle means Atomicity Violation wr o.f wr o.g rd o.f wr o.f acq lock rel lock time Thread 1Thread 2Thread 3 transaction

Velodrome 1 ●Paper reports 12.7X overhead ●6.1X in our experiments Prior Work is Slow 1. C. Flanagan et al. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008.

● Precise tracking is expensive o “last transaction(s) to read/write” for every field o Need atomic updates in instrumentation High Overheads of Prior Work

Instrumentation Approach Program access Uninstrumented program Instrumented program

Precise Tracking is Expensive! Program access Update metadata Program access Analysis-specific work Uninstrumented program Instrumented program Precise tracking of dependences Can lead to remote cache misses for mostly read-only variables

Synchronized Updates are Expensive! Lock metadata access Program access Unlock metadata access Program access Uninstrumented program Instrumented program atomic

Synchronized Updates are Expensive! Lock metadata access Program access Unlock metadata access Program access Uninstrumented program Instrumented program atomic synchronization on every access slows programs atomic

DoubleChecker

● Dynamic atomicity checker based on conflict serializability ● Precise o Sound and unsound operation modes ● Incurs 2-4 times lower overheads ● Makes dynamic atomicity checking more practical DoubleChecker’s Contributions

Key Insights ● Avoid high costs of precise tracking of dependences at every access o Common case: no dependences  Most accesses are thread local

● Tracks dependences imprecisely o Soundly over-approximates dependences o Recovers precision when required o Turns out to be a lot cheaper Key Insights

Staged Analysis ● Imprecise cycle detection (ICD) ● Precise cycle detection (PCD)

Imprecise Cycle Detection ● Processes every program access ● Soundly overapproximates dependences, is cheap ● Could have false positives Program execution ICD atomicity specifications Imprecise cycles sound tracking

Precise Cycle Detection ● Processes a subset of program accesses ● Performs precise analysis ● No false positives PCD Precise violations Imprecise cycles access information static program locations

Staged Analyses: ICD and PCD Program execution ICD atomicity specifications Imprecise cycles PCD sound tracking Precise violations access information static program locations

ICD is Sound Program execution ICD atomicity specifications Imprecise cycles PCD sound tracking Precise violations access information true atomicity violations static program locations

Role of ICD ● Most accesses in a program are thread-local o Uses Octet 1 for tracking cross-thread dependences ● Acts as a dynamically sound transaction filter 1. M. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA, Program execution ICD atomicity specifications Imprecise cycles sound tracking

Role of PCD ● Processes transactions involved in an ICD cycle o Performs precise serializability analysis o PCD has to do much less work  Program conforming to its atomicity specification will have very few cycles PCD Precise violation Imprecise cycles access information static program locations

Different Modes of Operation ●Single-run mode ●Multi-run mode

Single-Run Mode ICD ICD cycles read/write logs Program execution ICD+PCD atomicity specifications PCD Atomicity violations

Multi-run Mode Program execution ICD+PCD Atomicity violations monitored transactions First run Second run Program execution ICD Potentially imprecise cycles atomicity specifications Static transaction information sound tracking

● Multi-run mode o Conditionally instruments non-transactional accesses  Otherwise overhead increases by 29% o Could use Velodrome for the second run  But performance is worse ● Second run has to process many accesses ● ICD is still effective as a dynamic transaction filter Design Choices

Examples ●Imprecise analysis ●Precise analysis

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 transaction

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 rd o.g (RdEx T2 )

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 rd o.g (RdEx T2 ) rd o.f (RdSh c )

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 rd o.g (RdEx T2 ) rd o.f (RdSh c ) rd o.h (fence)

Imprecise Analysis time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 rd o.g (RdEx T2 ) rd o.f (RdSh c ) rd o.h (fence) wr o.f (WrEx T1 )

Precise Analysis time Thread 1Thread 4Thread 2Thread 3 rd o.g rd o.f rd o.h wr o.f

No Precise Violation time Thread 1Thread 4Thread 2Thread 3 rd o.g rd o.f rd o.h wr o.f

ICD Cycle time wr o.f (WrEx T1 ) Thread 1Thread 4Thread 2Thread 3 rd o.g (RdEx T2 ) rd o.h (RdEx T2 ) rd o.f (RdSh c ) rd o.h (fence) wr o.f (WrEx T1 )

Precise analysis time wr o.f Thread 1Thread 4Thread 2Thread 3 rd o.g rd o.h rd o.f rd o.h wr o.f

Precise Violation time wr o.f Thread 1Thread 4Thread 2Thread 3 rd o.g rd o.h rd o.f rd o.h wr o.f

●Implementation ●Atomicity specifications ●Experiments Evaluation Methodology

Implementation ● DoubleChecker and Velodrome o Developed in Jikes RVM o Artifact successfully evaluated o Code shared on Jikes RVM Research Archive

Experimental Methodology ● Benchmarks o DaCapo 2006, 9.12-bach, Java Grande, other benchmarks used in prior work 1 ● Platform: 3.30 GHz 4-core Intel i5 processor 1. C. Flanagan et al. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008.

Atomicity Specifications ● Assume provided by the programmers ● We reuse prior work’s approach to infer the specifications DoubleChecker/ Velodrome atomicity specification All methods except main(), run(), callers of join(), wait(), etc. new violations reported? Yes No considered non-atomic

Soundness Experiments ● Generated atomicity violations with o Velodrome - sound and precise o DoubleChecker  Single-run mode - sound and precise  Multi-run mode - unsound ● Results match closely for Velodrome and the single-run mode o Multi-run mode finds 83% of all violations

Performance Experiments

● Single-run mode times faster than Velodrome ● Multi-run mode o First run times faster o Second run times faster

●2-4 times lesser overhead than current state-of-art ●Makes dynamic atomicity checking more practical DoubleChecker

Related Work ● Type systems  Flanagan and Qadeer, PLDI 2003  Flanagan et al., TOPLAS 2008 ● Model checking  Farzan and Madhusudan, CAV 2006  Flanagan, SPIN 2004  Hatcliff et al., VMCAI 2004

Related Work ● Dynamic analysis o Conflict-serializability-based approaches  Flanagan et al., PLDI 2008; Farzan and Madhusudan, CAV 2008 o Inferring atomicity  Lu et al., ASPLOS 2006; Xu et al., PLDI 2005; Hammer et al., ICSE 2008 o Predictive approaches  Sinha et al., MEMOCODE 2011; Sorrentino et al., FSE 2010 o Other approaches  Wang and Stoller, PPoPP 2006; Wang and Stoller, TSE 2006

What Has DoubleChecker Achieved? ● Improved overheads over current state-of- art o Makes dynamic atomicity checking more practical ● Cheaper to over-approximate dependences o Showcases a judicious separation of tasks to recover precision