A Case for an Interleaving Constrained Shared-Memory Multi-Processor. Jie Yu and Satish Narayanasamy, University of Michigan.

Why is Parallel Programming Hard?
Is single-threaded programming relatively easy?
  – Verification is NP-hard
  – BUT properties such as a function's pre/post-conditions and loop invariants are verifiable in polynomial time
Parallel programming is harder
  – Verifying properties for even small code regions is NP-hard
  – Reason: an unbounded number of legal thread interleavings is exposed to the parallel runtime
  – Impractical to test or verify properties for all legal interleavings
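To give a feel for how quickly the number of legal interleavings grows, here is a minimal sketch that counts them for straight-line threads with no synchronization. The function name `interleavings` and the fixed-length-thread model are assumptions made for illustration; real programs with loops and input-dependent control flow have no such fixed bound.

```python
from math import factorial

def interleavings(ops_per_thread):
    """Count the orderings of all memory operations that keep each
    thread's own program order (a multinomial coefficient)."""
    total = factorial(sum(ops_per_thread))
    for n in ops_per_thread:
        total //= factorial(n)
    return total

print(interleavings([3, 3]))            # 20
print(interleavings([4, 4, 4]))         # 34650
print(interleavings([10, 10, 10, 10]))  # already far too many to test one by one
```

Even three threads of four operations each allow tens of thousands of orderings, which is why exhaustive testing of interleavings is impractical.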

Legal Thread Interleavings
Too much freedom given to the parallel runtime?
  – Tested correct interleavings
  – Incorrect interleavings found during testing
  – Incorrect interleavings eliminated by adding synchronization constraints
  – Untested interleavings: the cause of concurrency bugs

Solution: Limit Freedom
Programmer tests as many legal interleavings as practically possible
Interleaving constraints from correct test runs are encoded in the program binary
Runtime system avoids untested interleavings, i.e. avoids corner cases

Result of Constraining Interleavings
A majority of concurrency bugs are avoidable
  – Data races, atomicity violations, and also order violations
Performance overhead is low
  – Untested interleavings in well-tested programs are likely to manifest rarely
  – Processor support helps reduce the cost of enforcing interleaving constraints

Challenges
How to encode tested interleavings in a program's binary?
  – Predecessor Set (PSet) interleaving constraints
How to efficiently enforce interleaving constraints at runtime?
  – Detect violations of PSet constraints using processor support
  – Avoid violations by stalling or using rollback-and-re-execution support

Outline
Overview
Encoding and enforcing tested interleavings
  – Predecessor Set (PSet) interleaving constraints
  – Processor support
Results
Conclusion

Encoding Tested Interleavings
Interleaving constraints from test runs
  – Too specific to a test input → performance loss for a different input
  – Too generic → might allow untested interleavings
Predecessor Set (PSet)
  – PSet(m) is defined for each static memory operation m
  – pred ∈ PSet(m) if m is immediately and remotely memory dependent on pred in at least one tested execution
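The PSet definition above can be sketched as a small trace analysis. This is an illustrative sketch, not the authors' tool: the trace format, the function name `derive_psets`, and the operation names are hypothetical. It also assumes one reading of "immediately dependent": a read depends on the last write to the same location, and a write depends on the reads since the last write, or on the last write itself if there were none. Only predecessors from a different thread ("remote") are recorded.

```python
from collections import defaultdict

def derive_psets(runs):
    """Union PSets over tested runs.

    Each run is a list of (thread, op, kind, addr) tuples in global order,
    where op names a static memory operation and kind is "R" or "W".
    """
    psets = defaultdict(set)
    for run in runs:
        last_writer = {}              # addr -> (thread, op) of the last write
        readers = defaultdict(list)   # addr -> reads seen since the last write
        for thread, op, kind, addr in run:
            psets[op]                 # every executed op gets an entry, even if empty
            if kind == "R":
                preds = [last_writer[addr]] if addr in last_writer else []
                readers[addr].append((thread, op))
            else:
                preds = readers[addr] or (
                    [last_writer[addr]] if addr in last_writer else [])
                last_writer[addr] = (thread, op)
                readers[addr] = []
            # keep only remote predecessors (executed by another thread)
            psets[op].update(p for t, p in preds if t != thread)
    return dict(psets)

# Hypothetical tested run on one shared variable x.
run = [(1, "Wa", "W", "x"), (2, "Rb", "R", "x"), (1, "Wc", "W", "x")]
print(derive_psets([run]))
```

For this run the sketch yields PSet(Wa) = {}, PSet(Rb) = {Wa}, and PSet(Wc) = {Rb}. Running more tests can only grow the sets, since results are unioned across runs.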

A Test Run
[Figure: one tested interleaving of the memory operations R1 to R4 and W1 to W3 across Thread 1, Thread 2, and Thread 3, each operation annotated with its PSet]
PSet(W1) = {}
PSet(R1) = {}
PSet(R2) = {W1}
PSet(R3) = {W1}
PSet(R4) = {}
PSet(W2) = {R3, R4}
PSet(W3) = {W2}

Enforcing Tested Interleaving
Processor support for detecting and avoiding PSet constraint violations
Detecting PSet constraint violations
  – For each memory location, track its last accessor (cache extension)
  – Detect a PSet constraint violation: piggyback the last accessor on the cache coherence reply; the processor performs the PSet membership test by executing additional micro-ops
Overcoming a PSet constraint violation
  – Stall
  – Re-execute using checkpoint-and-rollback support, e.g. SafetyNet, ReVive, etc.
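The detection step can be modelled in software as follows. This is a minimal sketch, not the hardware design: the class `PSetMonitor`, its method names, and the example operations are hypothetical. A dictionary stands in for the per-location last-accessor metadata that the slides place in the cache, and a set lookup stands in for the membership test done in micro-ops.

```python
class PSetMonitor:
    """Software model of last-accessor tracking plus the PSet membership test."""

    def __init__(self, psets):
        self.psets = psets      # static op -> set of allowed remote predecessors
        self.last_writer = {}   # addr -> (thread, op)
        self.readers = {}       # addr -> [(thread, op)] reads since the last write

    def predecessors(self, thread, kind, addr):
        """Remote operations this access would immediately depend on."""
        reads = self.readers.get(addr, [])
        writer = self.last_writer.get(addr)
        if kind == "W" and reads:
            preds = reads
        else:
            preds = [writer] if writer else []
        return [op for t, op in preds if t != thread]

    def violations(self, thread, op, kind, addr):
        """Predecessors that are not in PSet(op); empty means the access is allowed."""
        allowed = self.psets.get(op, set())
        return [p for p in self.predecessors(thread, kind, addr) if p not in allowed]

    def commit(self, thread, op, kind, addr):
        """Record the access as the location's latest accessor."""
        if kind == "R":
            self.readers.setdefault(addr, []).append((thread, op))
        else:
            self.last_writer[addr] = (thread, op)
            self.readers[addr] = []

# Hypothetical PSets: Rb was tested after Wa, Rc was never tested after Wa.
monitor = PSetMonitor({"Wa": set(), "Rb": {"Wa"}, "Rc": set()})
monitor.commit(1, "Wa", "W", "x")
print(monitor.violations(2, "Rb", "R", "x"))  # []
print(monitor.violations(2, "Rc", "R", "x"))  # ['Wa']
```

A non-empty result is the point at which the runtime would stall the access or fall back to rollback and re-execution.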

Two Case Studies
Case Study 1: an atomicity violation bug in MySQL
  – Avoided using stall
Case Study 2: an order violation bug in Mozilla that is neither a data race nor an atomicity violation
  – Avoided using rollback and re-execution

An Atomicity Violation Bug in MySQL

Thread 1 (sql/log.cc):
    MYSQL_LOG::new_file() {
      …
      close();     // … log_status = LOG_CLOSED; …   (W1)
      open(…);     // … log_status = LOG_OPEN; …     (W2)
      …
    }

Thread 2 (sql/sql_insert.cc):
    mysql_insert(…) {
      …
      if (log_status != LOG_CLOSED) {     // (R1)
        // write into a log file
      }
      …
    }

Correct Interleaving #1: “frequent”, therefore likely to be tested
  Thread 2: R1   log_status != LOG_CLOSED ?
  Thread 1: W1   log_status = LOG_CLOSED
  Thread 1: W2   log_status = LOG_OPEN
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {}

Correct Interleaving #2: “frequent”, therefore likely to be tested
  Thread 1: W1   log_status = LOG_CLOSED
  Thread 1: W2   log_status = LOG_OPEN
  Thread 2: R1   log_status != LOG_CLOSED ?
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}

Incorrect Interleaving: rare, and therefore likely to be untested
  Thread 1: W1   log_status = LOG_CLOSED
  Thread 2: R1   log_status != LOG_CLOSED ?    <- constraint violation: W1 is not in PSet(R1) = {W2}
  Thread 1: W2   log_status = LOG_OPEN
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}
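The stall-based avoidance of this bug can be sketched as a tiny scheduler simulation. The PSets and operation names (W1, W2, R1) come from the slides. Everything else is assumed for illustration: the function `run_with_stalls`, the `preferred` list modelling the order in which the runtime would otherwise run threads, and the software dependence tracking.

```python
def run_with_stalls(psets, programs, preferred):
    """Run per-thread programs, stalling any access that would violate its PSet.

    programs:  thread -> list of (op, kind, addr) in program order
    preferred: thread ids in the order the runtime would like to run them
    """
    pending = {t: list(ops) for t, ops in programs.items()}
    last_writer, readers, executed = {}, {}, []
    queue = list(preferred)
    while queue:
        for i, thread in enumerate(queue):
            op, kind, addr = pending[thread][0]
            reads = readers.get(addr, [])
            if kind == "W" and reads:
                preds = reads
            else:
                preds = [last_writer[addr]] if addr in last_writer else []
            remote = [p for t, p in preds if t != thread]
            if all(p in psets.get(op, set()) for p in remote):
                break                      # first thread in line that is not stalled
        else:
            raise RuntimeError("every thread is stalled")
        queue.pop(i)
        pending[thread].pop(0)
        if kind == "R":
            readers.setdefault(addr, []).append((thread, op))
        else:
            last_writer[addr] = (thread, op)
            readers[addr] = []
        executed.append(op)
    return executed

PSETS = {"W1": {"R1"}, "W2": set(), "R1": {"W2"}}
PROGRAMS = {
    1: [("W1", "W", "log_status"), ("W2", "W", "log_status")],
    2: [("R1", "R", "log_status")],
}
# The untested order W1, R1, W2 is requested; R1 is stalled until W2 has run.
print(run_with_stalls(PSETS, PROGRAMS, [1, 2, 1]))  # ['W1', 'W2', 'R1']
```

Stalling is enough here because the violating access (R1) has not executed yet, so delaying it turns the untested order into tested interleaving #2.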

Correct Test Run

Thread 1 (TimerThread.cpp):
    TimerThread::Run() {
      ...
      Lock(lock);
      mProcessing = TRUE;
      while (mProcessing) {
        ...
        mWaiting = TRUE;        // W
        Wait(cond, lock);
        mWaiting = FALSE;
      }
      Unlock(lock);
      ...
    }

Thread 2 (TimerThread.cpp):
    TimerThread::Shutdown() {
      ...
      Lock(lock);
      mProcessing = FALSE;
      if (mWaiting)             // R
        Notify(cond, lock);
      Unlock(lock);
      ...
      mThread->Join();
      return NS_OK;
    }

Tested order: W (Thread 1), then R (Thread 2)
PSet(W) = {}
PSet(R) = {W}

Avoiding Order Violation
Same TimerThread::Run() (Thread 1) and TimerThread::Shutdown() (Thread 2) code as in the correct test run.
Untested order: R (if (mWaiting), Thread 2) executes before W (mWaiting = TRUE, Thread 1)
PSet(W) = {}
PSet(R) = {W}
Constraint violation, avoided using rollback
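A toy model of why this case needs rollback rather than a stall is sketched below. The slide does not say which thread is rolled back or where the violation is detected. The sketch assumes the violation is seen when W executes after the remote read R (R is not in PSet(W)), and that the already-executed R is undone and re-executed after W. The function names and the single-variable trace format are hypothetical.

```python
def untested_preds(committed, access, psets):
    """Remote immediate predecessors of `access` that are missing from its PSet."""
    thread, op, kind = access
    last_write, reads = None, []
    for t, o, k in committed:
        if k == "W":
            last_write, reads = (t, o), []
        else:
            reads.append((t, o))
    if kind == "W" and reads:
        preds = reads
    else:
        preds = [last_write] if last_write else []
    return [o for t, o in preds if t != thread and o not in psets.get(op, set())]

def run_with_rollback(psets, arrival):
    """arrival: (thread, op, kind) accesses to one variable, in arrival order."""
    committed, rollbacks = [], []
    for access in arrival:
        bad = untested_preds(committed, access, psets)
        if not bad:
            committed.append(access)
            continue
        # Assumption: undo the offending remote accesses, let this access
        # proceed, then re-execute the undone accesses after it.
        undone = [a for a in committed if a[1] in bad]
        committed = [a for a in committed if a[1] not in bad]
        rollbacks.append([a[1] for a in undone])
        committed.append(access)
        for redo in undone:
            if untested_preds(committed, redo, psets):
                raise RuntimeError(f"re-executing {redo[1]} still violates its PSet")
            committed.append(redo)
    return [a[1] for a in committed], rollbacks

PSETS = {"W": set(), "R": {"W"}}
# Untested order: Shutdown's read R arrives before Run's write W.
print(run_with_rollback(PSETS, [(2, "R", "R"), (1, "W", "W")]))
# (['W', 'R'], [['R']])
```

Stalling W would not help in this model because R has already executed; only undoing R restores the tested order W, then R.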

Outline
Overview
Encoding and enforcing tested interleavings
  – Predecessor Set (PSet)
  – Processor support
Results
Conclusion

Methodology
Pin-based analysis
17 documented bugs analyzed
  – MySQL, Apache, Mozilla, pbzip2, aget, pfscan
  – Plus Parsec and Splash for the performance study
Applications tested using regression test suites when available, or random test input otherwise

PSet Constraints from Test Runs
Concurrent workload
  – MySQL: run regression test suite in parallel with OSDB
  – FFT, pbzip2: random test input

Bug Avoidance Capability
17 bugs from MySQL, Apache, Mozilla, pbzip2, aget, pfscan
15/17 bugs avoided by enforcing PSet constraints
  – Including a bug that is neither a data race nor an atomicity violation
2/17 false negatives
  – A multi-variable atomicity violation
  – A context-sensitive deadlock bug
6 bugs are avoided using the stalling mechanism; the others require the rollback mechanism

PSet Violations in Bug-Free Execution
2 PSet constraint violations in MySQL not avoided
  – MySQL: bmove512 unrolls a loop 128 times

PSet Size of Instructions
Over 95% of the instructions have PSets of size zero
Less than 2% of static memory instructions have a PSet of size greater than two

Summary
Multi-threaded programming is hard
  – Existing shared-memory programming model exposes too many legal interleavings to the runtime
  – Most interleavings remain untested in production code
Interleaving constrained shared-memory multiprocessor
  – Avoids untested (rare) interleavings to avoid concurrency bugs
Predecessor Set interleaving constraints
  – 15/17 concurrency bugs are avoidable
  – Acceptable performance and space overhead

Thanks Q & A

Memory Space Overhead

Program        App. Size    # PSet Pairs    Overhead w.r.t. App.
Pbzip2         39KB                         %
Aget           90KB                         %
Pfscan         17KB                         %
Apache         2435KB                       %
MySQL          4284KB                       %
FFT            24KB                         %
FMM            73KB                         %
LU             24KB                         %
Radix          21KB                         %
Blackscholes   54KB         41              0.32%
Canneal        59KB                         %

Space overhead: in the worst case, a 10% code size increase