LOOM: Bypassing Races in Live Applications with Execution Filters Jingyue Wu, Heming Cui, Junfeng Yang Columbia University 1.

Presentation transcript:

LOOM: Bypassing Races in Live Applications with Execution Filters Jingyue Wu, Heming Cui, Junfeng Yang Columbia University 1

Mozilla Bug #
void js_DestroyContext(JSContext *cx) {
    JS_LOCK_GC(cx->runtime);
    MarkAtomState(cx);
    if (last) { // last thread?
        ...
        FreeAtomState(cx);
        ...
    }
    JS_UNLOCK_GC(cx->runtime);
}
A buggy interleaving between a non-last thread and the last thread:
if (last) // return true
FreeAtomState
MarkAtomState

Complex Fix
void js_DestroyContext() {
    if (last) {
        state = LANDING;
        if (requestDepth == 0)
            js_BeginRequest();
        while (gcLevel > 0)
            JS_AWAIT_GC_DONE();
        js_ForceGC(true);
        while (gcPoke)
            js_GC(true);
        FreeAtomState();
    } else {
        gcPoke = true;
        js_GC(false);
    }
}
void js_BeginRequest() {
    while (gcLevel > 0)
        JS_AWAIT_GC_DONE();
}
void js_ForceGC(bool last) {
    gcPoke = true;
    js_GC(last);
}
void js_GC(bool last) {
    if (state == LANDING && !last)
        return;
    gcLock.acquire();
    if (!gcPoke) {
        gcLock.release();
        return;
    }
    if (gcLevel > 0) {
        gcLevel++;
        while (gcLevel > 0)
            JS_AWAIT_GC_DONE();
        gcLock.release();
        return;
    }
    gcLevel = 1;
    gcLock.release();
restart:
    MarkAtomState();
    gcLock.acquire();
    if (gcLevel > 1) {
        gcLevel = 1;
        gcLock.release();
        goto restart;
    }
    gcLevel = 0;
    gcPoke = false;
    gcLock.release();
}
4 functions; 3 integer flags
Nearly a month
Not the only example

LOOM: Live-workaround Races
Execution filters: temporarily filter out buggy thread interleavings
void js_DestroyContext(JSContext *cx) {
    MarkAtomState(cx);
    if (last thread) {
        ...
        FreeAtomState(cx);
        ...
    }
}
A mutual-exclusion execution filter to bypass the race above:
js_DestroyContext <> self
Declarative, easy to write
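
One way to read the filter `js_DestroyContext <> self` is that the function body becomes a critical section that excludes other executions of the same function. The sketch below writes that effect out by hand. It is only an illustration under assumptions: `filter_lock`, `destroy_context`, and the counters are hypothetical names, and LOOM injects the synchronization at run time rather than requiring a source change like this.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock a mutual-exclusion filter would inject. */
static pthread_mutex_t filter_lock = PTHREAD_MUTEX_INITIALIZER;

static int marked = 0; /* counts MarkAtomState-like calls */
static int freed = 0;  /* counts FreeAtomState-like calls */

static void mark_atom_state(void) { marked++; }
static void free_atom_state(void) { freed++; }

/* Simplified js_DestroyContext with the filter's effect written by hand:
 * the whole function body is one critical section ("<> self"). */
void destroy_context(bool last)
{
    pthread_mutex_lock(&filter_lock);   /* acquired at region entry */
    mark_atom_state();
    if (last)
        free_atom_state();
    pthread_mutex_unlock(&filter_lock); /* released at region exit */
}

int marked_count(void) { return marked; }
int freed_count(void) { return freed; }
```

With the lock held across the whole body, a non-last thread cannot mark atom state while the last thread is freeing it.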

LOOM: Live-workaround Races
Execution filters: temporarily filter out buggy thread interleavings
Installs execution filters into live applications
– Improve server availability
– STUMP [PLDI ’09], Ginseng [PLDI ’06], KSplice [EUROSYS ’09]
Installs execution filters safely
– Avoid introducing errors
Incurs little overhead during normal execution

Summary of Results
We evaluated LOOM on nine real races.
– Bypasses all the evaluated races safely
– Applies execution filters immediately
– Little performance overhead (< 5%)
– Scales well with the number of application threads (< 10% with 32 threads)
– Easy to use (< 5 lines)

Outline
Architecture
– Combines static preparation and live update
Safely updating live applications
Reducing performance overhead
Evaluation
Conclusion

Architecture
[Figure: LOOM architecture in two phases. Static preparation: the application source passes through the LLVM compiler with the LOOM compiler plugin ($ llvm-gcc, $ opt -load, $ llc, $ gcc) to produce an application binary that contains the LOOM update engine. Live update: the LOOM controller ($ loomctl add) delivers an execution filter (js_DestroyContext <> self) to the update engine inside the buggy application, producing the patched application.]

Outline
Architecture
– Combines static preparation and live update
Safely updating live applications
Reducing performance overhead
Evaluation
Conclusion

Safety: Not Introducing New Errors
[Figure: thread program counters (PC) relative to the inserted operations. Mutual exclusion: Lock, Unlock. Order constraints: Up, Down.]

Evacuation Algorithm
1. Identify the dangerous region using static analysis
2. Evacuate threads that are in the dangerous region
3. Install the execution filter
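
The three steps can be illustrated with a small single-threaded simulation. This is a sketch under assumptions, not LOOM's implementation: the dangerous region is taken as an input (step 1 is assumed already done), each thread is modeled only by the id of the statement it is about to execute inside a loop, and `evacuate_and_install`, `struct region`, and `step` are hypothetical names. A real system waits for threads to leave the region instead of stepping them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of statement ids where installing the filter is unsafe. */
struct region { int first; int last; };

static bool in_region(int pc, struct region r)
{
    return pc >= r.first && pc <= r.last;
}

/* Advance a simulated thread by one statement inside the loop body
 * [loop_first, loop_last], wrapping around at the back edge. */
static int step(int pc, int loop_first, int loop_last)
{
    return pc == loop_last ? loop_first : pc + 1;
}

/* Steps 2 and 3: run every thread forward until it has left the dangerous
 * region, then mark the filter as installed. Returns the total number of
 * steps taken, or -1 if some thread cannot leave the region. */
int evacuate_and_install(int *pcs, size_t nthreads, struct region danger,
                         int loop_first, int loop_last, bool *installed)
{
    int total = 0;
    int loop_len = loop_last - loop_first + 1;

    *installed = false;
    for (size_t i = 0; i < nthreads; i++) {
        int budget = loop_len;
        while (in_region(pcs[i], danger)) {
            if (budget-- == 0)
                return -1; /* the region covers the whole loop */
            pcs[i] = step(pcs[i], loop_first, loop_last);
            total++;
        }
    }
    *installed = true;
    return total;
}
```

For example, with a loop over statements 5 to 9 and a dangerous region of 8 to 9, threads at statements 5, 8, and 9 need 0, 2, and 1 steps respectively before the filter is installed.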

Control Application Threads
// database worker thread
void handle_client(int fd) {
    for (;;) {
        struct client_req req;
        int ret = recv(fd, &req, ...);
        if (ret <= 0) break;
        open_table(req.table_id);
        ... // do real work
        close_table(req.table_id);
    }
}

Control Application Threads (cont’d)
// not the final version
void cond_break() {
    read_unlock(&update);
    read_lock(&update);
}
// not the final version
void loom_update() {
    write_lock(&update);
    install_filter();
    write_unlock(&update);
}

Pausing Threads at Safe Locations
void cond_break() {
    if (wait[backedge_id]) {
        read_unlock(&update);
        while (wait[backedge_id]);
        read_lock(&update);
    }
}
void loom_update() {
    identify_safe_locations();
    for each safe backedge E
        wait[E] = true;
    write_lock(&update);
    install_filter();
    for each safe backedge E
        wait[E] = false;
    write_unlock(&update);
}
cmpl 0x0, 0x845208c
je 0x804b56d

Outline
Architecture
– Combines static preparation and live update
Safely updating live applications
Reducing performance overhead
Evaluation
Conclusion

Hybrid Instrumentation
void slot(int stmt_id) {
    op_list = operations[stmt_id];
    foreach op in op_list
        do op;
}
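
The `slot()` pseudocode can be turned into compilable C with a per-statement table of function pointers. This is a sketch under assumptions: the fixed table sizes, `add_op`, and the two sample operations are hypothetical and only show how a slot with no registered operations does nothing while a populated slot runs its operations in order.

```c
#include <stddef.h>

#define MAX_STMTS 8 /* hypothetical number of instrumented statements */
#define MAX_OPS 4   /* hypothetical cap on operations per statement */

typedef void (*op_fn)(void);

struct op_list {
    op_fn ops[MAX_OPS];
    size_t count;
};

static struct op_list operations[MAX_STMTS];

/* Register an operation for a statement; returns 0 on success, -1 on error. */
int add_op(int stmt_id, op_fn op)
{
    if (stmt_id < 0 || stmt_id >= MAX_STMTS)
        return -1;
    struct op_list *list = &operations[stmt_id];
    if (list->count == MAX_OPS)
        return -1;
    list->ops[list->count++] = op;
    return 0;
}

/* Run every operation currently registered for this statement. */
void slot(int stmt_id)
{
    if (stmt_id < 0 || stmt_id >= MAX_STMTS)
        return;
    const struct op_list *list = &operations[stmt_id];
    for (size_t i = 0; i < list->count; i++)
        list->ops[i]();
}

/* Sample operations that only track a nesting depth. */
static int lock_depth;
void op_lock(void) { lock_depth++; }
void op_unlock(void) { lock_depth--; }
int current_lock_depth(void) { return lock_depth; }
```

Installing a filter then amounts to filling table entries for the statements it names; the instrumented code itself does not change.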

Bare Instrumentation Overhead 17 Performance overhead < 5%

Bare Instrumentation Overhead 18 Performance overhead < 5%

Scalability
Machine with 4 CPUs; each CPU has 12 cores.
Pin the server to CPUs 0, 1, and 2, and the client to CPU 3.
Performance overhead does not increase

Conclusion
LOOM: a live-workaround system designed to quickly and safely bypass races
– Execution filters: easy to use and flexible (< 5 lines)
– Evacuation algorithm: safe
– Hybrid instrumentation: fast (overhead < 5%) and scalable (overhead < 10% with 32 threads)
Future work
– Generic hybrid instrumentation framework
– Extend the idea to other classes of errors

Questions? 21

Related Work
Live update
– Most systems target general fixes
– They either do not ensure safety or need annotations
Error recovery
– Not focused on races
Instrumentation

Language Summary
Construct               Syntax
Event (abbreviated e)   file:line
                        file:line (expr)
                        e{n}: n is the number of occurrences
Region (abbreviated r)  {e1, ..., ei; ei+1, ..., en}
                        func(args)
Mutual Exclusion        r1 <> r2 <> ... <> rn
Unilateral Exclusion    r <> *
Execution Order         e1 > e2 > ... > en

Current practice
– Performance
– Backward compatibility
Safe live update

Reliability
Data races and atomicity violations
Race ID   Mutual Exclusion
          TPUT     RESP
MySQL     %        0.15%
MySQL     %        0.20%
MySQL     %        0.32%
Apache    %        -0.03%
Apache    %        0.55%
Order violations
Race ID          Overhead
PBZip2           1.26%
SPLASH2-fft      0.08%
SPLASH2-lu       1.68%
SPLASH2-barnes   1.99%

Reliability
Data races and atomicity violations
Race ID   Mutual Exclusion     Unilateral Exclusion
          TPUT     RESP        TPUT      RESP
MySQL     %        0.15%       3.28%     3.37%
MySQL     %        0.20%       32.58%    48.34%
MySQL     %        0.32%       0.33%     0.48%
Apache    %        -0.03%      54.03%    118.16%
Apache    %        0.55%       86.04%    637.03%
Order violations
Race ID          Overhead
PBZip2           1.26%
SPLASH2-fft      0.08%
SPLASH2-lu       1.68%
SPLASH2-barnes   1.99%

Backup 27

Execution Filter Examples
Bug: MySQL #791
1: // log.cc. thread T1
2: void MYSQL_LOG::new_file(){
3:   lock(&LOCK_log);
4:   ...
5:   close(); // log is closed
6:   open(...);
7:   ...
8:   unlock(&LOCK_log);
9: }
1: // sql_insert.cc. thread T2
2: // [race] may return false
3: if (mysql_bin_log.is_open()){
4:   lock(&LOCK_log);
5:   if (mysql_bin_log.is_open()){
6:     ... // write to log
7:   }
8:   unlock(&LOCK_log);
9: }
// Execution filter 1: unilateral exclusion
{log.cc:5, log.cc:6} <> *
// Execution filter 2: mutual exclusion of code
{log.cc:5, log.cc:6} <> MYSQL_LOG::is_open
// Execution filter 3: mutual exclusion of code and data
{log.cc:5 (this), log.cc:6 (this)} <> MYSQL_LOG::is_open(this)

Hybrid Instrumentation 29

Overview 30

Execution Filter Examples
Bug: MySQL #791
1: // log.cc. thread T1
2: void MYSQL_LOG::new_file(){
3:   ...
4:   close(); // log is closed
5:   open(...);
6:   ...
7: }
1: // sql_insert.cc. thread T2
2:
3: if (mysql_bin_log.is_open()){
4:   ... // write to log
5: }
// Execution filter 1: unilateral exclusion
{log.cc:4, log.cc:5} <> *
// Execution filter 2: mutual exclusion of code
{log.cc:4, log.cc:5} <> MYSQL_LOG::is_open
// Execution filter 3: mutual exclusion of code and data
{log.cc:4 (this), log.cc:5 (this)} <> MYSQL_LOG::is_open(this)

Timeliness 32

Safety Challenge
// database worker thread
void handle_client(int fd) {
    for (;;) {
        struct client_req req;
        int ret = recv(fd, &req, ...);
        if (ret <= 0) break;
        open_table(req.table_id);
        ... // do real work
        close_table(req.table_id);
    }
}
void open_table(int table_id) {
    // fix: acquire table lock
    ... // actual code to open table
}
void close_table(int table_id) {
    ... // actual code to close table
    // fix: release table lock
}

Control Application Threads 34

Pause a Thread at Safe Locations
void before_blocking() {
    atomic_inc(&counter[callsite_id]);
    read_unlock(&update);
}
void after_blocking() {
    read_lock(&update);
    atomic_dec(&counter[callsite_id]);
}
void cycle_check() {
    if (wait[stmt_id]) {
        read_unlock(&update);
        while (wait[stmt_id]);
        read_lock(&update);
    }
}

Evaluation
Race ID          Type
MySQL-791        Atomicity
MySQL-169        Atomicity
MySQL-644        Atomicity
Apache-21287     Atomicity
Apache-25520     Atomicity
PBZip2           Order
SPLASH2-fft      Order
SPLASH2-lu       Order
SPLASH2-barnes   Order


Overhead During Normal Executions 38

Long Delay in Race Fixing
History
– Reported on
– Considered to be fixed on
– Finally resolved on
Race ID      Report     Diagnosis  Fix        Release
Apache       /15/03     12/18/03   01/17/04   03/19/04
Apache       /02/03     N/A        12/18/03   03/19/04
MySQL-169    03/19/03   N/A        03/24/03   06/20/03
MySQL-644    06/12/03   N/A                   05/30/04
MySQL-791    07/04/03              07/14/03   07/22/03
Mozilla      /28/01                04/09/01   05/07/01
Mozilla      /07/03                04/16/03   01/08/04
Mozilla      /27/02                12/01/09   01/21/10
Hidden in the system for more than 7 years!
39 messages back and forth in total

Hybrid Instrumentation
[Figure: control-flow graph (Function Entry, Loop Body, Function Exit) with cycle_check() added, then the same graph with a switch that selects between a fast version and a slow version of the loop body.]
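
One plausible reading of the fast and slow versions is that the code is kept in two copies, an uninstrumented one for normal execution and an instrumented one used while a filter is active, with a flag choosing between them. The sketch below shows that idea on a toy loop. It is an assumption-laden illustration, not LOOM's code: `slow_mode`, `hook`, and the summing functions are hypothetical, and the real system makes this choice inside compiler-generated code rather than at a hand-written call site.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_bool slow_mode;  /* hypothetical switch flag */
static atomic_int hook_calls;  /* counts instrumentation hook invocations */

/* Stands in for an instrumentation hook such as a slot or cycle check. */
static void hook(void) { atomic_fetch_add(&hook_calls, 1); }

/* Fast version: the original loop with no instrumentation. */
static long sum_fast(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Slow version: same computation, with a hook on every iteration. */
static long sum_slow(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        hook();
        s += v[i];
    }
    return s;
}

/* The switch: pick a version on entry, based on the flag. */
long sum(const int *v, size_t n)
{
    return atomic_load(&slow_mode) ? sum_slow(v, n) : sum_fast(v, n);
}

void set_slow_mode(bool on) { atomic_store(&slow_mode, on); }
int hook_call_count(void) { return atomic_load(&hook_calls); }
```

Both versions compute the same result, so normal execution pays only for the flag check at the switch.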

An Example: Mozilla #
void js_DestroyContext() {
    if (last) {
        state = LANDING;
        wait for gcLevel == 0;
        gcPoke = true;
        js_GC(true);
        while (gcPoke)
            js_GC(true);
        FreeAtomState();
    } else {
        gcPoke = true;
        js_GC(false);
    }
}
void js_GC(bool last) {
    ...
    MarkAtomState();
    ...
}
[Figure: interleaving across a new thread, a non-last thread, and the last thread, with numbered steps]
1. gcPoke = TRUE; state == LANDING? No; gcPoke? Yes; gcLevel > 0? No
2. state = LANDING; wait for gcLevel == 0; gcPoke = TRUE
3. gcPoke = FALSE
4. FreeAtomState()
5. MarkAtomState()
Hidden in the system for years!
39 messages back and forth in total

Problems
Software update requires restarts.
Conventional live update is unsafe.
[Figure: timeline with stages Report, Diagnosis, Release, Restart, Detect]
Developers want their code to be very efficient
Developers test the fixes thoroughly before release...

Pause/Resume Application Threads
[Figure: control-flow graph (Function Entry, Loop Body, Function Exit) before and after cycle_check() is added]
void cycle_check() {
    read_unlock(&update);
    read_lock(&update);
}
void loom_update() {
    write_lock(&update);
    install_filter();
    write_unlock(&update);
}

Pause Threads at Safe Locations
[Figure: control-flow graph (Function Entry, Loop Body, Function Exit) with cycle_check() added]
void cycle_check() {
    if (wait[backedge_id]) {
        read_unlock(&update);
        while (wait[backedge_id]);
        read_lock(&update);
    }
}
void loom_update() {
    identify_safe_locations();
    for each safe backedge E
        wait[E] = true;
    write_lock(&update);
    install_filter();
    write_unlock(&update);
    for each safe backedge E
        wait[E] = false;
}

Evaluation (Benchmarks)
Race ID          Bug Type    Application
MySQL-169        Atomicity   SQL Server
MySQL-644        Atomicity   SQL Server
MySQL-791        Atomicity   SQL Server
Apache-21287     Atomicity   HTTP Server
Apache-25520     Atomicity   HTTP Server
PBZip2           Order       Parallel BZip2
SPLASH2-fft      Order       Scientific computation programs
SPLASH2-lu       Order       Scientific computation programs
SPLASH2-barnes   Order       Scientific computation programs

Evaluation (Metrics)
Overhead
– Does LOOM incur low overhead?
Scalability
– Does LOOM scale well with the number of threads?
Reliability
– Can LOOM fix all the races evaluated?
– What are the trade-offs between reliability and performance?


Problems
[Figure: timeline with stages Detection, Diagnosis, Fix, Restart, and the gaps between them marked]

Execution Filters
jscntxt.c
1 : void js_DestroyContext(JSContext *cx) {
2 :   if (last thread) {
3 :     ...
4 :     FreeAtomState();
5 :     ...
6 :   } else {
7 :     ...
8 :     MarkAtomState();
9 :     ...
10:   }
11: }
// unilateral exclusion
{jscntxt.c:2; jscntxt.c:10} <> *
// mutual exclusion of code and data
{jscntxt.c:2 (cx->runtime); jscntxt.c:10 (cx->runtime)} <> self
// mutual exclusion of code
{jscntxt.c:2; jscntxt.c:10} <> self
// order constraint
jscntxt.c:8 (cx->runtime) > jscntxt.c:4 (cx->runtime)
Flexible
Declarative
Easy to write (< 5 lines)

Problems
Fixing races: complex
– Performance pressure
– Time-consuming testing
→ Applications remain vulnerable
Deploying race fixes: needs restarts
Live update: unsafe

Execution Filters: Temporarily Filter out Unwanted Interleavings
jscntxt.c
void js_DestroyContext(JSContext *cx) {
    if (last thread) {
        ...
        FreeAtomState(cx);
        ...
    } else {
        ...
        MarkAtomState(cx);
        ...
    }
}
// mutual exclusion of code
js_DestroyContext <> self
Declarative, easy to write