What Change History Tells Us about Thread Synchronization RUI GU, GUOLIANG JIN, LINHAI SONG, LINJIE ZHU, SHAN LU UNIVERSITY OF WISCONSIN – MADISON, USA.

Slides:



Advertisements
Similar presentations
Concurrency Issues Motivation, Problems, Directions Dennis Kafura - CS Operating Systems1.
Advertisements

Understanding and Detecting Real-World Performance Bugs
An Case for an Interleaving Constrained Shared-Memory Multi-Processor Jie Yu and Satish Narayanasamy University of Michigan.
Test-Driven Development and Refactoring CPSC 315 – Programming Studio.
SOFTWARE MAINTENANCE 24 March 2013 William W. McMillan.
Guoliang Jin, Linhai Song, Wei Zhang, Shan Lu, and Ben Liblit University of Wisconsin–Madison Automated Atomicity- Violation Fixing.
/ PSWLAB Concurrent Bug Patterns and How to Test Them by Eitan Farchi, Yarden Nir, Shmuel Ur published in the proceedings of IPDPS’03 (PADTAD2003)
Microsoft Research Faculty Summit Yuanyuan(YY) Zhou Associate Professor University of Illinois, Urbana-Champaign.
Building Secure Software Chapter 9 Race Conditions.
(c) 2007 Mauro Pezzè & Michal Young Ch 1, slide 1 Software Test and Analysis in a Nutshell.
Applied Software Project Management Andrew Stellman & Jennifer Greene Applied Software Project Management Applied Software.
CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.
Race Conditions CS550 Operating Systems. Review So far, we have discussed Processes and Threads and talked about multithreading and MPI processes by example.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Operating Systems CMPSCI 377 Lecture.
Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou Appeared.
Chapter 9 – Software Evolution and Maintenance
Applied Software Project Management Andrew Stellman & Jennifer Greene Applied Software Project Management Applied Software.
Exceptions and Mistakes CSE788 John Eisenlohr. Big Question How can we improve the quality of concurrent software?
© 2012 IBM Corporation Rational Insight | Back to Basis Series Chao Zhang Unit Testing.
1 Debugging and Testing Overview Defensive Programming The goal is to prevent failures Debugging The goal is to find cause of failures and fix it Testing.
Facts and Fallacies of Software Engineering (Rob Glass) CSE301 University of Sunderland Discussed by Harry R. Erwin, PhD.
1 Specialization Tools and Techniques for Systematic Optimization of System Software McNamee, Walpole, Pu, Cowan, Krasic, Goel, Wagle, Consel, Muller,
Pallavi Joshi* Mayur Naik † Koushik Sen* David Gay ‡ *UC Berkeley † Intel Labs Berkeley ‡ Google Inc.
Which Configuration Option Should I Change? Sai Zhang, Michael D. Ernst University of Washington Presented by: Kıvanç Muşlu.
Use of Coverity & Valgrind in Geant4 Gabriele Cosmo.
Games Development 2 Concurrent Programming CO3301 Week 9.
Making Good Code AKA: So, You Wrote Some Code. Now What? Ray Haggerty July 23, 2015.
ITEC 370 Lecture 23 Maintenance. Review Questions? Project update on F, next F give prototype demonstration Maintenance –Reactive –Proactive.
Kernel Locking Techniques by Robert Love presented by Scott Price.
Cooperative Concurrency Bug Isolation Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu University of Wisconsin–Madison Instrumentation and Sampling Strategies.
DOUBLE INSTANCE LOCKING A concurrency pattern with Lock-Free read operations Pedro Ramalhete Andreia Correia November 2013.
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew.
Unix Security Assessing vulnerabilities. Classifying vulnerability types Several models have been proposed to classify vulnerabilities in UNIX-type Oses.
CS399 New Beginnings Jonathan Walpole. 2 Concurrent Programming & Synchronization Primitives.
Programming Fundamentals Lecture No. 2. Course Objectives Objectives of this course are three fold 1. To appreciate the need for a programming language.
+ Moving Targets: Security and Rapid-Release in Firefox Presented by Carlos Bernal-Cárdenas.
HXY Debugging Made by Contents 目录 History of Java MT Sequential & Parallel Different types of bugs Debugging skills.
CS527 Topics in Software Engineering (Software Testing and Analysis) Darko Marinov August 30, 2011.
By: Rob von Behren, Jeremy Condit and Eric Brewer 2003 Presenter: Farnoosh MoshirFatemi Jan
CAPP: Change-Aware Preemption Prioritization Vilas Jagannath, Qingzhou Luo, Darko Marinov Sep 6 th 2011.
HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin Chen LBA Reading Group Presentation.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT Overview of DMLite Ricardo Rocha ( on behalf of the LCGDM team.
Soyeon Park, Shan Lu, Yuanyuan Zhou UIUC Reading Group by Theo.
CS 5150 Software Engineering Lecture 22 Reliability 3.
Embedded System Lab. 최 진 화최 진 화 Kilmo Choi 최길모 A Study of Linux File System Evolution L. Lu, A. C. Arpaci-Dusseau, R. H. ArpaciDusseau,
Testing Concurrent Programs Sri Teja Basava Arpit Sud CSCI 5535: Fundamentals of Programming Languages University of Colorado at Boulder Spring 2010.
CIS-NG CASREP Information System Next Generation Shawn Baugh Amy Ramirez Amy Lee Alex Sanin Sam Avanessians.
 Software reliability is the probability that software will work properly in a specified environment and for a given amount of time. Using the following.
Tanakorn Leesatapornwongsa, Jeffrey F. Lukman, Shan Lu, Haryadi S. Gunawi.
Learning from Mistakes: A Comprehensive Study on Real-World Concurrency Bug Characteristics Ben Shelton.
Healing Data Races On-The-Fly
Regression Testing with its types
Chapter 18 Maintaining Information Systems
CSC 591/791 Reliable Software Systems
Threads Cannot Be Implemented As a Library
Atomic Operations in Hardware
Atomic Operations in Hardware
CS533 Concepts of Operating Systems Class 3
Effective Data-Race Detection for the Kernel
Automatic Detection of Extended Data-Race-Free Regions
Design and Programming
A Comprehensive Study on Real World Concurrency Bugs in Node.js
CS533 Concepts of Operating Systems Class 3
Chapter 8 Software Evolution.
Userspace Synchronization
CSE 451: Operating Systems Autumn 2003 Lecture 7 Synchronization
CSE 451: Operating Systems Autumn 2005 Lecture 7 Synchronization
CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization
CS333 Intro to Operating Systems
Understanding Real-World Concurrency Bugs in Go
Presentation transcript:

What Change History Tells Us about Thread Synchronization RUI GU, GUOLIANG JIN, LINHAI SONG, LINJIE ZHU, SHAN LU UNIVERSITY OF WISCONSIN – MADISON, USA

Proper Thread Synchronization is Hard Very easy to get things wrong ◦Hard-to-find concurrency bugs(Data race, Atomicity Violation, Dead Lock) Hard to debug ◦Enforce complex thread scheduling between threads(GDB, LLDB) Unexpected performance degradation ◦Lock Contention

Information Beyond Bug Reports Unreported bugs, optimizations ◦Small but non-trivial concurrency bugs ◦Synchronization related performance optimizations(premature, mature?) Scattered information throughout revision history ◦How is a concurrency bug introduced? Are those locks needed when the synchronization body is created? Hidden trending information ◦Is synchronization problems(correctness, performance) getting worse when software evolves? ◦Do critical sections more likely to be modified when it ages?

Key Questions ◦How often synchronization is an afterthought for developers? ◦How many critical section related changes throughout software history? Why did developers make these changes? ◦Is synchronization related problem(correctness, performance) getting worse when software evolves? ◦Is over-synchronization a real problem? ◦How software changes lead to synchronization-related correctness problems(Concurrency Bugs)?

Impact Previous studies have looked at bug databases only. But not all the information can be obtained from bug databases. Better understanding of synchronization issues faced by real-world developers is highly needed. New Language Features – Place Holder Analysis Tools – Efficient Race Detection Run-time System – Place Holder Code Development Tools – Synchronization Adjustment

Overview An empirical study on: ◦Critical Section Evolution ◦Over Synchronizations ◦Concurrency Bug Origins

Difficulties Encountered Large revision numbers( years) ◦Apache - 26,000, Mozilla Firefox - 180,000,... Different kinds of synchronization primitives ◦Different kinds of Mutex Locks, Condition Variables Different kinds of synchronization primitive changes ◦Critical Section Getting Larger, Smaller, Synchronization Type Change, Synchronization Variable Change etc.

Contribution 1.Comprehensive empirical study on how lock-protected critical sections are changed when software evolves. 2.Detailed case study on over-synchronization issues. 3.Detailed case study on how concurrency bugs are introduced.

Methodology for Critical Section Study ◦4 representative C/C++ open source projects ◦Hierarchical taxonomy for all critical section changes ◦Regular-expression based Python scripts for structural patterns ◦Manual study on 500+ cases found by the scripts for purpose patterns

Software 4 representative C/C++ open-source software projects including data base, web server, web browser, media player ApplicationsPeriod# of RevisionsLatest Version Size(LoC) Apache Web Server K Mozilla Browser M MPlayer Media Player K MySQL Database M

Critical Section Change Structural Patterns Adding critical sections: ◦Add Synchronization /* Empty the accept queue of completion contexts */ + apr_lock_acquire(qlock); while (qhead) { CloseHandle(qhead->Overlapped.hEvent); closesocket(qhead->accept_socket); qhead = qhead->next; } + apr_lock_release(qlock); ◦Add All + pthread_mutex_lock(&rli->log_space_lock); + rli->log_space_total -= rli->relay_log_pos; + pthread_mutex_unlock(&rli->log_space_lock);

Critical Section Change Structural Patterns Removing critical sections: ◦Remove Synchronization - pthread_mutex_lock(&THR_LOCK_keycache); + /* pthread_mutex_lock(&THR_LOCK_keycache); */ pthread_mutex_lock(&LOCK_status); for (i=0; variables[i].name; i++) { … pthread_mutex_unlock(&LOCK_status); - pthread_mutex_unlock(&THR_LOCK_keycache); + /* pthread_mutex_unlock(&THR_LOCK_keycache); */ ◦Remove All wait_for_refresh(thd); } - pthread_mutex_lock(&thd->mysys_var->mutex); - thd->mysys_var->current_mutex=0; - thd->mysys_var->current_cond=0; - pthread_mutex_unlock(&thd->mysys_var->mutex); return result; }

Critical Section Change Structural Patterns Modifying existing critical sections ◦Modify Body ◦Modify Synchronization(Variable, Primitive, Boundary, Split, Unlock) - VOID(pthread_mutex_lock(&hostname_cache->lock)); + VOID(pthread_mutex_lock(&LOCK_hostname)); if (!(hp=gethostbyaddr((char*) in,sizeof(*in), AF_INET))) { - VOID(pthread_mutex_unlock(&hostname_cache->lock)); + VOID(pthread_mutex_unlock(&LOCK_hostname)); DBUG_PRINT("error",("gethostbyaddr returned %d",errno)); goto err; }

Observations ApacheMozillaMPlayerMySQL Add Rem Mod Total # AddRemMod Percentage22%21%56%

Observations

How many changes for each critical section? ApacheMozillaMPlayerMySQL Mod-Body Mod-Sync Total Mod-BodyMod-Sync Percentage80%50%

Observations

MySQL Critical section change number Line of code change number

Critical Section Change Purpose Patterns PurposePatterns CorrectnessFixing functional bugs FunctionalityAdding or changing code functionality MaintainabilityCode refactoring PerformanceImproving performance RobustnessAdding sanity checks, logs.

Observations

Mozilla

Over Synchronization Study ◦What is over synchronization? ◦Unnecessary synchronizations that will cause performance degradation ◦Why do we care about it ? ◦Performance considerations ◦Code maintenance considerations

Methodology and Threats to Validity Randomly picked up 20 cases from Mod Sync Variable, Mod Sync Split, Mod Sync Boundary that fix/relieve over synchronization issues. They do not cover all over-synchronization issues fixed by these three types of changes nor other over-synchronization issues fixed by other changes.

Observations +static pthread_mutex_t LOCK_hostname;... DBUG_RETURN(0); // out of memory #else - VOID(pthread_mutex_lock(&hostname_cache->lock)); + VOID(pthread_mutex_lock(&LOCK_hostname)); if (!(hp=gethostbyaddr((char*) in,sizeof(*in), AF_INET))) { - VOID(pthread_mutex_unlock(&hostname_cache->lock)); + VOID(pthread_mutex_unlock(&LOCK_hostname)); DBUG_PRINT("error",("gethostbyaddr returned %d",errno)); goto err; }

Observations Where to Apply the Changes? ◦Critical sections with highly contended locks How to Conduct Mod Sync Variable? ◦Select a new lock ◦Replace locks with existing ones ◦Among 11 cases under study, original locks are replaced by new ones

Concurrency Bug Origins

Methodology and Threats to Validity Manual studies on 40 bugs from two bug sources: Bug Source One: 28 real-world concurrency bugs coming from more than 10widely used C/C++ software projects, repeated and evaluated in 12 recent concurrency-bug related papers. Bug Source Two: 12 randomly sampled concurrency bugs from our critical-section change study.

Taxonomy Our categorization is based on three key ingredients: 1.Shared variables 2.Instructions that accessing these shared variables 3.Synchronization contexts(surrounding locks, preceding barriers)

Observations Categorization of how concurrency bugs are introduced: Type 1: One Thread, New Instructions in Old Context Type 2: One Thread, New Instructions in New Context

Observations Categorization of how concurrency bugs are introduced: Type 3: Multiple Threads, New Variables Accessed in Old Contexts Type 4: Multiple Threads, New Instructions in New Contexts

Discussion Understanding how concurrency bugs are introduced can helps us in: ◦Improving the performance and accuracy of existing concurrency-bug analysis techniques ◦Synchronization analysis ◦Memory-access analysis ◦Concurrency Bug Prevention

Conclusions