CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.

Slides:



Advertisements
Similar presentations
Copyright 2008 Sun Microsystems, Inc Better Expressiveness for HTM using Split Hardware Transactions Yossi Lev Brown University & Sun Microsystems Laboratories.
Advertisements

Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
CS510 – Advanced Operating Systems 1 The Synergy Between Non-blocking Synchronization and Operating System Structure By Michael Greenwald and David Cheriton.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Wait-Free Reference Counting and Memory Management Håkan Sundell, Ph.D.
A Lock-Free Multiprocessor OS Kernel1 Henry Massalin and Calton Pu Columbia University June 1991 Presented by: Kenny Graunke.
Pessimistic Software Lock-Elision Nir Shavit (Joint work with Yehuda Afek Alexander Matveev)
Cpr E 308 Spring 2004 Recap for Midterm Introductory Material What belongs in the OS, what doesn’t? Basic Understanding of Hardware, Memory Hierarchy.
Transactional Memory Overview Olatunji Ruwase Fall 2007 Oct
Review: Chapters 1 – Chapter 1: OS is a layer between user and hardware to make life easier for user and use hardware efficiently Control program.
Paul E. McKenney, IBM Linux Technology Center
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
1 MetaTM/TxLinux: Transactional Memory For An Operating System Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter and Owen S. Hofmann Presenter:
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.
CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.
CS510 Concurrent Systems Class 13 Software Transactional Memory Should Not be Obstruction-Free.
TxLinux: Using and Managing Hardware Transactional Memory in an Operating System Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan,
Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett Witchel - Presentation By Sathish P.
1 New Architectures Need New Languages A triumph of optimism over experience! Ian Watson 3 rd July 2009.
CS533 - Concepts of Operating Systems 1 CS533 Concepts of Operating Systems Class 8 Synchronization on Multiprocessors.
Department of Computer Science Presenters Dennis Gove Matthew Marzilli The ATOMO ∑ Transactional Programming Language.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Operating Systems CMPSCI 377 Lecture.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Introduction to Embedded Systems
1 Lock-Free Linked Lists Using Compare-and-Swap by John Valois Speaker’s Name: Talk Title: Larry Bush.
Simple Wait-Free Snapshots for Real-Time Systems with Sporadic Tasks Håkan Sundell Philippas Tsigas.
View-Oriented Parallel Programming for multi-core systems Dr Zhiyi Huang World 45 Univ of Otago.
The Linux Kernel: A Challenging Workload for Transactional Memory Hany E. Ramadan Christopher J. Rossbach Emmett Witchel Operating Systems & Architecture.
QoS-enabled Tree-based Distributed Mutexes James Edmondson Brian Sulcer.
Håkan Sundell, Chalmers University of Technology 1 NOBLE: A Non-Blocking Inter-Process Communication Library Håkan Sundell Philippas.
Integrating and Optimizing Transactional Memory in a Data Mining Middleware Vignesh Ravi and Gagan Agrawal Department of ComputerScience and Engg. The.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
WG5: Applications & Performance Evaluation Pascal Felber
Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs.
Kernel Locking Techniques by Robert Love presented by Scott Price.
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
Debugging Threaded Applications By Andrew Binstock CMPS Parallel.
CS510 Concurrent Systems Jonathan Walpole. RCU Usage in Linux.
© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.
Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University.
On Transactional Memory, Spinlocks and Database Transactions Khai Q. Tran Spyros Blanas Jeffrey F. Naughton (University of Wisconsin Madison)
Adaptive Software Lock Elision
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93
James Larus and Christos Kozyrakis
Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam
By Michael Greenwald and David Cheriton Presented by Jonathan Walpole
PHyTM: Persistent Hybrid Transactional Memory
Håkan Sundell Philippas Tsigas
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory By McKenney, Michael, Triplett and Walpole.
Faster Data Structures in Transactional Memory using Three Paths
Challenges in Concurrent Computing
Changing thread semantics
Lecture 6: Transactions
Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E
Hybrid Transactional Memory
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Software Transactional Memory Should Not be Obstruction-Free
CSCI1600: Embedded and Real Time Software
Operating Systems : Overview
Operating Systems : Overview
Why Threads Are A Bad Idea (for most purposes)
CSCI1600: Embedded and Real Time Software
Lecture 23: Transactional Memory
Why Threads Are A Bad Idea (for most purposes)
Why Threads Are A Bad Idea (for most purposes)
Presentation transcript:

CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory

Why Do Concurrent Programming? Hardware has been forced down the path of concurrency: – can’t make cores much faster – but we can have more of them Concurrent programming is required for scalable performance of software on many- core platforms – synchronization performance and scalability has become a critical issue

Lock-Based Synchronization Pessimistic concurrency control Simple approach – identify shared data – associate locks with data – acquire before use – release after use Enforces mutual exclusion – why? – is this scalable?

Why Is Locking Tricky? Lock contention, especially for non- partitionable data Difficulty in partitioning data Lock overhead Inter-thread dependencies

Non-Blocking Synchronization Lock-free “optimistic” concurrency control Charge ahead, and if necessary roll-back and retry With sufficient programming effort can support fault tolerance properties – wait-freedom – obstruction-freedom, etc Avoids the need to access locks and data

Problems With Non-Blocking Synchronization Programming complexity Heavy use of atomic instructions Useless parallelism Excessive retries or helping under contention Excessive memory contention under heavy load

Strengths of Locking Simple and elegant No special hardware needed Wide-spread support Localized contention effects Power-friendly Good interaction with debuggers and other synchronization mechanisms Supports I/O and non-idempotent operations Ability to support disjoint access parallelism (with sufficient effort)

Problems and Their Solutions Lock contention – partition data, redesign algorithms (if you can!) – non-partitionable data structures are a problem Lock overhead – use RCU for readers – don’t over-partition (but this conflicts with lock contention solution!) – update-heavy workloads are a problem

Problems and Their Solutions Deadlock among threads and among interrupt handlers and threads – locking hierarchy – try lock primitive with surrender and retry – detection and recovery – RCU for readers – special primitives to combine lock acquisition and interrupt disabling

Problems and Their Solutions Priority inversion – priority inheritance – RCU for readers Convoying – synchronization-aware scheduling – RCU for readers

Problems and Their Solutions Composability – expose synchronization at the interface level Thread failures – reboot

Transactional Memory Simple and elegant Enforces atomicity and isolation Composable Generally non-blocking – can be implemented in pessimistic form Automatic disjoint access parallelism – but doesn't help you ensure accesses are disjoint! Deadlock free Limited hardware support is possible

Transactional Memory Weaknesses Hardware support non-portable – need portable transaction virtualization support Hardware generally unavailable – need to wait for industry to provide it Software implementations have poor performance – solutions tend to weaken properties

Transactional Memory Weaknesses Does not support I/O, non-idempotent operations, or distributed systems – use pessimistic concurrency control – reintroduces problems of locking Contention management – need good contention management policy – and integration with scheduler – could use relativistic reads Privatization support sacrifices isolation

Transactional Memory Weaknesses High overhead – use TM for heavy weight operations Debugging – break points cause unconditional aborts in HTM – need better integration between HTM and STM

Scenario Best Technique Why? Partitionable data structuresLockingDisjoint Access Parallelism Large Non Partitionable data structures TMAutomatic Disjoint Access Parallelism Read Mostly Situations Locking/TM with Hazard Pointers/ RCU Readers Scalable Update Heavy SituationsTMWriters Scalable Complex fine grain locking design, No clear lock hierarchy exists TMDeadlock Avoidance Atomic operations spanning multiple independent data structures, eg pop from one stack and push to another TMComposability Single threaded software having embarrassingly parallel core containing only idempotent operations TM Performance benefits without much programming effort Non Idempotent OperationsLockingSupportability of non idempotent operations. Large Critical SectionsLockingLock acquisition cost small compared to retry Commodity HardwareLocking Commodity HW suffices. HTM requires specialized H/W and depends on cache geometry details. Else performance limited by STM

Conclusion Use the right tool for the job Understand all the techniques, their strengths and weaknesses, and potential interactions