Transactional-Memory Real Time Systems. Leeor Peled, Advanced topics 049011, Technion, December 2014.

Lock-freedom
Shared data that does not require mutual exclusion.
–Avoids common lock-related problems such as deadlocks, livelocks, priority inversion, and convoying; helps with fault tolerance and async-signal safety
–Allows interruption/preemption without blocking the objects being operated upon
LF algorithms vs. LF data structures

Synchronization Paradigms
Classification:
–Blocking
  –Blocking
  –Starvation-Free
–Obstruction-Free
–Lock-Free
–Wait-Free
  –Wait-Free
  –Wait-Free Bounded
  –Wait-Free Population Oblivious

Synchronization for lawyers
Starvation-Free: As long as one thread is in the critical section, some other thread that wants to enter the critical section will eventually succeed (even if the thread in the critical section has halted).
Obstruction-Free: A function is Obstruction-Free if, from any point after which it executes in isolation, it finishes in a finite number of steps.
Lock-Free: A method is Lock-Free if it guarantees that infinitely often some thread calling this method finishes in a finite number of steps.
Wait-Free: A method is Wait-Free if it guarantees that every call finishes its execution in a finite number of steps.
Wait-Free Bounded: A method is Wait-Free Bounded if it guarantees that every call finishes its execution in a finite and bounded number of steps. This bound may depend on the number of threads.
Wait-Free Population Oblivious: A Wait-Free method whose performance does not depend on the number of active threads.
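The Lock-Free definition above is usually realized with a compare-and-swap (CAS) retry loop: a CAS can only fail because some other thread's CAS succeeded, so the system as a whole keeps making progress. The sketch below is illustrative only. Python exposes no user-level CAS instruction, so the hypothetical `AtomicInt` class models the hardware primitive with a tiny internal guard; the point is the shape of the retry loop in `increment`, not the guard.

```python
import threading

class AtomicInt:
    """Hypothetical stand-in for a machine word that supports compare-and-swap."""
    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models the atomicity of a single CAS

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically write `new` only if the current value equals `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(counter):
    """Retry until our CAS wins; returns the number of failed attempts."""
    retries = 0
    while True:
        old = counter.load()
        if counter.compare_and_swap(old, old + 1):
            return retries
        retries += 1  # someone else's CAS succeeded in between

def run(n_threads=4, n_increments=1000):
    counter = AtomicInt()

    def worker():
        for _ in range(n_increments):
            increment(counter)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load()
```

Note that `increment` is lock-free but not wait-free: nothing bounds how many times a single unlucky thread may retry, which is exactly the property that matters for real-time analysis.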

Synchronization Paradigms (2)
Are lock-free algorithms completely useless in an RT context?
–Bounded number of retries in priority-based systems (Anderson, ’97)
  A hard-RT scheduler based on lock-free objects often incurs less overhead than a wait-free implementation
–Nonblocking synchronization for RT systems (Hohmuth & Härtig, ’01)
  Implements Linux kernel benchmarks with LF/WF algorithms, demonstrating RT capabilities

Alternative: Transactional Memory
Originally proposed by Herlihy & Moss, ’93
–Earlier idea by Knight, ’86
HW concept based on a cache-coherency extension
–Speculative work: writes are marked in the cache and can’t become external/visible until commit
  Upon commit: allow snoops/WB
  Upon abort: invalidate spec lines and roll back
–Reads are also marked, to monitor conflicts

Example – deadlock prevention
Consider implementations of move(A, B, elem)
–Moves a single element from data structure A to B

Non-TM:
  Lock A
  Lock B
  A.remove(elem)
  B.insert(elem)
  Unlock B
  Unlock A

TM:
  atomic {
    A.remove(elem)
    B.insert(elem)
  }

Drawbacks? Think of a linked list
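With the lock-based version of move, a concurrent move(A, B, x) and move(B, A, y) can each hold one lock while waiting for the other. One conventional workaround, sketched below, is to always acquire the two locks in a single global order; this is exactly the kind of discipline the atomic block makes unnecessary. The `LockedList` helper and the choice of ordering by `id()` are assumptions made for illustration, not something taken from the slides.

```python
import threading

class LockedList:
    """A list guarded by its own lock (hypothetical helper for this sketch)."""
    def __init__(self, items=()):
        self.items = list(items)
        self.lock = threading.Lock()

def move(src, dst, elem):
    if src is dst:
        return  # nothing to do, and taking the same lock twice would self-deadlock
    # Take both locks in one global order (here: by object id), so two opposite
    # moves can never wait on each other in a cycle.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.items.remove(elem)
            dst.items.append(elem)

def demo(rounds=200):
    a = LockedList(range(rounds))              # 0 .. rounds-1
    b = LockedList(range(rounds, 2 * rounds))  # rounds .. 2*rounds-1
    t1 = threading.Thread(target=lambda: [move(a, b, i) for i in range(rounds)])
    t2 = threading.Thread(
        target=lambda: [move(b, a, i) for i in range(rounds, 2 * rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(a.items), sorted(b.items)
```

The ordering rule has to be known and followed by every caller, and it does not compose well once a data structure hides its locks internally, which is one of the drawbacks the slide hints at.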

Overflow…
Assume [a]..[e] all map to the same L1 set (4-way L1 cache)

  store 0,[a]
  TX_begin
  store 1,[a]
  store 1,[b]
  store 1,[c]
  store 1,[d]
  store 1,[e]
  TX_end

  TX_begin
  ld [b+10]
  ld [b+20]
  ld [b+30]
  ld [b+40]
  ld [b+50]
  TX_end

Cache set contents (address, value, state):
  Way 0: [a], 0, M, then [a], 1, w
  Way 1: [b], 1, w
  Way 2: [c], 1, w
  Way 3: [d], 1, w
  [e], 1: no way left
What happens if a write hits a spec/non-spec line?
Other resources are also limited
–Limited capacity
–Worse: non-determinism
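The store sequence on this slide can be replayed on a toy model of one cache set. The sketch assumes a deliberately simplified policy that is not taken from any real CPU: speculative lines can never be evicted, a speculative write that hits a non-speculative dirty line first writes the old value back, and running out of ways aborts the transaction. The class and method names are hypothetical.

```python
class Abort(Exception):
    """Raised when the transaction cannot keep its speculative state in the set."""

class SpeculativeSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = {}    # address -> [value, speculative?]
        self.memory = {}   # backing store behind the cache
        self.in_tx = False

    def tx_begin(self):
        self.in_tx = True

    def store(self, addr, value):
        line = self.lines.get(addr)
        if line is None:
            if len(self.lines) == self.ways:
                self._evict_one()
        elif self.in_tx and not line[1]:
            # Speculative write hits a non-speculative dirty line: write the old
            # value back first so that an abort can still recover it.
            self.memory[addr] = line[0]
        self.lines[addr] = [value, self.in_tx]

    def _evict_one(self):
        for addr, (value, speculative) in self.lines.items():
            if not speculative:
                self.memory[addr] = value  # write back, then free the way
                del self.lines[addr]
                return
        raise Abort("all ways hold speculative lines")

    def tx_end(self):
        for line in self.lines.values():
            line[1] = False  # commit: speculative lines become ordinary lines
        self.in_tx = False

    def tx_abort(self):
        for addr in [a for a, line in self.lines.items() if line[1]]:
            del self.lines[addr]  # roll back: drop speculative lines
        self.in_tx = False

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        return self.memory.get(addr)

def run(addresses):
    cache = SpeculativeSet(ways=4)
    cache.store("a", 0)  # non-transactional store before TX_begin
    cache.tx_begin()
    try:
        for addr in addresses:
            cache.store(addr, 1)
        cache.tx_end()
        return "commit", cache
    except Abort:
        cache.tx_abort()
        return "abort", cache
```

Writing [a]..[d] commits, while adding [e] aborts and leaves [a] at its pre-transaction value. Whether a given transaction fits depends on address-to-set mapping, which the programmer does not control; that is the non-determinism the slide warns about.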

Software Transactional Memory
Proposed by Shavit and Touitou (’95)
–Manage data structures through a SW intermediate layer
–Log all reads/writes to track conflicts
Enhanced in TL2
–Rely on a versioned clock for commits
Standalone approach, or a temporary solution until HW catches up?
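The idea of logging reads and writes and validating them against a versioned clock can be sketched in a few lines. This is a toy in the spirit of TL2, not the published algorithm: it uses one global commit lock instead of per-location versioned locks, and every name in it (`TVar`, `Transaction`, `atomically`) is made up for the example.

```python
import threading

class TVar:
    """A transactional variable: a value plus the clock version that wrote it."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Conflict(Exception):
    pass

_clock = 0                       # global version clock
_commit_lock = threading.Lock()  # simplification: one lock serializes commits

class Transaction:
    def __init__(self):
        self.read_version = _clock  # snapshot of the clock at start
        self.reads = set()          # read log
        self.writes = {}            # write log: TVar -> pending value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        if version > self.read_version:
            raise Conflict()        # written after we started
        self.reads.add(tvar)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value   # buffered until commit

    def commit(self):
        global _clock
        with _commit_lock:
            for tvar in self.reads:
                if tvar.version > self.read_version:
                    raise Conflict()
            _clock += 1
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version = _clock

def atomically(fn):
    """Run fn(tx) and commit, retrying from scratch on every conflict."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Conflict:
            continue
```

A caller would write something like `atomically(lambda tx: tx.write(b, tx.read(a)))`. The unbounded `while True` in `atomically` is the part a real-time analysis has to bound.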

TM flavors
TM (Herlihy, Moss, ’93) – original design, best effort
SLE (Rajwar, Goodman, ’01) – simplifies the interface: avoids locks, no TM ISA required
LTM (Ananian, ’03) – physical memory spilling by HW
UTM (Ananian, ’03) – virtual memory, context-switch support, very heavy (virtualizes each line)
VTM (Rajwar, Herlihy, ’05) – another unbounded flavor, virtualizes Txs like virtual memory
HyTM (Moir, Sun Labs, ’05) – attempts HTM, falls back on STM. Special consideration for syncing between instances of both types.
DSTM (Koomar) – similar to HyTM (although both are trying hard to deny it)
TL2 (Dice, Shavit, ’06) – another hybrid, very popular as a baseline for others
PhTM (Lev, ’07) – another hybrid, no simultaneous HW/SW transactions
USTM (Baugh, ’08) – another hybrid: user fault-on STM, with unbounded HTM based on HW memory protection
TLE (Dice, ’08) – TM version of SLE
TTM, LogTM, etc. (Moore)
Bottom line: most of the above are still best-effort HTMs. No success (forward progress) is guaranteed, and some level of SW support is required.

HTM: Industry Trends
Sun Microsystems: Rock CPU
–Featured Hybrid-TM and lots of other goodies such as spec-lookahead, OOO retirement, and a built-in desk warmer (250W!). Allows a mix of Tx and non-Tx code inside Tx boundaries, but retains TSO.
–R.I.P. as of May 2010
Azul: Vega 2/3, “Java Compute Appliance (JCA)”
–Released 2007/8. RISC, in-order, CMP (48/54 cores per die)
–JVM oriented, >100k threads
–Simple HTM, no register rollback (relies on SW), no STM fallback
AMD: Advanced Synchronization Facility (ASF)
–Spec released; the ISA includes Speculate/Commit and locked-mov
–Very resource constrained (4 atomic lines), flat nesting; also allows a mix of Tx and non-Tx code inside Tx boundaries, but may break x86 memory consistency
Intel:
–TM compiler with HW support (HASTM based on RSM)
–TSX on Haswell! Oops, sorry: only as of HSW-EX, due to errata
Sun: Azul: AMD: ali.cs.umass.edu/~moss/transact-2010/public-papers/08.pdf http://llvm.org/pubs/ EUROSYS- DresdenTM.pdf

RTTM (Schoeberl ’10) – premise
“RTTM brings the benefits of transactional memories into the real-time systems world”.
Paper contributions:
–Design of a time-predictable hardware transactional memory
–Analysis of the worst-case number of retries in a periodic thread model
–Suggestions for analysis to reduce the number of possible conflicting transactions
–First evaluation of RTTM on a simulation within a Java-based CMP
Optimized for WCET, not average performance
Implemented on the Java Optimized Processor (JOP)

Java Optimized Processor (Schoeberl ’07)
Unlike the JVM, JOP is “a RISC stack architecture”

WCET-friendly CPU
Time-predictable computer architecture (Schoeberl ’08)
–A collection of simplifications for CPU design to reduce the bounds on WCET, at a small penalty to ACET/BCET
–Provides some reasoning (but no concrete proof)

WCET-friendly CPU – 2
Time Division Multiple Access (TDMA) memory access scheduling (Pitter and Schoeberl, ’09; Rosen, ’07)
Memory access allows a slot per core
–Transactions may only start during the access window
–Gap allows completion (depends on memory access time)
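Why a TDMA arbiter makes memory latency analyzable can be seen with a small brute-force calculation. The model below is a simplification chosen for illustration and is not the actual JOP arbiter: every core owns one slot of `slot_cycles` per round, a request may only start at the beginning of its core's own slot, and one access takes `access_cycles` (at most one slot). All parameter names are assumptions.

```python
def worst_case_latency(n_cores, slot_cycles, access_cycles):
    """Worst-case cycles from issuing a request to its completion for core 0,
    found by trying every arrival offset within one TDMA round."""
    if access_cycles > slot_cycles:
        raise ValueError("an access must fit into a single slot in this model")
    round_cycles = n_cores * slot_cycles
    worst = 0
    for arrival in range(round_cycles):
        # Core 0's slot starts at cycle 0 of every round; wait for the next start.
        wait = (-arrival) % round_cycles
        worst = max(worst, wait + access_cycles)
    return worst

def closed_form(n_cores, slot_cycles, access_cycles):
    # A request that just misses its slot start waits a full round minus one cycle.
    return n_cores * slot_cycles - 1 + access_cycles
```

Under these assumptions the bound depends only on the number of cores and the slot length, never on what the other cores are doing, which is the property a WCET analysis needs. The price is average-case latency: a core waits for its slot even when the memory is idle.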

Memory access WCET

OS scheduling
“Real Time Specification for Java”
–RT threads are assigned a deadline
–Scheduler is preemptive, based on priority
  Same priority behaves like FIFO
–Scheduler guarantees all threads hit their deadline
  Estimation on blocking boundaries

RTTM – proposal
Transaction buffering: fully associative
Read-set caching (tags only)
Word granularity (no false conflicts)
Commit in bursts
–All other cores listen (conflict checks)
–Protected by a global lock (“commit token”) (what is the overhead for short transactions?)
No aborts on overflow!
–Grab the commit token on the fly
On a true abort: mark as a zombie transaction
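The commit protocol listed above can be mimicked with a sequential toy model: each core buffers its writes and records its read set, a committing core holds the commit token while it broadcasts its writes, and any other core that has read one of the written words is marked as a zombie and fails at its own commit. This is a reading of the slide rather than of the RTTM hardware; the class names are hypothetical, and the early token grab on buffer overflow is left out.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        self.read_set = set()   # word addresses read in the current transaction
        self.write_buffer = {}  # word address -> buffered value
        self.zombie = False     # set when another core's commit conflicts with us

class Rttm:
    def __init__(self, memory, n_cores):
        self.memory = dict(memory)
        self.cores = [Core(i) for i in range(n_cores)]
        self.token_owner = None  # the global "commit token"

    def read(self, core, addr):
        if addr in core.write_buffer:
            return core.write_buffer[addr]
        core.read_set.add(addr)
        return self.memory[addr]

    def write(self, core, addr, value):
        core.write_buffer[addr] = value

    def commit(self, core):
        """Returns True on success, False if the transaction must be retried."""
        if core.zombie:
            core.reset()
            return False
        assert self.token_owner is None, "only one committer at a time"
        self.token_owner = core
        for addr, value in core.write_buffer.items():
            self.memory[addr] = value
            for other in self.cores:  # all other cores listen to the burst
                if other is not core and addr in other.read_set:
                    other.zombie = True
        self.token_owner = None
        core.reset()
        return True
```

In this model two cores that both increment the same word conflict exactly once: the first commit succeeds, the second core finds itself a zombie, retries, and then succeeds.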

RTTM Analysis

RTTM Analysis (2)

Preliminary analysis
Possible directions
–Context-sensitive points-to analysis
–Static detection of race conditions
–Simulation-based analysis of buffer overflows
RTTM’s analysis was based on the WALA analyzer (open source from IBM, ’06)

Experiment methodology
Implemented over JOP, simulated on a JVM
3 tasks
–Producer enqueues into a buffer
–Consumer removes elements from its buffer
–Mover atomically moves elements between buffers
Buffer types
–Standard Java vector
–Bounded queue

Results

STM example (Fahmy, ’09)
EDF scheduling
Response time analysis
–Predicted vs. simulated with random alignments (> 1)
–Utilization: task time vs. period (< 1)
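The utilization figure on this slide is the ratio of task time to period, summed over tasks. As a reminder of how it is used, the sketch below applies the classic uniprocessor EDF test for independent periodic tasks whose deadlines equal their periods (schedulable if and only if total utilization is at most 1). It says nothing about how Fahmy accounts for transaction retries; any retry cost would have to be folded into the execution times passed in. The task sets are invented for the example.

```python
from fractions import Fraction

def utilization(tasks):
    """tasks: iterable of (execution_time, period) pairs with integer entries."""
    return sum(Fraction(c, t) for c, t in tasks)

def edf_schedulable(tasks):
    # Uniprocessor EDF, independent periodic tasks, deadline == period:
    # the task set is schedulable iff total utilization <= 1.
    return utilization(tasks) <= 1

light = [(1, 4), (2, 5), (1, 10)]  # 1/4 + 2/5 + 1/10 = 3/4
heavy = [(2, 4), (3, 5)]           # 1/2 + 3/5 = 11/10
```

Exact fractions are used so that a task set sitting exactly at utilization 1 is not misclassified by floating-point rounding.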

Bibliography
Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. ISCA ’93.
J. H. Anderson, S. Ramamurthy, K. Jeffay. Real-time computing with lock-free shared objects. ACM ToCS, May ’97.
M. Hohmuth, H. Härtig. Pragmatic nonblocking synchronization for real-time systems. USENIX ’01.
M. Schoeberl, F. Brandner, J. Vitek. RTTM: Real-Time Transactional Memory. SAC ’10.
M. Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, volume 54, Jan 2008.
M. Schoeberl. Time-predictable computer architecture. EURASIP J. Embedded Syst. 2009, Article 2 (January 2009).
C. Pitter and M. Schoeberl. A real-time Java chip-multiprocessor. Trans. on Embedded Computing Sys., accepted for publication.
Manson (’05) – Preemptible atomic regions (uni-processor)

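Memory ordering rules
Type                                  | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA-64 | zSeries
Loads reordered after loads           | Y     | Y     | Y       | Y     | Y         |           |           |     | Y           |       | Y     |
Loads reordered after stores          | Y     | Y     | Y       | Y     | Y         |           |           |     | Y           |       | Y     |
Stores reordered after stores         | Y     | Y     | Y       | Y     | Y         | Y         |           |     | Y           |       | Y     |
Stores reordered after loads          | Y     | Y     | Y       | Y     | Y         | Y         | Y         | Y   | Y           | Y     | Y     | Y
Atomic reordered with loads           | Y     | Y     |         | Y     | Y         |           |           |     |             |       | Y     |
Atomic reordered with stores          | Y     | Y     |         | Y     | Y         | Y         |           |     |             |       | Y     |
Dependent loads reordered             | Y     |       |         |       |           |           |           |     |             |       |       |
Incoherent instruction cache pipeline | Y     | Y     |         | Y     | Y         | Y         | Y         | Y   | Y           |       | Y     | Y
Source: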