Design and Implementation Issues for Atomicity Dan Grossman University of Washington Workshop on Declarative Programming Languages for Multicore Architectures 15 January 2006

2 Atomicity Overview
Atomicity: what, why, and why relevant
Implementation approaches (hw & sw, me & others)
3 semi-controversial language-design claims
3 semi-controversial language-implementation claims
Summary and discussion (experts are lurking)

3 Atomic
An easier-to-use and harder-to-implement primitive:

Lock acquire/release:
withLock : lock -> (unit -> α) -> α
let dep acct amt =
  withLock acct.lk (fun () ->
    let tmp = acct.bal in
    acct.bal <- tmp + amt)

(Behave as if) no interleaved execution; no deadlock or unfair scheduling (e.g., disabling interrupts):
atomic : (unit -> α) -> α
let dep acct amt =
  atomic (fun () ->
    let tmp = acct.bal in
    acct.bal <- tmp + amt)
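For concreteness, here is a hedged OCaml sketch of the lock-based half of this slide: withLock is written against the standard Mutex module (threads library), the account record and field names (lk, bal) follow the slide's example, and the atomic version is assumed to be provided by the language (as in AtomCaml).

(* Hedged sketch: the withLock helper from this slide, built on OCaml's
   standard Mutex module. The account type follows the slide's example. *)
type account = { lk : Mutex.t; mutable bal : int }

let withLock (m : Mutex.t) (f : unit -> 'a) : 'a =
  Mutex.lock m;
  match f () with
  | v -> Mutex.unlock m; v                   (* release on normal return *)
  | exception e -> Mutex.unlock m; raise e   (* release if f raises *)

let dep acct amt =
  withLock acct.lk (fun () ->
    let tmp = acct.bal in
    acct.bal <- tmp + amt)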

4 Why better
1. No whole-program locking protocols
–As code evolves, use atomic with any data
–Instead of deciding what locks to acquire (races) and in what order (deadlock)
2. Bad code doesn't break good atomic blocks
–With atomic, the protocol is now the runtime's problem (cf. garbage collection for memory management)
let bad1 () = acct.bal <- 123
let bad2 () = atomic (fun () -> «diverge»)
let good () = atomic (fun () ->
  let tmp = acct.bal in
  acct.bal <- tmp + amt)

5 Declarative control
For programmers who will see threads, shared memory, and parallelism:
atomic directly declares what schedules are allowed (without sacrificing pre-emption and fairness)
Moreover, implementations perform better with immutable data, encouraging a functional style

6 Implementing atomic
Two basic approaches:
1. Compute using shadow memory, then commit
–Fancy optimistic-concurrency protocols for parallel commits with progress (STMs) [Harris et al. OOPSLA03, PPoPP05, ...]
2. Lock data before access, log changes, roll back and back off on contention
–My research focus
Key performance issues: locking granularity, avoiding unneeded locking
Non-issue: any granularity is correct
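As a concrete (and heavily simplified) illustration of the second approach, here is a hedged OCaml sketch of the undo-log bookkeeping; it is not the AtomCaml runtime, and logged_set, rollback, and commit are hypothetical names introduced only for this example.

(* Hedged sketch of approach 2's bookkeeping, not the real runtime:
   each write inside atomic first pushes an undo closure onto a log;
   an abort (e.g., on contention) replays the log newest-first, while
   a commit simply discards it. *)
let undo_log : (unit -> unit) list ref = ref []

(* Write barrier: record how to restore the old value, then write. *)
let logged_set (r : 'a ref) (v : 'a) : unit =
  let old = !r in
  undo_log := (fun () -> r := old) :: !undo_log;
  r := v

(* Abort: undo every logged write, newest first, then clear the log. *)
let rollback () : unit =
  List.iter (fun undo -> undo ()) !undo_log;
  undo_log := []

(* Commit: the writes are already in place, so just forget the log. *)
let commit () : unit =
  undo_log := []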

7 An extreme case
One extreme: one lock for all data
Acquire the lock on context-switch-in
Release the lock only on context-switch-out
–(after rollback if necessary)
Per data-access overhead:
         Not in atomic    In atomic
Read     none             none
Write    none             logging
Ideal on uniprocessors [ICFP05, Manson et al. RTSS05]

8 In general
Naively, the locking approach looks bad with parallelism (but note: no communication if the lock is already held)
         Not in atomic    In atomic
Read     lock             lock, maybe rollback
Write    lock             lock, maybe rollback, logging
Active research:
1. Hardware: lock = cache-line ownership [Kozyrakis, Rajwar, Leiserson, …]
2. Software (my work-in-progress for Java):
–Static analysis to avoid locking
–Dynamic lock coarsening/splitting

9 Atomicity Overview
Atomicity: what, why, and why relevant
Implementation approaches (hw & sw, me & others)
3 semi-controversial language-design claims
3 semi-controversial language-implementation claims
Summary and discussion

10 Claim #1
Strong atomicity is worth the cost
Weak atomicity says only that atomic blocks are not interleaved with each other
–It says nothing about interleaving with non-atomic code
So:
         Not in atomic    In atomic
Read     none             lock, maybe rollback
Write    none             lock, maybe rollback, logging
But that is back to bad synchronization breaking good code!
Caveat: weak = strong if all thread-shared data is accessed within atomic (there are other ways to enforce this)
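A tiny, hypothetical OCaml illustration of the weakness (it assumes the atomic primitive from earlier slides): under weak atomicity the plain write in bad may interleave with the atomic block in good, so good can lose its update or observe an intermediate state; under strong atomicity it cannot.

(* Hypothetical illustration, assuming atomic from earlier slides:
   weak atomicity lets bad's unsynchronized write interleave with
   good's atomic block; strong atomicity forbids it. *)
let bal = ref 0
let good () = atomic (fun () -> let tmp = !bal in bal := tmp + 10)
let bad () = bal := 0   (* write outside any atomic block *)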

11 Claim #2
Adding atomic shouldn't change sequential meaning
That is, e and atomic (fun () -> e) should be equivalent in a single-threaded program
But that means exceptions must commit, not roll back!
–Can have two kinds of exceptions
Caveats:
–The tough case is input after output
–Not a goal in Haskell (which already has a separate monad for transaction variables)
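A short sketch of the equivalence claim, again assuming the atomic primitive and the account type from earlier slides: in a single-threaded program the two functions below must behave the same even though Exit escapes, so the exception has to commit the write to acct.bal rather than roll it back.

(* Sketch of the claim; dep_then_fail and dep_then_fail_atomic are
   hypothetical names. Both should leave acct.bal incremented by amt,
   so the exception commits the atomic block's effects. *)
let dep_then_fail acct amt =
  acct.bal <- acct.bal + amt;
  raise Exit

let dep_then_fail_atomic acct amt =
  atomic (fun () ->
    acct.bal <- acct.bal + amt;
    raise Exit)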

12 Claim #3
Nested transactions are worth the cost
Allows parallelism within atomic
–Participating threads see uncommitted effects
Currently most prototypes (mine included) punt here, but I think many-many-core will drive its need
Else programmers will hack up buggy workarounds

13 Claim #4
Hardware implementations are too low-level and opaque
Extreme case: an ISA with start_atomic and end_atomic
Rollback does not require RAM-level rollback!
–Example: logging a garbage collection
–Example: rolling back thunk evaluation
All I want from hardware: fast conflict detection
Caveats:
–Situation improving fast (we're talking!)
–Focus has been on chip design (orthogonal?)

14 Claim #5
Simple whole-program optimizations can give strong atomicity for close to the price of weak
Lots of data doesn't need locking (2/3 of the diagram is well-known):
–Thread-local data
–Immutable data
–Data not used in atomic
Caveat: unproven; hopefully numbers in a few weeks

15 Claim #6
Serialization and locking are key tools for implementing atomicity
Particularly in low-contention situations
STMs are great too
–I predict best systems will be hybrids
–Just as great garbage collectors do some copying, some mark-sweep, and some reference-counting

16 Summary
1. Strong atomicity is worth the cost
2. Atomic shouldn't change sequential meaning
3. Nested transactions are worth the cost
4. Hardware is too low-level and opaque
5. Program analysis can give strong for the price of weak
6. Serialization and locks are key implementation tools
Lots omitted: alternative composition, wait/notify idioms, logging techniques, …

17 Plug
Relevant workshop before PLDI 2006:
TRANSACT: First ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing