Hardware-based Transactional Memory Supporting Large Transactions
Anvesh Komuravelli, Abe Othman, Kanat Tangwongsan
Concurrent Programs: handle with care
Thread 1:
    obj.x = 7;
    find_primes();
    // intrusion test
    if (obj.x != 7) fireMissiles()
Thread 2:
    do_stuff();
    obj.x = 42;
Lock-based Approaches: lock_acquire(critical_zone); lock_release(critical_zone);
– Deadlock
– Starvation
– Complex programs
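A minimal sketch of the lock-based approach, assuming Python's threading.Lock stands in for lock_acquire/lock_release and a dummy loop stands in for find_primes(). The names Obj and run_with_lock are hypothetical. Thread 2 deliberately tries to write while thread 1 is inside the critical zone; the lock makes it wait, so the intrusion test never fires.

```python
import threading

class Obj:
    def __init__(self):
        self.x = 0

def run_with_lock():
    obj = Obj()
    critical_zone = threading.Lock()
    missiles_fired = []
    t1_in_zone = threading.Event()

    def thread1():
        with critical_zone:              # lock_acquire(critical_zone)
            obj.x = 7
            t1_in_zone.set()             # signal thread 2 to start competing
            sum(range(10_000))           # stand-in for find_primes()
            if obj.x != 7:               # intrusion test
                missiles_fired.append(True)
        # leaving the with-block is lock_release(critical_zone)

    def thread2():
        t1_in_zone.wait()                # arrive while thread 1 holds the lock
        with critical_zone:              # blocks until thread 1 releases
            obj.x = 42

    threads = [threading.Thread(target=thread1),
               threading.Thread(target=thread2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.x, len(missiles_fired)
```

The price is the one the slide lists: every thread touching obj.x must remember to take the same lock, and taking several locks in different orders is how deadlocks arise.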
Transactional Memory
Atomicity in the face of concurrency.
Isolation from other transactions.
Consistency across the whole system.
Programmer: enclose instructions in a transaction.
System: execute transactions concurrently and, on a conflict, do something intelligent (e.g., abort, restart).
Thread 1:
    obj.x = 7;
    find_primes();
    // intrusion test
    if (obj.x != 7) fireMissiles()
Thread 2:
    do_stuff();
    obj.x = 42;
Transaction markers: x_begin(); x_finish();
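To make "abort on conflict" concrete, here is a toy software model, not any of the hardware schemes discussed in this talk. It assumes buffered writes and version-checked reads; Memory, Transaction, Conflict and both demo functions are hypothetical names. The interleaving is simulated in a single thread, so the class is not itself thread-safe.

```python
class Conflict(Exception):
    """Raised when a transaction must abort."""

class Memory:
    def __init__(self):
        self.values = {}
        self.versions = {}

class Transaction:
    def __init__(self, mem):             # plays the role of x_begin()
        self.mem = mem
        self.reads = {}                  # location -> version observed
        self.writes = {}                 # location -> buffered value

    def read(self, loc):
        if loc in self.writes:           # see our own uncommitted write
            return self.writes[loc]
        self.reads.setdefault(loc, self.mem.versions.get(loc, 0))
        return self.mem.values.get(loc)

    def write(self, loc, value):
        self.writes[loc] = value         # invisible to others until commit

    def finish(self):                    # plays the role of x_finish()
        for loc, seen in self.reads.items():
            if self.mem.versions.get(loc, 0) != seen:
                raise Conflict(loc)      # someone committed under us
        for loc, value in self.writes.items():
            self.mem.values[loc] = value
            self.mem.versions[loc] = self.mem.versions.get(loc, 0) + 1

def isolation_demo():
    mem = Memory()
    t1 = Transaction(mem)
    t1.write("obj.x", 7)
    t2 = Transaction(mem)                # thread 2 commits mid-way through t1
    t2.write("obj.x", 42)
    t2.finish()
    fired = t1.read("obj.x") != 7        # intrusion test
    t1.finish()
    return fired, mem.values["obj.x"]

def conflict_demo():
    mem = Memory()
    t1 = Transaction(mem)
    t1.read("obj.x")                     # t1 now depends on obj.x
    t2 = Transaction(mem)
    t2.write("obj.x", 42)
    t2.finish()
    try:
        t1.finish()
    except Conflict:
        return "aborted"
    return "committed"
```

In this model a caller would restart an aborted transaction from x_begin(); real hardware TM detects the same conflicts through the cache coherence protocol rather than version numbers.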
Different strokes for different folks
Common case: 98% of transactions fit in L1 => hardware
What to do with the remaining 2%?
Fast…
Easy conflict detection…
Easy commit and abort
Goal: hide platform/resource limitations from programmers
Challenges & Opportunities
VTM – Virtual Transactional Memory
On overflow, use the process’s virtual memory
Tracking at cache-line granularity
Per-process state (tag and store virtual addresses)
Flatten nested transactions
Implemented in specialized hardware (dedicated cache, search logic, …)
Drawbacks?
– Modifications to hardware. Costly?
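The overflow idea can be pictured with a small model. This is a sketch under assumptions, not VTM's actual data structures: a 64-byte line size, a plain dict standing in for the per-process overflow state, and the hypothetical names SpillingTransaction and demo_overflow. Conflict detection between transactions is left out.

```python
LINE_SIZE = 64  # bytes per cache line (assumed value)

class SpillingTransaction:
    """Tracks touched cache lines; spills to a per-process table on overflow."""

    def __init__(self, hw_capacity_lines, overflow_table):
        self.hw_capacity = hw_capacity_lines
        self.hw_lines = set()                 # lines still tracked in the cache
        self.overflow_table = overflow_table  # line number -> owning transaction

    def touch(self, vaddr):
        line = vaddr // LINE_SIZE             # cache-line granularity
        if line in self.hw_lines or self.overflow_table.get(line) is self:
            return                            # already tracked
        if len(self.hw_lines) < self.hw_capacity:
            self.hw_lines.add(line)           # fits in hardware
        else:
            self.overflow_table[line] = self  # spill to virtual memory

    def is_virtualized(self):
        return any(owner is self for owner in self.overflow_table.values())

def demo_overflow():
    table = {}
    txn = SpillingTransaction(hw_capacity_lines=2, overflow_table=table)
    for vaddr in (0, 8, 64, 128):             # 0 and 8 share one line
        txn.touch(vaddr)
    return sorted(txn.hw_lines), sorted(table), txn.is_virtualized()
```

With a capacity of two lines, the third distinct line is the one that lands in the overflow table, and from that point the transaction is no longer purely a hardware transaction.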
XTM – eXtended Transactional Memory
“Complete TM Virtualization without complex hardware”
Page table per transaction
Allows arbitrary nesting – no flattening
The only hardware support – raise an exception on overflow
Drawbacks?
– Page granularity on overflows
– Potentially higher memory usage than VTM
– Software commit is costlier than VTM’s hardware commit – can stall other transactions of the process
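Why page granularity is listed as a drawback: two accesses that are far apart in cache-line terms can still land on the same page and be reported as a conflict. A small sketch, assuming 64-byte lines and 4096-byte pages (both sizes are assumptions, and same_unit is a hypothetical helper):

```python
LINE_SIZE = 64     # bytes, assumed
PAGE_SIZE = 4096   # bytes, assumed

def same_unit(addr_a, addr_b, unit_size):
    """True if both addresses fall in the same tracking unit."""
    return addr_a // unit_size == addr_b // unit_size

# Two transactions touch unrelated data 1 KiB apart.
write_addr, read_addr = 0, 1024
line_conflict = same_unit(write_addr, read_addr, LINE_SIZE)   # VTM-style tracking
page_conflict = same_unit(write_addr, read_addr, PAGE_SIZE)   # XTM-style tracking
```

Here line_conflict is False and page_conflict is True, so the page-based scheme would abort or stall a transaction that the line-based scheme lets through.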
An observation
Small transactions get things done in hardware
Large transactions spill the buffers, and the TM switches to virtual mode
What about transactions whose size varies?
– What if everything fits in the buffers again?
– Can we switch back to hardware mode?
Towards improving virtualization (OneTM)
Permissions-only cache
– Significantly reduces the chance of overflowing the buffers
– At the cost of a little extra hardware
Large transactions, already assumed to be infrequent, become even rarer
Large transactions are serialized and handled one at a time
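The one-at-a-time rule for large transactions can be pictured as a single permit that an overflowing transaction must hold. This is only an illustration of the serialization policy, with the hypothetical name OverflowGate; it says nothing about how the hardware actually implements it.

```python
class OverflowGate:
    """At most one overflowed (large) transaction may be active at a time."""

    def __init__(self):
        self.holder = None

    def try_enter(self, txn_id):
        # Succeeds only if no other transaction is currently overflowed.
        if self.holder is None or self.holder == txn_id:
            self.holder = txn_id
            return True
        return False                  # caller must stall and retry later

    def leave(self, txn_id):
        # Called when the overflowed transaction commits or aborts.
        if self.holder == txn_id:
            self.holder = None

gate = OverflowGate()
first = gate.try_enter("A")           # A overflows and takes the permit
second = gate.try_enter("B")          # B overflows too, but must wait
gate.leave("A")
third = gate.try_enter("B")           # B can proceed once A is done
```

The design is simple, but if large transactions keep arriving, a waiting transaction may be stalled for a long time.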
Do we always have only a few large transactions?
For now: yes
In the future: maybe not
I/O and blocking system calls might need to be atomic
How do the approaches discussed earlier fare?
– VTM – complex hardware
– XTM – complications with OS and page granularity
– OneTM – can lead to starvation!
TokenTM
Uses tokens to monitor memory blocks
– To read a block, you acquire one of its tokens
– To write a block, you must acquire every one of its tokens
Rigorous bookkeeping
– Blocks are tracked in caches, memory and disk
Handles large transactions gracefully
– Except for conflicts, transaction speed is unaffected by large transactions in other threads
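The token rule can be sketched in a few lines. This is a toy model under assumptions, not TokenTM's hardware representation: each block has a fixed token count, and TokenBlock and its methods are hypothetical names.

```python
class TokenBlock:
    """One memory block guarded by a fixed pool of tokens."""

    def __init__(self, total_tokens):
        self.total = total_tokens
        self.held = {}                        # txn_id -> tokens held

    def free(self):
        return self.total - sum(self.held.values())

    def acquire_read(self, txn):
        if self.held.get(txn, 0) > 0:
            return True                       # already a reader (or the writer)
        if self.free() >= 1:
            self.held[txn] = 1                # a reader takes one token
            return True
        return False                          # conflict: a writer holds them all

    def acquire_write(self, txn):
        mine = self.held.get(txn, 0)
        if mine + self.free() == self.total:  # nobody else holds a token
            self.held[txn] = self.total       # a writer takes every token
            return True
        return False                          # conflict: other readers exist

    def release(self, txn):
        self.held.pop(txn, None)              # on commit or abort

block = TokenBlock(total_tokens=4)
a_reads = block.acquire_read("A")
b_reads = block.acquire_read("B")             # readers share the block
a_writes_early = block.acquire_write("A")     # fails while B holds a token
block.release("B")
a_writes_late = block.acquire_write("A")      # succeeds once B is gone
c_reads = block.acquire_read("C")             # fails while A holds every token
```

Because conflicts are decided purely by counting tokens, the check costs the same whether the owning transaction touched ten blocks or ten million.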
TokenTM Downsides
Small transactions suffer(?)
– L1-cache-sized transactions can work at hardware speed… BUT:
– Need flash-clear and flash-OR circuits in the L1 cache
– Requires a very involved ad hoc representation
– …or taking a 3% overhead hit
Optimizes the rare large case to the detriment of the frequent small case?
Think! Think! Think!
Will we really have large transactions in the future?
Can virtualization techniques be used instead of TokenTM?
What are the tradeoffs?
In the end, hardware is faster (and cheaper) than software. But hardware is not flexible.
Who wins?
Conclusion
Sun Research’s Transactional Memory Spotlight: More recent proposals for “unbounded” HTM aim to overcome these disadvantages, but Sun Labs researchers came to the conclusion that the proposals were sufficiently complex and risky that they were unlikely to be adopted in mainstream commercial processor designs in the near future.