1 Architectural Features of Transactional Memory Designs for an Operating System
Chris Rossbach, Hany Ramadan, Don Porter
Advanced Computer Architecture, Fall 2006, Prof. Burger

2 Motivation
What would a realistic HTM system actually support? (primitives/design choices)
Current Transactional Memory proposals make architectural design choices with inadequate information:
  – shared counter, linked list benchmarks
  – focus on user mode: avoids OS issues

3 HTM + OS: are you nuts?
Large concurrent program with complex data access patterns
Complex code: simplify programming model
Many apps spend a lot of time in kernel
Diverse synchronization primitives
  – spinlocks, semaphores, per-CPU variables, RCU, seqlocks, completions, mutexes

4 Our HTM System
Basic primitives:
  – xbegin, xend
OS-specific primitives:
  – xpush, xpop
  – stack management: interrupts on x86 re-use stack
Configurable Hardware Parameters
  – Conflict detection granularity
  – Commit & abort penalties
  – Overflow costs
Configurable contention management
  – Conflict resolution policies: which tx restarts?
  – Backoff policies: how long to wait before restart
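The backoff question ("how long to wait before restart") can be sketched in Python. The deck names linear and exponential backoff on later slides but gives no formulas, so the function names, the base of 100 cycles, and the cap below are all assumptions made for illustration.

```python
def linear_backoff(restarts, base_cycles=100):
    """Wait grows in proportion to the number of restarts so far.

    base_cycles is an assumed value, not a figure from the slides.
    """
    if restarts <= 0:
        return 0
    return base_cycles * restarts


def exponential_backoff(restarts, base_cycles=100, cap_cycles=100_000):
    """Wait doubles with every restart, up to an assumed cap."""
    if restarts <= 0:
        return 0
    return min(base_cycles * 2 ** (restarts - 1), cap_cycles)
```

With these assumed parameters, a transaction on its fifth restart waits 500 cycles under the linear policy and 1,600 under the exponential one, which illustrates why exponential backoff can be the more conservative choice.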

5 An Issue Unique to an OS: Using transactions in interrupt handlers
[Figure: memory locations 0x10, 0x20, 0x30, 0x40; TX #1 { 0x10 } is active when an interrupt arrives]
system_call() { XBEGIN modify 0x10 XEND }
intr_handler() { XPUSH XBEGIN modify 0x30 XEND XPOP }
Design options shown:
  – No tx in interrupts
  – Interrupts abort active tx
  – Nest the transactions: TX #1 { 0x10, 0x30 }
  – Multiple active transactions: TX #1 { 0x10 }, TX #2 { 0x30 }
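The "multiple active transactions" option can be modelled with a toy Python class. The slide does not spell out the semantics of xpush and xpop; the sketch assumes xpush suspends the running transaction and xpop resumes it. The class name `Cpu` and its fields are hypothetical.

```python
class Cpu:
    """Toy model of one processor's transactional state (hypothetical)."""

    def __init__(self):
        self.current = None    # write set of the running tx, or None
        self.suspended = []    # transactions saved by xpush
        self.committed = []    # write sets in commit order

    def xbegin(self):
        assert self.current is None, "nesting is not modelled here"
        self.current = set()

    def modify(self, addr):
        # Writes outside a transaction are ignored by this model.
        if self.current is not None:
            self.current.add(addr)

    def xend(self):
        self.committed.append(frozenset(self.current))
        self.current = None

    def xpush(self):
        # Assumed semantics: suspend the running tx (if any).
        self.suspended.append(self.current)
        self.current = None

    def xpop(self):
        # Assumed semantics: resume the most recently suspended tx.
        self.current = self.suspended.pop()


def run_interrupted_system_call():
    cpu = Cpu()
    cpu.xbegin()           # system_call(): TX #1 starts
    cpu.modify(0x10)
    cpu.xpush()            # interrupt arrives: intr_handler() begins
    cpu.xbegin()           # TX #2 starts while TX #1 is suspended
    cpu.modify(0x30)
    cpu.xend()             # TX #2 commits
    cpu.xpop()             # back in system_call(): TX #1 resumes
    cpu.xend()             # TX #1 commits
    return cpu.committed
```

In this model the two transactions keep separate write sets, { 0x30 } and { 0x10 }, so the interrupt handler neither aborts the system call's transaction nor merges into it.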

6 Converting Linux to TxLinux
TxLinux based on kernel 2.6.16.1
Converted “core” primitives to use transactions
  – spin-locks, RCU primitives, r/w locks
  – critical sections become transactions
Converted high traffic subsystems
  – memory allocators, FS directory cache, address-to-page mapping data structures, memory mapping of files into address spaces, IP routing, and socket locking
Modified interrupt-handling code to use primitives in our HTM model (xpush, xpop)
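As a rough illustration of "critical sections become transactions", the Python sketch below replaces a lock-protected update with a buffered, all-or-nothing one. It is a software toy, not the TxLinux mechanism: `Transaction`, `run_critical_section`, and the exception-as-abort convention are all hypothetical.

```python
class Transaction:
    """Toy stand-in for a hardware transaction: writes are buffered
    and only become visible when the transaction commits."""

    def __init__(self, memory):
        self.memory = memory
        self.writes = {}

    def read(self, addr):
        # See our own buffered writes first, then committed memory.
        return self.writes.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.writes[addr] = value


def run_critical_section(memory, body):
    """Run body as one transaction (hypothetical helper).

    Returns True on commit, False if body raised and the tx aborted.
    """
    tx = Transaction(memory)        # plays the role of xbegin
    try:
        body(tx)
    except Exception:
        return False                # abort: buffered writes are dropped
    memory.update(tx.writes)        # plays the role of xend (commit)
    return True


def increment_counter(tx):
    tx.write(0x10, tx.read(0x10) + 1)


def failing_update(tx):
    tx.write(0x10, 999)
    raise RuntimeError("simulated conflict")
```

The point of the sketch is the atomicity property: an aborted section leaves memory exactly as it was, which is what lets a restart policy simply re-run the section.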

7 HTM Implementation
Implemented HTM model as x86 extensions
Simulation environment
  – Simics 3.0.17 machine simulator
  – transactional L1 cache (variable: 4KB-32KB)
  – 4MB L2; 1GB RAM
  – 1 cycle/instruction, 16 cycles/L1 miss, 200 cycles/L2 miss
  – 4 & 8 processors
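The timing parameters above amount to a simple cost model, sketched here in Python. The slide does not say whether an access that misses in both caches is charged both penalties; the sketch assumes the penalties are additive and that the caller counts L1 and L2 misses separately. The function name is hypothetical.

```python
CYCLES_PER_INSTRUCTION = 1
CYCLES_PER_L1_MISS = 16
CYCLES_PER_L2_MISS = 200


def estimated_cycles(instructions, l1_misses, l2_misses):
    """Estimate execution cycles under the slide's timing parameters.

    Assumption: miss penalties simply add to the base instruction cost.
    """
    return (instructions * CYCLES_PER_INSTRUCTION
            + l1_misses * CYCLES_PER_L1_MISS
            + l2_misses * CYCLES_PER_L2_MISS)
```

For example, 1,000 instructions with 10 L1 misses and 2 L2 misses come to 1,000 + 160 + 400 = 1,560 cycles under this assumption.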

8 Experimental Setup
Benchmarks
  – micro: kernalloc, Counter, directory cache “punisher”
  – macro: pmake, netcat, MAB, configure, find
Measurements
  – Execution time
  – Transaction statistics: created/restarted/overflowed, working sets, footprint
  – Cache statistics (e.g. miss rate)
Variables
  – Contention management (conflict/backoff policies)
  – Transactional cache size
  – Commit, abort, overflow penalties
  – Conflict granularity (byte vs. word vs. cache line)
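The conflict granularity variable decides when two accesses count as touching "the same" location. A minimal Python sketch follows; the 4-byte word and 64-byte cache line sizes are assumptions, since the slides do not state them, and the function name is hypothetical.

```python
# Unit sizes in bytes. Word and cache-line sizes are assumed values.
GRANULARITY_BYTES = {"byte": 1, "word": 4, "cache_line": 64}


def conflicts(addr_a, addr_b, granularity):
    """Return True if two addresses fall in the same detection unit."""
    size = GRANULARITY_BYTES[granularity]
    return addr_a // size == addr_b // size
```

Under these assumed sizes, addresses 0x10 and 0x13 conflict at word granularity but not at byte granularity, and 0x10 and 0x30 conflict only at cache-line granularity. Coarser units are cheaper to track but report more false conflicts of this kind.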

9 TxLinux Results (4 processors)
Performance change minimal, lots of transactions
Unique transaction restarts were < 0.07%
Data cache miss rates do not change appreciably
Transactions created (one value per benchmark; column labels not preserved): 105,972; 425,888; 475,860; 1,810,602; 1,408,610; 243,934

10 Contention Management Matters!
[Chart: linear backoff policy, 4 processors]

11 Conclusions
TxLinux is cooler than, and has comparable performance to, Linux
Cache line granularity is good enough
16KB transactional cache covers the vast majority of transactions
Best contention management policy is workload dependent
Exponential backoff is too conservative

12 Backup Slides

13 Contention Management Restart Rates

14 Conflict Granularity & Backoff Policy

15 Stack Management Issue
Treating the stack as a shared resource
  – Checkpoint
  – Partition

16 Tx’l Memory Allocator Investigation
Examine Tx complexity/performance trade-off
The “slab” is the default kernel memory allocator
  – Highly tuned for performance
  – Avoids contention/locks, uses per-CPU structures
  – About 3,880 lines of code
The “slob” is a drop-in replacement
  – Designed for minimal bookkeeping memory overhead
  – Uses two coarse-grained locks (386 lines)
The “slob-opt” is “slob” with modifications
  – Removed “obvious” transaction bottlenecks
  – Only a couple of dozen lines of code changed

17 Tx’l Memory Allocator Results (4 proc)
Execution time (in seconds), with unique restarts in parentheses:

Allocator        Kernalloc       Pmake          MAB             configure      Find
slab             1.4 (0%)        13.9 (0.04%)   8.0 (0.07%)     14.1 (0.04%)   1.8 (0%)
slob             - (-)           14.3 (1.78%)   21.3 (19.72%)   16.3 (5.93%)   1.8 (0.71%)
slob-optimized   16.7 (18.17%)   14.1 (0.45%)   12.7 (8.48%)    14.9 (1.42%)   1.8 (0.12%)
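One way to read these results is as slowdown relative to slab. The Python sketch below does that arithmetic using the execution times from this slide; the dictionary layout and function name are hypothetical, and slob has no Kernalloc entry because the slide reports none.

```python
# Execution times in seconds, copied from the slide.
EXEC_TIME = {
    "slab": {"Kernalloc": 1.4, "Pmake": 13.9, "MAB": 8.0,
             "configure": 14.1, "Find": 1.8},
    "slob": {"Pmake": 14.3, "MAB": 21.3, "configure": 16.3, "Find": 1.8},
    "slob-optimized": {"Kernalloc": 16.7, "Pmake": 14.1, "MAB": 12.7,
                       "configure": 14.9, "Find": 1.8},
}


def slowdown_vs_slab(allocator, benchmark):
    """Ratio of an allocator's execution time to slab's (1.0 = equal)."""
    return EXEC_TIME[allocator][benchmark] / EXEC_TIME["slab"][benchmark]
```

On MAB, for instance, slob takes about 2.66 times as long as slab, and the slob-optimized changes bring that down to about 1.59 times.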

18 Transactional Memory Issues
Hardware vs. Software
  – Different interfaces
  – strong (HW) vs. weak (SW) atomicity
Will transactions make programming easier?
Transactions for blocking primitives?
Using transactions for security?

