Presentation is loading. Please wait.

Presentation is loading. Please wait.

University of Texas at Austin

Similar presentations


Presentation on theme: "University of Texas at Austin"— Presentation transcript:

1 University of Texas at Austin
MetaTM & TxLinux Hany Ramadan, Christopher Rossbach, Donald Porter, Owen Hofmann, Aditya Bhandari, Emmett Witchel University of Texas at Austin Good afternoon. My name is Hany Ramadan, Today I’ll be presenting MetaTM & TxLinux, which are systems built at UT Austin by Dr. Witchel’s group. At a high level, this talk is about Transactional Memory & Operating Systems.

2 TM Background Transactional programming is an emerging alternative to locks Avoids problems such as deadlock Avoids performance-complexity tradeoffs HTM holds the promise of simpler programming and good performance I’ll begin with Transactional Memory. It is no secret that Tx Prog is a promising new technique for managing concurrency in parallel systems, with great potential compared to existing techniques such as coarse and fine-grained locking. It is very exciting to us that Programmers can be free’d from having to choose between bad performance, or programming complexity. Now I also mentioned Operating Systems, so you might wonder, what on earth could such dinosaurs have to do with TM?

3 TM: “What’s the OS got to do with it?”
Lack of realistic workloads (counter, splash-2) Will current results hold on real programs? Unclear design tradeoffs; Feature set unsettled OS is a real-life, parallel workload OS will benefit from transactions Reduces synchronization complexity System-call and interrupt control paths will benefit Architectural support is needed for OS Well, I’m glad you asked. We believe there is a mutually beneficial relationship bw TM & OS On one hand, the lack of realistic workloads has been a chronic weakness of TM research, with implications for the validity of results and the appropriateness of designs. We argue that the OS is a realistic workload. It’s a parallel program, which we all use. So it will benefit TM research to take a sustained look at the OS as a workload. In return, OS [complex sw] stand to benefit from what TM offers. To develop reltn. Additional Architectural Support is needed [rel simpl] It’s worth taking a look at this relationship in a sustained fashion. Spiderman: With great realistic workloads, comes great number of Tx.

4 Average Transaction Count
This is just to give you a rough idea of what we’re talking about It’s hard to find transaction counts, and especially rates in existing work, but this is from what we did find, because they only provide normalized results. Obviously it’s not about how long you run your experiments, but more seriously, 1) are there any functional issues or requirements that will not come out from smaller programs, and 2) Results and intuitions from small progs. hold in realistic programs?

5 Outline TxLinux MetaTM Issue: Stack memory Experimental results Goals
Features Interrupt handling Issue: Stack memory Experimental results Stack memory = To give you a flavor of the kinds of issues we had to deal with, that arise from the interaction of TxLinux & MetaTM.

6 TxLinux 2.6.16.1 Sequence locks RCU (read-copy- update) Spin-locks
Converted ~30% of dynamic synchronization to transactions Sequence locks RCU (read-copy- update) Spin-locks TxLinux is our transactional variant of the Linux kernel. It’s based on a recent 2.6 version of Linux. Kernel has ~ a dozen different sync. primitives used throughout code base. Our approach in developing TxLinux was to focus on some of the heavily used primitives, such as spinlocks, and introduce tx variants of them. We then modified components in several subsystems (such as FS, networking), to use the transactional APIs. Overall, we converted about 30% of dynamic synchronization to tx’s. We didn’t reach 100% for variety of reasons, for instance we didn’t allow transactions to do I/O. Stress that in TxLinux Tx co-exist with locks. Slab allocator Directory cache IP routing Pathname translation Socket locking Zone allocator Various MM structures File system Networking Memory management

7 MetaTM: Design goals HTM model co-designed with TxLinux
Extensions to x86 ISA Architectural support for OS Execution-driven simulation A platform for TM research Multiple HTM design points Eager & lazy version management Eager conflict detection Platform: Configurable policies, costs

8 MetaTM: Model features
xbegin xend Tx demarcation xpush xpop Multiple Tx polite karma eruption Contention management (eager) timestamp polka sizematters Here are some of the main features. Xpush & Xpop provide support for suspending transactions, I’ll give more details on them shortly. exponential linear random Backoff policy Version management commit cost (lazy) abort cost (eager)

9 TxLinux: Interrupt handling
Question: What happens to active tx on an interrupt? Interrupt handlers allowed to use transactions Factors weighing against abort Transaction length growing Interrupt frequency Answer: Active transactions are suspended on interrupt (~900 cycles) cycles (~25 kcycles) interrupts

10 MetaTM: Multiple Tx support
Multiple active transactions on a processor At most one running, all others are suspended Interface xpush suspends current transaction xpop resumes suspended transaction Suspended transactions maintained in LIFO order New execution context is unrelated to old one Same conflict semantics with all other transactions May start new transactions New exec context: Cannot see isolated state from the xpush’d transaction. This distinguishes it from various nesting paradigms.

11 Outline TxLinux MetaTM Issue: Stack memory Experimental results Goals
Features Interrupt handling Issue: Stack memory Experimental results

12 Issue: Stack memory Transactions can span stack frames
Why: Retain same flexibility as locks Problem: Live stack overwrite (correctness) Solution: Stack Pointer Checkpoint foo() { atomic } foo() { bar() baz() } bar() { xbegin } baz() { xend } We support this in MetaTM, because: Linux does this, and We didn’t want to rewrite the OS There’s another stack problem addressed in the paper. Stack included in transaction working set Why: Stack shared (e.g. interrupts, etc.) Problem: Dead stack restart (performance) Solution: Stack-based early release

13 Live stack overwrite 0xC0 locals locals 0x80 foo+8 intr state 0x40
Error: invalid return address locals foo locals foo StkPtr foo+4: call bar foo+8: <work> foo+12:xend bar+0: xbegin bar+4: ret do_irq: iret 0x80 foo+8 bar intr state do_IRQ 0x40 0x00 Tx Reg. Checkpoint PC: bar+4 StkPtr: 0x40 (other regs..) Conflict Only interrupts that arrive in kernel mode have this problem

14 Live stack overwrite, fixed
0xC0 locals foo locals foo StkPtr foo+4: call bar foo+8: <work> foo+12:xend bar+0: xbegin bar+4: ret do_irq: iret 0x80 foo+8 bar 0x40 intr state do_irq This can happen in user mode when using signals; Detailed, but this is kind of stuff that you see and must fix to boot & run an OS. 0x00 Tx Reg. Checkpoint PC: bar+4 StkPtr: 0x40 (other regs..) Conflict Fixed by setting ESP to Checkpointed ESP on interrupt

15 Outline TxLinux MetaTM Issue: Stack memory Experimental results Goals
Features Interrupt handling Issue: Stack memory Experimental results

16 Experiments Setup Workloads System characteristics Studies
Execution time Transaction rates Transaction origins Studies Contention management Commit & Abort penalties

17 Setup Simics 3.0.17 8-processor, x86 system (1 Ghz) Memory hierarchy
L1: sep D/I, 16KB, 4-way, 1-cycle hit L2: 4MB, 8-way, 16-cycle hit, MESI protocol Main memory: 1GB, 200-cycle hit Other devices Disk device (DMA, 5.5ms latency) Tigon3 gigabit nic (DMA,0.1ms latency)

18 Workloads to exercise TxLinux
counter shared counter micro- benchmark (8 threads) pmake Runs make -j 8 to compile files from libFLAC 1.1.2 netcat streams data over TCP network conn. MAB simulates software development file system workloads configure 8 instances of configure for tetex find 8 instances of find on a 78MB directory searching for text Note: Only TxLinux creates transactions

19 Kernel Execution Time We measure Kernel Execution time (since that is what we modified) Benchmarks run from 1 to 10 seconds of Kernel time, does not include Boot. Counter shows a good 2x speedup with Tx. This is similar to speedups others have reported with micro-benchmarks. On the up-side, performance did not degrade. Linux doesn’t spend a lot of time in sychronization at 8 processors. We also show proportion of CPU time in Kernel. While it differs from app to app, at least on these applications, ~ 1/2 time in kernel One more reason for allowing OS to use transaction; capabilities of processor. [Idle time in Pmake & netcat] counter %Kern. time 91% 13% 54% 57% 43% 50% High kernel time justifies transactions in the OS

20 Transaction Rates Restart Rate 2.6% 3.1% 1.7% 2.1% 10.2%
Log scale Restart Rate 2.6% 3.1% 1.7% 2.1% 10.2% Find workload has highest contention in TxLinux

21 Transaction Origins This IS the Xpush / Xpop slide. TODO HANY must mention this. We put no special effort into putting these. Interrupt handling path, which other proposals in this slide. Kernel locks accessed from both system call and interrupt handling contexts

22 Contention Management Study
counter Normalized to TIMESTAMP Counter example, how it doesn’t show variances. [Restart rate -> Perf ] Polka best, SizeMatters viable. Polka best performer, but complex to implement; SizeMatters viable Stall-on-conflict – reduces conflicts, but not always performance

23 Commit & Abort Study Commit Cost Abort Cost
Normalized Kernel Time Abort Cost Normalized kernel time (to 0 cost). Y-Axis is different. And here we see in the Abort cost, the counterintuitive result that high abort cost can function as a backoff, yielding situation where larger abort cost results in an improvement cost. Normalized Kernel Time Performance sensitive to commit penalty, not abort Confirms benefit of eager version management (fast commits)

24 Related Work TM Models Suspension techniques Interrupt handling
TCC [Hammond04], UTM [Anaian05], LogTM [Moore06], VTM [Rajwar05] Suspension techniques Escape actions [Zilles06] – can’t start tx Interrupt handling XTM [Chung06] – also tries to avoid aborts Contention management Scherer & Scott [PODC’05] – in STM context TM Models haven’t allowed OS transactions. UTM took traces from Linux 2.4 Escape actions = limited, can’t start tx XTM = multi-pronged approach, try to avoid aborts

25 Conclusions TM needs realistic workloads OS needs TM
TxLinux the largest TM benchmark OS needs TM Complex synchronization; large % of runtime Building & running TxLinux reveals much Architectural support needed (Tx suspension) Contention management is important Cost studies confirm fast commits … more in the paper


Download ppt "University of Texas at Austin"

Similar presentations


Ads by Google