Aether: A Scalable Approach to Logging. VLDB 2010. Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis, Anastasia Ailamaki. © 2010 Ippokratis Pandis

Carnegie Mellon University, École Polytechnique Fédérale de Lausanne
Databases @ Carnegie Mellon

Scalability is key!
- Modern hardware needs software parallelism
- OLTP is inherently parallel at the request level
  - Very good at providing high concurrency
  - But internal serializations limit execution parallelism
Need for scalable OLTP components

Logging is crucial for OLTP
- Fault tolerance (e.g., Amazon outage*)
  - Crash recovery
  - Transaction abort/rollback
- Performance
  - Log changes for durability (no in-place updates)
  - Write dirty pages back asynchronously
Need an efficient and scalable logging solution
* http://www.datacenterknowledge.com/archives/2010/05/13/car-crash-triggers-amazon-power-outage/

Logging is a bottleneck for scalability
(1) At commit, a transaction must yield for the log flush
  - Synchronous I/O on the critical path
  - Locks held for a long time
  - Two context switches per commit
(2) Must insert records into the log buffer
  - Centralized main-memory structure
  - Source of contention
Working around the bottlenecks:
  - Asynchronous commit
  - Replace logging with replication and fail-over
Workarounds compromise durability
[Figure: memory hierarchy diagram with labels CPU-1 ... CPU-N, L1, L2, RAM, HDD, Data, Log]

Does correct logging have to be so slow? No!
- Locks held for a long time
  - Not actually used during the flush
  - Indirect way to enforce isolation
- Two context switches per commit
  - Transactions nearly stateless at commit time
  - Easy to migrate transactions between threads
- Log buffer is a source of contention
  - Log orders incoming requests, not threads
  - Log records can be combined
Aether: uncompromised, yet scalable logging

Early Lock Release (ELR) in case of a single log
- Finish transaction
- Release locks before commit
- Insert transaction commit record
- Wait until log record is flushed
- Dependent transactions (xcts) are serialized at the log buffer
No extra overhead; the idea has been around for 30 years, but nobody uses it so far.
With ELR, other transactions do not wait for locks held during log flushes.
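The ELR steps above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions, not the Shore-MT implementation: the `Log` and `Txn` classes, the dictionary lock table, and the use of list indices as LSNs are all hypothetical. The sketch also assumes that the commit record is inserted before the locks are dropped, so that any dependent transaction necessarily receives a later LSN; that is one reading of "serialized at the log buffer".

```python
class Log:
    """Single in-memory log; LSNs are list indices (an assumption of this sketch)."""

    def __init__(self):
        self.records = []
        self.durable_lsn = -1  # highest LSN known to be on stable storage

    def insert(self, record):
        self.records.append(record)
        return len(self.records) - 1  # LSN of the new record

    def flush(self):
        # Stand-in for the synchronous I/O: everything inserted so far becomes durable.
        self.durable_lsn = len(self.records) - 1


class Txn:
    def __init__(self, name, log, locks):
        self.name, self.log, self.locks = name, log, locks
        self.held = set()
        self.commit_lsn = None

    def lock(self, key):
        owner = self.locks.get(key)
        if owner is not None and owner is not self:
            return False  # a real system would block here
        self.locks[key] = self
        self.held.add(key)
        return True

    def precommit(self):
        """ELR: insert the commit record, then release locks without waiting for the flush."""
        self.commit_lsn = self.log.insert(("commit", self.name))
        for key in self.held:
            del self.locks[key]
        self.held.clear()

    def is_committed(self):
        # The transaction only counts as committed once its commit record is durable.
        return self.commit_lsn is not None and self.log.durable_lsn >= self.commit_lsn


log, locks = Log(), {}
t1, t2 = Txn("T1", log, locks), Txn("T2", log, locks)
t1.lock("x")
t2_blocked = not t2.lock("x")   # T2 cannot proceed while T1 holds the lock
t1.precommit()                  # locks released; nothing has been flushed yet
t2_acquired = t2.lock("x")      # T2 proceeds without waiting for T1's flush
t2.precommit()
durable_before_flush = t1.is_committed() or t2.is_committed()
log.flush()                     # one flush makes both commits durable, in LSN order
```

Because T2's commit record sits behind T1's in the same log, T2 can never become durable before the transaction it depends on.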

Bottleneck 2: Excessive context switching
- One context switch per log flush
- Pressure on the OS scheduler
Must decouple thread scheduling from log flushes
[Figure: timeline of Xct 1 and Xct 2 around commit, marking context switches; labels: Working, Log Mgr., I/O Waiting. Sun Niagara T2 (64 HW contexts), memory-resident TPC-B in Shore-MT]

Flush Pipelining
- Scheduler is in the critical path and wastes CPU
  - Multi-core HW only amplifies the problem
- But transactions are nearly stateless at commit
  - Detach transaction state from worker thread
  - Pass it to log writer
  - Worker threads do not block at commit time
Staged-like mechanism = low scheduling costs
[Figure: timeline with Thread 1, Thread 2, and Log Writer handling Xct 1 to Xct 4]
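A minimal sketch of the hand-off described above, assuming a hypothetical `LogWriter` thread fed through a queue. Worker threads enqueue a commit request (transaction id plus a completion callback) and immediately move on to the next transaction; the log writer drains the queue, performs one simulated flush per batch, and then acknowledges every commit in that batch. The names and the `None` shutdown marker are illustrative and not taken from Aether.

```python
import queue
import threading


class LogWriter(threading.Thread):
    """Hypothetical log-writer thread: flushes once per batch of detached commits."""

    def __init__(self):
        super().__init__(daemon=True)
        self.pending = queue.Queue()
        self.flushes = 0

    def run(self):
        while True:
            batch = [self.pending.get()]          # block until a commit arrives
            while True:                           # drain whatever else is queued
                try:
                    batch.append(self.pending.get_nowait())
                except queue.Empty:
                    break
            stop = None in batch                  # None is this sketch's shutdown marker
            commits = [item for item in batch if item is not None]
            if commits:
                self.flushes += 1                 # stand-in for one log flush (I/O)
                for txn_id, on_durable in commits:
                    on_durable(txn_id)            # acknowledge only after the flush
            if stop:
                return


def worker(writer, txn_ids, acks):
    for txn_id in txn_ids:
        # ... transaction work would happen here ...
        # Detach the commit and continue; the worker never waits for the flush.
        writer.pending.put((txn_id, acks.append))


writer = LogWriter()
writer.start()
acks = []
workers = [
    threading.Thread(target=worker, args=(writer, range(i * 100, (i + 1) * 100), acks))
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
writer.pending.put(None)
writer.join()
```

In this sketch a flush usually covers several commits, so the number of flushes is at most the number of transactions, and no worker blocks at commit time.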

Bottleneck 3: Log buffer contention
- Centralized log buffer
- Contention, which depends on
  - number of participating threads
  - size of modifications (KiB in case of physical logging)
[Figure: timeline of Xct 1 to Xct 3; labels: Working, Log Mgr., I/O Waiting, Log Buffer Latch Waiting]

Eliminating critical sections
- Inspiration: elimination-based backoff*
  - Critical sections can cancel each other out
  - E.g., stack push/pop operations
- How elimination-based backoff works
  - Attempt to acquire mutex
  - If failed, back off, waiting on an array
  - If someone else already waits there, eliminate requests without acquiring the mutex
Adapt elimination-based backoff for db logging
[Figure: push() and pop() requests meeting in a station area next to the stack]
* D. Hendler, N. Shavit, and L. Yerushalmi. A Scalable Lock-free Stack Algorithm. In Proc. SPAA, 2004.

Accessing the log buffer
- Break log insert into three logical steps
  (a) Reserve space by updating head LSN
  (b) Copy log record (memcpy)
  (c) Make insert visible by updating tail LSN, in LSN order
- Steps (a) + (c) can be consolidated
  - Accumulate requests off the critical path
  - Send only group leader to fight for the critical section
- Move (b) out of critical section
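The three steps can be sketched as follows, with the copy in step (b) moved outside the mutex. This is a simplified illustration under stated assumptions: the `LogBuffer` class is hypothetical, LSNs are plain byte offsets, the buffer never wraps around, and a condition variable stands in for whatever mechanism a real system would use to publish inserts in LSN order.

```python
import threading


class LogBuffer:
    """Hypothetical log buffer with a decoupled insert (no wraparound handling)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.head = 0                       # space below head has been reserved
        self.tail = 0                       # bytes below tail are copied and visible
        self.mutex = threading.Lock()       # protects head only
        self.released = threading.Condition()

    def insert(self, record):
        n = len(record)
        # (a) Reserve space: the only work done while holding the mutex.
        with self.mutex:
            start = self.head
            if start + n > len(self.buf):
                raise MemoryError("log buffer full (this sketch does not wrap around)")
            self.head = start + n
        # (b) Copy the record outside the critical section.
        self.buf[start:start + n] = record
        # (c) Make the insert visible in LSN order: wait for all earlier inserts.
        with self.released:
            while self.tail != start:
                self.released.wait()
            self.tail = start + n
            self.released.notify_all()
        return start


log_buffer = LogBuffer(1 << 16)
offsets = {}


def insert_records(tag):
    for i in range(50):
        rec = f"{tag}:{i};".encode()
        offsets[rec] = log_buffer.insert(rec)


inserters = [threading.Thread(target=insert_records, args=(t,)) for t in "abcd"]
for t in inserters:
    t.start()
for t in inserters:
    t.join()
```

With this split, the time spent holding the mutex no longer depends on the record size, although every insert still acquires it once.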

Design evolution
- (B) Baseline
- (C) Consolidation array
- (D) Decoupled buffer insert
- (CD) Hybrid design
- contention(work) = O(1), contention(# threads) = O(1)
Decouple contention from the # of threads and average log entry size
[Figure: per-design timelines; legend: Mutex held, Start/finish, Copy into buffer, Waiting]
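The consolidation idea (accumulate requests, send only a group leader into the critical section) can be sketched as below. This is a deliberately simplified illustration, not the design from the talk: it uses a single slot guarded by a small Python lock where the slides describe an array of slots, and the `Group` and `ConsolidatingLog` names are hypothetical. The first thread to join an open group becomes its leader, acquires the log mutex once, reserves space for the whole group, and hands each member its offset.

```python
import threading


class Group:
    def __init__(self):
        self.sizes = []                     # bytes requested by each member, in join order
        self.base = None                    # start offset of the group, set by the leader
        self.ready = threading.Event()


class ConsolidatingLog:
    """Hypothetical single-slot consolidation in front of the log mutex."""

    def __init__(self):
        self.head = 0
        self.log_mutex = threading.Lock()   # the contended critical section
        self.slot_lock = threading.Lock()   # protects open_group only
        self.open_group = None
        self.acquisitions = 0               # how often the log mutex was taken

    def reserve(self, size):
        with self.slot_lock:
            group = self.open_group
            leader = group is None
            if leader:
                group = self.open_group = Group()
            index = len(group.sizes)
            group.sizes.append(size)
        if leader:
            with self.log_mutex:
                with self.slot_lock:        # close the group: no more joiners
                    self.open_group = None
                group.base = self.head
                self.head += sum(group.sizes)
                self.acquisitions += 1
            group.ready.set()
        else:
            group.ready.wait()              # followers never touch the log mutex
        return group.base + sum(group.sizes[:index])


clog = ConsolidatingLog()
ranges = []


def reserve_many():
    for _ in range(200):
        offset = clog.reserve(10)
        ranges.append((offset, offset + 10))


reservers = [threading.Thread(target=reserve_many) for _ in range(8)]
for t in reservers:
    t.start()
for t in reservers:
    t.join()
ranges.sort()
```

Requests that arrive while a leader is waiting for the log mutex join its group, so under load the mutex is acquired fewer times than there are reservations. The slot lock here is itself a shared structure; it only keeps the sketch short.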

Sensitivity to slot count
Relatively insensitive to slot count (3 or 4 slots good enough for most cases)
[Figure: throughput in MB/s (shown as color/height) as a function of # slots and # threads]

Case against distributed logging
- Distributing TPC-C log records over 8 logs
- 1 ms wall time, ~200 in-flight transactions, 30 commits
- Horizontal blue line = 1 log
- Diagonal line = dependency (new = black, older = grey)
Large overhead from keeping track of dependencies and over-flushing

Putting it all together
- Eliminate current log bottlenecks
- Future-proof system against contention
- Gap increases with # threads!
[Figure: Sun Niagara T2 (64 HW contexts), memory-resident TPC-B; annotations: +60% from Baseline, +15%]

Conclusions
- Logging is an essential component for OLTP
  - Simplifies recovery, improves performance without the need of physically partitioning data
  - ...but need to address all lurking bottlenecks
- Aether is a holistic approach to logging
  - Leverages existing techniques (Early Lock Release)
  - Reduces context switches (Flush Pipelining)
  - Eliminates log contention (Consolidation-based backoff)
  - Can achieve 2 GB/s of log throughput per node
Thank you!
