Aether: A Scalable Approach to Logging (VLDB 2010)
Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis, Anastasia Ailamaki
Carnegie Mellon University / École Polytechnique Fédérale de Lausanne
© 2010 Ippokratis Pandis
Scalability is key!
- Modern hardware needs software parallelism.
- OLTP is inherently parallel at the request level and very good at providing high concurrency.
- But internal serializations limit execution parallelism.
=> Need for scalable OLTP components
Logging is crucial for OLTP
- Fault tolerance: crash recovery; transaction abort/rollback (e.g., the Amazon outage)
- Performance: log changes for durability (no in-place updates); write dirty pages back asynchronously
=> Need an efficient and scalable logging solution
Logging is a bottleneck for scalability
(1) At commit, a transaction must yield for the log flush: synchronous I/O sits on the critical path, locks are held for a long time, and each commit costs two context switches.
(2) Every transaction must insert records into the log buffer: a centralized main-memory structure and a source of contention.
[Figure: multiple CPUs (each with its own L1/L2 caches) funneling into a single in-RAM log that is flushed to disk alongside the data.]
Working around the bottlenecks: asynchronous commit, or replacing logging with replication and fail-over.
=> Workarounds compromise durability
Does correct logging have to be so slow? No!
- Locks held for a long time: they are not actually used during the flush — only an indirect way to enforce isolation.
- Two context switches per commit: transactions are nearly stateless at commit time, so they are easy to migrate between threads.
- Log buffer is a source of contention: the log orders incoming requests, not threads, and log records can be combined.
=> Aether: uncompromised, yet scalable logging
Agenda
- Logging-related problems
- Aether logging
  - Reducing lock contention
  - Reducing context switching
  - Scalable log buffer implementation
- Conclusions
Bottleneck 1: Amplified lock contention
[Figure: timeline of Xct 1 and Xct 2 — Xct 1 works, commits, and waits on the log flush I/O while Xct 2 blocks in the lock manager.]
Other transactions wait for locks while the log flush I/O completes.
Early Lock Release (in the case of a single log)
- Finish the transaction
- Release its locks before commit
- Insert the transaction's commit record
- Wait until the log record is flushed
Dependent transactions are serialized at the log buffer. There is no extra overhead, and the idea has been around for 30 years — but nobody has used it so far.
=> With ELR, other transactions do not wait for locks held during log flushes
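The ELR commit sequence above can be sketched as a minimal, illustrative model — the `Log` class and its LSN arithmetic are hypothetical, not Shore-MT's actual interface. The key point: locks are released before the flush, yet a dependent transaction is still serialized correctly because its own commit record lands later in the same log, and flushing it hardens every earlier record first.

```python
# Hypothetical sketch of Early Lock Release (ELR) with a single log.
# `Log`, its LSN counters, and `commit_with_elr` are illustrative names,
# not the paper's implementation.

class Log:
    def __init__(self):
        self.tail_lsn = 0      # next LSN to hand out
        self.flushed_lsn = 0   # durable prefix of the log (exclusive bound)

    def insert_commit_record(self):
        lsn = self.tail_lsn
        self.tail_lsn += 1
        return lsn

    def flush(self, lsn):
        # Flushing up to `lsn` also hardens every earlier record, so a
        # dependent transaction can never become durable before the
        # transaction whose locks it inherited.
        self.flushed_lsn = max(self.flushed_lsn, lsn + 1)

def commit_with_elr(log, locks):
    locks.clear()                     # (1) release locks BEFORE commit
    lsn = log.insert_commit_record()  # (2) insert the commit record
    log.flush(lsn)                    # (3) only now wait for the flush I/O
    return lsn

log = Log()
locks = {"row-A"}
lsn = commit_with_elr(log, locks)
```

Because the log itself orders the commit records, releasing the locks early cannot let a dependent transaction commit ahead of its predecessor.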
ELR benefits
Setup: Sun Niagara T2 (64 HW contexts), 64 GB RAM; memory-resident TPC-B in Shore-MT; Zipfian distribution on transaction inputs.
=> ELR is simple and sometimes very useful
Agenda
- Logging-related problems
- Aether logging
  - Reducing lock contention
  - Reducing context switching
  - Scalable log buffer implementation
- Conclusions
Bottleneck 2: Excessive context switching
[Figure: timeline — Xct 1 commits and blocks on the log flush I/O; a context switch hands the CPU to Xct 2. Setup: Sun Niagara T2 (64 HW contexts); memory-resident TPC-B in Shore-MT.]
One context switch per log flush puts pressure on the OS scheduler.
=> Must decouple thread scheduling from log flushes
Flush Pipelining
- The scheduler is on the critical path and wastes CPU; multi-core hardware only amplifies the problem.
- But a transaction is nearly stateless at commit, so: detach the transaction state from the worker thread and pass it to a log writer. Worker threads do not block at commit time.
[Figure: threads 1 and 2 hand their committing transactions (Xct 1, Xct 2) to the log writer and immediately move on to Xct 3 and Xct 4.]
=> A staged-like mechanism = low scheduling costs
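The hand-off above can be sketched with a worker/log-writer pair — a minimal model assuming a single log-writer thread; names such as `commit_queue` are illustrative, not the paper's implementation:

```python
# Sketch of flush pipelining: workers detach transaction state at commit
# and never block on I/O; a dedicated log writer drains the queue so that
# one (simulated) flush covers a whole batch of commits.
import queue
import threading

commit_queue = queue.Queue()   # transactions handed off at commit time
completed = []                 # commits acknowledged after their flush

def log_writer():
    done = False
    while not done:
        batch = [commit_queue.get()]          # block for the first commit
        while True:                           # opportunistically batch more
            try:
                batch.append(commit_queue.get_nowait())
            except queue.Empty:
                break
        # ...a single synchronous log flush would cover the whole batch here...
        for xct in batch:
            if xct is None:                   # shutdown sentinel
                done = True
            else:
                completed.append(xct)         # ack the commit after the flush

def worker(xcts):
    for x in xcts:
        commit_queue.put(x)   # hand off state; do NOT block for the flush

writer = threading.Thread(target=log_writer)
writer.start()
worker(["xct-1", "xct-2", "xct-3"])
commit_queue.put(None)
writer.join()
```

Workers stay busy running new transactions while flushes are in flight, which is exactly how the scheme matches asynchronous-commit throughput without giving up durability.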
Impact of Flush Pipelining
Setup: Sun Niagara T2 (64 HW contexts); memory-resident TPC-B in Shore-MT.
=> Matches asynchronous-commit throughput without compromising durability
Agenda
- Logging-related problems
- Aether logging
  - Reducing lock contention
  - Reducing context switching
  - Scalable log buffer implementation
- Conclusions
Bottleneck 3: Log buffer contention
[Figure: timeline — Xct 1, Xct 2, and Xct 3 serialize on the log-buffer latch before each insert.]
A centralized log buffer causes contention, which depends on:
- the number of participating threads
- the size of the modifications (KiBs, in the case of physical logging)
Eliminating critical sections
Inspiration: elimination-based backoff.* Critical sections can cancel each other out — e.g., stack push/pop operations. Attempt to acquire the mutex; if that fails, back off and wait on an array (the "station" area); if an opposing request already waits there, eliminate both requests without acquiring the mutex.
=> Adapt elimination-based backoff for database logging
* D. Hendler, N. Shavit, and L. Yerushalmi. A Scalable Lock-free Stack Algorithm. In Proc. SPAA, 2004.
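The push/pop example can be made concrete with a toy, deliberately single-threaded sketch — a real elimination array is lock-free and multi-slot, so everything here (the one-slot `station`, the `"POP_WAITING"` marker) is illustrative only:

```python
class EliminationStack:
    """Toy illustration of elimination-based backoff: a push and a pop
    that meet in the backoff "station" cancel each other out without
    either operation ever touching the shared stack (or its mutex)."""

    def __init__(self):
        self.stack = []        # the shared structure (mutex-protected in reality)
        self.station = None    # one-slot station area (an array in the paper)

    def push(self, value):
        if self.station == "POP_WAITING":      # a pop is parked in the station
            self.station = ("VALUE", value)    # hand the value over directly
            return
        self.stack.append(value)               # otherwise use the stack itself

    def pop(self):
        if isinstance(self.station, tuple):    # a push parked a value here
            _, value = self.station
            self.station = None
            return value
        if not self.stack:
            self.station = "POP_WAITING"       # back off and wait in the station
            return None
        return self.stack.pop()

s = EliminationStack()
s.pop()          # empty stack: the pop parks itself in the station
s.push(42)       # the push meets the waiting pop — the stack is never touched
```

The eliminated pair completes without contending for the stack; this is the behavior Aether adapts to consolidate log-buffer inserts.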
Accessing the log buffer
Break the log insert into three logical steps:
(a) Reserve space by updating the head LSN
(b) Copy the log record (memcpy)
(c) Make the insert visible by updating the tail LSN, in LSN order
Steps (a) + (c) can be consolidated: accumulate requests off the critical path and send only the group leader to fight for the critical section. Step (b) moves out of the critical section entirely.
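The three steps, with step (a) consolidated across a group, might look like the sketch below — a hypothetical `LogBuffer`, not the Aether sources; step (c)'s LSN-ordered publish is only noted in a comment:

```python
# Sketch of the decoupled, consolidated log insert. Waiting threads pool
# their space requests so that only a group leader enters the critical
# section; the copy happens outside it. Names are illustrative.
import threading

class LogBuffer:
    def __init__(self, size=1 << 20):
        self.buf = bytearray(size)
        self.head = 0            # next byte offset to reserve
        self.lock = threading.Lock()

    def reserve(self, nbytes):
        # (a) Reserve space: the only step that needs the critical section.
        with self.lock:
            start = self.head
            self.head += nbytes
        return start

    def insert(self, record: bytes):
        start = self.reserve(len(record))
        # (b) Copy outside the critical section: contention no longer
        # depends on the record size, only on the tiny reservation.
        self.buf[start:start + len(record)] = record
        # (c) A real implementation publishes the insert here by advancing
        # the tail LSN in LSN order.
        return start

def group_insert(log, records):
    # Consolidation: the leader reserves space for the whole group at once;
    # each member then copies into its own sub-range (concurrently, in a
    # real system), so contention is O(1) in the number of threads.
    total = sum(len(r) for r in records)
    base = log.reserve(total)
    offset = base
    for r in records:
        log.buf[offset:offset + len(r)] = r
        offset += len(r)
    return base

log = LogBuffer()
first = log.insert(b"abc")                     # a lone insert
group = group_insert(log, [b"de", b"fgh"])     # a consolidated group
```

One mutex acquisition now covers an arbitrary amount of copied data, which is what decouples contention from both the thread count and the average record size.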
Design evolution
- (B) Baseline: the mutex is held across the whole insert, including the buffer copy.
- (C) Consolidation array: contention(# threads) = O(1)
- (D) Decoupled buffer insert: contention(work) = O(1)
- (CD) Hybrid design: decouples contention from both the number of threads and the average log entry size
[Figure: per-design timelines showing mutex hold time, start/finish, buffer copies, and waiting.]
Performance as contention increases
Microbenchmark: bimodal record-size distribution (48 B and 160 B; 120 B average).
=> The hybrid solution combines the benefits of both designs
Sensitivity to slot count
[Figure: throughput (MB/s, shown as color/height) over # slots × # threads.]
=> Relatively insensitive to slot count (3 or 4 slots are good enough for most cases)
Case against distributed logging
Distributing TPC-C log records over 8 logs: in 1 ms of wall time, ~200 in-flight transactions and 30 commits.
[Figure: horizontal blue line = a single log; diagonal lines = dependencies (new = black, older = grey).]
=> Tracking dependencies and over-flushing impose a large overhead
Agenda
- Logging-related problems
- Aether logging
  - Reducing context switching
  - Scalable log buffer implementation
- Conclusions
Putting it all together
Setup: Sun Niagara T2 (64 HW contexts); memory-resident TPC-B.
[Figure: throughput vs. # of threads; annotations: +60% from the baseline, +15%; the gap increases with the number of threads.]
=> Eliminates the current log bottlenecks and future-proofs the system against contention
Conclusions
- Logging is an essential component of OLTP: it simplifies recovery and improves performance without the need to physically partition data — but all lurking bottlenecks must be addressed.
- Aether is a holistic approach to logging: it leverages existing techniques (early lock release), reduces context switches (flush pipelining), and eliminates log contention (consolidation-based backoff).
- Aether can achieve 2 GB/s of log throughput per node.
Thank you!