Skippy: Enabling Long-Lived Snapshots of the Long-Lived Past

Ross Shaull (rshaull@cs.brandeis.edu), Liuba Shrira (liuba@cs.brandeis.edu), Hao Xu (hxu@cs.brandeis.edu)
Lab for Experimental Software Systems, Brandeis University

Motivation

An old problem: time travel in databases.
- Retaining past state at the logical record level (ImmortalDB, Postgres) changes the arrangement of current state.
- File-system-level approaches block transactions to get consistency (VSS).

A new solution: split snapshots.
- Integrate with the page cache and transaction manager to provide disk-page-level snapshots.
- The application declares transactionally consistent snapshots at any time, with any frequency.
- Snapshots are retained incrementally using copy-on-write (COW), without reorganizing the database.
- All applications and access methods run unmodified on persistent, on-line snapshots.
- Good performance is achieved in the same manner as the database achieves it: leverage database recovery to defer snapshot writes.

A new problem: how to index copy-on-write split snapshots? Each snapshot needs its own page table (an SPT), which points to current-state and COW'd pages.

Indexing Split Snapshots

Updating SPTs on disk would be costly, since one COW may change the pointers in multiple SPTs.

Order of events:
1. Snapshot 1 declared
2. Page 1 modified
3. Snapshot 2 modified
4. Page 1 modified again
5. Page 2 modified

[Figure: the database and snapshot pages after these events, with SPT 1 and SPT 2 each pointing to a mix of COW'd copies of P1 and P2 and current database pages.]

Accessing Snapshots with mapLog

Instead of maintaining many SPTs, append mappings to snapshot pages into a log, the mapLog (inexpensive to write).

Ordering invariant: mappings retained for snapshot X are written into mapLog before mappings retained for snapshot X+1.

Construct the SPT for snapshot X by scanning the mapLog forward from Start(X) and keeping first-encountered mappings (FEMs). Any page for which no mapping is found in the mapLog is still "in the database" (i.e., it has not been COW'd yet). A sketch of this scan follows.

[Figure: a mapLog with Start positions for Snap 1 and Snap 2. P2 is shared by SPT 1 and SPT 2; P3 has not been modified, so SPT 1 and SPT 2 both point to the copy of P3 in the database.]
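To make the scan concrete, here is a minimal Python sketch of SPT construction by mapLog scan. It is our illustration, not the authors' code; Mapping, map_log, start, and db_lookup are assumed names standing in for the storage manager's real structures.

    from typing import NamedTuple

    class Mapping(NamedTuple):
        page_id: int    # logical page number in the database
        snap_addr: int  # address of the retained (COW'd) copy

    def build_spt(map_log, start):
        """Build the SPT for the snapshot whose retained mappings begin
        at position `start`, keeping only first-encountered mappings."""
        spt = {}
        for m in map_log[start:]:
            # A later mapping for the same page was retained for a later
            # snapshot, so the first mapping seen wins.
            spt.setdefault(m.page_id, m.snap_addr)
        return spt

    def snapshot_read(spt, page_id, db_lookup):
        """Resolve a page as of the snapshot: the COW'd copy if the SPT
        has one, otherwise the page still lives in the current database."""
        return spt[page_id] if page_id in spt else db_lookup(page_id)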
Impact of Skew

Skew hurts:
1. Let the overwrite cycle length L be the number of page updates required to overwrite an entire database of N pages.
2. The overwrite cycle length determines the number of mappings that must be scanned to construct an SPT.
3. For a uniformly random workload, L = N ln N, by the "coupon collector's waiting time" problem (a worked expectation follows this list).
4. Skew in the update workload lengthens the overwrite cycle by introducing many more repeated mappings.
5. For example, 80/20 skew (80% of updates go to 20% of the pages) increases L by a factor of 4.
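For item 3, the standard coupon-collector argument (our derivation, included for completeness): when k distinct pages have already been overwritten, a uniform update hits a new page with probability (N - k)/N, so each wait is geometric and

    \mathbb{E}[L] = \sum_{k=0}^{N-1} \frac{N}{N-k} = N \sum_{j=1}^{N} \frac{1}{j} = N H_N \approx N \ln N

For example, at roughly 25,600 database pages (implied by the evaluation setup below, where a node holds 2560 mappings, 1/10th the page count), this gives L of about 25,600 x ln 25,600, or roughly 260,000 uniform updates.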
Combat Skew with Skippy

For a faster scan, create higher-level logs of FEMs with fewer repeated mappings (a sketch of the level-building step follows the figure note):
- Divide the mapLog into equal-sized chunks called nodes.
- Copy each FEM in a mapLog node into Skippy Level 1.
- At the end of each node, record an up-link that points to the next position in Skippy Level 1 where a mapping will be stored.
- To construct Skippy Level N, recursively apply the same procedure to the previous Skippy level.
- When scanning, follow up-links to the Skippy levels (a Skippy scan). A Skippy scan that begins at Start(X) constructs the same SPT X as a mapLog scan.

[Figure: a mapLog covering Snap 1 through Snap 6 with Skippy Level 1 above it; solid arrows denote up-link pointers, dotted arrows indicate FEMs being copied upward.]
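Here is a minimal sketch of building one Skippy level from the level below, again our illustration rather than the published implementation; the tiny NODE_SIZE and the in-memory lists are simplifying assumptions (the real system uses 50K on-disk nodes), and the Mapping record is reused from the earlier sketch.

    NODE_SIZE = 4  # mappings per node; tiny for illustration only

    def build_level(lower):
        """Copy the first-encountered mappings of each fixed-size node of
        `lower` into the next level up and record, per node, an up-link:
        the next write position in the upper level, which is where a scan
        jumps when it reaches the end of that node."""
        upper, up_links = [], []
        for node_start in range(0, len(lower), NODE_SIZE):
            node = lower[node_start:node_start + NODE_SIZE]
            seen = set()
            for m in node:
                if m.page_id not in seen:  # FEM within this node
                    seen.add(m.page_id)
                    upper.append(m)
            up_links.append(len(upper))  # up-link recorded at node end
        return upper, up_links

Applying build_level to its own output yields Level 2, and so on; each level is shorter because repeated mappings within ever-larger spans of the mapLog collapse to a single mapping.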
Experimental Evaluation

Can we create split snapshots with a Skippy index efficiently?
- Implemented in Berkeley DB (BDB).
- For efficiency, writing the mapLog and Skippy levels to disk is delayed until checkpoint.
- For safety, the existing BDB recovery is leveraged for Skippy and snapshot pages.
- A plot (not reproduced here) shows the time to complete a single-threaded updating workload of 100,000 transactions in a 66M database under each of the 50/50, 80/20, and 99/1 skews, with a 5-level Skippy (including the mapLog level): a snapshot can be retained after every transaction for a 6-8% penalty.

Time to build an SPT versus the number of Skippy levels, for various skews:

Skew   | # Skippy Levels | Time to Build SPT (s)
50/50  | 0               | 13.8
80/20  | 0               | 19.0
80/20  | 1               | 15.8
80/20  | 2               | 14.7
80/20  | 3               | 13.9
99/1   | 0               | 33.3
99/1   | 1               | 6.69

Setup: 100M database; 50K nodes (each holds 2560 mappings, 1/10th the number of database pages); 10,000 RPM disk.

Conclusions:
- Skippy could counteract 80/20 skew in 3 levels.
- The 99/1 workload's hot section is much smaller than the node size, so one level is enough.

Analysis

The expected cost to build an SPT factors in (a rough model follows):
- the acceleration gained at each level,
- the cost of reading sequentially at each level,
- the cost of the disk seeks between levels.
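A back-of-the-envelope version of that cost model, under our own simplifying assumptions (one node read plus one seek per lower level, then a sequential scan of the shortened top level; all parameters are illustrative):

    def spt_build_cost(cycle_len, levels, accel, read_ms, seek_ms, node_size):
        """Rough milliseconds to build an SPT. `accel` is the assumed
        fraction of mappings that survive into each higher level, so the
        top-level scan shrinks geometrically while each extra level adds
        one node read and one seek."""
        climb = levels * (node_size * read_ms + seek_ms)
        top_scan = cycle_len * (accel ** levels) * read_ms
        return climb + top_scan

This reproduces the shape of the table above: adding a level pays off only while the shortened top-level scan saves more than the extra seek costs, which is why 80/20 keeps improving through 3 levels while 99/1 needs just one.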
References

Shaull, R., Shrira, L., and Xu, H. Skippy: A New Snapshot Indexing Method for Time Travel in the Storage Manager. SIGMOD 2008.
Shrira, L., van Ingen, C., and Shaull, R. Time Travel in the Virtualized Past. SYSTOR 2007.
Shrira, L., and Xu, H. Thresher: An Efficient Storage Manager for Copy-on-write Snapshots. USENIX 2006.
Shrira, L., and Xu, H. Snap: Efficient Snapshots for Back-In-Time Execution. ICDE 2005.
