Improving Transaction-Time DBMS Performance and Functionality David Lomet Microsoft Research Feifei Li Florida State University.


1 Improving Transaction-Time DBMS Performance and Functionality David Lomet Microsoft Research Feifei Li Florida State University

2 Immortal DB: A Transaction-Time DB
What is a transaction-time DB?
– Retains versions of records: current and prior database states
– Supports temporal access to these versions, using transaction time
Immortal DB goals
– Performance close to an unversioned DB
– Full indexed access to history
– Explore other functionality based on versions: history as backup, bad user transaction removal, auditing

3 Prior Publications
– SIGMOD04: demo and demo paper
– ICDE04: initial running system described
– SIGMOD06: removing effects of bad user transactions
– ICDE08: indexing with version compression
– ICDE09: performance and functionality

4 Talk Outline
Immortal DB: a transaction-time database
Update performance: timestamping
– Timestamping is the main update overhead
– Prior approaches
– Our new approach
– Update performance results
Support for auditing
– What do we provide
– Exploiting the timestamping implementation
Range read performance: new page-splitting strategy
– Storage utilization determines range read performance
– Prior split strategy guaranteeing as-of version utilization
– Our new approach
– Storage utilization results

5 Timestamping & Update Performance
Timestamp not known until commit
– Fixing it too early leads to aborts
Requires a 2nd touch to add the TS to each record
– 1st for the update, when the TS is not yet known
– 2nd for adding the TS, once it is known
The TID:TS mapping must be stable until all timestamping completes and is stable
Biggest single extra cost for updates

6 Prior Timestamping Techniques
Eager timestamping
– As a 2nd update during the transaction
– Delays commit; roughly doubles update cost
Lazy timestamping – several variations
– Replace the transaction ID (TID) with the timestamp (TS) lazily after commit; but this requires…
– Persisting the TID:TS mapping
The trick is in handling this efficiently
Most prior efforts updated a Persistent Transaction Timestamp Table (PTT) at commit with the TID:TS mapping
We improve on this part of the process

7 Lazier Timestamping
[Diagram] The TID:TS pair is posted to the log with the commit record at commit. Timestamping activity is based mostly on a main-memory volatile timestamp table (VTT), whose entries hold TID:TS plus a reference count; an entry is removed when its timestamping completes (ref count = 0 and stable). TID:TS pairs are batch-written from the VTT to the PTT at checkpoint – only those whose timestamping is unfinished.
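The lazier scheme above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the class name, the use of a plain dict for the PTT, and the per-transaction record count are all assumptions.

```python
# Sketch of "lazier" timestamping: the TID:TS mapping lives in a
# volatile table (VTT); only mappings whose record timestamping is
# still unfinished at checkpoint are batch-written to the PTT.

class VolatileTimestampTable:
    def __init__(self):
        self.entries = {}   # TID -> [timestamp, ref_count]
        self.ptt = {}       # stand-in for the persistent TID:TS table

    def commit(self, tid, timestamp, updated_record_count):
        # At commit the TID:TS pair is logged (with the commit record)
        # and entered in the VTT with a count of records still to stamp.
        self.entries[tid] = [timestamp, updated_record_count]

    def stamp_record(self, tid):
        # Lazily replace a record's TID with its TS; once the last
        # record is stamped, the mapping can be dropped from the VTT.
        entry = self.entries[tid]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[tid]
        return entry[0]

    def checkpoint(self):
        # Batch-write only mappings with unfinished timestamping to the
        # PTT, instead of writing every mapping at commit time.
        for tid, (ts, _) in self.entries.items():
            self.ptt[tid] = ts
```

Under this policy a transaction whose timestamping completes before the next checkpoint never touches the PTT at all, which is where the batching savings in the experiments come from.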

8 Timestamping Experiment
– Each record is 200 bytes
– The database is initialized with 5,000 records
– Generated workloads contain up to 10,000 transactions
– Each transaction is an insert or an update (to a record newly inserted by another transaction)
– One checkpoint every 500 transactions
Cost metrics:
– Execution time
– Number of writes to the PTT
– Number of batched updates
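The workload shape above can be made concrete with a small driver. The parameters come from the slide; the generator itself (seed, 50/50 insert/update mix) is a hypothetical sketch, not the experiment's actual harness.

```python
# Illustrative workload generator: 5,000 initial records, up to 10,000
# transactions, each an insert or an update of a previously inserted
# record, with a checkpoint every 500 transactions.
import random

def generate_workload(num_txns=10_000, initial_records=5_000,
                      checkpoint_interval=500, seed=42):
    rng = random.Random(seed)
    next_key = initial_records
    ops = []
    for i in range(1, num_txns + 1):
        # Insert a fresh record, or update one inserted by an
        # earlier transaction in this workload.
        if rng.random() < 0.5 or next_key == initial_records:
            ops.append(("insert", next_key))
            next_key += 1
        else:
            ops.append(("update", rng.randrange(initial_records, next_key)))
        if i % checkpoint_interval == 0:
            ops.append(("checkpoint", i))
    return ops
```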

9 Execution Time
[Chart comparing execution time for: unversioned; the prior unbatched TS method; and 100%, 50%, and 20% PTT batch inserts]
IMPORTANT: simple one-update transactions
Expected result is less than the 20% case

10 Talk Outline
Immortal DB: a transaction-time database
Update performance: timestamping
– Timestamping is the main update overhead
– Prior approaches
– Our new approach
– Update performance results
Support for auditing
– What do we provide
– Exploiting the timestamping implementation
Range read performance: new page-splitting strategy
– Storage utilization determines range read performance
– Prior split strategy guaranteeing as-of version utilization
– Our new approach
– Storage utilization results

11 Adding Audit Support
Basic infrastructure only
– Too much in auditing to try to do more
– For every update: who did it and when
Technique
– Extend the PTT schema to include a user ID (UID): TID:TS:UID
– Always persist this information; no garbage collection
– The timestamping technique permits batch updates to the PTT
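The audit extension amounts to widening the persisted mapping and skipping garbage collection. A minimal sketch, with field and class names that are assumptions rather than the system's schema:

```python
# Audit extension sketch: the persistent timestamp table's schema is
# widened from TID:TS to TID:TS:UID, and in audit mode entries are
# always persisted and never garbage-collected.
from dataclasses import dataclass

@dataclass(frozen=True)
class PttEntry:
    tid: int        # transaction ID
    ts: int         # commit timestamp
    uid: str        # user who ran the transaction (audit extension)

class AuditPtt:
    def __init__(self):
        self.rows = []

    def batch_insert(self, entries):
        # Same batched checkpoint-time write path as before;
        # audit mode simply never deletes entries.
        self.rows.extend(entries)

    def who_and_when(self, tid):
        # "For every update, who did it and when."
        for e in self.rows:
            if e.tid == tid:
                return e.uid, e.ts
        return None
```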

12 What Does It Cost?
[Chart as on slide 9: unversioned; prior unbatched TS method; 100%, 50%, and 20% PTT batch inserts; plus audit mode]
Audit mode: always keep everything in the PTT, never delete
– Roughly equal to the 50% batch-insert case, since those entries are also batch-deleted
IMPORTANT: simple one-update transactions

13 Talk Outline
Immortal DB: a transaction-time database
Update performance: timestamping
– Timestamping is the main update overhead
– Prior approaches
– Our new approach
– Update performance results
Support for auditing
– What do we provide
– Exploiting the timestamping implementation
Range read performance: new page-splitting strategy
– Storage utilization determines range read performance
– Prior split strategy guaranteeing as-of version utilization
– Our new approach
– Storage utilization results

14 Utilization => Range Read Performance
The biggest factor is records/page
Current data is the most frequently read
We need a technique that improves storage utilization
– Surely for current data
– With no compromise for historical data
Prior page-splitting technology evolved from the WOB-tree
– Which was constrained by write-once media
We can do better with write-many media

15 Prior Approaches to Guaranteed Utilization
Choose a target fill factor for the current database
– Can't be 100%, as in an unversioned DB
– Higher => more redundant versions for partially persistent indexes (like the TSB-tree, BV-tree, WOB-tree), because splitting by time creates redundant versions when they cross a time-split boundary
Naked key splits compromise version utilization
– A key split splits history as well as current data
– Excessive key splits without time splits drive down the storage utilization of any specific version
What to do? Always time split with a key split
– Removes historical data from new current pages, permitting them to fill fully to the fill factor
– Protects historical versions from further splitting
– Originally in the WOB-tree – a necessity there with write-once storage media

16 Why Time Split with Key Split?
[Diagram: the same page over time – the page fills with historical data, added versions, and free space; a key split alone splits history along with current data, while a time split with the key split yields a separate historical page and current page]
Time split with key split guarantees the historical page will have good utilization for its versions

17 Intuition for the New Splitting Technique
– Always time split when the page first fills
– Key split afterwards, when the page fills again
[Diagram: page fills → time split produces a historical page and a current page; when the current page fills again, a key split follows]
Historical page utilization preserved; current page utilization improved
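The two-phase policy above (time split on first fill, key split on the next) can be sketched as a split decision routine. The page representation here is a simplified stand-in, not the system's page layout:

```python
# Sketch of the new split policy: when a page first fills, time-split
# it (move historical versions to a historical page); when it fills
# again, key-split the now mostly-current page.

def split(page):
    """page: dict with 'records' (list of (key, ts, is_current) tuples)
    and a 'time_split_done' flag. Returns (action, moved_records)."""
    if not page["time_split_done"]:
        # Time split: non-current versions move to a historical page;
        # current records stay, and the page refills from there.
        historical = [r for r in page["records"] if not r[2]]
        page["records"] = [r for r in page["records"] if r[2]]
        page["time_split_done"] = True
        return "time_split", historical
    # Key split: the page is now full of (mostly) current records,
    # so split by key at the median, as in an ordinary B-tree.
    recs = sorted(page["records"])
    mid = len(recs) // 2
    page["records"] = recs[:mid]
    return "key_split", recs[mid:]
```

Because the key split happens only after one extra fill of current records, the current page's utilization improves while the historical page produced by the time split keeps all its versions intact.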

18 Analytical Result
We can show the following: [utilization formula not captured in the transcript]
where in is the insertion ratio, up is the update ratio, and cr is the compression ratio.
* Formula derived based on one extra fill of current pages: added current records get one extra page fill before the key split

19 Utilization Experiment
– 50,000 transactions
– Each transaction inserts or updates a record
– Varying the insert/update ratio in the workload
– Each record is 200 bytes
– Historical versions are compressed with the delta-compression technique (they share a lot of common bits with the newer version)
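Since consecutive versions share most of their bytes, a historical version can be stored as a small delta against the newer one. This prefix/suffix diff is an assumed illustration of the idea, not the paper's actual encoding:

```python
# Illustrative delta compression of a historical version against the
# newer version derived from it: store only (prefix_len, suffix_len,
# differing middle bytes) relative to the newer version.

def delta_encode(newer: bytes, older: bytes):
    # Length of the shared prefix.
    p = 0
    while p < min(len(newer), len(older)) and newer[p] == older[p]:
        p += 1
    # Length of the shared suffix (not overlapping the prefix).
    s = 0
    while (s < min(len(newer), len(older)) - p
           and newer[len(newer) - 1 - s] == older[len(older) - 1 - s]):
        s += 1
    return p, s, older[p:len(older) - s]

def delta_decode(newer: bytes, delta):
    # Rebuild the older version from the newer one plus the delta.
    p, s, middle = delta
    return newer[:p] + middle + newer[len(newer) - s:]
```

When versions differ in only a few bytes, the stored delta is tiny, which is why compressing historical versions raises the number of versions that fit on a page.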

20 Analysis: Current Storage Utilization vs. Update Ratio
[Chart: current utilization (y-axis) vs. update ratio (x-axis)]
Expected update ratio: 65% - 85%

21 Summary
– Optimizing timestamping yields update performance close to unversioned
– Optimizing page splitting yields current-time range search performance close to unversioned
– Audit functionality is easy to add via the timestamping infrastructure
Questions???

