Efficient Locking Techniques for Databases on Modern Hardware. Hideaki Kimura (Brown University, Microsoft Jim Gray Systems Lab), Goetz Graefe, Harumi Kuno (Hewlett-Packard Laboratories).


1 Efficient Locking Techniques for Databases on Modern Hardware. Hideaki Kimura # *, Goetz Graefe +, Harumi Kuno +. (# Brown University, * Microsoft Jim Gray Systems Lab, + Hewlett-Packard Laboratories.) Presented at ADMS'12. Slides/papers available on request. Email us: hkimura@cs.brown.edu, goetz.graefe@hp.com, harumi.kuno@hp.com

2 2/26 Traditional DBMS on Modern Hardware: optimized for the magnetic-disk bottleneck. (Fig.: Instructions and Cycles for New Order [S. Harizopoulos et al., SIGMOD'08]; chart legend: Disk I/O Costs, Other Costs, Useful Work, Query Execution Overhead.) Then what's this?

3 3/26 Context of This Paper. Builds on Shore-MT/Aether [Johnson et al. '10] (Consolidation Array, Flush Pipeline) and on our Foster B-trees, which achieved up to 6x overall speed-up; this paper continues that line, with further work in progress.

4 4/26 Our Prior Work: Foster B-trees. Foster relationship, fence keys, simple prefix compression, poor-man's normalized keys, efficient yet exhaustive verification. Implemented by modifying Shore-MT and compared with it [TODS'12]. On Sun Niagara, tested without locks, only latches: 2-3x speed-up under low latch contention, 6x speed-up under high latch contention.

5 5/26 Talk Overview. 1) Key Range Locks with Higher Concurrency: combines fence keys and Graefe lock modes. 2) Lightweight Intent Lock: extremely scalable and fast. 3) Scalable Deadlock Detection: the Dreadlocks algorithm applied to databases. 4) Serializable Early Lock Release: serializable all-kinds ELR that allows read-only transactions to bypass logging.

6 6/26 1. Key Range Lock. (Fig.: keys 10, 20, 30 with gaps between them; SELECT Key=10 takes S on key 10, UPDATE Key=30 takes X on key 30, while SELECT Key=20~25 and SELECT Key=15 must cover a gap.) Mohan et al.: locks the neighboring key. Lomet et al.: adds a few new lock modes (e.g., RangeX-S), but still lacks a few lock modes, resulting in lower concurrency.

7 7/26 Our Key Range Locking. Graefe lock modes: all 3*3=9 modes. Create a ghost record (pseudo-deleted record) before insertion as a separate transaction. Use fence keys to lock on page boundaries. (Fig.: records EA, EB, …, EZ in a page bounded by fence keys DE and EF.)
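The 3*3=9 combined modes can be sketched as pairs of element modes, one for the gap and one for the key, with compatibility decided componentwise. A minimal illustrative sketch (names and representation are ours, not the paper's implementation):

```python
# Sketch of the 3x3 key-range lock modes: each combined mode is a
# (gap, key) pair over element modes N (none), S (shared), X (exclusive).
# Two combined modes are compatible iff both their gap parts and their
# key parts are compatible.
from itertools import product

ELEMENT_COMPAT = {
    ('N', 'N'): True, ('N', 'S'): True, ('N', 'X'): True,
    ('S', 'N'): True, ('S', 'S'): True, ('S', 'X'): False,
    ('X', 'N'): True, ('X', 'S'): False, ('X', 'X'): False,
}

def compatible(a, b):
    """a, b are (gap_mode, key_mode) pairs, e.g. ('N', 'S') ~ lock key only, shared."""
    return ELEMENT_COMPAT[(a[0], b[0])] and ELEMENT_COMPAT[(a[1], b[1])]

MODES = list(product('NSX', repeat=2))   # all 9 combined modes

print(compatible(('N', 'S'), ('S', 'N')))   # True: S on key vs S on gap only
print(compatible(('N', 'X'), ('S', 'N')))   # True: X on key vs S on gap only
print(compatible(('X', 'N'), ('S', 'N')))   # False: gap conflict
```

This is why the extra modes raise concurrency: a reader of only the gap and a writer of only the key no longer collide.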

8 8/26 2. Intent Lock. Coarse-level locking (e.g., table, database): intent locks (IS/IX) and absolute locks (S/X/SIX). Saves overhead for large scan/write transactions (just one absolute lock) [Gray et al.]

9 9/26 Intent Lock: Physical Contention. (Fig.: logical vs. physical view. Every transaction enqueues IS/IX on the coarse resources DB-1, VOL-1, IND-1 and S/X on keys Key-A, Key-B; each resource has a lock queue, so the coarse-level queues become a physical contention point even when the locks are logically compatible.)

10 10/26 Lightweight Intent Lock. (Fig.: lock queues remain only for key locks — Key-A: S, Key-B: X. Each coarse resource DB-1, VOL-1, IND-1 is reduced to a set of counters per mode, e.g. IS=1, IX=1, S=0, X=0.) No lock queue, no mutex.

11 11/26 Intent Lock: Summary. Extremely lightweight for scalability: just a set of counters, no queue. Only a spinlock; a mutex only when an absolute lock is requested. Timeout to avoid deadlock. Kept separate from the main lock table:
Main Lock Table — physical contention: low; required functionality: high.
Intent Lock Table — physical contention: high; required functionality: low.
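The counter idea above can be sketched in a few lines. This is an illustrative toy, not the Shore-MT implementation; the class and names are ours, and a `threading.Lock` stands in for the spinlock:

```python
# Counter-based intent lock: intent requests just bump a per-mode counter
# under a short critical section; there is no per-requester queue, so an
# incompatible request simply fails and the caller retries or times out.
import threading

COMPAT = {  # requested mode -> set of held modes it coexists with
    'IS':  {'IS', 'IX', 'S', 'SIX'},
    'IX':  {'IS', 'IX'},
    'S':   {'IS', 'S'},
    'SIX': {'IS'},
    'X':   set(),
}

class LightweightIntentLock:
    def __init__(self):
        self.counts = {'IS': 0, 'IX': 0, 'S': 0, 'SIX': 0, 'X': 0}
        self.spin = threading.Lock()   # stands in for the spinlock

    def try_acquire(self, mode):
        with self.spin:
            held = {m for m, c in self.counts.items() if c > 0}
            if held <= COMPAT[mode]:   # every held mode coexists with us
                self.counts[mode] += 1
                return True
            return False               # caller retries; timeout avoids deadlock

    def release(self, mode):
        with self.spin:
            self.counts[mode] -= 1

lock = LightweightIntentLock()
print(lock.try_acquire('IS'))  # True
print(lock.try_acquire('IX'))  # True: IS and IX coexist
print(lock.try_acquire('X'))   # False: X conflicts with any holder
```

Because no waiter is ever queued, releasing is a counter decrement; the rarely-requested absolute locks are the only case that needs heavier machinery.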

12 12/26 3. Deadlock Handling. Traditional approaches all have drawbacks: deadlock prevention (e.g., wound-wait/wait-die) can cause many false positives; deadlock detection (cycle detection) causes delays when checks are infrequent and is not scalable on many cores when checks are frequent/immediate; timeouts cause false positives and delays and are hard to configure.

13 13/26 Solution: Dreadlocks [Koskinen et al. '08]. Immediate deadlock detection with local spinning: scalable and low-overhead, with almost* no false positives (*due to the Bloom filter). Issues specific to databases (more details in the paper): lock modes, queues and upgrades; avoiding pure spinning to save CPU cycles; deadlock resolution for the flush pipeline.
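The core Dreadlocks idea can be sketched as digest propagation: each thread folds in the digest of the thread it waits on and declares deadlock if it ever sees itself. This toy walks the waits-for chain sequentially rather than spinning concurrently, and uses plain sets where the paper uses fixed-size Bloom filters (hence the rare false positives); the function name is ours:

```python
# Toy sketch of Dreadlocks-style detection: a waiter accumulates a digest
# of the threads it (transitively) waits for; if its own id comes back,
# the waits-for graph contains a cycle, i.e. a deadlock.

def detect_deadlock(waits_for, start):
    """waits_for: dict mapping a thread to the thread it is blocked on (or None)."""
    digest = {start}
    cur = waits_for.get(start)
    while cur is not None:
        if cur in digest:        # my own digest came back around: a cycle
            return True
        digest.add(cur)          # fold the blocker into my digest
        cur = waits_for.get(cur)
    return False

# C waits for D, D waits for E, E waits for C: a deadlock cycle.
# A waits for B, which runs freely: no deadlock, just waiting.
waits = {'A': 'B', 'B': None, 'C': 'D', 'D': 'E', 'E': 'C'}
print(detect_deadlock(waits, 'E'))  # True
print(detect_deadlock(waits, 'A'))  # False
```

In the real scheme each thread updates only its own published digest while spinning locally, which is what makes the check scalable on many cores.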

14 14/26 4. Early Lock Release [DeWitt et al. '84] [Johnson et al. '10]. (Fig.: commit protocol — a transaction holds its locks through the commit request and the log-flush wait (10ms+ under group commit / flush pipeline) and only unlocks at commit; meanwhile T1:S, T2:X, T3:S, T3:X queue on resources A, B, C, and T4, T5, …, T1000 pile up behind them.) More and more locks, waits, deadlocks.

15 15/26 Prior Work: Aether. First implementation of ELR in a DBMS; significant speed-up (10x) on many-core. It simply releases locks on commit request: "… [must hold] until both their own and their predecessor's log records have reached the disk. Serial log implementations preserve this property naturally, …" [Johnson et al., VLDB'10]. Problem: a read-only transaction bypasses logging. (Fig.: a serial log where T1: Write, T1: Commit, T2: Commit occupy successive LSNs; T2 is a dependent transaction under ELR.)

16 16/26 Anomaly of Prior ELR Technique. (Fig.: lock queue on key D holds T2:X, then T1:S; initially D=10.)
Event — Latest LSN / Durable LSN:
T2: D=20 — 1 / 0
(T1: Read D) — 2 / 0
T2: Commit-Req — 3 / 0
T1: Read D — 4 / 0
T1: Commit — 5 / 1
… — / 2
T2: Commit — / 3
T1 reads D=20 (released early by T2) and answers "D is 20!"; a crash before T2's commit record becomes durable rolls T2 back, so D was never committed as 20, yet T1 has already answered.

17 17/26 Naïve Solutions. 1) Flush wait for read-only transactions: orders of magnitude higher latency — a short read-only query takes microseconds, a disk flush milliseconds. 2) Do not release X-locks in ELR (S-ELR): concurrency as low as no-ELR, since after all, all lock waits involve X-locks.

18 18/26 Safe SX-ELR: X-Release Tag. (Fig.: lock queue on D holds T2:X then T1:S, and D carries X-release tag 3; lock queue on E holds T3:S, tag 0; initially D=10, E=5.)
Event — Latest LSN / Durable LSN:
T2: D=20 — 1 / 0
(T1: Read D) — 2 / 0
T2: Commit-Req — 3 / 0
T1: Read D (max-tag=3) — 4 / 0
T1: Commit-Req — 5 / 1
T3: Read E (max-tag=0) & Commit — 6 / 2
T1, T2: Commit — 7 / 3
T3's max-tag (0) is already durable, so it answers "E is 5" and exits immediately; T1's max-tag (3) forces it to wait until the durable LSN reaches 3.

19 19/26 Safe SX-ELR: Summary. Serializable yet highly concurrent: safely releases all kinds of locks; most read-only transactions exit quickly; only the necessary threads wait. Low overhead: just an LSN comparison. Applicable to coarse locks via a self-tag and a descendant-tag: SIX/IX update the descendant-tag, X updates the self-tag; IS/IX check the self-tag, S/X/SIX check both.
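The commit rule in the example above is just an LSN comparison, and can be sketched as follows. The class and method names here are illustrative, not from the actual Shore-MT code:

```python
# Sketch of the SX-ELR commit rule: every X-lock released early leaves
# behind the releaser's commit LSN as a tag; a read-only transaction
# remembers the maximum tag it observed on the locks it acquired and may
# exit without a log flush only once the durable LSN has reached that tag.

class ReadOnlyXct:
    def __init__(self):
        self.max_tag = 0          # highest X-release tag seen on any lock

    def on_acquire(self, lock_tag):
        self.max_tag = max(self.max_tag, lock_tag)

    def can_exit_immediately(self, durable_lsn):
        # Just an LSN comparison: no flush wait unless we read the
        # effects of a not-yet-durable writer.
        return durable_lsn >= self.max_tag

t3 = ReadOnlyXct()
t3.on_acquire(0)                   # key E never touched by an ELR writer
print(t3.can_exit_immediately(0))  # True: exits with no flush wait

t1 = ReadOnlyXct()
t1.on_acquire(3)                   # key D tagged with T2's commit-request LSN 3
print(t1.can_exit_immediately(2))  # False: must wait until durable LSN >= 3
```

This matches the slide's trace: T3 (max-tag 0) exits immediately while T1 (max-tag 3) waits for durability.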

20 20/26 Experiments. TPC-B: 250MB of data, fits in the buffer pool. Hardware: Sun Niagara (64 hardware contexts); HP Z600 (6 cores, SSD drive). Software: Foster B-trees (modified Shore-MT) vs. Shore-MT (original), with/without each technique. Fully ACID, serializable mode.

21 21/26 Key Range Locks Z600, 6-Threads, AVG & 95% on 20 Runs

22 22/26 Lightweight Intent Lock Sun Niagara, 60 threads, AVG & 95% on 20 Runs

23 23/26 Dreadlocks vs Traditional Sun Niagara, AVG on 20 Runs

24 24/26 Early Lock Release (ELR). SX-ELR performs 5x faster; S-only ELR isn't useful. All improvements combined: ~50x faster. (Chart series: HDD log, SSD log. Z600, 6 threads, AVG & 95% on 20 runs.)

25 25/26 Related Work. ARIES/KVL, IM [Mohan et al.]; key range locking [Lomet '93]; Shore-MT at EPFL/CMU/UW-Madison; Speculative Lock Inheritance [Johnson et al. '09]; Aether [Johnson et al. '10]; Dreadlocks [Koskinen and Herlihy '08]; H-Store at Brown/MIT.

26 26/26 Wrap up. Locking is a bottleneck on modern hardware; we revisited all aspects of database locking: 1. Graefe lock modes, 2. lightweight intent lock, 3. Dreadlocks, 4. early lock release. All together, a significant speed-up (~50x). Future work: the buffer pool.


28 28/26 Reserved: Locking Details

29 29/26 Transactional Processing: high concurrency, very short latency, fully ACID-compliant, relatively small data. (Fig.: the number of digital transactions keeps growing while CPU clock speed on modern hardware has plateaued.)

30 30/26 Many-Cores and Contentions. (Fig.: logical contention vs. physical contention — critical sections guarding a shared resource with a mutex or spinlock.) Adding cores doesn't help, and even worsens it!

31 31/26 Background: Fence Keys. Define key ranges in each page. (Fig.: a B-tree whose root covers A~Z; each node stores a low and a high fence key, e.g. children covering A~M, A~B, B~C, C~E.)
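The fence-key invariant above can be sketched as a local check: every key in a page must fall between that page's fences, and adjacent pages' fences must meet. A minimal illustrative structure (ours, not the Foster B-tree code):

```python
# Each page carries a low and a high fence key bounding every key it may
# hold, so a page's key range can be verified locally, without a tree latch.

class Page:
    def __init__(self, low, high, keys):
        self.low, self.high, self.keys = low, high, keys

    def in_range(self, key):
        # half-open range [low, high), so adjacent pages never overlap
        return self.low <= key < self.high

    def verify(self):
        # every stored key must fall inside the page's fences
        return all(self.in_range(k) for k in self.keys)

left = Page('A', 'C', ['A', 'B'])
right = Page('C', 'E', ['C', 'D'])
print(left.verify() and right.verify())   # True
print(left.high == right.low)             # True: adjacent fences must meet
```

Because each page can be checked against only its own fences, verification needs no global pass, which is what enables the efficient yet exhaustive verification mentioned earlier.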

32 32/26 Key-Range Lock Mode [Lomet '93]. Adds a few new lock modes, each consisting of two parts, Range and Key: e.g., RangeX-S, RangeI-N, RangeS-S (plain S corresponds to RangeN-S). But it still lacks a few lock modes* (*instant X lock). (Fig.: keys 10, 20, 30 annotated with RangeX-S = X on the key plus S on the range, and RangeI-N = instant I.)

33 33/26 Example: Missing Lock Modes. (Fig.: keys 10, 20, 30. SELECT Key=15 only needs S on the gap — a "RangeS-N?" that doesn't exist — so it must take RangeS-S on key 20; UPDATE Key=20 then needs X on key 20 and conflicts unnecessarily.)

34 34/26 Graefe Lock Modes. (Fig.: the 3×3 matrix of combined gap/key modes, with the new lock modes marked (*); S ≡ SS and X ≡ XX.)

35 35/26 (**) Ours locks the key prior to the range (prior-key locking) while SQL Server uses next-key locking; RangeS-N ≈ NS.

36 36/26 LIL: Lock-Request Protocol

37 37/26 LIL: Lock-Release Protocol

38 38/26 Dreadlocks [Koskinen et al. '08]. (Fig.: A waits for B — no cycle, a live lock; C, D, E wait in a cycle — a deadlock. Each thread publishes a digest* of the threads it waits for, initially A:{A}, B:{B}, C:{C}, D:{D}, E:{E}. On each spin a waiter 1. asks "does the blocker's digest contain me?" and 2. adds that digest to its own: A:{A,B}, C:{C,D}, D:{D,E}, E:{E,C}; D's digest grows to {E,C,D}, D finds itself — deadlock!) (*) actually a Bloom filter (bit vector).

39 39/26 Naïve Solution: Check Page-LSN? A read-only transaction can exit only after the commit log of its dependents becomes durable. (Fig.: page D=10→20 with page-LSN 1, page Z with page-LSN 2; the log buffer holds 1: T2, D, 10→20; 2: T2, Z, 20→10; 3: T2, Commit. Can T1 exit immediately once durable-LSN ≥ 1? No: T2's commit record sits at LSN 3.)

40 40/26 Deadlock Victim & Flush Pipeline

41 41/26 Victim & Flush Pipeline (Cont'd)

42 42/26 Dreadlock + Backoff on Sleep TPC-B, Lazy commit, SSD, Xct-chain max 100k

43 43/26 Related Work: H-Store/VoltDB. Disk-based DB ↔ pure main-memory DB; shared-everything ↔ shared-nothing within each node (note: both are shared-nothing across nodes). Foster B-trees/Shore-MT: handles data beyond RAM, keeps latches but improves them; pro: accessible RAM per CPU. VoltDB: RAM-resident, distributed transactions, gets rid of latches; pro: simplicity and best-case performance. Pros/cons on both sides: both are interesting directions.

44 44/26 Reserved: Foster B-tree Slides

45 45/26 Latch Contention in B-trees 1. Root-leaf EX Latch 2. Next/Prev Pointers

46 46/26 Foster B-trees Architecture: 1. fence keys, 2. foster relationship. (Fig.: a tree with fence-key ranges A~Z, A~M, A~B, B~C, C~E; cf. B-link tree [Lehman et al. '81].)

47 47/26 More on Fence Keys. Efficient prefix compression; powerful, efficient yet exhaustive B-tree verification; simpler and more scalable B-tree (no tree latch, B-tree code size halved); key range locking. (Fig.: fence keys Low: "AAF", High: "AAP"; the tuple "AAI31", xxx is stored as "I31" after prefix truncation, with poor-man's normalized entries like "I3", "J1" in the slot array.)
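The prefix trick in the figure can be sketched directly: the common prefix of a page's two fences need not be stored with each key, and a short fixed-size prefix of the remainder lives in the slot array to resolve most comparisons without touching the tuple. An illustrative sketch with the slide's example values:

```python
# Prefix compression via fence keys: since every key on the page lies
# between the low and high fences, the fences' common prefix is shared by
# all keys and is stored once per page, not per key. A short "poor man's
# normalized key" prefix of the remainder sits in the slot array.

def common_prefix(low, high):
    i = 0
    while i < min(len(low), len(high)) and low[i] == high[i]:
        i += 1
    return low[:i]

low_fence, high_fence = "AAF", "AAP"
prefix = common_prefix(low_fence, high_fence)   # "AA" -> stored once per page
full_key = "AAI31"
stored = full_key[len(prefix):]                 # "I31" is all the slot needs
poor_mans = stored[:2]                          # fixed-size prefix, e.g. "I3"
print(prefix, stored, poor_mans)                # AA I31 I3
```

The slot-array prefix length (2 here) is an assumption for illustration; only when two poor-man's prefixes tie does a comparison need to follow the pointer to the full tuple.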

48 48/26 B-tree lookup speed-up No Locks. SELECT-only workload.

49 49/26 Insert-Intensive Case: 6-7x speed-up. The latch-contention bottleneck becomes a log-buffer-contention bottleneck; we will port the "Consolidation Array" [Johnson et al.].

50 50/26 Chain length: Mixed 1 Thread

51 51/26 Eager-Opportunistic

52 52/26 B-tree Verification

