1 Scaleable Replicated Databases
Jim Gray (Microsoft), Pat Helland (Microsoft), Dennis Shasha (Columbia), Pat O'Neil (U.Mass)
2 Outline
• Replication strategies
  – Lazy and Eager
  – Master and Group
• How centralized databases scale
  – Deadlocks rise non-linearly with
    · transaction size
    · concurrency
• Replication systems are unstable on scaleup
• A possible solution
3 Scaleup, Replication, Partition
• N² more work
4 Why Replicate Databases?
• Give users a local copy for
  – Performance
  – Availability
  – Mobility (they are disconnected)
• But... what if they update it?
• Must propagate updates to other copies
5 Propagation Strategies
• Eager: send update right away
  – (part of same transaction)
  – N times larger transactions
• Lazy: send update asynchronously
  – separate transaction
  – N times more transactions
• Either way
  – N times more updates per second per node
  – N² times more work overall
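A back-of-the-envelope Python sketch of the arithmetic on this slide. It assumes every node originates `tps` transactions per second of `actions` updates each and that every update must be applied at all `nodes` replicas; the function names are hypothetical, chosen for illustration only.

```python
def eager_load(tps: float, nodes: int, actions: int) -> dict:
    """Eager: each originated transaction updates every replica itself."""
    return {
        "txns_per_sec": tps * nodes,          # one transaction per originated txn
        "actions_per_txn": actions * nodes,   # N times larger transactions
    }


def lazy_load(tps: float, nodes: int, actions: int) -> dict:
    """Lazy: a root transaction plus one propagation transaction per other replica."""
    return {
        "txns_per_sec": tps * nodes * nodes,  # N times more transactions
        "actions_per_txn": actions,           # transactions stay small
    }


def total_work(load: dict) -> float:
    """Record updates applied per second, summed over all nodes."""
    return load["txns_per_sec"] * load["actions_per_txn"]


for n in (1, 10):
    print(n, total_work(eager_load(10, n, 5)), total_work(lazy_load(10, n, 5)))
# 1 50 50
# 10 5000 5000
```

Ten times the nodes gives a hundred times the total work in both schemes; eager and lazy only differ in whether that work arrives as fewer, fatter transactions or as more, thinner ones.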
6 Update Control Strategies
• Master
  – Each object has a master node
  – All updates start with the master
  – Broadcast to the subscribers
• Group
  – Object can be updated by anyone
  – Update broadcast to all others
• Everyone wants Lazy Group:
  – update anywhere, anytime, anyway
7 Quiz Questions: Name One
• Eager
  – Master: N-plexed disks
  – Group: ?
• Lazy
  – Master: Bibles, bank accounts, SQL Server
  – Group: name servers, Oracle, Access...
• Note: Lazy contradicts Serializable
  – If two lazy updates collide, then... reconcile
    · discard one transaction (or use some other rule)
    · ask for human advice
• Meanwhile, nodes disagree =>
  – Network DB state diverges: System Delusion
8 Anecdotal Evidence
• Update-anywhere systems are attractive
• Products offer the feature
• It demos well
• But when it scales up
  – Reconciliations start to cascade
  – Database drifts out of sync (System Delusion)
• What's going on?
9 Outline
• Replication strategies
  – Lazy and Eager
  – Master and Group
• How centralized databases scale
  – deadlocks rise non-linearly
• Replication is unstable on scaleup
• A possible solution
10 Simple Model of Waits
• TPS transactions per second
• Each
  – picks Actions records uniformly from a set of DB_size records
  – then commits
• About Transactions × Actions / 2 resources locked
• Chance a request waits is
  $$PW \approx \frac{\text{Transactions} \times \text{Actions}}{2 \times \text{DB\_size}}$$
• Action rate is TPS × Actions
• Active Transactions
  $$\text{Transactions} = \text{TPS} \times \text{Actions} \times \text{Action\_Time}$$
• Wait Rate = Action rate × Chance a request waits
  $$\text{Wait\_Rate} \approx \frac{\text{TPS}^2 \times \text{Actions}^3 \times \text{Action\_Time}}{2 \times \text{DB\_size}}$$
• 10× more transactions, 100× more waits
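The model above can be checked numerically. A minimal Python sketch, assuming uniform record access and that the active transactions hold Transactions × Actions / 2 locks in total, as on the slide; the function names and sample parameter values are illustrative, not from the talk.

```python
import math


def wait_rate(tps: float, actions: int, action_time: float, db_size: int) -> float:
    """Lock waits per second in the simple uniform-access model."""
    transactions = tps * actions * action_time   # active (concurrent) transactions
    locked = transactions * actions / 2          # each is about half-way through
    p_wait = locked / db_size                    # PW: chance one request waits
    action_rate = tps * actions                  # lock requests per second
    return action_rate * p_wait


def wait_rate_closed_form(tps, actions, action_time, db_size):
    """TPS^2 x Actions^3 x Action_Time / (2 x DB_size)."""
    return tps**2 * actions**3 * action_time / (2 * db_size)


base = wait_rate(tps=100, actions=10, action_time=0.01, db_size=1_000_000)
print(math.isclose(base, wait_rate_closed_form(100, 10, 0.01, 1_000_000)))  # True
print(wait_rate(1000, 10, 0.01, 1_000_000) / base)  # about 100: 10x TPS
print(wait_rate(100, 20, 0.01, 1_000_000) / base)   # about 8: 2x transaction size
```

The step-by-step version mirrors the bullets; the closed form is the same quantity with the terms multiplied out.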
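11 Simple Model of Deadlocks
• A deadlock is a wait cycle
• Cycle of length 2:
  – Wait rate × chance the waitee waits for the waiter
  – Wait rate × (P(wait) / Transactions), where P(wait) is the chance a transaction waits at some point:
  $$P(\text{wait}) \approx PW \times \text{Actions} = \frac{\text{TPS} \times \text{Actions}^3 \times \text{Action\_Time}}{2 \times \text{DB\_size}}$$
  $$\text{Deadlock\_Rate} \approx \frac{\text{TPS}^2 \times \text{Actions}^3 \times \text{Action\_Time}}{2 \times \text{DB\_size}} \times \frac{\text{TPS} \times \text{Actions}^3 \times \text{Action\_Time}}{2 \times \text{DB\_size} \times \text{TPS} \times \text{Actions} \times \text{Action\_Time}} = \frac{\text{TPS}^2 \times \text{Actions}^5 \times \text{Action\_Time}}{4 \times \text{DB\_size}^2}$$
• Cycles of length 3 are proportional to PW³, so ignored.
• 10× bigger transactions = 100,000× more deadlocks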
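The length-2 cycle estimate can be evaluated the same way. A minimal Python sketch that keeps the slide's two factors separate before multiplying; it ignores longer cycles, as the slide does, and the parameter values are illustrative.

```python
import math


def deadlock_rate(tps: float, actions: int, action_time: float, db_size: int) -> float:
    """Deadlocks per second, counting only wait cycles of length 2."""
    transactions = tps * actions * action_time                    # concurrent transactions
    wait_rate = tps**2 * actions**3 * action_time / (2 * db_size)
    p_txn_waits = tps * actions**3 * action_time / (2 * db_size)  # PW x Actions
    # A wait becomes a deadlock if the transaction being waited for
    # is itself waiting for the waiter.
    return wait_rate * p_txn_waits / transactions


small = deadlock_rate(100, 5, 0.01, 1_000_000)
big = deadlock_rate(100, 50, 0.01, 1_000_000)
print(big / small)  # about 100,000: the 5th power of transaction size
print(deadlock_rate(1000, 5, 0.01, 1_000_000) / small)  # about 100: square of TPS
```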
12 Summary So Far
• Even centralized systems are unstable
• Waits:
  – square of concurrency
  – 3rd power of transaction size
• Deadlock rate:
  – square of concurrency
  – 5th power of transaction size
[Chart axes: Trans Size, Concurrency]
13 Outline
• Replication strategies
• How centralized databases scale
• Replication is unstable on scaleup
  – Eager (master & group)
  – Lazy (master & group & disconnected)
• A possible solution
14 Eager Transactions are FAT
• If N nodes, eager transaction is N× bigger
  – Takes N× longer
  – 10× nodes, 1,000× deadlocks
  – (derivation in paper)
• Master slightly better than group
• Good news:
  – Eager transactions only deadlock
  – No need for reconciliation
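The slide defers the derivation to the paper. The Python sketch below simply assumes the closed form TPS² × Action_Time × Actions⁵ × Nodes³ / (4 × DB_size²), which reduces to the single-node deadlock rate when Nodes = 1 and reproduces the "10× nodes, 1,000× deadlocks" claim; treat the formula as an assumption here, not as a derivation.

```python
import math


def eager_deadlock_rate(tps, actions, action_time, db_size, nodes):
    """ASSUMED closed form for eager replication (cubic in the number of nodes)."""
    return tps**2 * action_time * actions**5 * nodes**3 / (4 * db_size**2)


one = eager_deadlock_rate(100, 5, 0.01, 1_000_000, nodes=1)
ten = eager_deadlock_rate(100, 5, 0.01, 1_000_000, nodes=10)
print(ten / one)  # about 1,000
```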
15 Lazy Master & Group
• Use optimistic concurrency control
  – Keep transaction timestamp with record
  – Updates carry old + new timestamp
  – If record has old timestamp
    · set value to new value
    · set timestamp to new timestamp
  – If record does not match old timestamp
    · reject lazy transaction
  – Not SNAPSHOT isolation (stale reads)
• Reconciliation:
  – Some nodes are updated
  – Some nodes are being reconciled
[Figure: A Lazy Transaction: root transaction (TRID, Timestamp) runs Write A, Write B, Write C, Commit; the same writes are repeated at each replica with NewTimestamp, each propagated write carrying (OID, old time, new value)]
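A minimal Python sketch of the timestamp check above, as it might run at one replica. The record layout, the `LazyWrite` type and the validate-everything-first pass are assumptions for illustration; the slide only says that a timestamp mismatch rejects the lazy transaction.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    value: object
    timestamp: int  # timestamp of the transaction that last wrote this record


@dataclass(frozen=True)
class LazyWrite:
    oid: str            # object id
    old_timestamp: int  # timestamp the originating node saw
    new_value: object


def apply_lazy_transaction(db: Dict[str, Record], writes: List[LazyWrite],
                           new_timestamp: int) -> bool:
    """Apply a propagated lazy transaction at one replica, all or nothing.

    Returns False (reject: needs reconciliation) if any record's timestamp
    does not match the old timestamp carried by the update.
    """
    for w in writes:
        rec = db.get(w.oid)
        if rec is None or rec.timestamp != w.old_timestamp:
            return False
    for w in writes:
        db[w.oid] = Record(w.new_value, new_timestamp)
    return True


replica = {"A": Record(1, 100), "B": Record(2, 100)}
print(apply_lazy_transaction(replica, [LazyWrite("A", 100, 10), LazyWrite("B", 100, 20)], 200))  # True
print(apply_lazy_transaction(replica, [LazyWrite("A", 100, 99)], 300))  # False: A is now at 200
```

Checking every write before applying any of them keeps a rejected transaction from leaving the replica half-updated.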
16 Reconciliation
• Reconciliation means System Delusion
  – Data inconsistent with itself and reality
• How frequent is it?
• Lazy transactions are not fat
  – but N times as many
  – Eager waits become Lazy reconciliations
  – Rate is:
  $$\text{Lazy\_Group\_Reconciliation\_Rate} \approx \frac{\text{TPS}^2 \times (\text{Actions} \times \text{Nodes})^3 \times \text{Action\_Time}}{2 \times \text{DB\_size}}$$
  – Assuming everyone is connected
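Plugging numbers into the rate on this slide shows how fast reconciliation grows with the number of nodes. A Python sketch with illustrative parameter values; with one node the formula coincides with the centralized wait rate, which is the sense in which eager waits become lazy reconciliations.

```python
import math


def lazy_group_reconciliation_rate(tps, actions, action_time, db_size, nodes):
    """Reconciliations per second: the wait rate with Actions -> Actions x Nodes."""
    return tps**2 * (actions * nodes)**3 * action_time / (2 * db_size)


def wait_rate(tps, actions, action_time, db_size):
    """Single-node wait rate from the simple model."""
    return tps**2 * actions**3 * action_time / (2 * db_size)


r1 = lazy_group_reconciliation_rate(10, 5, 0.01, 1_000_000, nodes=1)
r10 = lazy_group_reconciliation_rate(10, 5, 0.01, 1_000_000, nodes=10)
print(r10 / r1)  # about 1,000: cubic in the number of nodes
```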
17 Eager & Lazy: Disconnected
• Suppose mobile nodes are disconnected for a day
• When they reconnect:
  – get all incoming updates
  – send all delayed updates
• Incoming is Nodes × TPS × Actions × Disconnect_Time
• Outgoing is TPS × Actions × Disconnect_Time
• Conflicts are the intersection of these two sets: about Incoming × Outgoing / DB_size per reconnect, with Nodes / Disconnect_Time reconnects per second
  $$\text{Reconciliation\_Rate} \approx \frac{\text{Disconnect\_Time} \times (\text{TPS} \times \text{Actions} \times \text{Nodes})^2}{\text{DB\_size}}$$
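A Python sketch of the conflict count above, assuming updates hit records uniformly at random (so two random sets of a and b records out of DB_size overlap in about a × b / DB_size records) and that each mobile node reconnects once per Disconnect_Time. Function names and parameter values are illustrative.

```python
import math


def conflicts_per_reconnect(tps, actions, nodes, disconnect_time, db_size):
    incoming = nodes * tps * actions * disconnect_time  # updates made elsewhere
    outgoing = tps * actions * disconnect_time          # updates made while offline
    # Expected overlap of two uniformly random record sets.
    return incoming * outgoing / db_size


def mobile_reconciliation_rate(tps, actions, nodes, disconnect_time, db_size):
    reconnects_per_sec = nodes / disconnect_time        # each node reconnects once per period
    return reconnects_per_sec * conflicts_per_reconnect(
        tps, actions, nodes, disconnect_time, db_size)


rate = mobile_reconciliation_rate(tps=1, actions=5, nodes=10,
                                  disconnect_time=100, db_size=1_000_000)
print(rate)  # about 0.25 reconciliations per second
```

Multiplying the two functions out gives Disconnect_Time × (TPS × Actions × Nodes)² / DB_size: longer disconnections mean proportionally more reconciliation, and more nodes mean quadratically more.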
18 Outline
• Replication strategies (lazy & eager, master & group)
• How centralized databases scale
• Replication is unstable on scaleup
• A possible solution
  – Two-tier architecture: Mobile & Base nodes
  – Base nodes master objects
  – Tentative transactions at mobile nodes
    · Transactions must be commutative
  – Re-apply transactions on reconnect
  – Transactions may be rejected
19 Safe Approach
• Each object mastered at a node
• Update transactions only read and write master items
• Lazy replication to other nodes
• Allow reads of stale data (on user request)
• PROBLEMS:
  – doesn't support mobile users
  – deadlocks explode with scaleup
• ?? How do banks work???
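The mastering rule for update transactions can be phrased as an admission check. A Python sketch in which `master_of` (object id to master node) and the node names are hypothetical; read-only queries that accept stale data, as on the slide, are outside this check.

```python
from typing import Dict, Set


def can_run_update(node: str, read_set: Set[str], write_set: Set[str],
                   master_of: Dict[str, str]) -> bool:
    """Safe approach: an update transaction may touch only items mastered here."""
    return all(master_of.get(oid) == node for oid in read_set | write_set)


master_of = {"acct:1": "base-east", "acct:2": "base-east", "acct:3": "base-west"}
print(can_run_update("base-east", {"acct:1"}, {"acct:2"}, master_of))  # True
print(can_run_update("base-east", {"acct:3"}, {"acct:2"}, master_of))  # False
```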
20 Two Tier Replication
• Two kinds of nodes:
  – Base nodes: always connected, always up
  – Mobile nodes: occasionally connected
• Data mastered at base nodes
• Mobile nodes
  – have stale copies
  – make tentative updates
21 Mobile Node Makes Tentative Updates
• Updates local database while disconnected
• Saves transactions
• When mobile node reconnects: tentative transactions re-done as Eager-Master (at original time??)
• Some may be rejected
  – (replaces reconciliation)
• No System Delusion.
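A minimal Python sketch of the reconnect step, assuming the base node re-runs each saved transaction in order against the master copy and commits it only if its acceptance criterion still holds. `TentativeTxn`, `MobileNode`, `reconnect` and `debit` are hypothetical names; a real base node would run each re-execution as an eager-master transaction rather than copying a dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

DB = Dict[str, int]


@dataclass(frozen=True)
class TentativeTxn:
    name: str
    apply: Callable[[DB], None]    # performs the updates on the given database
    accept: Callable[[DB], bool]   # acceptance criterion, checked after apply


class MobileNode:
    def __init__(self, snapshot: DB) -> None:
        self.local: DB = dict(snapshot)      # stale copy of base-mastered data
        self.log: List[TentativeTxn] = []    # saved tentative transactions

    def run(self, txn: TentativeTxn) -> None:
        txn.apply(self.local)                # tentative update of the local copy
        self.log.append(txn)


def reconnect(base: DB, mobile: MobileNode) -> Tuple[List[str], List[str]]:
    """Re-run each saved transaction at the master, in order; report rejects."""
    accepted, rejected = [], []
    for txn in mobile.log:
        trial = dict(base)                   # run against current master data
        txn.apply(trial)
        if txn.accept(trial):
            base.clear()
            base.update(trial)               # commit at the master
            accepted.append(txn.name)
        else:
            rejected.append(txn.name)        # aborted: base is unchanged
    mobile.log.clear()
    mobile.local = dict(base)                # refresh the stale copy
    return accepted, rejected


def debit(acct: str, amount: int) -> TentativeTxn:
    def apply(db: DB) -> None:
        db[acct] -= amount
    # acceptance criterion: the balance must not go negative
    return TentativeTxn(f"debit {acct} {amount}", apply, lambda db: db[acct] >= 0)


base = {"acct": 100}
phone = MobileNode(base)
phone.run(debit("acct", 30))
phone.run(debit("acct", 60))
base["acct"] -= 50                           # another node's debit commits meanwhile
print(reconnect(base, phone))  # (['debit acct 30'], ['debit acct 60'])
print(base)                    # {'acct': 20}
```

The rejected debit is reported to the mobile user at reconnect instead of being silently reconciled, so the base copy never diverges.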
22 Tentative Transactions
• Must be commutative with others
  – Debit \$50 rather than change \$150 to \$100.
• Must have acceptance criteria
  – Account balance is positive
  – Ship date no later than quoted
  – Price is no greater than quoted
[Figure: mobile node runs tentative transactions at its local DB and sends them to the base; the base runs them with transactions from others at its local DB and returns updates & rejects]
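Why the slide prefers a debit over setting an absolute balance, in a small Python sketch: replaying two debits in either order gives the same balance, while replaying two assignments computed from the same stale read does not. The helper names are illustrative.

```python
from functools import reduce
from itertools import permutations
from typing import Callable, List

Op = Callable[[int], int]


def debit(amount: int) -> Op:
    return lambda balance: balance - amount      # commutes with other debits


def set_balance(value: int) -> Op:
    return lambda balance: value                 # overwrites whatever is there


def run(start: int, ops: List[Op]) -> int:
    return reduce(lambda balance, op: op(balance), ops, start)


start = 150
# Two disconnected users withdraw 50 and 20 after both reading the balance 150.
debits = [debit(50), debit(20)]
sets = [set_balance(150 - 50), set_balance(150 - 20)]

print(sorted({run(start, list(p)) for p in permutations(debits)}))  # [80]
print(sorted({run(start, list(p)) for p in permutations(sets)}))    # [100, 130]
```

With debits every replay order ends at 80; with assignments the result depends on order and one withdrawal is always lost.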
23 Refinement: Mobile Node Can Master Some Data
• Mobile node can master private data
  – Only the mobile node updates this data
  – Others only read that data
• Examples:
  – Orders generated by salesman
  – Mail generated by user
  – Documents generated by Notes user
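One way to read this refinement, sketched in Python under an assumption the slide does not spell out: a transaction that writes only data its own mobile node masters can commit there directly, while anything that writes base-mastered data stays tentative. The policy, the `master_of` map and the node and object names are hypothetical.

```python
from typing import Dict, Set


def classify(node: str, write_set: Set[str], master_of: Dict[str, str]) -> str:
    """Decide how a transaction issued at `node` is handled (hypothetical policy)."""
    if all(master_of.get(oid) == node for oid in write_set):
        return "commit locally"      # node masters everything it writes
    return "tentative"               # re-run at the base node on reconnect


master_of = {
    "order:17": "salesman-7",        # order generated by this salesman
    "stock:widget": "base-1",        # inventory mastered at a base node
}
print(classify("salesman-7", {"order:17"}, master_of))                  # commit locally
print(classify("salesman-7", {"order:17", "stock:widget"}, master_of))  # tentative
```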
24 Virtue of 2-Tier Approach
• Allows mobile operation
• No system delusion
• Rejects detected at reconnect (know right away)
• If commutativity works,
  – No reconciliations
  – Even though work rises as (Mobile + Base)²
25 Outline
• Replication strategies (lazy & eager, master & group)
• How centralized databases scale
• Replication is unstable on scaleup
• A possible solution (two-tier architecture)
  – Tentative transactions at mobile nodes
  – Re-apply transactions on reconnect
  – Transactions may be rejected & reconciled
• Avoids system delusion