MVCC on Flash Memory. Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China, 2009-06-13.


1 MVCC on Flash Memory
Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China, 2009-06-13

2 Outline
- Motivation
- MVCC
- Berkeley DB
- PostgreSQL
- Future work

3 Motivation
- Characteristics
- Not in-place update
[Figure: HDD compared with flash]

4 Motivation
Transaction
- CC: 2PL, MVCC, conflict graph, timestamp, index CC
- Recovery: log, transaction, media
Details
- 2PL: 1st phase: lock; 2nd phase: release lock; kinds of lock
- MVCC: multiple versions; snapshot isolation
- Conflict graph: directed acyclic graph
- Timestamp: timestamp ordering
- Index CC: index is a B+-tree
- Log: log file & data file; checkpoint: D & S
- Transaction recovery: read the log file, then undo & redo
- Media recovery: backup database; hot standby: mirrored media

5 MVCC
- Monoversion schedule
  s = r_1(x) w_1(x) r_2(x) w_2(y) r_1(y) w_1(z) c_1 c_2
  s' = r_1(x) w_1(x) r_2(x) r_1(y) w_2(y) w_1(z) c_1 c_2
- Multiversion schedule & monoversion schedule
  - Multiversion schedule
    m = r_1(x_0) w_1(x_1) r_2(x_1) w_2(y_2) r_1(y_0) w_1(z_1) c_1 c_2
    Version function h: h(r_i(x)) = w_j(x) and h(w_i(x)) = w_i(x)
  - Monoversion schedule
    m = r_1(x_0) w_1(x_1) r_2(x_1) w_2(y_2) r_1(y_2) w_1(z_1) c_1 c_2
    s = r_1(x) w_1(x) r_2(x) w_2(y) r_1(y) w_1(z) c_1 c_2
  - A monoversion schedule is a special case of a multiversion schedule
- Conflict cycle (in s): t_1, t_2
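
The conflict cycle between t_1 and t_2 in s can be checked mechanically. The following is a minimal sketch, assuming schedules are written as space-separated strings; the helper names `parse`, `conflict_edges`, and `has_cycle` are illustrative and not from the slides.

```python
from itertools import combinations

def parse(schedule):
    """Turn 'r1(x) w1(x) c1' into [('r', 1, 'x'), ('w', 1, 'x')]; commits are skipped."""
    ops = []
    for tok in schedule.split():
        if tok[0] in "rw":
            paren = tok.index("(")
            ops.append((tok[0], int(tok[1:paren]), tok[paren + 1:-1]))
    return ops

def conflict_edges(schedule):
    """Edge (i, j) when an operation of t_i precedes a conflicting operation of t_j."""
    edges = set()
    # combinations() keeps schedule order, so the first element is the earlier operation.
    for (a, i, x), (b, j, y) in combinations(parse(schedule), 2):
        if i != j and x == y and "w" in (a, b):
            edges.add((i, j))
    return edges

def has_cycle(edges):
    """Depth-first search for a back edge in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # 1 = on the current DFS path, 2 = finished

    def visit(u):
        state[u] = 1
        for v in graph[u]:
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in graph)

s = "r1(x) w1(x) r2(x) w2(y) r1(y) w1(z) c1 c2"
s_prime = "r1(x) w1(x) r2(x) r1(y) w2(y) w1(z) c1 c2"
print(has_cycle(conflict_edges(s)), has_cycle(conflict_edges(s_prime)))
```

Swapping r_1(y) ahead of w_2(y), as s' does, removes the edge from t_2 to t_1 and with it the cycle.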

6 MVCC
- Traditional conflict
  s = w_0(x) c_0 w_1(x) c_1 r_2(x) w_2(y) c_2
  m = w_0(x_0) c_0 w_1(x_1) c_1 r_2(x_0) w_2(y_2) c_2
- View equivalence
  - Reads-from relation
    RF(m) := {(t_i, x, t_j) | r_j(x_i) ∈ OP(m) and t_i, t_j ∈ trans(m)}
  - m and m' are view equivalent if
    trans(m) = trans(m') and RF(m) = RF(m')
  - Example
    m = w_0(x_0) w_0(y_0) c_0 r_3(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(y_2) c_2
    m' = w_0(x_0) w_0(y_0) c_0 w_1(x_1) c_1 r_2(x_1) r_3(x_0) w_2(y_2) w_3(x_3) c_3 c_2
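
The reads-from relation and the view-equivalence test translate almost directly into set operations. This is a minimal sketch, assuming the slide convention that version x_i is the one written by t_i; the function names and the string encoding of schedules are illustrative assumptions.

```python
import re

# Matches operations such as r3(x0) or w1(x1); commit tokens like c0 are ignored.
TOKEN = re.compile(r"([rw])(\d+)\(([a-z])(\d+)\)")

def transactions(m):
    """trans(m): every transaction that issues a read or a write in m."""
    return {int(txn) for _, txn, _, _ in TOKEN.findall(m)}

def reads_from(m):
    """RF(m): triples (t_i, x, t_j) for every read r_j(x_i) in m."""
    return {(int(ver), item, int(txn))
            for kind, txn, item, ver in TOKEN.findall(m) if kind == "r"}

def view_equivalent(m1, m2):
    return transactions(m1) == transactions(m2) and reads_from(m1) == reads_from(m2)

m = "w0(x0) w0(y0) c0 r3(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(y2) c2"
m_prime = "w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w2(y2) w3(x3) c3 c2"
print(sorted(reads_from(m)), view_equivalent(m, m_prime))
```

Both schedules of the example have t_3 reading x from t_0 and t_2 reading x from t_1, so the test succeeds even though the operations are interleaved differently.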

7 MVCC
- Multiversion view serializability
  - Serializable but not view equivalent
    m = w_0(x_0) w_0(y_0) c_0 r_1(x_0) r_1(y_0) w_1(x_1) w_1(y_1) c_1 r_2(x_0) r_2(y_1) c_2
    s = w_0(x) w_0(y) c_0 r_1(x) r_1(y) w_1(x) w_1(y) c_1 r_2(x) r_2(y) c_2
  - MVSR: m is multiversion view serializable if there is a schedule m' such that
    - m' is a serial monoversion schedule
    - trans(m) = trans(m')
    - m and m' are view equivalent
  - Example
    m = w_0(x_0) w_0(y_0) c_0 w_1(x_1) c_1 r_2(x_1) r_3(x_0) w_3(x_3) c_3 w_2(y_2) c_2
    m' = w_0(x_0) w_0(y_0) c_0 r_3(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(y_2) c_2
    s = w_0(x) w_0(y) c_0 r_3(x) w_3(x) c_3 w_1(x) c_1 r_2(x) w_2(y) c_2

8 MVCC
- Conflict graph G(m) = (V, E)
  - V = trans(m)
  - E = {(t_i, t_j) | r_j(x_i) ∈ OP(m) and t_i, t_j ∈ trans(m)}
  - m and m' are view equivalent => G(m) = G(m')
- Version order
  - m = w_0(x_0) w_0(y_0) w_0(z_0) c_0 r_1(x_0) r_2(x_0) r_2(z_0) r_3(z_0) w_1(y_1) w_2(x_2) w_3(y_3) w_3(z_3) c_1 c_2 c_3 r_4(x_2) r_4(y_3) r_4(z_3) c_4
  - Version order = {x_0 « x_2, y_0 « y_1 « y_3, z_0 « z_3}
- MVSG
  - MVSG = G(m) + version order
  - For every r_k(x_j) and w_i(x_i) with i, j, k pairwise distinct:
    if x_i « x_j then (t_i, t_j) ∈ E; else (t_k, t_i) ∈ E
  - m ∈ MVSR iff there is a version order « for which MVSG(m, «) has no cycle
[Figure: graph over T0, T1, T2, T3, T4, with labels r_2(x_0) r_2(y_1) and r_2(x_1) r_2(y_0)]
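
The MVSG construction can be replayed on the example schedule and version order of this slide. The sketch below assumes the version order is given as a list of writer indices per data item and uses Python's standard `graphlib` module (Python 3.9+) for the cycle test; `mvsg_edges` and `is_acyclic` are illustrative names.

```python
import re
from graphlib import CycleError, TopologicalSorter

TOKEN = re.compile(r"([rw])(\d+)\(([a-z])(\d+)\)")

def mvsg_edges(m, version_order):
    """Edges of MVSG(m, <<): reads-from edges plus version-order edges."""
    ops = [(kind, int(t), x, int(v)) for kind, t, x, v in TOKEN.findall(m)]
    reads = [(t, x, v) for kind, t, x, v in ops if kind == "r"]   # (k, x, j) for r_k(x_j)
    writes = [(t, x) for kind, t, x, v in ops if kind == "w"]     # (i, x) for w_i(x_i)
    edges = {(j, k) for k, x, j in reads if j != k}               # G(m): t_j -> t_k
    for k, x, j in reads:
        for i, y in writes:
            if y != x or len({i, j, k}) < 3:
                continue  # different item, or i, j, k not pairwise distinct
            pos = version_order[x].index
            edges.add((i, j) if pos(i) < pos(j) else (k, i))
    return edges

def is_acyclic(edges):
    sorter = TopologicalSorter()
    for u, v in edges:
        sorter.add(v, u)  # v has predecessor u
    try:
        sorter.prepare()  # raises CycleError when the graph contains a cycle
        return True
    except CycleError:
        return False

m = ("w0(x0) w0(y0) w0(z0) c0 r1(x0) r2(x0) r2(z0) r3(z0) "
     "w1(y1) w2(x2) w3(y3) w3(z3) c1 c2 c3 r4(x2) r4(y3) r4(z3) c4")
order = {"x": [0, 2], "y": [0, 1, 3], "z": [0, 3]}
print(sorted(mvsg_edges(m, order)), is_acyclic(mvsg_edges(m, order)))
```

For this version order the graph is acyclic (t_0, t_1, t_2, t_3, t_4 is a topological order), which is enough to place m in MVSR.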

9 MVCC
- Multiversion conflict
  - r_i(x_j) and w_k(x_k) with r_i(x_j) < w_k(x_k)
- Multiversion conflict serializability (MCSR): there is a schedule m' such that
  - m' is a serial monoversion schedule
  - trans(m) = trans(m')
  - every pair of operations in multiversion conflict appears in the same order in m and m'
- Multiversion conflict graph
  - E = {(t_i, t_k) | r_i(x_j) < w_k(x_k)}
  - m ∈ MCSR iff the multiversion conflict graph of m has no cycle
[Figure: containment diagram of the schedule classes: all schedules, MVSR, MCSR, VSR, CSR]
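
Only a read followed by a write on the same item counts as a multiversion conflict, so the graph is easy to build. A minimal sketch, assuming string-encoded schedules; `mv_conflict_edges` and `is_acyclic` are illustrative names, and the second schedule is a made-up example rather than one from the slides.

```python
import re
from graphlib import CycleError, TopologicalSorter

TOKEN = re.compile(r"([rw])(\d+)\(([a-z])(\d+)\)")

def mv_conflict_edges(m):
    """Edge (t_i, t_k) whenever r_i(x_j) precedes w_k(x_k) and i != k."""
    ops = [(kind, int(t), x) for kind, t, x, _ in TOKEN.findall(m)]
    edges = set()
    for pos, (kind, i, x) in enumerate(ops):
        if kind != "r":
            continue
        for later_kind, k, y in ops[pos + 1:]:
            if later_kind == "w" and y == x and k != i:
                edges.add((i, k))
    return edges

def is_acyclic(edges):
    sorter = TopologicalSorter()
    for u, v in edges:
        sorter.add(v, u)  # v has predecessor u
    try:
        sorter.prepare()  # raises CycleError when the graph contains a cycle
        return True
    except CycleError:
        return False

# Multiversion schedule from slide 5: no read is followed by a foreign write.
m = "r1(x0) w1(x1) r2(x1) w2(y2) r1(y0) w1(z1) c1 c2"
# Hypothetical schedule in which both transactions read x_0 and then write x.
bad = "r1(x0) r2(x0) w1(x1) w2(x2) c1 c2"
print(mv_conflict_edges(m), mv_conflict_edges(bad))
```

The slide-5 schedule yields an empty graph, whereas the hypothetical one produces edges in both directions between t_1 and t_2.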

10 MVCC
- Limit the number of versions: k = 2
  w_0(x_0) c_0 r_1(x_0) w_3(x_3) c_3 w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2
  w_0(x_0) c_0 r_1(x_0) w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2 w_3(x_3) c_3
  w_0(x_0) c_0 r_1(x_0) w_1(x_1) c_1 w_3(x_3) c_3 r_2(x_3) w_2(x_2) c_2
  w_0(x_0) c_0 r_2(x_0) w_2(x_2) c_2 r_1(x_2) w_1(x_1) c_1 w_3(x_3) c_3
  w_0(x_0) c_0 r_2(x_0) w_2(x_2) c_2 w_3(x_3) c_3 r_1(x_3) w_1(x_1) c_1
  w_0(x_0) c_0 w_3(x_3) c_3 r_1(x_3) w_1(x_1) c_1 r_2(x_1) w_2(x_2) c_2
  w_0(x_0) c_0 w_3(x_3) c_3 r_2(x_3) w_2(x_2) c_2 r_1(x_2) w_1(x_1) c_1
  [Annotations: version pairs x_1,x_2; x_2,x_3; x_1,x_3; x_1,x_2]
- k-version view serializability (kVSR)
  - Serializable
  - View equivalent
  - k newest/nearest versions
- Hierarchy relationship

11 MVCC
- MVCC protocols
  - MVTO (multiversion timestamp ordering)
  - MV2PL; 2V2PL uses three kinds of locks: rl, wl, cl
  - MVSGT
  - ROMV: read-only transactions
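
To make the first protocol concrete, here is a minimal single-threaded sketch of multiversion timestamp ordering. It assumes each transaction is identified by its timestamp and omits commit handling (a full protocol would also delay a reader's commit until the writers it read from have committed); `MVTOStore` and `Abort` are illustrative names.

```python
class Abort(Exception):
    """Raised when a write arrives too late and the writer must be aborted."""

class MVTOStore:
    def __init__(self, initial):
        # Each version is [write_ts, value, largest_read_ts]; initial versions have ts 0.
        self.versions = {key: [[0, value, 0]] for key, value in initial.items()}

    def _visible(self, key, ts):
        """Version with the largest write timestamp that does not exceed ts."""
        return max((v for v in self.versions[key] if v[0] <= ts), key=lambda v: v[0])

    def read(self, ts, key):
        version = self._visible(key, ts)
        version[2] = max(version[2], ts)  # remember the youngest reader
        return version[1]

    def write(self, ts, key, value):
        version = self._visible(key, ts)
        if version[0] == ts:
            version[1] = value            # the transaction overwrites its own version
        elif version[2] > ts:
            # A younger transaction already read the version this write would supersede.
            raise Abort(f"write by ts={ts} on {key!r} rejected")
        else:
            self.versions[key].append([ts, value, ts])

store = MVTOStore({"x": 10})
print(store.read(2, "x"))   # ts 2 reads the initial version
store.write(2, "x", 20)     # ts 2 installs its own version
print(store.read(1, "x"))   # the older ts 1 still sees the initial version
```

Reads never fail in this scheme; only a write that would invalidate an earlier read by a younger transaction is rejected.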

12 Berkeley DB
- Five components, each available as a standalone utility and through one or more library interfaces

| Component | Standalone utility | Library interfaces |
|---|---|---|
| Deadlock detection | db_deadlock | DB_ENV->lock_detect, DB_ENV->set_lk_detect |
| Checkpoints | db_checkpoint | DB_ENV->txn_checkpoint |
| Database and log file archival | db_archive | DB_ENV->log_archive |
| Log file removal | db_archive | DB_ENV->log_archive |
| Recovery procedures | db_recover | DB_ENV->open |

13 Berkeley DB
- Transaction API
  - Transaction subsystem and related methods
    DB_ENV->txn_checkpoint, DB_ENV->txn_recover, DB_ENV->txn_stat, DB_ENV->open, DB_ENV->close, DB_ENV->remove
  - Transaction subsystem configuration
    DB_ENV->set_timeout, DB_ENV->set_tx_max, DB_ENV->set_tx_timestamp
  - Transaction operations
    DB_ENV->txn_begin, DB_TXN->abort, DB_TXN->commit, DB_TXN->discard, DB_TXN->id, DB_TXN->prepare, DB_TXN->set_name, DB_TXN->set_timeout

14 Berkeley DB
- 2PL in Berkeley DB
  - Locks are released during DB_TXN->abort or DB_TXN->commit.
  - Guideline: if possible, use nested transactions to protect the parts of your transaction most likely to deadlock.
- Transaction limits
  - Transaction IDs: 31-bit unsigned integer (0x80000000)
  - Cursors: cannot span transactions; a cursor must be opened and closed within a single transaction
  - Multiple threads of control:
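
Holding every lock until commit or abort is what makes the locking strict two-phase. The toy lock table below sketches that discipline only; it is not the Berkeley DB API, it raises on a conflict instead of blocking or running deadlock detection, and `LockTable` and `LockConflict` are hypothetical names.

```python
class LockConflict(Exception):
    """Raised instead of blocking when a requested lock is incompatible."""

class LockTable:
    def __init__(self):
        self.held = {}  # item -> {transaction id: 'r' or 'w'}

    def acquire(self, txn, item, mode):
        """Growing phase: take a read ('r') or write ('w') lock on item."""
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and "w" in (mode, other_mode):
                raise LockConflict(f"txn {txn} must wait for txn {other} on {item!r}")
        if holders.get(txn) != "w":
            holders[txn] = mode  # first lock, or upgrade from read to write

    def release_all(self, txn):
        """Shrinking phase: happens only once, at commit or abort."""
        for holders in self.held.values():
            holders.pop(txn, None)

locks = LockTable()
locks.acquire(1, "x", "r")
locks.acquire(2, "x", "r")   # shared read locks coexist
locks.release_all(1)         # txn 1 commits
locks.acquire(2, "x", "w")   # txn 2 can now upgrade
print(locks.held)
```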

15 Berkeley DB
- Filesystem operations behind a Berkeley DB transaction
  - Disk seek to database file
  - Database file read
  - Disk seek to log file
  - Log file write
  - Disk seek to update log file metadata
  - Log metadata write
  - Flush log file information to disk
  - Flush log file metadata to disk
- Ways to increase transactional throughput
  - Berkeley DB supports group commit
  - Additional tuning parameters
    - Tune the size of the database cache
    - Put the database and the log files on different disks
    - Set the filesystem configuration
    - Upgrade your hardware
    - Turn on the DB_TXN_WRITE_NOSYNC or DB_TXN_NOSYNC flag (keeps ACI, but not D)

16 PostgreSQL
- PG: each transaction sees a snapshot of the data
  - Reading never blocks writing
  - Writing never blocks reading
- Three undesirable phenomena
  - Dirty reads, non-repeatable reads, phantom reads
- SQL transaction isolation levels

| Isolation level | Dirty read | Non-repeatable read | Phantom read |
|---|---|---|---|
| Read uncommitted | Possible | Possible | Possible |
| Read committed | Not possible | Possible | Possible |
| Repeatable read | Not possible | Not possible | Possible |
| Serializable | Not possible | Not possible | Not possible |

17 PostgreSQL
- Read Committed isolation level
  - The default isolation level
  - A SELECT query sees only data committed before the query began
  - The SELECT does see the effects of previous updates executed within the same transaction
  - Two successive SELECTs can see different data if other transactions commit changes between their executions
  - NOT adequate for many applications that do complex queries and updates
- Serializable isolation level
  - This level emulates serial transaction execution.
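
The difference between the two levels comes down to when the snapshot is taken. The toy model below is a sketch under simplifying assumptions: versions are stamped with a global commit counter instead of PostgreSQL's transaction-ID visibility rules, the stricter level fixes its snapshot when the transaction object is created, and `SnapshotStore` and `Txn` are hypothetical names.

```python
class SnapshotStore:
    def __init__(self):
        self.commit_no = 0
        self.history = {}  # key -> list of (commit number, value), oldest first

    def commit(self, writes):
        """Install all writes of one committing transaction atomically."""
        self.commit_no += 1
        for key, value in writes.items():
            self.history.setdefault(key, []).append((self.commit_no, value))

    def read(self, key, snapshot):
        """Newest value committed at or before the given snapshot."""
        visible = [value for number, value in self.history.get(key, []) if number <= snapshot]
        return visible[-1] if visible else None

class Txn:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        self.snapshot = store.commit_no  # transaction-wide snapshot

    def select(self, key):
        # Read Committed takes a fresh snapshot per statement.
        per_statement = self.level == "read committed"
        snapshot = self.store.commit_no if per_statement else self.snapshot
        return self.store.read(key, snapshot)

store = SnapshotStore()
store.commit({"balance": 100})
rc, ser = Txn(store, "read committed"), Txn(store, "serializable")
first = (rc.select("balance"), ser.select("balance"))
store.commit({"balance": 50})  # another transaction commits in between
second = (rc.select("balance"), ser.select("balance"))
print(first, second)
```

Neither reader ever waits for the writer: the old version stays readable for as long as some snapshot still needs it.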

18 PostgreSQL
- Data consistency checks at the application level
  - Readers in PostgreSQL don't lock data
  - To ensure the current existence of a row and protect it against concurrent updates, one must use SELECT FOR UPDATE or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE locks just the returned rows against concurrent updates, while LOCK TABLE protects the whole table.)
- Locks and tables
  - Table-level locks
  - Row-level locks: taken when rows are being updated
- Locks and indexes
  - GiST and R-tree: released after the statement is done
  - Hash index: released after the page is processed
  - B-tree: released immediately after each index tuple is fetched/inserted

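19
Lock mode compatibility (√ = compatible, × = conflict; column abbreviations follow the row names in order)

| Lock mode | ASL | RSL | REL | SUEL | SL | SREL | EL | AEL |
|---|---|---|---|---|---|---|---|---|
| AccessShareLock | √ | √ | √ | √ | √ | √ | √ | × |
| RowShareLock | √ | √ | √ | √ | √ | √ | × | × |
| RowExclusiveLock | √ | √ | √ | √ | × | × | × | × |
| ShareUpdateExclusiveLock | √ | √ | √ | × | × | × | × | × |
| ShareLock | √ | √ | × | × | √ | × | × | × |
| ShareRowExclusiveLock | √ | √ | × | × | × | × | × | × |
| ExclusiveLock | √ | × | × | × | × | × | × | × |
| AccessExclusiveLock | × | × | × | × | × | × | × | × |

Second table, columns SR, DR, IR, UR, AT, DT, CI, LT (the column positions of the check marks are not preserved):
- AccessShareLock: √√√√√√√√
- RowShareLock: √√
- RowExclusiveLock: √√√√
- ShareUpdateExclusiveLock: √
- ShareLock: √√
- ShareRowExclusiveLock: √
- ExclusiveLock: √
- AccessExclusiveLock: √√√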
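
The compatibility matrix is naturally a lookup table. The sketch below encodes the eight rows exactly as they appear on the slide ("Y" for √, "N" for ×); the function name `compatible` and the string encoding are illustrative choices.

```python
MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock", "ShareUpdateExclusiveLock",
    "ShareLock", "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
]

# One string per row, columns in the same order as MODES.
ROWS = [
    "YYYYYYYN",
    "YYYYYYNN",
    "YYYYNNNN",
    "YYYNNNNN",
    "YYNNYNNN",
    "YYNNNNNN",
    "YNNNNNNN",
    "NNNNNNNN",
]

def compatible(held, requested):
    """True when a lock in mode `requested` can be granted while `held` is held."""
    return ROWS[MODES.index(held)][MODES.index(requested)] == "Y"

print(compatible("AccessShareLock", "ExclusiveLock"))   # plain reads vs. ExclusiveLock
print(compatible("ShareLock", "RowExclusiveLock"))      # ShareLock vs. row writers
```

The matrix is symmetric, so the order of the two arguments does not matter. ShareLock is compatible with itself, whereas ShareUpdateExclusiveLock and every stronger mode conflict with themselves.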

20 Future work
- Experiment
  - BDB & PG code
- Transactions on flash memory
  - Concurrency control: MVCC
  - Recovery: log
