VLDB2012 Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5 #National University of Singapore, †University of California,


1 LogBase: A Scalable Log-structured Database System in the Cloud
VLDB 2012
Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5
# National University of Singapore, † University of California, Santa Barbara, § Zhejiang University
15300240024 王夏青

2 Abstract
• Introduction
• Background & Related Work
• Design & Implementation
• Performance Evaluation
• Conclusion

3 Introduction: Requirements
• High write throughput
• Dynamic scalability
• Efficient multiversion data access
• Transactional semantics
• Fast recovery from machine failures

4 Introduction: Characteristics
• The log serves as the sole data repository in the system
• Adopts an architecture similar to HBase and BigTable, where each machine in the system is responsible for some tablets
• Builds an index per tablet for retrieving data from the log

5 Introduction: Contributions
• Propose LogBase, a scalable log-structured database system that can be dynamically deployed in the cloud
• Design a multiversion index strategy in LogBase to provide efficient access to multiversion data
• Enhance LogBase to support transactional semantics
• Conduct an extensive performance study of LogBase

6 Background & Related Work
• No-overwrite strategies: System R (shadow paging strategy); POSTGRES (delta record)
• WAL+Data: most storage systems
• Log-structured systems: LFS, BlueSky, Berkeley DB, PrimeBase, Hyder, RAMCloud

7 Design & Implementation: Data Model
• Model: relational data model
• Data partitioning: vertical (column groups); horizontal (tablets)

8 Design & Implementation: Architecture Overview
• Log Repository
• Data Access Manager
• Transaction Manager

9 Design & Implementation: Log Repository
• Guarantee (stable storage): the log-only approach provides a capability for recovering data from machine failures similar to that of the WAL+Data approach
• Stores the log in HDFS
• Design choices for the implementation of the log
• Log record:
  LogKey: LSN, table name, tablet information
  Data:
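The log record layout on this slide can be illustrated with a minimal sketch. It is not LogBase's actual code: the class names, the string tablet identifier, the opaque byte payload for the Data part (which the slide leaves unspecified), and the in-memory list standing in for a log file in HDFS are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogKey:
    lsn: int      # log sequence number
    table: str    # table name
    tablet: str   # tablet information (assumed here to be a plain id string)


@dataclass(frozen=True)
class LogRecord:
    key: LogKey
    data: bytes   # payload; its layout is an assumption, the slide does not give it


class AppendOnlyLog:
    """In-memory stand-in for a log file kept in HDFS (hypothetical helper)."""

    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def append(self, table: str, tablet: str, data: bytes) -> int:
        lsn = len(self._records)  # LSNs grow monotonically with each append
        self._records.append(LogRecord(LogKey(lsn, table, tablet), data))
        return lsn

    def read(self, lsn: int) -> LogRecord:
        return self._records[lsn]


log = AppendOnlyLog()
first = log.append("orders", "tablet-0", b"row-1")
second = log.append("orders", "tablet-0", b"row-1-v2")
```

Records are never updated in place: a new version of a row is simply another append with a larger LSN.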

10 Design & Implementation: In-memory Multiversion Index
• Index: provides efficient access to the data
• In-memory index
• Index structure: B-link trees
• Index entry: IdxKey = primary key + timestamp
• Consumption analysis
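A composite key of primary key plus timestamp lets a single ordered structure answer "newest version as of time T" queries. The sketch below shows the idea with a sorted Python list and binary search swapped in for the B-link tree named on the slide; the class name, method names, and the integer log offset stored per entry are assumptions.

```python
import bisect
from typing import List, Optional, Tuple


class MultiversionIndex:
    """Sorted (primary_key, timestamp) keys; a simple stand-in for a B-link tree."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []
        self._ptrs: List[int] = []  # log offset for each key, in the same order

    def insert(self, primary_key: str, timestamp: int, log_offset: int) -> None:
        idx_key = (primary_key, timestamp)
        pos = bisect.bisect_left(self._keys, idx_key)
        self._keys.insert(pos, idx_key)
        self._ptrs.insert(pos, log_offset)

    def lookup(self, primary_key: str, as_of: int) -> Optional[int]:
        """Return the log offset of the newest version with timestamp <= as_of."""
        pos = bisect.bisect_right(self._keys, (primary_key, as_of))
        if pos == 0:
            return None
        key, _ts = self._keys[pos - 1]
        # The predecessor entry may belong to a different primary key.
        return self._ptrs[pos - 1] if key == primary_key else None


index = MultiversionIndex()
index.insert("k1", 10, 100)
index.insert("k1", 20, 200)
index.insert("k2", 15, 300)
```

Because versions of the same primary key sort next to each other, a lookup is one binary search followed by a check that the predecessor entry belongs to the requested key.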

11 Design & Implementation: Tablet Serving(1)

12 Design & Implementation: Tablet Serving (2)
• Write
• Read
• Delete
• Scan
• Compaction
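The slide only names the five operations, so the following is a rough sketch of how they could fit together over a log plus an index, not a description of LogBase's implementation. The `Tablet` class, the use of `None` as a delete marker, the logical clock, and the compaction policy (keep only the newest live version of each key, in key order) are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # assumption: a delete is logged as a record with no value


class Tablet:
    def __init__(self) -> None:
        self.log: List[Tuple[str, int, Optional[str]]] = []  # (key, ts, value)
        self.index: Dict[str, List[Tuple[int, int]]] = {}    # key -> [(ts, log offset)]
        self.clock = 0                                       # hypothetical logical clock

    def _append(self, key: str, value: Optional[str]) -> int:
        self.clock += 1
        self.log.append((key, self.clock, value))
        self.index.setdefault(key, []).append((self.clock, len(self.log) - 1))
        return self.clock

    def write(self, key: str, value: str) -> int:
        return self._append(key, value)

    def delete(self, key: str) -> int:
        return self._append(key, TOMBSTONE)

    def read(self, key: str, as_of: Optional[int] = None) -> Optional[str]:
        # Walk versions newest-first and return the first one visible at as_of.
        for ts, offset in reversed(self.index.get(key, [])):
            if as_of is None or ts <= as_of:
                return self.log[offset][2]
        return None

    def scan(self, start: str, end: str) -> List[Tuple[str, str]]:
        out = []
        for key in sorted(self.index):
            if start <= key < end:
                value = self.read(key)
                if value is not None:  # skip deleted keys
                    out.append((key, value))
        return out

    def compact(self) -> None:
        """Rewrite the log in key order, keeping the newest live version per key."""
        new_log: List[Tuple[str, int, Optional[str]]] = []
        new_index: Dict[str, List[Tuple[int, int]]] = {}
        for key in sorted(self.index):
            ts, offset = self.index[key][-1]
            value = self.log[offset][2]
            if value is not None:
                new_log.append((key, ts, value))
                new_index[key] = [(ts, len(new_log) - 1)]
        self.log, self.index = new_log, new_index


tablet = Tablet()
tablet.write("a", "1")
tablet.write("a", "2")
tablet.write("b", "x")
tablet.delete("b")
old_a = tablet.read("a", as_of=1)
live = tablet.scan("a", "z")
tablet.compact()
```

Every mutation, including a delete, is an append; reads and scans go through the index, and compaction is the only step that rewrites the log.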

13 Design & Implementation: Transaction Management (1)
Concurrency Control and Isolation:
• The rationale of MVOCC
• Validation with write locks
• Snapshot isolation in LogBase
• Guarantee (isolation): the hybrid scheme of multiversion optimistic concurrency control (MVOCC) in LogBase guarantees snapshot isolation
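As a minimal single-process sketch of optimistic validation with write locks under snapshot isolation: a transaction reads from the snapshot at its start timestamp, buffers its writes, and at commit time locks its write set and aborts if any written key has a newer committed version (first committer wins). The `Store` and `Transaction` classes, the per-key in-process locks, and the logical clock are assumptions and do not reflect LogBase's distributed implementation.

```python
import threading
from typing import Dict, List, Optional, Tuple


class Store:
    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(commit_ts, value)]
        self.locks: Dict[str, threading.Lock] = {}
        self._clock = 0
        self._clock_lock = threading.Lock()

    def next_ts(self) -> int:
        with self._clock_lock:
            self._clock += 1
            return self._clock

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def latest_ts(self, key: str) -> int:
        versions = self.versions.get(key)
        return versions[-1][0] if versions else 0


class Transaction:
    def __init__(self, store: Store) -> None:
        self.store = store
        self.start_ts = store.next_ts()  # snapshot timestamp
        self.writes: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        if key in self.writes:           # read your own buffered write
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def write(self, key: str, value: str) -> None:
        self.writes[key] = value         # buffered until commit

    def commit(self) -> bool:
        keys = sorted(self.writes)       # fixed lock order avoids deadlock
        held = []
        try:
            for key in keys:
                lock = self.store.locks.setdefault(key, threading.Lock())
                lock.acquire()
                held.append(lock)
            # Validation: abort if another transaction committed a newer
            # version of any key in the write set after our snapshot.
            if any(self.store.latest_ts(k) > self.start_ts for k in keys):
                return False
            commit_ts = self.store.next_ts()
            for k in keys:
                self.store.versions.setdefault(k, []).append((commit_ts, self.writes[k]))
            return True
        finally:
            for lock in held:
                lock.release()


store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.write("x", "from-t1")
t2.write("x", "from-t2")
ok1 = t1.commit()
ok2 = t2.commit()
t3 = Transaction(store)
```

Reads never take locks; only the short validate-and-install step at commit is serialized per key, which is what keeps read-heavy transactions cheap in this scheme.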

14 Design & Implementation: Transaction Management (2)
Commit Protocol and Atomicity:
• Guarantee (atomicity): LogBase's commit protocol guarantees an atomicity property similar to that of the WAL+Data approach
• Commit procedure

15 Design & Implementation: Failures and Recovery
• Guarantee (durability): LogBase's recovery protocol guarantees a data durability property similar to that of the WAL+Data approach
• Checkpoint operation
• Recovery procedure
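The interplay of checkpoint and recovery can be sketched as follows, assuming the checkpoint persists a copy of the in-memory index together with the last LSN it covers, and recovery reloads that copy and replays only the log tail. The function names, the dict-based checkpoint, and the tuple log entries are hypothetical; the slide does not give the actual procedure.

```python
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[int, str, str]  # (lsn, key, value)


def checkpoint(index: Dict[str, int], last_lsn: int) -> Dict[str, object]:
    """Persist a copy of the index together with the last LSN it reflects."""
    return {"index": dict(index), "lsn": last_lsn}


def recover(log: List[LogEntry], ckpt: Optional[Dict[str, object]]) -> Dict[str, int]:
    """Rebuild the key -> latest LSN index from a checkpoint plus the log tail."""
    index: Dict[str, int] = dict(ckpt["index"]) if ckpt else {}
    start = ckpt["lsn"] if ckpt else -1
    for lsn, key, _value in log:
        if lsn > start:          # replay only entries written after the checkpoint
            index[key] = lsn
    return index


log: List[LogEntry] = [(0, "a", "1"), (1, "b", "2")]
ckpt = checkpoint({"a": 0, "b": 1}, last_lsn=1)
log += [(2, "a", "3"), (3, "c", "4")]   # writes after the checkpoint

recovered = recover(log, ckpt)
recovered_from_scratch = recover(log, None)
```

Without a checkpoint the whole log has to be scanned; with one, the cost of recovery is bounded by the amount of log written since the checkpoint.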

16 Performance Evaluation: Experimental Setup
• An in-house cluster of 24 machines, each with a quad-core processor, 8 GB of physical memory, 500 GB of disk capacity, and 1 Gigabit Ethernet
• Implemented in Java; inherits basic infrastructure from the open-source HBase
• Compares the performance of LogBase with HBase
• Workload: 5000 operations
• 15000 operations for warming up the cache

17 Performance Evaluation: Micro-benchmarks (1)
Basic data operations:
• Write
• Random read
• Sequential scan
• Range scan

18 Performance Evaluation: Micro-benchmarks(2)

19 Performance Evaluation: Micro-benchmarks(3)

20 Performance Evaluation: Micro-benchmarks(4)

21 Performance Evaluation: YCSB Benchmark (1)
• Mixed workloads: 95% and 75% updates in the workload
• Varying system sizes: 3 to 24 nodes

22 Performance Evaluation: YCSB Benchmark(2)

23 Performance Evaluation: YCSB Benchmark(3)

24 Performance Evaluation: TPC-W Benchmark (1)
• Examines performance when a transaction accesses multiple data records, possibly from different tables, within the transaction boundary
• Models a webshop application workload
• Browsing, shopping, and ordering mixes: 5%, 20%, and 50% update transactions, respectively

25 Performance Evaluation: TPC-W Benchmark(2)

26 Performance Evaluation: Checkpoint and Recovery

27 Performance Evaluation: Comparison with Log-structured Systems (1)
• RAMCloud: stores its data and indexes entirely in memory
• Hyder: scales its database in shared-flash environments without data partitioning
• LRS: has a distributed architecture and data partitioning strategy similar to RAMCloud and LogBase, but stores data on disks

28 Performance Evaluation: Comparison with Log-structured Systems(2)

29 Performance Evaluation: Comparison with Log-structured Systems(3)

30 Conclusion
• Introduced a scalable log-structured database system called LogBase
• Can be elastically deployed in the cloud
• Provides sustained write throughput and effective recovery time
• The in-memory indexes support efficient data retrieval
• Provides the widely accepted snapshot isolation for transactions
• Extensive experiments
• Future work: the design and implementation of efficient secondary indexes and query processing for LogBase

31 Thanks

