Bigtable: A Distributed Storage System for Structured Data


Abstract Bigtable is a distributed storage system for managing structured data Scales to petabytes of data across thousands of commodity servers Used by Google products such as web indexing, Google Earth, and Google Finance

1. Introduction Goals: wide applicability, scalability, high performance, high availability Bigtable does not support a full relational data model Clients address data by row and column names Values are treated as uninterpreted strings

2. Data Model A sparse, distributed, persistent multidimensional sorted map (row: string, column: string, time: int64) ---> string
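A toy, in-memory sketch of this map; the function names are illustrative, not Bigtable's API, while the row and column values come from the paper's Webtable example:

```python
# Toy sketch of the Bigtable data model: a map from
# (row, column, timestamp) to an uninterpreted string value.
# Function names are illustrative, not Bigtable's real API.

table = {}

def put(row, column, ts, value):
    table[(row, column, ts)] = value

def get(row, column, ts):
    # Returns None if the cell/version does not exist.
    return table.get((row, column, ts))

# The Webtable example from the paper: row keys are URLs,
# "contents:" holds page contents, "anchor:" holds link text.
put("com.cnn.www", "contents:", 6, "<html>...")
put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```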

Rows Row keys are arbitrary strings (up to 64KB) Every read or write of data under a single row key is atomic Data is maintained in lexicographic order by row key A tablet (a range of rows) is the unit of distribution and load balancing
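To illustrate why lexicographic row order matters: the paper's Webtable stores URLs with hostname components reversed, so pages from the same domain sort into adjacent rows and hence the same or nearby tablets (the `row_key` helper below is hypothetical):

```python
# Rows are sorted lexicographically, so the Webtable reverses hostname
# components in row keys: pages from the same domain become contiguous.
# The row_key helper is illustrative, not part of Bigtable.

def row_key(url):
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path

keys = sorted(row_key(u) for u in [
    "maps.google.com/index.html",
    "www.cnn.com/world",
    "mail.google.com/inbox",
])
# The two google.com hosts now sort next to each other.
```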

Column Families Column keys are grouped into sets called column families Data in the same family is usually of the same type (and is compressed together) A column key is named family:qualifier Access control and both disk and memory accounting are performed at the column-family level

Timestamps Timestamps are 64-bit integers (microseconds) Each cell can contain multiple versions of the same data Timestamps are assigned by Bigtable or by the client Versions are stored in decreasing timestamp order Old versions are garbage-collected per column family (e.g. keep only the last n versions)
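A small sketch of the versioning rule; the `keep_last` setting mirrors the paper's "keep only the last n versions" garbage-collection option, and all names are illustrative:

```python
# Illustrative sketch of versioned cells: versions are kept in
# decreasing timestamp order, and a per-family setting such as
# "keep only the last n versions" drives garbage collection.

def put_version(cell, ts, value, keep_last=3):
    cell.append((ts, value))
    cell.sort(key=lambda tv: -tv[0])  # decreasing timestamp order
    del cell[keep_last:]              # garbage-collect old versions

cell = []
for ts, v in [(1, "a"), (3, "c"), (2, "b"), (4, "d")]:
    put_version(cell, ts, v)
# cell is now [(4, "d"), (3, "c"), (2, "b")]; version 1 was collected.
```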

3. API Create and delete tables and column families Change cluster, table, and column-family metadata, such as access control rights Read and write values in Bigtable

Features Single-row transactions (atomic read-modify-write on a single row key) Execution of client-supplied scripts (written in Sawzall) in the server address space; the scripts may read but not write data
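The paper's API example is C++; the sketch below only illustrates the single-row transaction guarantee, with hypothetical `Row`/`apply` names loosely modeled on the paper's RowMutation example. All sets and deletes in one mutation commit atomically; there are no multi-row transactions:

```python
# Sketch of the single-row transaction guarantee: all mutations in one
# apply() call on one row commit atomically. Row/apply are hypothetical
# names, not Bigtable's real client API.
import threading

class Row:
    def __init__(self):
        self.cells = {}
        self._lock = threading.Lock()

    def apply(self, mutation):
        # Sets and deletes in a single mutation commit together.
        with self._lock:
            for col, value in mutation.get("set", {}).items():
                self.cells[col] = value
            for col in mutation.get("delete", []):
                self.cells.pop(col, None)

r = Row()
r.apply({"set": {"anchor:www.c-span.org": "CNN home page"},
         "delete": ["anchor:www.abc.com"]})
```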

4. Building Blocks Bigtable uses the Google File System (GFS) to store logs and data files It operates in a shared pool of machines that run other distributed applications It depends on a cluster management system for scheduling jobs, managing resources, dealing with machine failures, and monitoring machine status

SSTable and Chubby SSTable – an immutable, sorted file of key-value pairs – organized as a sequence of blocks plus a block index Chubby – a highly available distributed lock service – keeps its replicas consistent with Paxos – directories and small files can be used as locks – clients hold sessions with leases – clients can register callbacks on files and directories
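A minimal sketch of an SSTable lookup, assuming the block index maps each block's last key to its file offset (the index contents and offsets below are made up for illustration):

```python
# Sketch of an SSTable lookup: the block index (loaded into memory when
# the SSTable is opened) maps each block's last key to its offset, so a
# binary search finds the single block that could contain a key.
import bisect

block_index = [("banana", 0), ("mango", 4096), ("zebra", 8192)]
last_keys = [k for k, _ in block_index]

def block_for(key):
    # First block whose last key is >= the lookup key, else None.
    i = bisect.bisect_left(last_keys, key)
    return block_index[i][1] if i < len(block_index) else None
```

With this index, a lookup needs at most one disk read: the in-memory search picks the block, and only that block is fetched.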

Chubby Bigtable uses Chubby for a variety of tasks – ensure there is at most one active master at any time – store the bootstrap location of Bigtable data – discover tablet servers and finalize tablet server deaths – store schema information – store access control lists

5. Implementation Three major components – a library linked into every client – one master server – many tablet servers

5.1 Tablet Location A three-level hierarchy stores tablet location information: a file in Chubby points to the root tablet, which points to METADATA tablets, which in turn point to user tablets

METADATA Table Each row maps (table id, end row) ---> (location of tablet, secondary information) The client library caches tablet locations – if a cache entry is missing or stale, the client moves up the location hierarchy – the client also prefetches entries for additional tablets on each METADATA read
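A sketch of how the (table id, end row) keying makes a location lookup a range search: the tablet owning a user row is the first METADATA entry whose end row is not smaller than that row (tablet locations below are made up):

```python
# Sketch of a METADATA lookup: METADATA rows are keyed by
# (table id, end row), so the tablet owning a user row is the first
# entry whose end row is >= that row. Locations are illustrative.
import bisect

metadata = [                       # kept sorted by (table id, end row)
    ("webtable", "cherry", "tabletserver-17"),
    ("webtable", "mango", "tabletserver-42"),
    ("webtable", "\xff", "tabletserver-03"),  # last tablet of the table
]
keys = [(t, e) for t, e, _ in metadata]

def locate(table_id, row):
    i = bisect.bisect_left(keys, (table_id, row))
    return metadata[i][2]
```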

5.2 Tablet Assignment The master assigns tablets to tablet servers When a tablet server starts, it creates and acquires an exclusive lock on a uniquely-named file in a specific Chubby directory (the servers directory) A tablet server stops serving its tablets if it loses its exclusive lock (e.g. its session expires) Whenever a tablet server terminates, it attempts to release its lock

When a master is started by the cluster management system, it – grabs a unique master lock in Chubby – scans the servers directory – communicates with every live tablet server to discover which tablets are already assigned to each server – scans the METADATA table – adds each tablet that is not already assigned to the set of unassigned tablets – the root tablet is added first if it is unassigned

The master is responsible for detecting when a tablet server is no longer serving its tablets, and for reassigning those tablets as soon as possible The master periodically asks each tablet server for the status of its lock

The master initiates these tablet changes – when a table is created or deleted – when two tablets are merged into one A tablet server initiates the remaining case, a tablet split – it commits the split by recording the new tablet information in METADATA – it then notifies the master

5.3 Tablet Serving

Write Operation The server checks that the request – is well-formed – comes from an authorized sender A valid mutation is written to the commit log (group commit improves throughput) After the write commits, its contents are inserted into the memtable

Read Operation The server checks that the request – is well-formed – comes from an authorized sender A valid read is executed on a merged view of the SSTables and the memtable – since the SSTables and the memtable are lexicographically sorted data structures, the merged view can be formed efficiently
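Because every input is sorted by key, the merged view can be formed with a lazy k-way heap merge rather than a full scan; a sketch with illustrative data, where listing the newest source first makes its value win for duplicate keys:

```python
# The memtable and every SSTable are sorted by key, so a read forms the
# merged view with a lazy k-way heap merge. heapq.merge is stable for
# equal keys, so values from earlier (newer) sources are seen first.
import heapq

memtable = [("a", "new-a"), ("c", "new-c")]   # most recent updates
sstable1 = [("a", "old-a"), ("b", "b1")]      # newer SSTable
sstable2 = [("b", "b0"), ("d", "d0")]         # older SSTable

merged = {}
for key, value in heapq.merge(memtable, sstable1, sstable2,
                              key=lambda kv: kv[0]):
    merged.setdefault(key, value)  # first (newest) value per key wins
```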

Tablet Recovery Memtable – holds recently committed updates SSTables – hold older updates To recover a tablet – the tablet server reads the tablet's metadata from the METADATA table – the metadata contains the list of SSTables and a set of redo points – the server reads the SSTable indices into memory and reconstructs the memtable by replaying all updates committed since the redo points

5.4 Compactions Minor compaction – when the memtable reaches a size threshold, it is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS Merging compaction – merges a few SSTables and the memtable and writes out a single new SSTable Major compaction – rewrites all SSTables into exactly one SSTable – produces an SSTable that contains no deleted data or deletion entries – Bigtable cycles through all of its tablets and regularly applies major compactions to them
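A sketch of the minor-compaction trigger; the entry-count threshold here is illustrative (the real threshold is a size in bytes), and writes continue into a fresh memtable after each flush:

```python
# Sketch of the minor-compaction trigger: when the memtable hits a
# threshold it is frozen and flushed as a new immutable, sorted
# SSTable. The threshold and Tablet class are illustrative.

MEMTABLE_LIMIT = 4  # entries; a real system would use bytes

class Tablet:
    def __init__(self):
        self.memtable = {}
        self.sstables = []  # flushed SSTables, oldest first

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._minor_compaction()

    def _minor_compaction(self):
        frozen = sorted(self.memtable.items())  # SSTables are sorted
        self.sstables.append(frozen)
        self.memtable = {}                      # fresh memtable

t = Tablet()
for i in range(9):
    t.write(f"k{i}", str(i))
# Two flushes happened; one entry remains in the memtable.
```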

6. Refinements The implementation described in the previous section required a number of refinements to achieve the high performance, availability, and reliability required by our users.

Locality Groups Clients can group multiple column families together into a locality group A separate SSTable is generated for each locality group in each tablet Effects – more efficient reads, since unrelated column families are segregated – useful tuning parameters can be specified per locality group, e.g. a group can be declared in-memory (useful for frequently read data, such as the location column family in the METADATA table)

Compression Clients can control whether or not the SSTables for a locality group are compressed, and if so, which compression format is used Many clients use a two-pass custom compression scheme When similar data ends up clustered, applications achieve very good compression ratios

Caching for Read Performance Scan Cache – higher-level cache of key-value pairs returned by the SSTable interface – useful for applications that read the same data repeatedly Block Cache – lower-level cache of SSTable blocks read from GFS – useful for applications that read data close to data they recently read (e.g. sequential reads)

Bloom Filters A Bloom filter allows us to ask whether an SSTable might contain any data for a specified row/column pair Filters are kept in tablet server memory Most lookups for non-existent rows or columns then avoid touching disk
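A minimal Bloom filter sketch (the parameters m and k are illustrative): a "no" answer is definite, so a read can skip an SSTable without a disk seek, while a "yes" may be a false positive:

```python
# Minimal Bloom filter sketch: k hash probes set/test bits in an
# m-bit array. might_contain never returns a false negative, so a
# False result lets a read skip the SSTable entirely.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _probes(self, key):
        # Derive k independent probe positions from salted hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, key):
        for p in self._probes(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._probes(key))

f = BloomFilter()
f.add("com.cnn.www/contents:")
```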

Commit-Log Implementation Each tablet server appends mutations for all of its tablets to a single commit log During recovery, the log is sorted by key (table, row name, log sequence number) in parallel across tablet servers, so each recovering server needs only one sequential read of its entries To protect mutations from GFS latency spikes, each tablet server actually has two log-writing threads, each writing to its own log file; only one of the two is actively in use at a time

Speeding Up Tablet Recovery When the master moves a tablet, the source server – performs a first minor compaction – stops serving the tablet – performs a second (usually very fast) minor compaction on the log entries that arrived in between – unloads the tablet After this second minor compaction is complete, the tablet can be loaded on another tablet server without requiring any recovery of log entries

Exploiting Immutability All generated SSTables are immutable The memtable is the only mutable data structure accessed by both reads and writes – copy-on-write of memtable rows allows reads and writes to proceed in parallel Obsolete SSTables are removed by mark-and-sweep garbage collection over the METADATA table

7. Performance Evaluation