Bigtable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data
Authors: Chang et al., Google Inc.
Presenter: Victoria Cooper

Introduction
Goal: build a distributed storage system for structured data with:
1. Wide applicability
2. Scalability
3. High performance
4. High availability

Outline
- Data model
- API
- Building blocks of Bigtable
- Implementation and refinements
- Performance evaluation
- Real applications
- Conclusions

Data Model
- Clients get dynamic control over data layout and format
- Clients can control the locality properties of their data
- Names used for indexing can be arbitrary strings
- Clients can dynamically control whether data is served out of memory or from disk

Data Model
A sorted map: (row:string, column:string, time:int64) -> string
The map is indexed by row key, column key, and timestamp; row keys and values are uninterpreted arrays of bytes.
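A minimal sketch of this logical model as an in-memory nested map (illustrative only; real Bigtable distributes this map across many machines):

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>

    // Logical view of the Bigtable data model: a sorted map from
    // (row, column, timestamp) to an uninterpreted string value.
    // Timestamps sort in decreasing order so the newest version comes first.
    using Timestamp = int64_t;
    using Versions  = std::map<Timestamp, std::string, std::greater<Timestamp>>;
    using Columns   = std::map<std::string, Versions>;  // key: "family:qualifier"
    using BigtableMap = std::map<std::string, Columns>; // key: row, sorted lexicographically

    int main() {
      BigtableMap webtable;
      // Store two versions of a page under the "contents:" column.
      webtable["com.cnn.www"]["contents:"][3] = "<html>...</html>";
      webtable["com.cnn.www"]["contents:"][5] = "<html>newer</html>";
      // The newest version is the first entry in the Versions map.
      return webtable["com.cnn.www"]["contents:"].begin()->first == 5 ? 0 : 1;
    }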

Data Model: Figure 1

Rows: Tablets
The row range for a table is dynamically partitioned into tablets. A tablet is the unit of distribution and load balancing.

Rows
Reads of short row ranges are efficient and typically require communication with only a few machines. Clients can choose row keys so that their accesses get good locality.

Columns
- The column family is the unit of access control
- A family must be created before data can be stored under any column key in that family
- Tables should have a small number of column families, but may have an unbounded number of columns

Columns
Column keys use the syntax family:qualifier.
- Family names must be printable
- Qualifiers may be arbitrary strings

Columns: Example 1
Column family language stores the language each web page is written in; the family uses a single column key, whose cell stores the web page's language ID.

Columns: Example 2
Family anchor: each column key in the family represents a single anchor; the qualifier names the referring site, and the cell stores the link text.

Timestamps
- Each cell can hold multiple versions of the same data, indexed by timestamp
- Timestamps are 64-bit integers
- They can be assigned by Bigtable (real time) or by the client, who must generate unique timestamps to avoid collisions
- Versions are stored in decreasing timestamp order, so the most recent is read first

Timestamps
Two per-column-family settings let Bigtable garbage-collect cell versions automatically (see the sketch below):
1) Keep only the last n versions of a cell
2) Keep only versions that are new enough (e.g., written in the last seven days)
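A hedged sketch of the two policies, applied to the per-cell version map from the earlier data-model sketch (function names are illustrative):

    #include <cstdint>
    #include <functional>
    #include <iterator>
    #include <map>
    #include <string>

    using Timestamp = int64_t;
    using Versions = std::map<Timestamp, std::string, std::greater<Timestamp>>;

    // Policy 1: keep only the newest n versions of a cell.
    void KeepLastN(Versions& v, std::size_t n) {
      if (v.size() <= n) return;
      auto it = v.begin();
      std::advance(it, n);
      v.erase(it, v.end());  // older versions sort later under greater<>
    }

    // Policy 2: keep only versions strictly newer than a cutoff timestamp.
    void KeepNewerThan(Versions& v, Timestamp cutoff) {
      // With greater<>, lower_bound finds the first key <= cutoff.
      v.erase(v.lower_bound(cutoff), v.end());
    }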

API
- Functions to create and delete tables and column families
- Functions to change cluster, table, and column-family metadata (e.g., access control rights)

API: Figure 2
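The paper's Figure 2 shows writing to Bigtable in C++; reconstructed here from the paper, so treat the exact identifiers as approximate:

    // Open the table.
    Table *T = OpenOrDie("/bigtable/web/webtable");

    // Write a new anchor and delete an old anchor.
    RowMutation r1(T, "com.cnn.www");
    r1.Set("anchor:www.c-span.org", "CNN");
    r1.Delete("anchor:www.abc.com");
    Operation op;
    Apply(&op, &r1);  // performs an atomic mutation on the row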

API: Figure 3
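The paper's Figure 3 shows the read side: a Scanner iterating over all anchors in a row (again a reconstruction from the paper; identifiers approximate):

    Scanner scanner(T);
    ScanStream *stream;
    stream = scanner.FetchColumnFamily("anchor");
    stream->SetReturnAllVersions();
    scanner.Lookup("com.cnn.www");
    for (; !stream->Done(); stream->Next()) {
      printf("%s %s %lld %s\n",
             scanner.RowName(),
             stream->ColumnName(),
             stream->MicroTimestamp(),
             stream->Value());
    }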

API
- Single-row transactions (atomic read-modify-write on a single row key)
- Cells can be used as integer counters
- Execution of client-supplied scripts, written in Sawzall, on the servers
- Can be used as an input source and output target for MapReduce

Building Blocks
- Google File System (GFS): stores logs and data files
- Google SSTable: the file format for Bigtable data
- Chubby: a distributed lock service

Google File System (GFS)
- Stores Bigtable's logs and data files
- A Bigtable cluster typically runs in a shared pool of machines that also run many other distributed applications
- Bigtable depends on a cluster-management system to schedule jobs, manage resources, deal with machine failures, and monitor machine status

SSTable
The file format used to store Bigtable data: a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings.

SSTable
Internally, an SSTable is a sequence of blocks (typically 64 KB) plus a block index used to locate blocks; the index is loaded into memory when the SSTable is opened, so a lookup needs a single disk seek.

Chubby
A highly available, persistent distributed lock service. A Chubby cell has five active replicas; one is elected master and serves requests.

Chubby Namespace
The Chubby namespace consists of directories and small files; each directory or file can be used as a lock, and reads and writes to a file are atomic.

Bigtable and Chubby
Bigtable uses Chubby to:
- Ensure there is at most one active master at a time
- Store the bootstrap location of Bigtable data
- Discover tablet servers and finalize tablet server deaths
- Store column family (schema) information
- Store access control lists

Implementation
Three major components:
- A library linked into every client
- One master server: assigns tablets to tablet servers, balances load, garbage-collects files in GFS
- Many tablet servers: each manages a set of tablets

Implementation (diagram)
Clients communicate directly with tablet servers for reads and writes; client data does not move through the master.

Cluster
A Bigtable cluster stores a set of tables; each table consists of a set of tablets; each tablet contains all data associated with a row range.

Tablet Location: Figure 4

METADATA Table
- Stores the location of each tablet under a row key encoding the tablet's table identifier and its end row
- Each METADATA row stores roughly 1 KB of data in memory
- The client library caches tablet locations (see the sketch below):
  - If the cache is empty, locating a tablet takes 3 network round-trips
  - If the cache is stale, it can take up to 6, since stale entries are discovered only on misses and the client then moves up the hierarchy
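A sketch of the cached three-level lookup (hypothetical names; the stubs stand in for one network round-trip each, to Chubby, the root tablet, and a METADATA tablet):

    #include <optional>
    #include <string>

    struct TabletLocation { std::string server; };

    // Stubs standing in for one network round-trip each.
    TabletLocation ReadRootFromChubby() { return {"root-tablet-server"}; }
    TabletLocation ReadMetadataTablet(const TabletLocation&, const std::string&) {
      return {"metadata-tablet-server"};
    }
    TabletLocation ReadUserTablet(const TabletLocation&, const std::string&) {
      return {"user-tablet-server"};
    }
    std::optional<TabletLocation> CacheLookup(const std::string&) {
      return std::nullopt;  // pretend the cache is empty
    }

    TabletLocation LocateTablet(const std::string& row_key) {
      if (auto hit = CacheLookup(row_key)) return *hit;        // fast path: 0 trips
      TabletLocation root = ReadRootFromChubby();              // trip 1
      TabletLocation meta = ReadMetadataTablet(root, row_key); // trip 2
      return ReadUserTablet(meta, row_key);                    // trip 3
    }

With a stale cache, each level may cost a failed probe plus a corrective lookup, which is where the six-trip worst case comes from.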

Tablet Assignment
When a tablet server starts:
- It creates and acquires an exclusive lock on a uniquely named file in a specific Chubby directory (the servers directory)
- The master monitors this directory to discover tablet servers
A tablet server stops serving its tablets if it loses its exclusive lock:
- It tries to reacquire the lock as long as its file still exists
- If the file no longer exists, the server kills itself; once dead, it releases the lock so its tablets can be reassigned

Tablet Assignment
It is the master's job to detect tablet servers that stop serving:
- The master periodically asks each tablet server for the status of its lock
- If a server reports it has lost its lock, or the master cannot reach it, the master tries to acquire an exclusive lock on the server's Chubby file
- If the master succeeds, Chubby is alive and the tablet server is either dead or having trouble contacting Chubby, so the master deletes the server's file and moves its tablets to the set of unassigned tablets

Tablet Assignment
The set of existing tablets changes when:
- A table is created or deleted
- Two tablets are merged
- A tablet is split
Tablet splits are initiated by a tablet server, which commits the split by recording the new tablet's information in the METADATA table and then notifies the master.

Tablet Serving: Figure 5

Tablet Serving
Write operation (a sketch of this path follows):
- The server checks that the request is well-formed and that the sender is authorized (against a list of permitted writers in a Chubby file)
- A valid mutation is written to the commit log, then its contents are inserted into the memtable
Read operation:
- The server performs the same well-formedness and authorization checks
- A valid read is executed on a merged view of the sequence of SSTables and the memtable
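A sketch of the write path (hypothetical names; the stubs stand in for the Chubby ACL check, the commit log, and the memtable):

    #include <stdexcept>
    #include <string>

    struct Mutation { std::string row, column, value; };

    bool WellFormed(const Mutation& m) { return !m.row.empty() && !m.column.empty(); }
    bool AuthorizedWriter(const std::string&) { return true; }  // stand-in: Chubby ACL check
    void AppendToCommitLog(const Mutation&) {}   // stand-in: durable log in GFS
    void InsertIntoMemtable(const Mutation&) {}  // stand-in: in-memory sorted buffer

    void ApplyWrite(const std::string& user, const Mutation& m) {
      if (!WellFormed(m)) throw std::invalid_argument("malformed mutation");
      if (!AuthorizedWriter(user)) throw std::runtime_error("permission denied");
      AppendToCommitLog(m);   // durable first...
      InsertIntoMemtable(m);  // ...then visible to reads, merged with SSTables
    }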

Minor Compaction
When the memtable reaches a threshold size, it is frozen, converted into an SSTable, and written to GFS, and a new memtable is created. This:
- Shrinks memory usage on the tablet server
- Reduces the amount of data that must be read from the commit log during recovery

Merging Compaction
Reads the contents of a few SSTables and the memtable and writes out a single new SSTable; the inputs are discarded afterward. This bounds the number of SSTables a read must merge.

Major Compaction
A merging compaction that rewrites all SSTables into exactly one SSTable (illustrated below):
- The output contains no deleted data: deletion entries and the data they suppress are dropped
- Reclaims the resources used by deleted data
- Bigtable cycles through its tablets and applies major compactions periodically
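A toy illustration of the effect of a major compaction: merge sorted inputs so newer entries win, then drop tombstones and the values they suppressed (in-memory maps stand in for on-disk SSTables, and the tombstone sentinel is made up):

    #include <iterator>
    #include <map>
    #include <string>
    #include <vector>

    const std::string kTombstone = "__DELETED__";  // illustrative deletion marker

    std::map<std::string, std::string> MajorCompact(
        const std::vector<std::map<std::string, std::string>>& inputs) {
      std::map<std::string, std::string> merged;
      for (const auto& sstable : inputs)        // inputs ordered oldest -> newest
        for (const auto& [key, value] : sstable)
          merged[key] = value;                  // newer value overwrites older
      for (auto it = merged.begin(); it != merged.end();)
        it = (it->second == kTombstone) ? merged.erase(it) : std::next(it);
      return merged;  // contains no deleted data, so its space is reclaimed
    }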

Refinements
- Locality groups
- Compression
- Caching for read performance
- Bloom filters
- Commit-log implementation
- Speeding up tablet recovery
- Exploiting immutability

Refinements
Locality groups: clients can group multiple column families together into a locality group; each group gets a separate SSTable, which makes reads of the group efficient.
Compression: clients can specify a compression format per locality group; SSTable blocks are compressed with a two-pass scheme that is both fast and space-efficient.

Refinements
Caching for read performance: tablet servers use two levels of caching, the Scan Cache (key-value pairs) and the Block Cache (SSTable blocks read from GFS); together they reduce the number of disk accesses.
Bloom filters: created for the SSTables in a locality group; a filter tells the server whether an SSTable might contain data for a given row/column pair, so most lookups for nonexistent data never touch disk (a sketch follows).
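A minimal Bloom filter sketch showing the "might contain" check (sizes and hashing are illustrative; the paper does not specify Bigtable's filter implementation at this level):

    #include <bitset>
    #include <functional>
    #include <string>

    // Set k bits per key on insert; a lookup answering "false" guarantees
    // absence, while "true" only means the key *might* be present. This lets
    // a tablet server skip SSTables that cannot hold a row/column pair.
    class BloomFilter {
     public:
      void Add(const std::string& key) {
        for (std::size_t i = 0; i < kHashes; ++i) bits_.set(Hash(key, i));
      }
      bool MightContain(const std::string& key) const {
        for (std::size_t i = 0; i < kHashes; ++i)
          if (!bits_.test(Hash(key, i))) return false;  // definitely absent
        return true;  // possibly present (small false-positive rate)
      }
     private:
      static constexpr std::size_t kBits = 1 << 16, kHashes = 4;
      std::size_t Hash(const std::string& key, std::size_t salt) const {
        return std::hash<std::string>{}(key + static_cast<char>('A' + salt)) % kBits;
      }
      std::bitset<kBits> bits_;
    };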

Refinements
Commit-log implementation: mutations are appended to a single commit log per tablet server, co-mingling mutations for different tablets. One log gives significant performance benefits but complicates recovery; to avoid reading the full log once per recovering tablet, the log entries are first sorted so each tablet's mutations are contiguous.
Speeding up tablet recovery: before the master moves a tablet to a different server, the source server performs a minor compaction. After Tablet Server 1 stops serving the tablet, it can be loaded onto Tablet Server 2 with no recovery of log entries required.

Refinements
Exploiting immutability:
- SSTables are immutable, so reads need no synchronization of file accesses
- Permanently removing deleted data becomes garbage collection of obsolete SSTables
- Tablets can be split quickly, since child tablets share the parent's SSTables
- The memtable is the only mutable structure; each memtable row is copy-on-write, so reads and writes can proceed in parallel

Performance Evaluation
Setup: a Bigtable cluster of N tablet servers; the benchmarks were sized so that roughly 1 GB of data was written per tablet server to GFS; client machines generated the load; all machines had sufficient physical memory for their working sets.

Performance Evaluation
- The machines were arranged in a two-level tree-shaped switched network with roughly 100-200 Gbps of aggregate bandwidth available at the root
- Tablet servers, the master, test clients, and GFS servers all ran on the same set of machines
- Every machine ran a GFS server; each also ran either a tablet server or a client process, alongside processes from other jobs

Performance Evaluation
Six benchmarks: sequential writes, random writes, sequential reads, random reads, random reads from memory, and scans.

Write Benchmarks
Sequential write:
- Used row keys with names 0 to R-1, where R is the number of distinct Bigtable row keys involved in the test
- The key space was partitioned into 10N equal-sized ranges, assigned to the N clients by a central scheduler (dynamic assignment)
- Wrote a single string under each row key; row keys were distinct
Random write (see the sketch below):
- Similar to the sequential write, except the row key was hashed modulo R immediately before writing
- Hashing spread the write load roughly evenly across the entire row space
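A sketch of the two key-generation schemes (std::hash is a stand-in; the paper only says the key was hashed modulo R):

    #include <cstdint>
    #include <functional>
    #include <string>

    // Sequential benchmark: write keys 0..R-1 in order within assigned ranges.
    std::string SequentialKey(uint64_t i) { return std::to_string(i); }

    // Random benchmark: hash the key modulo R first, spreading writes
    // roughly uniformly across the whole row space.
    std::string RandomKey(uint64_t i, uint64_t R) {
      return std::to_string(std::hash<uint64_t>{}(i) % R);
    }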

Read Benchmarks
Sequential read: generated row keys exactly as the sequential write benchmark did, but read the string stored under each row key instead of writing it.
Random read: similar, but hashed the row key modulo R before reading the string stored under it.

Read/Scan Benchmarks
Random reads from memory: similar to the random read benchmark, but the locality group holding the benchmark data is marked in-memory, so reads are served from tablet-server memory rather than GFS.
Scan: similar to the sequential read benchmark, but uses Bigtable's API support for scanning over all values in a row range, which reduces the number of RPCs.

Performance Evaluation: Figure 6

Single Tablet-Server Performance
- Random reads are slowest: each read transfers a 64 KB SSTable block from GFS to use a single value
- Random reads from memory are much faster than random reads from disk
- Random and sequential writes perform similarly: both append mutations to the same commit log
- Sequential reads outperform random reads, since block caching helps
- Scans are faster still than sequential reads

Scaling
- Increasing the number of tablet servers from 1 to 500 raises aggregate throughput, but far from linearly
- Per-server throughput drops significantly going from 1 to 50 servers, largely due to load imbalance and network saturation
- Random reads scale worst: aggregate throughput grows only about 100x for a 500x increase in servers

Real Applications
Google Analytics, Personalized Search, Google Earth, Google Finance, Orkut, Writely (later Google Docs)

Real Applications: Table 1

Real Applications: Table 2

Personalized Search
- Each user's data lives in Bigtable, with the user ID as the row name
- All user actions are stored, with a separate column family per type of action
- The data is replicated across several Bigtable clusters

Google Earth
- A preprocessing table stores raw imagery; the data is cleaned and consolidated into a final serving table
- Rows in the final table are named by geographic segment
- Column families track the sources of data for each segment

Google Analytics
- A raw click table holds a row for each end-user session; the row name is the tuple (website's name, session creation time)
- A summary table stores various predefined summaries for each website, generated from the raw click table

Lessons
- Large distributed systems are vulnerable to many failure types: memory and network corruption, bugs in the systems they depend on, planned and unplanned maintenance
- Delay adding new features until it is clear how they will be used
- Proper system-level monitoring is essential
- There is great value in simple designs

Conclusion
- Bigtable is Google's distributed storage system for structured data, used by many other Google applications
- Bigtable is scalable and efficient
- Users within Google have found Bigtable easy to use and a good fit for their applications

Future Work
- Support for secondary indices
- Infrastructure for building cross-data-center replicated Bigtables with multiple master replicas
- Keeping Bigtable working well and fixing bugs as they arise

Thanks/Questions?