Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.

Presentation transcript:

Google Bigtable
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc. Appeared in OSDI ’06
Slides adapted from a reading group presentation by Erik Paulson, U. Washington, October 2006

2 Google Scale
Lots of data
–Copies of the web, satellite data, user data, USENET, Subversion backing store
Many incoming requests
No commercial system big enough
–Couldn’t afford it if there was one
–Might not have made appropriate design choices
Firm believers in the End-to-End argument
450,000 machines (NYTimes estimate, June ’06)

3 Building Blocks
Scheduler (Google WorkQueue)
Google File System
Chubby lock service
Two other pieces helpful but not required
–Sawzall
–MapReduce
BigTable: build a more application-friendly storage service using these parts

4 Google File System
Large-scale distributed “file system”
Master: responsible for metadata
Chunk servers: responsible for reading and writing large chunks of data
Chunks replicated on 3 machines; master responsible for ensuring replicas exist
OSDI ’04 paper

5 Chubby
{lock/file/name} service
Coarse-grained locks; can store a small amount of data in a lock
5 replicas; need a majority vote to be active
Also an OSDI ’06 paper

6 Data model: a big map
(row, column, timestamp) triple for key
Each value is an uninterpreted array of bytes
Arbitrary “columns” on a row-by-row basis
Column family:qualifier. Family is heavyweight, qualifier lightweight
Column-oriented physical store; rows are sparse!
Lookup, insert, delete API
Each read or write of data under a single row key is atomic
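A minimal sketch of this data model as a sorted in-memory map, assuming a single process and ignoring persistence; the names ToyTable, Put, and GetLatest are invented for illustration, not the real Bigtable API:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <map>
#include <string>
#include <tuple>

// The table is a sorted map from (row, column, timestamp) to an
// uninterpreted byte string. Timestamps are stored negated so that, for a
// given row and column, the newest version of the cell sorts first.
using CellKey = std::tuple<std::string, std::string, int64_t>;

class ToyTable {
 public:
  void Put(const std::string& row, const std::string& col, int64_t timestamp,
           const std::string& value) {
    cells_[{row, col, -timestamp}] = value;
  }

  // Returns the newest version of a cell, or "" if the cell does not exist.
  std::string GetLatest(const std::string& row, const std::string& col) const {
    auto it = cells_.lower_bound(
        {row, col, std::numeric_limits<int64_t>::min()});
    if (it != cells_.end() && std::get<0>(it->first) == row &&
        std::get<1>(it->first) == col) {
      return it->second;
    }
    return "";
  }

 private:
  std::map<CellKey, std::string> cells_;
};

int main() {
  ToyTable t;
  t.Put("com.cnn.www", "contents:", 3, "<html>v3</html>");
  t.Put("com.cnn.www", "contents:", 5, "<html>v5</html>");
  std::cout << t.GetLatest("com.cnn.www", "contents:") << "\n";  // prints the v5 page
  return 0;
}
```

Storing the negated timestamp keeps the most recent version of each cell first in sort order, mirroring Bigtable’s newest-first ordering of cell versions.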

7 BigTable vs. Relational DB
No table-wide integrity constraints
No multi-row transactions
Uninterpreted values: no aggregation over data
Immutable data, à la versioning DBs
–Can specify: keep last N versions or last N days
C++ functions, not SQL (no complex queries)
Clients indicate what data to cache in memory
Data stored lexicographically sorted: clients control locality through the naming of rows & columns (see the row-key example below)
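As a concrete illustration of controlling locality through row naming, the Bigtable paper stores web pages under reversed-hostname row keys so that pages from the same site sort next to each other; the specific keys below are made up:

```cpp
#include <iostream>
#include <set>
#include <string>

int main() {
  // Reversing the hostname places pages from the same domain (and related
  // subdomains) next to each other in the lexicographic row order.
  std::set<std::string> rows = {
      "com.cnn.money/markets.html",
      "com.cnn.www/index.html",
      "com.cnn.www/world/index.html",
      "org.example.www/about.html",
  };
  for (const auto& row : rows) std::cout << row << "\n";
  // The output is sorted, so all com.cnn.* rows form a contiguous range
  // that a scan (or a single tablet) can cover efficiently.
  return 0;
}
```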

8 SSTable
Immutable, sorted file of key-value pairs
Chunks of data plus an index
–Index is of block ranges, not values
–Index loaded into memory when the SSTable is opened
–Lookup is a single disk seek
Alternatively, a client can load the whole SSTable into memory
[Diagram: an SSTable composed of 64K data blocks plus a block index]
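A rough sketch of what a single-seek lookup looks like, assuming an in-memory block index that maps the last key of each 64K block to that block’s file offset; ToySSTableReader and FindBlock are illustrative names, not the real implementation:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy SSTable reader: the block index (last key of each block -> file
// offset) is loaded into memory when the file is opened, so locating the
// block that may contain a key is a pure in-memory operation, and reading
// that block costs a single disk seek.
class ToySSTableReader {
 public:
  explicit ToySSTableReader(std::map<std::string, uint64_t> block_index)
      : block_index_(std::move(block_index)) {}

  // Returns the file offset of the only block that could hold `key`,
  // or std::nullopt if the key is past the end of the table.
  std::optional<uint64_t> FindBlock(const std::string& key) const {
    auto it = block_index_.lower_bound(key);  // first block whose last key >= key
    if (it == block_index_.end()) return std::nullopt;
    return it->second;
    // A real reader would now seek() to this offset, read the 64K block,
    // and binary-search the sorted key-value pairs inside it.
  }

 private:
  std::map<std::string, uint64_t> block_index_;
};
```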

9 Tablet
Contains some range of rows of the table
Unit of distribution & load balance
Built out of multiple SSTables
[Diagram: a tablet with start key “aardvark” and end key “apple”, built from two SSTables, each a block index plus 64K blocks]

10 Table
Multiple tablets make up the table
SSTables can be shared
Tablets do not overlap, SSTables can overlap
[Diagram: two adjacent tablets, aardvark..apple and apple_two_E..boat, sharing one SSTable]
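Because tablets cover disjoint, sorted row ranges, the tablet responsible for a row can be found with an ordered lookup on tablet end keys. A simplified single-machine sketch, assuming half-open (start, end] ranges; the Tablet struct and TabletDirectory class are invented for illustration:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A tablet covers the half-open row range (start_key, end_key].
struct Tablet {
  std::string start_key;
  std::string end_key;
  std::vector<std::string> sstable_names;  // SSTables backing this tablet (may be shared)
};

// A table is a set of non-overlapping tablets indexed by end key, so the
// tablet that owns a row is the first one whose end key is >= the row.
class TabletDirectory {
 public:
  void AddTablet(Tablet t) {
    std::string end = t.end_key;
    tablets_[end] = std::move(t);
  }

  const Tablet* FindTablet(const std::string& row) const {
    auto it = tablets_.lower_bound(row);
    if (it == tablets_.end()) return nullptr;         // row is past the last tablet
    if (row <= it->second.start_key) return nullptr;  // row is before this tablet's range
    return &it->second;
  }

 private:
  std::map<std::string, Tablet> tablets_;  // end key -> tablet
};
```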

11 Finding a tablet
Client library caches tablet locations
Metadata table includes a log of all events pertaining to each tablet
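A rough sketch of the client-side caching idea, assuming a pluggable lookup callback that walks the location (METADATA) hierarchy on a miss; the names and the per-row cache key are illustrative simplifications (a real cache keys on tablet row ranges rather than individual rows):

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

struct TabletLocation {
  std::string tablet_server;  // address of the server currently holding the tablet
};

// The client library remembers where tablets live, so most reads and writes
// go straight to the right tablet server without consulting METADATA again.
class TabletLocationCache {
 public:
  using MetadataLookup = std::function<TabletLocation(
      const std::string& table, const std::string& row)>;

  explicit TabletLocationCache(MetadataLookup lookup_in_metadata)
      : lookup_in_metadata_(std::move(lookup_in_metadata)) {}

  TabletLocation Locate(const std::string& table, const std::string& row) {
    const std::string key = table + "/" + row;
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;             // cache hit: no extra lookups
    TabletLocation loc = lookup_in_metadata_(table, row);  // miss: walk the hierarchy
    cache_[key] = loc;                                     // remember for next time
    return loc;
  }

  // Called when a cached server turns out no longer to serve the tablet.
  void Invalidate(const std::string& table, const std::string& row) {
    cache_.erase(table + "/" + row);
  }

 private:
  MetadataLookup lookup_in_metadata_;
  std::unordered_map<std::string, TabletLocation> cache_;
};
```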

12 Servers
Tablet servers manage tablets, multiple tablets per server. Each tablet is ~100-200 MB
–Each tablet lives at only one server
–Tablet server splits tablets that get too big
Master responsible for load balancing and fault tolerance
–Uses Chubby to monitor health of tablet servers, restarts failed servers
–GFS replicates data. Prefer to start a tablet server on the same machine where the data already is

13 Editing/Reading a table
Mutations are committed to a commit log (in GFS)
Then applied to an in-memory version (memtable)
For concurrency, each memtable row is copy-on-write
Reads are applied to a merged view of the SSTables & memtable
Reads & writes continue during tablet splits and merges
[Diagram: a tablet (apple_two_E..boat) backed by two sorted SSTables plus a sorted memtable holding recent inserts and deletes]
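A condensed sketch of this write and read path for one tablet, with the GFS commit log and on-disk SSTables reduced to in-memory stand-ins; ToyTablet, the deletion marker, and the log format are all illustrative:

```cpp
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One tablet: a stack of immutable, sorted SSTables plus a mutable, sorted
// memtable. Writes go to the commit log and then to the memtable; reads
// merge the memtable with the SSTables, newest layer first.
class ToyTablet {
 public:
  void Write(const std::string& row, const std::string& value) {
    log_.push_back(row + "=" + value);  // 1. append mutation to the commit log
    memtable_[row] = value;             // 2. apply it to the memtable
  }

  void Delete(const std::string& row) {
    log_.push_back(row + "=<deleted>");
    memtable_[row] = kDeletionMarker;   // deletions are markers, not erasures
  }

  std::optional<std::string> Read(const std::string& row) const {
    // Newest data wins: check the memtable, then SSTables newest to oldest.
    auto m = memtable_.find(row);
    if (m != memtable_.end()) return Resolve(m->second);
    for (auto it = sstables_.rbegin(); it != sstables_.rend(); ++it) {
      auto s = it->find(row);
      if (s != it->end()) return Resolve(s->second);
    }
    return std::nullopt;
  }

 private:
  static constexpr const char* kDeletionMarker = "__DELETED__";

  static std::optional<std::string> Resolve(const std::string& v) {
    if (v == kDeletionMarker) return std::nullopt;  // cell was deleted
    return v;
  }

  std::vector<std::string> log_;                        // stand-in for the GFS commit log
  std::map<std::string, std::string> memtable_;         // sorted in-memory buffer
  std::vector<std::map<std::string, std::string>> sstables_;  // oldest..newest
};

int main() {
  ToyTablet t;
  t.Write("banana", "v1");
  t.Delete("banana");
  std::cout << t.Read("banana").value_or("(not found)") << "\n";  // prints "(not found)"
  return 0;
}
```

Note how a delete is recorded as a marker that shadows older data; the markers are only dropped later, during a major compaction (next slide).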

14 Compactions
Minor compaction – convert a full memtable into an SSTable, and start a new memtable
–Reduces memory usage
–Reduces log traffic on restart
Merging compaction
–Reduces the number of SSTables
–Good place to apply a policy such as “keep only N versions”
Major compaction
–A merging compaction that results in only one SSTable
–No deletion records, only live data
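A minimal sketch of the minor and major compaction steps over the same kind of in-memory stand-ins; the deletion_marker parameter is illustrative, and version garbage collection is omitted:

```cpp
#include <map>
#include <string>
#include <vector>

using Memtable = std::map<std::string, std::string>;
using SSTable = std::map<std::string, std::string>;  // in-memory stand-in for an on-disk file

struct CompactingTablet {
  Memtable memtable;
  std::vector<SSTable> sstables;  // ordered oldest..newest

  // Minor compaction: freeze the current memtable as a new immutable SSTable
  // and start an empty one. Frees memory and shortens the commit log that
  // must be replayed after a restart.
  void MinorCompaction() {
    sstables.push_back(memtable);
    memtable.clear();
  }

  // Major compaction: merge everything into a single SSTable. Because every
  // layer is visited, deletion markers (and the data they shadow) can finally
  // be dropped instead of carried forward.
  void MajorCompaction(const std::string& deletion_marker) {
    SSTable merged;
    for (const auto& table : sstables)
      for (const auto& [key, value] : table) merged[key] = value;  // newer layers overwrite older
    for (const auto& [key, value] : memtable) merged[key] = value;
    for (auto it = merged.begin(); it != merged.end();) {
      if (it->second == deletion_marker) it = merged.erase(it);
      else ++it;
    }
    sstables.assign(1, merged);
    memtable.clear();
  }
};
```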

15 Locality Groups
Group column families together into an SSTable
–Avoid mingling data, e.g. page contents and page metadata
–Can keep some locality groups entirely in memory
Can compress locality groups (10:1 typical)
–Popular compression scheme: (1) long common strings across a large window, (2) standard compression across small 16KB windows
Bloom filters on locality groups – avoid searching SSTables that cannot contain the key (see the sketch below)
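A small Bloom filter sketch to show why this avoids disk reads: a per-SSTable bit array answers "definitely not present" or "possibly present" entirely from memory, so lookups for rows an SSTable cannot contain skip the disk seek. The two-hash scheme and the filter size here are arbitrary choices for the sketch:

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Toy Bloom filter over the keys of one SSTable. "false" is definite, so the
// tablet server can skip reading that SSTable from disk; "true" may be a
// false positive, in which case the normal lookup proceeds.
class ToyBloomFilter {
 public:
  void Add(const std::string& key) {
    bits_.set(Hash1(key));
    bits_.set(Hash2(key));
  }

  bool MightContain(const std::string& key) const {
    return bits_.test(Hash1(key)) && bits_.test(Hash2(key));
  }

 private:
  static constexpr std::size_t kBits = 1 << 16;  // filter size, arbitrary for this sketch
  static std::size_t Hash1(const std::string& key) {
    return std::hash<std::string>{}(key) % kBits;
  }
  static std::size_t Hash2(const std::string& key) {
    return std::hash<std::string>{}("salt:" + key) % kBits;
  }
  std::bitset<kBits> bits_;
};
```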

16 Microbenchmarks: Throughput
Reports the number of 1000-byte values read/written per second per tablet server
1786 machines, each with two dual-core 2 GHz Opteron chips, large physical memory, two 400 GB IDE hard drives, and a Gb Ethernet LAN
Tablet servers configured to use 1 GB of memory; # clients = # servers

17 Aggregate rate

18 Applications at Google

19 Lessons learned
Only implement some of the requirements, since the last one is probably not needed
Many types of failure are possible
Big systems need proper systems-level monitoring
Value simple designs