Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay.

Slides:



Advertisements
Similar presentations
Introduction to cloud computing
Advertisements

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China
Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.
Bigtable: A Distributed Storage System for Structured Data Fay Chang et al. (Google, Inc.) Presenter: Kyungho Jeon 10/22/2012 Fall.
Homework 2 What is the role of the secondary database that we have to create? What is the role of the secondary database that we have to create?  A relational.
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Big Table Alon pluda.
Map/Reduce in Practice Hadoop, Hbase, MongoDB, Accumulo, and related Map/Reduce- enabled data stores.
Bigtable: A Distributed Storage System for Structured Data Presenter: Guangdong Liu Jan 24 th, 2012.
Lecture 7 – Bigtable CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
7/2/2015EECS 584, Fall Bigtable: A Distributed Storage System for Structured Data Jing Zhang Reference: Handling Large Datasets at Google: Current.
 Pouria Pirzadeh  3 rd year student in CS  PhD  Vandana Ayyalasomayajula  1 st year student in CS  Masters.
Authors Fay Chang Jeffrey Dean Sanjay Ghemawat Wilson Hsieh Deborah Wallach Mike Burrows Tushar Chandra Andrew Fikes Robert Gruber Bigtable: A Distributed.
BigTable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Distributed storage for structured data
Bigtable: A Distributed Storage System for Structured Data
BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.
Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.
Google Distributed System and Hadoop Lakshmi Thyagarajan.
Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.
Author: Murray Stokely Presenter: Pim van Pelt Distributed Computing at Google.
BigTable and Google File System
Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
1 Large-scale Incremental Processing Using Distributed Transactions and Notifications Written By Daniel Peng and Frank Dabek Presented By Michael Over.
SOFTWARE SYSTEMS DEVELOPMENT MAP-REDUCE, Hadoop, HBase.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat.
Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
BigTable and Accumulo CMSC 461 Michael Wilson. BigTable  This was Google’s original distributed data concept  Key value store  Meant to be scaled up.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
1 Dennis Kafura – CS5204 – Operating Systems Big Table: Distributed Storage System For Structured Data Sergejs Melderis 1.
Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.
Bigtable: A Distributed Storage System for Structured Data 1.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.
Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cloud Data Models Lecturer.
Key/Value Stores CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
Large Scale Machine Translation Architectures Qin Gao.
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.
CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Bigtable : A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows,
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.
Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.
Apache Accumulo CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Presenter: Chao-Han Tsai (Some slides adapted from the Google’s series lectures)
MapReduce: Simplied Data Processing on Large Clusters Written By: Jeffrey Dean and Sanjay Ghemawat Presented By: Manoher Shatha & Naveen Kumar Ratkal.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Bigtable A Distributed Storage System for Structured Data.
Google Cloud computing techniques (Lecture 03) 18th Jan 20161Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Chapter 3 System Models.
Bigtable: A Distributed Storage System for Structured Data Written By: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike.
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
CSCI5570 Large Scale Data Processing Systems
Bigtable A Distributed Storage System for Structured Data
Lecture 7 Bigtable Instructor: Weidong Shi (Larry), PhD
Column-Based.
Bigtable: A Distributed Storage System for Structured Data
GFS and BigTable (Lecture 20, cs262a)
Data Management in the Cloud
CSE-291 (Cloud Computing) Fall 2016
Cloud Computing Storage Systems
A Distributed Storage System for Structured Data
MapReduce: Simplified Data Processing on Large Clusters
Presentation transcript:

Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Webtable Example How many web pages are there? Recently Google reported finding 1 trillion unique URLs, which would require 80 terabytes to store. How much storage is required to hold a single snapshot of the Web? 1 trillion web pages at 100K bytes per page requires 100 petabytes How is the data stored in the Bigtable? 2013/4/1An Example2

Introduction Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability. 2013/4/1Introduction3

Data Model A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. 2013/4/1Data Model4

Rows The row keys in a table are arbitrary strings (currently up to 64KB in size, although bytes is a typical size for most of our users). Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing. Column Families Column keys are grouped into sets called column families, which form the basic unit of access control. A column key is named using the following syntax: family:qualier. Column family names must be printable, but qualiers may be arbitrary strings. Timestamps Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers. They can be assigned by Bigtable, in which case they represent “real time” in microseconds, or be explicitly assigned by client 2013/4/1Data Model5

Webtable Example Rows In Webtable, pages in the same domain are grouped together into contiguous rows by reversing the hostname components of the URLs. For example, we store data for maps.google.com/index.html under the key com.google.maps/index.html. Storing pages from the same domain near each other makes some host and domain analyses more efficient. Column Families An example column family for the Webtable is language, which stores the language in which a web page was written. We use only one column key in the language family, and it stores each web page's language ID. Timestamps In our Webtable example, we set the timestamps of the crawled pages stored in the contents: column to the times at which these page versions were actually crawled. The garbage-collection mechanism lets us keep only the most recent several versions,which we specify, of every page. 2013/4/1An Example6

API The Bigtable API provides functions for creating and deleting tables and column families. It also provides functions for changing cluster, table, and column family metadata, such as access control rights. 2013/4/1API7

API Bigtable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key. Bigtable allows cells to be used as integer counters. Bigtable supports the execution of client-supplied scripts in the address spaces of the servers. The scripts are written in a language developed at Google for processing data called Sawzall. Bigtable can be used with MapReduce, a framework for running large-scale parallel computations developed at Google. 2013/4/1API8

Building Blocks Bigtable is built on several other pieces of Google infrastructure. Bigtable uses the distributed Google File System (GFS) to store log and data files. The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Bigtable relies on a highly-available and persistent distributed lock service called Chubby. Bigtable uses Chubby for a variety of tasks: to ensure that there is at most one active master at any time; to store the bootstrap location of Bigtable data; to discover tablet servers and nalize tablet server deaths; to store Bigtable schema information (the column family information for each table); and to store access control lists. 2013/4/1Building Blocks9

Implementation The Bigtable implementation has three major components: a library that is linked into every client, one master server, and many tablet servers. The master is responsible for assigning tablets to tablet servers, detecting the addition and expiration of tablet servers, balancing tablet-server load, and garbage collection of files in GFS. Each tablet server manages a set of tablets. The tablet server handles read and write requests to the tablets that it has loaded, and also splits tablets that have grown too large. As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes. 2013/4/1Implementation10

Implementation Tablet Location Using a three-level hierarchy analogous to that of a B+- tree to store tablet location information. Tablet Assignment Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. 2013/4/1Implementation11

Implementation Tablet Serving Compactions As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS. A merging compaction that rewrites all SSTables into exactly one SSTable. 2013/4/1Implementation12

Refinements Locality groups Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet. Segregating column families that are not typically accessed together into separate locality groups enables more effcient reads. Compression Clients can control whether or not the SSTables for a locality group are compressed, and if so, which compression format is used. The user-specified compression format is applied to each SSTable block. 2013/4/1Refinements13

Refinements Caching for read performance To improve read performance, tablet servers use two levels of caching. The Scan Cache is a higher-level cache that caches the key-value pairs returned by the SSTable interface to the tablet server code. The Block Cache is a lower-level cache that caches SSTables blocks that were read from GFS. Bloom filters A Bloom filter allows us to ask whether an SSTable might contain any data for a specified row/column pair. For certain applications, a small amount of tablet server memory used for storing Bloom filters drastically reduces the number of disk seeks required for read operations. Commit-log implementation Using one log provides significant performance benefits during normal operation, but it complicates recovery. 2013/4/1Refinements14

Refinements Speeding up tablet recovery If the master moves a tablet from one tablet server to another, the source tablet server first does a minor compaction on that tablet. After finishing this compaction, the tablet server stops serving the tablet. Before it actually unloads the tablet, the tablet server does another(usually very fast) minor compaction to eliminate any remaining uncompacted state in the tablet server's log that arrived while the first minor compaction was being performed. After this second minor compaction is complete, the tablet can be loaded on another tablet server without requiring any recovery of log entries. Exploiting immutability Besides the SSTable caches, various other parts of the Bigtable system have been simplified by the fact that all of the SSTables that we generate are immutable. 2013/4/1Refinements15

Pros Introduce the structure and function of Bigtable comprehensively. Discuss how Bigtable face to different requirements. Introduce the experience during the process of designing Bigtable. 2013/4/1Pros16

Cons According to professor Eric Brewer’s CAP theory, consistency, availability and partition tolerance cannot be met by a distributed system at the same time. As a typical AP database, consistency, its weakness, is not discussed in this paper. 2013/4/1Cons17