Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data.

Slides:



Advertisements
Similar presentations
Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner.
Advertisements

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China
Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.
Bigtable: A Distributed Storage System for Structured Data Fay Chang et al. (Google, Inc.) Presenter: Kyungho Jeon 10/22/2012 Fall.
Homework 2 What is the role of the secondary database that we have to create? What is the role of the secondary database that we have to create?  A relational.
CS525: Special Topics in DBs Large-Scale Data Management HBase Spring 2013 WPI, Mohamed Eltabakh 1.
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Big Table Alon pluda.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Google Chubby Lock Service Steve Ko Computer Sciences and Engineering University at Buffalo.
Bigtable: A Distributed Storage System for Structured Data Presenter: Guangdong Liu Jan 24 th, 2012.
HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam
Lecture 7 – Bigtable CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
7/2/2015EECS 584, Fall Bigtable: A Distributed Storage System for Structured Data Jing Zhang Reference: Handling Large Datasets at Google: Current.
 Pouria Pirzadeh  3 rd year student in CS  PhD  Vandana Ayyalasomayajula  1 st year student in CS  Masters.
Authors Fay Chang Jeffrey Dean Sanjay Ghemawat Wilson Hsieh Deborah Wallach Mike Burrows Tushar Chandra Andrew Fikes Robert Gruber Bigtable: A Distributed.
BigTable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Distributed storage for structured data
Bigtable: A Distributed Storage System for Structured Data
BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.
Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.
Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.
CSC 536 Lecture 8. Outline Reactive Streams Streams Reactive streams Akka streams Case study Google infrastructure (part I)
Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.
1 The Google File System Reporter: You-Wei Zhang.
1 Large-scale Incremental Processing Using Distributed Transactions and Notifications Written By Daniel Peng and Frank Dabek Presented By Michael Over.
SaaS 傅汝緯 李碩元 林子驥 1. What is SaaS?  Definition :Software as a service  a software delivery model in which software and associated data are centrally.
Computer Science iBigTable: Practical Data Integrity for BigTable in Public Cloud CODASPY 2013 Wei Wei, Ting Yu, Rui Xue 1/40.
Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay.
BigTable and Accumulo CMSC 461 Michael Wilson. BigTable  This was Google’s original distributed data concept  Key value store  Meant to be scaled up.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
1 Dennis Kafura – CS5204 – Operating Systems Big Table: Distributed Storage System For Structured Data Sergejs Melderis 1.
Bigtable: A Distributed Storage System for Structured Data 1.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.
Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cloud Data Models Lecturer.
CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Google Chubby Lock Service Steve Ko Computer Sciences and Engineering University at Buffalo.
CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Bigtable : A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows,
Bigtable: A Distributed Storage System for Structured Data
1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva James Kinley EMEA Solutions Architect, Cloudera.
Data Model and Storage in NoSQL Systems (Bigtable, HBase) 1 Slides from Mohamed Eltabakh.
Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.
Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.
Bigtable A Distributed Storage System for Structured Data.
Google Cloud computing techniques (Lecture 03) 18th Jan 20161Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
Big Data Infrastructure Week 10: Mutable State (1/2) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States.
From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Chapter 3 System Models.
Bigtable: A Distributed Storage System for Structured Data Written By: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike.
and Big Data Storage Systems
CSCI5570 Large Scale Data Processing Systems
Bigtable A Distributed Storage System for Structured Data
Lecture 7 Bigtable Instructor: Weidong Shi (Larry), PhD
Column-Based.
HBase Mohamed Eltabakh
Bigtable: A Distributed Storage System for Structured Data
GFS and BigTable (Lecture 20, cs262a)
Data Management in the Cloud
NoSQL Database and Application
CSE-291 (Cloud Computing) Fall 2016
NOSQL.
Gowtham Rajappan.
NOSQL databases and Big Data Storage Systems
Google and Cloud Computing
Data-Intensive Distributed Computing
آزمايشگاه سيستمهای هوشمند علی کمالی زمستان 95
Cloud Computing Storage Systems
A Distributed Storage System for Structured Data
Presentation transcript:

Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data

Why a new Data Model? Well, consider Google… – Bigtable is designed to reliably scale to petabytes of data and thousands of machines. Various Goals – wide applicability – Scalability – high performance – high availability 2

Applications Numerous Google products and projects – Google Analytics – Google Finance – Orkut – Personalized Search – Writely – Google Earth 3

The Goal… To provide clients with a simple data model that supports dynamic control over data layout and format and allows clients to reason about the locality properties of the data represented in the underlying storage. 4

The Structure Bigtable does not support a full relational data model – Data is indexed using row and column names that can be arbitrary strings – Treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings – The schema parameters let clients dynamically control whether to serve data out of memory or from disk 5

Data Model A Bigtable is a sparse, distributed, persistent multidimensional sorted map. – The map is indexed by a row key, column key, and a timestamp – Each value in the map is an uninterpreted array of bytes (row:string, column:string, time:int64)  string 6

Rows and Columns The row keys in a table are arbitrary strings – Bigtable maintains data in lexicographic order by row key – The row range for a table is dynamically partitioned – Each row range is called a tablet, which is the unit of distribution and load balancing. reads of short row ranges are efficient and typically require communication with only a small number of machines 7

Rows and Columns Column keys are grouped into sets called column families – These form the basic unit of access control – All data stored in a column family is usually of the same type – After a family has been created, any column key within the family can be used for queries A column key is named using the following syntax: family:qualifier 8

Data Model 9 The qualifier is the name of the referring site; the cell contents is the link text.

Timestamps Each cell in a Bigtable can contain multiple versions of the same data – Each version is indexed by a timestamp 10

Writing to Bigtable // Open the table Table *T = OpenOrDie("/bigtable/web/webtable"); // Write a new anchor and delete an old anchor RowMutation r1(T, "com.cnn.www"); r1.Set("anchor: "CNN"); r1.Delete("anchor: Operation op; Apply(&op, &r1); 11

Writing to Bigtable Scanner scanner(T); ScanStream *stream; stream = scanner.FetchColumnFamily("anchor"); stream->SetReturnAllVersions(); scanner.Lookup("com.cnn.www"); for (; !stream->Done(); stream->Next()) { printf("%s %s %lld %s\n", scanner.RowName(), stream->ColumnName(), stream->MicroTimestamp(), stream->Value()); } 12

Chubby Bigtable relies on a highly-available and persistent distributed lock service called Chubby – A Chubby service consists of five active replicas, one of which is elected to be the master and actively serve requests – Each Chubby client maintains a session with a Chubby service and each directory or file can be used as a lock 13

Chubby Chubby Applications: – To ensure that there is at most one active master at any time – To store the bootstrap location of Bigtable data To discover tablet servers and finalize tablet server deaths To store Bigtable schema information (the column family information for each table) To store access control lists. 14

Implementation The Bigtable implementation has three major components: – A library that is linked into every client – One master server – Many tablet servers 15

Tablet Location Management 16

Conclusions Users like… – the performance and high availability provided by the Bigtable implementation – that they can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time – There are significant advantages to building a custom storage solution Challenges… – User adoption and acceptance of a new interface – Implementation issues 17