COSC6376 Cloud Computing
Lecture 7: Bigtable
Instructor: Weidong Shi (Larry), PhD
Computer Science Department, University of Houston

Outline: Hadoop, Bigtable, HBase

Projects

Sample Projects
- Support for video processing using HDFS and MapReduce
- Image processing using the cloud
- Security services using the cloud
- Web analytics using the cloud
- Cloud-based MPI
- Novel applications of cloud-based storage
- New pricing models
- Cyber-physical systems with the cloud as the backend
- Bioinformatics using MapReduce

Hadoop DFS (HDFS) http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html
HDFS mimics GFS: same assumptions, highly similar design, different names:
- Master -> NameNode
- Chunkserver -> DataNode
- Chunk -> block
- Operation log -> EditLog

Working with HDFS
Typical installation directory: /usr/local/hadoop/
- bin/ : scripts for starting/stopping the system
- conf/ : configuration files
- log/ : system log files
Installation guides:
- Single node: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
- Cluster: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
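
To make this concrete, here is a minimal sketch of writing and reading a file through the HDFS Java client API (org.apache.hadoop.fs.FileSystem). It assumes a running HDFS reachable through the default configuration on the classpath; the path /user/hadoop/hello.txt is just an illustrative choice.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHello {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
          Path p = new Path("/user/hadoop/hello.txt");  // hypothetical path
          // Write: HDFS splits the file into blocks and replicates them on DataNodes
          try (FSDataOutputStream out = fs.create(p, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
          }
          // Read it back
          try (BufferedReader r = new BufferedReader(
              new InputStreamReader(fs.open(p), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine());
          }
        }
      }
    }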

In-Memory Accelerator for Hadoop

HDFS on different storage devices

PCM (Phase-Change Memory)
- Emerging NVM technology that could replace Flash and DRAM
- Much higher density; much better scalability; supports multi-level cells
- Non-volatile; fast reads (~50 ns); slow, energy-hungry writes; limited lifetime (~10^8 writes per cell); no leakage

Bigtable Fay Chang et al., Google

Global Picture

Why Bigtable?
RDBMS performance is good for transaction processing, but for very large-scale analytic processing the available solutions are commercial, expensive, and specialized.
Very large-scale analytic processing means:
- Big queries: typically range or full-table scans
- Big databases: hundreds of TB

Why Bigtable? (2)
MapReduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution.
Sharding is not a solution for scaling open-source RDBMS platforms:
- Application-specific
- Labor-intensive (re)partitioning

Bigtable
Bigtable is a distributed storage system for managing structured data.
- Designed to scale to a very large size: petabytes of data across thousands of servers
- Used for many Google projects: web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, ...
- A flexible, high-performance solution for all of these Google products

Bigtable
- Distributed multi-level map
- Fault-tolerant, persistent
- Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, efficient scans
- Self-managing: servers can be added/removed dynamically, and servers adjust to load imbalance
- Often want to examine data changes over time, e.g. the contents of a web page over multiple crawls

Building Blocks
Building blocks:
- Google File System (GFS): raw storage
- Scheduler: schedules jobs onto machines
- Lock service: distributed lock manager
- MapReduce: simplified large-scale data processing
Bigtable's use of the building blocks:
- GFS: stores persistent data (SSTable file format for storage of data)
- Scheduler: schedules jobs involved in Bigtable serving
- Lock service: master election
- MapReduce: often used to read/write Bigtable data

Google File System
- Large-scale distributed “filesystem”
- Master: responsible for metadata
- Chunkservers: responsible for reading and writing large chunks of data
- Chunks are replicated on 3 machines; the master is responsible for ensuring replicas exist

Basic Data Model
A Bigtable is a sparse, distributed, persistent, multi-dimensional sorted map:
(row, column, timestamp) -> cell contents
A good match for most Google applications.
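
A toy way to picture this map (a sketch, not Google's implementation): nested sorted maps keyed by row, then column, then timestamp, with timestamps ordered newest-first.

    import java.util.Comparator;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // row -> column -> timestamp (descending) -> value
    public class ToyBigtable {
      private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>
          rows = new TreeMap<>();  // rows kept in lexicographic order

      public void put(String row, String column, long ts, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.reverseOrder()))
            .put(ts, value);
      }

      // "Most recent" read: first entry under the newest-first timestamp order
      public byte[] getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> cols = rows.get(row);
        if (cols == null) return null;
        NavigableMap<Long, byte[]> versions = cols.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
      }
    }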

WebTable Example
- Want to keep a copy of a large collection of web pages and related information
- Use URLs as row keys
- Various aspects of a web page as column names
- Store the contents of web pages in the contents: column, under the timestamps at which they were fetched

Rows
- Row name is an arbitrary string
- Access to data in a row is atomic
- Row creation is implicit upon storing data
- Rows are ordered lexicographically, so rows close together lexicographically are usually on one or a small number of machines

Rows (cont.)
- Reads of short row ranges are efficient and typically require communication with only a small number of machines
- Clients can exploit this property by selecting row keys that give good locality for data access
- Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu vs. edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys (reversed domains cluster pages from the same domain together)
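
A small helper (a sketch, not from the paper) showing the row-key trick: reverse the hostname labels so that pages from the same domain sort next to each other.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class RowKeys {
      // "math.gatech.edu" -> "edu.gatech.math"
      static String reverseDomain(String host) {
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
      }

      public static void main(String[] args) {
        for (String h : new String[] {"math.gatech.edu", "phys.gatech.edu", "math.uga.edu"}) {
          System.out.println(reverseDomain(h));  // edu.gatech.math, edu.gatech.phys, edu.uga.math
        }
      }
    }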

Columns
- Columns have a two-level name structure: family:optional_qualifier
- The column family is the unit of access control and has associated type information
- Qualifiers give unbounded columns within a family: additional levels of indexing, if desired

Timestamps
- Used to store different versions of data in a cell
- New writes default to the current time, but timestamps for writes can also be set explicitly by clients
- Lookup options: “return most recent K values”, “return all values in timestamp range (or all values)”
- Column families can be marked with attributes: “only retain most recent K values in a cell”, “keep values until they are older than K seconds”
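
HBase (covered later in the outline) exposes the same timestamp model. As a hedged illustration in the HBase 1.x client API (table and column names here are hypothetical), a read can ask for the most recent K versions or a timestamp range:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VersionedRead {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table t = conn.getTable(TableName.valueOf("webtable"))) {  // hypothetical table
          Get g = new Get(Bytes.toBytes("edu.gatech.math"));
          g.setMaxVersions(3);                              // "return most recent K values"
          g.setTimeRange(0L, System.currentTimeMillis());   // "return all values in a range"
          Result r = t.get(g);
          // Versions of the contents: column, newest first
          r.getColumnCells(Bytes.toBytes("contents"), Bytes.toBytes(""))
           .forEach(c -> System.out.println(c.getTimestamp()));
        }
      }
    }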

SSTable
- Immutable, sorted file of key-value pairs
- Chunks of data (64K blocks) plus an index
- The index is of block ranges, not values
[Figure: an SSTable laid out as a sequence of 64K blocks followed by an index]
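
A sketch of why the index holds block ranges rather than values: keep only the first key of each 64K block, binary-search (floor) to the one candidate block, then scan inside it. Plain Java, purely illustrative:

    import java.util.Map;
    import java.util.TreeMap;

    public class BlockIndex {
      // first key in each block -> offset of that block in the SSTable file
      private final TreeMap<String, Long> index = new TreeMap<>();

      public void addBlock(String firstKey, long offset) { index.put(firstKey, offset); }

      // Returns the offset of the single block that could contain `key`
      public Long blockFor(String key) {
        Map.Entry<String, Long> e = index.floorEntry(key);
        return e == null ? null : e.getValue();  // then read and scan that 64K block
      }

      public static void main(String[] args) {
        BlockIndex idx = new BlockIndex();
        idx.addBlock("aardvark", 0L);
        idx.addBlock("apple", 65536L);
        System.out.println(idx.blockFor("antelope"));  // 0: falls inside the first block
      }
    }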

Tablet
- Contains some range of rows of the table
- Built out of multiple SSTables
[Figure: a tablet covering rows aardvark through apple, built from two SSTables of 64K blocks plus indexes]

Table
- Multiple tablets make up a table
- SSTables can be shared between tablets
- Tablets do not overlap; SSTables can overlap
[Figure: two tablets (aardvark..apple and apple_two_E..boat) built from four SSTables, with one SSTable shared between them]

Architecture
- Client library
- A single master server
- Many tablet servers

Bigtable Master
- Assigns tablets to tablet servers
- Detects the addition and expiration of tablet servers
- Balances tablet-server load; tablets are distributed across the nodes of the cluster for load balancing
- Handles garbage collection
- Handles schema changes

Bigtable Tablet Servers
- Each tablet server manages a set of tablets: typically ten to a thousand tablets, each 100-200 MB by default
- Handles read and write requests to its tablets
- Splits tablets that have grown too large
- The master is responsible for load balancing and fault tolerance; it uses Chubby to monitor the health of tablet servers and restarts failed servers

A 3-level Hierarchy
- 1st level: a file stored in Chubby contains the location of the root tablet, i.e., a directory of ranges (tablets) and associated metadata. The root tablet never splits.
- 2nd level: each metadata tablet contains the locations of a set of user tablets.
- 3rd level: a set of SSTable identifiers for each tablet.

A 3-level Hierarchy
Each metadata row stores ~1 KB of data, so a 128 MB metadata tablet holds 2^17 rows. The three-level scheme therefore addresses 2^17 x 2^17 = 2^34 user tablets, or 2^61 bytes with 128 MB tablets (roughly 2 exabytes).
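
The arithmetic, spelled out (assuming the paper's figures of ~1 KB per metadata row and 128 MB tablets):

    public class Capacity {
      public static void main(String[] args) {
        long tabletBytes = 128L << 20;                 // 128 MB per tablet = 2^27 bytes
        long rowsPerMetaTablet = tabletBytes / 1024;   // ~1 KB per metadata row -> 2^17 rows
        long userTablets = rowsPerMetaTablet * rowsPerMetaTablet;  // 2^17 x 2^17 = 2^34
        double totalBytes = (double) userTablets * tabletBytes;    // 2^34 x 2^27 = 2^61 bytes
        System.out.printf("tablets: 2^%d, bytes: 2^%d (~%.1f exabytes)%n",
            Long.numberOfTrailingZeros(userTablets), 34 + 27, totalBytes / 1e18);
        // prints: tablets: 2^34, bytes: 2^61 (~2.3 exabytes)
      }
    }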

Editing a Table
- Mutations are logged, then applied to an in-memory version (the memtable)
- The logfile is stored in GFS
[Figure: a tablet whose in-memory memtable holds recent inserts and deletes on top of immutable SSTables]
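
A minimal sketch of that write path (illustrative only): every mutation is appended to a redo log first, then applied to a sorted in-memory buffer; deletes are recorded as tombstones rather than erasing data in the immutable SSTables.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    public class TabletWritePath {
      private final List<String> commitLog = new ArrayList<>();   // stands in for the GFS log
      private final TreeMap<String, byte[]> memtable = new TreeMap<>();
      private static final byte[] TOMBSTONE = new byte[0];

      public void insert(String key, byte[] value) {
        commitLog.add("INSERT " + key);   // 1. log the mutation (durable before ack)
        memtable.put(key, value);         // 2. apply it to the in-memory version
      }

      public void delete(String key) {
        commitLog.add("DELETE " + key);
        memtable.put(key, TOMBSTONE);     // tombstone; SSTables are never modified in place
      }
    }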

Chubby
- A persistent, distributed lock service
- Consists of 5 active replicas; one replica is the master and serves requests
- The service is functional when a majority of the replicas are running and in communication with one another, i.e., when there is a quorum
- Implements a name service that consists of directories and files

Bigtable and Chubby
Bigtable uses Chubby to:
- Ensure there is at most one active master at a time
- Store the bootstrap location of Bigtable data (the root tablet)
- Discover tablet servers and finalize tablet-server deaths
- Store Bigtable schema information (column-family information)
- Store access-control lists
If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.

Tablet Assignment
- Each tablet is assigned to one tablet server at a time
- The master keeps track of the set of live tablet servers and the current assignment of tablets to servers, including which tablets are unassigned
- When a tablet is unassigned, the master assigns it to a tablet server with sufficient room

API
Metadata operations:
- Create/delete tables and column families, change metadata
Writes (atomic):
- Set(): write cells in a row
- DeleteCells(): delete cells in a row
- DeleteRow(): delete all cells in a row
Reads:
- Scanner: read arbitrary cells in a Bigtable
- Each row read is atomic
- Can restrict returned rows to a particular range
- Can ask for data from just one row, all rows, etc.
- Can ask for all columns, just certain column families, or specific columns

API Examples: Write/Modify
- Atomic row modification
- No support for (RDBMS-style) multi-row transactions
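
The original slide showed the paper's C++ snippet (RowMutation plus Apply). Here is a hedged HBase 1.x analogue of the same single-row atomic write/modify pattern; the table and column names are illustrative, not from the paper:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowMutationExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table t = conn.getTable(TableName.valueOf("webtable"))) {
          byte[] row = Bytes.toBytes("edu.cnn.www");
          // Set() analogue: write a cell
          Put put = new Put(row).addColumn(Bytes.toBytes("anchor"),
              Bytes.toBytes("com.cnn.www"), Bytes.toBytes("CNN"));
          // DeleteCells() analogue: delete one column's cells
          Delete del = new Delete(row).addColumns(Bytes.toBytes("anchor"),
              Bytes.toBytes("com.stale.link"));
          // Both changes apply atomically to this one row; no multi-row transactions
          RowMutations rm = new RowMutations(row);
          rm.add(put);
          rm.add(del);
          t.mutateRow(rm);
        }
      }
    }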

API Examples: Read
- Return sets can be filtered using regular expressions, e.g. anchor:com.cnn.*
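
A hedged HBase 1.x rendering of that read: scan the anchor: column family and keep only columns whose qualifier matches com.cnn.* (the filter classes are real HBase API; the table name and data are illustrative).

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.QualifierFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AnchorScan {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table t = conn.getTable(TableName.valueOf("webtable"))) {
          Scan scan = new Scan();
          scan.addFamily(Bytes.toBytes("anchor"));  // only the anchor: column family
          // Keep only columns whose qualifier matches the regex com.cnn.*
          scan.setFilter(new QualifierFilter(CompareFilter.CompareOp.EQUAL,
              new RegexStringComparator("^com\\.cnn\\..*")));
          try (ResultScanner rs = t.getScanner(scan)) {
            for (Result r : rs) {
              System.out.println(Bytes.toString(r.getRow()));  // each row read is atomic
            }
          }
        }
      }
    }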

Tablet Serving: “log-structured merge trees” (image source: Chang et al., OSDI 2006)

Tablet Representation
- Append-only commit log on GFS
- SSTables on GFS
- Write buffer in memory (random-access memtable)
- SSTable: immutable on-disk ordered map from string to string; string keys are <row, column, timestamp> triples
[Figure: writes go to the log and memtable; reads see a merged view of the memtable and SSTables]

Client Write & Read Operations Write operation arrives at a tablet server: Server ensures the client has sufficient privileges for the write operation (Chubby), A log record is generated to the commit log file, Once the write commits, its contents are inserted into the memtable. Read operation arrives at a tablet server: Server ensures client has sufficient privileges for the read operation (Chubby), Read is performed on a merged view of (a) the SSTables that constitute the tablet, and (b) the memtable.

Write Operations
As writes execute, the size of the memtable increases. Once the memtable reaches a threshold:
- The memtable is frozen
- A new memtable is created
- The frozen memtable is converted to an SSTable and written to GFS

Compactions
Minor compaction:
- Converts the memtable into an SSTable
- Reduces memory usage and log traffic on restart
Merging compaction:
- Reads the contents of a few SSTables and the memtable, and writes out a new SSTable
- Reduces the number of SSTables
Major compaction:
- A merging compaction that results in only one SSTable
- No deletion records, only live data
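
A compaction sketch in the same toy style: merge several sorted layers into one, with the newest value winning. Only a major compaction may drop tombstones, since it alone can guarantee no older SSTable still holds a shadowed value.

    import java.util.List;
    import java.util.TreeMap;

    public class Compaction {
      // Layers ordered newest to oldest; an empty byte[] is this sketch's tombstone
      static TreeMap<String, byte[]> compact(List<TreeMap<String, byte[]>> newestFirst,
                                             boolean major) {
        TreeMap<String, byte[]> out = new TreeMap<>();
        // Iterate oldest to newest so newer values overwrite older ones
        for (int i = newestFirst.size() - 1; i >= 0; i--) {
          out.putAll(newestFirst.get(i));
        }
        if (major) {
          // Major compaction: the output is the whole tablet, so tombstones can go
          out.values().removeIf(v -> v.length == 0);
        }
        return out;
      }
    }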

Refinements: Locality Groups
- Clients can group multiple column families into a locality group
- A separate SSTable is created for each locality group in each tablet
- Segregating column families that are not typically accessed together enables more efficient reads
- In WebTable, page metadata can be in one group and the contents of the page in another group

Refinements: Compression
Many opportunities for compression:
- Similar values in the same row/column at different timestamps
- Similar values in different columns
- Similar values across adjacent rows
Two-pass custom compression scheme:
- First pass: compress long common strings across a large window
- Second pass: look for repetitions in a small window
- Speed is emphasized, but the scheme still achieves good space reduction (10-to-1)
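
In HBase, the closest analogue to locality groups is that each column family is stored in its own files, and compression is chosen per family. A hedged HBase 1.x schema sketch for the WebTable split above (names are illustrative; GZ stands in for Bigtable's custom two-pass scheme, which HBase does not provide):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.io.compress.Compression;

    public class WebTableSchema {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("webtable"));
          // Page contents: large, highly similar values -> compress, keep a few versions
          desc.addFamily(new HColumnDescriptor("contents")
              .setCompressionType(Compression.Algorithm.GZ)
              .setMaxVersions(3));
          // Small metadata often read on its own -> separate family (separate files)
          desc.addFamily(new HColumnDescriptor("meta")
              .setCompressionType(Compression.Algorithm.NONE));
          admin.createTable(desc);
        }
      }
    }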