Spanner: Google’s Globally-Distributed Database
By James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford. Published in Proceedings of OSDI 2012. Speaker: Mugdha Goel

What is Spanner?
Spanner is a system that distributes data at global scale and supports externally consistent distributed transactions. With Spanner, Google can offer a web service to a worldwide audience while still ensuring that something happening on the service in one part of the world does not contradict what is happening in another. It automatically migrates data across machines and data centers to balance load and in response to failures. It is a scalable, multi-version, synchronously-replicated database. Earlier “NoSQL” databases could store information across multiple data centers, but they could not do so while keeping that information consistent, meaning that someone looking at the data on one side of the world sees the same thing as someone on the other side.

Need and Evolution
Need:
- Spanner was built for high availability.
- Consistent data across the globe.
- Reads and writes without being crushed by huge latencies.
- Data located close to clients, as applications need.
Evolution:
- Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database.
- Spanner is the successor to Google’s Megastore system.
- Data is stored in schematized semi-relational tables, with faster reads and writes.
- Google’s F1 advertising backend uses Spanner.
- Gmail, Picasa, Google Calendar, the Android Market, and the AppEngine cloud all use Megastore, making them potential candidates for a Spanner upgrade.

Features
- Applications can dynamically control the replication configuration for data at a fine grain; the replication process itself is transparent to them.
- Applications can control the location of data.
- Provides externally consistent reads and writes.
- Provides globally consistent reads across the database at a timestamp.
- Provides an implementation of the TrueTime API.

Implementation

Spanserver Software Stack
The Bigtable-based implementation provides the following mapping:
(key:string, timestamp:int64) -> string
Unlike Bigtable, Spanner assigns timestamps to data, which makes it more like a multi-version database than a key-value store.
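To make the multi-version mapping concrete, below is a minimal Python sketch (with illustrative names, not Spanner’s actual API) of a store where each cell is addressed by (key, timestamp) and a read returns the newest version at or before the requested timestamp:

class MultiVersionStore:
    """Sketch of the (key:string, timestamp:int64) -> string mapping
    provided by the Bigtable-like layer. Illustrative only."""

    def __init__(self):
        # key -> list of (timestamp, value) pairs, kept sorted by timestamp
        self._data = {}

    def put(self, key, timestamp, value):
        versions = self._data.setdefault(key, [])
        versions.append((timestamp, value))
        versions.sort()

    def get(self, key, timestamp):
        # Newest value written at or before `timestamp`, or None.
        result = None
        for ts, value in self._data.get(key, []):
            if ts > timestamp:
                break
            result = value
        return result

store = MultiVersionStore()
store.put("user:1", 10, "alice@example.com")
store.put("user:1", 20, "alice@spanner.example")
assert store.get("user:1", 15) == "alice@example.com"   # read at ts 15
assert store.get("user:1", 25) == "alice@spanner.example"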

Spanner’s Data Model
Motivation:
- The popularity of Megastore, despite its relatively poor write throughput, showed the need for schematized semi-relational tables and synchronous replication.
- The popularity of Dremel (an interactive data analysis tool) showed the need for an SQL-like query language.
- Two-phase commit over Bigtable had availability problems; running it over Paxos mitigates them.
- Underneath, Spanner uses a distributed file system called Colossus.
Structure:
- The data model is layered on top of the directory-bucketed key-value mappings supported by the implementation.
- An application creates one or more databases in a universe. Each database can contain an unlimited number of schematized tables.
- Tables look like relational-database tables, with rows, columns, and versioned values.
- Every table has an ordered set of one or more primary-key columns; the primary keys form the name for a row. This requirement is where Spanner still looks like a key-value store.
- A table defines a mapping from the primary-key columns to the non-primary-key columns.
- Simpler than Bigtable, and provides support for an SQL-like query language.

Example
CREATE TABLE Users {
  uid INT64 NOT NULL, email STRING
} PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums {
  uid INT64 NOT NULL, aid INT64 NOT NULL,
  name STRING
} PRIMARY KEY (uid, aid),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;

INTERLEAVE IN PARENT declares that each Albums row is stored with the Users row that shares its uid prefix; this interleaving defines the directory hierarchy that Spanner uses as its unit of data placement. Physically, rows are interleaved in primary-key order, e.g. Users(1), Albums(1,1), Albums(1,2), Users(2), Albums(2,1), and so on.

TrueTime
Method        Returns
TT.now()      TTinterval: [earliest, latest]
TT.after(t)   true if t has definitely passed
TT.before(t)  true if t has definitely not arrived

- TTinterval: an interval with bounded time uncertainty (endpoints have type TTstamp).
- t_abs(e) denotes the absolute time of an event e.
- For an invocation e_now of tt = TT.now(): tt.earliest <= t_abs(e_now) <= tt.latest.
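The table above pins down the semantics precisely enough to sketch in code. The following is a minimal Python sketch, assuming a fixed uncertainty bound EPSILON around the local clock; real TrueTime instead derives its bound from GPS and atomic-clock time references:

import time

EPSILON = 0.007  # assumed fixed uncertainty bound (~7 ms), for illustration

class TTinterval:
    def __init__(self, earliest, latest):
        self.earliest = earliest  # absolute time is guaranteed >= earliest
        self.latest = latest      # absolute time is guaranteed <= latest

class TrueTime:
    @staticmethod
    def now():
        t = time.time()
        return TTinterval(t - EPSILON, t + EPSILON)

    @staticmethod
    def after(t):
        # True only if t has definitely passed.
        return TrueTime.now().earliest > t

    @staticmethod
    def before(t):
        # True only if t has definitely not arrived.
        return TrueTime.now().latest < t

The invariant tt.earliest <= t_abs(e_now) <= tt.latest holds as long as the true clock error stays within EPSILON.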

Concurrency Control
Spanner supports:
Read-only transactions
- Must be predeclared as not having any writes; not simply a read-write transaction without any writes.
- Reads execute at a system-chosen timestamp without locking, so incoming writes are not blocked.
- Can proceed on any replica that is sufficiently up to date.
Read-write transactions
- As in Bigtable, writes that occur in a transaction are buffered at the client until commit, so reads in a transaction do not see the effects of the transaction’s own writes.
- This design works well in Spanner because a read returns the timestamps of any data read, and uncommitted writes have not yet been assigned timestamps.
Snapshot reads
- A client either specifies a timestamp for the read, or provides an upper bound on the desired timestamp’s staleness and lets Spanner choose a timestamp.
- In either case, the read proceeds at any replica that is sufficiently up to date.
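TrueTime is what makes read-write commits externally consistent: the coordinator picks a commit timestamp s no earlier than TT.now().latest and then waits until TT.after(s) holds before releasing locks (“commit wait”). A minimal sketch, reusing the TrueTime class above; txn and its methods are hypothetical stand-ins for the real Paxos/two-phase-commit machinery:

def commit(txn):
    # Choose a commit timestamp s no earlier than TT.now().latest.
    s = TrueTime.now().latest
    txn.apply_writes_at(s)        # hypothetical: log writes via Paxos at s
    # Commit wait: do not release locks or acknowledge the client until
    # s is guaranteed to be in the past on every clock in the system.
    while not TrueTime.after(s):
        time.sleep(0.001)
    txn.release_locks()           # hypothetical helper
    return s

Because the acknowledgment happens only after s has definitely passed, any transaction that starts after this commit is visible will receive a strictly larger timestamp, which is exactly the external-consistency guarantee.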

Evaluations
- Microbenchmarks (mean and standard deviations over 10 runs).
- Two-phase commit scalability.
- Availability: effect of killing servers on throughput.

TrueTime and F1
- Distribution of TrueTime values.
- F1-perceived operation latencies.

F1’s Transition
F1’s backend was originally based on MySQL.
Disadvantages of MySQL:
- Data was manually sharded; the sharding scheme assigned each customer and all related data to a fixed shard.
- This became extremely costly as the number of customers grew.
- Resharding was a very complex process that took about two years.
Advantages of Spanner:
- Spanner removes the need to manually reshard.
- Spanner provides synchronous replication and automatic failover. With MySQL master-slave replication, failover was difficult and risked data loss and downtime.
- F1 requires strong transactional semantics, which Spanner provides.

Future Work
Google’s advertising backend has been transitioned from MySQL to Spanner. Current work covers Spanner’s schema language, automatic maintenance of secondary indices, and automatic load-based resharding. Future plans include:
- Optimistically doing reads in parallel.
- Supporting direct changes to Paxos configurations.
- Improving single-node performance through better algorithms and data structures.
- Moving data automatically between datacenters in response to changes in client load.