PNUTS: Yahoo!’s Hosted Data Serving Platform Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni

PNUTS: Yahoo!’s Hosted Data Serving Platform Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni (Yahoo! Research). Presented by Mina Farid, University of Waterloo, CS 848, 8 February 2010

Outline
 Motivation
 Data and Query Model
 Consistency
 System Architecture
 Applications
 Experiments

Motivation
 Scalability
 Response Time (SLAs)
 High Availability and Fault Tolerance
 Relaxed Consistency Guarantees — in between two extremes:
 Serializable transactions (stronger than most web applications need)
 Eventual consistency: update any replica; all updates are propagated to all replicas, but potentially in different orders (often too weak)

Data and Query Model
 Simplified relational data model (tables, records, attributes)
 Flexible schemas
 Queries: selection and projection over a single table
 Applications scan only a few records at a time
 No ad-hoc queries
 Support for both hashed and ordered tables

Consistency
 In between general serializability and eventual consistency
 Applies to single-record updates
 Per-record timeline consistency: all replicas of a record apply updates in the same order
 For a given version, all replicas contain the same information
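A minimal sketch of the per-record timeline guarantee (class and field names are hypothetical, not from the paper): each replica applies updates strictly in version order, buffering out-of-order arrivals, so any two replicas at the same version hold identical data.

```python
class Replica:
    """Applies updates to one record strictly in version order."""

    def __init__(self):
        self.version = 0
        self.value = None
        self.pending = {}  # version -> value, buffered out-of-order updates

    def receive(self, version, value):
        # Buffer the update, then apply any contiguous run of versions.
        self.pending[version] = value
        while self.version + 1 in self.pending:
            self.version += 1
            self.value = self.pending.pop(self.version)

# Two replicas receiving the same updates in different arrival orders
a, b = Replica(), Replica()
for v, val in [(1, "x"), (2, "y"), (3, "z")]:
    a.receive(v, val)
for v, val in [(2, "y"), (3, "z"), (1, "x")]:  # out of order
    b.receive(v, val)
```

Both replicas converge to version 3 with the same value, which is exactly the "same order on every replica" property the slide describes.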

Consistency (cont’d)
 Each record has a master replica
 Updates to a record are forwarded to its master replica
 The master copy carries the record’s version information
 Consistency-related API calls:
 Read-any
 Read-critical(required_version)
 Read-latest
 Write
 Test-and-set-write(required_version)
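The call names above come from the paper; the sketch below is an illustrative single-node client (the store layout and class name are assumptions) showing the semantics each call offers against per-record versions.

```python
class PnutsClientSketch:
    """Illustrative client for the per-record consistency calls."""

    def __init__(self, store):
        self.store = store  # key -> (version, value)

    def read_any(self, key):
        # May be served by any replica, possibly a stale one.
        return self.store[key][1]

    def read_critical(self, key, required_version):
        # Returns a copy at least as fresh as required_version.
        version, value = self.store[key]
        if version < required_version:
            raise RuntimeError("replica too stale; retry a fresher replica")
        return value

    def read_latest(self, key):
        # Must be served by (or through) the record's master.
        return self.store[key][1]

    def write(self, key, value):
        version, _ = self.store.get(key, (0, None))
        self.store[key] = (version + 1, value)

    def test_and_set_write(self, key, required_version, value):
        # Succeeds only if the record is still at required_version.
        version, _ = self.store[key]
        if version != required_version:
            return False  # lost the race; caller must re-read and retry
        self.store[key] = (version + 1, value)
        return True
```

For example, a write bumps the version to 1, a test-and-set against version 1 succeeds and bumps it to 2, and a second test-and-set still citing version 1 fails.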

System Architecture
[Diagram: one region containing a tablet controller, routers, a message broker, and storage units 1..N; a tablet-to-storage-unit map (T1→SU1, T2→SU2, T3→SU3, T4→SU1)]

System Architecture – Data Storage and Retrieval
 Each region holds a full complement of system components and data
 Tables are partitioned into tablets
 A tablet is a group of records from one table
 Tablets are stored on storage-unit servers
 Storage units respond to: get(), scan(), set()
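A storage unit's interface is small, as the slide lists. The sketch below (class name and record layout are illustrative assumptions) shows one plausible shape: a unit holding several tablets, with a range scan over an ordered tablet.

```python
class StorageUnitSketch:
    """Hypothetical storage unit holding several tablets of records."""

    def __init__(self):
        self.tablets = {}  # tablet_id -> {key: record}

    def set(self, tablet_id, key, record):
        self.tablets.setdefault(tablet_id, {})[key] = record

    def get(self, tablet_id, key):
        return self.tablets.get(tablet_id, {}).get(key)

    def scan(self, tablet_id, low, high):
        # Range scan over an ordered tablet: keys in [low, high).
        tablet = self.tablets.get(tablet_id, {})
        return [(k, tablet[k]) for k in sorted(tablet) if low <= k < high]
```

Keeping the interface to three verbs is what lets the layers above (routers, message broker) stay independent of how records are stored on disk.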

Routers’ Mapping – Ordered Table
 Routers decide:
 Which tablet contains which records
 Which storage unit holds which tablet
[Diagram: an ordered table split at the keys Banana, Grape, and Lemon into Tablets 1–4; a key-interval map (MIN_STRING→T1, Banana→T2, Grape→T3, Lemon→T4, up to MAX_STRING) and a tablet-to-SU map (T1→SU1, T2→SU2, T3→SU3, T4→SU1)]
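For an ordered table, the router's interval map can be answered with a binary search over the tablet boundaries. This sketch uses the slide's example boundaries (the data structures are illustrative; `""` stands in for MIN_STRING).

```python
import bisect

# Tablet i covers keys in [boundaries[i], boundaries[i+1]).
boundaries = ["", "Banana", "Grape", "Lemon"]  # "" stands in for MIN_STRING
tablets = ["T1", "T2", "T3", "T4"]
tablet_to_su = {"T1": "SU1", "T2": "SU2", "T3": "SU3", "T4": "SU1"}

def route(key):
    """Return the (tablet, storage unit) responsible for a key."""
    i = bisect.bisect_right(boundaries, key) - 1
    return tablets[i], tablet_to_su[tablets[i]]
```

So "Apple" routes to T1 on SU1, "Cherry" to T2 on SU2, and anything at or above "Lemon" to T4 on SU1, matching the diagram.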

System Architecture
[Diagram: the same region, with the routers now holding both the key-interval map (MIN→T1, Banana→T2, Grape→T3, Lemon→T4) and the tablet-to-SU map (T1→SU1, T2→SU2, T3→SU3, T4→SU1)]

System Architecture
[Diagram: two regions, each with its own tablet controller, routers, message broker, and storage units; both regions hold copies of the key-interval map and the tablet-to-SU map]

System Architecture – Replication and Consistency
1- Yahoo! Message Broker (YMB)
 Reliable topic-based publish/subscribe
 Updates are asynchronously propagated to all replicas
 Provides only partial ordering:
 Messages published to a particular YMB instance are delivered to all subscribers in the same order
 Messages published to different YMB instances may be delivered in any order
 Solution: per-record mastership

System Architecture – Replication and Consistency
2- Consistency and Record Mastership
 One copy of each record is designated the master
 Updates to the record are forwarded to that master copy
 The master publishes the update to YMB (the commit point)
 Different records in the same table can be mastered in different clusters
 How is the master chosen?
 Each record carries metadata identifying its current master (which can change)
 Mastership is placed at the replica receiving the most updates
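The two rules on this slide can be sketched together: non-master regions forward writes, the master commits and publishes, and mastership can migrate toward the region issuing the writes. All names below are illustrative, and the migration rule is a simple stand-in heuristic, not the paper's actual policy.

```python
class RecordSketch:
    """Illustrative record metadata: master region plus recent write origins."""

    def __init__(self, master_region):
        self.master_region = master_region
        self.version = 0
        self.recent_writers = []

def apply_write(record, origin_region, published):
    """Non-master regions forward the write; the master commits and publishes."""
    record.recent_writers.append(origin_region)
    if origin_region != record.master_region:
        return ("forward", record.master_region)
    record.version += 1
    published.append(record.version)  # stand-in for a YMB publish (commit point)
    return ("committed", record.version)

def maybe_migrate_master(record, window=3):
    """Hand mastership to a region that issued the last `window` writes."""
    recent = record.recent_writers[-window:]
    if len(recent) == window and len(set(recent)) == 1 \
            and recent[0] != record.master_region:
        record.master_region = recent[0]
```

A record mastered in one region but repeatedly written from another will, under this heuristic, eventually have its mastership moved, cutting out the cross-region forwarding hop.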

Query Processing
 Multi-record queries
 Scatter-gather engine (in the router):
 Splits a multi-record request into multiple single-record requests
 Issues the queries in parallel
 Assembles and evaluates the results, then returns them to the client
 Handles range and scan queries (also supports top-k)
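The scatter-gather steps above can be sketched as follows. The routing table and per-SU stores here are hypothetical stand-ins; the point is the shape of the engine: group keys by owning storage unit, fan out in parallel, then merge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing table and per-storage-unit stores for the sketch.
key_to_su = {"alice": "SU1", "bob": "SU2", "carol": "SU1"}
storage_units = {
    "SU1": {"alice": 1, "carol": 3},
    "SU2": {"bob": 2},
}

def scatter_gather(keys):
    """Split a multi-record request into per-SU batches, query the
    storage units in parallel, and assemble one result set."""
    by_su = {}
    for key in keys:
        by_su.setdefault(key_to_su[key], []).append(key)

    def fetch(su_and_batch):
        su, batch = su_and_batch
        return [(k, storage_units[su][k]) for k in batch]

    with ThreadPoolExecutor() as pool:
        parts = pool.map(fetch, by_su.items())
    return dict(kv for part in parts for kv in part)
```

Batching by storage unit (rather than one request per key) is what keeps the fan-out proportional to the number of storage units touched, not the number of records requested.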

Applications
 User databases: millions of records, frequent updates, important data, relaxed consistency
 Social applications: flexible schemas, large numbers of small updates, no real-time requirements (relaxed consistency)
 Content metadata: structured metadata managed at scale, with consistency
 Session data: scalable state storage with low consistency requirements

Experiments
 Main criterion: average request latency (response time)
 Experiment setup: 3 regions (2 West, 1 East)
 1- Inserting data
 2- Varying load
 3- Varying the number of storage units

Future Enhancements
Planned features include:
 Indexing and materialized views
 Bundled updates (atomic, non-isolated updates to multiple records)

Conclusion

Thank You! Questions?


Google BigTable
 Record-oriented access to very large tables
 Does not support:
 Geographic replication
 Secondary indexes
 Materialized views
 Hash-organized tables

Dynamo
 Focuses on availability
 Provides geographic replication via a ‘gossip’ mechanism
 Its eventual consistency model does not suit all applications
 “Updates are committed in different orders at different replicas”; replicas are eventually reconciled (updates may roll back)
 Does not support ordered tables

Boxwood
 Provides a B-tree implementation
 Its design favors consistency over scalability (tens of machines)