1 Web-Scale Data Serving with PNUTS Adam Silberstein Yahoo! Research

2 Outline
PNUTS Architecture
Recent Developments
 – New features
 – New challenges
Adoption at Yahoo!

3 Yahoo! Cloud Data Systems
Hadoop (Large Data Analysis): scan-oriented workloads, focus on sequential disk I/O
PNUTS (Structured Record Storage): CRUD, point lookups and short scans, index-organized tables, random I/Os
MobStor (Large Blob Storage): object retrieval and streaming, scalable file storage

4 What is PNUTS?
Parallel database
Structured, flexible schema
Geographic replication
Hosted, managed infrastructure

Example schema:
CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)

(Figure: the table stored as a set of keyed records, replicated identically across geographic regions.)

5 PNUTS Design Features
Simplicity
 – Scalability via commodity servers
 – Elasticity: add capacity with growth
 – APIs: key lookup or range scan
Global Access
 – Asynchronous replication across data centers
 – Low-latency local access
 – Consistency: timeline, eventual
Operability
 – Resilience and automatic recovery
 – Automatic load balancing
 – Single multi-tenant hosted service

6 Distributed Hash Table
The table is partitioned into tablets by the hash of the primary key (the figure shows tablet boundaries at hash values such as 0x0000, 0x2AF3, 0x911F), so records appear in hash order rather than key order:

Primary Key | Record
Grape       | {"liquid" : "wine"}
Lime        | {"color" : "green"}
Apple       | {"quote" : "Apple a day keeps the …"}
Strawberry  | {"spread" : "jam"}
Orange      | {"color" : "orange"}
Avocado     | {"spread" : "guacamole"}
Lemon       | {"expression" : "expensive crap"}
Tomato      | {"classification" : "yes… fruit"}
Banana      | {"expression" : "goes bananas"}
Kiwi        | {"expression" : "New Zealand"}
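As a rough illustration of the lookup, here is a minimal Python sketch (hypothetical, not PNUTS code; the hash function and boundary values are assumptions mirroring the slide) that hashes a primary key and finds the owning tablet:

    import bisect
    import hashlib

    # Hypothetical tablet boundaries over a 16-bit hash space; each tablet
    # owns the interval starting at its boundary (values mirror the slide).
    TABLET_BOUNDARIES = [0x0000, 0x2AF3, 0x911F]

    def key_hash(primary_key: str) -> int:
        """Map a primary key to a position in the 16-bit hash space."""
        digest = hashlib.md5(primary_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:2], "big")

    def tablet_for_key(primary_key: str) -> int:
        """Index of the tablet whose hash interval covers this key."""
        return bisect.bisect_right(TABLET_BOUNDARIES, key_hash(primary_key)) - 1

    for key in ["Grape", "Lime", "Apple", "Kiwi"]:
        print(f"{key} -> hash 0x{key_hash(key):04X}, tablet {tablet_for_key(key)}")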

7 Distributed Ordered Table
Tablets are clustered by primary-key range, so records appear in key order:

Primary Key | Record
Apple       | {"quote" : "Apple a day keeps the …"}
Avocado     | {"spread" : "guacamole"}
Banana      | {"expression" : "goes bananas"}
Grape       | {"liquid" : "wine"}
Kiwi        | {"expression" : "New Zealand"}
Lemon       | {"expression" : "expensive crap"}
Lime        | {"color" : "green"}
Orange      | {"color" : "orange"}
Strawberry  | {"spread" : "jam"}
Tomato      | {"classification" : "yes… fruit"}
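The ordered table supports the same point lookup directly on keys, and a range scan visits a contiguous run of tablets. A minimal sketch (hypothetical; the tablet start keys are assumed for illustration):

    import bisect

    # Hypothetical start keys of three tablets in an ordered table:
    # tablet 0 = ["", "C"), tablet 1 = ["C", "L"), tablet 2 = ["L", +inf)
    TABLET_START_KEYS = ["", "C", "L"]

    def tablet_for_key(key: str) -> int:
        """Last start key <= key identifies the owning tablet."""
        return bisect.bisect_right(TABLET_START_KEYS, key) - 1

    def tablets_for_scan(start: str, end: str) -> list:
        """A range scan [start, end] visits a contiguous run of tablets."""
        return list(range(tablet_for_key(start), tablet_for_key(end) + 1))

    print(tablet_for_key("Grape"))            # 1
    print(tablets_for_scan("Apple", "Lime"))  # [0, 1, 2]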

8 PNUTS – Single Region
Tablet controller: maintains the map from database.table.key to tablet to storage unit
Routers: route client requests to the correct storage unit; cache the maps from the tablet controller
Storage units: store records; service get/set/delete requests
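A toy end-to-end model of that single-region flow (hypothetical Python; the class names, placement rule, and in-memory storage are stand-ins, not PNUTS APIs):

    class TabletController:
        """Maintains the map from tablet to storage unit."""
        def __init__(self):
            self.tablet_map = {"tablet-0": "unit-A", "tablet-1": "unit-B"}

        def current_map(self):
            return dict(self.tablet_map)

    class StorageUnit:
        """Stores records and services get/set/delete requests."""
        def __init__(self):
            self.records = {}

        def set(self, key, value):
            self.records[key] = value

        def get(self, key):
            return self.records.get(key)

    class Router:
        """Routes client requests using a cached copy of the tablet map."""
        def __init__(self, controller, units):
            self.units = units
            self.cached_map = controller.current_map()  # refreshed on a stale-map error in practice

        def _unit_for(self, key):
            # Toy placement rule standing in for the real key -> tablet lookup.
            tablet = "tablet-0" if key < "m" else "tablet-1"
            return self.units[self.cached_map[tablet]]

        def set(self, key, value):
            self._unit_for(key).set(key, value)

        def get(self, key):
            return self._unit_for(key).get(key)

    controller = TabletController()
    units = {"unit-A": StorageUnit(), "unit-B": StorageUnit()}
    router = Router(controller, units)
    router.set("alice", {"location": "Home", "status": "Sleeping"})
    print(router.get("alice"))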

9 Tablet Splitting & Balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time; overfull tablets split
A storage unit may become a hotspot; shed load by moving tablets to other servers

10 PNUTS Multi-Region

11 Asynchronous Replication

12 Consistency Options
Eventual Consistency
 o Low-latency updates and inserts done locally
Record Timeline Consistency
 o Each record is assigned a "master region"
 o Inserts succeed, but updates could fail during outages*
Primary Key Constraint + Record Timeline
 o Each tablet and record is assigned a "master region"
 o Inserts and updates could fail during outages*
(Figure: the options form a spectrum trading availability for consistency.)

13 Record Timeline Consistency
Transactions:
 – Alice changes status from "Sleeping" to "Awake"
 – Alice changes location from "Home" to "Work"
Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake)
No replica should see the record as (Alice, Work, Sleeping).
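A small sketch of the invariant (hypothetical Python, not PNUTS code): writes to a record are serialized through its master region, which stamps each update with an increasing version number, and a replica applies updates only in version order, so it can never observe (Alice, Work, Sleeping):

    class RecordMaster:
        """Serializes writes to one record and stamps them with versions."""
        def __init__(self, initial):
            self.version = 0
            self.value = dict(initial)

        def write(self, field, new_value):
            self.version += 1
            self.value[field] = new_value
            return (self.version, field, new_value)   # update to replicate

    class Replica:
        """Applies updates strictly in version order, buffering gaps."""
        def __init__(self, initial):
            self.version = 0
            self.value = dict(initial)
            self.pending = {}

        def apply(self, update):
            version, field, new_value = update
            self.pending[version] = (field, new_value)
            while self.version + 1 in self.pending:   # apply in order only
                field, new_value = self.pending.pop(self.version + 1)
                self.value[field] = new_value
                self.version += 1

    master = RecordMaster({"location": "Home", "status": "Sleeping"})
    u1 = master.write("status", "Awake")
    u2 = master.write("location", "Work")

    replica = Replica({"location": "Home", "status": "Sleeping"})
    replica.apply(u2)        # arrives out of order: buffered, not applied
    print(replica.value)     # still (Home, Sleeping), never (Work, Sleeping)
    replica.apply(u1)
    print(replica.value)     # (Work, Awake)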

14 Eventual Consistency
Timeline consistency comes at a price
 – Writes not originating in the record's master region are forwarded to the master and have longer latency
 – When the master region is down, the record is unavailable for writes
We added an eventual consistency mode
 – On conflict, the latest write per field wins
 – Target customers: those that externally guarantee no conflicts, and those that understand and can cope with conflicts
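A minimal sketch of that merge rule, assuming each replica keeps a per-field timestamp (hypothetical; PNUTS's actual conflict-resolution mechanics may differ):

    # On conflict, the latest write per field wins, judged by a per-field timestamp.
    def merge_last_write_wins(replica_a, replica_b):
        """Each replica maps field -> (timestamp, value); keep the newer value per field."""
        merged = {}
        for field in set(replica_a) | set(replica_b):
            candidates = [r[field] for r in (replica_a, replica_b) if field in r]
            merged[field] = max(candidates, key=lambda tv: tv[0])
        return merged

    # Two regions accept conflicting writes while partitioned.
    region_1 = {"status": (10, "Awake"),   "location": (3, "Home")}
    region_2 = {"status": (7, "Sleeping"), "location": (12, "Work")}

    print(merge_last_write_wins(region_1, region_2))
    # {'status': (10, 'Awake'), 'location': (12, 'Work')}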

15 Outline
PNUTS Architecture
Recent Developments
 – New features
 – New challenges
Adoption at Yahoo!

16 Ordered Table Challenges
(Figure: a table pre-split at MIN…I…S…MAX fills unevenly as keys such as apple, carrot, tomato, banana, avocado, lemon arrive; boundaries at MIN…B…L…MAX match the data better.)
Carefully choose initial tablet boundaries
 – Sample the input keys
Same goes for any big load
 – Pre-split and move tablets if needed
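A sketch of the pre-split step (hypothetical, not PNUTS tooling): sample the keys of the incoming load and place tablet boundaries at evenly spaced quantiles of the sample:

    import random

    def choose_boundaries(sample_keys, num_tablets):
        """Return num_tablets - 1 split keys at evenly spaced quantiles."""
        ordered = sorted(sample_keys)
        step = len(ordered) / num_tablets
        return [ordered[int(i * step)] for i in range(1, num_tablets)]

    incoming_keys = ["apple", "avocado", "banana", "carrot", "grape", "kiwi",
                     "lemon", "lime", "orange", "strawberry", "tomato"]
    sample = random.sample(incoming_keys, k=8)   # in practice, sample the load's key set
    print(choose_boundaries(sample, num_tablets=3))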

17 Ordered Table Challenges
Dealing with skewed workloads
 – Tablet splits, tablet moves
 – Initially operator-driven, now driven by the Yak load balancer
Yak
 – Collects storage-unit stats
 – Issues move and split requests
 – Is conservative: make sure loads are here to stay!
   Moves are expensive; splits are not reversible
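A sketch of the "be conservative" rule (hypothetical; not the Yak implementation, and the threshold and interval count are made up): only request a move once a storage unit has stayed hot for several consecutive stat collections:

    from collections import defaultdict

    HOT_THRESHOLD = 0.80        # fraction of capacity considered "hot"
    REQUIRED_CONSECUTIVE = 6    # e.g., six stat-collection intervals in a row

    hot_streak = defaultdict(int)

    def should_move_tablet(unit, load_fraction):
        """Return True only once the unit has been hot for enough intervals."""
        if load_fraction >= HOT_THRESHOLD:
            hot_streak[unit] += 1
        else:
            hot_streak[unit] = 0
        return hot_streak[unit] >= REQUIRED_CONSECUTIVE

    for load in [0.85, 0.90, 0.70, 0.88, 0.90, 0.91, 0.92, 0.95, 0.96, 0.97]:
        print(load, should_move_tablet("unit-A", load))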

18 Notifications
Many customers want a stream of the updates made to their tables
 – Update external indexes (e.g., a Lucene-style index)
 – Maintain caches
 – Dump logs into Hadoop
Under the covers, the notification stream is actually our pub/sub replication layer, Tribble: clients write to PNUTS, and notification clients consume the stream to feed indexes, logs, etc.
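What a consumer of that stream might look like, sketched with hypothetical APIs (this is not the Tribble client): read (key, new record) updates and fan each one out to a cache and an append-only log destined for Hadoop:

    import json

    def consume(stream, cache, log_lines):
        """Apply each (key, new_record) update; new_record=None means delete."""
        for key, new_record in stream:
            if new_record is None:
                cache.pop(key, None)
            else:
                cache[key] = new_record
            log_lines.append(json.dumps({"key": key, "record": new_record}))

    cache, log_lines = {}, []
    stream = [                      # stand-in for the subscribed update stream
        ("alice", {"location": "Home", "status": "Sleeping"}),
        ("alice", {"location": "Home", "status": "Awake"}),
        ("bob", None),
    ]
    consume(stream, cache, log_lines)
    print(cache)
    print(log_lines)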

19 Materialized Views
Items table:
Key     | Value
item123 | type=bike, price=100
item456 | type=toaster, price=20
item789 | type=bike, price=200
Does not efficiently support "list all bikes for sale"!

Index on type:
Key             | Value
bike_item123    | price=100
bike_item789    | price=200
toaster_item456 | price=20
Get bikes for sale with a prefix scan: bike*

Index maintenance: asynchronous updates via the pub/sub layer
 – Adding/deleting an item triggers an add/delete on the index
 – Updating an item's type triggers a delete and an add on the index
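A sketch of the view-maintenance step (hypothetical, not PNUTS code): derive index-table operations from a base-table update, using a "<type>_<item key>" index key so that bikes can be listed with a prefix scan on bike_:

    def index_key(item_key, record):
        return f"{record['type']}_{item_key}"

    def index_ops(item_key, old_record, new_record):
        """Return (op, index_key, value) tuples to apply to the index table."""
        ops = []
        if old_record is not None:
            ops.append(("delete", index_key(item_key, old_record), None))
        if new_record is not None:
            ops.append(("set", index_key(item_key, new_record),
                        {"price": new_record["price"]}))
        return ops

    # Insert an item, then change its type: one add, then a delete + add.
    print(index_ops("item123", None, {"type": "bike", "price": 100}))
    print(index_ops("item123", {"type": "bike", "price": 100},
                               {"type": "toaster", "price": 100}))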

20 Bulk Operations
 1. User click-history logs are stored in HDFS
 2. A Hadoop job builds models of user preferences
 3. The Hadoop reduce writes the models to a PNUTS user table
 4. Models read from PNUTS help decide which candidate content appears on users' front pages

21 PNUTS-Hadoop
Reading from PNUTS (record reader):
 1. Split the PNUTS table into ranges (e.g., scan(0x0-0x2), scan(0x2-0x4), … scan(0xc-0xe))
 2. Each Hadoop task is assigned a range
 3. The task uses the PNUTS scan API to retrieve the records in its range
 4. The task feeds the scanned records to the map function
Writing to PNUTS:
 1. Map or reduce tasks call PNUTS set (via the router) to write their output
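A sketch of the read path (hypothetical Python; scan() is a stand-in for the PNUTS scan API, not the real connector): split the key space into ranges and have each task scan its range and feed records to the map function:

    def split_ranges(space_end, num_tasks):
        """Divide [0, space_end) into num_tasks contiguous ranges."""
        step = space_end // num_tasks
        return [(i * step, space_end if i == num_tasks - 1 else (i + 1) * step)
                for i in range(num_tasks)]

    def record_reader(scan, key_range, map_fn):
        """Scan one range and feed each record to the map function."""
        for key, record in scan(*key_range):
            map_fn(key, record)

    # Toy stand-ins for the table, scan API, and map function.
    table = {0x11: "a", 0x35: "b", 0x7C: "c", 0xD2: "d"}
    scan = lambda lo, hi: [(k, v) for k, v in sorted(table.items()) if lo <= k < hi]
    map_fn = lambda k, v: print(f"map(0x{k:02X}, {v})")

    for key_range in split_ranges(space_end=0x100, num_tasks=4):
        record_reader(scan, key_range, map_fn)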

22 Bulk w/Snapshot
 1. The PNUTS tablet map is sent to the Hadoop tasks
 2. Tasks write their output to per-tablet snapshot files
 3. Sender daemons send the snapshots to PNUTS
 4. Receiver daemons load the snapshots into the PNUTS storage units

23 Selective Replication
PNUTS replicates at the table level, potentially among 10+ data centers
 – Some records are only read in one or a few data centers
 – Legal reasons prevent us from replicating user data except where it was created
 – Tables are global, records may be local!
Storing unneeded replicas wastes disk; maintaining unneeded replicas wastes network capacity

24 Selective Replication
Static
 – Per-record constraints
 – Client sets mandatory and disallowed regions
Dynamic
 – Create replicas in regions where the record is read
 – Evict replicas from regions where the record is not read
 – Lease-based: when a replica is read, it is guaranteed to survive for a time period; eviction is lazy (when the lease expires, the replica is deleted on the next write)
 – Maintains minimum replication levels
 – Respects explicit constraints
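A sketch of the lease bookkeeping (hypothetical, not PNUTS code; the constants, region names, and minimum-replica fallback are assumptions): a read renews a region's lease, and on a later write regions with expired leases are dropped, subject to mandatory regions and a minimum replica count:

    LEASE_SECONDS = 3600
    MIN_REPLICAS = 2
    MANDATORY_REGIONS = {"us-west"}          # example static constraint

    def on_read(leases, region, now):
        """Reading in a region renews that region's lease on the record."""
        leases[region] = now + LEASE_SECONDS

    def regions_after_write(leases, now):
        """On a write, keep regions with live leases and mandatory regions,
        topping up with the most recently leased regions to stay at MIN_REPLICAS."""
        live = {r for r, expiry in leases.items() if expiry > now} | MANDATORY_REGIONS
        if len(live) < MIN_REPLICAS:
            spares = sorted(set(leases) - live, key=lambda r: leases[r], reverse=True)
            live |= set(spares[:MIN_REPLICAS - len(live)])
        return live

    leases = {"us-west": 0, "us-east": 0, "europe": 0, "asia": 0}
    on_read(leases, "asia", now=1000)        # Asia reads the record
    print(regions_after_write(leases, now=2000))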

25 Outline
PNUTS Architecture
Recent Developments
 – New features
 – New challenges
Adoption at Yahoo!

26 PNUTS in production
Over 100 Yahoo! applications/platforms on PNUTS
 – Movies, Travel, Answers
 – Over 450 tables, 50K tablets
Growth over the past 18 months
 – From 10s to 1000s of storage servers
 – From fewer than 5 data centers to over 15

27 Customer Experience
PNUTS is a hosted service
 – Customers don't install anything
 – Customers usually don't wait for hardware requests
Customer interaction
 – Architects and a dev mailing list help with design
 – Ticketing to get tables
 – Latency SLA and REST API
Ticketing ensures PNUTS stays sufficiently provisioned for all customers
 – We check on intended use, expected load, etc.

28 Sandbox
Self-provisioned system for getting test PNUTS tables
 – Start using the REST API in minutes
No SLA
 – Just running on a few storage servers, shared among many clients
No replication
 – Don't put production data here!

29 Thanks! Adam Silberstein
Further Reading
 – System overview: VLDB 2008
 – Pre-planning for big loads: SIGMOD 2008
 – Materialized views: SIGMOD 2009
 – PNUTS-Hadoop: SIGMOD 2011
 – Selective replication: VLDB 2011
 – YCSB: SOCC 2010, https://github.com/brianfrankcooper/YCSB/