Presentation is loading. Please wait.

Presentation is loading. Please wait.

PNUTS: Y AHOO !’ S H OSTED D ATA S ERVING P LATFORM B RIAN F. C OOPER, R AGHU R AMAKRISHNAN, U TKARSH S RIVASTAVA, A DAM S ILBERSTEIN, P HILIP B OHANNON,

Similar presentations


Presentation on theme: "PNUTS: Y AHOO !’ S H OSTED D ATA S ERVING P LATFORM B RIAN F. C OOPER, R AGHU R AMAKRISHNAN, U TKARSH S RIVASTAVA, A DAM S ILBERSTEIN, P HILIP B OHANNON,"— Presentation transcript:

1 PNUTS: Y AHOO !’ S H OSTED D ATA S ERVING P LATFORM B RIAN F. C OOPER, R AGHU R AMAKRISHNAN, U TKARSH S RIVASTAVA, A DAM S ILBERSTEIN, P HILIP B OHANNON, H ANS -A RNO J ACOBSEN, N ICK P UZ, D ANIEL W EAVER AND R AMANA Y ERNENI Y AHOO ! R ESEARCH Presented by Team Silverlining- Rakesh Nair, Navya Sruti Sirugudi, Shantanu Sardal, Smruti Aski, Chandra Sekhar

2 D ISTRIBUTED D ATABASES – O VERVIEW Web applications need: Scalability And the ability to scale linearly Geographic scope High availability and fault tolerance Web applications typically have: Simplified query needs No joins, aggregations Relaxed consistency needs Applications can tolerate stale or reordered data 2

3 A GENDA Introduction PNUTS Features Architecture PNUTS applications Experimental Results Feature Enhancements Related Work

4 PNUTS A massive-scale hosted database system Focus on data serving for web applications Provides data storage organized as hashed or ordered tables Low latency for large numbers of concurrent requests Novel per-record consistency guarantees

5 W HAT IS PNUTS? 5 E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Parallel database Geographic replication Indexes and views Structured, flexible schema Hosted, managed infrastructure A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E

6 F EATURES Data Model and Features Relational data model, scatter-gather operations, asynchronous notifications, bulk loading Fault Tolerance Employs redundancy, supports low-latency reads and writes even after failure Pub-Sub Message System Asynchronous operations carried out using YMB Record-level Mastering All high-latency operations are asynchronous Hosting Centrally managed database service shared by multiple applications

7 D ESIGN DECISIONS Record-level, asynchronous geographic replication Guaranteed message delivery service Consistency model which is not fully serialized Hashed and ordered table organizations, flexible schema Data management as a hosted service

8 SCALABILITY 8 Data-path components Storage units Routers Tablet controller REST API Clients Message Broker

9 R EPLICATION 9 Storage units Routers Tablet controller REST API Clients Local region Remote regions YMB

10 D ATA AND Q UERY M ODEL Data organized into tables of records with attributes Query language of PNUTS supports selection and projection from a single table. PNUTS allows application declare tables to be hashed or ordered.

11 Q UERY MODEL Per-record operations Get Set Delete Multi-record operations Multiget Scan Getrange Web service (RESTful) API 11

12 C ONSISTENCY MODEL Web applications typically manipulate one record at a time. Per-record timeline consistency Data in PNUTS is replicated across sites Each record contains Sequence number – #updates since the time of creation Version number – changes on each update on record Hidden field in each record stores which copy is the master copy updates can be submitted to any copy forwarded to master, applied in order received by master Record also contains origin of last few updates Mastership can be changed by current master, based on this information Mastership change is simply a record update

13 C ONSISTENCY MODEL Goal: make it easier for applications to reason about updates and cope with asynchrony What happens to a record with primary key “Brian”? 13 Time Record inserted Update Delete Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Update

14 C ONSISTENCY MODEL (API S ) 14 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Current version Stale version Read

15 C ONSISTENCY MODEL (API S ) 15 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Read up-to-date Current version Stale version

16 C ONSISTENCY MODEL (API S ) 16 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Read ≥ v.6 Current version Stale version Read-critical(required version):

17 C ONSISTENCY MODEL (API S ) 17 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Write Current version Stale version

18 C ONSISTENCY MODEL (API S ) 18 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Write if = v.7 ERROR Current version Stale version Test-and-set-write(required version)

19 C ONSISTENCY MODEL (API S ) 19 Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Write if = v.7 ERROR Current version Stale version Mechanism: per record mastership

20 S YSTEM A RCHITECTURE System divided into regions typically geographically distributed Each region contains a complete copy of each table Use pub/sub mechanism for reliability and replication (Yahoo Message Broker) Data tables are horizontally partitioned into groups of records called tablets. Each server might have hundreds or thousands of tablets.

21 T ABLET SPLITTING AND BALANCING 21 Each storage unit has many tablets (horizontal partitions of the table) Tablets may grow over time Overfull tablets split Storage unit may become a hotspot Shed load by moving tablets to other servers Storage unit Tablet

22 R EADING DATA Three components: Storage Unit (SU) Router Tablet Controller Each router contains interval mapping of each tablet boundry mapped to the SU containing the tablet. For ordered tables, the primary key space is divided into intervals. For hash tables, the hash space is divided into intervals for each tablet.

23 T ABLET C ONTROLLER Routers contain only a cached copy of the interval mapping. Mapping owned by tablet controller Routers get an update of the mapping from the tablet controller when a read request fails Simplifies router’s failure recovery

24 A CCESSING SINGLE RECORD 24 SU 1 Get key k 2 3 Record for key k 4

25 B ULK READ 25 SU Scatter/ gather server SU 1 {k 1, k 2, … k n } 2 Get k 1 Get k 2 Get k 3

26 R ANGE QUERIES MIN-CanteloupeSU1 Canteloupe-LimeSU3 Lime-StrawberrySU2 Strawberry-MAXSU1 26 Storage unit 1Storage unit 2Storage unit 3 Router Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Lime Mango Orange Strawberry Tomato Watermelon Grapefruit…Pear? Grapefruit…Lime? Lime…Pear? SU1Strawberry-MAX SU2Lime-Strawberry SU3Canteloupe-Lime SU1MIN-Canteloupe

27 U PDATES 27 1 Write key k 2 7 Sequence # for key k 8 SU 3 Write key k 4 5 SUCCESS 6 Write key k Routers Message brokers

28 Y AHOO M ESSAGE B ROKER Distributed publish-subscribe service Guarantees delivery once a message is published Logging at site where message is published, and at other sites when received Guarantees messages published to a particular cluster will be delivered in same order at all other clusters Record updates are published to YMB by master copy (Record-level mastering) All replicas subscribe to the updates, and get them in same order for a particular record 28

29 A SYNCHRONOUS REPLICATION 29

30 O THER F EATURES Per record transactions Copying a tablet (on failure, for e.g.) Request copy Publish checkpoint message Get copy of tablet as of when checkpoint is received Apply later updates Tablet split Has to be coordinated across all copies 30

31 Q UERY P ROCESSING Range scan can span tablets done by scatter gather engine (in router) Only one tablet scanned at a time Client may not need all results at once Continuation object returned to client to indicate where range scan should continue Notification One pub-sub topic per tablet Client knows about tables, does not know about tablets Automatically subscribed to all tablets, even as tablets are added/removed. Usual problem with pub-sub: undelivered notifications, handled in usual way 31

32 PNUTS A PPLICATIONS User Database Millions of active Yahoo users – user profiles, IM buddy lists Record timeline - relaxed consistency Hosted DB – many apps sharing same data Social and Web 2.0 Apps Rapidly evolving and expanding – flexible schema Connections in a social graph – ordered table abstraction Content Metadata Bulk data – distributed FS, metadata – PNUTS Helps high performance operations like file creation, deletion, renaming Listings Management Comparison shopping (sorted by price, rating, etc) Ordered table and views – data sorted by price, ratings,etc Session Data Large session-state storage PNUTS as a service – easy access to session store

33 E XPERIMENTAL SETUP Production PNUTS code Enhanced with ordered table type Three PNUTS regions 2 west coast, 1 east coast 5 storage units, 2 message brokers, 1 router West: Dual 2.8 GHz Xeon, 4GB RAM, 6 disk RAID 5 array East: Quad 2.13 GHz Xeon, 4GB RAM, 1 SATA disk Workload 1200-3600 requests/second 0-50% writes 80% locality 33

34 I NSERTS Required 75.6 ms per insert in West 1 (tablet master) 131.5 ms per insert into the non-master West 2, and 315.5 ms per insert into the non-master East. 34

35 35 10% writes by default

36 S CALABILITY 36

37 R EQUEST SKEW 37

38 S IZE OF RANGE SCANS 38

39 R ELATED WORK Distributed and parallel databases Especially query processing and transactions BigTable, Dynamo, S3, SimpleDB, SQL Server Data Services, Cassandra Distributed filesystems Ceph, Boxwood, Sinfonia Distributed (P2P) hash tables Chord, Pastry, … Database replication Master-slave, epidemic/gossip, synchronous… 39

40 C ONCLUSIONS AND ONGOING WORK PNUTS is an interesting research product Research: consistency, performance, fault tolerance, rich functionality Product: make it work, keep it (relatively) simple, learn from experience and real applications Ongoing work Indexes and materialized views Bundled updates Batch query processing 40

41 S UMMARY Aim of PNUTS Rich Database functionality Low latency on a massive scale Tradeoffs between functionality, performance and scalability Asynchronous replication – Low write latency Consistency Model – Useful guarantees without sacrificing scalability Hosted Service – Minimize operation costs for applications Features Limited – Preserving Reliability and Scale Novel Aspects Per-record timeline consistency - Asynchronous replication Message broker - Replication mechanism, Redo log Flexible mapping of tablets to storage units – Auto Failover, Load Balancing

42 T HANK YOU ! Questions??


Download ppt "PNUTS: Y AHOO !’ S H OSTED D ATA S ERVING P LATFORM B RIAN F. C OOPER, R AGHU R AMAKRISHNAN, U TKARSH S RIVASTAVA, A DAM S ILBERSTEIN, P HILIP B OHANNON,"

Similar presentations


Ads by Google