PNUTS: Yahoo!’s Hosted Data Serving Platform. Yahoo! Research. Presented by Liyan & Fang.


1 PNUTS: Yahoo!’s Hosted Data Serving Platform. Yahoo! Research. Presented by Liyan & Fang

2 Social network websites
[Figure: users Brian, Sonja, Jimi, Brandon, and Kurt. Question shown: "What are my friends up to?", with status entries for Sonja and Brandon.]

3 What does a web application need?
Scalability
– Architectural scalability
– Scale during periods of rapid growth with minimal operational effort
Response time and geographic scope
– Fast response time for geographically distributed users
High availability and fault tolerance
– Read, and even write, data during failures
Relaxed consistency guarantees
– Eventual consistency: update one replica first, then update the others

4 What do we need from our DBMS?
Web applications need:
– Scalability, including the ability to scale linearly
– Geographic scope
– High availability
Web applications typically have:
– Simplified query needs: no joins or aggregations
– Relaxed consistency needs: applications can tolerate stale or reordered data

5 What is PNUTS?

6 [Figure: overview of PNUTS features]
Features: parallel database; geographic replication; indexes and views; structured, flexible schema; hosted, managed infrastructure.
Table definition: CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … )
Example rows (ID, StockNumber, Status):
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E

7 Query model
Per-record operations
– Get
– Set
– Delete
Multi-record operations
– Multiget
– Scan
– Getrange
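To make the six operations concrete, here is a minimal in-memory sketch. It is not the PNUTS client API: the class name `ToyTable`, the method signatures, the predicate argument to `scan`, and the inclusive bounds of `getrange` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


class ToyTable:
    """In-memory stand-in for one ordered table (hypothetical, not the PNUTS API)."""

    def __init__(self):
        self._rows = {}

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, value):
        self._rows[key] = value

    def delete(self, key):
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        """Return the records that exist for the requested keys."""
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self, predicate=lambda key, value: True):
        """Return all records, in key order, that satisfy the predicate."""
        return [(k, v) for k, v in sorted(self._rows.items()) if predicate(k, v)]

    def getrange(self, low, high):
        """Return records with low <= key <= high (inclusive bounds are an assumption)."""
        keys = sorted(self._rows)
        lo, hi = bisect_left(keys, low), bisect_right(keys, high)
        return [(k, self._rows[k]) for k in keys[lo:hi]]
```

In a real deployment each of these calls would go through a router to one or more storage units; the sketch only shows the shape of the operations.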

8 Detailed architecture
[Figure: data-path components: clients, REST API, routers, message broker, tablet controller, storage units]
Data tables are horizontally partitioned into groups of records called tablets.
Storage units:
– Store tablets.
– Respond to get() and scan() requests by retrieving and returning matching records.
– Respond to set() requests by processing the update. To commit an update, the storage unit must first write it to the message broker.
Routers:
– Determine which storage unit is responsible for a record that a client wants to read or write. The router first determines which tablet contains the record, and then which storage unit holds that tablet.
Tablet controller:
– Determines when to move a tablet between storage units for load balancing or recovery, and when a large tablet must be split.
– Updates the copy of the interval mapping.

9 Detailed architecture
[Figure: a local region (clients, REST API, routers, tablet controller, storage units) connected through YMB to remote regions]
Record-level mastering: mastership is assigned record by record, and different records in the same table can be mastered in different clusters.
In one week, 85 percent of the writes to a given record originated in the same datacenter.
A master publishes its updates to a single broker, so updates are delivered to replicas in commit order.
YMB (Yahoo! Message Broker) takes multiple steps to ensure messages are not lost before they are applied to the database.
Messages published to one YMB cluster are relayed to other YMB clusters for delivery to local subscribers.
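The commit-order claim can be illustrated with a toy model: if every update to a record goes through one ordered log, any replica that applies a prefix of that log holds a valid (possibly stale) state, and all replicas converge once they catch up. The `Broker` and `Replica` classes below are hypothetical stand-ins; the real YMB adds durability and cross-cluster relay that this sketch leaves out.

```python
class Broker:
    """Toy single-broker log; a stand-in for YMB, which does far more."""

    def __init__(self):
        self.log = []  # messages in the order the master committed them

    def publish(self, message):
        self.log.append(message)


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, broker, limit=None):
        """Apply pending messages in log order, optionally only `limit` of them."""
        end = len(broker.log)
        if limit is not None:
            end = min(end, self.applied + limit)
        for key, value in broker.log[self.applied:end]:
            self.data[key] = value
        self.applied = end


broker = Broker()
for update in [("Brian", "busy"), ("Brian", "free"), ("Sonja", "away")]:
    broker.publish(update)

west, east = Replica(), Replica()
west.catch_up(broker)           # fully caught up
east.catch_up(broker, limit=1)  # lagging replica: stale, but a valid prefix
```

Because the lagging replica only ever sees a prefix of the log, it can be behind but never observes updates out of order.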

10 Query processing

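11 Accessing data
[Figure: a read request reaching a storage unit (SU) in four numbered steps. Steps 1 and 2: "Get key k". Steps 3 and 4: "Record for key k".]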

12 Bulk read
Scatter/gather engine: a component of the router. It receives a multi-record request, splits it into individual requests for single records or single-tablet scans, and initiates those requests in parallel.
[Figure: step 1, the client sends {k1, k2, … kn}; step 2, the engine issues Get k1, Get k2, Get k3 to storage units (SU).]
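A rough sketch of the scatter/gather idea, using a thread pool to issue one request per key in parallel. The storage-unit contents, the `locate` helper, and the function names are hypothetical; a real router would send network requests instead of reading local dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of keys on storage units.
STORAGE_UNITS = {
    "su1": {"k1": "v1", "k4": "v4"},
    "su2": {"k2": "v2"},
    "su3": {"k3": "v3"},
}


def locate(key):
    """Stand-in for the router's tablet lookup: find the unit holding the key."""
    for name, rows in STORAGE_UNITS.items():
        if key in rows:
            return name
    return None


def get_one(key):
    """Single-record request against whichever storage unit holds the key."""
    unit = locate(key)
    return key, (STORAGE_UNITS[unit][key] if unit else None)


def multiget(keys, max_workers=4):
    """Scatter one get per key, run them in parallel, gather into one result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(get_one, keys))
```

For example, `multiget(["k1", "k2", "k3"])` fans out three single-record gets and merges their answers into one dictionary.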

13 Range queries
[Figure: a router in front of storage units 1, 2, and 3]
Router interval mapping:
MIN-Canteloupe: SU1
Canteloupe-Lime: SU3
Lime-Strawberry: SU2
Strawberry-MAX: SU1
A second listing in the figure reads: SU1 Strawberry-MAX, SU2 Lime-Strawberry, SU3 Canteloupe-Lime, SU4 MIN-Canteloupe.
Keys in sorted order: Apple, Avocado, Banana, Blueberry, Canteloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Pear, Strawberry, Tomato, Watermelon
Example: the range query Grapefruit…Pear is split into Grapefruit…Lime and Lime…Pear.
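The interval mapping lends itself to a binary search over sorted tablet boundaries. The sketch below uses the first mapping from the slide (MIN-Canteloupe on SU1, and so on); the function names, the half-open interval convention, and the use of an empty string for MIN are assumptions made for illustration.

```python
from bisect import bisect_right

# Lower bound of each tablet, sorted; "" plays the role of MIN.
BOUNDS = ["", "Canteloupe", "Lime", "Strawberry"]
UNITS = ["SU1", "SU3", "SU2", "SU1"]  # storage unit holding each tablet


def tablet_index(key):
    """Binary search for the tablet whose interval [lower, next lower) holds key."""
    return bisect_right(BOUNDS, key) - 1


def route(key):
    """Two-step lookup: key -> tablet -> storage unit."""
    return UNITS[tablet_index(key)]


def split_range(low, high):
    """Split [low, high) into per-tablet sub-ranges tagged with their storage unit."""
    pieces = []
    i = tablet_index(low)
    start = low
    while start < high:
        upper = BOUNDS[i + 1] if i + 1 < len(BOUNDS) else None  # None means MAX
        end = high if upper is None or high <= upper else upper
        pieces.append((start, end, UNITS[i]))
        start, i = end, i + 1
    return pieces
```

With this mapping, `split_range("Grapefruit", "Pear")` yields one piece for Grapefruit to Lime on SU3 and one for Lime to Pear on SU2, matching the split shown on the slide.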

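14 Updates
[Figure: the write path through routers, a storage unit (SU), and message brokers, in eight numbered steps. "Write key k" labels steps 1 to 4 and step 6, "SUCCESS" labels step 5, and "Sequence # for key k" labels steps 7 and 8.]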

15 Asynchronous replication and consistency

16 Asynchronous replication

17 Consistency model
Goal: make it easier for applications to reason about updates and cope with asynchrony.
What happens to a record with primary key “Brian”?
[Figure: timeline of the record (inserted, updated, updated, deleted) with versions v. 1 to v. 8 in Generation 1]

18 Consistency model
[Figure: version timeline v. 1 to v. 8 (Generation 1) marking a stale version and the current version; a read arrives]
Read-any: returns a possibly stale version of the record. For example, when a social networking application displays the status of a user's friend, it is not essential to get the most up-to-date value, so read-any can be used.

19 Consistency model
[Figure: version timeline v. 1 to v. 8 (Generation 1) marking a stale version and the current version; a "read up-to-date" request arrives]

20 Consistency model
[Figure: version timeline v. 1 to v. 8 (Generation 1) marking a stale version and the current version; the request is "Read ≥ v.6"]
Read-critical(required version): returns a version of the record that is the same as, or strictly newer than, the required version. For example, a user writes a record and then wants to read a version that definitely reflects those changes.
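The three read calls differ only in how fresh a copy they are willing to accept. The sketch below models each regional replica as a dictionary with a version number; the `REPLICAS` data, the function names, and the fallback strategy are assumptions for illustration, not the actual PNUTS routing logic.

```python
# Hypothetical per-region copies of the record with primary key "Brian".
REPLICAS = {
    "west": {"version": 8, "value": "at the beach"},
    "east": {"version": 5, "value": "at work"},
}


def read_any(replicas, region):
    """Return whatever the local replica holds; it may be stale."""
    return replicas[region]


def read_up_to_date(replicas):
    """Return the newest copy (in this toy model, the highest version)."""
    return max(replicas.values(), key=lambda r: r["version"])


def read_critical(replicas, region, required_version):
    """Return a copy at least as new as required_version, preferring the local one."""
    local = replicas[region]
    if local["version"] >= required_version:
        return local
    newest = read_up_to_date(replicas)
    if newest["version"] < required_version:
        raise LookupError("no replica has reached the required version")
    return newest
```

A client that just wrote v. 6 would call `read_critical(REPLICAS, "east", 6)`: the local east copy (v. 5) is too old, so the call falls back to a newer copy.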

21 Consistency model
[Figure: version timeline v. 1 to v. 8 (Generation 1) marking a stale version and the current version; a write arrives]

22 Consistency model
[Figure: version timeline v. 1 to v. 8 (Generation 1) marking a stale version and the current version; the request "Write if = v.7" returns ERROR]
Test-and-set-write(required version): performs the requested write if and only if the present version of the record is the same as the required version. This call can be used to implement transactions that first read a record and then write to it based on that read, e.g., incrementing the value of a counter.
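The counter example can be sketched as a read-modify-write loop that retries when the version check fails. The `Record` class, the `VersionMismatch` exception, and the retry limit are hypothetical; only the "write if the version still matches" rule comes from the slide.

```python
class VersionMismatch(Exception):
    """Raised when the record changed since the caller read it."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def test_and_set_write(self, required_version, new_value):
        """Write only if the present version equals required_version."""
        if self.version != required_version:
            raise VersionMismatch(
                f"record is at v.{self.version}, not v.{required_version}"
            )
        self.value = new_value
        self.version += 1
        return self.version


def increment(record, max_attempts=5):
    """Read, add one, and write back; retry if another writer got in first."""
    for _ in range(max_attempts):
        value, version = record.read()
        try:
            return record.test_and_set_write(version, value + 1)
        except VersionMismatch:
            continue
    raise RuntimeError("too much contention on this record")
```

A writer holding a stale version number gets an error instead of silently overwriting a newer value, which is the ERROR case shown in the figure.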

23 Record and Tablet Mastership
Data in PNUTS is replicated across sites.
A hidden field in each record stores which copy is the master copy
– Updates can be submitted to any copy
– Updates are forwarded to the master and applied in the order the master receives them
Each record also contains the origin of the last few updates
– The current master can change mastership based on this information
– A mastership change is simply a record update
Tablet mastership
– Required to ensure primary key consistency
– Can be different from record mastership
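One way to picture mastership migration: keep a short window of update origins and hand mastership to another region once that region dominates the window. The slide does not give the exact rule, so the policy below (move once the last three updates all came from the same non-master region) and all names are assumptions for illustration.

```python
from collections import deque


class MasteredRecord:
    def __init__(self, master, history=3):
        self.master = master                  # hidden field: region of the master copy
        self.origins = deque(maxlen=history)  # origins of the last few updates
        self.value = None

    def write(self, origin, value):
        """Accept a write submitted in any region; it is applied at the master."""
        self.value = value
        self.origins.append(origin)
        self._maybe_move_master()

    def _maybe_move_master(self):
        # Assumed policy: move once the whole window comes from one other region.
        window_full = len(self.origins) == self.origins.maxlen
        same_origin = len(set(self.origins)) == 1
        if window_full and same_origin and self.origins[0] != self.master:
            self.master = self.origins[0]  # the change is just another record update
```

Given the 85 percent write locality quoted earlier in the deck, a policy of this kind would let most writes reach a master in the same datacenter.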

24 Other Features
Per-record transactions
Copying a tablet (e.g., for failure recovery)
– Request copy
– Publish checkpoint message
– Get copy of tablet as of when checkpoint is received
– Apply later updates
Tablet split
– Has to be coordinated across all copies
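The tablet-copy steps can be sketched with an ordered update log and a checkpoint marker: the snapshot covers everything before the marker, and the new copy replays everything after it. The `CHECKPOINT` sentinel, the log-as-list model, and the function names are assumptions; the sketch also assumes the source has applied every message that precedes the checkpoint.

```python
CHECKPOINT = object()  # hypothetical marker message


def apply_updates(state, messages):
    """Apply (key, value) updates in order, skipping checkpoint markers."""
    for message in messages:
        if message is CHECKPOINT:
            continue
        key, value = message
        state[key] = value
    return state


def take_checkpoint(log, source_state):
    """Publish a checkpoint and snapshot the source as of that point."""
    log.append(CHECKPOINT)
    return dict(source_state), len(log) - 1


def catch_up(snapshot, log, position):
    """Build the new copy: start from the snapshot, then apply later updates."""
    return apply_updates(dict(snapshot), log[position + 1:])
```

Updates that arrive while the copy is in flight are not lost, because they sit in the log after the checkpoint and are replayed by `catch_up`.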

25 Query Processing
Range scans can span tablets
– Only one tablet is scanned at a time
– The client may not need all results at once
– A continuation object is returned to the client to indicate where the range scan should continue
Notification
– One pub-sub topic per tablet
– The client knows about tables, not tablets; it is automatically subscribed to all tablets, even as tablets are added or removed
– The usual pub-sub problem of undelivered notifications is handled in the usual way
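A continuation object can be as simple as "which tablet to scan next". In the sketch below each call scans exactly one tablet and returns its rows plus a continuation token, or `None` when the range is exhausted. The tablet contents reuse the fruit keys from the range-query slide; the token format and function names are assumptions.

```python
# Hypothetical tablets of an ordered table, each a sorted list of keys.
TABLETS = [
    ["Apple", "Avocado", "Banana", "Blueberry"],
    ["Canteloupe", "Grape", "Kiwi", "Lemon"],
    ["Lime", "Mango", "Orange", "Pear"],
    ["Strawberry", "Tomato", "Watermelon"],
]


def first_tablet(low):
    """Index of the first tablet that can contain keys >= low."""
    for index, keys in enumerate(TABLETS):
        if keys[-1] >= low:
            return index
    return len(TABLETS)


def range_scan(low, high, continuation=None):
    """Scan one tablet of [low, high); return (rows, continuation or None)."""
    index = first_tablet(low) if continuation is None else continuation
    if index >= len(TABLETS):
        return [], None
    rows = [k for k in TABLETS[index] if low <= k < high]
    has_more = index + 1 < len(TABLETS) and TABLETS[index + 1][0] < high
    return rows, (index + 1 if has_more else None)


def scan_all(low, high):
    """Client-side loop that follows continuations until the range is done."""
    rows, token = range_scan(low, high)
    while token is not None:
        more, token = range_scan(low, high, token)
        rows.extend(more)
    return rows
```

A client that only needs the first page can stop after one call and keep the token for later, which is the point of returning a continuation rather than the full result.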

26 Experiments

27 Experimental setup
Production version supporting
– Hash tables
– Ordered tables
Database
– 3 regions: 2 west coast, 1 east coast
– 1 KB records, 128 tablets per region
– Each client process had 100 client threads
– 300 clients in total across the system
Workload
– 1200-3600 requests/second
– 0-50% writes
– 80% locality

28 Inserts
Inserts (hash tables)
– 75.6 ms per insert in West 1 (tablet master)
– 131.5 ms per insert in the non-master West 2
– 315.5 ms per insert in the non-master East
Inserts (ordered tables)
– 33 ms per insert in West 1
– 105.8 ms per insert in the non-master West 2
– 324.5 ms per insert in the non-master East

29 10% writes by default. Latency decreases, and then increases, with increasing load. The high latency at low request rates resulted from an anomaly in the HTTP client library we used, which closed TCP connections between requests at low request rates, requiring expensive TCP setup for each call. As the proportion of reads increases, the average latency decreases.

30 Scalability

31 Size of range scans

32 Thanks!

