Published by Avery Stoken. Modified over 9 years ago.
1
R2D2: LinkedIn’s Request/Response Infrastructure
Oby Sumampouw (pronounced o-bee soo-mum-pow), osumampouw@linkedin.com
2
Why R2D2? [Diagram: each server cluster (Server Cluster, Cluster 2, Cluster 3) sits behind its own load balancer (Load Balancer, Load Balancer 2, Load Balancer 3).]
3
R2D2 in a nutshell
[Diagram: a client, ZooKeeper, and three services (Profile Service, Inbox Service, Ads Service), each backed by several servers for resource “foo”.]
-The client wants to send a request to get profile?id=123
-It listens to the profile ZooKeeper node
-It gets a list of the URIs of the servers where profile is hosted
-It gets notified if a server leaves or joins a cluster
-It chooses one server to send the request to (???)
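The client-side flow above can be sketched as a small in-memory registry. This is a minimal sketch, not the D2 API: the class `ServerListListener` and its callbacks `onServerJoined`/`onServerLeft` are hypothetical stand-ins for ZooKeeper watch notifications, and the server is chosen uniformly at random just to show where the "which one?" decision plugs in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical client-side view of one service's servers. onServerJoined and
// onServerLeft stand in for ZooKeeper watch notifications on the service's node.
class ServerListListener {
    private final Set<String> servers = ConcurrentHashMap.newKeySet();

    void onServerJoined(String uri) { servers.add(uri); }

    void onServerLeft(String uri) { servers.remove(uri); }

    // Choose one server for the next request (uniformly at random here);
    // empty if no server is currently registered.
    Optional<String> chooseServer() {
        List<String> snapshot = new ArrayList<>(servers);
        if (snapshot.isEmpty()) return Optional.empty();
        return Optional.of(snapshot.get(ThreadLocalRandom.current().nextInt(snapshot.size())));
    }
}

class ServerListDemo {
    public static void main(String[] args) {
        ServerListListener profile = new ServerListListener();
        profile.onServerJoined("http://app1.foo.com/profile");
        profile.onServerJoined("http://app2.foo.com/profile");
        profile.onServerLeft("http://app1.foo.com/profile");
        // Only app2 is left, so the request goes there.
        System.out.println(profile.chooseServer().map(uri -> uri + "?id=123").orElse("no server"));
    }
}
```

Because the client holds the server list itself, no hardware load balancer sits in the request path; the remaining slides are about making `chooseServer` smarter than a coin flip.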
4
Agenda
-R2D2 architecture
-How information is stored and organized in ZooKeeper
-How R2D2 does load balancing and graceful degradation
-Partitioning and sticky routing
-Miscellaneous D2 use cases at LinkedIn:
 -Redlining
 -Cluster variants
-Q&A
5
Star Wars™? Note: This R2D2 is not related to Star Wars™. LucasArts/Disney, please don’t sue us.
6
What is rest.li? An open-source Java REST framework. Go to http://rest.li
7
What is D2?
-Primarily a name server and traffic router
-The global “address book” is stored in ZooKeeper
-We store a backup in the local filesystem
Definitions:
-D2 Cluster: a collection of identical servers that host one or more D2 services
-D2 Service: represents a service
-D2 Uri: represents a server’s address and weight
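The three definitions can be pictured as plain value types. A minimal sketch; the record names and fields below are illustrative assumptions, not rest.li’s actual classes. A cluster groups identical servers, a service names the cluster that hosts it, and a URI pairs one server’s address with its weight.

```java
import java.net.URI;
import java.util.List;

// Illustrative D2 data model (hypothetical names, not the rest.li classes).
record D2Cluster(String name) {}

// A service records which cluster hosts it.
record D2Service(String name, String clusterName) {}

// One server's address within a cluster, plus its load-balancing weight.
record D2Uri(String clusterName, URI address, double weight) {
    D2Uri {
        if (weight < 0) throw new IllegalArgumentException("weight must be non-negative");
    }
}

class D2ModelDemo {
    public static void main(String[] args) {
        D2Cluster clusterA = new D2Cluster("clusterA");
        // One cluster can host several services.
        List<D2Service> services = List.of(
            new D2Service("serviceA1", clusterA.name()),
            new D2Service("serviceA2", clusterA.name()));
        D2Uri server = new D2Uri(clusterA.name(), URI.create("http://app1.foo.com:8080"), 1.0);
        System.out.println(services + " -> " + server);
    }
}
```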
8
How is D2 information organized and stored?
/ (root)
  /d2
    /d2/clusters
      /d2/clusters/clusterA
      /d2/clusters/clusterB
    /d2/services
      /d2/services/serviceA1
      /d2/services/serviceA2
      /d2/services/serviceB
    /d2/uris
      /d2/uris/clusterA
      /d2/uris/clusterB
        /d2/uris/clusterB/ephemeralNode1
        /d2/uris/clusterB/ephemeralNode2
Service Properties:
-Cluster = clusterA
-Load-balancer configuration
-Degrader configuration
-Strategy configuration
-Etc.
Cluster Properties:
-Partition configuration
-Etc.
Uri Properties:
-Machine URI
-Weight
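Resolving a service to servers walks this tree in two hops: read the service node to find its cluster, then list the children of that cluster’s node under /d2/uris. A sketch against an in-memory map that stands in for ZooKeeper; the class `InMemoryD2Store` and the property keys `cluster` and `uri` are assumptions for illustration, and a real client would read these nodes through a ZooKeeper client and set watches on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the ZooKeeper tree: node path -> node properties.
class InMemoryD2Store {
    private final Map<String, Map<String, String>> nodes = new TreeMap<>();

    void put(String path, Map<String, String> properties) { nodes.put(path, properties); }

    // Direct children of a path, like ZooKeeper's getChildren (sorted by path here).
    List<String> children(String parent) {
        List<String> result = new ArrayList<>();
        String prefix = parent + "/";
        for (String path : nodes.keySet()) {
            String rest = path.startsWith(prefix) ? path.substring(prefix.length()) : null;
            if (rest != null && !rest.isEmpty() && !rest.contains("/")) result.add(path);
        }
        return result;
    }

    // Hop 1: /d2/services/<service> names the cluster.
    // Hop 2: /d2/uris/<cluster>/* lists the servers.
    List<String> resolve(String serviceName) {
        Map<String, String> service = nodes.get("/d2/services/" + serviceName);
        if (service == null) throw new IllegalArgumentException("unknown service " + serviceName);
        List<String> uris = new ArrayList<>();
        for (String child : children("/d2/uris/" + service.get("cluster"))) {
            uris.add(nodes.get(child).get("uri"));
        }
        return uris;
    }
}

class D2ResolveDemo {
    public static void main(String[] args) {
        InMemoryD2Store zk = new InMemoryD2Store();
        zk.put("/d2/services/serviceB", Map.of("cluster", "clusterB"));
        zk.put("/d2/uris/clusterB/ephemeralNode1", Map.of("uri", "http://b1.foo.com:8080", "weight", "1.0"));
        zk.put("/d2/uris/clusterB/ephemeralNode2", Map.of("uri", "http://b2.foo.com:8080", "weight", "1.0"));
        System.out.println(zk.resolve("serviceB"));
    }
}
```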
9
How is ZooKeeper initialized?
[Diagram: D2Config.java loads a config file into ZooKeeper, creating /d2/clusters/clusterA, /d2/clusters/clusterB, /d2/clusters/clusterC and /d2/services/serviceA1, /d2/services/serviceA2, /d2/services/serviceA3. The ClusterA server registers itself as /d2/uris/clusterA/ephemeralNode1, and the ServiceA1 client reads this information from ZooKeeper.]
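The two kinds of nodes in this picture have different lifetimes, which is what lets clients notice a dead server. A minimal in-memory sketch, assuming the usual ZooKeeper semantics: persistent nodes written from the config file survive, while a server’s ephemeral node disappears when the session that created it ends. The class `NodeTree` and its methods are hypothetical, and the config-loading loop below only plays D2Config.java’s role; it is not its API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of ZooKeeper node lifetimes.
class NodeTree {
    private final Set<String> persistent = new TreeSet<>();
    private final Map<String, Long> ephemeralOwner = new HashMap<>();

    void createPersistent(String path) { persistent.add(path); }

    void createEphemeral(String path, long sessionId) { ephemeralOwner.put(path, sessionId); }

    // Session expiry or clean shutdown: drop every ephemeral node the session owned.
    void closeSession(long sessionId) { ephemeralOwner.values().removeIf(owner -> owner == sessionId); }

    boolean exists(String path) { return persistent.contains(path) || ephemeralOwner.containsKey(path); }
}

class InitDemo {
    public static void main(String[] args) {
        NodeTree zk = new NodeTree();
        // Step 1 (config loader): clusters and services from the config file.
        for (String c : new String[] {"clusterA", "clusterB", "clusterC"}) zk.createPersistent("/d2/clusters/" + c);
        for (String s : new String[] {"serviceA1", "serviceA2", "serviceA3"}) zk.createPersistent("/d2/services/" + s);
        // Step 2 (server start-up): announce this server under its cluster.
        long session = 42L;
        zk.createEphemeral("/d2/uris/clusterA/ephemeralNode1", session);
        System.out.println(zk.exists("/d2/uris/clusterA/ephemeralNode1")); // true
        // Step 3 (server dies): its URI vanishes (real ZooKeeper would also notify watchers).
        zk.closeSession(session);
        System.out.println(zk.exists("/d2/uris/clusterA/ephemeralNode1")); // false
    }
}
```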
10
D2 Load Balancer
-Client-side load balancer
-The client keeps track of the state
-Two strategies to use:
 -Random
 -Degrader
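The two strategies can be sketched behind one interface. This is a simplified stand-in, not the rest.li API: `LoadBalancerStrategy`, `RandomStrategy`, and `PointsWeightedStrategy` are hypothetical names, and the degrader side is approximated as a random pick weighted by each server’s points (the walkthrough that follows shows how points are assigned).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical strategy interface: the client keeps per-server state
// (here, points per server) and asks a strategy to pick one server.
interface LoadBalancerStrategy {
    String choose(Map<String, Integer> pointsByServer, Random rng);
}

// Random: every server is equally likely, regardless of points.
class RandomStrategy implements LoadBalancerStrategy {
    public String choose(Map<String, Integer> pointsByServer, Random rng) {
        List<String> servers = new ArrayList<>(pointsByServer.keySet());
        return servers.get(rng.nextInt(servers.size()));
    }
}

// Degrader-style: a server's chance is proportional to its points.
// Requires at least one point in total.
class PointsWeightedStrategy implements LoadBalancerStrategy {
    public String choose(Map<String, Integer> pointsByServer, Random rng) {
        int total = pointsByServer.values().stream().mapToInt(Integer::intValue).sum();
        int ticket = rng.nextInt(total); // 0 <= ticket < total
        for (Map.Entry<String, Integer> e : pointsByServer.entrySet()) {
            ticket -= e.getValue();
            if (ticket < 0) return e.getKey();
        }
        throw new IllegalStateException("unreachable when total > 0");
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        Map<String, Integer> points = new LinkedHashMap<>();
        points.put("server1", 61);
        points.put("server2", 100);
        Random rng = new Random(7);
        LoadBalancerStrategy degrader = new PointsWeightedStrategy();
        int server1 = 0;
        for (int i = 0; i < 10_000; i++) if (degrader.choose(points, rng).equals("server1")) server1++;
        // Roughly 61 / 161 (about 38%) of requests should go to server1.
        System.out.println("server1 share: " + server1 / 100.0 + "%");
    }
}
```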
11
How does the degrader load balancer work?
LB configuration: latency low water mark 500 ms, latency high water mark 2000 ms, min call count 10.
[Diagram: a client sending requests to Server 1 and Server 2.]
Period 1 (LOAD_BALANCE): no traffic yet. Server 1 and Server 2 have 100 points each, a total call count of 0, and 0 ms latency. Cluster total call count: 0; cluster average latency: 0 ms; cluster drop rate: 0.0.
Period 2 (LOAD_BALANCE): Server 1 handles 100 calls at 4900 ms; Server 2 handles 100 calls at 100 ms. Cluster average latency: 2500 ms. Server 1 is slow, so its points drop to 61.
Period 3 (CALL_DROPPING): Server 1 handles 67 calls at 4900 ms; Server 2 handles 133 calls at 3000 ms. Cluster average latency: 3636.5 ms, which is above the high water mark, so the cluster drop rate rises to 0.2 for the next LOAD_BALANCE period.
Notice: the number of points doesn’t change because we are in CALL_DROPPING mode.
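The cluster-level numbers in this walkthrough are call-count-weighted averages of the per-server latencies: Period 2 gives (100×4900 + 100×100)/200 = 2500 ms and Period 3 gives (67×4900 + 133×3000)/200 = 3636.5 ms. A minimal sketch of that arithmetic and of the "raise the drop rate when overloaded" step; the class names are hypothetical, and the 0.2 step and the min-call-count guard are read off the slide’s example rather than taken from rest.li’s source.

```java
// Degrader settings from the slide; DROP_RATE_STEP = 0.2 is inferred from the example.
final class DegraderConfig {
    static final double LOW_WATER_MARK_MS = 500; // used when recovering, not in this sketch
    static final double HIGH_WATER_MARK_MS = 2000;
    static final int MIN_CALL_COUNT = 10;
    static final double DROP_RATE_STEP = 0.2;
}

final class ClusterStats {
    // Cluster latency: per-server latencies averaged with call counts as weights.
    static double averageLatency(int[] callCounts, double[] latenciesMs) {
        long totalCalls = 0;
        double weightedSum = 0;
        for (int i = 0; i < callCounts.length; i++) {
            totalCalls += callCounts[i];
            weightedSum += callCounts[i] * latenciesMs[i];
        }
        return totalCalls == 0 ? 0 : weightedSum / totalCalls;
    }

    // CALL_DROPPING step for an overloaded cluster: with enough calls to trust
    // the stats and latency above the high water mark, shed more traffic.
    static double raiseDropRateIfOverloaded(double dropRate, long totalCalls, double latencyMs) {
        if (totalCalls >= DegraderConfig.MIN_CALL_COUNT && latencyMs > DegraderConfig.HIGH_WATER_MARK_MS) {
            return Math.min(1.0, dropRate + DegraderConfig.DROP_RATE_STEP);
        }
        return dropRate;
    }

    public static void main(String[] args) {
        double period2 = averageLatency(new int[] {100, 100}, new double[] {4900, 100});
        double period3 = averageLatency(new int[] {67, 133}, new double[] {4900, 3000});
        System.out.println(period2);                                     // 2500.0
        System.out.println(period3);                                     // 3636.5
        System.out.println(raiseDropRateIfOverloaded(0.0, 200, period3)); // 0.2
    }
}
```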
12
How does the degrader recover from a bad state?
LB configuration: latency low water mark 500 ms, latency high water mark 2000 ms, min call count 10.
[Diagram: a client sending requests to Server 1 and Server 2.]
Period N (LOAD_BALANCE): the cluster drop rate is 1.0, so all traffic is choked. Cluster total call count: 0; cluster average latency: 0 ms. Server 1 and Server 2 have 1 point each, with 0 calls at 0 ms; their points recover to 2.
Notice: we’re in recovery mode. Because we choke all traffic, we try recovering regardless of call stats.
Period N+1 (CALL_DROPPING): the cluster drop rate falls from 1.0 to 0.8.
Period N+2 (LOAD_BALANCE): Server 1 handles 15 calls at 150 ms; Server 2 handles 20 calls at 200 ms. Cluster total call count: 35; cluster average latency: 178.6 ms; cluster drop rate: 0.8. Points rise to 37.
Period N+3 (CALL_DROPPING): each server handles 50 calls at 200 ms. Cluster total call count: 100; cluster average latency: 200 ms, below the low water mark, so the cluster drop rate falls from 0.8 to 0.6.
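The recovery sequence (drop rate 1.0, then 0.8, 0.8, 0.6) follows from two rules on the slide: the modes alternate period by period, and a fully choked cluster steps down regardless of stats. A minimal sketch under those assumptions; `DegraderController` is a hypothetical name, only the drop rate is modeled (points are not), and the drop rate is stored as an integer level out of 5 so the 0.2 steps stay exact.

```java
// Hypothetical sketch of the degrader's alternating modes and drop-rate recovery.
class DegraderController {
    enum Mode { LOAD_BALANCE, CALL_DROPPING }

    static final double LOW_WATER_MARK_MS = 500;
    static final double HIGH_WATER_MARK_MS = 2000;
    static final int MIN_CALL_COUNT = 10;
    static final int STEPS = 5; // drop rate moves in increments of 1/STEPS = 0.2 (assumed)

    private Mode mode;
    private int dropLevel; // 0 = drop nothing, STEPS = drop everything

    DegraderController(Mode initialMode, double initialDropRate) {
        this.mode = initialMode;
        this.dropLevel = (int) Math.round(initialDropRate * STEPS);
    }

    double dropRate() { return dropLevel / (double) STEPS; }

    Mode mode() { return mode; }

    // Called once at the end of each period with that period's cluster stats.
    void endOfPeriod(long totalCalls, double clusterLatencyMs) {
        if (mode == Mode.CALL_DROPPING) {
            boolean enoughCalls = totalCalls >= MIN_CALL_COUNT;
            if (dropLevel == STEPS) {
                // All traffic is choked, so there are no stats to judge by: recover anyway.
                dropLevel--;
            } else if (enoughCalls && clusterLatencyMs > HIGH_WATER_MARK_MS) {
                dropLevel++;
            } else if (enoughCalls && clusterLatencyMs < LOW_WATER_MARK_MS && dropLevel > 0) {
                dropLevel--;
            }
        }
        // In LOAD_BALANCE periods only per-server points would change (not modeled).
        mode = (mode == Mode.CALL_DROPPING) ? Mode.LOAD_BALANCE : Mode.CALL_DROPPING;
    }

    public static void main(String[] args) {
        // Start at the slide's Period N+1 (CALL_DROPPING) with everything choked.
        DegraderController c = new DegraderController(Mode.CALL_DROPPING, 1.0);
        long[] calls = {0, 35, 100};
        double[] latencyMs = {0, 178.6, 200};
        for (int i = 0; i < calls.length; i++) {
            c.endOfPeriod(calls[i], latencyMs[i]);
            System.out.println("drop rate " + c.dropRate()); // 0.8, 0.8, 0.6
        }
    }
}
```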
13
A few more details
-The min call count is reduced depending on how degraded the state is
-It’s not just latency: we also consider the error rate and the number of outstanding calls
-We can use several types of latency: AVERAGE, 90%, 95%, 99%
-We can set different low/high water marks for the cluster and for individual nodes
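Choosing a latency type just changes which statistic is compared against the water marks. A minimal sketch; the enum `LatencyKind` and class `LatencyStats` are hypothetical, and the nearest-rank percentile is one common definition picked for illustration, not necessarily how rest.li computes its percentiles.

```java
import java.util.Arrays;

// Which latency statistic gets compared against the water marks.
enum LatencyKind { AVERAGE, P90, P95, P99 }

final class LatencyStats {
    // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
    static double percentile(double[] samplesMs, double p) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    static double latency(double[] samplesMs, LatencyKind kind) {
        if (samplesMs.length == 0) return 0;
        switch (kind) {
            case AVERAGE: return Arrays.stream(samplesMs).average().orElse(0);
            case P90: return percentile(samplesMs, 90);
            case P95: return percentile(samplesMs, 95);
            default: return percentile(samplesMs, 99);
        }
    }

    public static void main(String[] args) {
        // One slow outlier barely moves the average but dominates the tail percentiles.
        double[] samples = {120, 95, 3000, 110, 101, 99, 130, 105, 98, 115};
        for (LatencyKind kind : LatencyKind.values()) {
            System.out.println(kind + ": " + latency(samples, kind) + " ms");
        }
    }
}
```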
14
Call Dropping vs Load Balancing
Call Dropping Mode:
-Affects the entire cluster
-Purpose: graceful degradation
-Adjusts: drop rate
-Hints: latency
Load Balancing Mode:
-Affects only individual machines in the cluster
-Purpose: load balancing traffic
-Adjusts: points
-Hints: individual node latency, error rate, number of outstanding calls
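On each request the two modes’ outputs combine: the cluster-wide drop rate decides whether the request is sent at all, and per-machine points decide where it goes. A minimal sketch; `RequestGate` is a hypothetical name, and the points-based pick is passed in as a `Supplier` so any weighted selector can be plugged in.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical per-request gate combining the two modes' outputs.
final class RequestGate {
    private final Random rng;

    RequestGate(Random rng) { this.rng = rng; }

    // Cluster-wide: drop this request with probability dropRate (graceful degradation).
    boolean shouldDrop(double dropRate) { return rng.nextDouble() < dropRate; }

    // Returns the chosen server, or null if the request was dropped.
    String route(double dropRate, Supplier<String> pickByPoints) {
        if (shouldDrop(dropRate)) return null; // fail fast instead of piling onto a slow cluster
        return pickByPoints.get();              // per-machine: points decide which server
    }

    public static void main(String[] args) {
        RequestGate gate = new RequestGate(new Random(3));
        int dropped = 0;
        for (int i = 0; i < 10_000; i++) {
            if (gate.route(0.2, () -> "server2") == null) dropped++;
        }
        // With a 0.2 drop rate, roughly 2,000 of 10,000 requests are dropped.
        System.out.println("dropped " + dropped + " of 10000");
    }
}
```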
15
Partitioning and Sticky Routing
D2 supports partitioning of clusters:
-Range partitioning
-Hash partitioning (MD5 or modulo)
-Use a regex to extract the key from the URI to determine where a request should go
Sticky routing within a partition is also supported:
-Use a regex to extract the key from the URI (same as above)
-Use a consistent hash ring
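The partitioning options above can be sketched as three small functions fed by a regex-extracted key. Everything specific here is an assumption for illustration: the regex `id=(\d+)`, the partition sizes, and folding the first four MD5 digest bytes into an int; rest.li’s own partition accessors may hash differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Partitioner {
    // Hypothetical key-extraction regex: the numeric id in e.g. "/profile?id=123".
    static final Pattern KEY = Pattern.compile("id=(\\d+)");

    static String extractKey(String uri) {
        Matcher m = KEY.matcher(uri);
        if (!m.find()) throw new IllegalArgumentException("no partition key in " + uri);
        return m.group(1);
    }

    // Range partitioning: keys [0, size) -> partition 0, [size, 2*size) -> 1, ...
    static int range(long key, long partitionSize) { return (int) (key / partitionSize); }

    // Modulo hash partitioning.
    static int modulo(long key, int partitionCount) { return (int) Math.floorMod(key, (long) partitionCount); }

    // MD5 hash partitioning: first 4 digest bytes as an int, then a non-negative modulo.
    static int md5(String key, int partitionCount) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
            return Math.floorMod(h, partitionCount);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String key = extractKey("/profile?id=123");
        System.out.println("range:  " + range(Long.parseLong(key), 100));
        System.out.println("modulo: " + modulo(Long.parseLong(key), 10));
        System.out.println("md5:    " + md5(key, 4));
    }
}
```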
16
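Consistent Hash Ring
[Diagram: a hash ring running from Integer.MIN_VALUE through -100, 0 and 100 to Integer.MAX_VALUE, with points for app1.foo.com, app2.foo.com, and app3.foo.com; a request for “foo” is mapped onto the ring.]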
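A consistent hash ring gives sticky routing: the same key always lands on the same host, and removing a host only moves the keys that host owned. A minimal sketch using `java.util.TreeMap`; the class `HashRing`, the number of points per host, and the MD5-based hash are assumptions, not D2’s actual ring implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Consistent hash ring: each host owns several points on the int ring, and a
// key goes to the first host point at or after the key's hash, wrapping from
// Integer.MAX_VALUE back around to Integer.MIN_VALUE.
final class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addHost(String host, int points) {
        for (int i = 0; i < points; i++) ring.put(hash(host + "#" + i), host);
    }

    void removeHost(String host) { ring.values().removeIf(host::equals); }

    String hostFor(String requestKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no hosts");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(requestKey));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around
    }

    // First 4 bytes of the MD5 digest as a signed int.
    static int hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        for (String host : new String[] {"app1.foo.com", "app2.foo.com", "app3.foo.com"}) ring.addHost(host, 100);
        System.out.println("foo -> " + ring.hostFor("foo"));
    }
}
```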
17
Miscellaneous D2 use cases
Redlining: measure the max capacity of a server
-Use real traffic
-Don’t have to worry about mutable operations
[Diagram: the consistent hash ring from the previous slide, with points for app1.foo.com, app2.foo.com, and app3.foo.com.]
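The slide does not spell out the mechanism. Since it reuses the hash-ring picture, one assumption is that redlining shifts ring points toward the server under test so it receives a growing share of real traffic. Under that assumption, the points needed for a target share s, given the other hosts’ combined points W, follow from s = w/(w + W), so w = s·W/(1 − s). `Redline.pointsForShare` is a hypothetical helper showing only that arithmetic.

```java
// Hypothetical redlining helper: ring points the server under test needs
// so that it receives a given share of real traffic.
final class Redline {
    // share = target / (target + others)  =>  target = share * others / (1 - share)
    static int pointsForShare(double share, int otherPointsTotal) {
        if (share <= 0 || share >= 1) throw new IllegalArgumentException("share must be in (0, 1)");
        return (int) Math.ceil(share * otherPointsTotal / (1 - share));
    }

    public static void main(String[] args) {
        int others = 200; // e.g. app2.foo.com and app3.foo.com at 100 points each
        // Ramp app1.foo.com up step by step while watching its latency and error rate.
        for (double share : new double[] {0.34, 0.5, 0.6, 0.75}) {
            System.out.printf("share %.2f -> %d points%n", share, pointsForShare(share, others));
        }
    }
}
```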
18
Miscellaneous D2 use cases
What if different clients have different requirements? Let’s say we have a service called profile.
-Clients that can only view profiles should go to a read-only cluster
-Clients that can edit profiles should go to a read-write cluster
Use the cluster variant technique. A cluster variant changes the D2 service’s namespace to get around the restriction that a ZooKeeper node’s name must be unique (within its parent, so two services named profile cannot both live under /d2/services).
19
Miscellaneous D2 use cases
/ (root)
  /d2
    /d2/clusters
      /d2/clusters/readonly
      /d2/clusters/readwrite
    /d2/services
      /d2/services/profile (Service Properties: -Cluster = readonly)
    /d2/profileClusterVariant
      /d2/profileClusterVariant/profile (Service Properties: -Cluster = readwrite)
    /d2/uris
      /d2/uris/readonly/ephemeralNode1 (readonly server)
      /d2/uris/readwrite/ephemeralNode1 (readwrite server)
[Diagram: the View client’s request for profile goes to the readonly server; the Edit client’s request for profile goes to the readwrite server.]
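With a cluster variant, the same service name resolves differently depending on which namespace a client reads. A minimal sketch mirroring the tree above; `ClusterVariantResolver` is a hypothetical name, and the namespace-to-cluster table is hard-coded here where a real client would read it from ZooKeeper.

```java
import java.util.Map;

// Hypothetical sketch of cluster variants: the service name "profile" exists
// under two namespaces, each pointing at a different cluster.
final class ClusterVariantResolver {
    // Namespace path -> (service name -> cluster name), mirroring the tree above.
    private static final Map<String, Map<String, String>> NAMESPACES = Map.of(
        "/d2/services", Map.of("profile", "readonly"),
        "/d2/profileClusterVariant", Map.of("profile", "readwrite"));

    private final String namespace;

    // View clients use the default namespace; edit clients use the variant.
    ClusterVariantResolver(String namespace) { this.namespace = namespace; }

    String clusterFor(String serviceName) {
        Map<String, String> services = NAMESPACES.get(namespace);
        if (services == null || !services.containsKey(serviceName)) {
            throw new IllegalArgumentException(namespace + "/" + serviceName + " not found");
        }
        return services.get(serviceName);
    }

    // Where this client should look for server URIs.
    String urisPathFor(String serviceName) { return "/d2/uris/" + clusterFor(serviceName); }

    public static void main(String[] args) {
        System.out.println("view client -> " + new ClusterVariantResolver("/d2/services").urisPathFor("profile"));
        System.out.println("edit client -> " + new ClusterVariantResolver("/d2/profileClusterVariant").urisPathFor("profile"));
    }
}
```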
20
Q&A
Questions? Email me at: osumampouw@linkedin.com
Check out http://rest.li and https://github.com/linkedin/rest.li for more info
21
Cross data center routing
[Diagram: Data Center 1 and Data Center 2 each have their own ZooKeeper and their own server cluster; a client in each data center reads its local ZooKeeper.]
View of ZooKeeper in Data Center 1:
/ (root)
  /d2
    /d2/clusters
      /d2/clusters/clusterA
      /d2/clusters/clusterA-1
      /d2/clusters/clusterA-2
    /d2/services
      /d2/services/serviceA
      /d2/services/serviceA-1
      /d2/services/serviceA-2 (Service Properties: -Cluster = clusterA-2)
    /d2/uris
      /d2/uris/clusterA/ephemeralNode1
      /d2/uris/clusterA-2/ephemeralNode1
©2013 LinkedIn Corporation. All Rights Reserved.
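Reading the layout above as an assumption: in each data center’s ZooKeeper view, serviceA resolves to the local cluster, while serviceA-N points at the cluster for data center N, so a client can pin a call to a specific data center by name. A minimal sketch of that lookup; `DataCenterView` and the "-N" suffix rule are hypothetical illustrations of the naming in the slide, not a documented D2 API.

```java
import java.util.Map;

// Hypothetical model of one data center's ZooKeeper view for cross-DC routing.
final class DataCenterView {
    private final Map<String, String> serviceToCluster;

    DataCenterView(Map<String, String> serviceToCluster) { this.serviceToCluster = serviceToCluster; }

    // targetDataCenter == null: use the local service ("serviceA");
    // otherwise pin the call to that data center ("serviceA-2").
    String clusterFor(String service, Integer targetDataCenter) {
        String name = targetDataCenter == null ? service : service + "-" + targetDataCenter;
        String cluster = serviceToCluster.get(name);
        if (cluster == null) throw new IllegalArgumentException("unknown service " + name);
        return cluster;
    }

    public static void main(String[] args) {
        DataCenterView dc1 = new DataCenterView(Map.of(
            "serviceA", "clusterA", "serviceA-1", "clusterA-1", "serviceA-2", "clusterA-2"));
        System.out.println("local call  -> " + dc1.clusterFor("serviceA", null));
        System.out.println("pinned call -> " + dc1.clusterFor("serviceA", 2));
    }
}
```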