1 Dynamo: Highly Available Key-Value Store
Dennis Kafura – CS5204 – Operating Systems

2 Dynamo Context
Core e-commerce services need scalable and reliable storage for massive amounts of data:
- n x 100 services
- n x 100,000 concurrent sessions on key services
Size and scalability require a storage architecture that is:
- highly decentralized
- high in component count
- built from commodity hardware
A high component count creates reliability problems ("treats failure handling as the normal case"). Reliability problems are addressed by replication, which raises issues of:
- consistency (replicas differ after failure)
- performance: when to enforce consistency (on read, on write) and who enforces it (client, storage system)

3 Dynamo System Elements
Maintains the state of services with:
- high reliability requirements
- latency-sensitive performance
- a controllable tradeoff between consistency and performance
Used only internally:
- can leverage characteristics of services and workloads
- non-hostile environment (no security requirements)
Simple key-value interface:
- applications do not require more complicated (e.g. database) semantics or a hierarchical name space
- key is a unique identifier for a data item; value is a binary object (blob)
- no operations span multiple data items
Adopts a weaker model of consistency (eventual consistency) in favor of higher availability.
Service level agreements (SLAs):
- measured at the 99.9th percentile
- key factor: service latency at a given request rate
- example: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second
- state management (storage) efficiency is a key factor in meeting SLAs

4 Dynamo Design Considerations
Consistency vs. availability:
- strict consistency means data is unavailable when one of the replicas has failed
- to improve availability, use a weaker form of consistency (eventual consistency): allow optimistic updates whose changes propagate in the background
- this can lead to conflicting changes, which must be detected and resolved
Conflicts:
- Dynamo applications require "always writeable" storage
- conflict detection/resolution is therefore performed on reads
Other factors:
- incremental scalability
- symmetry/decentralization (P2P organization and control)
- heterogeneity (not all servers are the same)

5 Dynamo Design Overview

6 Dynamo Partitioning
Interface:
- get(key): returns a context and either a single object or a list of conflicting objects
- put(key, context, object): the context comes from a previous read
Object placement/replication:
- an MD5 hash of the key yields a 128-bit identifier
- consistent hashing places the object on the ring; the nodes that follow its position form the preference list
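The placement scheme above can be sketched as follows. This is a simplified model (one token per node and hypothetical node names; Dynamo's actual strategies assign multiple tokens per node, as slide 15 describes):

```python
import hashlib
from bisect import bisect_right

class Ring:
    """Minimal consistent-hash ring: keys and nodes are hashed with MD5
    onto the same 128-bit space; a key is owned by the first node
    clockwise from its position, and the next distinct nodes complete
    its preference list."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def preference_list(self, key):
        # Walk clockwise from the key's position, wrapping around.
        pos = bisect_right(self.ring, (self._hash(key), ""))
        out, i = [], pos
        while len(out) < min(self.replicas, len(self.ring)):
            _, node = self.ring[i % len(self.ring)]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
prefs = ring.preference_list("cart:12345")
```

Because only hash positions matter, adding or removing a node moves only the key ranges adjacent to it, which is the incremental-scalability property slide 4 asks for.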

7 Dynamo Versioning
Failure-free operation: a put is propagated to the replicas. What should be done in case of failure?
[diagram: a put propagating to replicas]

8 Dynamo Versioning
Object content is treated as immutable; an update operation creates a new version.

9 Dynamo Versioning
Versioning can lead to inconsistency due to network partitioning.
[diagram: a put applied on only one side of a partition]

10 Dynamo Versioning
Versioning can lead to inconsistency due to concurrent updates.
[diagram: concurrent puts a and b producing divergent versions]

11 Dynamo Object Resolution
- Uses vector clocks to order versions.
- Conflicting versions are passed to the application as the output of a get operation.
- The application resolves the conflicts and puts a new (consistent) version.
- Inconsistent versions are rare: 99.94% of get operations saw exactly one version.
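The vector-clock comparison behind this resolution step can be sketched in a few lines (node names Sx, Sy, Sz and the counters are illustrative): one version supersedes another if its clock dominates component-wise; if neither dominates, the versions are concurrent and both are returned to the application.

```python
def descends(a, b):
    """True if clock a equals or causally descends from clock b,
    i.e. every counter in b is matched or exceeded in a."""
    return all(a.get(node, 0) >= n for node, n in b.items())

def conflict(a, b):
    """Concurrent versions: neither clock descends from the other."""
    return not descends(a, b) and not descends(b, a)

v1 = {"Sx": 2}            # two writes coordinated by node Sx
v2 = {"Sx": 2, "Sy": 1}   # a later update coordinated by Sy
v3 = {"Sx": 2, "Sz": 1}   # a concurrent update coordinated by Sz

assert descends(v2, v1)   # v2 supersedes v1, so v1 can be discarded
assert conflict(v2, v3)   # v2 and v3 must be reconciled by the app
```

When the application writes the reconciled value back, the new version's clock dominates both branches, collapsing the conflict.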

12 Dynamo Handling get/put Operations
Operations are handled by a coordinator:
- the first among the top N nodes in the preference list
- located either via a load balancer (no Dynamo-specific code needed in the application, but possibly an extra level of indirection) or by a direct call to the coordinator (via a Dynamo-specific client library)
Quorum voting:
- R nodes must participate in a get operation
- W nodes must participate in a put operation
- R + W > N
- (N, R, W) can be chosen to achieve the desired tradeoff; a common configuration is (3, 2, 2)
"Sloppy quorum":
- uses the top N healthy nodes in the preference list; the coordinator is the first in this group
- a replica sent to a stand-in node carries a "hint" identifying the (unavailable) node that should hold it
- hinted replicas are stored by the available node and forwarded when the original node recovers
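The sloppy-quorum write path above can be sketched as follows (node names, the health set, and the placement bookkeeping are hypothetical; the (N, W) = (3, 2) values are the slide's common configuration):

```python
def sloppy_quorum_write(preference_list, healthy, N=3, W=2):
    """Pick write targets for one put under a sloppy quorum.
    Returns (success, placements); each placement is (node, hint),
    where hint names the unavailable node a stand-in replica is being
    held for, or None for a replica on its intended node."""
    top_n = preference_list[:N]
    down = [n for n in top_n if n not in healthy]   # need stand-ins
    targets = [n for n in preference_list if n in healthy][:N]
    placements = []
    for node in targets:
        # A node outside the top N is standing in for a down node,
        # so its replica carries a hint for later handoff.
        hint = down.pop(0) if (node not in top_n and down) else None
        placements.append((node, hint))
    return len(placements) >= W, placements

# B is down, so D (the next healthy node on the ring) stores a
# hinted replica destined for B.
ok, placements = sloppy_quorum_write(
    ["A", "B", "C", "D", "E"], healthy={"A", "C", "D", "E"})
```

When B recovers, D forwards the hinted replica to it and deletes its local copy, restoring the original placement.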

13 Dynamo Replica Synchronization
- Detection of inconsistent replicas is accelerated using Merkle trees.
- Each node maintains a separate tree for each key range it hosts.
- Maintaining the Merkle trees adds overhead.
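A minimal sketch of the Merkle-tree comparison (the key/value byte strings are hypothetical, and duplicating the last hash to pad odd levels is a simplification): two replicas of a key range compare root hashes; equal roots mean the ranges are in sync with no data transfer, while a mismatch is localized by descending the tree.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

def merkle_root(leaves):
    """Root hash of a Merkle tree built over the hashes of the
    individual key/value entries in one key range."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # pad odd levels by duplication
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Two replicas of the same key range; one holds a stale value.
a = [b"k1=v1", b"k2=v2", b"k3=v3"]
b = [b"k1=v1", b"k2=v2", b"k3=v3-stale"]
in_sync = merkle_root(a) == merkle_root(b)
```

Only the subtrees whose hashes differ need their keys exchanged, which is what makes detection cheap; the overhead the slide mentions is keeping these trees up to date as keys change hands.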

14 Dynamo Ring Membership
- Nodes are explicitly added to and removed from the ring.
- Membership, partitioning, and placement information propagates via periodic exchanges (a gossip protocol).
- Existing nodes transfer key ranges to a newly added node, or receive key ranges from exiting nodes.
- Nodes eventually learn the key ranges of their peers and can forward requests to them.
- Some "seed" nodes are well known.
- Node failures are detected by lack of responsiveness, and recovery is detected by periodic retry.
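The gossip-based propagation can be sketched with membership views that merge pairwise until they agree (a deterministic ring-neighbor exchange stands in for the randomized peer choice, and the node names are hypothetical):

```python
def gossip_round(views):
    """One gossip round: each node merges its membership view with its
    successor's. Merges made earlier in the round cascade forward, so
    views converge to the full member set within a few rounds."""
    nodes = sorted(views)
    for i, node in enumerate(nodes):
        peer = nodes[(i + 1) % len(nodes)]
        merged = views[node] | views[peer]
        views[node] = views[peer] = merged
    return views

# Every node initially knows only itself and a well-known seed node.
members = ["a", "b", "c", "seed"]
views = {n: {n, "seed"} for n in members}
gossip_round(views)
```

The seed nodes guarantee the gossip graph stays connected, so a newly added node's existence eventually reaches every peer even though there is no central membership authority.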

15 Dynamo Partition/Placement Strategies

Strategy | Placement                | Partition
1        | T random tokens per node | consecutive tokens create a partition
2        | T random tokens per node | Q equal-sized partitions
3        | Q/S tokens per node      | Q equal-sized partitions

(S = number of nodes)

16 Dynamo Strategy Performance Factors
Strategy 1:
- Bootstrapping a new node is lengthy: it must acquire its key ranges from other nodes, which scan and transmit those ranges as background activities; this has taken a full day during peak periods.
- Numerous nodes may have to adjust their Merkle trees when a node joins or leaves the system.
- Archival is difficult: key ranges may be in transit, and there is no obvious synchronization/checkpointing structure.

17 Dynamo Strategy Performance Factors
Strategy 2:
- decouples partitioning and placement
- allows the placement scheme to be changed at run time
Strategy 3:
- decouples partitioning and placement
- faster bootstrapping/recovery and easier archiving, because key ranges can be segregated into different files that are shared or archived separately

18 Dynamo Partition Strategies: Performance
- The strategies have different tuning parameters.
- Fair comparison: evaluate the skew in their load distributions for a fixed amount of space used to maintain membership information.
- Strategy 3 is superior.

19 Dynamo Client- vs. Server-Side Coordination
- Any node can coordinate read requests; write requests are handled by a coordinator.
- The state machine for coordination can reside in a load-balancing server or be incorporated into the client.
- Client-driven coordination has lower latency because it avoids an extra network hop (redirection).

