 Mainak Ghosh, Wenting Wang, Gopalakrishna Holla, Indranil Gupta.

1 Mainak Ghosh, Wenting Wang, Gopalakrishna Holla, Indranil Gupta

2 NoSQL databases are predicted to become a $3.4B industry by 2018

3
- Problem: Changing database or table-level configuration parameters
  - Primary/shard key: MongoDB, Cassandra
  - Ring size: Cassandra
- Challenge: Affects a lot of data at once
- Motivation:
  - Initially, the DB is configured based on guesses
  - The sysadmin needs to experiment with different parameters, seamlessly and efficiently
  - Later, as the workload changes and the business/use case evolves: change parameters, but do it in a live database

4
- Existing solution: Create a new schema, then export and re-import the data
- Why is this bad?
  - Massive unavailability of data: every second of outage costs $1.1K at Amazon and $1.5K at Google
  - A reconfiguration change caused an outage at Foursquare
  - A manual change of primary key at Google took 2 years and involved 2 dozen teams
- Need a solution that is automated, seamless, and efficient

5
- Fast completion time
  - Minimize the amount of data transfer required
- Highly available
  - CRUD operations should be answered as much as possible, with little impact on latency
- Network aware
  - Reconfiguration should adapt to underlying network latencies

6
- Master-slave replication
- Range-based partitioning
- Flexibility in data assignment
- MongoDB, RethinkDB, CouchDB, etc.


8
Old Arrangement
       S1         S2         S3
K_o    1, 2, 3    4, 5, 6    7, 8, 9
K_n    2, 4, 8    1, 6, 3    9, 5, 7

New Arrangement (three chunks, in order)
K_o    4, 1, 6    2, 8, 5    9, 3, 7
K_n    1, 2, 3    4, 5, 6    7, 8, 9

9
[Figure: servers S1, S2, S3 with weights 1 1 1 2 1 0 0 1 2]
Lemma 1: The greedy algorithm is optimal in total network transfer volume.
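
The greedy placement behind Lemma 1 can be sketched on the Old Arrangement / New Arrangement example. This is a minimal sketch, not the authors' implementation. It assumes K_o is the old shard key and K_n the new one, that the three new chunks cover K_n ranges 1-3, 4-6 and 7-9, and that each new chunk goes to the server that already holds most of its records. The chunk names `C1`..`C3` and all function names are hypothetical.

```python
from collections import Counter

# (old_key, new_key) records grouped by the server that holds them today,
# read off the Old Arrangement example.
old_arrangement = {
    "S1": [(1, 2), (2, 4), (3, 8)],
    "S2": [(4, 1), (5, 6), (6, 3)],
    "S3": [(7, 9), (8, 5), (9, 7)],
}

# Assumed new-key ranges for the three new chunks (hypothetical names).
new_chunks = {"C1": range(1, 4), "C2": range(4, 7), "C3": range(7, 10)}


def chunk_of(new_key):
    """Return the new chunk whose key range contains new_key."""
    for name, key_range in new_chunks.items():
        if new_key in key_range:
            return name
    raise KeyError(new_key)


def weight_matrix(arrangement):
    """weights[server][chunk] = records of that new chunk already on server."""
    weights = {server: Counter() for server in arrangement}
    for server, records in arrangement.items():
        for _old_key, new_key in records:
            weights[server][chunk_of(new_key)] += 1
    return weights


def greedy_placement(weights):
    """Place each new chunk on the server holding most of its records.

    Ties go to the first server in dictionary order.
    """
    return {
        chunk: max(weights, key=lambda server: weights[server][chunk])
        for chunk in new_chunks
    }


def transfer_volume(weights, placement):
    """Records that must cross the network under the given placement."""
    total = sum(sum(row.values()) for row in weights.values())
    local = sum(weights[server][chunk] for chunk, server in placement.items())
    return total - local


weights = weight_matrix(old_arrangement)
placement = greedy_placement(weights)
print(placement, transfer_volume(weights, placement))
```

On this example the weight rows come out as 1 1 1 for S1, 2 1 0 for S2 and 0 1 2 for S3, and the greedy placement moves 4 of the 9 records. Because every chunk independently picks the server with the largest local share, no other placement can keep more records local, which is the intuition behind the lemma.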

10
Old Arrangement
       S1         S2         S3
K_o    1, 2, 3    4, 5, 6    7, 8, 9
K_n    2, 4, 8    1, 6, 3    9, 5, 7

New Arrangement (three chunks, in order)
K_o    4, 1, 6    2, 8, 5    9, 3, 7
K_n    1, 2, 3    4, 5, 6    7, 8, 9

11
[Figure: servers S1, S2, S3 with weights 1 2 3 2 1 0 0 0 0; labels: Greedy, Hungarian Assignment]
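
The contrast between the two assignments can be illustrated with a small sketch. It assumes the Hungarian step looks for a maximum-weight assignment that gives every server the same number of chunks, whereas greedy may pile all chunks onto one server. The weights below are hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm; it finds the same optimum on tiny inputs but is not the Hungarian algorithm itself.

```python
from itertools import permutations

# Hypothetical weights: weights[s][c] = records of new chunk c already on server s.
weights = [
    [3, 3, 3],  # S1 holds most of every chunk
    [1, 0, 2],  # S2
    [0, 2, 1],  # S3
]


def greedy(weights):
    """Each chunk goes to the server with the largest local share (may be unbalanced)."""
    n_servers, n_chunks = len(weights), len(weights[0])
    return [
        max(range(n_servers), key=lambda s: weights[s][c])
        for c in range(n_chunks)
    ]


def balanced(weights):
    """One chunk per server, maximizing records kept local.

    Brute force over permutations; the Hungarian algorithm solves the
    same assignment problem in polynomial time.
    """
    n = len(weights)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(weights[perm[c]][c] for c in range(n)),
    )
    return list(best)


def local_records(weights, placement):
    """Records that stay on their current server; placement[c] = server of chunk c."""
    return sum(weights[s][c] for c, s in enumerate(placement))


print(greedy(weights), local_records(weights, greedy(weights)))
print(balanced(weights), local_records(weights, balanced(weights)))
```

With these weights, greedy sends all three chunks to S1 and keeps 9 records local, while the balanced assignment keeps 7 local but leaves each server with exactly one chunk. That is the trade-off between transfer volume and load balance.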


13
[Figure: Front End, Config, and primaries of replica sets RS0, RS1, RS2]
- select * from table where user_id = 20 (user_id = 20 maps to RS1)
- select * from table where product_id = 20
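
Why the two queries behave differently can be sketched with a toy range-based router. The `Router` class, its methods and the range boundaries are hypothetical and are not MongoDB's API. The sketch assumes a query on the shard key is sent to the single replica set owning that range, while a query on any other field has to be sent to every replica set.

```python
import bisect


class Router:
    """Toy front end for a range-partitioned store (hypothetical, not MongoDB's API)."""

    def __init__(self, shard_key, upper_bounds, replica_sets):
        # replica_sets[i] owns values <= upper_bounds[i];
        # the last replica set owns everything above the last bound.
        assert len(replica_sets) == len(upper_bounds) + 1
        self.shard_key = shard_key
        self.upper_bounds = list(upper_bounds)
        self.replica_sets = list(replica_sets)

    def route(self, field, value):
        """Return the replica sets that must see a query `field = value`."""
        if field != self.shard_key:
            # Not the shard key: the data could be anywhere, so broadcast.
            return list(self.replica_sets)
        index = bisect.bisect_left(self.upper_bounds, value)
        return [self.replica_sets[index]]

    def change_shard_key(self, shard_key, upper_bounds):
        """Switch routing metadata to a new key (data movement not modeled)."""
        assert len(upper_bounds) == len(self.upper_bounds)
        self.shard_key = shard_key
        self.upper_bounds = list(upper_bounds)


router = Router("user_id", [10, 30], ["RS0", "RS1", "RS2"])
print(router.route("user_id", 20))     # targeted at one replica set
print(router.route("product_id", 20))  # broadcast to all replica sets
```

If the workload shifts toward lookups by `product_id`, every query becomes a broadcast, which is the kind of situation that motivates changing the shard key on a live database.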

14
[Figure: Front End, Config, and replica sets RS0, RS1, RS2, each with a primary and a secondary]
db.collection.changeShardKey(product_id:1)

15
[Figure: Front End, Config, and replica sets RS0, RS1, RS2, each with a primary and a secondary]
db.collection.changeShardKey(product_id:1)
1. Runs algorithm
2. Generates placement plan
update table set price = 20 where user_id = 20

16
[Figure: Front End, Config, and replica sets RS0, RS1, RS2, each with a primary and a secondary]
db.collection.changeShardKey(product_id:1)
Iteration 1, Iteration 2

17
[Figure: Front End, Config, and replica sets RS0, RS1, RS2; the primary and secondary labels appear twice]
db.collection.changeShardKey(product_id:1)
update table set price = 20 where product_id = 20


19
- Assigns a socket per chunk per source-destination pair of communicating servers (Chunk-Based)
[Figure labels: Inter-Rack Latency, Unequal Data Size]

21 The WFS strategy is 30% better than the naïve chunk-based scheme and 9% better than Orchestra [Chowdhury et al.]
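
One way to picture a weighted scheme such as WFS is sketched below. The slides name the strategy but do not spell out its formula, so the weighting here is an assumption: each source-destination pair gets a share of a fixed socket budget in proportion to data size times latency, so that large or slow transfers are not the stragglers. The function name, the socket budget and the example numbers are hypothetical.

```python
def weighted_socket_allocation(transfers, total_sockets):
    """Split a socket budget across source-destination pairs.

    transfers maps (source, destination) -> (data_size, latency_ms).
    Assumed weight: data_size * latency_ms. Every pair gets at least one
    socket; because of rounding the result may not sum exactly to
    total_sockets.
    """
    weights = {
        pair: size * latency for pair, (size, latency) in transfers.items()
    }
    total_weight = sum(weights.values())
    return {
        pair: max(1, round(total_sockets * weight / total_weight))
        for pair, weight in weights.items()
    }


# Hypothetical transfers: (data size in chunks, latency in ms).
transfers = {
    ("S1", "S2"): (100, 1),  # same rack
    ("S1", "S3"): (100, 2),  # across racks
    ("S2", "S3"): (200, 2),  # across racks, twice the data
}
allocation = weighted_socket_allocation(transfers, total_sockets=14)
print(allocation)
```

A chunk-based scheme would give the three pairs sockets purely by chunk count, ignoring that the inter-rack pairs are slower. The weighted split gives the slow, heavy pair the largest share.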

22
- Morphus chooses slaves for reconfiguration during the first Isolation phase
- In a geo-distributed setting, a naïve choice can lead to bulk transfers over the wide-area network
- Solution: Localize bulk transfers by choosing replicas in the same datacenter
  - Morphus extracts the datacenter information from each replica's metadata
- 2x-3x improvement observed in experiments
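
The datacenter-aware choice can be sketched as follows. This is an illustrative sketch, not Morphus code. The metadata layout (`host` and `dc` fields), the host names and the tie-breaking rule are assumptions. The idea is to pick the datacenter that covers the most replica sets and, within each replica set, prefer a secondary located there.

```python
from collections import Counter


def choose_secondaries(replica_sets):
    """Pick one secondary per replica set, preferring a shared datacenter.

    replica_sets maps a replica set name to a list of secondaries, each a
    dict with hypothetical "host" and "dc" metadata fields.
    """
    # Count how many replica sets have at least one secondary in each datacenter.
    coverage = Counter()
    for members in replica_sets.values():
        for dc in sorted({member["dc"] for member in members}):
            coverage[dc] += 1
    preferred_dc = max(coverage, key=coverage.get)

    chosen = {}
    for name, members in replica_sets.items():
        local = [m for m in members if m["dc"] == preferred_dc]
        # Fall back to any secondary if none lives in the preferred datacenter.
        chosen[name] = (local or members)[0]["host"]
    return preferred_dc, chosen


replica_sets = {
    "RS0": [{"host": "a", "dc": "us-east"}, {"host": "b", "dc": "eu-west"}],
    "RS1": [{"host": "c", "dc": "eu-west"}, {"host": "d", "dc": "us-east"}],
    "RS2": [{"host": "e", "dc": "us-east"}, {"host": "f", "dc": "ap-south"}],
}
preferred_dc, chosen = choose_secondaries(replica_sets)
print(preferred_dc, chosen)
```

A naïve choice, such as always taking the first listed secondary, would select hosts in two datacenters for this example, so part of the bulk transfer would cross the wide-area network.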


24
- Dataset: Amazon Reviews [Snap]
- Cluster: Emulab (d710 nodes, 100 Mbps LAN switch) and Google Cloud (n1-standard-4 VMs, 1 Gbps network)
- Workload: Custom generator similar to YCSB; implements Uniform, Zipfian, and Latest distributions for key access
- Morphus: Implemented on top of MongoDB
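
A key-access generator with these three distributions could look like the sketch below. It is not the authors' generator and it does not reproduce YCSB's exact formulas. The weight definitions, the `skew` parameter and the function name are assumptions chosen for illustration.

```python
import random
from collections import Counter


def make_key_sampler(distribution, n_keys, rng, skew=1.0):
    """Return a zero-argument function that draws a key id in [0, n_keys).

    uniform: every key is equally likely.
    zipfian: key 0 is the most popular, popularity decays as 1 / rank**skew.
    latest:  the most recently inserted key (highest id) is the most popular.
    """
    keys = list(range(n_keys))
    if distribution == "uniform":
        weights = [1.0] * n_keys
    elif distribution == "zipfian":
        weights = [1.0 / (rank + 1) ** skew for rank in range(n_keys)]
    elif distribution == "latest":
        weights = [1.0 / (n_keys - key) ** skew for key in keys]
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return lambda: rng.choices(keys, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so runs are repeatable
for name in ("uniform", "zipfian", "latest"):
    draw = make_key_sampler(name, n_keys=100, rng=rng)
    counts = Counter(draw() for _ in range(5000))
    print(name, counts.most_common(3))
```

Skewed distributions concentrate reads and writes on a few keys, so they stress different chunks during reconfiguration than a uniform workload does.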

25
Access Distribution   Read Success Rate (%)   Write Success Rate (%)
Read Only             99.9                    -
Uniform               99.9                    98.5
Latest                97.2                    96.8
Zipf                  99.9                    98.3

27
Access Distribution   Read Success Rate (%)   Write Success Rate (%)
Read Only             99.9                    -
Uniform               99.9                    98.5
Latest                97.2                    96.8
Zipf                  99.9                    98.3

Morphus has a small impact on data availability.

28 Hungarian performs well in both scenarios and should be preferred over the Greedy and Random schemes

29 Reconfiguration time increases sub-linearly as data and cluster size increase

30
- Online schema change [Rae et al.]: The resulting availability is lower.
- Live data migration: Attempted in databases [Albatross, ShuttleDB] and VMs [Bradford et al.]. Similar approach to ours.
  - Albatross and ShuttleDB ship the whole state to a set of empty servers.
- Reactive reconfiguration proposed by Squall [Elmore et al.]: Fully available but takes longer.

31
- Morphus is a system that allows live reconfiguration of a sharded NoSQL database.
- Morphus is implemented on top of MongoDB.
- Morphus uses network-efficient algorithms for placing chunks while changing the shard key.
- Morphus minimally affects data availability during reconfiguration.
- Morphus mitigates stragglers during data migration by optimizing for the underlying network latencies.

32 Morphus reconfiguration time scales super-linearly with data size; a significant portion of it is spent in data migration

33 Increasing the number of replicas improves Morphus performance

34
- Assigns a socket per chunk per source-destination pair of communicating servers (Chunk-Based)
- What if the topology is:
  [Figure: network topology]
- Each shape represents a node in a single replica set. The circle is the front end. Intra-rack latency is 1 ms, while inter-rack latency is 2 ms.

35 Questions?

36
- Growing quickly
  - $3.4B industry by 2018
- Fast reads and writes (much faster than MySQL and relational databases)
- Many companies use them to run critical infrastructure
  - Google, Facebook, Yahoo, and many others
- Many open-source NoSQL databases
  - MongoDB, Cassandra, Riak, etc.

37
- Initially, the DB is configured based on guesses
- The sysadmin needs to experiment with different parameters
  - Seamlessly and efficiently
- Later, as the workload changes and the business/use case evolves:
  - Change parameters, but do it in a live database
