1 On the Synchronization Bottleneck of OpenStack-like Object Storage Service. Thierry Titcheu Chekam 1,2, Ennan Zhai 3, Zhenhua Li 1, Yong Cui 4, Kui Ren 5. 1 School of Software, TNLIST, and KLISS MoE, Tsinghua University; 2 Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg; 3 Department of Computer Science, Yale University; 4 Department of Computer Science and Technology, Tsinghua University; 5 Department of Computer Science and Engineering, SUNY Buffalo

2 ① Background and Motivation ② Problem ③ Proposed Solution ④ Evaluation ⑤ Summary

3 Object storage systems: applications talk to a storage cluster or datacenter through an HTTP/REST API, e.g. PUT http://example.com/v1/account/container/new_object. Roughly 3,500,000,000 objects are created per day.
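A minimal illustration of the REST call above, sketched with the Python `requests` library (the endpoint, token, and payload are placeholders, not values from the talk):

```python
# Sketch: create an object with a single authenticated HTTP PUT.
# URL, token, and body are illustrative placeholders.
import requests

url = "http://example.com/v1/account/container/new_object"
headers = {"X-Auth-Token": "<token from the auth service>"}

resp = requests.put(url, data=b"object payload", headers=headers)
print(resp.status_code)   # Swift answers 201 Created on success
```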

4 For better QoS, some object storage systems focus on high availability at the expense of weaker consistency (eventual consistency), claiming 99.99…% availability. In the most relaxed configuration with r replica nodes: a write needs at least r/2 + 1 replica writes to succeed, while any single replica read is enough for a read to succeed. Availability over consistency. Most tutorials use r = 3, run experiments with few objects (N < 1000), and focus only on read/write performance.
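A small worked sketch of this quorum rule (plain Python, not Swift's code):

```python
# Availability over consistency: a write needs a majority of the r replicas,
# a read is satisfied by any single replica.
def write_quorum(r: int) -> int:
    return r // 2 + 1               # at least r/2 + 1 replica writes must succeed

def write_ok(acks: int, r: int) -> bool:
    return acks >= write_quorum(r)

def read_ok(acks: int) -> bool:
    return acks >= 1                # any one successful replica read is enough

assert write_quorum(3) == 2 and write_quorum(5) == 3
assert write_ok(2, 3) and not write_ok(1, 3)
assert read_ok(1)
```

With r = 3 (the usual tutorial setting), a write can tolerate one unreachable replica and a read can tolerate two.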

5 Object write: Alice sends a write to the OpenStack Swift cluster; a proxy node forwards the object to storage nodes spread across Zones 1-4 and OK is returned. [Figure: write path through proxy and storage nodes]

6 Object read: Bob sends a read to the OpenStack Swift cluster; a proxy node fetches the object from a storage node in one of Zones 1-4. [Figure: read path through proxy and storage nodes]

7 Ted's write hits a disk that is full or has failed: hinted hand-off stores the copy on another node, and the replicator (sync) process later restores the proper replica placement. [Figure: replicator and hinted hand-off across Zones 1-4]

8 Motivation: How long does it take to sync the replicas (copies)? How much network traffic overhead does OSync (object synchronization) generate?

9 Roadmap: 1. preliminary measurements and 2. source code analysis & problem validation (Problem); 3. our proposed mechanism (Proposed Solution); 4. comparison between both mechanisms (Evaluation).

10 ① Background and Motivation ② Problem ③ Proposed Solution ④ Evaluation ⑤ Summary

11 Measurement setup: one proxy & storage node, storage nodes, a client/admin terminal, and a router; more nodes can be added for larger scale. Measurements are taken during object upload and while idle (no read/write operations on the system). Node specifications: 28-core Intel Xeon CPUs @ 2.0 GHz, 32-GB 1600-MHz DDR3 RAM, 8 x 600-GB 15K-RPM SAS disk storage, two 1-Gbps Broadcom Ethernet interfaces.

12 Sync delay (checking consistency): the runtime increases with the number of objects; the check runs repeatedly every 30 s (by default). [Figure: sync delay vs. number of objects]

13 Multithreading: relation between CPU usage and number of threads (r = 3, n = 5M). OpenStack Swift default config in object-server.conf: concurrency = 1.
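A minimal sketch of inspecting and raising that setting, assuming the stock /etc/swift/object-server.conf layout with an [object-replicator] section (the path and the new value are illustrative):

```python
# Sketch: read the replicator's worker count and bump it so the OSync pass
# can use more than one CPU core. Rewriting the file drops its comments.
import configparser

path = "/etc/swift/object-server.conf"                  # assumed default location
conf = configparser.ConfigParser(interpolation=None)
conf.read(path)

if not conf.has_section("object-replicator"):
    conf.add_section("object-replicator")

print("current concurrency:",
      conf.get("object-replicator", "concurrency", fallback="1"))  # stock default: 1

conf.set("object-replicator", "concurrency", "4")       # e.g. one worker per spare core
with open(path, "w") as f:
    conf.write(f)
```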

14 Experimental results: these experiments were done on a LAN; geo-replication over a WAN would have a much larger delay. Per-node network overhead × total node number = overall overhead: ~7835 MB (r = 5, n = 4M).

15 A step further: OpenStack Swift is open source, so we can use the source code. We found 12 source files to be the most relevant to our work and studied about 12,800 lines of code.

16 Objects are stored in groups: for load balancing, objects are stored in Np = 2^18 groups, and the groups are randomly distributed to the nodes according to the mapping. Each group keeps a hashes.pkl file as a fingerprint of its data. [Figure: objects → object groups → node, with a hashes.pkl per group]
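A sketch of such a name-to-group mapping (it mirrors the idea on the slide, hashing the object name and keeping the top bits, but it is not Swift's exact ring code; the node names and the random group-to-node table are illustrative):

```python
# Sketch: assign an object to one of Np = 2**18 groups, then look the group
# up in a randomly built group -> replica-nodes table.
import hashlib
import random

PART_POWER = 18
NP = 2 ** PART_POWER                       # Np = 262144 groups

def group_of(account: str, container: str, obj: str) -> int:
    name = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(name).digest()
    # keep the top PART_POWER bits of the first 32 bits of the digest
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

nodes = [f"node{i}" for i in range(5)]     # toy cluster
r = 3
rng = random.Random(0)
mapping = [rng.sample(nodes, r) for _ in range(NP)]   # toy group -> nodes table

g = group_of("account", "container", "new_object")
print(g, mapping[g])                        # group id and its r replica nodes
```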

17 Object storage structure: each group's hashes.pkl file has size H, which can reach 140 KB. [Figure: on-disk layout around hashes.pkl]
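A minimal sketch of peeking into one group's hashes.pkl, assuming it holds the pickled {suffix: fingerprint} dict that stock Swift writes (the partition path below is illustrative):

```python
# Sketch: load a group's hashes.pkl and list a few suffix fingerprints.
# With many objects per group the file grows; the talk measured up to ~140 KB.
import pickle

path = "/srv/node/sdb1/objects/12345/hashes.pkl"   # illustrative partition dir
with open(path, "rb") as f:
    hashes = pickle.load(f)

print(len(hashes), "suffix fingerprints; file size H grows with this count")
for suffix, fingerprint in list(hashes.items())[:5]:
    print(suffix, fingerprint)
```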

18 Synchronization message: on two nodes A and B, the object server and the OSync process exchange synchronization messages; the two messages (A → B and B → A) are asynchronous. [Figure: message exchange between the two OSync processes]

19 Synchronization protocol. [Figure: the four-step exchange between two replicas]
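A rough reconstruction of what one such pairwise exchange boils down to, based on stock Swift's replicator behaviour (a hedged sketch, not the authors' figure; the dictionaries below stand in for the REPLICATE requests and rsync pushes that actually cross the network):

```python
# Sketch of one pairwise synchronization round between a local replica and a partner.
local_hashes = {"3f1": "aaa", "07c": "new"}      # our hashes.pkl contents
partner_hashes = {"3f1": "aaa", "07c": "old"}    # the partner's hashes.pkl

remote = dict(partner_hashes)                    # 1. ask the partner for its hashes.pkl
stale = [s for s, h in local_hashes.items()      # 2. compare the fingerprints
         if remote.get(s) != h]
for suffix in stale:                             # 3. push each differing suffix dir
    print("pushing suffix", suffix, "to partner")
if stale:                                        # 4. partner recomputes its fingerprints
    partner_hashes.update({s: local_hashes[s] for s in stale})

print("suffixes pushed:", stale)                 # -> ['07c']
```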

20 More findings: with r = 5 replicas there are 20 messages. To synchronize to a partner, a replica must 1. ask for its hashes.pkl, 2. compare, … With r replicas, r(r-1) sync messages are exchanged per round. [Figure: replicas r1-r5 all probing each other]

21 Systems with these issues (2015 OpenStack Summit): Rackspace Cloud Files, 4 regions, > 75,000,000,000 objects, > 50,000 accounts.

22 ① Background and Motivation ② Problem ③ Proposed Solution ④ Evaluation ⑤ Summary

23 From the existing mechanism: OpenStack Swift is open source, so we can edit the source code. We edited 2 source files and added/modified ~200 lines of code.

24 LightSync: Hash of Hashes (HoH). [Figure: results with 5 million objects, r = 3]
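One way to realize the idea the name suggests (a hedged sketch, not the patch's actual code): fold all per-suffix fingerprints into a single digest, exchange only that digest first, and fetch the full hashes.pkl only when the digests disagree.

```python
# Sketch of a Hash-of-Hashes (HoH) check: two in-sync replicas exchange a
# single 16-byte digest instead of a hashes.pkl that can reach ~140 KB.
import hashlib

def hash_of_hashes(hashes: dict) -> str:
    digest = hashlib.md5()
    for suffix in sorted(hashes):          # fixed order so all replicas agree
        digest.update(suffix.encode())
        digest.update(str(hashes[suffix]).encode())
    return digest.hexdigest()

local = {"3f1": "a" * 32, "07c": "b" * 32}
remote = dict(local)
if hash_of_hashes(local) == hash_of_hashes(remote):
    print("digests match: skip the full hashes.pkl exchange")
```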

25 LightSync: HoH's protocol. [Figure: the four-step HoH exchange]

26 LightSync: Circular Hash Checking (CHC). Note: the list of replicas is maintained on each storage node. We reduce the number of replication (sync) messages from r(r-1) to r.
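A sketch of the circular checking idea, assuming the replica list is kept in the same order on every node (illustrative names, not the patch's code):

```python
# Sketch of Circular Hash Checking (CHC): each replica probes only its
# successor on the ring, so one round costs r messages instead of r*(r-1).
def chc_partner(replicas: list, me: str) -> str:
    i = replicas.index(me)                        # same replica list on every node
    return replicas[(i + 1) % len(replicas)]

replicas = ["r1", "r2", "r3", "r4", "r5"]
print([(node, chc_partner(replicas, node)) for node in replicas])

r = len(replicas)
print("sync messages per round:", r * (r - 1), "->", r)   # 20 -> 5 for r = 5
```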

27 LightSync: CHC under node failure. [Figure: how CHC handles a failed node]

28 LightSync: CHC when a disk is full or failed. [Figure: how CHC handles a full or failed disk]

29 ① Background and Motivation ② Problem ③ Proposed Solution ④ Evaluation ⑤ Summary

30 Evaluation, sync delay with replicas already in sync (just checking consistency): ~48.23 min for original OpenStack Swift vs. ~2.21 min for LightSync. [Figure: sync delay comparison]

31 Evaluation, network traffic with replicas already in sync (just checking consistency): ~1354 MB for original OpenStack Swift vs. ~181 MB for LightSync (Np = 2^18). Per-node network overhead × total node number = overall overhead: ~7835 MB vs. ~1060 MB (r = 5, n = 4M). [Figure: network traffic comparison]

32 ① Background and Motivation ② Problem ③ Proposed Solution ④ Evaluation ⑤ Summary

33 Summary: the original OSync has a large delay and a large network traffic overhead, caused by the large number and size of exchanged messages. Improvement: LightSync (HoH + CHC). LightSync can also be applied to containers and accounts.

34 Titcheu Pierre (10th April 2016). Dedication: Jesus Christ.

35

36

37 LightSync is deployed as a patch: http://github.com/LightSync/patch-openstack-swift

38 Large-scale measurements (64 VMs), r = 5, n = 4M. Per-node network overhead × total node number = overall overhead.

Method                   | OSync Delay (min) | Network Overhead (MB)
Original OpenStack Swift | 76.05             | 849
LightSync                | 1.87              | 16.3
Reduced by               | 97.5%             | 98.08%

Compared with the testbed: slower machines, slower network, fewer groups per node (~65 times fewer), and more hashes per group's hashes.pkl file (~7 times more).

