Defending against large-scale crawls in online social networks. Mainack Mondal, Bimal Viswanath, Allen Clement, Peter Druschel, Krishna Gummadi, Alan Mislove.


1 Defending against large-scale crawls in online social networks. Mainack Mondal, Bimal Viswanath, Allen Clement, Peter Druschel, Krishna Gummadi, Alan Mislove, Ansley Post*. MPI-SWS, Northeastern University. *Now at Google. CoNEXT, December 2012.

2 Lots of personal data on Online Social Networks (OSNs)

3 What is the concern with aggregation of this large data? Aggregators can mine this data to infer attributes missing from it, e.g. sexual orientation. Aggregators can republish this data in an easily accessible form. Neither the user nor the OSN has control over how crawled data is used. In 2010, the data of 171 M Facebook users was published on BitTorrent. Problem for OSN operators: user data is a valuable asset to OSN operators, and OSN operators are blamed for misuse of user data [NYTimes 10]. OSNs need to limit large-scale aggregation of user data.

4 Challenge: We are defending against a crawler who wants to crawl as many accounts as possible and wants to crawl as fast as possible. Our goal is to limit the rate of crawling: make the crawlers as slow as possible.

5 Existing solution: Simple rate-limiting. OSNs rate-limit on a per-account or per-IP-address basis. Crawlers can defeat the rate limit using multiple accounts: the crawlers can create multiple fake accounts (Sybils), or they can use compromised accounts.
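To illustrate why per-account limits are easy to defeat, here is a minimal sketch. The class `PerAccountLimiter`, the helper `crawl`, the limit of 10 views per window, and the account names are all hypothetical and not taken from the talk; the sketch only shows that a crawler's throughput scales with the number of accounts it controls.

```python
from collections import defaultdict


class PerAccountLimiter:
    """Allow at most `limit` profile views per account within one time window."""

    def __init__(self, limit):
        self.limit = limit
        self.used = defaultdict(int)  # views consumed so far, per account

    def allow(self, account):
        if self.used[account] >= self.limit:
            return False
        self.used[account] += 1
        return True


def crawl(limiter, accounts, targets):
    """Spread view requests round-robin over accounts; count successful views."""
    crawled = 0
    for i, _target in enumerate(targets):
        if limiter.allow(accounts[i % len(accounts)]):
            crawled += 1
    return crawled


targets = range(1000)  # hypothetical profiles the crawler wants to view
one_account = crawl(PerAccountLimiter(limit=10), ["a0"], targets)
hundred_sybils = crawl(
    PerAccountLimiter(limit=10), [f"sybil{i}" for i in range(100)], targets
)
```

With one account the crawler is held to the per-account limit; with 100 accounts the same limiter lets through 100 times as many views in the same window.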

6 Our solution: Genie. Assumption: social links to good users are harder to get than accounts. Replace user-account-based rate-limiting with link-based rate-limiting.

7 Outline: Background and key idea. Genie design: credit networks; how to use credit networks to defend against crawlers; using the difference between user and crawler activity. Genie evaluation.

8 Credit Networks [EC 11]: Nodes trust each other by providing pairwise credit. Credit is used to pay for the services received. [Figure: nodes A and B joined by links labeled with credit values]

9 Credit Networks [EC 11]: Nodes trust each other by providing pairwise credit. Credit is used to pay for the services received. To obtain a service, find path(s) with sufficient credit. [Figure: nodes A, C and B joined by links labeled with credit values]
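A minimal sketch of paying for a service over a credit network follows. It is a simplification, not the construction from [EC 11] or from Genie: it searches for a single path only (the slide allows several paths), the functions `find_path` and `pay` are hypothetical names, and the credit values are made up because the numbers in the slide figure cannot be recovered. It assumes that a payment lowers the credit on each link it uses and raises the credit in the reverse direction.

```python
from collections import deque


def find_path(credit, src, dst, amount):
    """Breadth-first search for a path on which every link has >= amount credit."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, available in credit.get(u, {}).items():
            if v not in parent and available >= amount:
                parent[v] = u
                queue.append(v)
    return None


def pay(credit, src, dst, amount):
    """Move `amount` credit along one path from src to dst, if such a path exists."""
    path = find_path(credit, src, dst, amount)
    if path is None:
        return False
    for u, v in zip(path, path[1:]):
        credit[u][v] -= amount  # credit used up in the forward direction
        credit.setdefault(v, {})
        credit[v][u] = credit[v].get(u, 0) + amount  # gained in the reverse one
    return True


# Hypothetical credit values: credit[u][v] is what u can still spend towards v.
credit = {"A": {"C": 3}, "C": {"A": 1, "B": 2}, "B": {"C": 4}}
ok_first = pay(credit, "A", "B", 2)   # A->C has 3 and C->B has 2: succeeds
ok_second = pay(credit, "A", "B", 2)  # A->C has only 1 left: fails
```

The second payment fails because the first one exhausted the credit on the path, which is the property a rate limit can build on.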

10 How can we map an OSN to a credit network? The OSN operator forms a credit network from the social network. The operator replenishes credit on each link at a fixed rate. Credit is deducted from links to view another user's profile. [Figure: nodes A, C, D and B joined by links labeled with credit values]
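The replenish-and-deduct mechanism can be sketched as below. This is an illustration under stated assumptions, not Genie's implementation: the class `RateLimitedLink`, the function `view_profile`, the balance cap, the cost of one credit per link, and the time units are all hypothetical, and the reverse-direction credit of a full credit network is left out for brevity.

```python
class RateLimitedLink:
    """One direction of a social link holding credit that refills over time."""

    def __init__(self, rate, cap):
        self.rate = rate      # credits added per time unit (assumed parameter)
        self.cap = cap        # upper bound on stored credit (assumed parameter)
        self.balance = cap
        self.last = 0

    def replenish(self, now):
        """Add the credit accrued since the last update, up to the cap."""
        self.balance = min(self.cap, self.balance + (now - self.last) * self.rate)
        self.last = now


def view_profile(path_links, now, cost=1):
    """Charge `cost` on every link of the path, or charge nothing and refuse."""
    for link in path_links:
        link.replenish(now)
    if any(link.balance < cost for link in path_links):
        return False
    for link in path_links:
        link.balance -= cost
    return True


# A hypothetical two-link path from a viewer to a profile.
links = [RateLimitedLink(rate=1, cap=2), RateLimitedLink(rate=1, cap=2)]
burst = [view_profile(links, now=0) for _ in range(3)]  # third view is refused
later = view_profile(links, now=1)  # one time unit later, credit has refilled
```

A burst of views drains the links and is then refused until the operator's replenishment restores credit, so the replenishment rate bounds the sustained view rate over any path.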

11 How do credit networks defend against crawlers? The amount of crawling is proportional to the attack cut, i.e. the links between the crawler's accounts and the rest of the network (normal users). Sybil accounts: the attack cut is small (SybilRank, NSDI 2012). Compromised accounts: the attack cut may be larger.
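The proportionality claim can be made concrete with a back-of-the-envelope bound. The symbols below are assumptions introduced for illustration and do not appear in the slides: r is the replenishment rate per link, k is the number of links in the attack cut, c_0 is the credit initially stored on those links, and V(T) is the number of normal-user profiles the crawler views in time T. The bound also assumes every such view costs at least one credit on at least one cut link.

```latex
% Assumed symbols (not from the slides): r, k, c_0, V(T) as defined above.
V(T) \le c_0 + r \cdot k \cdot T
\quad\Longrightarrow\quad
\lim_{T \to \infty} \frac{V(T)}{T} \le r \cdot k
```

As a purely hypothetical example, with r = 5 credits per week per link and k = 20 cut links, the sustained crawl rate would be at most 100 profile views per week, however many accounts sit behind the cut.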

12 Difference between normal users and crawlers. Reciprocity in profile views: normal users are more reciprocal than crawlers. Repeated profile views: normal users repeatedly visit the same set of profiles. Locality of views.

13 Difference in locality between normal users and crawlers. Renren graph and user browsing trace [IMC 10]: 33 K users, 96 K activities (2 weeks). Most of the normal views are local. Flickr: Mislove et al. [WOSN 08]. Orkut: Cha et al. [IMC 09]. [Figure: % of views, with crawler activity marked]

14 Genie design principles. Use a credit network to rate-limit links. Exploit the difference between normal and crawler activity to discriminate crawlers: charge more for views further away.

15 Genie design. New charging model: pay more to view profiles far away. Credit charged per link = (shortest path distance between the two nodes) - 1. The rate of crawling decreases with increased path length. [Figure: nodes A, C, D and B joined by links labeled with credit values; labels -2 and +2 mark a charge]
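The charging rule can be sketched as follows, assuming an unweighted, undirected social graph and breadth-first search for the shortest path. The function names and the chain topology A-C-D-B are hypothetical; the node labels echo the slide figure, but its actual edges cannot be recovered from the transcript.

```python
from collections import deque


def shortest_distance(graph, src, dst):
    """Hop count of the shortest path from src to dst, or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def charge_per_link(graph, viewer, target):
    """Credit charged on each link: shortest path distance minus one."""
    d = shortest_distance(graph, viewer, target)
    if d is None:
        return None
    return max(d - 1, 0)  # clamp so that viewing one's own profile costs nothing


# Hypothetical chain A - C - D - B as an adjacency list.
graph = {"A": ["C"], "C": ["A", "D"], "D": ["C", "B"], "B": ["D"]}
charges = {t: charge_per_link(graph, "A", t) for t in ["C", "D", "B"]}
```

Under this rule a direct friend costs nothing per link, while each extra hop raises the per-link charge, so the distant, non-local views that the earlier slides associate with crawlers drain credit fastest.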

16 Outline: Background and key idea. Genie design: credit networks; how to use credit networks to defend against crawlers; using the difference between user and crawler activity. Genie evaluation.

17 Genie evaluation. Does Genie limit attackers while allowing normal users? The parameter to tune: credit replenishment rate per link. Replenishment rate too high: crawlers will be allowed. Replenishment rate too low: users will be heavily penalized.

18 Experimental setup. Genie simulator written in C++. Input: social graph and user activity trace. Output: allowed/flagged for each activity. Normal user activity trace from Renren; multiple synthetic traces generated for other graphs. We model a strong and efficient crawler: the crawler controls compromised user accounts, each good user profile is crawled once, and crawlers try to crawl as many profiles as possible.

19 Does Genie limit crawlers? The crawlers are slowed down ~3000 times. Only 2.7% of the network is crawled in 1 week. [Figure: % of users crawled per week vs. credits/week per link]

20 Does Genie penalize good users? 2.6% of total activities, from 0.8% of users, are flagged. [Figure: % of user activity flagged vs. credits/week per link]

21 Does Genie penalize good users? [Figure: % of user activity flagged and % of users crawled per week vs. credits/week per link, with the trade-off point marked]

22 Who are these flagged users? Three users with a very high number of random profile views: they show crawler-like behavior, and 70% of the flagged activity is by these users. Users with a normal number of profile views but very few friends: 99% of flagged users have fewer than 5 friends, and adding 4 more friends unflags 97% of these users.

23 Efficiency of Genie. To scale up our Genie simulator we used the Canal library [EuroSys 12] and a multithreaded implementation. Evaluation used a machine with 24 cores and 48 GB of physical memory. For a million-node social graph: memory overhead is 5 GB, and each view request is processed in 0.65 ms on average.

24 Summary. We propose rate-limiting links to defend against crawlers. We strengthen our defense using the difference between normal user and crawler activities. We evaluated Genie on a real-world user activity trace.

25 Thank you

