
Exploring Linkability of User Reviews
Mishari Almishari and Gene Tsudik, University of California, Irvine

1 Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, University of California, Irvine

2 Roadmap: 1. Introduction 2. Data Set & Problem Settings 3. Linkability Results & Improvements 4. Discussion 5. Future Work & Conclusion

3 Motivation: Increasing popularity of reviewing sites (Yelp: more than 39M visitors and 15M reviews in 2010)

4 Example: a review with its category and rating

5 Motivation: Rising awareness of privacy

6 Motivation: How is it applied? Traceability/linkability: linkability of ad hoc reviews; linkability of several accounts

7 Goal: Assess the linkability of user reviews

8 Roadmap: 1. Introduction 2. Data Set & Problem Settings 3. Linkability Results & Improvements 4. Discussion 5. Future Work & Conclusion

9 Data Set: 1 million reviews; 2,000 users, each with more than 300 reviews

10 Problem Settings

11

12 Problem Formulation [Figure: several identified records (IR) and one anonymous record (AR)] IR: Identified Record; AR: Anonymous Record

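13 Problem Settings: An anonymous record (AR) is compared against the identified records (IRs) using a matching model. Linkability is measured as Top-X linkability (the AR counts as linked if its author's IR is among the X highest-ranked IRs), with X = 1 and 10. Anonymous record sizes: 1, 5, 10, 20, …, 60.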
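A minimal sketch of the Top-X linkability ratio, assuming each user's anonymous record has already been scored against every identified record and the IRs ranked best match first. The function name `top_x_linkability` and the input layout (true author id mapped to a ranked list of IR author ids) are hypothetical.

```python
from typing import Dict, Sequence


def top_x_linkability(rankings: Dict[str, Sequence[str]], x: int) -> float:
    """Fraction of anonymous records whose true author appears in the top x IRs.

    rankings maps each true author id to the IR author ids ranked best-first
    for that author's anonymous record (a hypothetical layout for this sketch).
    """
    if not rankings:
        return 0.0
    linked = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return linked / len(rankings)


# Toy example with three users, each contributing one anonymous record.
example = {
    "u1": ["u1", "u2", "u3"],  # true author ranked first
    "u2": ["u3", "u2", "u1"],  # true author ranked second
    "u3": ["u1", "u2", "u3"],  # true author ranked last
}
print(top_x_linkability(example, 1))
print(top_x_linkability(example, 2))
```

Here one of three records is linked at Top-1 and two of three at Top-2, which is why Top-10 ratios on the later slides are always at least as high as Top-1 ratios.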

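14 Methodologies: (1) Naïve Bayesian (NB) model, which outputs a list of IRs sorted in decreasing order of likelihood; (2) Kullback-Leibler divergence (KLD), which outputs a list of IRs sorted in increasing order of divergence. Token distributions are estimated by maximum-likelihood estimation.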

15 Tokens
o Unigram: “privacy” → “p”, “r”, “i”, “v”, “a”, “c”, “y” (26 values)
o Digram: “privacy” → “pr”, “ri”, “iv”, “va”, “ac”, “cy” (676 values)
o Rating (5 values)
o Category (28 values)
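A minimal sketch of the two lexical tokenizers, assuming the text is lowercased, only the ASCII letters a to z are kept, and digrams are formed only within a run of letters (never across spaces or punctuation). The slides do not specify these preprocessing choices, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def letter_runs(text: str) -> List[str]:
    """Lowercase the text and return maximal runs of ASCII letters a-z (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def unigrams(text: str) -> Counter:
    """Count single letters: at most 26 distinct tokens."""
    return Counter(ch for run in letter_runs(text) for ch in run)


def digrams(text: str) -> Counter:
    """Count adjacent letter pairs within each run: at most 26 * 26 = 676 distinct tokens."""
    return Counter(run[i:i + 2] for run in letter_runs(text) for i in range(len(run) - 1))


print(list(unigrams("privacy")))  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
print(list(digrams("privacy")))   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']
```

Rating and category need no tokenizer: each review contributes exactly one rating token (5 values) and one category token (28 values).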

16 Naïve Bayesian [Figure: the anonymous record is scored against each identified record, producing a list of IRs sorted in decreasing order]
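A sketch of Naïve Bayesian ranking, assuming each IR is summarized by a token distribution estimated from its token counts, and the AR is scored by the log-likelihood of its tokens under that distribution (tokens treated as independent). Add-one (Laplace) smoothing is our assumption, used so that a token unseen in an IR does not produce log(0); the slides do not say how zero counts are handled. All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def smoothed_probs(counts: Counter, vocab: List[str]) -> Dict[str, float]:
    """Add-one smoothed token probabilities over a fixed vocabulary (assumption of this sketch)."""
    total = sum(counts[t] for t in vocab) + len(vocab)
    return {t: (counts[t] + 1) / total for t in vocab}


def nb_log_likelihood(ar: Counter, ir: Counter, vocab: List[str]) -> float:
    """log P(AR | IR) with independent tokens; out-of-vocabulary AR tokens are skipped."""
    probs = smoothed_probs(ir, vocab)
    return sum(n * math.log(probs[t]) for t, n in ar.items() if t in probs)


def rank_nb(ar: Counter, irs: Dict[str, Counter], vocab: List[str]) -> List[Tuple[str, float]]:
    """Return (author, score) pairs sorted by decreasing log-likelihood, best match first."""
    scored = [(author, nb_log_likelihood(ar, ir, vocab)) for author, ir in irs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: a three-token vocabulary and two hypothetical identified records.
vocab = ["a", "b", "c"]
irs = {"alice": Counter(a=8, b=1, c=1), "bob": Counter(a=1, b=1, c=8)}
anonymous = Counter(a=3, c=1)
print(rank_nb(anonymous, irs, vocab))
```

Working in log space avoids numerical underflow when an AR of 60 reviews contributes thousands of letter tokens.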

17 Kullback-Leibler Divergence (KLD) [Figure: the anonymous record (AR) is compared with each identified record (IR), producing a list of IRs sorted in increasing order]
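A sketch of KLD ranking, assuming the divergence is taken as D(P_AR || P_IR) = Σ_t P_AR(t) log(P_AR(t) / P_IR(t)); the direction of the divergence is our assumption. The IR distribution is add-one smoothed (also an assumption) so that P_IR(t) > 0 wherever the AR has mass, while the AR uses plain relative frequencies. Names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def distribution(counts: Counter, vocab: List[str], alpha: float = 0.0) -> Dict[str, float]:
    """Relative frequencies with optional additive smoothing; alpha=0 gives the plain MLE."""
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    if total == 0:
        raise ValueError("record has no in-vocabulary tokens")
    return {t: (counts[t] + alpha) / total for t in vocab}


def kld(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(p || q) in nats; terms with p(t) = 0 add nothing, so q(t) must be > 0 where p(t) > 0."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def rank_kld(ar: Counter, irs: Dict[str, Counter], vocab: List[str]) -> List[Tuple[str, float]]:
    """Return (author, divergence) pairs sorted by increasing divergence, best match first."""
    p_ar = distribution(ar, vocab)  # unsmoothed AR distribution
    scored = [(author, kld(p_ar, distribution(ir, vocab, alpha=1.0)))  # smoothed IR
              for author, ir in irs.items()]
    return sorted(scored, key=lambda pair: pair[1])


vocab = ["a", "b", "c"]
irs = {"alice": Counter(a=8, b=1, c=1), "bob": Counter(a=1, b=1, c=8)}
print(rank_kld(Counter(a=3, c=1), irs, vocab))
```

Because a smaller divergence means a closer match, the list is sorted in increasing order, the mirror image of the NB ranking.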

18 Maximum Likelihood Estimation
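The transcript names maximum-likelihood estimation without giving the formula. For a categorical token distribution, the standard MLE is the relative frequency. Below, R is a record (AR or IR), c_R(t) is the count of token t in R, and V is the token vocabulary (for example the 26 letters); these symbols are introduced here for illustration. A worked example on the unigrams of “privacy”:

```latex
\hat{p}_R(t) = \frac{c_R(t)}{\sum_{t' \in V} c_R(t')}, \qquad t \in V

% Unigrams of "privacy": c(p) = c(r) = c(i) = c(v) = c(a) = c(c) = c(y) = 1, all other letters 0
\hat{p}(\text{p}) = \hat{p}(\text{r}) = \cdots = \hat{p}(\text{y}) = \tfrac{1}{7}, \qquad \hat{p}(\text{b}) = 0
```

Letters that never occur in a record receive probability 0, which makes \log \hat{p} in the NB score and the ratio in the KLD undefined for those tokens; some form of smoothing or zero handling is therefore needed in practice, and the slides do not say which is used.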

19 Roadmap: 1. Introduction 2. Data Set & Problem Settings 3. Linkability Results & Improvements 4. Discussion 5. Future Work & Conclusion

20 Unigram Results (NB, unigram) [Plot: linkability ratio (LR) vs. anonymous record size] AR size 60: LR 83% (Top-1), 96% (Top-10)

21 Digram Results (NB, digram) [Plot: linkability ratio (LR) vs. anonymous record size] AR size 20: LR 97% (Top-1); AR size 10: LR 88% (Top-1)

22 Improvement (1): Combining Lexical and Non-Lexical Tokens (NB model) [Plot: linkability ratio vs. anonymous record size] Gain of up to 20%: AR size 60, from 83% to 96%; AR size 30, from 60% to 80%

23 KLD Weighted Average: First, combine rating and category (weight 0.5); second, combine non-lexical and lexical features (weight 0.997 for unigrams, 0.97 for digrams).
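A sketch of the two-step KLD weighted average, assuming the combined distance is D = α·D_lexical + (1 − α)·(β·D_rating + (1 − β)·D_category). The slides give the weights (β = 0.5; α = 0.997 for unigrams, 0.97 for digrams) but not which term receives α versus 1 − α or β versus 1 − β, so that placement is our assumption. The function name and the per-author scores in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def combined_distance(d_lexical: float, d_rating: float, d_category: float,
                      alpha: float = 0.997, beta: float = 0.5) -> float:
    """Two-step weighted average of per-feature KLD scores (weight placement is assumed)."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    # Step 1: combine the two non-lexical features.
    d_non_lexical = beta * d_rating + (1.0 - beta) * d_category
    # Step 2: combine lexical and non-lexical distances.
    return alpha * d_lexical + (1.0 - alpha) * d_non_lexical


# Hypothetical (lexical, rating, category) KLD scores of one AR against two IRs.
scores: Dict[str, Tuple[float, float, float]] = {
    "alice": (0.10, 0.40, 0.20),
    "bob": (0.12, 0.05, 0.05),
}
ranking: List[str] = sorted(scores, key=lambda author: combined_distance(*scores[author]))
print(ranking)  # ['alice', 'bob']
```

Since the combination is still a distance, IRs remain ranked in increasing order, as with plain KLD.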

24 Rating and Category: β value of 0.5

25 Non-Lexical and Unigram: α value of 0.997

26 Non-Lexical and Digram: α value of 0.97

27 What about Restricting Identified Record (IR) Size? [Plots: linkability ratio vs. anonymous record size, for the NB model and the KLD model] NB model: affected by IR size. KLD model: performed better with smaller IRs; for AR size 20 or less, linkability improved.

28 Improvement (2): Matching All IRs at Once [Figure: records v1 to v16 with correct (✔) and incorrect (✖) matches]
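The slide does not spell out the matching procedure. One natural reading, which we assume here, is that when each IR has exactly one AR to be linked, all ARs are assigned to IRs jointly so that no two ARs go to the same IR, minimizing the total distance. The sketch solves that assignment by brute force over permutations, which is feasible only for a handful of records; at realistic scale one would use the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`). The function name `match_all` is hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def match_all(distance: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """One-to-one assignment of AR i to IR result[i] minimizing total distance.

    distance[i][j] is the distance (e.g. KLD) between AR i and IR j; the matrix is square.
    Exhaustive search over all n! permutations: for tiny illustrative inputs only.
    """
    n = len(distance)
    best_perm = list(range(n))  # start from the identity assignment
    best_cost = sum(distance[i][i] for i in range(n))
    for perm in permutations(range(n)):
        cost = sum(distance[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost


# AR 0 and AR 1 both prefer IR 0 when each is matched independently.
d = [[1.0, 2.0],
     [1.5, 4.0]]
independent = [min(range(len(row)), key=row.__getitem__) for row in d]
print(independent)   # [0, 0]
print(match_all(d))  # ([1, 0], 3.5)
```

The example shows why joint matching can help: independent matching sends both ARs to IR 0, so at least one is wrong, while the joint assignment resolves the conflict.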

29 Matching-All Results [Plots: linkability ratio vs. anonymous record size, for restricted IR and full IR] Gain of up to 16%: AR size 30, from 74% to 90%. Gain of up to 23%: AR size 20, from 35% to 55%.

30 Improvement (3): For small IR sizes, changing it to 0.5 + review length [Plot: linkability ratio vs. anonymous record size] Gain of up to 5%: AR size 10, from 89% to 92%; AR size 7, from 79% to 84%

31 Roadmap: 1. Introduction 2. Data Set & Problem Settings 3. Linkability Results & Improvements 4. Discussion 5. Future Work & Conclusion

32 Discussion
o Unigram and scalability
  o 26 vs. 676
  o 59 vs. 676
  o Less than 10%
o Prolific users
  o In the long run, users will become prolific
o Anonymous record size
  o A set of 60 reviews is less than 20% of the minimum contribution
o Detecting spam reviews

33 Roadmap: 1. Introduction 2. Data Set & Problem Settings 3. Linkability Results & Improvements 4. Discussion 5. Future Work & Conclusion

34 Future Work
o Further improvement for small ARs
o Other probabilistic models
o Using stylometry
o Review anonymization
o Exploring linkability in other preference databases

35 Conclusion
o Extensive study to assess the linkability of user reviews
  o For a large set of users
  o Using very simple features
o Users are highly exposed, even with simple features and a large number of authors
Takeaway point: Reviews can be accurately de-anonymized using alphabetical letter distributions.

36 Questions?

