Published by Chris Libby. Modified over 2 years ago.

1
Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik Computer Science Department University of California, Irvine

2
Increasing Popularity of Reviewing Sites: Yelp had more than 39M visitors and 15M reviews in 2010

3
Category, Rating

4
Rising Awareness of Privacy

5
How Does Privacy Apply to Reviews? Traceability; linkability of ad hoc reviews; linkability of several accounts

6
Contribution: an extensive study to measure privacy/linkability in user reviews; proposed models that adequately identify authors

7
Settings & Problem Formulation

8

9

10
IR: Identified Record; AR: Anonymous Record [Figure: several identified records (IRs) and one anonymous record (AR)]

11
Anonymous Record (AR) size: 1, 5, 10, 20,…60; Identified Record (IR) size; Matching model; Top-X linkability, X: 1 and 10
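Top-X linkability can be computed as the fraction of ARs whose true author's IR appears among the first X entries of the ranked candidate list. A minimal Python sketch, assuming a hypothetical `rankings` dictionary that maps each AR's true author id to its ranked list of IR author ids (best match first); the function name `top_x_linkability` is illustrative, not taken from the slides:

```python
from typing import Dict, Sequence


def top_x_linkability(rankings: Dict[str, Sequence[str]], x: int) -> float:
    """Share of anonymous records (ARs) whose true author is in the top x IRs.

    Hypothetical input format: `rankings` maps the true author id of each AR
    to the list of IR author ids produced by a matching model, best first.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    if not rankings:
        return 0.0
    hits = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return hits / len(rankings)


# Toy rankings (hypothetical data).
rankings = {
    "u1": ["u1", "u3", "u2"],  # correct IR ranked first
    "u2": ["u3", "u2", "u1"],  # correct IR ranked second
    "u3": ["u1", "u2", "u3"],  # correct IR ranked third
}
print(top_x_linkability(rankings, 1))  # Top-1: 1 of 3 ARs linked
print(top_x_linkability(rankings, 2))  # Top-2: 2 of 3 ARs linked
```

With X = 1 only exact first-place matches count; X = 10 credits the model whenever the true author is anywhere in the ten best candidates.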

12
Dataset: 1 million reviews; 2000 users with more than 300 reviews each

13
Methodology: Naïve Bayesian model; Kullback-Leibler model (symmetric version)

14
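Methodology: link an anonymous record AR to an identified record IR. Naïve Bayesian (NB) model: return $\arg\max_{IR_i} P(AR \mid IR_i)$. Kullback-Leibler Divergence (KLD) model: compute $\mathrm{Distance}(AR, IR_i)$ for every $IR_i$ and return $\arg\min_{IR_i} \mathrm{Distance}(AR, IR_i)$.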

15
Naïve Bayesian (NB): the Anonymous Record (AR) is scored against each Identified Record (IR), producing a list of IRs sorted in decreasing order of P(AR | IR)
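A minimal sketch of NB ranking, assuming each record is a bag of token counts (`collections.Counter`) and that P(t | IR) is estimated with an add-one pseudocount; the smoothing and the names `nb_log_likelihood` and `rank_irs_nb` are assumptions, not taken from the slides. Scores are log-probabilities, so larger is better:

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def nb_log_likelihood(ar: Counter, ir: Counter, vocab: Set[str], smoothing: float = 1.0) -> float:
    """log P(AR | IR) under the naive (token-independence) assumption.

    P(t | IR) uses additive smoothing (an assumption of this sketch) so that a
    token the IR never used does not produce log(0).
    """
    total = sum(ir.values()) + smoothing * len(vocab)
    return sum(count * math.log((ir[token] + smoothing) / total) for token, count in ar.items())


def rank_irs_nb(ar: Counter, irs: Dict[str, Counter]) -> List[Tuple[str, float]]:
    """Score the AR against every IR, sorted in decreasing order of log P(AR | IR)."""
    vocab = set(ar)
    for counts in irs.values():
        vocab |= set(counts)
    scored = [(ir_id, nb_log_likelihood(ar, counts, vocab)) for ir_id, counts in irs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy letter counts (hypothetical data).
irs = {"alice": Counter("aaab"), "bob": Counter("bbba")}
ar = Counter("aab")
print(rank_irs_nb(ar, irs)[0][0])  # alice
```

Working in log space avoids underflow when an AR contains thousands of tokens.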

17
Kullback-Leibler Divergence (KLD): the Anonymous Record (AR) is compared with each Identified Record (IR), producing a list of IRs sorted in increasing order of distance
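A comparable sketch for the KLD model. The slides do not say how the divergence is symmetrized or how zero probabilities are handled; here the symmetric version is assumed to be D_KL(P || Q) + D_KL(Q || P) over add-one smoothed distributions, and all function names are hypothetical:

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def smoothed_distribution(counts: Counter, vocab: Sequence[str], smoothing: float = 1.0) -> Dict[str, float]:
    """Token distribution over `vocab`; additive smoothing keeps every probability > 0."""
    total = sum(counts.values()) + smoothing * len(vocab)
    return {t: (counts[t] + smoothing) / total for t in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(p || q) = sum over t of p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def symmetric_kld(a: Counter, b: Counter, vocab: Sequence[str]) -> float:
    """Symmetrized KLD, taken here as D_KL(p || q) + D_KL(q || p) (an assumption)."""
    p, q = smoothed_distribution(a, vocab), smoothed_distribution(b, vocab)
    return kl_divergence(p, q) + kl_divergence(q, p)


def rank_irs_kld(ar: Counter, irs: Dict[str, Counter]) -> List[Tuple[str, float]]:
    """Distance from the AR to every IR over a shared vocabulary, sorted increasing."""
    vocab = sorted(set(ar).union(*irs.values()))
    scored = [(ir_id, symmetric_kld(ar, counts, vocab)) for ir_id, counts in irs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy letter counts (hypothetical data).
irs = {"alice": Counter("aaab"), "bob": Counter("bbba")}
ar = Counter("aab")
print(rank_irs_kld(ar, irs)[0][0])  # alice
```

Using one shared vocabulary for all IRs keeps the distances comparable across candidates.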

18
Maximum Likelihood Estimation
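Under maximum likelihood estimation, the probability of a token in a record is its count divided by the record's total token count. A minimal sketch (the function name is hypothetical). A pure MLE assigns probability 0 to unseen tokens, which breaks log P(AR | IR) and the KL divergence, so some form of smoothing is usually added in practice; the slides do not specify one:

```python
from collections import Counter
from typing import Dict


def mle_distribution(counts: Counter) -> Dict[str, float]:
    """Maximum likelihood estimate of a token distribution: P(t | IR) = count(t) / N."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot estimate a distribution from an empty record")
    return {token: c / total for token, c in counts.items()}


estimate = mle_distribution(Counter("aab"))
print(estimate)  # a -> 2/3, b -> 1/3; any unseen token implicitly gets 0
```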

19
Tokens: Unigram: a, …, z; Digram: aa, ab, …, zz; Rating: 1, 2, 3, 4, 5; Category: Restaurant, Beauty and Spa, Education
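The four token types can be extracted as sketched below. This assumes reviews are lowercased, that only the letters a-z count, and that digrams are adjacent letters within a word; none of these preprocessing choices is stated on the slide, and the function names are hypothetical. The constants also confirm the vocabulary sizes of 26 unigrams and 676 digrams:

```python
import string
from collections import Counter
from itertools import product
from typing import Dict, Tuple

UNIGRAMS = list(string.ascii_lowercase)  # a, ..., z: 26 tokens
DIGRAMS = ["".join(p) for p in product(string.ascii_lowercase, repeat=2)]  # aa, ..., zz: 676 tokens


def letter_tokens(text: str) -> Tuple[Counter, Counter]:
    """Count letter unigrams and digrams in one review.

    Assumptions: text is lowercased, only a-z are kept, and digrams are taken
    from adjacent letters inside the same whitespace-separated word.
    """
    unigrams: Counter = Counter()
    digrams: Counter = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch in string.ascii_lowercase]
        unigrams.update(letters)
        digrams.update(a + b for a, b in zip(letters, letters[1:]))
    return unigrams, digrams


def review_tokens(text: str, rating: int, category: str) -> Dict[str, Counter]:
    """Bundle the four token types for one review (hypothetical layout)."""
    unigrams, digrams = letter_tokens(text)
    return {
        "unigram": unigrams,
        "digram": digrams,
        "rating": Counter([rating]),
        "category": Counter([category]),
    }


tokens = review_tokens("Great food!", 5, "restaurant")
print(tokens["unigram"]["o"], tokens["digram"]["oo"])  # 2 1
```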

20
Lexical Token Results

21
NB, Unigram: AR size 60: linkability ratio (LR) 83% at Top-1, LR 96% at Top-10

22
KLD, Unigram: AR size 60: LR 83% at Top-1, LR 96% at Top-10

23
NB, Digram: AR size 20: LR 97% at Top-1; AR size 10: LR 88% at Top-1

24
KLD, Digram: AR size 60: LR 99% at Top-1; AR size 30: LR 75% at Top-1

25
Improvement (1): Combining Lexical and Non-Lexical Tokens

26
Combining tokens in the NB model is straightforward: include P(Rating | IR) and P(Category | IR). But for KLD? Use a weighted average.
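In NB, treating token types as independent means the log-probabilities for unigrams, rating, and category simply add up. A minimal sketch with add-one smoothing (an assumption); the dictionary layout, the function names, and the category vocabulary size of 3 are hypothetical:

```python
import math
from collections import Counter
from typing import Dict


def log_likelihood(ar: Counter, ir: Counter, vocab_size: int, smoothing: float = 1.0) -> float:
    """log P(AR tokens | IR) for one token type, with additive smoothing (assumed)."""
    total = sum(ir.values()) + smoothing * vocab_size
    return sum(c * math.log((ir[t] + smoothing) / total) for t, c in ar.items())


def combined_nb_score(ar: Dict[str, Counter], ir: Dict[str, Counter], vocab_sizes: Dict[str, int]) -> float:
    """Token types treated as independent, so their log-probabilities add."""
    return sum(log_likelihood(ar[k], ir[k], vocab_sizes[k]) for k in vocab_sizes)


# Toy records; the category vocabulary size of 3 is hypothetical.
sizes = {"unigram": 26, "rating": 5, "category": 3}
ar = {"unigram": Counter("ab"), "rating": Counter([5]), "category": Counter(["restaurant"])}
ir = {"unigram": Counter("aabb"), "rating": Counter([5, 5, 4]), "category": Counter(["restaurant"])}
print(combined_nb_score(ar, ir, sizes))
```

Adding a token type only adds one more term to the sum, which is why the NB combination needs no extra weights.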

27
First, combine rating and category (weight β); second, combine non-lexical and lexical (weight α = 0.997/0.97 for unigram/digram)
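The two-step KLD combination amounts to nested weighted averages of per-token-type distances. A minimal sketch: which term receives α (rather than 1 - α) is an assumption here, as is the function name; β = 0.5 and α = 0.997 (unigram) / 0.97 (digram) are the values reported on the slides:

```python
def combined_distance(d_lexical: float, d_rating: float, d_category: float,
                      alpha: float, beta: float = 0.5) -> float:
    """Two-step weighted average of per-token-type distances.

    Step 1 mixes rating and category with weight beta; step 2 mixes the
    lexical distance with that non-lexical distance using weight alpha.
    Assigning alpha to the lexical term is an assumption of this sketch.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    d_non_lexical = beta * d_rating + (1.0 - beta) * d_category
    return alpha * d_lexical + (1.0 - alpha) * d_non_lexical


ALPHA = {"unigram": 0.997, "digram": 0.97}
BETA = 0.5
print(combined_distance(0.2, 1.0, 1.0, ALPHA["unigram"], BETA))
```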

28
Rating and Category: β = 0.5

29
Non-Lexical and Unigram: α = 0.997

30
Non-Lexical and Digram: α = 0.97

31
Token Combining Results

32
Rating, Category, and Unigram, NB: gain of up to 20%; AR size 30: 60% to 80%; AR size 60: 83% to 96%

33
Rating, Category, and Unigram, KLD: gain of up to 12%; AR size 40: 68% to 80%; AR size 60: 83% to 92%

34
Rating, Category, and Digram - NB

35
Rating, Category, and Digram - KLD

36
What about Restricting Identified Record (IR) Size?

37
Anonymous Record (AR) size; Identified Record (IR) size; Matching model; Top-X linkability, X: 1 and 10

39
Restricted IR, NB: performance is affected by IR size

40
Restricted IR, KLD: performed better for smaller IRs; IR size 20 or less: improved; larger sizes: comparable

41
What about Matching All ARs at once?

42
Anonymous Record (AR) size; Identified Record (IR) size; Matching model; Top-X linkability, X: 1 and 10

43
[Figure: multiple Anonymous Records (ARs) and Identified Records (IRs) passed to the matching model]

44
Improvement (2): Matching All ARs at Once
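One way to read "matching all at once" is as a one-to-one assignment: each AR goes to a distinct IR so that the total distance is minimal, instead of each AR independently picking its closest IR. The sketch below assumes that reading (the slides do not state the exact formulation) and uses exhaustive search, which is only viable for a handful of records; at realistic scale the same problem is solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`. Function names and data are hypothetical:

```python
from itertools import permutations
from typing import List


def match_individually(distance: List[List[float]]) -> List[int]:
    """Each AR (row) independently takes its closest IR (column); IRs may repeat."""
    return [min(range(len(row)), key=row.__getitem__) for row in distance]


def match_all(distance: List[List[float]]) -> List[int]:
    """One-to-one assignment of ARs to IRs minimizing total distance.

    Exhaustive search over permutations: exact, but only feasible for a few
    records. Assumes a square matrix (one AR per IR).
    """
    n = len(distance)
    if any(len(row) != n for row in distance):
        raise ValueError("distance matrix must be square")
    best = min(permutations(range(n)), key=lambda perm: sum(distance[i][perm[i]] for i in range(n)))
    return list(best)


distance = [
    [0.2, 0.3, 0.9],
    [0.1, 0.8, 0.7],
    [0.6, 0.5, 0.4],
]
print(match_individually(distance))  # [0, 0, 2]: AR 0 and AR 1 both pick IR 0
print(match_all(distance))           # [1, 0, 2]
```

The joint assignment resolves the conflict that independent matching leaves: AR 1 keeps IR 0, and AR 0 moves to its second-best IR.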

45

46

47
MatchAll, Restricted: gain of up to 16%; size 30: from 74% to 90%

48
MatchAll, Full: gain of up to 23%; size 20: from 35% to 55%

49
Improvement (3): For Small AR Size

50
Changing it to: Review Length

51
Results, Improvement (3): gain of up to 5%; size 10: 89% to 92%; size 7: 79% to 84%

52
Discussion. Implications: cross-referencing and review spam. Non-prolific users gradually become prolific; with an IR of 20, around 70% are linked. Anonymous record size: linkability is high even for small ARs (92% for an AR of 10), and an AR of 60 is only 20% of the minimum user contribution (300 reviews).

53
Discussion (cont.). Unigram tokens are very comparable to digrams for larger ARs and entail fewer resources in the attack: 26 vs. 676 tokens.

54
Future Directions: improving further for small ARs; other probabilistic models; using stylometry; exploring linkability in other preference databases; more than one AR for different users, to be explored further

55
Conclusion: an extensive study to assess the linkability of user reviews for a large set of users, using very simple features. Users are highly exposed, even with simple features and a large number of authors.

56
Thank you all!
