Slide 2: Agenda

- Introduction and background
- Attacks: dimensions and classification
- Attack types
- Effectiveness analysis
- Countermeasures
- Privacy aspects
- Discussion

Slide 3: Introduction / Background

- In the real world, the assumption of having only honest and fair users may not hold,
  in particular because the proposals of a recommender system can influence the buying
  behavior of users, so real money comes into play.
- A malevolent user can have different goals:
  - Increase the sales of some items by manipulating the recommender system.
  - Decrease the rank of other items, e.g., to cause damage to competitors.
  - Simply sabotage the system.
- We call a user who intentionally tries to influence the functioning of the system an
  attacker.
  - When a person expresses a genuine negative opinion, whatever the reason, this is
    not considered an attack.

Slide 4: Introduction / Background

- The attacker's simple strategy:
  - Influence (bias) the system's behavior by (automatically) creating numerous fake
    accounts/profiles.
  - Issue high or low ratings for the "target item".
- This naive approach will not work against neighbor-based recommenders:
  - More elaborate attack models are required.
  - The goal is to insert profiles that will appear in the neighborhood of many other
    users.

Slides 5-8: Example of profile injection

- Assume that memory-based collaborative filtering is used with:
  - Pearson correlation as the similarity measure
  - a neighborhood size of 1
- Only the opinion of the most similar user is used to make the prediction.

            Item1  Item2  Item3  Item4  ...  Target  Pearson
    Alice     5      3      4      1    ...    ?
    User1     3      1      2      5    ...    5      -0.54
    User2     4      3      3      3    ...    2       0.68
    User3     3      3      1      5    ...    4      -0.72
    User4     1      5      5      2    ...    1      -0.02
    Attack    5      3      4      3    ...    5       0.87

- Before the attack, User2 is the user most similar to Alice, so the predicted rating
  for the target item would be 2.
- After the "Attack" profile is injected, it becomes the profile most similar to Alice
  (Pearson 0.87), and the prediction for the target item jumps to 5.
- A runnable sketch of this computation follows.
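A minimal, runnable sketch of the example above, assuming the toy rating matrix from the table; the helper names (pearson, profiles) are illustrative and not part of any library.

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    u, v = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((u ** 2).sum() * (v ** 2).sum())
    return float((u * v).sum() / denom) if denom else 0.0

# Columns: Item1..Item4, Target (NaN = not rated)
alice = np.array([5, 3, 4, 1, np.nan])
profiles = {
    "User1":  np.array([3, 1, 2, 5, 5.0]),
    "User2":  np.array([4, 3, 3, 3, 2.0]),
    "User3":  np.array([3, 3, 1, 5, 4.0]),
    "User4":  np.array([1, 5, 5, 2, 1.0]),
    "Attack": np.array([5, 3, 4, 3, 5.0]),  # injected profile mimics Alice
}

# Neighborhood size 1: the single most similar profile determines the prediction.
best = max(profiles, key=lambda name: pearson(alice, profiles[name]))
print(best, profiles[best][-1])  # -> Attack 5.0
```

Removing the Attack row reproduces the pre-attack situation, in which User2 (0.68) is the nearest neighbor and the prediction is 2.0.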

Slide 9: Attack Dimensions

- Attack dimensions:
  - Push attack: increase the prediction value of a target item.
  - Nuke attack: decrease the prediction value of a target item.
  - Make the recommender system unusable as a whole.
- There is no technical difference between push and nuke attacks.
- Nevertheless, push and nuke attacks are not always equally effective:
  - e.g., push attacks are more effective than nuke attacks in some situations.
- Another differentiation factor between attacks:
  - Where is the focus of an attack? Only on particular users and items?
  - Targeting a subset of items or users might be less suspicious.
  - More focused (segmented) attacks may be more effective, because the attack profile
    can be more precisely defined.

Slide 10: Attack Classification

- Attacks on recommender systems are classified by:
  - Cost:
    - How costly is it to mount the attack?
    - How many profiles have to be inserted?
    - Is knowledge about the ratings matrix required? (It is usually not public, but
      estimates can be made.)
  - Algorithm dependence:
    - Is the attack designed for a particular recommendation algorithm?
  - Detectability:
    - How easy is it for a system administrator or an automated monitoring tool to
      detect the attack?

Slide 11: Attack Types

1. The Random Attack
- All item ratings of the fake profile are filled with random values drawn from a
  normal distribution.
  - The typical distribution of ratings in a database is known; e.g., for MovieLens
    the average is 3.6 with a standard deviation of around 1.1.
- Idea: generate profiles with "typical" ratings so that they are considered neighbors
  of many real profiles.
- Assign a high or low rating to the target item to launch a push or nuke attack.
- The attack requires only limited knowledge about the rating database; however, it is
  less effective than more advanced models. A sketch of such a profile generator
  follows.
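A hedged sketch of a random-attack profile generator, using the MovieLens-like statistics quoted above (mean 3.6, standard deviation 1.1); the function name and the 1-5 scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_attack_profile(n_items, target_idx, push=True, mean=3.6, sd=1.1):
    """Filler ratings from the global rating distribution, extreme target rating."""
    profile = np.clip(np.round(rng.normal(mean, sd, n_items)), 1, 5)
    profile[target_idx] = 5 if push else 1  # push -> max rating, nuke -> min
    return profile

print(random_attack_profile(n_items=10, target_idx=0))
```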

Slide 12: Attack Types

2. The Average Attack
- The fake profile is filled with the average rating of each item.
  - Intuitively, such fake profiles should have more neighbors, because more details
    about the existing rating data are taken into account.
- Additional cost involved: finding out the average rating of each item.
- More effective than the Random Attack in user-based CF.
- It is often quite easy to determine average rating values per item, since these
  values are explicitly shown when an item is displayed (see the variant sketched
  below).
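The average-attack variant as a sketch, under the assumption that per-item mean ratings are available (e.g., read off the values displayed next to each item); names are illustrative.

```python
import numpy as np

def average_attack_profile(item_means, target_idx, push=True):
    """Filler ratings are each item's observed mean; the target gets an extreme value."""
    profile = np.round(np.asarray(item_means, dtype=float))
    profile[target_idx] = 5 if push else 1
    return profile

print(average_attack_profile([3.8, 2.1, 4.4, 3.0], target_idx=2))  # [4. 2. 5. 3.]
```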

Slide 13: Attack Types

3. The Bandwagon Attack
- Simple idea:
  - Injected profiles contain only high ratings for "popular items" that are liked by
    a large number of users.
  - This intuitively leads to more neighbors, because popular items have many ratings
    and the rating values are similar to those of many other user profiles.
- Example: inject a profile with high rating values for the Harry Potter series.
- Low-cost attack: the set of top-selling or otherwise popular items can easily be
  determined.
- About as harmful as the Average Attack; however, it does not require additional
  knowledge about mean item ratings (sketch below).
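A bandwagon-profile sketch: maximum ratings for a hand-picked set of popular items, random filler elsewhere. The indices are illustrative; only the (easily obtained) list of popular items is assumed known.

```python
import numpy as np

rng = np.random.default_rng(0)

def bandwagon_profile(n_items, popular_idx, target_idx, push=True):
    profile = np.clip(np.round(rng.normal(3.6, 1.1, n_items)), 1, 5)
    profile[popular_idx] = 5                # e.g. the Harry Potter titles
    profile[target_idx] = 5 if push else 1  # the actual target of the attack
    return profile

print(bandwagon_profile(n_items=8, popular_idx=[1, 2], target_idx=0))
```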

Slide 14: Attack Types

4. The Segment Attack
- Designed to push a target item A.
- The idea:
  - Identify a subset of the user community that is interested in items similar to
    item A.
  - The fake profiles include:
    - high ratings for items similar to the target item, and
    - random or low ratings for items dissimilar to the target item.
- Example: push the new Harry Potter book.
  - The attacker includes positive ratings for other fantasy books in the injected
    profile and low ratings for other genres.
  - The Harry Potter book will then be recommended to the typical fantasy-book reader.
- Additional knowledge (e.g., the genre of a book) is required.
- The segment attack was designed to defeat item-based CF; it is also effective
  against user-based CF. A minimal profile builder is sketched below.
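A segment-attack sketch along the lines above: maximum ratings for the target and the in-segment items, low filler everywhere else. Which items form the segment is assumed knowledge (e.g., from genre metadata).

```python
import numpy as np

def segment_attack_profile(n_items, segment_idx, target_idx):
    profile = np.full(n_items, 1.0)   # low ratings outside the segment
    profile[segment_idx] = 5.0        # e.g. other fantasy books
    profile[target_idx] = 5.0         # the pushed item itself
    return profile

print(segment_attack_profile(n_items=6, segment_idx=[1, 2], target_idx=0))
```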

Slide 15: Attack Types

5. Special nuke attacks
- Love/hate attack:
  - The target item is given the minimum rating value.
  - All other items in the injected profile are given the highest possible rating
    value.
  - Has a serious effect on the system's recommendations when the goal is to nuke an
    item in user-based CF.
  - The other way around (pushing an item) it is not effective; a detailed analysis of
    the reasons for this asymmetry is still missing.
- Reverse bandwagon:
  - Find items that are disliked by many people.
  - The injected profiles give those widely disliked items the minimum rating value
    and also give the target item a low rating.
  - The effect is to associate the target item with items that many people dislike.
- A sketch of the love/hate profile follows.
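The love/hate nuke profile is the simplest of all; a sketch, assuming a 1-5 scale:

```python
import numpy as np

def love_hate_profile(n_items, target_idx):
    profile = np.full(n_items, 5.0)   # highest possible rating everywhere...
    profile[target_idx] = 1.0         # ...except the nuked target item
    return profile

print(love_hate_profile(n_items=5, target_idx=2))  # [5. 5. 1. 5. 5.]
```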

Slide 16: Attack Types

- The attack types described so far assume that a "standard" recommender system
  collects explicit ratings from registered users.

6. Clickstream attacks and implicit feedback
- Attacks based on implicit feedback, e.g., users' click behavior.
- "Clickstream" recommender systems (e.g., Amazon.com) perform personalization by
  mining the usage logs of different users on a website.
  - They inform users that "users who viewed this book also viewed these items".
- The recommendation process of a clickstream RS:
  1. The RS preprocesses the web log data and extracts user sessions.
     - A user session consists of a session ID and the set of visited pages.
  2. The RS mines these data and identifies different navigation patterns.
  3. At run time, the RS compares the set of pages viewed by the current user with
     these navigation patterns and predicts the pages the user may be interested in.
     - The prediction is based on the cosine similarity measure, as in user-based CF
       (see the sketch below).
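A sketch of the session-matching step, with sessions represented as binary page-visit vectors and matched by cosine similarity as described above; the page names and the candidate pattern are invented for illustration.

```python
import numpy as np

pages = ["home", "book_a", "book_b", "checkout", "target_page"]

def session_vector(visited):
    """Binary vector: 1 where the session visited the page."""
    return np.array([1.0 if p in visited else 0.0 for p in pages])

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

current = session_vector({"home", "book_a"})
patterns = {"fantasy_readers": session_vector({"book_a", "book_b", "target_page"})}
for name, vec in patterns.items():
    print(name, round(cosine(current, vec), 2))  # fantasy_readers 0.41
```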

Slide 17: Attack Types

- Attacks on clickstream RS:
  - Attackers employ crawlers that automatically browse different pages with the goal
    of associating a target page with other pages.
    - Push the target page so that it appears on recommendation lists more often.
  - Segment attack: the attacker's crawler visits the target page together with a
    specific subset of pages that are of interest to a certain community.
  - Popular attack (bandwagon attack): the attacker's crawler visits the target page
    together with the most popular pages.
- Recommender systems based on usage logs are vulnerable to such crawling attacks:
  - Small attack sizes are sufficient to bias the systems.
  - No advanced countermeasures have been reported.

Slide 18: Effectiveness Analysis

- Metrics to measure the actual influence of an attack on the outcome of the RS:
  - Robustness: measures the shift in overall accuracy before and after the attack.
  - Stability: measures the change in the prediction for a target item (before/after
    the attack).
- In addition, rank metrics: how often does an item appear in top-N lists
  (before/after the attack)? Simple implementations of the latter two are sketched
  below.
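Toy implementations of the stability and rank metrics, assuming per-user predictions for the target item before and after the attack are available; the input numbers are illustrative.

```python
import numpy as np

def prediction_shift(before, after):
    """Mean change in the target item's predicted rating across users."""
    return float(np.mean(np.asarray(after) - np.asarray(before)))

def hit_ratio(top_n_lists, target):
    """Fraction of users whose top-N list contains the target item."""
    return sum(target in lst for lst in top_n_lists) / len(top_n_lists)

print(prediction_shift([2.5, 3.0, 2.0], [4.0, 4.5, 3.5]))  # 1.5
print(hit_ratio([["a", "t"], ["b", "c"]], target="t"))     # 0.5
```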

Slide 19: Effectiveness Analysis

- The effect depends mainly on the attack size (the number of fake profiles inserted).
- User-based CF recommenders:
  - Bandwagon/Average Attack: a small number of well-designed item ratings suffices to
    affect the system.
    - A prediction shift of 1.5 points on a 5-point scale requires a 3% attack size
      (i.e., 3% of all profiles are fake).
  - Problems for attackers:
    - A 3% attack size means inserting, e.g., 30,000 fake profiles into a
      one-million-profile rating database.
    - Determining the averages of 3% of the items in a several-million-item database
      is also problematic.
- Item-based CF recommenders:
  - Far more stable; only a 0.15-point prediction shift is achieved even with a 15%
    attack size.
  - Exception: the segment attack is successful (it was designed for item-based
    methods).
  - Hybrid recommenders and other model-based algorithms cannot easily be biased with
    the described/known attack models.

Slide 20: Countermeasures

How can we protect our systems against the different attack models?

- Use model-based techniques with additional information (hybrid algorithms):
  - The RS can exploit information about trust among the participants in the
    community.
  - More robust against profile injection attacks, and less vulnerable overall.
- Increase the cost of profile injection. Standard mechanisms to prevent the automatic
  creation of accounts are:
  1. Captchas: challenge-response tests designed to find out whether the user of a
     system is a computer or a human.
     - However, captchas can be solved manually by low-cost outsourced labor.
  2. Limiting the number of allowed profile creations for a single IP address within a
     certain time frame (a minimal sketch follows).
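A minimal sketch of the second mechanism: capping account creations per IP address within a sliding time window. The thresholds and the in-memory storage are assumptions; a real deployment would persist and share this state.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
MAX_ACCOUNTS_PER_WINDOW = 3
_recent = defaultdict(deque)  # ip -> timestamps of recent registrations

def may_register(ip, now=None):
    now = time.time() if now is None else now
    q = _recent[ip]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()                       # drop events outside the window
    if len(q) >= MAX_ACCOUNTS_PER_WINDOW:
        return False                      # too many accounts from this IP
    q.append(now)
    return True
```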

Slide 21: Countermeasures II

Use statistical attack-detection methods:

- 1st approach: automatically detect suspicious ratings in the rating database.
  - Suspicious ratings are unusual when compared with other ratings, or were entered
    into the system within a short time.
- 2nd approach: detect groups of users who collaborate to push/nuke items.
  - Detect clusters of users who co-rated many items and also gave them similar
    ratings.
- 3rd approach: monitor the development of the ratings for an item within a short time
  window (sketched below):
  - changes in the average rating;
  - changes in the rating entropy, which reveal shifts in the distribution of the
    ratings.
- 4th approach: use machine-learning methods to discriminate real from fake profiles
  (anomaly detection).
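A sketch of the entropy part of the third approach, assuming a 1-5 scale: a burst of identical ratings arriving for one item makes the entropy of its rating distribution drop, which a monitor could flag.

```python
import math
from collections import Counter

def rating_entropy(ratings):
    """Shannon entropy (bits) of an item's rating distribution."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

before = [1, 2, 3, 4, 5, 3, 4, 2]   # mixed opinions -> high entropy
after = before + [5] * 20           # burst of identical 5s -> entropy drops
print(round(rating_entropy(before), 2), round(rating_entropy(after), 2))
```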

Slide 22: Privacy Aspects in (CF) Recommender Systems

- The rating databases of collaborative filtering recommender systems contain detailed
  information about the individual tastes and preferences of their users.
- Problem: the system stores and manages sensitive customer information.
  - Personal information is particularly valuable in monetary terms: detailed customer
    profiles are the basis for market intelligence, such as the segmentation of
    consumers.
- Ensuring customer privacy is important for the success of an RS: users will refrain
  from using the application if privacy leaks become publicly known.

Slide 23: Privacy Aspects

- The main architectural assumptions of a CF recommender system are:
  - one central server holding the database, and
  - plain (non-encrypted) ratings stored in this database.
- Once an attacker gains access to that central system, all information can be used
  directly.
- Such privacy breaches can be prevented by:
  - distributing the information, or
  - avoiding the exchange, transfer, or central storage of the raw user ratings.

Slide 24: First Approach to Preserving User Privacy: Data Perturbation

- Main idea: instead of sending the raw ratings to the central server, a user first
  obfuscates (disguises) his or her ratings by applying random data perturbation.
  - The server then does not know the exact values of the customer ratings.
  - Accurate recommendations can still be made, because:
    - the range of the data and their distribution are known, and
    - the server computes recommendations based on the aggregation of a large number
      of obfuscated data sets.
- There is a tradeoff between the degree of obfuscation and the accuracy of the
  recommendations. The more "noise" in the data:
  - the better the users' privacy is preserved,
  - the harder it is for the server to approximate the real data, and
  - the lower the accuracy of the recommendations.

Slide 25: Data Perturbation (Example)

- Consider a case in which the server must perform a computation based on the sum of a
  vector of numbers A:
  - the vector of numbers (ratings) is provided by the client;
  - the client disguises it by adding a noise vector drawn from a uniform
    distribution;
  - the perturbed vector is sent to the server.
- The server does not know the original ratings, but a good estimate of the sum of the
  vectors can be made:
  - if the range of the distribution is known, and
  - if enough data are available (see the sketch below).
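A sketch of this example: clients add zero-mean uniform noise from a known range, and the server's estimate of the sum stays close as the amount of data grows. The sizes and the noise range are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
true_ratings = rng.integers(1, 6, size=10_000).astype(float)   # 1..5 scale
noise = rng.uniform(-2.0, 2.0, size=true_ratings.size)         # known range, mean 0
perturbed = true_ratings + noise                               # what the server sees

# The noise cancels out in aggregate, so the sums agree closely for large n.
print(true_ratings.sum(), round(perturbed.sum(), 1))
```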

Slide 26: Second Approach to Preserving User Privacy: Distributed Collaborative Filtering

- Distribute the knowledge and avoid storing the information in one central place.
  - Agents participating in such a recommendation community may then decide:
    - with whom they share their information, and
    - whether they provide the raw data or some randomized or obfuscated version.
- Peer-to-peer (P2P) CF:
  - Exchange rating information in a scalable P2P network.
  - The active user broadcasts a query (a vector of the user's item ratings plus a
    request for a rating of the target item).
  - Peers calculate the similarity between the received vector and the other vectors
    they know:
    - if the similarity exceeds a threshold, the target item rating is returned to the
      requester;
    - if not, the query is forwarded to the neighboring peers (sketch below).
  - The active user calculates a prediction from the received ratings.
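One peer's side of this protocol, as a sketch: answer when a sufficiently similar local profile has rated the target, otherwise forward. The threshold value, the similarity helper, and the forward callback are all assumptions for illustration.

```python
import numpy as np

SIM_THRESHOLD = 0.5

def sim(u, v):
    """Pearson-style similarity over co-rated items (NaN = not rated)."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    u, v = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    d = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / d) if d else 0.0

def handle_query(query_vec, target_idx, local_profiles, forward):
    for profile in local_profiles:
        if not np.isnan(profile[target_idx]) and \
                sim(query_vec, profile) > SIM_THRESHOLD:
            return profile[target_idx]   # similar enough: answer with the rating
    forward(query_vec, target_idx)       # otherwise pass the query to neighbors
    return None
```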

Slide 27: Distributed Collaborative Filtering with Obfuscation

- Combines P2P data exchange with data obfuscation.
- Instead of broadcasting the "raw" profile, only an obfuscated version is published.
- Peers that receive this broadcast compute the similarity of the obfuscated profile
  with their own and return:
  - a prediction for the target item, and
  - the degree of profile similarity.
- The active user:
  - collects these answers, and
  - calculates a prediction using the standard nearest-neighbor method.
- Obfuscation helps to preserve the privacy of the participants; however:
  - obfuscating the requester's profile deteriorates recommendation accuracy: the more
    the profiles are obfuscated, the more imprecise the results.
- The responding agent's privacy is also endangered when he or she returns a rating
  for the target item:
  - although this is merely a single rating, an attacker could use a series of
    requests (a probe attack) to incrementally reconstruct the profile of an
    individual respondent;
  - it is therefore advisable to perturb the profiles of the responding agents as
    well.

Slide 28: Distributed CF with Estimated Concordance Measures

- Picks up the tradeoff "privacy vs. accuracy" in distributed CF.
- Main idea: do not use a standard similarity measure (like Pearson correlation).
- Instead, use a concordance measure with accuracy comparable to Pearson's without
  breaching user privacy.
  - Given the sets of items rated by user A and user B, determine:
    - the number of concordant items: items on which both users have the same opinion;
    - the number of discordant items: items on which they disagree;
    - the number of items for which their ratings are tied: the ratings show no clear
      tendency, or one of the users has not rated the item.
  - The association between A and B is then computed with Somers' d measure (one
    possible formulation is sketched below).
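A heavily hedged sketch of one way to compute this: "same opinion" is read as both ratings deviating in the same direction from their owner's mean, and the Somers'-d-style normalization divides the concordant/discordant difference by the total count. The exact definitions and normalization in Lathia et al. (2007) may differ; this only illustrates the idea.

```python
import numpy as np

def concordance_counts(a, b):
    """Concordant / discordant / tied items for two rating vectors (NaN = unrated).
    Assumption: 'same opinion' = same-direction deviation from each user's mean."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    both = ~np.isnan(a) & ~np.isnan(b)
    da = a[both] - np.nanmean(a)
    db = b[both] - np.nanmean(b)
    concordant = int(np.sum(da * db > 0))
    discordant = int(np.sum(da * db < 0))
    tied = len(a) - concordant - discordant  # no clear tendency, or not co-rated
    return concordant, discordant, tied

def somers_d(a, b):
    c, d, t = concordance_counts(a, b)
    total = c + d + t
    return (c - d) / total if total else 0.0

print(somers_d([5, 3, 4, 1], [4, 3, 3, 3]))  # 0.5: mostly concordant users
```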

Slide 29: Distributed CF with Estimated Concordance Measures

- The privacy problem is, of course, not completely solved by using this new measure,
  because its calculation requires knowledge about the user ratings.
  - Lathia et al. (2007) therefore propose to exploit the transitivity of concordance
    and determine the similarity between two users by comparing their ratings against
    a third set of ratings.
- The idea: when user A agrees with a third user C on item i, and user B also agrees
  with user C on item i, it can be concluded that users A and B are concordant on this
  item. Discordant and tied opinions between A and B can be determined in a similar
  way.

Slide 30: Community-Building and Aggregates

- Canny (2002a) proposed that the participants in the network form knowledge
  communities and share their information:
  - inside the community, or
  - with outsiders.
- The active user can derive predictions from the shared information.
- The information is aggregated with different methods, e.g., SVD (Canny 2002a).
- Individual user ratings are not visible to users outside the community.
- Cryptographic schemes are used for secure communication between the participants in
  the network.
- The question of how to organize the required community-formation process, in
  particular in the context of the dynamic web environment, is not yet fully answered
  and can be a severe limitation in practical settings.

Slide 31: Discussion & Summary

- Research on attacks:
  - The vulnerability of several existing methods has been shown.
  - Incorporating more knowledge sources or hybridization may help.
  - To detect attacks: monitor the evolution of the rating database and detect
    atypical rating patterns within a certain time window.
- Practical aspects:
  - No public information on large-scale real-world attacks is available.
    - Companies are not interested in circulating information about attacks or privacy
      problems, because there is money involved.
  - The required attack sizes are still relatively high.
  - More research and industry collaboration are required.

