Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen, John Riedl


Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems
Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen, John Riedl
University of Minnesota

CDT Spring Research Forum
Story: Finding "Subversives"
"…few things tell you as much about a person as the books he chooses to read." – Tom Owad, applefritter.com

Session Outline
 Exposure: undesired access to a person's information
 Privacy risks
 Preserving privacy
 Bias and Sabotage: manipulating a trusted system to manipulate users of that system

Why Do I Care?
 As a businessperson
 The nearest competitor is one click away
 Lose your customers' trust and they will leave
 Lose your credibility and they will ignore you
 As a person
 Let's not build Big Brother

Risk of Exposure in One Slide
Private Dataset (YOU) + Public Dataset (YOU) + algorithms = Your private data linked!
Seems bad. How can privacy be preserved?

movielens.org
 - Started ~1995
 - Users rate movies ½ to 5 stars
 - Users get recommendations
 - Private: no one outside GroupLens can see a user's ratings

Anonymized Dataset
 - Released: ratings, some demographic data, but no identifiers
 - Intended for research
 - Public: anyone can download

movielens.org Forums
 - Started June
 - Users talk about movies
 - Public: on the web, no login needed to read
 - Can forum users be identified in our anonymized dataset?

Research Questions
 RQ1: RISKS OF DATASET RELEASE: What are the risks to user privacy when releasing a dataset?
 RQ2: ALTERING THE DATASET: How can dataset owners alter the dataset they release to preserve user privacy?
 RQ3: SELF DEFENSE: How can users protect their own privacy?

Motivation: Privacy Loss
 MovieLens forum users did not agree to reveal their ratings
 Anonymized ratings + public forum data = privacy violation?
 More generally: dataset 1 + dataset 2 = privacy risk?
 What kinds of datasets?
 What kinds of risks?

Vulnerable Datasets
 We talk about datasets from a sparse relation space, which:
 Relates people to items
 Is sparse (each person has few relations out of the many possible)
 Has a large space of items
(illustrated by a people × items matrix with an X wherever a person relates to an item)
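A sparse relation space can be sketched in a few lines, assuming an illustrative mapping from each person to the small set of items they relate to (all names here are made up):

```python
# A sparse relation space: each person relates to only a few of many items.
relations = {
    "p1": {"i2"},
    "p2": {"i1"},
    "p3": {"i3"},
}
all_items = {"i1", "i2", "i3"}

# Sparsity: fraction of possible (person, item) pairs actually present.
num_relations = sum(len(items) for items in relations.values())
sparsity = num_relations / (len(relations) * len(all_items))
print(sparsity)  # 3 relations out of 9 possible pairs -> 0.333...
```

Real spaces are far sparser: millions of items, a handful of relations per person.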

Example Sparse Relation Spaces
 Customer purchase data from Target
 Songs played on iTunes
 Articles edited in Wikipedia
 Books/albums/beers… mentioned by bloggers or on forums
 Research papers cited in a paper (or review)
 Groceries bought at Safeway
 …
 We look at movie ratings and forum mentions, but there are many sparse relation spaces

Risks of Re-identification
 Re-identification is matching a user across two datasets using some linking information (e.g., name and address, or movie mentions)
 Re-identifying a user to an identified dataset (e.g., one with names and addresses, or social security numbers) can result in severe privacy loss

Story: Finding Medical Records (Sweeney 2002)
 Linking "anonymized" medical data to public records via (ZIP code + birth date + gender) re-identified the former Governor of Massachusetts
 87% of people in the 1990 U.S. census were identifiable by these three attributes!
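The quasi-identifier danger is easy to demonstrate: count how many records share each (ZIP, birth date, gender) tuple, and any tuple that occurs once identifies its person. The records below are fabricated for illustration:

```python
from collections import Counter

# Toy census-style records: (ZIP, birth date, gender). Sweeney's point is
# that this tuple alone is often unique in the real population.
records = [
    ("55455", "1945-07-31", "M"),
    ("55455", "1945-07-31", "M"),
    ("02138", "1950-01-01", "F"),
]

counts = Counter(records)
unique = [r for r in records if counts[r] == 1]
print(len(unique) / len(records))  # 1 of 3 records is uniquely identifiable
```

k-anonymity (Sweeney 2002) generalizes exactly this check: require every quasi-identifier tuple to occur at least k times.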

The Rebus Form
"Anonymized" medical data + public records = Governor's medical records!

Related Work
 Anonymizing datasets: k-anonymity (Sweeney 2002)
 Privacy-preserving data mining (Verykios et al. 2004, Agrawal et al. 2000, …)
 Privacy-preserving recommender systems (Polat et al. 2003, Berkovsky et al. 2005, Ramakrishnan et al. 2001)
 Text mining of user comments and opinions (Drenner et al. 2006, Dave et al. 2003, Pang et al. 2002)

RQ1: Risks of Dataset Release
 RQ1: What are the risks to user privacy when releasing a dataset?
 RESULT: 1-identification rate of 31%
 Ignores rating values entirely!
 Can do even better if text analysis produces rating values
 Rarely-rated items were more identifying

Glorious Linking Assumption
 People mostly talk about things they know => people tend to have rated what they mentioned
 Measured P(u rated m | u mentioned m), averaged over all forum users: 0.82
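The measured quantity can be sketched directly: for each user, the fraction of their mentions that also appear in their ratings, then averaged over users. The data below is illustrative (the talk reports 0.82 on the real MovieLens forum users):

```python
# Per-user ratings and forum mentions (illustrative data).
ratings = {"alice": {"Fargo", "Alien"}, "bob": {"Heat"}}
mentions = {"alice": {"Fargo", "Alien", "Brazil"}, "bob": {"Heat"}}

# P(u rated m | u mentioned m), averaged over users with any mentions.
per_user = [
    len(mentions[u] & ratings.get(u, set())) / len(mentions[u])
    for u in mentions
    if mentions[u]
]
linking_strength = sum(per_user) / len(per_user)
print(linking_strength)  # alice: 2/3, bob: 1/1 -> 0.833...
```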

Algorithm Idea
(nested-sets figure: within All Users, each mentioned item narrows the candidates to the users who rated it; a popular item narrows a little, a rarely rated item a lot; the users who rated both form a small intersection)
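A deliberately simplified sketch of the narrowing idea, using strict set intersection (a real attack must be more forgiving, since the linking assumption holds only ~82% of the time; all names and data here are illustrative):

```python
# item -> set of users who rated it in the released dataset (illustrative).
raters_of = {
    "popular": {"u1", "u2", "u3", "u4"},
    "rare":    {"u2", "u5"},
}

def candidates(mentioned_items, raters_of, all_users):
    """Narrow the candidate set: each mentioned item keeps only its raters.
    A single survivor means the forum user is 1-identified."""
    cands = set(all_users)
    for item in mentioned_items:
        cands &= raters_of.get(item, set())
    return cands

all_users = {"u1", "u2", "u3", "u4", "u5"}
print(candidates(["popular", "rare"], raters_of, all_users))  # {'u2'}
```

Note the asymmetry the slide describes: "popular" alone leaves four candidates, while adding the rarely rated item collapses the set to one.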

More mentions => better re-identification; with >=16 mentions we often 1-identify a user.

RQ2: Altering the Dataset
 How can dataset owners alter the dataset they release to preserve user privacy?
 Perturbation: change rating values
 Oops: the Scoring algorithm doesn't need rating values
 Generalization: group items (e.g., by genre)
 Dataset becomes less useful
 Suppression: hide data
 IDEA: release a ratings dataset suppressing all "rarely-rated" items

Drop 88% of items to protect current users against 1-identification; 88% of items => 28% of ratings.

RQ3: Self Defense
 RQ3: How can users protect their own privacy?
 Similar to RQ2, but now per-user
 Users can change ratings or mentions; we focus on mentions
 Users can perturb, generalize, or suppress; as before, we study suppression
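One per-user suppression strategy, sketched under an assumption the slides motivate (rarely-rated items are the most identifying, so suppress those mentions first; all names and data are illustrative):

```python
from collections import Counter

# How many users rated each item (illustrative popularity data).
item_popularity = Counter({"popular": 500, "rare": 3, "obscure": 1})
my_mentions = ["popular", "obscure", "rare", "popular"]

def suppress_mentions(mentions, popularity, fraction):
    """Suppress a fraction of mentions, dropping the most rarely rated
    (most identifying) ones first."""
    keep = len(mentions) - int(len(mentions) * fraction)
    return sorted(mentions, key=lambda m: popularity[m], reverse=True)[:keep]

print(suppress_mentions(my_mentions, item_popularity, 0.5))
```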

Suppressing 20% of mentions reduced 1-identification somewhat, but not to zero. Suppressing more than 20% is not reasonable for a user.

Another Strategy: Misdirection
 What if users mention items they did NOT rate? This might misdirect a re-identification algorithm
 Create a misdirection list of items; each user takes an unrated item from the list and mentions it, repeating until not identified
 What makes a good misdirection list?
 Remember: rarely-rated items are identifying
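The misdirection loop can be sketched as follows. The `is_identified` callback is a hypothetical stand-in for running the re-identification attack against the user's current mentions; the loop and data are illustrative:

```python
def misdirect(my_mentions, my_ratings, misdirection_list, is_identified):
    """Add unrated items from the misdirection list to the user's mentions
    until the re-identification check no longer singles the user out."""
    mentions = set(my_mentions)
    for item in misdirection_list:
        if not is_identified(mentions):
            break
        if item not in my_ratings:  # only mention items the user did NOT rate
            mentions.add(item)
    return mentions

# Toy check: "identified" while the user has fewer than 3 mentions.
identified = lambda mentions: len(mentions) < 3
result = misdirect({"rare"}, {"rare"}, ["hit1", "hit2", "hit3"], identified)
print(sorted(result))  # ['hit1', 'hit2', 'rare']
```

Per the next slide's finding, the misdirection list should be ordered most-popular-first: popular items hide the user in a large crowd, while rarely-rated items would only make the user more identifiable.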

Rarely-rated items don't misdirect! Popular items do better, though 1-identification isn't zero. Better to misdirect into a large crowd: rarely-rated items are identifying, popular items are misdirecting.

Exposure: What Have We Learned?
 REAL RISK
 Re-identification can lead to loss of privacy
 We found substantial risk of re-identification in our sparse relation space
 There are a lot of sparse relation spaces, and we're probably in more and more of them available electronically
 HARD TO PRESERVE PRIVACY
 The dataset owner had to suppress a lot of the dataset to protect privacy
 Users had to suppress a lot to protect privacy
 Users could misdirect somewhat with popular items

Advice: Keep Your Customers' Trust
 Share data rarely
 Remember the governor: (zip + birthdate + gender) is not anonymous
 Reduce exposure
 Example: Google will anonymize search data older than 24 months

AOL: 650K users, 20M queries
 Data wants to be free: government subpoenas, research, commerce
 People do not know the risks
 AOL's data was text; ours is items
 NY Times: re-identified a user from searches like "dog that urinates on everything."

Discussion #1: Exposure
 Examples of sparse relation spaces?
 Examples of re-identification risks?
 How to preserve privacy?