The End of Anonymity Vitaly Shmatikov

Tastes and Purchases slide 2

Social Networks slide 3

Health Care and Genetics slide 4

Web Tracking slide 5

Solution: Anonymity! “… breakthrough technology that uses social graph data to dramatically improve online marketing … "Social Engagement Data" consists of anonymous information regarding the relationships between people” “The critical distinction … between the use of personal information for advertisements in personally-identifiable form, and the use, dissemination, or sharing of information with advertisers in non-personally-identifiable form.” slide 6

Phew… slide 7

“Privacy-Preserving” Data Release: [figure: records x_1, x_2, x_3, …, x_{n-1}, x_n pass through data “anonymization” / “de-identification” / “sanitization” ⇒ Privacy!] slide 8

Some Privacy Disasters What went wrong? slide 9

The Myth of the PII: Data are “anonymized” by removing personally identifying information (PII) – name, Social Security number, phone number, email, address… what else? Problem: “PII” has no technical meaning. It is defined in disclosure notification laws (if certain information is lost, the consumer must be notified), but in actual privacy breaches, any information can be personally identifying. slide 10

The Curse of Dimensionality: [figure: user–item matrix with rows User 1 … User N and columns Item 1 … Item M] Row = user record, column = dimension. Thousands or millions of dimensions – Netflix movie ratings: 35,000; Amazon purchases: ~10^7. slide 11

Sparsity and “Long Tail”: In the Netflix Prize dataset, considering just movie names, for 90% of records there isn’t a single other record that is more than 30% similar – the average record has no “similar” records. [figure: histogram over the fraction of subscribers] slide 12
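To make the sparsity claim concrete, here is a minimal sketch of how one might check it, not the paper's actual measurement code: represent each record as the set of items it contains, find each record's best similarity to any other record, and count how many records have no neighbor above 30% similar. The `records` layout, the Jaccard metric, and the toy data are illustrative assumptions.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Set similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def fraction_without_similar_neighbor(records: dict, threshold: float = 0.3) -> float:
    """Fraction of records whose *closest* other record is still below `threshold` similar.

    `records` maps user_id -> set of item ids (e.g. movies rated).
    """
    best = {uid: 0.0 for uid in records}
    for (u, a), (v, b) in combinations(records.items(), 2):
        s = jaccard(a, b)
        if s > best[u]:
            best[u] = s
        if s > best[v]:
            best[v] = s
    return sum(1 for s in best.values() if s < threshold) / len(records)

# Toy data: in a sparse, long-tailed dataset most users' nearest neighbor is far away.
toy = {
    "u1": {"m1", "m2", "m3", "m9"},
    "u2": {"m4", "m5", "m6"},
    "u3": {"m7", "m8", "m10", "m11"},
}
print(fraction_without_similar_neighbor(toy))   # 1.0 on this toy example
```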

Privacy Threats: Global surveillance; phishing; employers, insurers, stalkers, nosy friends; spammers; abusive advertisers and marketers. slide 13

It’s All About the Aux: [figure: user–item matrix with no explicit identifiers] What can the adversary learn by combining this with auxiliary information, i.e., information available to the adversary outside of the normal data release process? slide 14

De-anonymizing Sparse Datasets: [figure: auxiliary information matched against the released dataset] slide 15

De-anonymization Objectives: Fix some target record r in the original dataset. Goal: learn as much about r as possible. This is subtler than “identify r in the released dataset” – don’t fall for the k-anonymity fallacy! Silly example: the released dataset contains k copies of each original record – this is k-anonymous! You can’t identify the “right” record, yet the released dataset completely leaks everything about r. slide 16
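A toy illustration of the k-copies fallacy above, using hypothetical records rather than real data: the release is k-anonymous, yet every candidate record carries the same sensitive attribute, so the adversary learns it anyway.

```python
# Duplicate every record k times: each record is indistinguishable from k-1 others.
k = 5
original = [{"zip": "78701", "age": 34, "diagnosis": "diabetes"},
            {"zip": "78702", "age": 51, "diagnosis": "flu"}]
released = [dict(r) for r in original for _ in range(k)]

# Suppose aux info says the target lives in ZIP 78701 and is in their thirties.
candidates = [r for r in released if r["zip"] == "78701" and 30 <= r["age"] < 40]
print(len(candidates))                          # k indistinguishable candidates...
print({r["diagnosis"] for r in candidates})     # ...but all reveal the same diagnosis
```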

Aux as Noisy Projection slide 17

How Much Aux Is Needed? How much does the adversary need to know about a record to find a very similar record in the released dataset? Under a very mild sparsity assumption, O(log N), where N is the number of records. What if not enough aux is available? Identifying a small number of candidate records similar to the target still reveals a lot of information. slide 18
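One way to see where O(log N) comes from is a back-of-the-envelope counting argument; this is only the intuition, not the formal sparsity theorem from the de-anonymization literature.

```latex
% Assume each known attribute value is shared by at most a fraction p < 1 of the
% N records (a sparsity assumption), and that matches on different attributes
% are roughly independent. After the adversary checks m auxiliary attributes,
\mathbb{E}[\#\text{spurious matches}] \;\approx\; N\,p^{m} \;<\; 1
\quad\Longleftrightarrow\quad
m \;>\; \frac{\ln N}{\ln(1/p)} \;=\; O(\log N).
```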

De-Anonymization in Practice: Sweeney (1998): Massachusetts hospital discharge dataset + voter database. Narayanan and Shmatikov (2006): Netflix Prize dataset + IMDb. Narayanan and Shmatikov (2009): social networks. slide 19

De-anonymizing the Netflix Dataset: ~500K users, ~18,000 movies, 213 dated ratings per user on average. Two ratings are enough to reduce to 8 candidate records; four are enough to identify a record uniquely (on average). The attack works even better with relatively rare ratings – “The Astro-Zombies” rather than “Star Wars”. Long Tail effect: most people watch obscure crap. slide 21
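The flavor of the attack can be sketched in a few lines. This is a hedged simplification, not the exact Scoreboard-RH algorithm from Narayanan and Shmatikov's paper (which also matches on rating dates and checks how far the best score stands out before declaring a match); all names, data, and thresholds below are illustrative.

```python
import math

def score(aux: dict, record: dict, popularity: dict, rating_tol: int = 1) -> float:
    """Score how well an anonymized record matches the auxiliary ratings.

    aux, record: movie -> rating. popularity: movie -> number of raters.
    Rarer movies get higher weight, so a match on an obscure title counts
    for much more than a match on a blockbuster.
    """
    s = 0.0
    for movie, r in aux.items():
        if movie in record and abs(record[movie] - r) <= rating_tol:
            s += 1.0 / math.log(1 + popularity.get(movie, 1))
    return s

def top_candidates(aux: dict, dataset: dict, popularity: dict, keep: int = 8):
    """Rank anonymized records (user_id -> ratings dict) against the aux info."""
    ranked = sorted(dataset, key=lambda uid: score(aux, dataset[uid], popularity),
                    reverse=True)
    return ranked[:keep]

# Toy example: two rare, correctly-remembered ratings already single out "u17".
dataset = {"u17": {"Astro-Zombies": 4, "Star Wars": 5, "Memento": 3},
           "u42": {"Star Wars": 5, "Titanic": 4}}
popularity = {"Astro-Zombies": 12, "Star Wars": 200_000,
              "Titanic": 150_000, "Memento": 30_000}
aux = {"Astro-Zombies": 4, "Memento": 3}          # what a coworker might remember
print(top_candidates(aux, dataset, popularity))   # "u17" ranks first
```

Weighting matches by rarity is what makes a couple of obscure ratings so identifying: a match on “The Astro-Zombies” is shared with very few other subscribers, so it contributes far more evidence than a match on “Star Wars”.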

Exploiting Data Structure slide 22

“Jefferson High”: Romantic and Sexual Network Real data! slide 23

Phone Call Graphs: 2 trillion edges; 3,000 companies providing wireless services in the U.S. Examples of outsourced call graphs: Hungary – 2.5M nodes; France – 7M nodes; India – 3M nodes. slide 24

Structural De-anonymization Goal: structural mapping between two graphs For example, Facebook vs. anonymized phone call graph slide 25
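A minimal sketch of the seed-and-propagate idea behind such structural mappings, assuming adjacency dicts and a handful of already-known seed pairs; the published algorithms (e.g. Narayanan–Shmatikov 2009) add degree-based scoring, eccentricity checks, reverse matching, and revisiting, all omitted here.

```python
def propagate(g1: dict, g2: dict, seeds: dict, rounds: int = 10) -> dict:
    """Seed-and-propagate structural matching (much simplified).

    g1, g2: adjacency dicts {node: set(neighbors)} for the auxiliary graph
    (e.g. Facebook) and the anonymized graph (e.g. a call graph).
    seeds: initial mapping {g1_node: g2_node} known from outside information.
    Each round, an unmapped g1 node is mapped to the g2 node sharing the most
    already-mapped neighbors (ties and consistency checks omitted for brevity).
    """
    mapping = dict(seeds)
    for _ in range(rounds):
        for u in g1:
            if u in mapping:
                continue
            mapped_neighbors = {mapping[n] for n in g1[u] if n in mapping}
            if not mapped_neighbors:
                continue
            best = max(g2, key=lambda v: len(g2[v] & mapped_neighbors))
            if len(g2[best] & mapped_neighbors) >= 2:   # crude evidence threshold
                mapping[u] = best
    return mapping
```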

Winning the IJCNN/Kaggle Social Network Challenge: an “anonymized” graph of Flickr was used as the challenge for a link prediction contest. De-anonymization served as an “oracle” for the true answers – 57% coverage, 98% accuracy. [Narayanan, Shi, Rubinstein] slide 26

More De-Anonymization: Social networks – again and again. Stylometry (writing style). Location data – de Montjoye et al. (2013): mobility traces from a cell phone carrier – 4 points are enough. Credit card transaction metadata – de Montjoye et al. (2015) – 4 purchases are enough. slide 27

Lesson #1: De-anonymization Is Robust. 33 bits of entropy – 6-8 movies, 4-7 friends, etc. Perturbing data to foil de-anonymization often destroys utility. We can estimate confidence even without ground truth. Accretive and iterative: more de-anonymization → better de-anonymization. slide 28
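The "33 bits" figure is just arithmetic on the world's population; the per-item bit counts below are rough illustrative assumptions, not measured values.

```latex
% Indexing one individual among roughly 7 billion people requires about
\log_2\!\left(7\times 10^{9}\right) \approx 32.7 \;\text{bits} \approx 33 \;\text{bits}.
% If each moderately obscure movie rating or friendship edge reveals on the
% order of 4--6 bits (rarer items reveal more), then roughly 6--8 movies or a
% handful of friends supply the whole budget, e.g. $6 \times 5.5 \approx 33$ bits.
```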

Lesson #2: “PII” Is Technically Meaningless. PII is info “with respect to which there is a reasonable basis to believe the information can be used to identify the individual” – yet any piece of data can be used for re-identification! There is a “blurring of the distinction between personally identifiable information and supposedly anonymous or de-identified information.” [Narayanan, Shmatikov, CACM column, 2010] slide 29