Privacy-Preserving Data Sharing Michael Siegenthaler Ken Birman Cornell University
Introduction Today, personal data is typically stored electronically. But systems at distinct organizations have no way to communicate with each other.
System Model Legacy databases (each stored at a data owner): General Hospital; Acme Food and Drug; Special Treatment Clinic, Inc.
Example tables shown on the slide:
  SSN | Name | …        rows: Alice; Bob
  PatientID | Name | …  rows: X1234, John; X7890, Bob
  SSN | Name | …        rows: Cathy; Robert
Example Query Drug interaction check at pharmacy –A pharmacist is dispensing a drug, doesn’t know what else the patient may be taking –Patient’s medical record is stored at primary care provider and various specialists Is it safe for the patient to take this drug?
Guarantees Data privacy –E.g. pharmacist receives yes/no answer, not the underlying data Query privacy –E.g. hospital does not learn which drug is currently being dispensed Anonymous communication –E.g. hospital and pharmacy do not learn each other’s identities
Anonymous Communication Onion skin routing –Providers P_i –Encryption function E –Public keys K_{P_i} Example: –Reference to patient 34 at Provider 2, routed through Provider 1
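The layering idea behind an onion skin route can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function `E` below is a toy XOR keystream cipher standing in for real public-key encryption under K_{P_i}, and the names `build_handle`, `peel`, and the JSON layer format are hypothetical. It mirrors the slide's example of a reference to patient 34 at Provider 2, routed through Provider 1.

```python
import hashlib
import json


def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction, not secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def E(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for encryption under a provider's key.

    ASSUMPTION: a symmetric XOR cipher replaces real public-key encryption,
    so applying E twice with the same key decrypts.
    """
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_handle(route, keys, record_ref):
    """Wrap record_ref in one layer per provider; route[-1] is the data owner."""
    payload = E(keys[route[-1]], json.dumps({"deliver": record_ref}).encode())
    # Add the outer layers from the inside out.
    for i in range(len(route) - 2, -1, -1):
        layer = {"next": route[i + 1], "inner": payload.hex()}
        payload = E(keys[route[i]], json.dumps(layer).encode())
    return route[0], payload


def peel(provider, keys, blob):
    """Remove one layer; return (next_hop, blob) or (None, record_ref)."""
    layer = json.loads(E(keys[provider], blob))
    if "deliver" in layer:
        return None, layer["deliver"]
    return layer["next"], bytes.fromhex(layer["inner"])


# Patient 34 at Provider 2, routed through Provider 1.
keys = {"P1": b"key-of-provider-1", "P2": b"key-of-provider-2"}
entry, blob = build_handle(["P1", "P2"], keys, 34)
hop, blob = peel(entry, keys, blob)      # Provider 1 learns only the next hop
final, record = peel(hop, keys, blob)    # Provider 2 recovers the record reference
```

Each provider can read only its own layer: Provider 1 sees where to forward the message but not the patient reference, and Provider 2 sees the reference but not who created the handle.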
Requirements “Locate” remote records –Translate a real-world identifier (name, SSN, DOB...) into a data handle, an onion skin route that can be used to communicate with the providers that own the data Execute the desired query –Use data handles to perform a privacy-preserving query
Global Search Mechanism Search for user with SSN Hierarchy of provider groups –Each group has a designated contact who tracks its membership
Adjust Bloom filter parameters for desired trade-off between privacy and performance
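The trade-off mentioned above can be made concrete with a small Bloom filter sketch. The slides do not say what exactly is stored in the filters, so this assumes (hypothetically) that a group contact summarizes the identifiers known within its group. The class and function names are illustrative. The standard estimate of the false-positive rate for m bits, k hash functions, and n inserted items is (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation for n items in an m-bit, k-hash filter."""
    return (1 - math.exp(-k * n / m)) ** k


# Hypothetical group summary: a small filter gives more false positives.
small = BloomFilter(m=64, k=3)
small.add("123-45-6789")
hit = small.might_contain("123-45-6789")
```

Under this assumption, a smaller filter (or more items per filter) raises the false-positive rate, so a hit reveals less about whether a record really exists, at the cost of more wasted follow-up queries. A larger filter makes the search more precise but also more revealing.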
Query Execution Example: A pharmacy checking for drug interactions
Parties: General Hospital; Acme Food and Drug; Random Intermediary
Messages: –Record access request –Prescription record with name/address stripped –Drug interaction query –Yes/no answer
All messages are sent anonymously using a MIX. The hospital does not learn the nature of the query. The pharmacy does not learn which other drugs the patient is taking. The random intermediary cannot do anything nefarious with the data it has received, since that data is out of context.
Query to find drug interactions
Query formulated at the pharmacy:
  SELECT EXISTS (
    SELECT *
    FROM conflicts CROSS JOIN nonces
    INNER JOIN remote(drug_history)
      ON nonces.nonce = drug_history.nonce
    WHERE conflicts.drug = drug_history.drug
  );
conflicts
  drug
  A____
  B____
nonces
  nonce
  Ω(34)
  Ω(56)
query_table
  drug    nonce
  A____   Ω(34)
  A____   Ω(56)
  B____   Ω(34)
  B____   Ω(56)
Split query: data gathering
Query sent to the data owner(s):
  SEND (
    SELECT nonce, drug
    FROM drug_history
    WHERE drug_history.nonce = Ω(34)
  );
Rows sent to mix_host:
drug_history
  nonce   drug
  34      A____
Split query: joining
Query executed at the third-party MIX host:
  SELECT EXISTS (
    SELECT *
    FROM query_table
    INNER JOIN drug_history
      ON query_table.nonce = drug_history.nonce
    WHERE query_table.drug = drug_history.drug
  );
query_table
  drug    nonce
  A____   Ω(34)
  A____   Ω(56)
  B____   Ω(34)
  B____   Ω(56)
drug_history
  nonce   drug
  34      A____
result
  exists
  1
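The two halves of the split query can be tried out with in-memory SQLite databases. This is a sketch under simplifying assumptions, not the system described in the talk: the Ω(...) wrapping and the anonymous transport are omitted, nonces are plain integers, drugs 'A' and 'B' stand in for the redacted names, and the function names are hypothetical. The join filters on query_table.drug, since the conflicts table is not present at the mix host.

```python
import sqlite3


def gather_at_owner(owner_rows, nonce):
    """Data owner side: return only the rows matching the requested nonce."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE drug_history (nonce INTEGER, drug TEXT)")
    db.executemany("INSERT INTO drug_history VALUES (?, ?)", owner_rows)
    rows = db.execute(
        "SELECT nonce, drug FROM drug_history WHERE nonce = ?", (nonce,)
    ).fetchall()
    db.close()
    return rows


def join_at_mix_host(query_rows, history_rows):
    """Mix host side: join the pharmacy's query_table with the gathered rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE query_table (drug TEXT, nonce INTEGER)")
    db.execute("CREATE TABLE drug_history (nonce INTEGER, drug TEXT)")
    db.executemany("INSERT INTO query_table VALUES (?, ?)", query_rows)
    db.executemany("INSERT INTO drug_history VALUES (?, ?)", history_rows)
    (exists,) = db.execute(
        """
        SELECT EXISTS (
            SELECT *
            FROM query_table
            INNER JOIN drug_history
              ON query_table.nonce = drug_history.nonce
            WHERE query_table.drug = drug_history.drug
        )
        """
    ).fetchone()
    db.close()
    return bool(exists)


# Pharmacy: cross join of conflicting drugs and patient nonces.
query_rows = [(drug, nonce) for drug in ("A", "B") for nonce in (34, 56)]

# Hospital: patient 34 has taken drug A (and an unrelated patient exists).
hospital_rows = [(34, "A"), (99, "B")]

gathered = gather_at_owner(hospital_rows, 34)
conflict = join_at_mix_host(query_rows, gathered)
```

The mix host sees drug names and nonces but no patient identity, and it returns only the single yes/no bit to the pharmacy.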
Answering the query
mix_host_1 (on hospital’s behalf): result exists = 1 (conflict found)
mix_host_2 (on other pharmacy’s behalf): result exists = 0 (no conflict here)
Pharmacy: Is there a conflict? YES
Conclusion and Future Work Selective sharing of personal information across distributed databases –Data privacy –Query privacy –Anonymous communication Working on: how to enforce a policy on which data may be revealed to whom Also: how to prevent data mining attacks?