Anonymizing Data with Quasi-Sensitive Attribute Values
Pu Shi¹, Li Xiong¹, Benjamin C. M. Fung²
¹ Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
² CIISE, Concordia University, Montreal, QC, Canada

Problem Statement
We study the problem of anonymizing microdata with quasi-sensitive (QS) attributes, which are not sensitive by themselves but can be linked to external knowledge to reveal indirect sensitive information about an individual.

Figure 1. Anonymizing data with QS attributes: (a) original microdata with the quasi-sensitive attribute symptoms; (b) external knowledge that maps symptoms to diseases; (c) a generalized table that cannot prevent indirect disclosure of disease through symptoms.

Figure 2. Disclosure risks with QS attributes.

Contributions
Formal notions of QS l-diversity and QS t-closeness that extend l-diversity and t-closeness to prevent indirect attribute disclosure through QS attribute values.

A two-phase algorithm that combines generalization and value suppression to achieve QS l-diversity and QS t-closeness.

Definitions
The external knowledge table $E$ has one row per pair $(L_i, S_i)$, $i = 1, 2, \ldots, |E|$, where $L_i$ is a sensitive label and $S_i$ is the corresponding set of QS values. Let $K(tp)$ be the set of sensitive labels linked to a tuple $tp$. For a QI group $G$ of $d$ tuples with quasi-identifying (QI) vector $q$, the sensitive label set of $G$ is the set of all labels that can be linked to its tuples, $\bigcup_{i=1}^{d} K(tp_i)$.

The attacker's prior belief $\alpha(q, L)$ and posterior belief $\beta(q, L)$ are the probabilities that a target $tp$ with QI vector $q$ is linked to a label $L$ before and after the data release, respectively.

Definition (QS (c,l)-diversity). A group $G$ satisfies QS (c,l)-diversity if and only if $p_1 \le c\,(p_l + p_{l+1} + \cdots + p_m)$, where $m = \left|\bigcup_{i=1}^{d} K(tp_i)\right|$ and $p_1, p_2, \ldots, p_m$ are the values of $\beta(q, L_i)$ sorted in decreasing order. A table $D^*$ satisfies QS (c,l)-diversity if every group satisfies QS (c,l)-diversity.

Definition (QS t-closeness). A group $G$ satisfies QS t-closeness if and only if the distance between $\alpha(q, L)$ and $\beta(q, L)$ is no more than a threshold $t$. A table $D^*$ satisfies QS t-closeness if every group satisfies QS t-closeness.

Algorithm
Phase 1 (QI generalization). Given $D$, obtain an intermediate dataset $D_g$ that satisfies k-anonymity.

Phase 2 (QS suppression). Given $D_g$, apply a suppression algorithm that removes appropriate QS values (items) until every QI group satisfies QS (c,l)-diversity or QS t-closeness. The algorithm uses greedy search heuristics with dynamic reordering of tailsets, which hold the candidate values to be removed in the next step, so that a result is returned quickly. It also applies dynamic updates whenever a solution with a lower cost is found, so that the result keeps improving within a bounded time period.

Figure 3. QS suppression search tree and algorithm features.

Preliminary Results
We implemented the Mondrian generalization and our suppression algorithm in C++ and conducted experiments with (1) a dataset of 3000 tuples augmented from the Adult dataset, with 8 QI attributes and 9 synthesized QS terms per tuple, and (2) an external knowledge table with 3000 knowledge labels linked to random QS terms following a Poisson distribution.

Figure 4. QS suppression for QS (c,l)-diversity: adaptive QS suppression significantly outperforms the baseline DFS search.

Figure 5. Two-phase algorithm for QS t-closeness: the trade-off between better privacy and smaller removal cost, and the benefit of the two-phase algorithm compared with a generalization-only approach.
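The QS (c,l)-diversity and QS t-closeness definitions can be made concrete with a small checker. This is a sketch under stated assumptions, not the authors' implementation: the poster does not say when a tuple is linked to a label or how the beliefs are computed, so here a tuple is assumed to link to $L_i$ when every value in $S_i$ appears among its QS values, $\beta(q, L)$ is taken as the fraction of the group's $d$ tuples linked to $L$, $\alpha(q, L)$ as the same fraction over the whole table, and the t-closeness distance as the largest per-label difference $|\beta - \alpha|$. The function names and the symptom/disease data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: E as (label L_i, QS value set S_i) pairs; a tuple as its set of QS values.
Knowledge = List[Tuple[str, FrozenSet[str]]]


def linked_labels(qs_values: Set[str], knowledge: Knowledge) -> Set[str]:
    """K(tp): labels L_i whose QS set S_i is fully contained in the tuple's QS values (assumed rule)."""
    return {label for label, s in knowledge if s <= qs_values}


def label_beliefs(tuples: List[Set[str]], knowledge: Knowledge) -> Dict[str, float]:
    """Fraction of the given tuples linked to each label; the keys form the union of K(tp_i).

    Used on a QI group as a stand-in for beta(q, L) and on the whole table as a
    stand-in for alpha(q, L); both formulas are assumptions, not from the poster."""
    counts: Dict[str, int] = {}
    for qs_values in tuples:
        for label in linked_labels(qs_values, knowledge):
            counts[label] = counts.get(label, 0) + 1
    return {label: n / len(tuples) for label, n in counts.items()}


def satisfies_qs_cl_diversity(group: List[Set[str]], knowledge: Knowledge, c: float, l: int) -> bool:
    """Check p_1 <= c * (p_l + ... + p_m) with the beliefs sorted in decreasing order."""
    p = sorted(label_beliefs(group, knowledge).values(), reverse=True)
    if not p:
        return True  # no sensitive label can be linked to this group
    return p[0] <= c * sum(p[l - 1:])  # fewer than l labels gives an empty tail, i.e. 0


def satisfies_qs_t_closeness(alpha: Dict[str, float], beta: Dict[str, float], t: float) -> bool:
    """Distance taken as the largest per-label |beta - alpha| (an assumed choice of metric)."""
    labels = set(alpha) | set(beta)
    distance = max((abs(beta.get(x, 0.0) - alpha.get(x, 0.0)) for x in labels), default=0.0)
    return distance <= t


E = [("flu", frozenset({"cough", "fever"})),
     ("cold", frozenset({"cough"})),
     ("migraine", frozenset({"headache"}))]
G = [{"cough", "fever"}, {"cough"}, {"headache"}, {"cough", "headache"}]
table = G + [{"fever"}, {"rash"}, {"fever"}, {"rash"}]  # G plus tuples linked to no label

beta = label_beliefs(G, E)
alpha = label_beliefs(table, E)
print(sorted(beta.items()))  # [('cold', 0.75), ('flu', 0.25), ('migraine', 0.5)]
print(satisfies_qs_cl_diversity(G, E, c=1, l=2), satisfies_qs_t_closeness(alpha, beta, t=0.3))  # True False
```

With c = 1 and l = 2 the example group passes, because $p_1 = 0.75$ equals the tail $0.5 + 0.25$; with c = 2 and l = 3 it fails, since $0.75 > 2 \times 0.25$. Its t-closeness distance is 0.375 (for the label cold), so it fails for t = 0.3.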
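Phase 1 relies on Mondrian generalization, which the poster names but does not detail. The following is a simplified sketch of median-split Mondrian partitioning for numeric QI attributes only: it tries attributes from the widest raw range down (the standard algorithm normalizes ranges first), splits at the median, and stops when a split would leave fewer than k records on either side, so every resulting QI group is k-anonymous. The functions `mondrian` and `generalize` and the sample records are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, ...]  # numeric QI values only (simplifying assumption)


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Recursively split records at the median of one QI attribute while both halves keep >= k records."""
    if len(records) < 2 * k:
        return [records]  # too small to split into two k-anonymous halves
    n_dims = len(records[0])

    def spread(dim: int) -> int:
        return max(r[dim] for r in records) - min(r[dim] for r in records)

    # Try attributes from widest to narrowest raw range (real Mondrian normalizes ranges).
    for dim in sorted(range(n_dims), key=spread, reverse=True):
        median = sorted(r[dim] for r in records)[len(records) // 2]
        left = [r for r in records if r[dim] < median]
        right = [r for r in records if r[dim] >= median]
        if len(left) >= k and len(right) >= k:  # allowable cut
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut: this partition becomes one QI group


def generalize(group: List[Record]) -> List[Tuple[int, int]]:
    """Summarize a QI group by the [min, max] range of each attribute."""
    return [(min(r[d] for r in group), max(r[d] for r in group)) for d in range(len(group[0]))]


# Hypothetical (age, zipcode) records.
records = [(25, 30301), (30, 30302), (35, 30305), (40, 30306), (45, 30309), (50, 30310)]
for group in mondrian(records, k=2):
    print(generalize(group), len(group))
```

With k = 2 the six sample records are split once on age, giving two QI groups of three records: ages 25 to 35 and ages 40 to 50.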
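Phase 2 can be illustrated with a plain greedy loop. This is deliberately much simpler than the search-tree algorithm described in the Algorithm section: it has no tailsets, no reordering and no cost-bounded updates. It removes one QS value at a time, always the one that most reduces the QS (c,l)-diversity violation $p_1 - c\,(p_l + \cdots + p_m)$, until the group satisfies the definition. As assumptions not stated in the poster, a tuple links to $L_i$ when $S_i$ is a subset of its QS values, and $\beta(q, L)$ is the fraction of the group's tuples linked to $L$. The names `violation` and `greedy_suppress` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: E as (label, QS value set) pairs; a tuple as its set of QS values.
Knowledge = List[Tuple[str, FrozenSet[str]]]


def violation(group: List[Set[str]], knowledge: Knowledge, c: float, l: int) -> float:
    """Return p_1 - c * (p_l + ... + p_m); the group satisfies QS (c,l)-diversity iff this is <= 0.

    Assumes tp links to L_i when S_i is a subset of tp's QS values, and
    beta(q, L) = (number of tuples linked to L) / d."""
    counts = {}
    for qs_values in group:
        for label in {lab for lab, s in knowledge if s <= qs_values}:
            counts[label] = counts.get(label, 0) + 1
    p = sorted((n / len(group) for n in counts.values()), reverse=True)
    if not p:
        return 0.0  # no label can be linked, so nothing is disclosed
    return p[0] - c * sum(p[l - 1:])


def greedy_suppress(group: List[Set[str]], knowledge: Knowledge, c: float, l: int):
    """Greedily remove single QS values until the group satisfies QS (c,l)-diversity.

    Returns the suppressed copy of the group and the removed (tuple index, value) pairs."""
    group = [set(qs_values) for qs_values in group]  # never mutate the caller's data
    removed: List[Tuple[int, str]] = []
    while violation(group, knowledge, c, l) > 0:
        best = None  # (violation after removal, tuple index, value)
        for i, qs_values in enumerate(group):
            for value in sorted(qs_values):  # sorted copy, so mutating the set below is safe
                qs_values.discard(value)  # tentatively remove the value
                score = violation(group, knowledge, c, l)
                qs_values.add(value)  # restore it before trying the next candidate
                if best is None or score < best[0]:
                    best = (score, i, value)
        if best is None:
            break  # every QS value is already suppressed
        _, i, value = best
        group[i].discard(value)
        removed.append((i, value))
    return group, removed


E = [("flu", frozenset({"cough", "fever"})),
     ("cold", frozenset({"cough"})),
     ("migraine", frozenset({"headache"}))]
G = [{"cough", "fever"}, {"cough"}, {"headache"}, {"cough", "headache"}]
suppressed, removed = greedy_suppress(G, E, c=1, l=3)
print(removed)  # [(1, 'cough'), (2, 'headache'), (3, 'cough')]
```

On the sample group with c = 1 and l = 3 the loop removes three values. A greedy pass like this returns quickly but is not guaranteed to find the lowest-cost set of removals, which is why the poster's algorithm keeps updating its result whenever a solution with a lower cost is found.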

