Anti-discrimination and privacy protection in released datasets Sara Hajian Josep Domingo-Ferrer.


1 Anti-discrimination and privacy protection in released datasets Sara Hajian Josep Domingo-Ferrer

2 Data mining There are negative social perceptions about data mining, among them potential privacy invasion and potential discrimination.

3 Discrimination Discrimination is unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit.

4 Discrimination Example: U.S. federal laws prohibit discrimination on the basis of race, color, religion, nationality, sex, marital status, age, and pregnancy in a number of settings: credit/insurance scoring; sale, rental, and financing of housing; personnel selection and wages; access to public accommodations, education, nursing homes, adoptions, and health care.

5 Discrimination Discrimination can be either direct or indirect: Direct discrimination occurs when decisions are made based on sensitive attributes. Indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones.

6 Discrimination in Data mining Automated data collection and data mining techniques such as classification rule mining have paved the way to automated decision making: loan granting/denial, insurance premium computation, personnel selection and wages.

7 Discrimination in Data mining If the training datasets are biased with respect to discriminatory attributes such as gender, race, or religion, discriminatory decisions may ensue. Anti-discrimination techniques have been introduced in data mining: discrimination discovery and discrimination prevention.

8 Discrimination in Data mining Discrimination discovery consists of supporting the discovery of discriminatory decisions hidden, either directly or indirectly, in a dataset of historical decision records.

9 Discrimination Discovery Different measures of the discriminatory power of the mined decision rules can be defined, according to the provisions of different anti-discrimination regulations: extended lift (elift) and selection lift (slift).
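
The slide names elift and slift without defining them. The sketch below assumes the definitions commonly used in the discrimination-discovery literature: for a rule "A, B -> C" with sensitive itemset A, context B, and decision C, elift = conf(A, B -> C) / conf(B -> C) and slift = conf(A, B -> C) / conf(not A, B -> C). The record layout, the attribute names, and the helper names (`matches`, `confidence`) are hypothetical and chosen only for illustration.

```python
def matches(record, pattern):
    """True if the record has every attribute=value pair in the pattern."""
    return all(record.get(key) == value for key, value in pattern.items())


def confidence(records, premise, outcome):
    """Fraction of records satisfying `premise` (a predicate) that show `outcome`."""
    covered = [r for r in records if premise(r)]
    if not covered:
        return float("nan")  # undefined when no record satisfies the premise
    return sum(matches(r, outcome) for r in covered) / len(covered)


def _ratio(numerator, denominator):
    """Plain division, returning NaN instead of raising on a zero denominator."""
    return numerator / denominator if denominator else float("nan")


def elift(records, sensitive, context, outcome):
    """Extended lift: conf(A, B -> C) / conf(B -> C)."""
    rule = confidence(records, lambda r: matches(r, sensitive) and matches(r, context), outcome)
    base = confidence(records, lambda r: matches(r, context), outcome)
    return _ratio(rule, base)


def slift(records, sensitive, context, outcome):
    """Selection lift: conf(A, B -> C) / conf(not A, B -> C)."""
    rule = confidence(records, lambda r: matches(r, sensitive) and matches(r, context), outcome)
    rest = confidence(records, lambda r: not matches(r, sensitive) and matches(r, context), outcome)
    return _ratio(rule, rest)


# Toy decision records (invented data): 4 women (3 denied), 6 men (2 denied).
records = (
    [{"gender": "F", "city": "NYC", "decision": "deny"}] * 3
    + [{"gender": "F", "city": "NYC", "decision": "grant"}] * 1
    + [{"gender": "M", "city": "NYC", "decision": "deny"}] * 2
    + [{"gender": "M", "city": "NYC", "decision": "grant"}] * 4
)

e = elift(records, {"gender": "F"}, {"city": "NYC"}, {"decision": "deny"})
s = slift(records, {"gender": "F"}, {"city": "NYC"}, {"decision": "deny"})
print(e, s)
```

On this toy data the denial rate is 0.75 for women, 0.5 overall, and about 0.33 for men, so elift is 1.5 and slift is about 2.25. A rule would typically be flagged as discriminatory when the chosen measure reaches a threshold set by the analyst or by regulation.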

10 Discrimination in Data mining Discrimination prevention consists of inducing patterns that do not lead to discriminatory decisions even if trained from a dataset containing them.

11 Discrimination Prevention How can we train an unbiased classifier when the training data is biased? As for privacy, the challenge is to find an optimal trade-off between (measurable) protection against unfair discrimination, and (measurable) utility of the data/models for data mining.

12 Discrimination Prevention Methods: transform the source data; modify the data mining methods; modify the discriminatory models.

13 The framework The framework for discrimination prevention can be described in terms of two phases: discrimination measurement and data transformation.

14 Data transformation The purpose is to transform the original data DB in such a way as to remove direct and/or indirect discriminatory biases, with minimum impact on the data and on legitimate decision rules, so that no unfair decision rule can be mined from the transformed data.

15 Data transformation As part of this effort, metrics should be developed that specify which records should be changed, how many records should be changed, and how those records should be changed during data transformation.
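
To make the "which, how many, and how" questions concrete, here is a minimal sketch of one possible transformation. It is not the authors' method: it is a simple greedy relabeling, assumed here for illustration, that flips unfavorable decisions for the protected group within a context until the rule's elift drops below a threshold. The threshold `alpha`, the function names, and the toy records are all hypothetical.

```python
def matches(record, pattern):
    """True if the record has every attribute=value pair in the pattern."""
    return all(record.get(key) == value for key, value in pattern.items())


def elift(records, sensitive, context, outcome):
    """conf(A, B -> C) / conf(B -> C); returns 0.0 when the ratio is undefined."""
    in_context = [r for r in records if matches(r, context)]
    in_group = [r for r in in_context if matches(r, sensitive)]
    if not in_group:
        return 0.0
    base = sum(matches(r, outcome) for r in in_context) / len(in_context)
    if base == 0:
        return 0.0  # convention for this sketch: no such outcome in the context
    rule = sum(matches(r, outcome) for r in in_group) / len(in_group)
    return rule / base


def relabel_until_fair(records, sensitive, context, outcome, favorable, alpha):
    """Greedy illustration: flip decisions of protected-group records that
    support the rule until elift < alpha. Returns (new_records, n_changed)."""
    data = [dict(r) for r in records]  # work on copies, leave the input intact
    changed = 0
    for record in data:
        if elift(data, sensitive, context, outcome) < alpha:
            break
        # "Which records": those supporting the discriminatory rule A, B -> C.
        if matches(record, {**sensitive, **context, **outcome}):
            record.update(favorable)  # "How": replace the decision
            changed += 1              # "How many": counted as we go
    return data, changed


# Toy decision records (invented data): 4 women (3 denied), 6 men (2 denied).
records = (
    [{"gender": "F", "city": "NYC", "decision": "deny"} for _ in range(3)]
    + [{"gender": "F", "city": "NYC", "decision": "grant"}]
    + [{"gender": "M", "city": "NYC", "decision": "deny"} for _ in range(2)]
    + [{"gender": "M", "city": "NYC", "decision": "grant"} for _ in range(4)]
)

rule = ({"gender": "F"}, {"city": "NYC"}, {"decision": "deny"})
transformed, n_changed = relabel_until_fair(
    records, *rule, favorable={"decision": "grant"}, alpha=1.2
)
print(n_changed, elift(transformed, *rule))
```

The number of changed records is a direct, if crude, measure of the impact on the data; a real method would also have to check that legitimate rules survive the transformation.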

16 Utility measures Measuring direct discrimination removal; measuring indirect discrimination removal; measuring data quality: Misses Cost (MC) and Ghost Cost (GC).
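
The slide does not define MC and GC. The sketch below assumes the usual meaning of these measures: Misses Cost is the percentage of rules minable from the original data that can no longer be mined from the transformed data, and Ghost Cost is the percentage of rules minable from the transformed data that were not minable from the original. Rules are represented as plain strings, and the rule sets are invented for illustration.

```python
def misses_cost(original_rules, transformed_rules):
    """Percentage of original rules lost after transformation."""
    original, transformed = set(original_rules), set(transformed_rules)
    if not original:
        return 0.0
    return 100.0 * len(original - transformed) / len(original)


def ghost_cost(original_rules, transformed_rules):
    """Percentage of transformed-data rules that did not exist originally."""
    original, transformed = set(original_rules), set(transformed_rules)
    if not transformed:
        return 0.0
    return 100.0 * len(transformed - original) / len(transformed)


# Hypothetical rule sets mined before and after data transformation.
rules_before = {
    "city=NYC -> deny",
    "income=low -> deny",
    "job=stable -> grant",
    "age=young -> deny",
}
rules_after = {
    "city=NYC -> deny",
    "income=low -> deny",
    "job=stable -> grant",
    "savings=high -> grant",  # a "ghost" rule introduced by the transformation
}

mc = misses_cost(rules_before, rules_after)
gc = ghost_cost(rules_before, rules_after)
print(mc, gc)
```

Here one of four original rules is lost and one of four transformed-data rules is new, so both costs are 25 percent. Lower values of both indicate that the transformation preserved more of the data's utility.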

17 Thanks for your attention

