1
Towards Accountable Machine Learning Systems
Gihyuk Ko, PhD Student, Department of Electrical and Computer Engineering, Carnegie Mellon University. December 15, 2016. *Slides were borrowed and modified from Anupam Datta's MIT Big Data Privacy Series talk on 'Accountable Big Data Systems'. We analyze machine learning systems to account for various properties; one of them is privacy. We recently submitted a paper on use privacy.
2
Machine Learning Systems are Ubiquitous
Credit, web services, healthcare, education, law enforcement, … Systems that use big data are now deployed in many places. We all know that machine learning systems are ubiquitous. Machine learning and data analytics are increasingly being used in areas that have a significant impact on people's lives. [Decisions] For example, in predictive policing, the police pay you a visit based on advice from a computer. Transition: On the other hand, these systems can be very opaque.
3
Machine Learning Systems Threaten Fairness Explicit Use [Datta, Tschantz, Datta 2015]
Online Advertising System: inputs such as Address, Gender, and Major determine which high-paying job ads are shown (1816 impressions for one gender group vs. 311 for the other in the slide's figures). One drawback of an ML system being opaque is that it can threaten fairness, and sometimes it does so by explicitly using features such as gender or race. In 2015 our group ran an experiment on an online ad system (Google) that uses user data to decide which ads to serve.
4
Credit Decision Algorithm
Formalizing Explicit Use: Quantitative Input Influence [Datta, Sen, Zick 2016]. Credit Decision Algorithm. Example individual: Age 27, Workclass Private, Education Preschool, Marital Status Married, Occupation Farming-Fishing, Race White, Gender Male, Capital gain $41310, … Transparency query: How much causal influence do various inputs (features) have on a classifier's decision about individuals or groups? Transparency report: Negative factors: Occupation, Education Level. Positive factors: Capital Gain.
5
Challenge | Correlated Inputs
Conclusion: measures of association are not informative! Credit Decision Algorithm (only uses income): Age is correlated with Income, and Income drives the Decision. Transition in: However, when inputs are correlated, input attribution can be challenging. Transition out: Further, people need different kinds of transparency reports.
6
Challenge | General Class of Transparency Queries
Individual: Which input had the most influence in my credit denial?
Group: Which inputs have the most influence on credit decisions for women?
Disparity: Which inputs influence men getting more positive outcomes than women?
Transition in: Further, people need different kinds of transparency reports. Transition out: To solve these problems...
7
Result | Quantitative Input Influence (QII)
A technique for measuring the influence of an input of a system on its outputs. Causal Intervention Deals with correlated inputs Quantity of Interest Supports a general class of transparency queries Transition in: To solve these problems we develop a family of measures known as quantitative input influence. Transition out: For the rest of this talk I will explain the definition of QII through these three key ideas.
8
Key Idea 1| Causal Intervention
Credit Decision Algorithm (only uses income). Joe: Age 30, Income $40K, denied.

Population:
Age   Income
23    $10K
45    $90K
64    $100K
30    $40K

Replace a feature with random values from the population, and examine the distribution over outcomes. Slow it down: we go back to Joe's credit denial. To determine whether age had an influence on his outcome, we replace his age with a random age from the population and observe that his outcome does not change at all. On the other hand, when we replace his income, this has a large impact on his outcome. Essentially, by breaking the correlation between age and income, we can identify causal relations in the model. Intuitively, the influence of income is large while the influence of age is negligible.
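A minimal sketch of this intervention in Python; the population, the decision threshold, and Joe's record are hypothetical stand-ins for the slide's example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population in which age and income are strongly correlated.
age = rng.integers(20, 70, size=10_000)
income = age * 1_500 + rng.normal(0, 5_000, size=10_000)
population = np.column_stack([age, income])

def credit_model(X):
    # The classifier secretly uses only income (feature 1); $50K is a
    # hypothetical approval threshold.
    return (X[:, 1] >= 50_000).astype(int)

def individual_influence(model, population, x, i, n_samples=5_000):
    """Causal intervention for one individual: hold x fixed, replace
    feature i with random draws from the population, and measure how
    often the decision flips away from the original outcome."""
    samples = np.tile(x, (n_samples, 1))
    samples[:, i] = rng.choice(population[:, i], size=n_samples)
    return np.mean(model(samples) != model(x[None, :])[0])

joe = np.array([30.0, 40_000.0])  # denied: income below threshold
for i, name in enumerate(["age", "income"]):
    print(name, individual_influence(credit_model, population, joe, i))
# age    -> 0.0 (randomizing age never changes Joe's outcome)
# income -> large (randomizing income frequently flips the decision)
```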
9
QII for Individual Outcomes
Inputs: $i \in N$. The classifier $c$ maps a feature vector $X = (x_1, \ldots, x_n)$ to an outcome; $X_{-i}U_i$ denotes $X$ with feature $i$ replaced by an independent random value $U_i$ drawn from the population. Compare

$\Pr[c(X) = 1 \mid X = \mathbf{x}_{\text{Joe}}]$ versus $\Pr[c(X_{-i}U_i) = 1 \mid X = \mathbf{x}_{\text{Joe}}]$.

Causal Intervention: replace the feature with random values from the population, and examine the distribution over outcomes. Transition in: More formally, consider a classifier that takes as input a vector of features $x_1, \ldots, x_n$ and outputs 0 or 1; we want to measure the influence of input $i$ on the classification of some individual $\mathbf{x}$. Transition out: One can generalize this...
10
Key Idea 2| Quantity of Interest
Various statistics of a system can serve as the quantity of interest:

Classification outcome of an individual:
$\Pr[c(X) = c(\mathbf{x}_0) \mid X = \mathbf{x}_0] - \Pr[c(X_{-i}U_i) = c(\mathbf{x}_0) \mid X = \mathbf{x}_0]$

Classification outcomes of a group of individuals:
$\Pr[c(X) = 1 \mid X \text{ is female}] - \Pr[c(X_{-i}U_i) = 1 \mid X \text{ is female}]$

Disparity between classification outcomes of groups:
$\big(\Pr[c(X) = 1 \mid X \text{ is male}] - \Pr[c(X) = 1 \mid X \text{ is female}]\big) - \big(\Pr[c(X_{-i}U_i) = 1 \mid X \text{ is male}] - \Pr[c(X_{-i}U_i) = 1 \mid X \text{ is female}]\big)$

Transition in: One can generalize this; talk about disparate impact and the general mathematical definition of a quantity of interest. Transition out: Putting these pieces together.
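These statistics can be phrased as plain functions of (original features, model outcomes), so that the group conditioning always refers to the un-intervened features. A sketch, where `GENDER` is a hypothetical column index:

```python
GENDER = 2  # hypothetical column index: 0 = male, 1 = female

def group_outcome(X, y):
    """Pr[c(X) = 1 | X is female], estimated on a sample."""
    return y[X[:, GENDER] == 1].mean()

def group_disparity(X, y):
    """Pr[c(X) = 1 | male] - Pr[c(X) = 1 | female]."""
    return y[X[:, GENDER] == 0].mean() - y[X[:, GENDER] == 1].mean()
```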
11
QII | Definition
The Quantitative Input Influence (QII) of an input $i$ on a quantity of interest $\mathcal{Q}_\mathcal{A}(X)$ of a system $\mathcal{A}$ is the difference in the quantity of interest when the input is replaced with a random value via an intervention:

$\iota_{\mathcal{Q}_\mathcal{A}}(i) = \mathcal{Q}_\mathcal{A}(X) - \mathcal{Q}_\mathcal{A}(X_{-i}U_i)$

Transition in: Putting these pieces together. Transition out: To illustrate this simple QII definition, let's look at some experiments.
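Putting the intervention and the quantity of interest together gives a single estimator. A sketch, reusing the hypothetical quantity functions above:

```python
import numpy as np

def qii(model, quantity, population, i, rng, n=5_000):
    """iota_Q(i) = Q(X) - Q(X_{-i} U_i): evaluate the quantity of interest
    on a sample, then again after feature i is randomized, and difference."""
    X = population[rng.choice(len(population), size=n)]
    X_int = X.copy()
    X_int[:, i] = rng.choice(population[:, i], size=n)  # the intervention U_i
    # Group conditioning (inside `quantity`) uses the original X both times.
    return quantity(X, model(X)) - quantity(X, model(X_int))

# e.g., influence of feature i on a model's gender disparity:
#   qii(credit_model, group_disparity, population, i=0,
#       rng=np.random.default_rng(0))
```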
12
Result | Quantitative Input Influence (QII)
A technique for measuring the influence of an input of a system on its outputs. Causal Intervention Deals with correlated inputs Quantity of Interest Supports a general class of transparency queries Transition: I’ll illustrate some of these concepts with an example of an explanation.
13
Experiments | Test Applications
arrests: Predictive policing using the National Longitudinal Survey of Youth (NLSY). Features: Age, Gender, Race, Location, Smoking History, Drug History. Classification: History of Arrests. ~8,000 individuals.
income: Income prediction using a benchmark census dataset. Features: Age, Gender, Relationship, Education, Capital Gains, Ethnicity. Classification: Income >= 50K. ~30,000 individuals.
Implemented with Logistic Regression, Kernel SVM, Decision Trees, Decision Forests. Transition in: We conducted two experiments. Transition out: Let's look at a particular experiment.
14
Personalized Explanation | Mr X
Age 23, Workclass Private, Education 11th, Marital Status Never married, Occupation Craft repair, Relationship to household income Child, Race Asian-Pac Island, Gender Male, Capital gain $14344, Capital loss $0, Work hours per week 40, Country Vietnam. (income) Can assuage concerns of discrimination.
15
Personalized Explanation | Mr Y
Age 27, Workclass Private, Education Preschool, Marital Status Married, Occupation Farming-Fishing, Relationship to household income Other Relative, Race White, Gender Male, Capital gain $41310, Capital loss $0, Work hours per week 24, Country Mexico. (income) Explanations of superficially similar people can be different.
16
Result | Quantitative Input Influence (QII)
A technique for measuring the influence of an input of a system on its outputs. Causal Intervention Deals with correlated inputs Quantity of Interest Supports a general class of transparency queries
17
Machine Learning Systems Threaten Privacy Proxy Use [Datta, Fredrikson, Ko, Mardziel, Sen 2016]
Protected Attribute: Pregnancy. Purchases of Scent-free Lotion and Pre-natal Vitamins are associated with pregnancy, and Target's Advertising System causally uses them to decide whether to show Diaper Coupons; purchases of Stationaries feed a separate branch of the decision.
18
Machine Learning Systems as Programs
Q: How strongly associated are the subprogram's outputs with individuals' protected attribute? A: Use an existing association measure! Q: How much causal influence do various subprograms have on a classifier's decision?

Subprogram (the AND over Pre-natal vitamins and Scent-free lotion) inside the program:

OR(
  IF { AND(Shopped(Pre-natal vitamins), Shopped(Scent-free lotion), …), Diaper Coupons, Nothing },
  IF { Shopped(Stationaries), Notebooks, Nothing }
)

One of the key insights we had was to view machine learning systems as 'programs'. By treating them as programs, we can model every intermediate computation as a subprogram of the whole program.
19
Recall: Causal Intervention
Credit Decision Algorithm (only uses income). Joe: Age 30, Income $40K.

Age   Income
23    $10K
45    $90K
64    $100K
30    $40K

Replace a feature with random values from the population, and examine the distribution over outcomes.
20
Recall: QII for Individual Outcomes
Inputs: $i \in N$. Compare $\Pr[c(X) = 1 \mid X = \mathbf{x}_{\text{Joe}}]$ versus $\Pr[c(X_{-i}U_i) = 1 \mid X = \mathbf{x}_{\text{Joe}}]$. Causal Intervention: replace the feature with random values from the population, and examine the distribution over outcomes.
21
Program Decomposition
Original program $c(X)$:

OR(
  IF { AND(Shopped(Pre-natal vitamins), Shopped(Scent-free lotion), …), Diaper Coupons, Nothing },
  IF { Shopped(Stationaries), Notebooks, Nothing }
)

Decomposition: $c(X) \equiv c'(X, p(X))$, where $p(X)$ is the selected subprogram (here, the AND) and $c'(X, U)$ is the remainder with the subprogram's output exposed as a fresh random variable $U$:

OR(
  IF { U, Diaper Coupons, Nothing },
  IF { Shopped(Stationaries), Notebooks, Nothing }
)

To analyze the program in a more fine-grained manner, we define program decomposition: we select a subprogram and view it as a separate random variable, then consider another program with the blue-boxed part substituted by that random variable. By doing this we break any correlation the new random variable had with the original variables. A decomposition is defined for each subprogram by replacing it with the corresponding random variable.
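In code, a decomposition just exposes the subprogram's output as an explicit argument. A sketch in Python, with hypothetical shopper records represented as dicts:

```python
def p(x):
    """p(X): the suspected proxy subprogram (the AND in the blue box)."""
    return x["prenatal_vitamins"] and x["scent_free_lotion"]

def c_prime(x, u):
    """c'(X, U): the rest of the program, with the subprogram's output
    replaced by the free variable u."""
    ads = []
    if u:
        ads.append("Diaper Coupons")
    if x["stationaries"]:
        ads.append("Notebooks")
    return ads if ads else ["Nothing"]

def c(x):
    """The original program: c(x) == c_prime(x, p(x)) for every x."""
    return c_prime(x, p(x))
```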
22
QII for Subprograms
Inputs: $i \in N$; $X$ and $Y$ are independent samples of the input vector. Compare

$\Pr[c'(X, p(X)) = 1]$ versus $\Pr[c'(X, p(Y)) = 1]$.

Causal Intervention: replace the subprogram's output with the output of the same subprogram on a different sample from the population, and examine the distribution over outcomes.
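A sketch of the estimator, reusing `p` and `c_prime` from the previous sketch; `population` is a hypothetical list of shopper dicts, and "showing diaper coupons" is taken as the positive outcome:

```python
import random

def subprogram_qii(population, trials=10_000, seed=0):
    """Estimate Pr[c'(X, p(X)) shows coupons] - Pr[c'(X, p(Y)) shows
    coupons], where Y is an independent draw from the same population."""
    rng = random.Random(seed)
    real = intervened = 0
    for _ in range(trials):
        x, y = rng.choice(population), rng.choice(population)
        real += "Diaper Coupons" in c_prime(x, p(x))
        intervened += "Diaper Coupons" in c_prime(x, p(y))
    return (real - intervened) / trials
```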
23
Proxy Use | 2-phase definition
Phase 1: a subprogram that is associated with the protected attribute — use an existing association measure! Phase 2: a subprogram that is influential on the output of the program — use a quantitative subprogram influence measure! A subprogram constitutes $(\varepsilon, \delta)$-proxy use when its association is at least $\varepsilon$ and its influence is at least $\delta$. Association captures that the result of an intermediate computation allows the protected attribute to be inferred; influence quantitatively measures how much the subprogram's output matters to the program's decision.
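A sketch of the two-phase check. Normalized mutual information (here via scikit-learn) stands in for "an existing association measure", and the thresholds are analyst-chosen:

```python
from sklearn.metrics import normalized_mutual_info_score

def is_proxy_use(subprogram_outputs, protected, influence, eps, delta):
    """(eps, delta)-proxy use: the subprogram both reveals the protected
    attribute (association >= eps) and matters causally (influence >= delta)."""
    association = normalized_mutual_info_score(protected, subprogram_outputs)
    return association >= eps and influence >= delta

# e.g., over a hypothetical population of shopper dicts:
#   outputs   = [p(x) for x in population]
#   protected = [x["pregnant"] for x in population]
#   is_proxy_use(outputs, protected, subprogram_qii(population),
#                eps=0.5, delta=0.1)
```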
24
Experiments | Test Applications
contra: Advertisement targeting using the Indonesian Contraception dataset. Features: Education, Children, Husband's Job, etc. Classification: Contraception Method. Protected attribute: Religion. ~6,000 individuals.
student: Prediction of academic performance using the Portuguese Student Alcohol dataset. Features: Failures, Study time, Father's education level, Health status, etc. Classification: Grade. Protected attribute: Weekly alcohol consumption. ~7,000 individuals.
Implemented with Logistic Regression, Decision Trees, Decision Forests. Transition in: We conducted two experiments. Transition out: Let's look at a particular experiment.
25
Proxy Use Detection | contra
Proxy use violation in submodel: ite(education < 3.5, ite(children < 2.5, ite(age < 30.5, true, false), …), …). Education level is an indicator for religion: this is concerning, as discrimination or bias could occur based on education level. (contra)
27
Proxy Use Detection | student
Proxy use violation in submodel: ite(school < 0.5, ite(studytime < 2.5, ite(Fedu < 3.5, false, true), …), …). Study time less than 2.5 is an indicator of high weekly alcohol consumption: since study time is widely accepted to be directly correlated with success in school, this use is acceptable. (student)
29
Accountable Machine Learning Systems
Use Restrictions in Machine Learning Systems [Datta, Fredrikson, Ko, Mardziel, Sen, Tschantz 2016]: do not use a protected information type (explicitly or via a proxy) for certain purposes, with some exceptions. Non-discrimination: do not use race or gender for employment decisions; business-necessity exceptions. Use privacy: do not use health information for purposes other than those of the healthcare context; exceptions for law enforcement. Accountable Machine Learning Systems: oversight to detect violations and explain behaviors; correction to prevent future violations.
30
Related Work: Quantitative Input Influence
Randomized causal intervention — feature selection via permutation importance [Breiman 2001]; importance of causal relations [Janzing et al. 2013]. These do not consider marginal influence or general quantities of interest.
Associative measures — quantitative information flow (appropriate for secrecy); FairTest [Tramèr et al. 2015]. Correlated inputs hide causality.
Interpretability-by-design — regularization for simplicity (Lasso); Bayesian Rule Lists [Letham et al. 2015]. Potential loss in accuracy.
31
Related Work: Accountability for Use Restrictions
Accounting for proxies and their causal use is missing from prior work: usage control in computer security [Sandhu and Park 2002]; information accountability [Weitzner et al. 2008]; audit algorithms for privacy policies [Garg, Jia, Datta 2011]; enforcing purpose restrictions in privacy policies [Tschantz, Datta, Wing 2012]; privacy compliance of big data systems [Sen, Guha, Datta, Rajamani, Tsai, Wing 2014].
Fairness in big data systems: group fairness [Feldman et al. 2015] detects and repairs disparate impact but does not account for proxy usage in general; individual fairness [Dwork et al. 2011] focuses on correctness by construction, not accountability.
32
Open Problem: accountable deep learning to ensure non-discrimination and use privacy
33
Thank you!
34
QII Supplementary
35
Challenge | Single Inputs have Low Influence
Classifier: only accept old, high-income individuals. Transition in: To see why we need something more than the influence of single inputs: the influence of an individual input may not be significant by itself. For a young, low-income individual, age and income alone do not influence the outcome; jointly, however, age and income influence the outcome significantly. Interesting things happen because of interactions between different features.
36
Naïve Approach | Set QII
Simple lifting of the randomized intervention from inputs $i \in N$ to sets of inputs $S \subseteq N$: replace $X_S$ with an independent random value drawn from the joint distribution of the inputs in $S$.

$\iota(S) = \mathcal{Q}(X) - \mathcal{Q}(X_{-S}U_S)$

Set and marginal influence are only stepping stones.
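A sketch of the lifted intervention; note that the replacement tuple for $S$ is drawn jointly from the population, not feature-by-feature:

```python
import numpy as np

def set_qii(model, quantity, population, S, rng, n=5_000):
    """iota(S) = Q(X) - Q(X_{-S} U_S): intervene on all features in S at
    once, drawing the replacement from the joint input distribution."""
    X = population[rng.choice(len(population), size=n)]
    X_int = X.copy()
    repl = population[rng.choice(len(population), size=n)]
    X_int[:, S] = repl[:, S]  # joint replacement of the whole set S
    return quantity(X, model(X)) - quantity(X, model(X_int))
```

For the previous slide's classifier (accept only old AND high-income individuals), the singleton influences are near zero for a young, low-income person, while the set influence of {age, income} is large.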
37
Marginal QII
Not all features are equally important within a set. Marginal QII: the influence of age and income over income alone, $\iota(\{\text{age, income}\}) - \iota(\{\text{income}\})$. But age is part of many sets:

$\iota(\{\text{age}\}) - \iota(\emptyset)$
$\iota(\{\text{age, gender}\}) - \iota(\{\text{gender}\})$
$\iota(\{\text{age, job}\}) - \iota(\{\text{job}\})$
$\iota(\{\text{age, gender, job}\}) - \iota(\{\text{gender, job}\})$
$\iota(\{\text{age, gender, income}\}) - \iota(\{\text{gender, income}\})$
$\iota(\{\text{age, gender, income, job}\}) - \iota(\{\text{gender, income, job}\})$
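The marginal variant is a one-liner on top of the `set_qii` sketch above:

```python
def marginal_qii(model, quantity, population, i, S, rng):
    """Marginal influence of feature i given the set S:
    iota(S + {i}) - iota(S)."""
    return (set_qii(model, quantity, population, list(S) + [i], rng)
            - set_qii(model, quantity, population, list(S), rng))
```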
38
Key Idea 3| Set QII is a Cooperative Game
A cooperative game consists of a set of agents and a value for each subset. Examples: voting, revenue sharing, input influence (agents → features, value → influence). Transition in: Set QII can be viewed as a cooperative game; cooperative games come in all shapes and sizes. Transition out: It turns out, according to a classic construction by Lloyd Shapley in 1953, there is exactly one way of solving this problem if we assume some very reasonable axioms.
39
Shapley Value [Shapley’53] For cooperative games, the only aggregation measure that satisfies symmetry, dummy, and monotonicity is: Need to compute sum over all subsets of features: Efficient approximation by randomly sampling sets Transition in: Turns out, according to a classic construction by Lloyd Shapley in 1953, … there is exactly one way of solving this problem, if we assume some very reasonable axioms. Transition out: Let’s look at what explanations look like in some of our experiments
40
Result | Quantitative Input Influence (QII)
A technique for measuring the influence of an input of a system on its outputs. Causal Intervention Deals with correlated inputs Quantity of Interest Supports a general class of transparency queries Cooperative Game Computes joint and marginal influence Performance QII measures can be approximated efficiently
41
Proxy Use Supplementary
42
Program Syntax: supports various families of classifiers, including decision trees, families of linear classifiers, random forests, and so on.
43
Program Decomposition
Original program:

OR(
  IF { AND(Shopped(Pre-natal vitamins), Shopped(Scent-free lotion), …), Diaper Coupons, Nothing },
  IF { Shopped(Stationaries), Notebooks, Nothing }
)

Sub-program: the AND over pre-natal vitamin and scent-free lotion purchases. New program, with the sub-program replaced by the random variable U:

OR(
  IF { U, Diaper Coupons, Nothing },
  IF { Shopped(Stationaries), Notebooks, Nothing }
)

To analyze the program in a more fine-grained manner we define program decomposition: we select a subprogram and view it as a separate random variable, then consider another program with the blue-boxed part substituted by that random variable. By doing this we break any correlation the new random variable had with the original variables.
44
Program Decomposition - Notations
45
Formalizing Proxy Use: 2-phase definition
A sub-program that is influential on the original program. Qualitatively: change [$p_1$]; does [$p_2$] change? Quantitatively: QII. A sub-program that is associated with the protected attribute; association measure: NMI (normalized mutual information). $(\varepsilon, \delta)$-proxy use: exhaustively search every sub-program. Influence quantitatively measures how much the sub-program's output matters to the final output; association captures that the result of an intermediate computation allows the protected attribute to be inferred.
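Detection then reduces to a loop over candidate decompositions. A sketch, where `decompositions` is a hypothetical precomputed list of (sub-program, rest-of-program) pairs with binary final outputs, and `is_proxy_use` is the check sketched earlier:

```python
import random

def subprogram_influence(p_sub, c_rest, population, trials=5_000, seed=0):
    """Pr[c'(X, p(X)) = 1] - Pr[c'(X, p(Y)) = 1] for one decomposition."""
    rng = random.Random(seed)
    real = intervened = 0
    for _ in range(trials):
        x, y = rng.choice(population), rng.choice(population)
        real += c_rest(x, p_sub(x))
        intervened += c_rest(x, p_sub(y))
    return (real - intervened) / trials

def find_proxy_uses(decompositions, population, protected, eps, delta):
    """Exhaustively check every candidate sub-program for (eps, delta)-proxy use."""
    return [p_sub for p_sub, c_rest in decompositions
            if is_proxy_use([p_sub(x) for x in population], protected,
                            subprogram_influence(p_sub, c_rest, population),
                            eps, delta)]
```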