Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing. Jiangtao Ren, Xiaoxiao Shi, Wei Fan, Philip S. Yu
What is sample selection bias? Inductive learning: the training data (x, y) are assumed to be sampled at random from the universe of examples. In many applications the training data (x, y) are not sampled randomly. Insurance and mortgage data: you only observe the people to whom you gave a policy. School data: students self-select.
Ubiquitous: loan approval, drug screening, weather forecasting, ad campaigns, fraud detection, user profiling, biomedical informatics, intrusion detection, insurance, etc.
Different types of sample selection bias. There are different possibilities for how (x, y) is selected; S = 1 denotes that (x, y) is chosen. (1) S is independent of both x and y: a totally random sample (no bias). (2) S depends on y but not on x: class bias. (3) S depends on x but not on y: feature bias. (4) S depends on both x and y: both class and feature bias.
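The four cases above can be illustrated with a small simulation. This is only a sketch: the function names, the single numeric feature x in [0, 1], the binary label y, and every probability value are assumptions chosen for illustration, not taken from the slides.

```python
import random

def selection_prob(x, y, kind):
    """Hypothetical P(S = 1 | x, y) for each bias type (all values are assumed)."""
    if kind == "none":       # S independent of x and y: totally random sample
        return 0.5
    if kind == "class":      # S depends on y only: class bias
        return 0.8 if y == 1 else 0.2
    if kind == "feature":    # S depends on x only: feature bias
        return 0.8 if x > 0.5 else 0.2
    if kind == "both":       # S depends on both x and y
        return 0.9 if (x > 0.5 and y == 1) else 0.1
    raise ValueError(f"unknown bias type: {kind}")

def biased_sample(universe, kind, seed=0):
    """Keep each (x, y) from the universe with probability P(S = 1 | x, y)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in universe
            if rng.random() < selection_prob(x, y, kind)]
```

For example, biased_sample(universe, "class") would tend to over-represent y = 1 relative to the universe, while "feature" would over-represent examples with x > 0.5 regardless of label.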
Our method: structural discovery via automatic clustering. Key idea: (1) Divide the data in two (binary split), recursively. (2) Stop dividing a cluster when most of its labeled examples have the same label.
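The divide-and-stop idea can be sketched as a recursive bisecting procedure. This is a minimal illustration, not the authors' implementation: the slides do not say which clustering algorithm performs the binary split, so a plain 2-means split is assumed here, and min_purity (what counts as "most") and min_size are hypothetical parameters. Unlabeled examples carry the label None.

```python
from collections import Counter

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def purity(labels):
    """Fraction of labeled examples that carry the majority label."""
    known = [l for l in labels if l is not None]
    if not known:
        return 1.0  # nothing labeled: no evidence for a further split
    return Counter(known).most_common(1)[0][1] / len(known)

def nearest(p, centers):
    return min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))

def two_means(points, iters=10):
    """Assumed splitter: 2-means seeded with the extremes of the widest dimension."""
    dims = range(len(points[0]))
    d = max(dims, key=lambda j: max(p[j] for p in points) - min(p[j] for p in points))
    centers = [min(points, key=lambda p: p[d]), max(points, key=lambda p: p[d])]
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p, centers) for p in points]

def discover_clusters(points, labels, min_purity=0.9, min_size=2):
    """Bisect until each cluster's labeled examples mostly share one label."""
    clusters, stack = [], [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        if len(idx) < min_size or purity([labels[i] for i in idx]) >= min_purity:
            clusters.append(idx)
            continue
        assign = two_means([points[i] for i in idx])
        left = [i for i, a in zip(idx, assign) if a == 0]
        right = [i for i, a in zip(idx, assign) if a == 1]
        if not left or not right:   # split failed (e.g. identical points)
            clusters.append(idx)
            continue
        stack.extend([left, right])
    return clusters
```

The function returns clusters as lists of indices into points, so the original examples and labels stay untouched.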
Our method: structural re-balancing via sample selection. Key idea: (1) Select the same proportion of examples from each cluster. (2) Select confident and representative examples. (3) Label the unlabeled examples using their neighbors.
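The three steps can be sketched as follows. This is a rough illustration under stated assumptions: "representative" is approximated by closeness to the cluster centroid, unlabeled examples take the label of their nearest labeled neighbor in the same cluster (1-nearest-neighbor), and the confidence criterion of Lemma 3.2 is not reproduced because the slides do not state it. The function name rebalance and the ratio parameter are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def rebalance(points, labels, clusters, ratio=0.5):
    """Pick the same proportion of examples from every cluster.

    clusters is a list of index lists; labels[i] is None for unlabeled examples.
    """
    selected = []
    for idx in clusters:
        members = [points[i] for i in idx]
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        k = max(1, math.ceil(ratio * len(idx)))          # same proportion per cluster
        # Assumed notion of "representative": closest to the cluster centroid.
        chosen = sorted(idx, key=lambda i: sq_dist(points[i], centroid))[:k]
        labeled = [i for i in idx if labels[i] is not None]
        for i in chosen:
            y = labels[i]
            if y is None and labeled:
                # Assumed neighbor rule: copy the nearest labeled example's label.
                j = min(labeled, key=lambda m: sq_dist(points[i], points[m]))
                y = labels[j]
            if y is not None:                            # skip if the cluster has no labels
                selected.append((points[i], y))
    return selected
```

The returned list of (point, label) pairs could then serve as the re-balanced training set for any classifier.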
Our method: theoretical analysis. Lemma 3.1 explains why selecting the same proportion of examples from each cluster can reduce sample selection bias. Lemma 3.2 derives a criterion for selecting confident examples.
Feature bias. Plotted quantity: accuracy of the corrected sample minus accuracy of the original sample.
Class bias. Plotted quantity: accuracy of the corrected sample minus accuracy of the original sample.