Coping with Missing Data for Active Learning 02-750 Automation of Biological Research email@example.com
What is Missing?
In active learning the category label is missing, and we can query an oracle, mindful of cost.
What else can be missing?
– Features: we may not have enough for prediction
– Feature combinations: beyond those the classifier is able to generate automatically (e.g. XOR, ratios)
– Values of features: not all instances have values for all their features
– Feature relevance: some features are noisy or irrelevant
– Feature redundancy: e.g. high feature covariance

Reducing the Feature Space
Feature selection
– Subsample features using IG, MI, … (well studied, e.g. Yang & Pedersen, ICML 1997)
– Wrapper methods (inefficient but accurate, less studied)
Feature projection (to lower dimensions)
– LDA, SVD, LSI (slow, well studied, e.g. Falluchi et al., 2009)
– Kernel functions on feature sub-spaces

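The feature-selection bullet above can be made concrete with a small sketch. It scores each discrete feature by information gain (the "IG" of the slide), IG = H(Y) - H(Y | X), and keeps the top k. The feature names and toy data are hypothetical, chosen only for illustration, and this is not necessarily the exact procedure used in Yang & Pedersen.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(Y) - H(Y | X) for one discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: three binary features and a binary label.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "f_noise":   [0, 1, 0, 1, 0, 0, 1, 1],  # independent of the label
    "f_partial": [0, 0, 1, 1, 0, 1, 1, 1],  # agrees with the label on 7 of 8 rows
    "f_copy":    [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the label
}

for name in columns:
    print(name, round(information_gain(columns[name], labels), 3))
print(select_top_k(columns, labels, 2))
```

Scoring features one at a time is cheap, but it cannot detect feature combinations such as XOR, which is one reason wrapper methods can be more accurate despite their cost.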
Missing Feature Values
Active learning of features
– Not as extensively studied as active instance learning (see Saar-Tsechansky et al., 2007)
– Determines which feature values to seek for given instances, or which features across the board
– Can be combined with active instance learning
But what if there is no oracle?
– Impossible to get feature values
– Too costly or too time consuming
– Do we ignore instances with missing features?

How to Cope with Missing Features
ML training assumes feature completeness
– Filter out features that are mostly missing
– Filter out instances with missing features
– Impute values for missing features
– Radically change ML algorithms
When do we do each of the above?
– With lots of data and few missing features…
– With sparse training data and few missing…
– With sparse data and mostly missing features…

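The two filtering options above interact, and a minimal sketch shows how. It assumes instances are dictionaries with None marking a missing value; the function names, the 0.5 threshold, and the toy rows are all hypothetical.

```python
def drop_sparse_features(rows, max_missing_fraction):
    """Remove features whose fraction of missing (None) values exceeds the threshold."""
    n = len(rows)
    keep = [name for name in rows[0]
            if sum(r[name] is None for r in rows) / n <= max_missing_fraction]
    return [{name: r[name] for name in keep} for r in rows]

def drop_incomplete_instances(rows):
    """Remove instances that have at least one missing (None) feature value."""
    return [r for r in rows if all(v is not None for v in r.values())]

# Hypothetical toy data: feature "c" is mostly missing, feature "a" is missing once.
rows = [
    {"a": 1.0,  "b": 2.0, "c": None},
    {"a": None, "b": 1.5, "c": None},
    {"a": 0.5,  "b": 3.0, "c": 4.0},
    {"a": 2.0,  "b": 2.5, "c": None},
]

# Filtering instances directly leaves very little training data.
print(len(drop_incomplete_instances(rows)))

# Filtering the mostly-missing feature first preserves more instances.
filtered = drop_sparse_features(rows, max_missing_fraction=0.5)
print(len(drop_incomplete_instances(filtered)))
```

On this toy data the order matters: dropping incomplete instances first keeps one row, while dropping the sparse feature first keeps three. With sparse training data that difference can decide whether filtering is viable at all or imputation is needed.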
Missing Feature Imputation
How do we estimate missing feature values?
– Infer the mean value across all instances
– Infer the mean value in the neighborhood
– Apply a classifier with the other features as input and the missing feature value as y (label)
How do we know if it makes a difference?
– Sensitivity analysis (extrema, perturbations)
– Train without the instances that have missing features vs. with instances whose missing features are imputed

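As a rough illustration of the first two estimators and of the extrema check, here is a sketch under simplifying assumptions: the other features are complete and on comparable scales, the neighborhood is the k nearest complete instances by Euclidean distance, and all names and numbers are hypothetical.

```python
import math

def mean_impute(rows, target):
    """Global estimate: mean of the target over all instances where it is observed."""
    observed = [r[target] for r in rows if r[target] is not None]
    return sum(observed) / len(observed)

def knn_impute(rows, query, target, k):
    """Neighborhood estimate: mean of the target over the k complete rows
    closest to the query on the remaining features."""
    others = [name for name in query if name != target]
    point = [query[f] for f in others]
    candidates = [r for r in rows if r[target] is not None]
    candidates.sort(key=lambda r: math.dist([r[f] for f in others], point))
    nearest = candidates[:k]
    return sum(r[target] for r in nearest) / len(nearest)

def extreme_fill_range(rows, target):
    """Sensitivity check: overall mean of the target if one missing value
    took the smallest or the largest observed value."""
    observed = [r[target] for r in rows if r[target] is not None]
    n = len(observed) + 1
    return ((sum(observed) + min(observed)) / n,
            (sum(observed) + max(observed)) / n)

# Hypothetical toy data: two clusters with very different target values.
rows = [
    {"x1": 1.0, "x2": 1.0, "y": 10.0},
    {"x1": 1.2, "x2": 0.8, "y": 12.0},
    {"x1": 5.0, "x2": 5.0, "y": 50.0},
    {"x1": 5.5, "x2": 4.5, "y": 54.0},
]
query = {"x1": 1.1, "x2": 0.9, "y": None}

print(mean_impute(rows, "y"))           # pulled toward the far cluster
print(knn_impute(rows, query, "y", 2))  # uses only the nearby cluster
print(extreme_fill_range(rows, "y"))
```

If a downstream result barely changes across the extreme fills, the choice of imputation method matters little; if it changes a lot, the comparison against training without the incomplete instances becomes more informative.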
More on Missing Values
Missing Completely at Random (MCAR)
– It is generally impossible to prove MCAR or MAR
Missing at Random (MAR)
– Statisticians assume MAR as default
Missing values that depend on observables
– Imputation via classification/regression
Missing values that depend on unobservables
Missing depending on the value itself

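A small simulation can make the MCAR/MAR distinction tangible. In a simulation the missingness mechanism is known by construction, which is exactly what real data does not tell us. The population, the probabilities, and the variable names below are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical population: x is always observed, y may go missing.
N = 20000
data = []
for _ in range(N):
    x = random.randint(0, 1)
    y = 2.0 * x + random.gauss(0.0, 1.0)
    data.append((x, y))
true_mean = mean(y for _, y in data)

# MCAR: each y is kept with the same probability, regardless of x or y.
kept_mcar = [random.random() < 0.5 for _ in data]
mcar_mean = mean(y for (_, y), keep in zip(data, kept_mcar) if keep)

# MAR: the chance of keeping y depends only on the observed x.
kept_mar = [random.random() < (0.2 if x == 1 else 0.8) for x, _ in data]
mar_complete_case_mean = mean(y for (_, y), keep in zip(data, kept_mar) if keep)

# Missingness depends on an observable, so impute each missing y from the
# observed y values that share its x (a one-feature stand-in for regression).
group_mean = {
    g: mean(y for (x, y), keep in zip(data, kept_mar) if keep and x == g)
    for g in (0, 1)
}
mar_imputed_mean = mean(
    y if keep else group_mean[x] for (x, y), keep in zip(data, kept_mar)
)

print("true mean:             ", round(true_mean, 2))
print("MCAR, complete cases:  ", round(mcar_mean, 2))
print("MAR, complete cases:   ", round(mar_complete_case_mean, 2))
print("MAR, imputed from x:   ", round(mar_imputed_mean, 2))
```

Under MCAR, simply discarding incomplete instances leaves the estimate roughly unbiased. Under MAR, the same shortcut is biased, while imputing from the observable that drives the missingness recovers the population mean. Neither remedy applies when missingness depends on unobservables or on the missing value itself.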
Imputation – Example [From: Fan 2008]
How to impute the missing SCL for patient #5?
– Sample mean: (3.8 + 0.6 + 1.1 + 1.3)/4 = 1.7
– By age: (3.8 + 0.6)/2 = 2.2
– By sex: 1.1
– By education: 1.3
– By race: (3.8 + 0.6 + 1.3)/3 = 1.9
– By ADL: (1.1 + 1.3)/2 = 1.2
Who is/are in the same “slice” with #5?

ID  Age  Sex  Edu.  Race  SCL  ADL  Pain  Comorb.
1   70   F    16    W     3.8  4    8     5
2   70   F    16    W     0.6  1    1     1
3   60   M    12    B     1.1  2    3     1
4   85   F    21    W     1.3  2    3     1
5   70   M    21    W     ?    2    4     3

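The slice means in this example can be reproduced with a few lines. The sketch encodes the table as dictionaries (the key names are my own) and averages SCL over the patients who share patient #5's value on one attribute.

```python
# The table from the slide; None marks the missing SCL of patient 5.
patients = [
    {"id": 1, "age": 70, "sex": "F", "edu": 16, "race": "W", "scl": 3.8, "adl": 4, "pain": 8, "comorb": 5},
    {"id": 2, "age": 70, "sex": "F", "edu": 16, "race": "W", "scl": 0.6, "adl": 1, "pain": 1, "comorb": 1},
    {"id": 3, "age": 60, "sex": "M", "edu": 12, "race": "B", "scl": 1.1, "adl": 2, "pain": 3, "comorb": 1},
    {"id": 4, "age": 85, "sex": "F", "edu": 21, "race": "W", "scl": 1.3, "adl": 2, "pain": 3, "comorb": 1},
    {"id": 5, "age": 70, "sex": "M", "edu": 21, "race": "W", "scl": None, "adl": 2, "pain": 4, "comorb": 3},
]
target = patients[4]

def slice_ids(key):
    """IDs of patients with an observed SCL who share the target's value for `key`."""
    return [p["id"] for p in patients
            if p["scl"] is not None and p[key] == target[key]]

def slice_mean(key=None):
    """Mean SCL over the slice defined by `key`, or over all observed patients if key is None."""
    values = [p["scl"] for p in patients
              if p["scl"] is not None and (key is None or p[key] == target[key])]
    return sum(values) / len(values)

estimates = {"all": round(slice_mean(), 1)}
for key in ("age", "sex", "edu", "race", "adl"):
    estimates[key] = round(slice_mean(key), 1)
    print(key, slice_ids(key), estimates[key])
print("all", estimates["all"])
```

The estimates range from 1.1 to 2.2 depending on which attribute defines the slice, so the choice of slice is itself a modeling decision, and a natural subject for the sensitivity analysis mentioned earlier.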
Further Reading
– Saar-Tsechansky & Provost: http://www.springerlink.com/content/k5m57475n1658723/fulltext.pdf
– Yang, Y., Pedersen, J.O. A Comparative Study on Feature Selection in Text Categorization. ICML 1997, pp. 412-420.
– Gelman chapter: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
– Applications in biomed: Lavori, P., R. Dawson and D. Shera (1995) “A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data.” Statistics in Medicine 14: 1913-1925.