Coping with Missing Data for Active Learning
02-750 Automation of Biological Research

What is Missing?
In active learning the category label is missing, and we can query an oracle, mindful of cost.
What else can be missing?
– Features: we may not have enough for prediction
– Feature combinations: beyond those the classifier can generate automatically (e.g. XOR, ratios)
– Values of features: not all instances have values for all their features
– Feature relevance: some features are noisy or irrelevant
– Feature redundancy: e.g. high feature covariance

Reducing the Feature Space
Feature selection
– Subsample features using IG, MI, … (well studied, e.g. Yang & Pedersen, ICML 1997; a code sketch follows this slide)
– Wrapper methods (inefficient but accurate, less studied)
Feature projection (to lower dimensions)
– LDA, SVD, LSI (slow, well studied, e.g. Falluchi et al. 2009)
– Kernel functions on feature sub-spaces
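
As an illustration of the filter approach above, here is a minimal sketch of mutual-information feature selection. The use of scikit-learn and synthetic data is an assumption for illustration; the slide names only the IG/MI criteria, not a library.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy data: 200 instances, 50 features, only 5 of them informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

# Score each feature by mutual information with the class label; keep the top 10.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (200, 10)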

Missing Feature Values
Active learning of features
– Not as extensively studied as active instance learning (see Saar-Tsechansky et al., 2007)
– Determines which feature values to seek for given instances, or which features to seek across the board
– Can be combined with active instance learning
But what if there is no oracle?
– Impossible to get feature values
– Too costly or too time consuming
– Do we ignore instances with missing features?

Missing Data
[Slide shows an example data matrix: nine instances (Inst 1 – Inst 9) with features X1–X6 and a class label Y (+/–). "?" marks missing entries: some instances are missing a few feature values, some (e.g. Inst 3) are missing most of them, and some are missing the label itself. The individual cell values did not survive extraction.]

How to Cope with Missing Features
ML training assumes feature completeness
– Filter out features that are mostly missing
– Filter out instances with missing features (both filters are sketched in code after this slide)
– Impute values for missing features
– Radically change the ML algorithms
When do we do each of the above?
– With lots of data and few missing features…
– With sparse training data and few missing…
– With sparse data and mostly missing features…
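
A minimal sketch of the two filtering strategies, assuming pandas and a made-up 50% missingness threshold (the slide prescribes neither a threshold nor a library):

import numpy as np
import pandas as pd

# Toy data: rows are instances, columns are features; NaN marks a missing value.
df = pd.DataFrame({
    "X1": [1.5, 1.4, np.nan, 1.1],
    "X2": [3.0, np.nan, np.nan, 2.9],
    "X3": [np.nan, np.nan, np.nan, 0.1],
})

# Filter out features that are mostly missing (here: more than 50% NaN).
df = df.loc[:, df.isna().mean() <= 0.5]

# Filter out instances that still lack a value for any surviving feature.
df = df.dropna()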

Missing Feature Imputation
How do we estimate missing feature values?
– Infer the mean value across all instances
– Infer the mean value in a neighborhood
– Apply a classifier with the other features as input and the missing feature value as y (the label)
(all three strategies are sketched in code after this slide)
How do we know if it makes a difference?
– Sensitivity analysis (extrema, perturbations)
– Train without the instances that have missing features vs. train with imputed values for the missing features
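
A minimal sketch of the three estimation strategies, assuming scikit-learn: SimpleImputer for the global mean, KNNImputer for the neighborhood mean, and IterativeImputer, which predicts each incomplete feature from the others (a regression analogue of the slide's classifier idea).

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required before importing IterativeImputer
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

X = np.array([[1.5, 3.0, 0.2],
              [1.4, np.nan, 1.1],
              [np.nan, 2.9, 0.1],
              [1.6, 2.5, np.nan]])

# 1. Mean value across all instances (computed per feature).
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# 2. Mean value in a neighborhood: average the 2 nearest neighbors' values.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# 3. Model-based: predict each missing value from the other features.
X_model = IterativeImputer(random_state=0).fit_transform(X)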

More on Missing Values
– Missing Completely at Random (MCAR)
  It is generally impossible to prove MCAR or MAR
– Missing at Random (MAR)
  Statisticians assume MAR as the default
– Missing values that depend on observables
  Imputation via classification/regression
– Missing values that depend on unobservables
– Missingness that depends on the value itself

Imputation – Example [From: Fan 2008]
How do we impute the missing SCL for patient #5?
– Sample mean: ( )/4 = 1.7
– By age: ( )/2 = 2.2
– By sex: 1.1
– By education: 1.3
– By race: ( )/3 = 1.9
– By ADL: ( )/2 = 1.2
Who is/are in the same "slice" as #5? (a code sketch of slice-based imputation follows this slide)
[Slide shows a table of five patients with columns ID, Age, Sex, Edu., Race, SCL, ADL, Pain, and Comorb.; patient #5 is the one missing the SCL value. Most of the cell values did not survive extraction.]
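
A minimal sketch of the "slice" idea: fill the missing value with the mean over the patients who match on some attribute. The data values below are invented (the slide's table did not survive extraction), but chosen to reproduce two of the means the slide reports.

import numpy as np
import pandas as pd

# Five patients; SCL for patient #5 (row 4) is missing.
df = pd.DataFrame({
    "sex": ["F", "F", "M", "F", "M"],
    "scl": [2.0, 2.4, 1.1, 1.3, np.nan],
})

# Sample mean over the four observed SCL values.
print(df["scl"].mean())  # 1.7, as on the slide

# "By sex" slice: fill missing SCL with the mean SCL of same-sex patients.
df["scl_imputed"] = df.groupby("sex")["scl"].transform(lambda s: s.fillna(s.mean()))
print(df.loc[4, "scl_imputed"])  # 1.1, the only other male's SCL, as on the slide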

Further Reading
– Saar-Tsechansky, M. & Provost, F. (2007). Handling Missing Values when Applying Classification Models. Journal of Machine Learning Research 8.
– Yang, Y. & Pedersen, J.O. (1997). A Comparative Study on Feature Selection in Text Categorization. ICML 1997.
– Gelman chapter on missing-data imputation.
– Applications in biomed: Lavori, P., Dawson, R. & Shera, D. (1995). "A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data." Statistics in Medicine 14.