Maximizing long-term ROI for Active Learning Systems


Maximizing Long-Term ROI for Active Learning Systems (Thesis Proposal)

Interactive Classification
Goal: optimize the lifetime Return on Investment (ROI) of the system.
Pipeline:
- A large volume of transactions (in the millions) comes in for domain-specific processing.
- A machine learning model scores each transaction.
- The majority of transactions are automatically cleared and processed successfully.
- The minority of transactions is flagged by the learned model for manual intervention (auditing), keeping false positive rates low.
Defining characteristics:
- Expensive domain experts
- Skewed class distribution (minority events)
- Concept/feature drift
- Biased sampling of labeled historical data
- Lots of unlabeled data

Interactive Classification Applications
- Fraud detection (credit card, healthcare)
- Network intrusion detection
- Video surveillance
- Information filtering / recommender systems
- Error prediction / quality control
- Health insurance claims rework
Shared characteristics:
- Skewed class distribution (rare events)
- Biased sampling of labeled data
- Concept drift
- Expensive domain experts

Health Insurance Claim Process – Rework
Rework covers both underpayments and overpayments. Why does rework happen? Complexity in the system, provider contracts, and member benefits; about 20% of all claims need manual pricing.

Why is solving Claims Rework important?
- Inefficiencies in the healthcare process result in large monetary losses affecting corporations and consumers:
  - $91 billion over-spent in the US every year on health administration and insurance (McKinsey study, Nov 2008)
  - 131% increase in insurance premiums over the past 10 years
- Claim payment errors drive a significant portion of these inefficiencies:
  - Increased administrative costs and service issues for health plans
  - Overpayment of claims: direct loss
  - Underpayment of claims: loss of interest payments for the insurer, loss of revenue for the provider
- Some statistics:
  - 33% of the workforce is involved in taking care of these errors; for a 6-million-member insurance plan, $400 million in identified overpayments (Source: [Anand and Khots, 2008])
  - For a large (10 million+ member) insurance plan, an estimated $1 billion in lost revenue (Source: discussion with domain experts)

Interactive Classification Setting – Machine Learning Setup
- A classifier is trained from labeled (and unlabeled) data.
- The trained classifier scores instances to produce a ranked list.
- A human (user/expert) in the loop uses the results but also provides feedback, at a cost (see the sketch below).
- Goal: maximize long-term Return on Investment (equivalent to the productivity of the entire system).
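A minimal sketch of this loop, assuming a scikit-learn-style classifier; query_expert (the human auditing step) and the per-iteration batch_size budget are hypothetical placeholders rather than parts of the proposal:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def interactive_loop(X_labeled, y_labeled, X_pool, query_expert,
                     batch_size=10, iterations=5):
    clf = None
    for _ in range(iterations):
        # Train on everything labeled so far, then rank the unlabeled pool
        clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
        scores = clf.predict_proba(X_pool)[:, 1]
        top = np.argsort(scores)[::-1][:batch_size]   # highest-scored first
        # The expert labels the top-ranked items, at a cost
        y_new = np.array([query_expert(x) for x in X_pool[top]])
        X_labeled = np.vstack([X_labeled, X_pool[top]])
        y_labeled = np.concatenate([y_labeled, y_new])
        X_pool = np.delete(X_pool, top, axis=0)       # shrink the pool
    return clf
```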

Factorization of the Problem
The problem factors into three components:
- Cost: the time of the human expert
- Exploration: future classifier performance
- Exploitation: relevancy to the expert
[Triangle diagram relating the three factors; labeled regions include cost-sensitive exploitation, cost-sensitive active learning, exploration-exploitation tradeoffs, standard ranking / relevance feedback, and active learning.]
Related work:
- Exploitation: learning to rank.
- Exploration: the active learning literature.
- Exploration + exploitation: reinforcement learning, which has a notion of cost or budget but does not manipulate cost proactively (e.g., the notion of showing similar examples to reduce cost).
- Cost-sensitive active learning: budgeted active learning; feature-based active learning; multiple and noisy oracles.

Factorization of the Problem – Characterization of the Models
Each of the three factors (cost, exploitation, exploration) can follow one of three models:
- Uniform: each instance has the same value.
- Variable: each instance has a different value that depends on the properties of the instance.
- Markovian: each instance has a dynamically changing value that depends on the (ordered) history of instances already observed, in addition to the factors of the Variable model.
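One way to picture the three families is as interchangeable callables, shown here for the cost factor; the specific functional forms below are illustrative assumptions, not the thesis's definitions:

```python
def uniform_cost(instance, history):
    return 1.0                                  # every label costs the same

def variable_cost(instance, history):
    # Cost depends on properties of the instance; "length" is a hypothetical feature
    return 1.0 + 0.1 * instance["length"]

def markovian_cost(instance, history, similarity):
    # Cost also depends on the (ordered) history: a label is cheaper when the
    # previous instance was similar (reduced cognitive switching cost)
    if history and similarity(instance, history[-1]) > 0.8:
        return 0.5
    return 1.0
```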

Example Cases for the Cost Model
- Uniform: distinguishing speculative vs. definitive language usage in biomedical abstracts [Settles et al., 2008].
- Variable: part-of-speech tagging, where annotation time depends on sentence length, with longer sentences taking more time to label [Ringger et al., 2008].
- Markovian: claims rework error prediction, where showing similar claims to the auditors in sequence reduces cognitive switching costs and thus labeling time [Ghani and Kumar, 2011].

Example Cases for the Exploitation Model (all from claims rework error prediction)
- Uniform: if we only account for the administrative overhead of fixing a claim [Kumar et al., 2010].
- Variable: if we take into account the savings based on the adjustment amount of the claim [Kumar et al., 2010].
- Markovian: root cause detection [Kumar et al., 2010].

Example Cases for the Exploration Model
- Uniform: extracting contact details from email signature lines, where a random strategy gives results comparable to other strategies [Settles et al., 2008].
- Variable: KDD Cup 1999 network intrusion detection, where a sparsity-based strategy performs well [Ferdowsi et al., 2011]; the value depends on properties of the examples (or population) that can be pre-determined.
- Markovian: uncertainty-based active sampling, the most commonly used strategy.
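For concreteness, a minimal sketch of uncertainty sampling, the Markovian strategy above: an instance's exploration value depends on the current model, which changes with every label acquired. Assumes a probabilistic binary classifier:

```python
import numpy as np

def uncertainty_sample(clf, X_pool, batch_size=10):
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5)   # 1 at p=0.5, 0 at p=0 or 1
    return np.argsort(uncertainty)[::-1][:batch_size]
```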

Problem Statement
How can we maximize the long-term ROI of active learning systems for interactive classification problems?

Proposed Hypothesis
Jointly managing the cost, exploitation, and exploration factors will lead to higher long-term ROI than managing them independently.

Proposed Contributions
- A framework to jointly manage cost, exploitation, and exploration.
- Extensions of active learning along the following dimensions:
  - Differential utility of a labeled example
  - Dynamic cost of labeling an example
  - Tackling concept drift

Proposed Framework
- Choice of cost model
- Choice of exploitation model
- Choice of exploration model
- Utility metric
- Algorithms to optimize the utility metric

Choice of Models
[Cube diagram: three axes — cost model, exploitation model, exploration model — each taking the values Uniform, Variable, or Markovian. Complexity increases across the faces of the cube (color-coded accordingly).]

Utility Metric
- Domain dependent; may or may not have a simple instantiation in the domain.
- Possible instantiations for the claims rework domain:
  - Return on Investment (Haertel et al., 2008): corresponds to the business goal of the deployed system. Return: cumulative dollar value of claims adjusted. Investment: cumulative time (equivalent dollar amount) for auditing the claims. Does not take into account classifier improvement/degradation.
  - Amortized Return on Investment: calculate the net present value of the returns based on expected future classifier improvement. Return: cumulative dollar value of claims adjusted, plus the net present value of increased returns due to future classifier improvement. Takes into account both exploration and exploitation.
- The evaluation metric is given by the business goal; for claims rework, the return on the claims saved with respect to the cost.
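A hedged sketch of the two instantiations; the exact discounting scheme for the net present value of future classifier improvement is an assumption, since the proposal does not fix its form:

```python
def roi(return_dollars, investment_dollars):
    # Cumulative dollar value of claims adjusted / cost of auditing time
    return return_dollars / investment_dollars

def amortized_roi(return_dollars, investment_dollars, future_gains, discount=0.9):
    # Add the discounted (net present) value of expected extra returns from
    # future classifier improvement; `discount` is an illustrative parameter
    npv = sum(g * discount ** (t + 1) for t, g in enumerate(future_gains))
    return (return_dollars + npv) / investment_dollars
```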

Algorithm to Optimize the Utility Metric
- Optimization is straightforward if a well-defined utility metric exists for the domain; computational approximations may still be required for practical feasibility.
- Where a utility metric is not well defined in terms of the constituent cost/exploration/exploitation models, approaches to explore:
  - Rank fusion: each model provides a ranking, and the rankings are combined into a final ranking (sketched below).
  - Relevant approaches from reinforcement learning: Upper Confidence Bounds for Trees (Kocsis and Szepesvári, 2006); multi-armed bandits with dependent arms (Pandey et al., 2007).
- Example: in the claims rework domain, the cost model may be defined in terms of audit complexity when an equivalent dollar conversion is not available/feasible.
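The slide does not commit to a specific fusion rule; Borda count is one common instantiation, sketched here under that assumption:

```python
def borda_fuse(rankings):
    """rankings: orderings of the same item ids (best first), one per model."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - position)  # higher is better
    return sorted(scores, key=scores.get, reverse=True)

# e.g., fuse the exploitation, exploration, and cheapness (inverse cost) rankings:
# final_ranking = borda_fuse([exploit_rank, explore_rank, cheapness_rank])
```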

Interactive Classification Framework – Experimental Setup
[Diagram: at each iteration t, a classifier trained on labeled data from iterations 1, ..., t-1 combines the cost (time of human expert), exploration (future classifier performance), and exploitation (relevancy to the expert) factors to rank the unlabeled data of iteration t, producing the labeled data of iteration t.]
Performance evaluation is done on the set of labeled instances obtained at each iteration.

Evaluation
Compare the proposed approaches with multiple baselines:
- Random
- Pure exploitation: Exploitation = Variable/Markovian; Exploration = Uniform; Cost = Uniform
- Pure exploration: Exploration = Variable/Markovian; Exploitation = Uniform; Cost = Uniform
- Pure cost-sensitive: Cost = Variable/Markovian; Exploitation = Uniform; Exploration = Uniform
Evaluation metric (domain dependent): for claims rework, cumulative Return on Investment, aligned with the business goal. Based on the set of instances labeled in each iteration, obtain the true return (dollar value saved) and the true investment (cost of the auditor's time).

Preliminary Results
[Graph with results from the framework.]

Generalizing Active Learning for Handling Temporal Drift
What is temporal drift?
- Changing data distribution
- Changing nature of the classification problem
- Adversarial actions
Related work:
- Traditional active learning assumes a static unlabeled pool.
- Stream-based active learning (Chu et al., 2011) assumes no memory to store instances and makes online decisions to request labels; this is not completely realistic, since labeling requires human effort and is usually not real-time.
- Learning approaches for data streams with concept drift predominantly use ensembles over different time periods (Kolter and Maloof, 2007).

Proposed Setup for Temporal Active Learning
- A periodically changing unlabeled pool, corresponding to the experimental setup of the interactive framework (sketched below):
  - Cumulative streaming pool
  - Recent streaming pool
- This is a novel setup.
- Three components for handling temporal drift:
  - Instance selection strategy
  - Type of model: ensemble or single
  - Instance or model weighing scheme
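The two pool regimes, sketched over a stream of per-period feature batches; the window size for the recent pool is an illustrative parameter:

```python
import numpy as np

def cumulative_pool(batches, t):
    return np.vstack(batches[: t + 1])        # everything seen up to period t

def recent_pool(batches, t, window=1):
    return np.vstack(batches[max(0, t - window + 1): t + 1])  # last few periods
```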

Proposed Instance Selection Strategies
- Model weight drift strategy
- Feature weight drift strategy
- Feature distribution drift strategy

Detecting Drift – Change in Models over Time
In the claims rework domain, 15 models were built over 15 time periods; similarity between the models is measured with the cosine measure (sketched below).
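A minimal sketch of this diagnostic, assuming linear models (so that coefficient vectors exist) trained over a fixed feature space; the thesis's exact similarity computation may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def model_drift(period_data):
    """period_data: list of (X, y) pairs, one per time period."""
    coefs = [LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
             for X, y in period_data]
    # Cosine similarity between consecutive models; low values suggest drift
    return [float(cosine_similarity([a], [b])[0, 0])
            for a, b in zip(coefs, coefs[1:])]
```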

Preliminary Results
- Evaluation metric: precision at the 5th percentile.
- Results are plotted as a percentage of the best strategy at each iteration, showing that no single strategy is best at all iterations.
- Uncertainty sampling begins to perform poorly at later iterations, while the feature-drift-based strategy starts performing better.

Proposed Work
- More experiments and analysis on claims rework data, with data from different clients.
- More experiments on a synthetic dataset with longer observation sequences, to analyze the performance of the sampling strategies.
- Generation of synthetic data based on Gaussian mixture models to mimic the real data (sketched below).
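A sketch of such a generator: each period draws from class-conditional Gaussians whose means shift over time. All parameters (class skew, shift rate, dimensionality) are illustrative assumptions:

```python
import numpy as np

def drifting_gmm(n_periods=15, n_per_period=500, dim=5, shift=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mu_pos, mu_neg = np.ones(dim), -np.ones(dim)
    batches = []
    for _ in range(n_periods):
        y = rng.random(n_per_period) < 0.1          # skewed minority class
        means = np.where(y[:, None], mu_pos, mu_neg)
        X = means + rng.standard_normal((n_per_period, dim))
        batches.append((X, y.astype(int)))
        mu_pos = mu_pos + shift * rng.standard_normal(dim)  # concept drift
    return batches
```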

Cost-Sensitive Exploitation
[Triangle diagram again: cost (time of human expert), exploration (future classifier performance), exploitation (relevancy to the expert). This part is also related work.]

More-Like-This Strategy
[Flow diagram of the online strategy: the classifier scores a ranked list; the top m% of claims are selected, clustered together with the labeled data, and the resulting clusters are ranked.]

Online "More-Like-This" Algorithm
Require: a labeled set L and an unlabeled set U
1. Train classifier C on L
2. Score U using C
3. Select the top m% scored unlabeled examples U_T
4. Cluster the examples U_T ∪ L into k clusters
5. Rank the k clusters using an exploitation metric
6. For each cluster k_i, in rank order:
   a. Rank the examples in k_i
   b. For each example x in k_i, query the expert for the label of x
   c. If the precision of cluster k_i drops below P_min after at least N_min labels, move on to the next cluster
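A runnable sketch of this loop, with simplifications: it clusters only the top-scored unlabeled examples (the pseudocode clusters U_T ∪ L), uses the mean classifier score as the exploitation metric, and treats the thresholds as hypothetical parameters:

```python
import numpy as np
from sklearn.cluster import KMeans

def more_like_this(clf, X_pool, query_expert, m=0.1, k=5, p_min=0.3, n_min=10):
    scores = clf.predict_proba(X_pool)[:, 1]
    top = np.argsort(scores)[::-1][: max(k, int(m * len(X_pool)))]
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pool[top])
    top_scores = scores[top]
    # Rank clusters by mean classifier score (assumed exploitation metric)
    order = sorted(range(k), key=lambda c: top_scores[clusters == c].mean(),
                   reverse=True)
    labels = {}
    for c in order:
        members = top[clusters == c]
        hits = seen = 0
        for i in members[np.argsort(scores[members])[::-1]]:  # best first
            labels[int(i)] = query_expert(X_pool[i])   # expert label, at a cost
            seen += 1
            hits += labels[int(i)]
            if seen >= n_min and hits / seen < p_min:  # cluster precision too low
                break                                   # move to the next cluster
    return labels
```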

Offline Comparison – MLT vs. Baseline
9% relative improvement over the baseline on the precision-at-2nd-percentile metric.

Live System Deployment
- ~$10 million savings per year for a typical insurance company.
- Number of claims audited: baseline system 200, More-Like-This 307.
- 90% relative improvement over the baseline.
- 27% reduction in audit time over the baseline.

Summary – Problem Statement
How to maximize the long-term ROI of active learning systems for interactive classification problems.

Summary – Thesis Contributions
- Characterization of the interactive classification problem: defining the cost/exploration/exploitation models (Uniform, Variable, Markovian).
- Generalization of active learning along the following dimensions:
  - Differential utility of a labeled example
  - Dynamic cost of labeling an example
  - Tackling concept drift
- A framework to jointly manage these considerations.

Summary – Evaluation
- Empirical evaluation of the proposed framework, using evaluation metrics motivated by real business tasks.
- Datasets: a real-world dataset (health insurance claims rework) and a synthetic dataset.
- Comparison with multiple baselines based on the underlying cost/exploitation/exploration models.
- Methodological contributions: a novel experimental setup; we intend to make the synthetic dataset and its generators public.

Summary – Proposed Work: Temporal Active Learning
- Creation of synthetic datasets.
- Evaluation and analysis of the proposed strategies on the synthetic and claims rework datasets.

Summary – Proposed Work: Framework for Interactive Classification
- Evaluate multiple utility metrics / optimization algorithms for the claims rework domain.
- Augment the temporal-drift synthetic data for evaluating the framework.
- Evaluate multiple utility metrics / optimization algorithms for the synthetic dataset.
[Table: cost, exploitation, and exploration models, each Uniform, Variable, or Markovian.]

Thanks

Outline
- Problem description
  - High-level factorization of the problem
  - Related work (triangle)
- Our proposed approach – framework
  - Broad categorization of the models
  - Choice of models
  - Choice of utility metric
  - Choice of optimization
- Proposed work (various approaches)
  - Temporal active learning; some initial results
  - Cost-sensitive exploitation
- Summary
  - Problem statement
  - Contributions
  - Evaluation

Thesis Contributions
Problem statement: how to generalize active learning to incorporate differential utility of a labeled example (dynamic/variable exploitation), dynamic cost of labeling an example, and concept drift, in a unified framework that makes the deployment of such learning systems practical.
Contributions:
- Characterization of the interactive learning problem.
- Generalization of active learning along the following dimensions:
  - Differential utility of a labeled example
  - Dynamic cost of labeling an example
  - Tackling concept drift
  - Cost-sensitive exploitation
- A unified framework to solve these considerations jointly:
  - First solution: optimizing a joint utility function based on cost, exploration utility, and exploitation utility.
  - Second solution: an Upper Confidence Bound approach in a contextual multi-armed bandit setup to incorporate the different factors.
- Empirical evaluation of the proposed framework:
  - Using evaluation metrics motivated by real business tasks.
  - Datasets: synthetic, and real-world (health insurance claims rework).
  - Comparison with multiple baselines based on the underlying factors.

Situating the Thesis Work with Respect to Related Work
[Diagram: PrActive Learning (differential utility, dynamic cost, concept drift) situated among efficiency & representation (feature-level feedback, feature acquisition, batch active learning), cost-sensitive active learning, and proactive learning (unreliable oracle, oracle variation).]

Problem Statement
How to generalize active learning to incorporate differential utility of a labeled example (dynamic/variable exploitation), dynamic cost of labeling an example, and concept drift, in a unified framework that makes the deployment of such learning systems practical.