Learning with Cost Intervals
Xu-Ying Liu and Zhi-Hua Zhou
LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University, China
The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010)

Imprecise Misclassification Cost

In many real applications, different mistakes have different costs, so learning methods should minimize the total cost instead of simply minimizing the error rate. For example, misclassifying a cancer patient as healthy carries a higher risk (e.g. cost = 10) than misclassifying a healthy person as a cancer patient (e.g. cost = 1). Existing cost-sensitive learning methods assume that precise cost values are given, but it is often difficult to get precise cost values. What can we do?

Outline
- Introduction
- Our Method
- Experiments
- Conclusion

Cost Interval

Although the user knows that one type of mistake is more severe than another, it is often difficult to provide precise cost values. However, obtaining a cost interval, i.e. a lower bound and an upper bound on the possible cost values, is feasible in most cases.

Cost Interval (cont.)

For example, a doctor may say: "I am not able to tell you the exact cost values, but I think missing a cancer patient is about 5 to 10 times more serious than troubling a healthy person." This gives a cost of 1 for the lower-risk mistake and a cost interval of [5, 10] for the higher-risk mistake.

Cost Interval (cont.)

Possible ways to obtain cost intervals:
- Natural upper and lower bounds: e.g., a stock with buying price $3, a stop-loss order at $5, and a stop-profit order at $8; holding it gains at least $2 and at most $5.
- Transforming from confidence intervals: e.g., a breast cancer report states that the risk is greater for women with ADH, giving an incidence rate ratio with a confidence interval (IRR = 5.0; 95% CI reported), which naturally yields an interval rather than a single value.
- Expert opinions.
- ...

Related Work

From the view of cost-sensitive learning, settings differ in the impreciseness of cost:
- Previous cost-sensitive learning: precise cost values are always known.
- [Ting & Zheng, ECML'98]: precise cost values are known at test phase.
- ROC analysis: nothing is known about cost; if used directly to deal with costs, it assumes that costs vary and nothing is known about them (including which class has the higher cost!).
- Cost intervals (this work): precise cost values are never known, but an interval is available.

Outline
- Introduction
- Our Method
- Experiments
- Conclusion

Basic Idea

Setting: c_- = 1 and c_+ = c ∈ [Cmin, Cmax]; C* denotes the true value of c, the true cost.
Ideally, if C* were known, we could simply minimize the risk under C*; however, C* is unknown. We therefore learn with a surrogate cost C_S such that the risk is small enough for every possible cost value c.

Basic Idea (cont.)

Requiring the risk to be small enough for every possible cost value c ∈ [Cmin, Cmax] yields an infinite number of constraints!
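A minimal sketch of what this means, writing the cost-sensitive risk with false negatives weighted by c and false positives weighted by 1 (the paper's exact formulation may differ):

    R(f; c) = c \cdot \Pr[f(x) = -1,\, y = +1] + \Pr[f(x) = +1,\, y = -1], \qquad c \in [C_{\min}, C_{\max}],

and asking for R(f; c) ≤ ε simultaneously for every c in [Cmin, Cmax] is one constraint per possible cost value, hence uncountably many.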

Relaxation

Instead, try to solve a relaxation with a small number of informative constraints: the worst-case risk and the mean risk (the latter with respect to the cost distribution v when it is known, i.e. c ~ v).
- The optimal solution of minimizing the worst-case risk makes all the constraints in the original formulation hold.
- Minimizing the mean risk reduces the overall distortion from the true risk.
Therefore, minimize the worst-case risk as well as the mean risk, as sketched below.
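Under the same sketch, the two informative quantities of the relaxation are

    R_{\mathrm{worst}}(f) = \max_{c \in [C_{\min}, C_{\max}]} R(f; c), \qquad R_{\mathrm{mean}}(f) = \mathbb{E}_{c \sim v}\big[ R(f; c) \big],

where, when no cost distribution v is given, the mean may be taken over a uniform distribution on the interval (an assumption of this sketch). Minimizing R_worst first and R_mean second is what the CISVM method below formalizes.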

The CISVM Method

CISVM: Cost-Interval Sensitive SVM. To guarantee that all constraints in the original formulation hold, minimizing the worst-case risk is put as the first target.
Step 1. Minimize the worst-case risk: train an SVM with a surrogate loss function that treats the loss of "+" examples and the loss of "-" examples differently (the loss formulas are not reproduced in this transcript).

CSSVM

CSSVM: Cost-Sensitive SVM [Brefeld et al., ECML'03], which was designed for precise cost values, with a loss function that weights the loss of "+" examples and the loss of "-" examples by their costs. Intuitively, given a cost interval, we might apply CSSVM by assuming the true cost is:
- the upper bound of the interval -> CSMax
- the lower bound of the interval -> CSMin
- the mean of the interval -> CSMean
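A rough illustration of these three point-cost baselines, as a minimal sketch that reuses scikit-learn's class_weight and assumes labels in {+1, -1} and a positive-class cost interval [c_min, c_max] (it is not the CSSVM formulation itself):

    from sklearn.svm import SVC

    def cost_sensitive_svm(X, y, pos_cost, neg_cost=1.0, C=1.0, gamma="scale"):
        # Weight the hinge loss of each class by its misclassification cost.
        clf = SVC(kernel="rbf", C=C, gamma=gamma,
                  class_weight={1: pos_cost, -1: neg_cost})
        return clf.fit(X, y)

    def interval_baselines(X, y, c_min, c_max):
        # CSMin / CSMean / CSMax: plug a single point of the cost interval
        # into a precise-cost learner.
        return {
            "CSMin":  cost_sensitive_svm(X, y, pos_cost=c_min),
            "CSMean": cost_sensitive_svm(X, y, pos_cost=(c_min + c_max) / 2.0),
            "CSMax":  cost_sensitive_svm(X, y, pos_cost=c_max),
        }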

Some Theoretical Results

(The objectives of CISVM, CSMax, CSMin, and CSMean are compared; the formulas are not reproduced in this transcript.)
- CSMax minimizes the worst-case risk, CSMin minimizes the best-case risk, and CSMean minimizes the mean risk.
- The optimal solutions of minimizing the worst-case risk and the mean risk are different, and the optimal solution of CSMean is different from those of CSMax and CSMin.

Loss Functions

(Four plots of the losses for positive and negative examples as functions of the margin yf; the figures are not reproduced in this transcript.)

Robustness

The smaller the maximal distortion from the true risk, the better:
- CSMax: distortion = 4d, robustness = 0.25
- CSMean: distortion = 2d, robustness = 0.5
- CSMin: distortion = 4d, robustness = 0.25
- CISVM: distortion = d, robustness = 1

Utilizing Cost Distribution

If we know more information, we can do better, e.g., if we know the cost distribution: the cost c is drawn i.i.d. from a distribution v(c), and the goal is to minimize the expected risk under v (the formula is not reproduced in this transcript). By Theorem 3, C is independent of X, so the CODIS (COst DIStribution) method repeatedly samples costs, trains an example-dependent cost-sensitive learner, and ensembles the results, as sketched below.
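A minimal sketch of this idea, assuming an RBF SVM base learner, a user-supplied sampler for v(c), and labels in {+1, -1} (the details of the actual CODIS algorithm may differ):

    import numpy as np
    from sklearn.svm import SVC

    def codis_like_ensemble(X, y, sample_cost, n_rounds=30, seed=0):
        # In each round, draw an i.i.d. cost from v(c) for every positive example,
        # train an example-weighted SVM, and ensemble by averaging decision values.
        rng = np.random.default_rng(seed)
        y = np.asarray(y)
        models = []
        for _ in range(n_rounds):
            w = np.ones(len(y), dtype=float)
            n_pos = int(np.sum(y == 1))
            w[y == 1] = sample_cost(rng, n_pos)      # per-example costs drawn from v(c)
            clf = SVC(kernel="rbf")
            models.append(clf.fit(X, y, sample_weight=w))

        def predict(X_test):
            scores = np.mean([m.decision_function(X_test) for m in models], axis=0)
            return np.where(scores >= 0, 1, -1)

        return predict

    # e.g. costs uniform over the interval [5, 10]:
    # predict = codis_like_ensemble(X_train, y_train,
    #                               lambda rng, size: rng.uniform(5, 10, size))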

Outline
- Introduction
- Our Method
- Experiments
- Conclusion

Settings

Compared methods: CISVM, CSMin, CSMean, CSMax, and SVM (cost-blind), all with the RBF kernel.
Data: 15 UCI data sets, with the +/- roles switched, and 25 cost intervals; overall 15 × 2 × 25 = 750 tasks.
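For concreteness, the task grid can be enumerated as in the following sketch (the data set names and interval values are placeholders, not the ones actually used):

    from itertools import product

    datasets = [f"uci_{i:02d}" for i in range(15)]         # placeholders for the 15 UCI data sets
    polarities = ("original", "switched")                  # switching which class is "+"
    intervals = [(1.0 + i, 6.0 + i) for i in range(25)]    # placeholders for the 25 cost intervals

    tasks = list(product(datasets, polarities, intervals))
    assert len(tasks) == 15 * 2 * 25 == 750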

Evaluation

30 hold-out tests, with 2/3 of each data set for training and 1/3 for testing. The quadrisection points of the cost interval, P1 to P5, are considered in turn as the true cost.
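A small helper for this protocol, as a sketch assuming false negatives carry the chosen cost and false positives cost 1:

    import numpy as np

    def quadrisection_points(c_min, c_max):
        # P1..P5: five equally spaced points dividing [c_min, c_max] into quarters.
        return np.linspace(c_min, c_max, num=5)

    def total_cost(y_true, y_pred, true_cost):
        # Total misclassification cost: false negatives cost `true_cost`, false positives cost 1.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        fn = np.sum((y_true == 1) & (y_pred == -1))
        fp = np.sum((y_true == -1) & (y_pred == 1))
        return true_cost * fn + 1.0 * fp

    # e.g. quadrisection_points(5, 10) -> array([ 5.  ,  6.25,  7.5 ,  8.75, 10.  ])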

Results

Loss against the cost-blind SVM over the 25 cost intervals (the lower, the better), comparing CISVM, always predicting the high-cost class, and the cost-blind SVM (the figure is not reproduced in this transcript).

Results (cont.)

Win/tie/loss counts (row vs. column) after t-tests at 95% significance; bold entries indicate sign-tests at 95% significance (the table is not reproduced in this transcript).
- All cost-sensitive methods are significantly better than the cost-blind SVM.
- CISVM is significantly better than the other methods, except that it is comparable with CSMin when the true cost is P1, and with CSMax when the true cost is P4 or P5.

Results (cont.)

Counts of SBPs (Significantly Best Performances), t-tests at 95% significance, for M1: cost-blind SVM, M2: CSMin, M3: CSMean, M4: CSMax, M5: CISVM (the table is not reproduced in this transcript).
- Using the "true cost" does not always lead to the best performance.
- CISVM is the best no matter which of P1 to P5 is taken as the true cost; CISVM is overall the best.

Results (cont.)

By utilizing information on cost distributions, CODIS performs much better.

Conclusion

Main contributions: the first study on learning with cost intervals; the CISVM method.
Future work: to improve the surrogate risk.

Thanks!