
Tutorial 2 LIU Tengfei 2/19/2009

Contents: Introduction; TP rate, FP rate, ROC; Precision, recall; Confusion matrix; Other performance measures; Resources

Classifier output of Weka (1)

Classifier output of Weka (2)
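
The two "Classifier output" slides are screenshots of the Weka Explorer panel and are not reproduced in this transcript. As a minimal sketch, assuming a local copy of iris.arff (the file path and random seed are placeholders), the same summary, per-class-detail and confusion-matrix sections can be produced through the Weka Java API:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierOutputDemo {
    public static void main(String[] args) throws Exception {
        // Load the data set; the path is hypothetical, point it at your copy of iris.arff
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);   // class is the last attribute

        // Evaluate a J48 decision tree with 10-fold cross-validation
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        // These three sections correspond to the summary statistics, the per-class
        // TP rate / FP rate / precision / recall / F-measure / ROC area table,
        // and the confusion matrix shown in the Explorer's output panel.
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
    }
}
```

Running it prints the same kind of text as the Explorer panel, including the per-class measures discussed on the following slides.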

TP rate, FP rate (1) Consider a diagnostic test. A false positive (FP) occurs when the person tests positive but does not actually have the disease. A false negative (FN) occurs when the person tests negative, suggesting they are healthy, but they actually do have the disease. True positives (TP) and true negatives (TN) are defined analogously: the test result agrees with the person's actual condition.

TP rate, FP rate (2) TP rate = true positive rate; FP rate = false positive rate.

TP rate, FP rate (3) Definitions: TP rate = TP / (TP + FN), FP rate = FP / (FP + TN). Both rates are computed with respect to the actual class values: TP rate is the fraction of actual positives that are predicted positive, and FP rate is the fraction of actual negatives that are (wrongly) predicted positive.
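
A small worked sketch (the four counts are made up purely for illustration) shows how the two rates come straight out of the cells of a binary confusion matrix:

```java
public class RatesDemo {
    public static void main(String[] args) {
        // Hypothetical counts for a binary diagnostic test
        int tp = 40;  // sick people correctly flagged as sick
        int fn = 10;  // sick people missed by the test
        int fp = 5;   // healthy people wrongly flagged as sick
        int tn = 45;  // healthy people correctly cleared

        double tpRate = (double) tp / (tp + fn);  // 40 / 50 = 0.80
        double fpRate = (double) fp / (fp + tn);  //  5 / 50 = 0.10

        System.out.printf("TP rate = %.2f, FP rate = %.2f%n", tpRate, fpRate);
    }
}
```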

ROC curve (1) ROC = receiver operating characteristic. The curve plots TP rate on the Y axis against FP rate on the X axis.

ROC curve (2) Which method (A or B) is better? To compare them, compute the ROC area, i.e. the area under the ROC curve (AUC); a larger area indicates a better classifier overall.
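
In Weka this value appears as the "ROC Area" column of the class-details output (programmatically, Evaluation.areaUnderROC(classIndex)). As a self-contained illustration, the area can also be approximated from a list of (FP rate, TP rate) points with the trapezoidal rule; the points below are hypothetical:

```java
public class RocAreaDemo {
    // Trapezoidal approximation of the area under a ROC curve, given points
    // sorted by increasing FP rate; fpr and tpr must have the same length.
    static double rocArea(double[] fpr, double[] tpr) {
        double area = 0.0;
        for (int i = 1; i < fpr.length; i++) {
            area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Hypothetical ROC points running from (0, 0) to (1, 1)
        double[] fpr = {0.0, 0.1, 0.3, 0.6, 1.0};
        double[] tpr = {0.0, 0.6, 0.8, 0.9, 1.0};
        System.out.printf("Approximate ROC area = %.3f%n", rocArea(fpr, tpr));
    }
}
```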

Precision, Recall (1) Precision = TP / (TP + FP); Recall = TP / (TP + FN). In information retrieval terms, precision is the probability that a retrieved document is relevant, and recall is the probability that a relevant document is retrieved by the search.

Precision, Recall (2) F-measure = 2 * (precision * recall) / (precision + recall), the harmonic mean of precision and recall. Precision, recall and the F-measure originate in the information retrieval domain.
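
A short sketch, reusing the hypothetical counts from the TP rate / FP rate example, showing how the three measures follow from TP, FP and FN:

```java
public class PrecisionRecallDemo {
    public static void main(String[] args) {
        // Hypothetical counts (same illustrative values as before)
        int tp = 40, fp = 5, fn = 10;

        double precision = (double) tp / (tp + fp);                        // 40 / 45 ≈ 0.889
        double recall    = (double) tp / (tp + fn);                        // 40 / 50 = 0.800
        double fMeasure  = 2 * precision * recall / (precision + recall);  // ≈ 0.842

        System.out.printf("precision = %.3f, recall = %.3f, F-measure = %.3f%n",
                precision, recall, fMeasure);
    }
}
```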

Confusion matrix Example: running J48 on iris.arff. The matrix shows, for each actual class, how many instances were assigned to each predicted class.

Other performance measures. In these formulas, p denotes the predicted values and a denotes the actual values.
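
The formulas themselves are part of a slide image and do not appear in the transcript. As a hedged sketch, assuming the slide listed the standard numeric-prediction error measures that Weka also reports (e.g. mean absolute error and root mean squared error), they could be computed as follows:

```java
public class ErrorMeasuresDemo {
    public static void main(String[] args) {
        // Hypothetical predicted (p) and actual (a) values
        double[] p = {2.5, 0.0, 2.1, 7.8};
        double[] a = {3.0, -0.5, 2.0, 8.0};

        double absSum = 0.0, sqSum = 0.0;
        for (int i = 0; i < p.length; i++) {
            absSum += Math.abs(p[i] - a[i]);
            sqSum  += (p[i] - a[i]) * (p[i] - a[i]);
        }
        double mae  = absSum / p.length;            // mean absolute error
        double rmse = Math.sqrt(sqSum / p.length);  // root mean squared error

        System.out.printf("MAE = %.3f, RMSE = %.3f%n", mae, rmse);
    }
}
```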

Resources 1. Wikipedia page for TP, FP and ROC. 2. Wikipedia page for Precision and Recall. 3. Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Chapter 5.

Thank you!