Machine learning system design: Prioritizing what to work on

Machine learning system design: Prioritizing what to work on. Soft Computing, Week 7. By Yosi Kristian.

Machine learning system design: Spam classification example

Building a spam classifier
Spam:
From: cheapsales@buystufffromme.com
To: yosi@stts.edu
Subject: Buy now!
Deal of the week! Buy now! Rolex w4tchs - $100 Med1cine (any kind) - $50 Also low cost M0rgages available.
Non-spam:
From: Edo@gmail.com
To: yosi@stts.edu
Subject: Christmas plan?
Hey Bro, Was talking to Mom about plans for Xmas. When do you get off work? Edo

Building a spam classifier
Supervised learning: x = features of the email, y = spam (1) or not spam (0).
Features x: choose 100 words indicative of spam/not spam, and represent each email as a binary vector indicating which of those words appear in it.
Note: in practice, take the most frequently occurring words (10,000 to 50,000) in the training set, rather than manually picking 100 words.
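
To make the feature construction concrete, here is a minimal sketch in Python (not part of the original slides); the vocabulary is an illustrative stand-in for the 100 hand-picked, or 10,000-50,000 most frequent, words.

import re
import numpy as np

# Illustrative vocabulary; in practice, the most frequent training-set words.
vocab = ["buy", "deal", "discount", "now", "week", "mom"]

def email_features(body):
    # x_j = 1 if vocabulary word j appears anywhere in the email body.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    return np.array([1 if w in words else 0 for w in vocab])

x = email_features("Buy now! Deal of the week! Rolex w4tchs - $100")
print(x)   # -> [1 1 0 1 1 0]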

Building a spam classifier
How to spend your time to make it have low error?
- Collect lots of data, e.g. the "honeypot" project.
- Develop sophisticated features based on email routing information (from the email header).
- Develop sophisticated features for the message body, e.g. should "discount" and "discounts" be treated as the same word? How about "deal" and "Dealer"? Features about punctuation?
- Develop a sophisticated algorithm to detect misspellings (e.g. m0rtgage, med1cine, w4tches).

Machine learning system design: Error analysis. Soft Computing, Week 7.

Recommended approach
- Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
- Plot learning curves to decide whether more data, more features, etc. are likely to help.
- Error analysis: manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you spot any systematic trend in the type of examples it makes errors on.
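
A minimal sketch of the "quick baseline plus learning curves" workflow, assuming X_train, y_train, X_cv, y_cv are already-built feature matrices and label vectors (scikit-learn is used here purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

def learning_curve_points(X_train, y_train, X_cv, y_cv, sizes):
    # Train on increasing subsets of the training data and record both errors;
    # plotting them against m shows whether more data is likely to help.
    train_err, cv_err = [], []
    for m in sizes:
        clf = LogisticRegression(max_iter=1000).fit(X_train[:m], y_train[:m])
        train_err.append(1 - clf.score(X_train[:m], y_train[:m]))
        cv_err.append(1 - clf.score(X_cv, y_cv))
    return train_err, cv_err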

Error analysis
500 examples in the cross-validation set; the algorithm misclassifies 100 emails. Manually examine the 100 errors and categorize them based on:
- What type of email it is: pharma, replica/fake, steal passwords, other.
- What cues (features) you think would have helped the algorithm classify them correctly: deliberate misspellings (m0rgage, med1cine, etc.), unusual email routing, unusual (spamming) punctuation.
Count how many errors fall into each category and spend your effort on the largest ones.
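
A minimal sketch of the bookkeeping for this step; the cue lists and the two sample emails below are purely illustrative, and in practice you would loop over the roughly 100 misclassified cross-validation emails.

from collections import Counter

# Illustrative cue lists for each error category.
cues = {
    "pharma": ["med1cine", "pills"],
    "replica/fake": ["rolex", "replica"],
    "steal passwords": ["password", "verify your account"],
    "deliberate misspelling": ["m0rgage", "w4tchs"],
}

def categorize(email_text):
    text = email_text.lower()
    for label, words in cues.items():
        if any(w in text for w in words):
            return label
    return "other"

# Stand-ins for the misclassified emails from the cross-validation set.
misclassified = ["Buy Rolex w4tchs now", "Please verify your account password"]
counts = Counter(categorize(e) for e in misclassified)
print(counts.most_common())   # work on the largest category first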

The importance of numerical evaluation
Should discount/discounts/discounted/discounting be treated as the same word? You can use "stemming" software (e.g. the Porter stemmer), though it can conflate distinct words such as universe/university. Error analysis may not be helpful for deciding whether this is likely to improve performance; the only solution is to try it and see if it works. You need a numerical evaluation (e.g. cross-validation error) of the algorithm's performance with and without stemming, and likewise for other design choices such as distinguishing upper vs. lower case (Mom/mom).
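
As a concrete illustration (not part of the original slides): NLTK's Porter stemmer collapses the discount variants to one token but can also merge words you wanted kept apart, which is why only a numerical cross-validation comparison settles whether it helps.

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for w in ["discount", "discounts", "discounted", "discounting"]:
    print(w, "->", stemmer.stem(w))      # all collapse to the same stem

# The risk mentioned above: stemming may merge distinct words.
print(stemmer.stem("universe"), stemmer.stem("university"))

# To decide, compute cross-validation error once with stemmed tokens and once
# with raw tokens, and compare the two numbers.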

Machine learning system design: Error metrics for skewed classes

Cancer classification example
Train a logistic regression model hθ(x), with y = 1 if cancer and y = 0 otherwise. You find that you get 1% error on the test set (99% correct diagnoses). But only 0.50% of patients have cancer, so the trivial classifier below, which ignores x and always predicts y = 0, gets only 0.5% error:

function y = predictCancer(x)
  y = 0;   % ignore x!
return
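
A minimal sketch (in Python, with synthetic labels) of why raw accuracy is misleading here: a predictor that ignores x entirely already reaches roughly 99.5% accuracy.

import numpy as np

rng = np.random.default_rng(0)
y_test = (rng.random(10_000) < 0.005).astype(int)   # ~0.5% of patients have cancer
y_pred = np.zeros_like(y_test)                       # "predictCancer": always predict 0

accuracy = (y_pred == y_test).mean()
print(f"accuracy = {accuracy:.4f}")   # about 0.995, yet no cancer case is ever detected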

Precision/Recall (for a rare class y = 1 that we want to detect)
Precision: of all patients for whom we predicted y = 1, what fraction actually has cancer?
Recall: of all patients that actually have cancer, what fraction did we correctly detect as having cancer?
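
A minimal sketch of both metrics as defined above (y = 1 is the rare positive class). Applied to the always-predict-0 classifier from the previous slide, recall comes out as 0, which exposes it despite its high accuracy.

import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))       # true positives
    predicted_pos = np.sum(y_pred == 1)              # all predicted y = 1
    actual_pos = np.sum(y_true == 1)                 # all actual y = 1
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall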

Machine learning system design: Trading off precision and recall

Trading off precision and recall
precision = true positives / no. of predicted positives
recall = true positives / no. of actual positives
Logistic regression: predict 1 if hθ(x) ≥ threshold, predict 0 if hθ(x) < threshold (e.g. threshold = 0.5).
- Suppose we want to predict y = 1 (cancer) only if very confident: raise the threshold, giving higher precision but lower recall.
- Suppose we want to avoid missing too many cases of cancer (avoid false negatives): lower the threshold, giving higher recall but lower precision.
More generally: predict 1 if hθ(x) ≥ threshold; varying the threshold traces out a precision-recall curve.
[Figure: precision (vertical axis) against recall (horizontal axis), both from 0 to 1, as the threshold varies.]
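
A minimal sketch of this trade-off, assuming probs holds hθ(x) for the cross-validation examples and y_cv the true labels; sweeping the threshold and recomputing both metrics traces out the curve described above.

import numpy as np

def sweep_thresholds(probs, y_cv, thresholds=np.arange(0.1, 1.0, 0.1)):
    points = []
    for t in thresholds:
        y_pred = (probs >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_cv == 1))
        precision = tp / max(y_pred.sum(), 1)        # guard against division by zero
        recall = tp / max(y_cv.sum(), 1)
        points.append((t, precision, recall))
    return points   # typically: higher threshold -> higher precision, lower recall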

F1 Score (F score)
How to compare precision/recall numbers?

               Precision (P)   Recall (R)   Average   F1 Score
Algorithm 1    0.5             0.4          0.45      0.444
Algorithm 2    0.7             0.1          0.4       0.175
Algorithm 3    0.02            1.0          0.51      0.0392

Average: (P + R) / 2. A poor criterion: a degenerate classifier with very high recall but very low precision (like Algorithm 3) still gets a respectable average.
F1 Score: 2PR / (P + R). The harmonic mean of precision and recall, so it is high only when both are reasonably high; by this measure Algorithm 1 is the best of the three.
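
A minimal sketch that reproduces the table's last two columns and shows why F1 is preferred to the plain average:

def f1(p, r):
    # Harmonic mean of precision and recall: small if either one is small.
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

for name, p, r in [("Algorithm 1", 0.5, 0.4),
                   ("Algorithm 2", 0.7, 0.1),
                   ("Algorithm 3", 0.02, 1.0)]:
    print(name, "average =", round((p + r) / 2, 3), "F1 =", round(f1(p, r), 4))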

Machine learning system design: Data for machine learning

Designing a high accuracy learning system
E.g. classify between confusable words: {to, two, too}, {then, than}. "For breakfast I ate _____ eggs."
Algorithms compared: Perceptron (logistic regression), Winnow, Memory-based, Naïve Bayes.
[Figure: accuracy against training set size (millions of words) for the four algorithms.]
"It's not who has the best algorithm that wins. It's who has the most data." [Banko and Brill, 2001]

Large data rationale
Assume the features x contain sufficient information to predict y accurately.
Example: "For breakfast I ate _____ eggs." (the surrounding words pin down the answer).
Counterexample: predicting housing price from only the size (ft²) and no other features.
Useful test: given the input x, can a human expert confidently predict y?

Large data rationale
Use a learning algorithm with many parameters (e.g. logistic regression or linear regression with many features; a neural network with many hidden units), so it is expressive enough to achieve low training error (low bias). Use a very large training set, so it is unlikely to overfit (low variance). Together, low training error and little overfitting mean the test error should also be low.