The Elements of Statistical Learning

Presentation transcript:

The Elements of Statistical Learning
http://www-stat.stanford.edu/~tibs/ElemStatLearn/download.html
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf

Elements of Statistical Learning (Printing 10 with corrections)

Contents
1. Introduction
2. MG Overview of Supervised Learning
3. MG Linear Methods for Regression
4. XS Linear Methods for Classification
5. Basis Expansions and Regularization
6. Kernel Smoothing Methods
7. Model Assessment and Selection
8. Model Inference and Averaging
9. KP Additive Models, Trees, and Related Methods
10. Boosting and Additive Trees
11. JW Neural Networks
12. Support Vector Machines and Flexible Discriminants
13. Prototype Methods and Nearest-Neighbors
14. Unsupervised Learning (spectral clustering; kernel PCA; sparse PCA; non-negative matrix factorization; archetypal analysis; nonlinear dimension reduction; the Google PageRank algorithm; a direct approach to ICA)
15. KP Random Forests
16. Ensemble Learning
17. BJ Undirected Graphical Models
18. High-Dimensional Problems

Elements of Statistical Learning

"There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea." – Andreas Buja

Topics: Supervised (Least Squares, Nearest Neighbor), Unsupervised

Elements of Statistical Learning: Basics

Supervised learning
- outcome/output (usually quantitative)
- feature set (measurements)
- training data (feature measurements plus known outcomes)

Unsupervised learning
- outcome is unavailable or unknown

Example problems (data available at http://www-stat.stanford.edu/ElemStatLearn): spam detection, prostate cancer, digit recognition, microarray data.

Elements of Statistical Learning: Basics/Terminology

Variable types
- quantitative
- qualitative (AKA categorical, discrete, factors; as outputs, AKA targets)
  - values in a finite set, e.g. G = {Virginica, Setosa, Versicolor}
  - classes, e.g. the 10 digits, G = {0, 1, ..., 9}
  - typically represented as codes, e.g. {0, 1} or {-1, 1}
  - most useful encoding: dummy variables, e.g. a vector of K bits for a K-level qualitative measurement (or a more efficient version); see the sketch below
- ordered categorical, e.g. G = {small, medium, large}: the levels are ordered, but not metric

Symbols/terminology
- input variable (feature): X
- output variable: Y if quantitative, G if qualitative
- variables can be matrices (bold) or vectors; all vectors are column vectors
- hatted variables (e.g. Ŷ) are estimates
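A minimal sketch of the dummy-variable encoding (the helper name `dummy_encode` is hypothetical, and the level names are borrowed from the iris example above; the slides themselves contain no code):

```python
import numpy as np

def dummy_encode(values, levels):
    """Encode a K-level qualitative variable as a vector of K indicator bits."""
    index = {level: i for i, level in enumerate(levels)}
    codes = np.zeros((len(values), len(levels)))
    for row, v in enumerate(values):
        codes[row, index[v]] = 1.0  # exactly one bit set per observation
    return codes

levels = ["Virginica", "Setosa", "Versicolor"]
print(dummy_encode(["Setosa", "Virginica", "Setosa"], levels))
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```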

Elements of Statistical Learning: Linear Models (least squares) (2.3.1)

Given an input vector $X^T = (X_1, X_2, \ldots, X_p)$, predict the output $Y$ via

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$$

(in general $\hat{Y}$ may itself be a vector of predictions). The intercept $\hat{\beta}_0$ is the bias. If a constant variable 1 is included in $X$ (assumed from now on), the model can be written as an inner product (dot product),

$$\langle x, y \rangle = x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_p y_p,$$

so that

$$\hat{Y} = f(X) = X^T \hat{\beta}.$$
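A small illustration of the inner-product form (coefficient and input values are illustrative only; the constant 1 is prepended to the input so that the bias $\hat{\beta}_0$ becomes an ordinary coordinate of $\hat{\beta}$):

```python
import numpy as np

beta = np.array([0.5, 1.0, -2.0])   # [beta_0, beta_1, beta_2], illustrative values
x = np.array([3.0, 4.0])            # raw input (X_1, X_2)

x_aug = np.concatenate(([1.0], x))  # include the constant 1 in X
y_hat = x_aug @ beta                # Y_hat = X^T beta = 0.5 + 1.0*3.0 - 2.0*4.0
print(y_hat)                        # -4.5
```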

Elements of Statistical Learning: Linear Models (least squares)

Pick the coefficients $\beta$ that minimize the residual sum of squares (RSS):

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \left( y_i - x_i^T \beta \right)^2 = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta)$$

Taking the derivative with respect to $\beta$ and setting it to zero gives the unique solution (unless $\mathbf{X}^T \mathbf{X}$ is singular*):

$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

The fitted value at the $i$-th input is then $\hat{y}_i = \hat{y}(x_i) = x_i^T \hat{\beta}$.

* a singular matrix has determinant zero and no inverse
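To make the solution concrete, here is a minimal NumPy sketch on synthetic data (none of this is from the slides; `np.linalg.lstsq` solves the least-squares problem stably instead of forming the inverse of $\mathbf{X}^T \mathbf{X}$ explicitly, which also behaves sensibly when that matrix is near-singular):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # first column = constant 1
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)                # noisy linear outcome

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)            # beta_hat = (X^T X)^{-1} X^T y
rss = np.sum((y - X @ beta_hat) ** 2)                       # residual sum of squares
print(beta_hat, rss)
```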

Elements of Statistical Learning: Linear Models (least squares) for classification

Output G = {Blue, Orange}, coded as {0, 1}:

$$\hat{G} = \begin{cases} \text{Orange} & \text{if } \hat{Y} > 0.5 \\ \text{Blue} & \text{if } \hat{Y} \leq 0.5 \end{cases}$$

If the data come from two bivariate Gaussian distributions, this solution is close to optimal (a linear decision boundary is in fact the best possible; see Chapter 4). If the data come from a mixture of Gaussians, a linear boundary is far from optimal.
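A sketch of least squares used as a classifier on two synthetic bivariate Gaussian classes, thresholding $\hat{Y}$ at 0.5 (the class means and sample sizes here are illustrative, not the book's simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
blue = rng.multivariate_normal([0, 1], np.eye(2), n)    # class coded 0
orange = rng.multivariate_normal([1, 0], np.eye(2), n)  # class coded 1
X = np.column_stack([np.ones(2 * n), np.vstack([blue, orange])])
y = np.concatenate([np.zeros(n), np.ones(n)])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)        # fit regression to the 0/1 codes
g_hat = np.where(X @ beta_hat > 0.5, "Orange", "Blue")  # threshold at 0.5
truth = np.where(y == 1, "Orange", "Blue")
print("training accuracy:", (g_hat == truth).mean())
```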

Elements of Statistical Learning: Nearest neighbor methods (2.3.2)

The output is assigned based on the nearest observations in the training set $\mathcal{T}$:

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where $N_k(x)$ is the neighborhood of the $k$ training points closest to $x$. This apparently gives less misclassification than least squares, but the effective number of parameters ($N/k$) is different, and the training error is zero when $k = 1$.
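A brute-force k-NN sketch (the data here are assumed for illustration; with 0/1-coded classes the same average acts as a classifier by thresholding at 0.5):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Average the outcomes of the k training points closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(dists)[:k]              # indices of N_k(x)
    return y_train[nearest].mean()

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(float)  # 0/1-coded classes
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=15))
```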

Elements of Statistical Learning: Nearest neighbor methods (figure slide; no transcribed text)

Elements of Statistical Learning: Choosing k

Simulated data:
- 10 Blue means generated from $N((1, 0)^T, \mathbf{I})$
- 10 Orange means generated from $N((0, 1)^T, \mathbf{I})$
- for each class, generate 100 observations: pick one of that class's 10 means $m_k$ at random (probability 0.1 each), then generate a Gaussian observation from $N(m_k, \mathbf{I}/5)$
- training data = 200 points, test data = 10,000 points

Bayes error rate (purple):

$$p = \sum_{i} \int_{x \in H_i} \sum_{C_j \neq C_i} P(x \mid C_j)\, p(C_j)\, dx$$

where the $C_i$ are the classes and $H_i$ is the region classified as $C_i$: the error is the probability mass that the other classes place inside each class's decision region.
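A sketch of this simulation in NumPy (the constants follow the slide; the Bayes error is estimated here by Monte Carlo on fresh test points rather than by evaluating the integral, and all function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
I = np.eye(2)
means_blue = rng.multivariate_normal([1, 0], I, 10)    # 10 Blue centers
means_orange = rng.multivariate_normal([0, 1], I, 10)  # 10 Orange centers

def sample(means, n):
    """Pick one of the 10 means uniformly (P = 0.1 each), then draw N(m_k, I/5)."""
    ks = rng.integers(0, 10, n)
    return means[ks] + rng.multivariate_normal([0, 0], I / 5, n)

X_train = np.vstack([sample(means_blue, 100), sample(means_orange, 100)])  # 200 points
y_train = np.repeat([0, 1], 100)

def density(x, means):
    """Class-conditional mixture density: average of 10 Gaussians N(m_k, I/5) at x."""
    d = x - means                          # (10, 2) differences to each center
    q = np.sum(d * d, axis=1) / (2 * 0.2)  # quadratic form with sigma^2 = 1/5
    return np.mean(np.exp(-q) / (2 * np.pi * 0.2))

# Monte Carlo estimate of the Bayes error on fresh test points (equal class priors)
X_test = np.vstack([sample(means_blue, 5000), sample(means_orange, 5000)])
y_test = np.repeat([0, 1], 5000)
pred = np.array([density(x, means_orange) > density(x, means_blue) for x in X_test])
print("estimated Bayes error:", np.mean(pred != y_test))
```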

Elements of Statistical Learning

2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.7 Structured Regression Models
2.8 Classes of Restricted Estimators
2.9 Model Selection and the Bias–Variance Tradeoff
