
Slide 1: Machine learning, pattern recognition and statistical data modelling
Lecture 2: Data exploration and dimensionality reduction
Coryn Bailer-Jones

Slide 2: Last week...
● supervised vs. unsupervised learning
● generalization and regularization
● regression vs. classification
● linear regression (fit via least squares)
  – assumes a global linear fit; stable (low variance) but biased
● k nearest neighbours
  – assumes a local constant fit; less stable (high variance) but less biased
● more complex models permit lower errors on the training data
  – but we want models to generalize
  – we need to control complexity / nonlinearity (regularization) ⇒ assume some degree of smoothness. But how much?

Slide 3: 2-class classification: k-nn and linear regression
Figure © Hastie, Tibshirani, Friedman (2001).
With enough training data, wouldn't k-nn be best?

Slide 4: The curse of dimensionality
● for p = 10, to capture 1% of the data we must cover 63% of the range of each input variable (95% for p = 100)
● as p increases
  – the distance to the neighbours increases
  – most neighbours are near the boundary
● to maintain density (i.e. to properly sample the variance), the number of templates must increase as N^p

For data uniformly distributed in the unit hypercube, define a neighbour volume with edge length e (e < 1):
  neighbour volume = e^p, where p = no. of dimensions.
  If r is the fraction of the unit data volume to be captured, then e = r^(1/p).
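As a quick numerical check of the figures quoted above (a minimal sketch in R; only the relation e = r^(1/p) from the slide is used):

```r
# Edge length e of a hypercube neighbourhood that captures a fraction r of data
# uniformly distributed in the p-dimensional unit hypercube: e = r^(1/p)
edge_length <- function(r, p) r^(1 / p)

edge_length(0.01, 10)    # ~0.63: capturing 1% of the data needs 63% of each axis for p = 10
edge_length(0.01, 100)   # ~0.95: about 95% of each axis for p = 100
```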

Slide 5: Overcoming the curse
● Avoid it by dimensionality reduction
  – throw away less relevant inputs
  – combine inputs
  – use domain knowledge to select/define features
● Make assumptions about the data
  – structured regression
    ● this is essential: an infinite number of functions pass through a finite number of data points
  – complexity control
    ● e.g. smoothness in a local region

Slide 6: Data exploration
● density modelling
  – smoothing
● visualization
  – identify structure, esp. nonlinear
● dimensionality reduction
  – overcome 'the curse'
  – stabler, simpler, more easily understood models
  – identify relevant variables (or combinations thereof)

Slide 7: Density estimation (non-parametric)

Slide 8: Density estimation: histograms
(Bishop 1995)

Slide 9: Kernel density estimation
K() is a fixed kernel function with bandwidth h.
K = no. of neighbours, N = total no. of points, V = volume occupied by the K neighbours.
Simple (Parzen) kernel:
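The formulas shown on the slide are not reproduced in the transcript; in the notation defined above, the standard kernel density estimator (a reconstruction following Bishop 1995) is:

```latex
\hat{p}(\mathbf{x}) \simeq \frac{K}{N V},
\qquad
\hat{p}(\mathbf{x}) = \frac{1}{N h^{p}} \sum_{n=1}^{N}
K\!\left( \frac{\mathbf{x} - \mathbf{x}_n}{h} \right)
```

where the simple (Parzen) kernel is the unit hypercube window: K(u) = 1 if |u_j| ≤ 1/2 for every component j, and 0 otherwise.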

Slide 10: Gaussian kernel
(Bishop 1995; here N is the size of the entire data set)
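The Gaussian-kernel estimator itself does not appear in the transcript; the standard form in Bishop (1995), which the slide presumably shows, is:

```latex
\hat{p}(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N}
\frac{1}{(2\pi h^{2})^{p/2}}
\exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}_n \rVert^{2}}{2 h^{2}} \right)
```

i.e. one spherical Gaussian of width h centred on every data point, averaged over the N points of the entire data set.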

Slide 11: k-NN density estimation
K = no. of neighbours, N = total no. of points, V = volume occupied by the K neighbours.
To overcome the fixed kernel size, vary the search volume V until it contains K neighbours.
(Bishop 1995)

Slide 12: Histograms and 1D kernel density estimation
From MASS4 section 5.6. See the R scripts on the course web page.
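Those course scripts are not included in the transcript; a minimal stand-alone sketch of the same comparison on simulated data (the data set and the bandwidths are illustrative assumptions, not the MASS4 example):

```r
# Histogram vs. 1D kernel density estimates with two different bandwidths
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.5))

hist(x, breaks = 30, freq = FALSE,
     main = "Histogram and kernel density estimates", xlab = "x")
lines(density(x, bw = 0.2), col = "red")    # small bandwidth: noisy, undersmoothed
lines(density(x, bw = 1.0), col = "blue")   # large bandwidth: oversmoothed
```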

Slide 13: 2D kernel density estimation
From MASS4 section 5.6. See the R scripts on the course web page.
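Again the course scripts are not reproduced here; a minimal sketch using MASS::kde2d on simulated data (the data, grid size and default bandwidths are illustrative assumptions):

```r
library(MASS)   # provides kde2d()

set.seed(1)
x <- rnorm(500)
y <- 0.5 * x + rnorm(500, sd = 0.7)

# 2D kernel density estimate evaluated on a 50 x 50 grid
dens <- kde2d(x, y, n = 50)

image(dens, xlab = "x", ylab = "y")   # density as a colour map
contour(dens, add = TRUE)             # overlay density contours
```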

Slide 14: Classification via (parametric) density modelling

Slide 15: Maximum likelihood estimate of parameters
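The equations on this slide are missing from the transcript; for a multivariate Gaussian the maximum likelihood estimates are the standard ones (a reconstruction, assumed to be what the slide shows):

```latex
\hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n,
\qquad
\hat{\boldsymbol{\Sigma}} = \frac{1}{N} \sum_{n=1}^{N}
(\mathbf{x}_n - \hat{\boldsymbol{\mu}})(\mathbf{x}_n - \hat{\boldsymbol{\mu}})^{\mathsf{T}}
```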

Slide 16: Example: modelling the PDF with two Gaussians
class 1: mean = (0.0, 0.0), standard deviations = (0.5, 0.5)
class 2: mean = (1.0, 1.0), standard deviations = (0.7, 0.3)
See the R scripts on the course web page.
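A minimal sketch of such an example in R, assuming the numbers above are per-class means and per-axis standard deviations with diagonal covariances (the course's own script may differ):

```r
# Simulate two 2D Gaussian classes and classify a point by maximum likelihood
set.seed(1)
n  <- 200
c1 <- cbind(rnorm(n, 0.0, 0.5), rnorm(n, 0.0, 0.5))   # class 1: mean (0,0), sd (0.5, 0.5)
c2 <- cbind(rnorm(n, 1.0, 0.7), rnorm(n, 1.0, 0.3))   # class 2: mean (1,1), sd (0.7, 0.3)

# Maximum likelihood estimates of the class parameters
# (sd() uses 1/(N-1); for large n this is close to the ML value)
mu1 <- colMeans(c1); s1 <- apply(c1, 2, sd)
mu2 <- colMeans(c2); s2 <- apply(c2, 2, sd)

# Class-conditional density: product of independent 1D Gaussians
dens <- function(x, mu, s) dnorm(x[1], mu[1], s[1]) * dnorm(x[2], mu[2], s[2])

# Assign a new point to the class with the larger likelihood (equal priors)
x.new <- c(0.6, 0.6)
if (dens(x.new, mu1, s1) > dens(x.new, mu2, s2)) "class 1" else "class 2"
```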

Slide 17: Capturing variance: Principal Components Analysis (PCA)

Slide 18: Principal Components Analysis
For a given data vector, minimizing the reconstruction error is equivalent to maximizing the variance of the data projected onto the principal components.

Slide 19: Principal Components Analysis: the equations
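The equations themselves are not in the transcript; in standard form, consistent with the summary on slide 32 (a reconstruction, not necessarily the slide's exact notation), they are:

```latex
% Sample covariance matrix of the mean-subtracted data
\mathbf{C} = \frac{1}{N} \sum_{n=1}^{N}
(\mathbf{x}_n - \bar{\mathbf{x}})(\mathbf{x}_n - \bar{\mathbf{x}})^{\mathsf{T}}

% The principal components are its eigenvectors, ordered by eigenvalue (variance)
\mathbf{C}\,\mathbf{u}_j = \lambda_j \mathbf{u}_j ,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p

% Admixture coefficients and the reduced reconstruction using R < p components
a_j = \mathbf{u}_j^{\mathsf{T}} (\mathbf{x} - \bar{\mathbf{x}}) ,
\qquad
\hat{\mathbf{x}} = \bar{\mathbf{x}} + \sum_{j=1}^{R} a_j \mathbf{u}_j
```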

Slide 20: PCA example: MHD stellar spectra
N = 5144 optical spectra, 380–520 nm in p = 820 bins, area normalized.
The spectra show variance with spectral type (SpT). (Bailer-Jones et al. 1998)

Slide 21: MHD stellar spectra: average spectrum

Slide 22: MHD stellar spectra: first 20 eigenvectors

Slide 23: MHD stellar spectra: admixture coefficients vs. spectral type (SpT)

Slide 24: MHD stellar spectra

Slide 25: PCA reduced reconstruction
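A minimal sketch of a reduced reconstruction in R using prcomp (simulated stand-in data; the MHD spectra are not available here, and R = 3 is an arbitrary choice):

```r
# Reduced PCA reconstruction: keep only the first R principal components
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)   # stand-in data: 200 objects, p = 10

pca  <- prcomp(X, center = TRUE, scale. = FALSE)
R    <- 3                                             # number of components retained
Xhat <- pca$x[, 1:R] %*% t(pca$rotation[, 1:R])       # back-project from R components
Xhat <- sweep(Xhat, 2, pca$center, "+")               # add the mean back

# Fraction of the total variance captured by the first R components
sum(pca$sdev[1:R]^2) / sum(pca$sdev^2)
```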

Slide 26: Reconstruction quality for the MHD spectra
The shape of the curve also depends on the signal-to-noise level.

Slide 27: Reconstruction of an M star
Key to the figure:
- no. of PCs used
- normalized reconstruction error
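The slide's definition of the normalized reconstruction error is not reproduced in the transcript; a common convention (an assumption here, not necessarily the slide's) is the residual norm relative to the norm of the original spectrum:

```latex
E = \frac{\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert}{\lVert \mathbf{x} \rVert}
```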

Slide 28: PCA: explanation is not discrimination
PCA has no class information, so it cannot provide optimal discrimination.

Slide 29: PCA summary
● Linear projection of the data which captures and orders variance
  – the PCs are linear combinations of the data which are uncorrelated and of highest variance
  – equivalent to a rotation of the coordinate system
● Data compression via a reduced reconstruction
● New data can be projected onto the PCs
● The reduced reconstruction acts as a filter
  – removes rare features (low variance measured across the whole data set)
  – poorly reconstructs non-typical objects

Slide 30: PCA as a filter
Figure panels: original spectrum; reconstructed spectrum (R = 25, E = 5.4%); residual.

Slide 31: PCA
● What happens if there are fewer vectors than dimensions, i.e. N < p?
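With N mean-subtracted data vectors the covariance matrix has rank at most N - 1, so at most N - 1 principal components carry non-zero variance. A quick check in R on simulated data (a sketch, not part of the original lecture):

```r
# With N < p the centred data have rank at most N - 1, so at most N - 1
# principal components have non-zero variance
set.seed(1)
N <- 5; p <- 20
X <- matrix(rnorm(N * p), nrow = N, ncol = p)

pca <- prcomp(X, center = TRUE)
round(pca$sdev, 6)   # only the first N - 1 = 4 standard deviations are non-zero
```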

Slide 32: Summary
● curse of dimensionality
● density estimation
  – non-parametric: histograms, kernel method, k-nn
    ● trade-off between the number of neighbours and the volume size
  – parametric: Gaussian; fitting via maximum likelihood
● Principal Components Analysis
  – Principal Components
    ● are the eigenvectors of the covariance matrix
    ● are orthonormal
    ● form an ordered set describing the directions of maximum variance
  – reduced reconstruction: data compression
  – a linear transformation (coordinate rotation)