
Slide 1: Machine learning, pattern recognition and statistical data modelling. Lecture 2: Data exploration. Coryn Bailer-Jones

Slide 2: Last week...
● supervised vs. unsupervised learning
● generalization and regularization
● regression vs. classification
● linear regression (fit via least squares)
  – assumes a global linear fit; stable (low variance) but biased
● k nearest neighbours
  – assumes a local constant fit; less stable (high variance) but less biased
● more complex models permit lower errors on training data
  – but we want models to generalize
  – need to control complexity / nonlinearity (regularization) ⇒ assume some degree of smoothness. But how much?

Slide 3: 2-class classification: k-nn and linear regression
[Figure © Hastie, Tibshirani, Friedman (2001)]
With enough training data, wouldn't k-nn be best?

Slide 4: The curse of dimensionality
● for p = 10, to capture 1% of the data we must cover 63% of the range of each input variable (95% for p = 100)
● as p increases
  – distance to neighbours increases
  – most neighbours are near the boundary
● to maintain density (i.e. properly sample the variance), the number of templates must increase as N^p
For data uniformly distributed in the unit hypercube, define a neighbour volume with edge length e (e < 1): neighbour volume = e^p, where p = no. of dimensions. To capture a fraction r of the unit data volume, the required edge length is e = r^(1/p).
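A quick numerical check of the edge-length relation above, as a minimal R sketch (the fraction 1% and the dimensions are the slide's illustrative values):

## Edge length e = r^(1/p) of the hypercube needed to capture a fraction r
## of data uniformly distributed in the unit hypercube.
edge.length <- function(r, p) r^(1/p)

p <- c(1, 2, 10, 100)               # number of input dimensions
round(edge.length(0.01, p), 2)      # capture 1% of the data
## 0.01 0.10 0.63 0.95  -> 63% of each axis for p = 10, 95% for p = 100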

Slide 5: Overcoming the curse
● Avoid it by dimensionality reduction
  – throw away less relevant inputs
  – combine inputs
  – use domain knowledge to select/define features
● Make assumptions about the data
  – structured regression
    ● this is essential: an infinite number of functions pass through a finite number of data points
  – complexity control
    ● e.g. smoothness in a local region

Slide 6: Data exploration
● density modelling
  – smoothing
● visualization
  – identify structure, esp. nonlinear
● dimensionality reduction
  – overcome 'the curse'
  – stabler, simpler, more easily understood models
  – identify relevant variables (or combinations thereof)

Slide 7: Density estimation (non-parametric)

Slide 8: Density estimation: histograms
[Figure from Bishop (1995)]

Slide 9: Kernel density estimation
The generic non-parametric estimate is p(x) ≈ K / (N V), where K = no. of neighbours, N = total no. of points, and V = volume occupied by the K neighbours.
Kernel density estimation fixes the volume: K(·) is a fixed kernel function with bandwidth h. The simple (Parzen) kernel is a unit hypercube centred on the evaluation point, so the estimate counts the fraction of points falling within a hypercube of side h around x.

Slide 10: Gaussian kernel
[Figure from Bishop (1995)]
With a Gaussian kernel the estimate becomes p(x) = (1/N) Σ_n (2π h²)^(-p/2) exp(-‖x − x_n‖² / 2h²), where the sum runs over the entire data set of N points.
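A hand-rolled one-dimensional version of this estimator, as a minimal R sketch (the bandwidth h and the bimodal toy sample are illustrative, not from the lecture):

## Gaussian kernel density estimate: average of Gaussians of width h
## centred on each data point, evaluated on a grid.
gauss.kde <- function(x.grid, x, h)
  sapply(x.grid, function(x0) mean(dnorm(x0, mean = x, sd = h)))

set.seed(1)
x  <- c(rnorm(50, 0), rnorm(50, 4))        # bimodal toy sample
xg <- seq(-4, 8, length.out = 200)
plot(xg, gauss.kde(xg, x, h = 0.5), type = "l",
     xlab = "x", ylab = "density estimate")
## base R's density(x, bw = 0.5) produces essentially the same curve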

Slide 11: K-nn density estimation
[Figure from Bishop (1995)]
To overcome the fixed kernel size, vary the search volume V until it contains K neighbours; the estimate is again p(x) ≈ K / (N V), where K = no. of neighbours, N = total no. of points, and V = volume occupied by the K neighbours.
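A one-dimensional k-nn density estimate can be sketched the same way (a minimal R sketch; k, the sample and the grid are illustrative). Here the "volume" V is the length 2d of the smallest interval around the evaluation point containing its k nearest neighbours:

## k-nn density estimate: p(x0) = K / (N V), with V the interval of
## half-width d(x0) = distance to the k-th nearest neighbour.
knn.density <- function(x.grid, x, k) {
  N <- length(x)
  sapply(x.grid, function(x0) {
    d <- sort(abs(x - x0))[k]     # distance to the k-th nearest neighbour
    k / (N * 2 * d)               # volume V = 2d in one dimension
  })
}

set.seed(2)
x  <- rnorm(200)
xg <- seq(-3, 3, length.out = 200)
plot(xg, knn.density(xg, x, k = 20), type = "l",
     xlab = "x", ylab = "k-nn density estimate")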

Slide 12: Histograms and 1D kernel density estimation
From MASS4 section 5.6. See the R scripts on the web.
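A minimal sketch in the spirit of the MASS section 5.6 scripts referenced above, using the geyser data shipped with MASS (the break count and bandwidths are illustrative choices, not the lecture's values):

## Histogram and two 1-d kernel density estimates of the geyser eruption
## durations; freq = FALSE puts the histogram on a density scale.
library(MASS)
x <- geyser$duration

hist(x, breaks = 25, freq = FALSE,
     xlab = "duration [min]", main = "Histogram and kernel estimates")
lines(density(x, bw = 0.1), lty = 2)    # narrow bandwidth: noisy
lines(density(x, bw = 0.3), lwd = 2)    # wider bandwidth: smoother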

Slide 13: 2D kernel density estimation
From MASS4 section 5.6. See the R scripts on the web.
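Again a minimal sketch in the spirit of those scripts, using MASS::kde2d on the geyser data (the default bandwidths and the grid size are illustrative):

## 2-d kernel density estimate on a 50 x 50 grid, shown as an image with
## contours and the data points overplotted.
library(MASS)
f <- kde2d(geyser$duration, geyser$waiting, n = 50)
image(f, xlab = "duration [min]", ylab = "waiting [min]")
contour(f, add = TRUE)
points(geyser$duration, geyser$waiting, pch = ".")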

Slide 14: Classification via (parametric) density modelling

Slide 15: Maximum likelihood estimate of parameters
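Assuming the usual Gaussian model for the class-conditional densities (the next slide uses exactly this), the maximum-likelihood estimates are the sample mean and the covariance normalized by N rather than N-1. A minimal R sketch on simulated data:

## ML estimates of a multivariate Gaussian's parameters.
## cov() divides by (N - 1); the ML estimate divides by N.
library(MASS)
set.seed(3)
X <- mvrnorm(500, mu = c(0, 0), Sigma = diag(c(0.25, 0.25)))

N         <- nrow(X)
mu.hat    <- colMeans(X)               # ML estimate of the mean
Sigma.hat <- cov(X) * (N - 1) / N      # ML estimate of the covariance
mu.hat
Sigma.hat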

Slide 16: Example: modelling the PDF with two Gaussians
class 1: mean = (0.0, 0.0), sigma = (0.5, 0.5)
class 2: mean = (1.0, 1.0), sigma = (0.7, 0.3)
See the R scripts on the web page.
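A minimal sketch of this example using the means and sigmas quoted on the slide with diagonal covariances; the sample sizes and the equal class priors are assumptions, and the original script on the web page may differ:

## Class-conditional densities p(x | class) modelled as diagonal Gaussians;
## points are assigned to the class with the larger density (equal priors).
library(MASS)
set.seed(4)
mu1 <- c(0, 0);  s1 <- c(0.5, 0.5)     # class 1 mean and sigmas
mu2 <- c(1, 1);  s2 <- c(0.7, 0.3)     # class 2 mean and sigmas
X1 <- mvrnorm(200, mu1, diag(s1^2))
X2 <- mvrnorm(200, mu2, diag(s2^2))

## density of a diagonal Gaussian evaluated at the rows of X
dgauss <- function(X, mu, s)
  dnorm(X[, 1], mu[1], s[1]) * dnorm(X[, 2], mu[2], s[2])

Xnew  <- rbind(X1, X2)
pred  <- ifelse(dgauss(Xnew, mu1, s1) > dgauss(Xnew, mu2, s2), 1, 2)
truth <- rep(c(1, 2), each = 200)
table(truth, pred)                     # confusion matrix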

Slide 17: Capturing variance: Principal Components Analysis (PCA)

Slide 18: Principal Components Analysis
For a given data vector a, minimizing b is equivalent to maximizing c (here b is the residual perpendicular to the projection direction and c is the component along it, so |a|² = |b|² + |c|² and minimizing one maximizes the other).

Slide 19: Principal Components Analysis: the equations
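The core computation (per the summary on slide 32: the principal components are the orthonormal eigenvectors of the covariance matrix, ordered by variance) can be sketched in R as follows; the simulated 2-d data are illustrative:

## PCA via eigen-decomposition of the covariance matrix.
library(MASS)
set.seed(5)
X <- mvrnorm(300, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2))

Xc  <- scale(X, center = TRUE, scale = FALSE)   # subtract the mean
eig <- eigen(cov(Xc))
eig$values        # variances along the PCs, in decreasing order
eig$vectors       # orthonormal eigenvectors = principal components
A   <- Xc %*% eig$vectors                       # admixture coefficients (projections)
## prcomp(X)$x gives the same projections (up to the sign of each PC)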

Slide 20: PCA example: MHD stellar spectra
N = 5144 optical spectra, 380–520 nm in p = 820 bins, area normalized. Show variance in spectral type (SpT). (Bailer-Jones et al. 1998)

Slide 21: MHD stellar spectra: average spectrum

Slide 22: MHD stellar spectra: first 20 eigenvectors

Slide 23: MHD stellar spectra: admixture coefficients vs. SpT

Slide 24: MHD stellar spectra

Slide 25: PCA reduced reconstruction
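A reduced reconstruction keeps only the first R principal components: each object is rebuilt as the mean vector plus its first R admixture coefficients times the corresponding eigenvectors. A minimal R sketch on simulated 5-dimensional data (the choice R = 2 is illustrative, not the MHD value):

## Reduced reconstruction from the first R principal components.
library(MASS)
set.seed(6)
X <- mvrnorm(300, mu = rep(0, 5), Sigma = 0.5 + 0.5 * diag(5))   # toy 5-d data

pca  <- prcomp(X)                                  # centres the data by default
R    <- 2                                          # number of PCs kept
Xhat <- pca$x[, 1:R] %*% t(pca$rotation[, 1:R])    # projections x eigenvectors
Xhat <- sweep(Xhat, 2, pca$center, "+")            # add the mean back

mean(rowSums((X - Xhat)^2)) / mean(rowSums(X^2))   # overall relative error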

Slide 26: Reconstruction quality for the MHD spectra
The shape of the curve also depends on the signal-to-noise level.

Slide 27: Reconstruction of an M star
Key: R = no. of PCs used; E = normalized reconstruction error.

Slide 28: PCA: explanation is not discrimination
PCA has no class information, so it cannot provide optimal discrimination.

Slide 29: PCA summary
● Linear projection of the data which captures and orders variance
  – PCs are linear combinations of the data which are uncorrelated and of highest variance
  – equivalent to a rotation of the coordinate system
● Data compression via a reduced reconstruction
● New data can be projected onto the PCs
● Reduced reconstruction acts as a filter
  – removes rare features (low variance measured across the whole data set)
  – poorly reconstructs non-typical objects

Slide 30: PCA as filter
[Figure panels: residual; reconstructed spectrum (R = 25, E = 5.4%); original spectrum]

Slide 31: PCA
● What happens if there are fewer vectors than dimensions, i.e. N < p?
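A quick numerical check of this question (a sketch; the sizes are illustrative): with N centred vectors the covariance matrix has rank at most N-1, so at most N-1 principal components carry any variance.

## With N < p, only the first N-1 principal components have nonzero variance.
set.seed(7)
N <- 10; p <- 50
X <- matrix(rnorm(N * p), nrow = N, ncol = p)

pca <- prcomp(X)             # centres the data by default
round(pca$sdev, 3)           # the last component's sdev is ~0
sum(pca$sdev > 1e-10)        # = N - 1 = 9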

Slide 32: Summary
● curse of dimensionality
● density estimation
  – non-parametric: histograms, kernel method, k-nn
    ● trade-off between number of neighbours and volume size
  – parametric: Gaussian; fitting via maximum likelihood
● Principal Components Analysis
  – Principal Components
    ● are the eigenvectors of the covariance matrix
    ● are orthonormal
    ● ordered set describing directions of maximum variance
  – reduced reconstruction: data compression
  – a linear transformation (coordinate rotation)

