Unsupervised Learning: Principal Component Analysis



Unsupervised Learning
Dimension reduction (turning the complex into the simple): we only have the input of the function, and learn a low-dimensional representation of it.
Generation (creating something from nothing): we only have the output of the function; the input can be random numbers.

Dimension Reduction
A function maps a vector x (high dim) to a vector z (low dim).
http://reuter.mit.edu/blue/images/research/manifold.png
http://archive.cnx.org/resources/51a9b2052ae167db310fda5600b89badea85eae5/isomapCNXtrue1.png
The manifold in the figures looks 3-D, but is actually 2-D.

Dimension Reduction
In MNIST, a digit is a 28 x 28 = 784-dim vector, but most 28 x 28 vectors do not look like digits.
Example (http://newsletter.sinica.edu.tw/file/file/4/485.pdf): the same digit "3" rotated by -20°, -10°, 0°, 10°, 20° — a single number, the rotation angle, is enough to describe the variation.

Clustering
Each example belongs to exactly one cluster, represented one-hot: cluster 1 = [1 0 0], cluster 2 = [0 1 0], cluster 3 = [0 0 1].
Open question: how many clusters do we need?

K-means
Cluster $X = \{x^1, \dots, x^n, \dots, x^N\}$ into K clusters.
Initialize cluster centers $c^i$, $i = 1, 2, \dots, K$ (K random $x^n$ from $X$).
Repeat:
- For all $x^n$ in $X$: $b^n_i = 1$ if $x^n$ is closest to $c^i$, otherwise $b^n_i = 0$.
- Update all $c^i$: $c^i = \sum_n b^n_i x^n \big/ \sum_n b^n_i$.
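A minimal NumPy sketch of the K-means loop above; the function name and toy data are illustrative, not from the lecture:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means; X is an (N, d) array of examples x^n."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # K random x^n as c^i
    for _ in range(n_iters):
        # Assignment step: b_i^n = 1 for the closest center c^i
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: c^i = sum_n b_i^n x^n / sum_n b_i^n
        for i in range(K):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels
```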

Clustering: Hierarchical Agglomerative Clustering (HAC)
Step 1: build a tree by repeatedly merging the closest pair of clusters until a single root remains.
Step 2: pick a threshold; cutting the tree at that height gives the clusters.
https://www.semanticscholar.org/paper/Integrating-frame-based-and-segment-based-dynamic-Chan-Lee/7d8eaf612500420a125b3cfca98701114cf68873/pdf
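A sketch of the two HAC steps using SciPy; the linkage method and the threshold value are illustrative choices, not specified in the lecture:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 3)                           # toy data, 20 examples
Z = linkage(X, method="average")                    # Step 1: build the tree
labels = fcluster(Z, t=0.5, criterion="distance")   # Step 2: pick a threshold
```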

Distributed Representation
Clustering: an object must belong to exactly one cluster.
Distributed representation: describe the object by a vector of attributes instead. For example (Hunter x Hunter nen types), instead of "Gon is an Enhancer", use: Enhancement 0.70, Emission 0.25, Transmutation 0.05, Manipulation 0.00, Conjuration 0.00, Specialization 0.00.
(Speaker note: the Divergent factions — Abnegation, Amity, Dauntless, Candor, Erudite — are another example of hard categories.)

Distributed Representation
A function maps vector x (high dim) to vector z (low dim). Two approaches:
Feature selection: keep only the informative dimensions (e.g. if the data vary only along $x_2$, select $x_2$ and drop $x_1$).
Principal component analysis (PCA): $z = Wx$ [Bishop, Chapter 12].

PCA
$z = Wx$. Reduce to 1-D: $z_1 = w^1 \cdot x$.
Project all the data points x onto $w^1$ and obtain a set of $z_1$.
We want the variance of $z_1$ to be as large as possible (in the figure, $w^1$ points along the large-variance direction rather than the small-variance one):
$\mathrm{Var}(z_1) = \sum_{z_1} (z_1 - \bar{z}_1)^2$, subject to $\lVert w^1 \rVert_2 = 1$.

PCA
$z = Wx$. Reduce to 1-D: $z_1 = w^1 \cdot x$. Project all the data points x onto $w^1$ and obtain a set of $z_1$. We want the variance of $z_1$ to be as large as possible: $\mathrm{Var}(z_1) = \sum_{z_1}(z_1 - \bar{z}_1)^2$, with $\lVert w^1 \rVert_2 = 1$.
For the second dimension, $z_2 = w^2 \cdot x$: we want the variance of $z_2$ to be as large as possible, $\mathrm{Var}(z_2) = \sum_{z_2}(z_2 - \bar{z}_2)^2$, with $\lVert w^2 \rVert_2 = 1$ and $w^1 \cdot w^2 = 0$.
$W = \begin{bmatrix} (w^1)^T \\ (w^2)^T \\ \vdots \end{bmatrix}$ is an orthogonal matrix.

Warning of Math

PCA
$z_1 = w^1 \cdot x$
$\bar{z}_1 = \frac{1}{N}\sum z_1 = \frac{1}{N}\sum w^1 \cdot x = w^1 \cdot \frac{1}{N}\sum x = w^1 \cdot \bar{x}$
$\mathrm{Var}(z_1) = \frac{1}{N}\sum_{z_1}(z_1 - \bar{z}_1)^2 = \frac{1}{N}\sum_x (w^1 \cdot x - w^1 \cdot \bar{x})^2 = \frac{1}{N}\sum \big(w^1 \cdot (x - \bar{x})\big)^2$
Using $(a \cdot b)^2 = (a^T b)^2 = (a^T b)(a^T b) = a^T b (a^T b)^T = a^T b\, b^T a$:
$\mathrm{Var}(z_1) = \frac{1}{N}\sum (w^1)^T (x-\bar{x})(x-\bar{x})^T w^1 = (w^1)^T \Big[\frac{1}{N}\sum (x-\bar{x})(x-\bar{x})^T\Big] w^1 = (w^1)^T \mathrm{Cov}(x)\, w^1$
Find $w^1$ maximizing $(w^1)^T S w^1$ subject to $\lVert w^1 \rVert_2^2 = (w^1)^T w^1 = 1$, where $S = \mathrm{Cov}(x)$.

Find $w^1$ maximizing $(w^1)^T S w^1$ with $(w^1)^T w^1 = 1$, where $S = \mathrm{Cov}(x)$.
$S$ is symmetric and positive semidefinite (a symmetric real matrix $M$ is positive semidefinite if $z^T M z \ge 0$ for every non-zero column vector $z$), so its eigenvalues are non-negative.
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/LA_2016/Lecture/special%20matrix%20(v2).ecm.mp4/index.html
Using a Lagrange multiplier [Bishop, Appendix E]:
$g(w^1) = (w^1)^T S w^1 - \alpha\big((w^1)^T w^1 - 1\big)$
Setting $\partial g(w^1)/\partial w^1_1 = 0$, $\partial g(w^1)/\partial w^1_2 = 0$, ... gives $S w^1 - \alpha w^1 = 0$, i.e. $S w^1 = \alpha w^1$, so $w^1$ is an eigenvector of $S$.
Then $(w^1)^T S w^1 = \alpha (w^1)^T w^1 = \alpha$; to maximize it, choose the largest $\alpha$.
Conclusion: $w^1$ is the eigenvector of the covariance matrix $S$ corresponding to the largest eigenvalue $\lambda_1$.
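A NumPy sketch of this result — build the covariance matrix and take its top eigenvectors; the function name and the row-wise data layout are illustrative assumptions:

```python
import numpy as np

def pca_components(X, K):
    """Return the top-K principal components of X, where rows of X are examples."""
    x_bar = X.mean(axis=0)
    S = (X - x_bar).T @ (X - x_bar) / len(X)     # S = Cov(x)
    eigvals, eigvecs = np.linalg.eigh(S)         # eigh: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # descending: lambda_1 >= lambda_2 >= ...
    W = eigvecs[:, order[:K]].T                  # rows are w^1, ..., w^K
    return W, eigvals[order[:K]]
```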

Find $w^2$ maximizing $(w^2)^T S w^2$ with $(w^2)^T w^2 = 1$ and $(w^2)^T w^1 = 0$.
$g(w^2) = (w^2)^T S w^2 - \alpha\big((w^2)^T w^2 - 1\big) - \beta\big((w^2)^T w^1 - 0\big)$
Setting $\partial g(w^2)/\partial w^2_1 = 0$, $\partial g(w^2)/\partial w^2_2 = 0$, ... gives $S w^2 - \alpha w^2 - \beta w^1 = 0$.
Left-multiply by $(w^1)^T$: $(w^1)^T S w^2 - \alpha (w^1)^T w^2 - \beta (w^1)^T w^1 = 0$.
Since $(w^1)^T S w^2$ is a scalar, $(w^1)^T S w^2 = \big((w^1)^T S w^2\big)^T = (w^2)^T S^T w^1 = (w^2)^T S w^1 = \lambda_1 (w^2)^T w^1 = 0$ (using $S w^1 = \lambda_1 w^1$).
With $(w^1)^T w^2 = 0$ and $(w^1)^T w^1 = 1$, this forces $\beta = 0$, so $S w^2 - \alpha w^2 = 0$, i.e. $S w^2 = \alpha w^2$.
Conclusion: $w^2$ is the eigenvector of the covariance matrix $S$ corresponding to the second largest eigenvalue $\lambda_2$.

PCA - decorrelation
$z = Wx$, and $\mathrm{Cov}(z) = \frac{1}{N}\sum (z - \bar{z})(z - \bar{z})^T = W S W^T$, where $S = \mathrm{Cov}(x)$.
$W S W^T = W S \begin{bmatrix} w^1 & \cdots & w^K \end{bmatrix} = W \begin{bmatrix} S w^1 & \cdots & S w^K \end{bmatrix} = W \begin{bmatrix} \lambda_1 w^1 & \cdots & \lambda_K w^K \end{bmatrix} = \begin{bmatrix} \lambda_1 W w^1 & \cdots & \lambda_K W w^K \end{bmatrix} = \begin{bmatrix} \lambda_1 e_1 & \cdots & \lambda_K e_K \end{bmatrix} = D$
So $\mathrm{Cov}(z) = D$ is a diagonal matrix: the components of $z$ are uncorrelated.
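Continuing the pca_components sketch above, a quick numerical check that $\mathrm{Cov}(z)$ comes out diagonal; the correlated toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 6))  # correlated toy data
W, lam = pca_components(X, K=3)
Z = (X - X.mean(axis=0)) @ W.T        # z = W(x - x_bar); centering does not change Cov
cov_z = np.cov(Z, rowvar=False, bias=True)
print(np.round(cov_z, 6))             # off-diagonal ~ 0, diagonal ~ lambda_1..lambda_3
```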

End of Warning
There are many good explanations of PCA, e.g. http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

PCA – Another Point of View
Basic components: every digit image can be built from a small set of basic components $u^1, u^2, u^3, u^4, u^5, \dots$ (e.g. strokes). One digit, for instance, is roughly $1 \times u^1 + 1 \times u^3 + 1 \times u^5$, so it can be represented by the coefficient vector $[1\ 0\ 1\ 0\ 1\ \cdots]^T$.
In general, $x \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K + \bar{x}$, where $x$ is the pixel vector of a digit image, the $u^k$ are the components, and $[c_1, c_2, \dots, c_K]^T$ represents the image.
https://pvirie.wordpress.com/2016/03/29/linear-autoencoders-do-pca/
https://pvirie.wordpress.com/2013/10/01/pattern-covariance-analysis-of-components/

PCA – Another Point of View
$x - \bar{x} \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K = \hat{x}$
Reconstruction error: $\lVert (x - \bar{x}) - \hat{x} \rVert_2$
Find $u^1, \dots, u^K$ minimizing the total error:
$L = \min_{u^1, \dots, u^K} \sum \Big\lVert (x - \bar{x}) - \sum_{k=1}^{K} c_k u^k \Big\rVert_2$
PCA: $z = Wx$, i.e. $\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix} = \begin{bmatrix} (w^1)^T \\ (w^2)^T \\ \vdots \\ (w^K)^T \end{bmatrix} x$
The PCA components $w^1, w^2, \dots, w^K$ are exactly the $u^1, u^2, \dots, u^K$ that minimize $L$. Proof in [Bishop, Chapter 12.1.2].

$x - \bar{x} \approx c_1 u^1 + c_2 u^2 + \cdots + c_K u^K = \hat{x}$
Writing the approximation for every example:
$x^1 - \bar{x} \approx c^1_1 u^1 + c^1_2 u^2 + \cdots$
$x^2 - \bar{x} \approx c^2_1 u^1 + c^2_2 u^2 + \cdots$
$x^3 - \bar{x} \approx c^3_1 u^1 + c^3_2 u^2 + \cdots$
......
Stacking the $x^n - \bar{x}$ as the columns of a matrix $X$, this is a matrix factorization: $X \approx \begin{bmatrix} u^1 & u^2 & \cdots \end{bmatrix} \begin{bmatrix} c^1_1 & c^2_1 & c^3_1 & \cdots \\ c^1_2 & c^2_2 & c^3_2 & \cdots \\ \vdots \end{bmatrix}$, minimizing the reconstruction error.
Link PCA with MDS: T. Cox and M. Cox, Multidimensional Scaling, Chapman & Hall, Boca Raton, 2nd edition, 2001.

This factorization is found by SVD: $X \approx U \Sigma V$, where $X$ is $M \times N$, $U$ is $M \times K$, $\Sigma$ is $K \times K$, and $V$ is $K \times N$.
The K columns of $U$ are a set of orthonormal eigenvectors corresponding to the K largest eigenvalues of $X X^T$.
This is the solution of PCA.
SVD: http://speech.ee.ntu.edu.tw/~tlkagk/courses/LA_2016/Lecture/SVD.pdf
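A sketch of the SVD route in NumPy; the data and sizes are illustrative, with the $M \times N$ matrix holding one centered example per column as on the slide:

```python
import numpy as np

data = np.random.randn(100, 6)                       # 100 examples, 6 dims (toy data)
X = (data - data.mean(axis=0)).T                     # M x N: columns are x^n - x_bar
U, s, Vt = np.linalg.svd(X, full_matrices=False)     # X ~ U Sigma V
K = 2
W = U[:, :K].T                                       # top-K PCA components (up to sign)
```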

Autoencoder
If $w^1, w^2, \dots, w^K$ are the components $u^1, u^2, \dots, u^K$, then $\hat{x} = \sum_{k=1}^{K} c_k w^k$, and to minimize the reconstruction error $\lVert (x - \bar{x}) - \hat{x} \rVert_2$, take $c_k = (x - \bar{x}) \cdot w^k$.
This looks like a neural network with one hidden layer (linear activation functions): the input $x - \bar{x}$ is mapped to the hidden code $c_1, \dots, c_K$ by the weights $w^k$, and the same weights map the code back to the reconstruction $\hat{x}$ (the slides draw the $K = 2$ case with weights $w^1_1, w^1_2, w^1_3$ and $w^2_1, w^2_2, w^2_3$).
Such a network — an autoencoder — can also be trained to minimize the reconstruction error by gradient descent, and it can be made deep: a Deep Autoencoder.
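Reusing the pca_components sketch above, a hedged illustration of this reconstruction on toy data; a linear autoencoder trained by gradient descent minimizes the same error, but its learned weights are generally not orthonormal like $w^1, \dots, w^K$:

```python
import numpy as np

X = np.random.randn(200, 6)                  # toy data, rows are examples
x_bar = X.mean(axis=0)
W, _ = pca_components(X, K=2)                # rows w^1, w^2 from the earlier sketch
C = (X - x_bar) @ W.T                        # c_k = (x - x_bar) . w^k for every example
X_hat = C @ W + x_bar                        # x_hat = sum_k c_k w^k + x_bar
err = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(err)                                   # the reconstruction error PCA minimizes
```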

PCA - Pokémon
Inspired from: https://www.kaggle.com/strakul5/d/abcsds/pokemon/principal-component-analysis-of-pokemon-data
800 Pokémon, 6 features for each (HP, Atk, Def, Sp Atk, Sp Def, Speed). How many principal components should we keep? Look at the ratio $\lambda_i / (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 + \lambda_6)$ for each eigenvalue:
[ 0.45190665  0.18225358  0.12979086  0.12011089  0.07142337  0.04451466 ]
i.e. roughly 0.45, 0.18, 0.13, 0.12, 0.07, 0.04 for $\lambda_1, \dots, \lambda_6$.
Using 4 components is good enough.
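A sketch of how those ratios could be computed; the CSV filename and column names follow the Kaggle dataset and are assumptions, not taken from the lecture:

```python
import numpy as np
import pandas as pd

stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]   # assumed column names
X = pd.read_csv("Pokemon.csv")[stats].to_numpy(dtype=float)          # assumed filename

S = np.cov(X, rowvar=False, bias=True)           # 6 x 6 covariance matrix
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # lambda_1 >= ... >= lambda_6
print(np.round(eigvals / eigvals.sum(), 2))      # ratios lambda_i / sum_j lambda_j
```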

PCA - Pokémon
Loadings of the principal components on the six stats (HP, Atk, Def, Sp Atk, Sp Def, Speed):
PC1: 0.4  0.5  0.3  (all weights positive) — overall strength.
PC2: 0.1  0.0  0.6  -0.3  0.2  -0.7 — defense (at the cost of speed).

PCA - Pokémon
PC3: -0.5  -0.6 — special defense (at the cost of attack and HP).
PC4: 0.7  -0.4 — vitality (high HP).
Pokémon with extreme scores on these components include 幸福蛋 (Blissey), 雷吉艾斯 (Regice), and 壺壺 (Shuckle).

PCA - Pokémon http://140.112.21.35:2880/~tlkagk/pokemon/pca.html The code is modified from http://jkunst.com/r/pokemon-visualize-em-all/

PCA - MNIST
Each digit image $\approx a_1 w^1 + a_2 w^2 + \cdots$, a weighted sum of the components.
The first 30 components, displayed as 28 x 28 images, are the "eigen-digits".
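A sketch of how the eigen-digits and the expansion above could be obtained, reusing pca_components; it assumes `images` is an (N, 784) float array of flattened MNIST digits, already loaded by whatever means:

```python
import numpy as np

# images: (N, 784) array of flattened 28 x 28 MNIST digits (assumed already loaded)
W, _ = pca_components(images, K=30)              # 30 components
eigen_digits = W.reshape(30, 28, 28)             # each row displayed as an image
x_bar = images.mean(axis=0)
a = (images[0] - x_bar) @ W.T                    # a_1, ..., a_30 for one image
reconstruction = a @ W + x_bar                   # ~ a_1 w^1 + a_2 w^2 + ... + x_bar
```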

Weakness of PCA
Unsupervised: PCA may project two classes onto the same region; LDA, which uses labels, can keep them separated (but it is supervised).
Linear: PCA cannot unfold a curved manifold, it can only project it onto a plane.
http://www.astroml.org/book_figures/chapter7/fig_S_manifold_PCA.html

Weakness of PCA
Pixels (28 x 28) → t-SNE (2), versus PCA (32) → t-SNE (2).
Non-linear dimension reduction will be covered in the following lectures.

Acknowledgement
Thanks to 彭冲 for spotting an error in the cited material.
Thanks to Hsiang-Chih Cheng for spotting errors on the slides.

Appendix
http://4.bp.blogspot.com/_sHcZHRnxlLE/S9EpFXYjfvI/AAAAAAAABZ0/_oEQiaR3WVM/s640/dimensionality+reduction.jpg
https://lvdmaaten.github.io/publications/papers/TR_Dimensionality_Reduction_Review_2009.pdf