
1 CS621 : Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 28: Principal Component Analysis; Latent Semantic Analysis

2 Desired Features of Search Engines Meaning based –More relevant results Multilingual –e.g., query in English, fetch a matching document in Hindi, and show it in English

3 Precision (P) and Recall (R) Let A be the set of actually relevant documents, O the set of obtained (retrieved) documents, and S their intersection (the shaded area in the slide's Venn diagram). P = |S|/|O| R = |S|/|A| There is a trade-off between P and R.
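As a quick illustration (my addition, not on the original slide), both measures can be computed directly from the two sets; the document IDs below are made up for the example.

```python
def precision_recall(actual, obtained):
    """Compute precision and recall from the relevant set A and the retrieved set O."""
    s = actual & obtained          # S: relevant documents that were retrieved
    p = len(s) / len(obtained)     # P = |S| / |O|
    r = len(s) / len(actual)       # R = |S| / |A|
    return p, r

# Hypothetical document IDs, just to exercise the formulas
actual = {1, 2, 3, 4}              # A: actually relevant documents
obtained = {2, 3, 5}               # O: documents returned by the engine
print(precision_recall(actual, obtained))  # (0.666..., 0.5)
```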

4 Impediments to Good P and R Synonymy: a word in the document will not match its synonym in the query, bringing down recall –Query: "Planes for Bangkok" –Text in the document: "Flights to Bangkok" Polysemy: a word in the query will bring up documents containing the same word, but used in a different sense, bringing down precision –Query: "Planes for Bangkok" –Text in the document: "Cartesian planes"

5 Principal Component Analysis

6 Example: IRIS Data (only 3 of the 150 records shown)

ID   Petal Length (a1)  Petal Width (a2)  Sepal Length (a3)  Sepal Width (a4)  Classification
001  5.1                3.5               1.4                0.2               Iris-setosa
051  7.0                3.2               4.7                1.4               Iris-versicolor
101  6.3                3.3               6.0                2.5               Iris-virginica

7 Training and Testing Data Training: 80% of the data; 40 from each class; 120 in total Testing: the remaining 30 Do we have to consider all 4 attributes for classification? Do we need 4 neurons in the input layer? Fewer neurons in the input layer reduce the overall size of the network and thereby reduce training time. It is also likely to improve generalization performance (Occam's Razor: a simpler hypothesis, i.e., a smaller neural net, generalizes better).
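A minimal sketch of the 80/20 split described above (my own illustration), assuming the data is available as a list of tuples whose last element is the class label:

```python
import random

def stratified_split(rows, train_per_class=40, seed=0):
    """Split labeled rows into train/test, taking a fixed number per class."""
    random.seed(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[-1], []).append(row)  # group rows by class label
    train, test = [], []
    for label, group in by_class.items():
        random.shuffle(group)
        train.extend(group[:train_per_class])   # 40 per class -> 120 total
        test.extend(group[train_per_class:])    # remaining 10 per class -> 30 total
    return train, test
```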

8 The multivariate data

X1   X2   X3   X4   X5   …  Xp
x11  x12  x13  x14  x15  …  x1p
x21  x22  x23  x24  x25  …  x2p
x31  x32  x33  x34  x35  …  x3p
x41  x42  x43  x44  x45  …  x4p
…
xn1  xn2  xn3  xn4  xn5  …  xnp

(n observations as rows, p variables as columns)

9 Some preliminaries Sample mean for the i-th variable: $\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$ Sample variance for the i-th variable: $\sigma_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \mu_i)^2$ Sample covariance: $c_{ab} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{aj} - \mu_a)(x_{bj} - \mu_b)$ This measures how the variables co-vary in the data; the correlation coefficient normalizes it: $r_{ab} = \frac{c_{ab}}{\sigma_a \sigma_b}$
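A small numpy check of these definitions (my own illustration with toy numbers); note that numpy's cov and var use the same n−1 denominator when asked to:

```python
import numpy as np

# Toy data matrix: n = 4 observations (rows) on p = 2 variables (columns)
X = np.array([[5.1, 3.5],
              [7.0, 3.2],
              [6.3, 3.3],
              [5.8, 2.7]])

mu = X.mean(axis=0)                  # sample means, one per variable
var = X.var(axis=0, ddof=1)          # sample variances (n-1 denominator)
C = np.cov(X, rowvar=False)          # sample covariance matrix
R = np.corrcoef(X, rowvar=False)     # correlation matrix: r_ab = c_ab / (sigma_a * sigma_b)
print(mu, var, C, R, sep="\n")
```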

10 Standardize the variables For each variable, replace each value by $y_{ij} = \frac{x_{ij} - \mu_i}{\sigma_i}$ The covariance matrix of the standardized variables is the correlation matrix R.
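A sketch of the standardization step in numpy (illustrative, continuing the toy matrix from the previous snippet):

```python
import numpy as np

def standardize(X):
    """Center each column and scale by its sample standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # standard deviation with n-1 denominator
    return (X - mu) / sigma

Y = standardize(np.array([[5.1, 3.5],
                          [7.0, 3.2],
                          [6.3, 3.3],
                          [5.8, 2.7]]))
R = np.cov(Y, rowvar=False)         # equals np.corrcoef of the original data
```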

11 Short digression: Eigenvalues and Eigenvectors $AX = \lambda X$, written out: $a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1p}x_p = \lambda x_1$ $a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2p}x_p = \lambda x_2$ … $a_{p1}x_1 + a_{p2}x_2 + a_{p3}x_3 + \cdots + a_{pp}x_p = \lambda x_p$ Here the $\lambda$s are the eigenvalues, and the solution $X$ for each $\lambda$ is the corresponding eigenvector.

12 Short digression: To find the Eigenvalues and Eigenvectors Solve the characteristic equation $\det(A - \lambda I) = 0$, where $I$ is the identity matrix, so $\lambda I = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$ in the 2×2 case. Example: $A = \begin{pmatrix} -9 & 4 \\ 7 & -6 \end{pmatrix}$ Characteristic equation: $(-9-\lambda)(-6-\lambda) - 28 = 0$ Real eigenvalues: $-13$ and $-2$ Eigenvector of eigenvalue $-13$: $(-1, 1)^T$; eigenvector of eigenvalue $-2$: $(4, 7)^T$ Verify: $\begin{pmatrix} -9 & 4 \\ 7 & -6 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 13 \\ -13 \end{pmatrix} = -13\begin{pmatrix} -1 \\ 1 \end{pmatrix}$
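The same example can be checked with numpy (my addition; note that numpy returns unit-length eigenvectors, i.e., scalar multiples of the vectors above):

```python
import numpy as np

A = np.array([[-9.0,  4.0],
              [ 7.0, -6.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors
print(eigvals)                        # approximately [-13., -2.] (order may vary)
print(eigvecs)                        # each column is a scaled (-1, 1) or (4, 7)
```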

13 Next step in finding the PCs Find the eigenvalues and eigenvectors of R
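Putting the steps together, a sketch of PCA via the correlation matrix in numpy (illustrative; eigh is used because R is symmetric, and since it returns eigenvalues in ascending order they are re-sorted to descending):

```python
import numpy as np

def pca_via_correlation(X):
    """Return eigenvalues (descending), matching eigenvectors of R, and standardized data."""
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the variables
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]                  # re-sort to descending
    return eigvals[order], eigvecs[:, order], Y
```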

14 Example 49 birds: 21 survived a storm and 28 died. 5 body characteristics are given: X1: body length; X2: alar extent; X3: beak and head length; X4: humerus length; X5: keel length Could we have predicted the fate of a bird from its body characteristics?

15 Eigenvalues and Eigenvectors of R

Component  Eigenvalue   Eigenvector V_i (coefficients of X1 … X5)
1          3.612         0.452   0.462   0.451   0.471   0.398
2          0.532        -0.051   0.300   0.325   0.185  -0.877
3          0.386         0.691   0.341  -0.455  -0.411  -0.179
4          0.302        -0.420   0.548  -0.606   0.388   0.069
5          0.165         0.374  -0.530  -0.343   0.652  -0.192

16 Which principal components are important? Total variance in the data = $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5$ = sum of the diagonal entries of R = 5 First eigenvalue ≈ 3.616, i.e., about 72% of the total variance of 5 Second ≈ 10.6%, Third ≈ 7.7%, Fourth ≈ 6.0%, Fifth ≈ 3.3% The first PC is the most important, and is sufficient for studying the classification.

17 Forming the PCs Z1 = 0.452 X1 + 0.462 X2 + 0.451 X3 + 0.471 X4 + 0.398 X5 Z2 = -0.051 X1 + 0.300 X2 + 0.325 X3 + 0.185 X4 - 0.877 X5 For all 49 birds, find the first two principal components. This becomes the new data; classify using it.
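As a sketch (my own illustration, reusing the outputs of the pca_via_correlation function above), the projection is just a matrix product of the standardized data with the first two eigenvectors:

```python
import numpy as np

def first_two_pcs(Y, V):
    """Project standardized data Y (49 x 5) onto the first two principal components.

    V is the 5 x 5 eigenvector matrix with columns sorted by descending eigenvalue.
    """
    return Y @ V[:, :2]   # Z has shape (49, 2): columns are Z1 and Z2
```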

18 For the first bird $X_1 = 156$, $X_2 = 245$, $X_3 = 31.6$, $X_4 = 18.5$, $X_5 = 20.5$ After standardizing: $Y_1 = (156 - 157.98)/3.65 = -0.54$, $Y_2 = (245 - 241.33)/5.1 = 0.73$, $Y_3 = (31.6 - 31.5)/0.8 = 0.17$, $Y_4 = (18.5 - 18.46)/0.56 = 0.05$, $Y_5 = (20.5 - 20.8)/0.99 = -0.33$ PC1 for the first bird: $Z_1 = 0.45 \times (-0.54) + 0.46 \times 0.73 + 0.45 \times 0.17 + 0.47 \times 0.05 + 0.39 \times (-0.33) = 0.064$ Similarly, $Z_2 = 0.602$

19 Reduced Classification Data Instead of the 49-row data with the five columns $X_1, X_2, X_3, X_4, X_5$, use the 49-row data with the two columns $Z_1, Z_2$.

20 Other Multivariate Data Analysis Procedures –Factor Analysis –Discriminant Analysis –Cluster Analysis

21 Latent Semantic Analysis and Singular Value Decomposition Slides based on "Introduction to Information Retrieval", Manning, Raghavan and Schütze, Cambridge University Press, 2008.

22 Term-Document Matrix Terms as rows, documents as columns
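A minimal sketch of building such a matrix from raw texts (the toy documents and the simple whitespace tokenization are my own, not from the slides):

```python
import numpy as np

docs = ["ship ocean voyage", "boat ocean", "voyage trip"]   # hypothetical documents
terms = sorted({w for d in docs for w in d.split()})        # vocabulary = row labels

# C[i, j] = count of term i in document j
C = np.array([[d.split().count(t) for d in docs] for t in terms])
print(terms)
print(C)
```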

23 Singular Value Decomposition of the Term vs. Document Matrix
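For reference, the standard factorization (in the notation of the IIR book) is

$$ C = U \Sigma V^{T} $$

where $C$ is the M × N term-document matrix, the columns of $U$ are orthogonal eigenvectors of $CC^T$ (term-term co-occurrence), the columns of $V$ are orthogonal eigenvectors of $C^TC$ (document-document overlap), and the diagonal of $\Sigma$ holds the singular values.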

24 Low Rank Approximation Given an M × N matrix C and a positive integer k, we wish to find an M × N matrix $C_k$ of rank at most k, so as to minimize the Frobenius norm of the matrix difference $X = C - C_k$, defined to be $$\|X\|_F = \sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N} X_{ij}^{2}}$$
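As stated in the IIR book, the minimizing $C_k$ is obtained from the SVD of $C$ by replacing all but the $k$ largest singular values by zeros:

$$ C_k = U \Sigma_k V^{T}, \qquad \Sigma_k = \operatorname{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0) $$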

25 Example: Term-Document Matrix

26 Singular Values

27 Truncated SVD Matrix Retain only the first two singular values
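A sketch of rank-2 truncation with numpy (illustrative; the toy matrix here stands in for the term-document matrix built earlier):

```python
import numpy as np

def truncated_svd(C, k=2):
    """Rank-k approximation of C: keep only the k largest singular values."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s[k:] = 0.0                      # zero out all but the first k singular values
    return U @ np.diag(s) @ Vt       # C_k = U * Sigma_k * V^T

C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
C2 = truncated_svd(C, k=2)
```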

28 Pros and Cons –A kind of soft clustering on terms –Documents pass through the LSA processing, and so do the queries –No known efficient method of computation at web scale (billions of documents!) –Important: LSA tries to capture association, a recurring notion in AI
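For completeness (standard LSA/LSI practice following the IIR book; the mapping below is not spelled out on the slide), a query vector $\vec{q}$ is folded into the k-dimensional latent space before being compared with the reduced document representations:

$$ \vec{q}_k = \Sigma_k^{-1} U_k^{T} \vec{q} $$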

