Presentation on theme: "Unsupervised Learning II Feature Extraction"— Presentation transcript:

1 Unsupervised Learning II Feature Extraction

2 Feature Extraction Techniques
Unsupervised methods can also be used to find features that are useful for categorization; such methods amount to a form of smart feature extraction. Feature extraction transforms the input data into a set of features that still describes the data with sufficient accuracy. In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction.

3 What is feature reduction?
Feature reduction maps the original (high-dimensional) data to reduced (low-dimensional) data through a linear transformation, as sketched below.
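A minimal NumPy sketch of this picture. The projection matrix W here is random, purely an assumption to illustrate the shapes; later slides show how PCA chooses W:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))   # original data: 100 samples, 50 features
W = rng.standard_normal((50, 5))     # a linear transformation: 50 dims -> 5 dims

X_reduced = X @ W                    # reduced data: 100 samples, 5 features
print(X_reduced.shape)               # (100, 5)
```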

4 High-dimensional data
Gene expression Face images Handwritten digits

5 Example
Car: km/hour and mile/hour record the same speed, so one of the two features is redundant.
Machine learning course: learning time, interest, and final grade are strongly correlated features.

6 Why feature reduction? Most machine learning and data mining techniques may not be effective for high-dimensional data. When the input data to an algorithm is too large to be processed, it is often also redundant (much data, but not much information), e.g., storing the same speed as both km/s and mi/s. Analysis with a large number of variables generally requires a large amount of memory and computing power, or a classification algorithm that overfits the training sample and generalizes poorly to new samples. The intrinsic dimension may be small; for example, the number of genes responsible for a certain type of disease may be small.

7 Why feature reduction? Visualization: projection of high-dimensional data onto 2D or 3D. Data compression: efficient storage and retrieval. Noise removal: positive effect on query accuracy.

8 Feature reduction versus feature selection
Feature reduction: all original features are used, and the transformed features are linear combinations of the original features.
Feature selection: only a subset of the original features is used.
Feature reduction is continuous; feature selection is discrete.

9 Application of feature reduction
Face recognition Handwritten digit recognition Text mining Image retrieval Microarray data analysis Protein classification

10 Algorithms
Feature extraction techniques:
Principal component analysis
Independent component analysis
Non-negative matrix factorization
Singular value decomposition

11 What is Principal Component Analysis?
Principal component analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set, that retains most of the sample's information; this is useful for the compression and classification of data. By information we mean the variation present in the sample, as given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total information each retains.
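A sketch of PCA from first principles in NumPy (illustrative, not a library recipe): center the data, take the eigenvectors of the covariance matrix, and project onto the leading ones.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components (illustrative sketch)."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    cov = np.cov(X_centered, rowvar=False)   # variances/correlations of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]        # order PCs by the variance each retains
    W = eigvecs[:, order[:n_components]]
    return X_centered @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Z = pca(X, n_components=2)
print(Z.shape)                               # (200, 2)
print(np.cov(Z, rowvar=False).round(3))      # ~diagonal: the PCs are uncorrelated
```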

12 Geometric picture of principal components (PCs)
The 1st PC is a minimum-distance fit to a line in X space. The 2nd PC is a minimum-distance fit to a line in the plane perpendicular to the 1st PC. PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.

13 Background Mathematics
A matrix is a set of elements organized into rows and columns. A = [aij] is a matrix with m rows and n columns, where the entries aij of A are real numbers.

14 Vector The ith element of a vector x is denoted xi

15 Basic Notation

16 Basic Operations
Addition: just add corresponding elements. Subtraction: just subtract corresponding elements. Multiplication: multiply each row by each column.
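The three operations in NumPy, for concreteness:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A + B)   # addition: add corresponding elements
print(A - B)   # subtraction: subtract corresponding elements
print(A @ B)   # multiplication: each row of A times each column of B
```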

17 Multiplication Is AB = BA? Maybe, but maybe not!
The product of an m × n matrix A and an n × p matrix B is the m × p matrix AB whose entries are (AB)ij = Σk aik bkj. Heads up: multiplication is NOT commutative; AB ≠ BA in general!

18 Multiplication Example
The worked example's result is a [3 × 3] matrix; a sketch with assumed shapes follows below.
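A sketch of both points. The [3 × 2] and [2 × 3] shapes are assumed here, chosen only to reproduce a [3 × 3] result:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 x 2
B = np.array([[1, 0, 1],
              [2, 1, 0]])     # 2 x 3

print((A @ B).shape)          # (3, 3): [3 x 2] times [2 x 3] gives [3 x 3]
print((B @ A).shape)          # (2, 2): reversing the order gives a different result

# Even for square matrices, AB != BA in general:
C = np.array([[0, 1], [0, 0]])
D = np.array([[0, 0], [1, 0]])
print(np.array_equal(C @ D, D @ C))   # False
```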

19 Vector Operations
Vector: an N × 1 matrix. Interpretation: a directed line segment in N-dimensional space. Dot product, cross product, and magnitude are defined on vectors only.

20 Basic concepts
Transpose: reflect a vector/matrix across its main diagonal. Note: (Ax)^T = x^T A^T.
Vector norms: the Lp norm of v = (v1,…,vk) is (Σi |vi|^p)^(1/p). Common norms: L1, L2.
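A small check of the norm definition against NumPy's built-in (the example vector is assumed):

```python
import numpy as np

v = np.array([3.0, -4.0])

def lp_norm(v, p):
    """Lp norm of v: (sum_i |v_i|^p)^(1/p)."""
    return (np.abs(v) ** p).sum() ** (1.0 / p)

print(lp_norm(v, 1))             # L1 norm: 7.0
print(lp_norm(v, 2))             # L2 norm: 5.0
print(np.linalg.norm(v, ord=1))  # 7.0, via NumPy
print(np.linalg.norm(v))         # 5.0, NumPy's default is L2
```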

21 Vectors: Dot Product
Interpretation: the dot product measures to what degree two vectors are aligned. (Vector addition, shown in the figure, combines vectors head-to-tail: A + B = C.)

22 Vectors: Dot Product
Think of the dot product as a matrix multiplication: x·y = x^T y (by contrast, the outer product of the vectors, x y^T, is a matrix). The squared magnitude is the dot product of a vector with itself: ||x||^2 = x^T x. The dot product is also related to the angle between the two vectors, x·y = ||x|| ||y|| cos θ, but by itself it doesn't tell us the angle.
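The same statements in NumPy, with assumed example vectors:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

print(x @ y)            # dot product as matrix multiplication: x^T y = 1.0
print(np.outer(x, y))   # outer product x y^T: a 2 x 2 matrix
print(np.sqrt(x @ x))   # magnitude ||x|| = sqrt(x . x) = 1.0

# Recovering the angle needs the magnitudes too: x . y = ||x|| ||y|| cos(theta)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # 45.0
```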

23 Linear Equations
Linear algebra provides a way of representing and operating on sets of linear equations: a system of linear equations can be written compactly as Ax = b, with coefficient matrix A, unknown vector x, and right-hand-side vector b.

24 Types of matrices
A square matrix whose elements aij = 0 for i > j is called upper triangular (all entries below the main diagonal are zero). A square matrix whose elements aij = 0 for i < j is called lower triangular (all entries above the main diagonal are zero).

25 Types of matrices
A matrix that is both upper and lower triangular, i.e., aij = 0 for i ≠ j, is called a diagonal matrix.

26 Types of matrices
Identity matrix: when, in particular, a11 = a22 = … = ann = 1, the diagonal matrix is called the identity matrix, written I. Property: AI = IA = A. Examples: the 2 × 2 and 3 × 3 identity matrices, with ones on the diagonal and zeros elsewhere.

27 Types of matrices
The inverse of a matrix: if matrices A and B are such that AB = BA = I, then B is called the inverse of A (symbol: A^-1) and A is called the inverse of B (symbol: B^-1). Example: to show that B is the inverse of matrix A, verify that AB = I and BA = I.
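A numeric check of the defining property, with an assumed 2 × 2 example (the slide's own matrices are not shown in this transcript):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.linalg.inv(A)                    # B plays the role of A^-1

print(np.allclose(A @ B, np.eye(2)))    # AB = I (up to floating-point error)
print(np.allclose(B @ A, np.eye(2)))    # BA = I
```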

28 Types of matrices
The transpose of a matrix: the matrix obtained by interchanging the rows and columns of a matrix A is called the transpose of A (written A^T). For a matrix A = [aij], its transpose A^T = [bij] satisfies bij = aji.

29 Types of matrices
Symmetric matrix: a matrix A such that A^T = A is called symmetric, i.e., aji = aij for all i and j. A + A^T must be symmetric. Why? Because (A + A^T)^T = A^T + A = A + A^T. A matrix A such that A^T = -A is called skew-symmetric, i.e., aji = -aij for all i and j. A - A^T must be skew-symmetric. Why? Because (A - A^T)^T = A^T - A = -(A - A^T).

30 Types of matrices
Orthogonal matrix: a matrix A is called orthogonal if AA^T = A^T A = I, i.e., A^T = A^-1. Example: to prove a given matrix is orthogonal, compute AA^T and A^T A and check that both equal I. We'll see that an orthogonal matrix in fact represents a rotation!
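The slide's example matrix is lost from this transcript; a rotation matrix is a standard stand-in and makes the same point:

```python
import numpy as np

theta = np.pi / 6                          # rotation by 30 degrees (assumed example)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A @ A.T, np.eye(2)))     # A A^T = I
print(np.allclose(A.T @ A, np.eye(2)))     # A^T A = I
print(np.allclose(A.T, np.linalg.inv(A)))  # hence A^T = A^-1
```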

31 Properties of matrices
(AB)^-1 = B^-1 A^-1
(A^T)^T = A and (λA)^T = λ A^T
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T

32 Properties of matrices
Example: prove (AB)^-1 = B^-1 A^-1.
Since (AB)(B^-1 A^-1) = A(BB^-1)A^-1 = AA^-1 = I and (B^-1 A^-1)(AB) = B^-1(A^-1 A)B = B^-1 B = I, B^-1 A^-1 is the inverse of matrix AB.

33 Determinants
Determinant of order 2: consider a 2 × 2 matrix A = [aij]. The determinant of A, denoted |A| (or det A), is a number that can be evaluated by |A| = a11 a22 - a12 a21.

34 Determinants
Easy to remember (for order 2 only): the product along the main diagonal (+) minus the product along the anti-diagonal (-). Example: evaluate a determinant with this rule; a worked check follows below.
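The worked check mentioned above, with an assumed 2 × 2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Main-diagonal product (+) minus anti-diagonal product (-):
det_by_hand = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(det_by_hand)         # -2.0
print(np.linalg.det(A))    # -2.0 (up to floating-point error)
```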

35 Determinants
The following properties are true for determinants of any order:
If every element of a row (column) is zero, then |A| = 0.
|A^T| = |A| (the determinant of a matrix equals that of its transpose).
|AB| = |A| |B|.

36 Determinants
Example: show that the determinant of any orthogonal matrix is either +1 or -1. For any orthogonal matrix, AA^T = I. Since |AA^T| = |A| |A^T| = |I| = 1 and |A^T| = |A|, we get |A|^2 = 1, so |A| = ±1.

37 Determinants of order 3
Example: for a 3 × 3 matrix A = [aij], the determinant expands as |A| = a11(a22 a33 - a23 a32) - a12(a21 a33 - a23 a31) + a13(a21 a32 - a22 a31).

38 Determinants
Inversion number: the inversion number t of a permutation counts, for each element, how many smaller elements follow it. Example: t[4132] = 3+0+1+0 = 4.
For the fully reversed permutation, t[n, n-1, …, 2, 1] = n(n-1)/2.
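A short function that counts inversions, reproducing both examples:

```python
def inversion_number(perm):
    """Count pairs (i, j) with i < j but perm[i] > perm[j]."""
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm))
                 if perm[i] > perm[j])

print(inversion_number([4, 1, 3, 2]))           # 3 + 0 + 1 + 0 = 4
n = 5
print(inversion_number(list(range(n, 0, -1))))  # n(n-1)/2 = 10 for n = 5
```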

39 Cofactor of a matrix
The cofactor Cij for element aij of matrix A is Cij = (-1)^(i+j) Mij, where the minor Mij is the determinant of the submatrix obtained by deleting row i and column j of A.

40 Cofactor of a matrix
The cofactor matrix of A is then given by C = [Cij], the matrix of the cofactors of all its elements.

41 Determinants of order 3
Consider a 3 × 3 matrix A. Its determinant can be obtained by cofactor expansion along the first row: |A| = a11 C11 + a12 C12 + a13 C13.

42 Inverse of a 33 matrix Inverse matrix of is given by:

43 Inverse of matrix
Row switching: switch two rows of A and call the new matrix A1. Based on the determinant formula, |A| = -|A1|. Therefore, if two rows are the same, switching them changes nothing, so |A| = -|A|, which forces |A| = 0.

44 Inverse of matrix

45 Matrices as linear transformations
A matrix maps vectors to vectors, and common matrices have geometric meaning: a diagonal matrix such as [2 0; 0 1] performs a stretching, and [cos θ -sin θ; sin θ cos θ] performs a rotation.

46 Matrices as linear transformations
Further examples: [-1 0; 0 1] performs a reflection (across the y-axis), [1 0; 0 0] a projection (onto the x-axis), and [1 c; 0 1] a shearing. A sketch applying all five transformations follows below.
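The sketch promised above: the specific matrices are assumed (the slide's own are not shown in this transcript), and applying each to the same vector shows what it does geometrically.

```python
import numpy as np

v = np.array([1.0, 1.0])
theta = np.pi / 2                                      # 90-degree rotation

transforms = {
    "stretch": np.array([[2.0, 0.0], [0.0, 1.0]]),     # stretch x by a factor of 2
    "rotate":  np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]]),
    "reflect": np.array([[-1.0, 0.0], [0.0, 1.0]]),    # reflect across the y-axis
    "project": np.array([[1.0, 0.0], [0.0, 0.0]]),     # project onto the x-axis
    "shear":   np.array([[1.0, 1.0], [0.0, 1.0]]),     # shear x by y
}

for name, M in transforms.items():
    print(name, M @ v)
```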

47 Vector spaces
Formally, a vector space is a set of vectors that is closed under addition and under multiplication by real numbers. A subspace is a subset of a vector space that is itself a vector space; e.g., the plane z = 0 is a subspace of R^3 (it is essentially R^2). We'll be looking at R^n and subspaces of R^n. Our notion of planes in R^3 may be extended to hyperplanes in R^n (of dimension n-1). Note: subspaces must include the origin (zero vector).

48 Linear system & subspaces
Linear systems define certain subspaces. Ax = b is solvable iff b may be written as a linear combination of the columns of A. The set of such vectors b forms a subspace called the column space of A; in the slide's figure it is the plane spanned by the columns (1,2,1) and (0,3,3).

49 Linear system & subspaces
The set of solutions to Ax = 0 forms a subspace called the null space of A. In the slides' examples, one matrix has null space {(0,0)} and another has null space {(c, c, -c)}. A numeric sketch follows below.
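A sketch of computing a null space numerically via the SVD; the matrix below is an assumption, chosen so that its null space is exactly the {(c, c, -c)} line mentioned above.

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Basis for {x : Ax = 0}, read off from the SVD (a sketch; scipy offers a similar helper)."""
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T            # right singular vectors with zero singular value

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
N = null_space(A)
print(N.ravel())                  # a multiple of (1, 1, -1), i.e. the {(c, c, -c)} line
print(np.allclose(A @ N, 0))      # True: every null-space vector solves Ax = 0
```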

50 Linear independence and basis
Vectors v1,…,vk are linearly independent if c1 v1 + … + ck vk = 0 implies c1 = … = ck = 0, i.e., the null space of the matrix with columns v1,…,vk is just the origin. Recall the earlier example whose null space contained only (u,v) = (0,0): its columns are linearly independent.

51 Linear independence and basis
If all vectors in a vector space may be expressed as linear combinations of v1,…,vk, then v1,…,vk span the space. For example, both {(1,0,0), (0,1,0), (0,0,1)} and {(.9,.2,0), (.3,1,0), (.1,.2,1)} span R^3. A basis is a set of linearly independent vectors that span the space.

52 Projections
The projection of b onto (the line through) a is proj_a(b) = (a·b / a·a) a. In the slide's figures, b = (2,2) is projected onto a = (1,0), giving (2,0); similarly, (2,2,2) projected onto the plane spanned by (1,0,0) and (0,1,0) gives (2,2,0).
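The projection formula as a tiny helper (the name project is ours), reproducing the slide's b = (2,2), a = (1,0) example:

```python
import numpy as np

def project(b, a):
    """Orthogonal projection of b onto the line through a: (a.b / a.a) a."""
    return (a @ b) / (a @ a) * a

b = np.array([2.0, 2.0])
a = np.array([1.0, 0.0])
print(project(b, a))   # [2. 0.]: the component of b along a
```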

53 Eigenvalues & eigenvectors
How can we characterize matrices? By the solutions to Ax = λx, written as eigenpairs (λ, x) = (eigenvalue, eigenvector), where x is non-zero. To solve this, rewrite it as (A - λI)x = 0; λ is an eigenvalue iff det(A - λI) = 0.

54 Eigenvalues & eigenvectors
Example: the matrix [2 0; 0 1] has eigenvalues λ = 2, 1 with eigenvectors (1,0), (0,1). Eigenvectors of a linear transformation A are not rotated when A is applied, only scaled by the corresponding eigenvalue: here v = (1,0) maps to Av = (2,0).
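A check with NumPy, using the diagonal matrix diag(2, 1) assumed to match the slide's eigenpairs:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])              # eigenvalues 2 and 1

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                          # [2. 1.]
print(eigvecs)                          # columns (1,0) and (0,1)

v = eigvecs[:, 0]
print(A @ v, eigvals[0] * v)            # Av = lambda v: v is only scaled, not rotated
```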

