Estimating Intrinsic Dimension Justin Eberhardt UMD, Mathematics and Statistics Advisor: Dr. Kang James.


1 Estimating Intrinsic Dimension Justin Eberhardt UMD, Mathematics and Statistics Advisor: Dr. Kang James

2 Outline: Introduction; Nearest Neighbor Estimators (Regression Estimator, Maximum Likelihood Estimator, Revised Maximum Likelihood Estimator); Comparison; Summary

3 Intrinsic Dimension Definition: The least number of parameters required to generate a dataset. The minimum number of dimensions that describes a dataset without significant loss of features.

4 Ex 1: Intrinsic Dimension. [Figure: a surface in (x, y, z) is flattened (unrolled) onto the (x, y) plane.] Int Dim = 2

5 Ex 2: Intrinsic Dimension. One image: 28 X 28 pixels = 784 dimensions. [Figure labels: 128, 56]

6 Ex 2: Intrinsic Dimension. Int Dim = 2. [Figure panels: Top & Bottom Loop; No Loop] [Isomap Project, J. Tenenbaum & J. Langford, Stanford]

7 Applications: Biometrics (Facial Recognition, Fingerprints, Iris); Genetics

8 Why do we need to reduce dimensionality? Low-dimensional datasets are more efficient to store and process. Not even supercomputers can handle very high-dimensional matrices. Data in 1, 2, and 3 dimensions can be visualized.

9 Ex: Facial Recognition in MN. 5 million people; 2 images per person (front and profile); 1028 X 1028 pixels per image (about 1 megapixel). Total memory required: n = 5,000,000; p = (2)(1028)(1028) = 2.11 million dimensions; matrix size: (5 x 10^6)(2.11 x 10^6) = about 10 x 10^12 (10 trillion) cells; memory: 2(10 x 10^12) = 20 x 10^12 bytes = 20 terabytes
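The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the factor of 2 in the memory line is read here as 2 bytes per matrix cell, which is an assumption, and the variable names are illustrative.

```python
# Back-of-the-envelope storage estimate for the facial-recognition example.
n_people = 5_000_000
images_per_person = 2
pixels_per_side = 1028       # as stated on the slide
bytes_per_cell = 2           # assumption: the slide's factor of 2 is bytes per cell

p = images_per_person * pixels_per_side ** 2   # raw dimension per person
cells = n_people * p                           # entries in the n x p data matrix
total_bytes = cells * bytes_per_cell

print(f"p = {p:,} dimensions")
print(f"cells = {cells:.3e}")
print(f"memory = {total_bytes / 1e12:.1f} TB")
```

Without rounding, the total comes to roughly 21 TB, which the slide rounds to 20 TB.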

10 Intrinsic Dimension Estimators Objective: To find a simple formula that uses nearest neighbor (NN) information to quickly estimate intrinsic dimension

11 Intrinsic Dimension Estimators Project Description: Through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.

12 Intrinsic Dimension Estimators Note: Traditional methods for estimating Intrinsic Dimension, such as PCA, fail on non-linear manifolds.

13 Intrinsic Dimension Estimators: Nearest-Neighbor Methods. Regression Estimator: K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979. Maximum Likelihood Estimator: E. Levina & P. Bickel, 2005; D. MacKay & Z. Ghahramani, 2005

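14 Distance Matrix
\[
D = \begin{pmatrix}
0 & d_{1,2} & d_{1,3} & \cdots & d_{1,n} \\
d_{2,1} & 0 & d_{2,3} & \cdots & d_{2,n} \\
d_{3,1} & d_{3,2} & 0 & \cdots & d_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{n,1} & d_{n,2} & d_{n,3} & \cdots & 0
\end{pmatrix}
\]
d_{i,j}: Euclidean distance from x_i to x_j (row i, column j). For example, d_{2,3} is the distance from x_2 to x_3.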
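A minimal sketch of building such a distance matrix with the Python standard library. The function name and the three sample points are illustrative, not from the slides.

```python
import math

def distance_matrix(points):
    """Return the n x n matrix D with D[i][j] = Euclidean distance from x_i to x_j."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            D[i][j] = d   # the matrix is symmetric,
            D[j][i] = d   # so each distance is computed once
    return D

# Three collinear points chosen so the distances are easy to verify by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points)
for row in D:
    print(row)
```

Indices here are 0-based, so `D[1][2]` corresponds to d_{2,3} on the slide.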

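15 Nearest Neighbor Matrix
\[
T = \begin{pmatrix}
0 & t_{1,2} & t_{1,3} & \cdots & t_{1,n} \\
0 & t_{2,2} & t_{2,3} & \cdots & t_{2,n} \\
0 & t_{3,2} & t_{3,3} & \cdots & t_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & t_{n,2} & t_{n,3} & \cdots & t_{n,n}
\end{pmatrix}
\]
t_{i,k}: Euclidean distance between x_i and the kth NN to x_i (row i, column k). For example, t_{2,k} is the distance between x_2 and the kth NN to x_2. The first column is 0, the distance from x_i to itself.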
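One simple way to obtain a nearest neighbor matrix is to sort each point's distances to all points. The sketch below follows the slide's indexing, in which each point counts as its own first neighbor, so the first column is 0. The function name and sample points are illustrative.

```python
import math

def nn_matrix(points):
    """Row i holds the sorted distances from x_i to every point, itself included.

    Column 1 (index 0) is therefore 0, and column k (index k - 1) is the
    distance to the k-th NN in the slide's indexing.
    """
    return [sorted(math.dist(x, y) for y in points) for x in points]

# Four points on a line, so the sorted distances can be checked by hand.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
T = nn_matrix(points)
for row in T:
    print(row)
```

Sorting every row costs O(n^2 log n) overall, which is fine for small simulations. Larger datasets would usually call for a spatial index instead.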

16 Notation. m: intrinsic dimension; p: dimension of the raw dataset; n: number of observations; f(x): probability density at observation x; T_{x,k} or T_k: distance from observation x to its kth NN; N(t,x): number of observations within distance t of observation x

17 Notation. [Figure: point x with nearest-neighbor distances t_1, t_2, t_3 and a radius t for which N(t,x) = 3; p = 2, m = 1, N = 12]
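The counting function N(t,x) from the notation can be sketched as below. Whether x itself is included in the count is not stated on the slide. This sketch excludes it, which is an assumption, and the function name is illustrative.

```python
import math

def count_within(points, i, t):
    """N(t, x_i): number of observations other than x_i within distance t of x_i."""
    x = points[i]
    return sum(
        1
        for j, y in enumerate(points)
        if j != i and math.dist(x, y) <= t
    )

# Points on a line: the neighbors of x = (1, 0) lie at distances 1, 1, 2 and 9.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(count_within(points, 1, 1.5))
print(count_within(points, 1, 2.0))
```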

18 NN Regression Estimator. Step 1: density of distance to kth NN (single observation, approximated as Poisson). Step 2a: expected distance to kth NN (single observation). Step 2b: sample-averaged distance to kth NN. Step 3: expected value of the sample-averaged distance to kth NN.

19 Regression Estimator. Trinomial distribution; binomial distribution; pdf of distance to kth NN. Assumptions: f(x) is constant; n is large; f(x) V_t is small.

20 Regression Estimator. Approximate as Poisson; expected distance to kth NN.
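The two steps named here can be filled in with a standard calculation. It is given as a sketch under assumptions that are not on the slide: the density f(x) is treated as a constant f near x, V_m denotes the volume of the unit ball in R^m (so a ball of radius t has volume V_m t^m), g_k is the density of T_k, and \Gamma is the gamma function. Under the Poisson approximation the count N(t,x) has mean n f V_m t^m, which gives

```latex
\begin{align*}
P(T_k > t) &= \sum_{j=0}^{k-1} \frac{(n f V_m t^m)^j}{j!}\, e^{-n f V_m t^m}, \\
g_k(t) &= \frac{m\,(n f V_m)^k}{\Gamma(k)}\, t^{mk-1}\, e^{-n f V_m t^m}, \\
E[T_k] &= \frac{\Gamma(k + 1/m)}{\Gamma(k)}\,(n f V_m)^{-1/m}.
\end{align*}
```

Because \Gamma(k + 1/m) / \Gamma(k) is close to k^{1/m} for large k, log E[T_k] is roughly linear in log k with slope 1/m.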

21 [Equation labels: C_n, G_{k,m}] Estimate m using simple linear regression

22 Ex: Swiss Roll Dataset. m=0.49
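A sketch of the regression idea in standard-library Python follows. It is not the presenter's code. It assumes the form used by Pettis et al., log(G_{k,m} * mean T_k) = (1/m) log k + log C_n with G_{k,m} = k^(1/m) Gamma(k) / Gamma(k + 1/m), and it refits a few times because G_{k,m} itself depends on m. Function names, the choice K = 10, and the test data (a flat 2-D sheet embedded in 3-D) are illustrative.

```python
import math
import random

def mean_knn_distances(points, K):
    """Sample-averaged distance to the k-th NN for k = 1..K (self excluded)."""
    sums = [0.0] * K
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        for k in range(K):
            sums[k] += dists[k]
    return [s / len(points) for s in sums]

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def log_G(k, m):
    """log G_{k,m}, assuming G_{k,m} = k^(1/m) * Gamma(k) / Gamma(k + 1/m)."""
    return math.log(k) / m + math.lgamma(k) - math.lgamma(k + 1.0 / m)

def regression_estimate(points, K=10, iterations=5):
    """Estimate intrinsic dimension m from the slope of log distance vs. log k."""
    t_bar = mean_knn_distances(points, K)
    ks = range(1, K + 1)
    log_k = [math.log(k) for k in ks]
    log_t = [math.log(t) for t in t_bar]
    m = 1.0 / slope(log_k, log_t)        # first pass ignores G_{k,m}
    for _ in range(iterations):          # refit using the current guess for m
        y = [lt + log_G(k, m) for k, lt in zip(ks, log_t)]
        m = 1.0 / slope(log_k, y)
    return m

# Illustrative data: 400 points on a flat 2-D sheet embedded in 3-D.
rng = random.Random(0)
sheet = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    sheet.append((u, v, 0.5 * u + 0.2 * v))

m_hat = regression_estimate(sheet, K=10)
print(f"estimated intrinsic dimension: {m_hat:.2f}")
```

For this sheet the estimate should land near 2. Points close to the edge of the sheet have slightly inflated neighbor distances, so some bias is expected.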

23 Datasets. Faces: raw dim = 4096, int dim ~ 3 to 5. Gaussian Sphere: raw dim = 3, int dim = 3. Swiss Roll: raw dim = 3, int dim = 2. Dbl Swiss Roll: raw dim = 3, int dim = 2.

24 Results: Regression Estimator. K = N / 100. [Figure labels: FACES; ~ 3.0; ~ 2.0; ~ 3.5]
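Two of the synthetic datasets can be simulated along the following lines. The slides do not give the generating formulas, so the parameterization is an assumption: the Swiss roll uses the common form (t cos t, h, t sin t) with t between 1.5 pi and 4.5 pi, and the Gaussian sphere is taken to be a standard 3-D normal sample. Function names and sample sizes are illustrative.

```python
import math
import random

def gaussian_sphere(n, rng):
    """n points from a standard 3-D Gaussian (raw dim 3, int dim 3)."""
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(n)]

def swiss_roll(n, rng, height=20.0):
    """n points on a Swiss roll: a 2-D sheet (angle, height) rolled up in 3-D."""
    pts = []
    for _ in range(n):
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())  # angle along the spiral
        h = height * rng.random()                       # position along the roll's axis
        pts.append((t * math.cos(t), h, t * math.sin(t)))
    return pts

rng = random.Random(0)
sphere = gaussian_sphere(500, rng)
roll = swiss_roll(500, rng)
print(sphere[0])
print(roll[0])
```

Each Swiss roll point is determined by two parameters (t and h), which is why its intrinsic dimension is 2 even though it is stored in 3 coordinates.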

25 NN Maximum Likelihood Estimator. Counting process: binomial (approximated as Poisson). Joint counting probability. Joint occurrence density. Log-likelihood function.

26 Maximum Likelihood Estimator. N(t,x) = number of counts within distance t of x. The number of counts between distances r and s is binomial.

27 Maximum Likelihood Estimator

28 Joint pdf of Distances to K NN

29 Log-Likelihood Function
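For reference, the log-likelihood in Levina and Bickel's formulation can be written as follows. This is a sketch taken from that formulation rather than from the slide, and the symbols are assumptions here: N(t) = N(t,x) is treated as an inhomogeneous Poisson process on 0 < t < R with rate \lambda(t), V(m) is the volume of the unit ball in R^m, and T_j(x) is the distance from x to its jth nearest neighbor not counting x itself.

```latex
\begin{align*}
\lambda(t) &= f(x)\, V(m)\, m\, t^{m-1}, \\
L\bigl(m, f(x)\bigr) &= \int_0^R \log \lambda(t)\, dN(t) - \int_0^R \lambda(t)\, dt, \\
\hat{m}_k(x) &= \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x)}{T_j(x)} \right]^{-1}.
\end{align*}
```

Maximizing L over m, with the radius R fixed at the kth NN distance, gives the local estimate in the last line.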

30 E. Levina & P. Bickel: averaging over N observations. D. MacKay & Z. Ghahramani: averaging inverses over N observations (using MLE).

31 Results: MLE Estimator (Revised, MacKay & Ghahramani). K = N / 100. [Figure labels: FACES; ~ 3.0; ~ 2.0; ~ 2.1; ~ 3.5]
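The difference between the two averaging schemes can be sketched in standard-library Python. This is an illustration rather than the presenter's code, and it makes several assumptions: T_j is the distance to the jth nearest neighbor excluding the point itself, the normalization is 1/(k-1), all points are distinct, and the function names and test data are invented for the example.

```python
import math
import random

def local_inverse_estimates(points, k):
    """For each x_i, the mean over j = 1..k-1 of log(T_k(x_i) / T_j(x_i)).

    This is the inverse of the local dimension estimate at x_i.
    Assumes distinct points, so no neighbor distance is zero.
    """
    out = []
    for i, x in enumerate(points):
        d = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        out.append(sum(math.log(d[k - 1] / d[j]) for j in range(k - 1)) / (k - 1))
    return out

def mle_levina_bickel(points, k):
    """Average the local estimates over all observations."""
    inv = local_inverse_estimates(points, k)
    return sum(1.0 / v for v in inv) / len(inv)

def mle_mackay_ghahramani(points, k):
    """Average the inverse local estimates, then invert once."""
    inv = local_inverse_estimates(points, k)
    return len(inv) / sum(inv)

# Illustrative data: 400 points on a flat 2-D sheet embedded in 3-D.
rng = random.Random(0)
sheet = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    sheet.append((u, v, 0.5 * u + 0.2 * v))

m_lb = mle_levina_bickel(sheet, k=10)
m_mg = mle_mackay_ghahramani(sheet, k=10)
print(f"averaging estimates: {m_lb:.2f}")
print(f"averaging inverses:  {m_mg:.2f}")
```

Averaging the estimates is an arithmetic mean and averaging the inverses is a harmonic mean of the same local values, so the first result is never smaller than the second.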

32 Comparison

33 Comparison

34 Comparison

35 Comparison

36 Comparison

37 Comparison

38 Isomap

39 Summary. The regression and revised MLE estimators share similar characteristics when intrinsic dimension is small. As intrinsic dimension increases, the estimators become more dependent on K. Distribution type does not appear to be highly influential when the intrinsic dimension is small.

40 Thank You! Dr. Kang James & Dr. Barry James, Dr. Steve Trogdon

41 Example Swiss Roll Data Int Dim = 2

