NHDC and PHDC: Local and Global Heat Diffusion Based Classifiers. Haixuan Yang, Group Meeting, Sep 26, 2005.

1 NHDC and PHDC: Local and Global Heat Diffusion Based Classifiers. Haixuan Yang, Group Meeting, Sep 26, 2005

2 Outline
  - Introduction
  - Graph Heat Diffusion Model
  - NHDC and PHDC algorithms
  - Connections with other models
  - Experiments
  - Conclusions and future work

3 Introduction
Kondor & Lafferty (NIPS 2002)
  - Construct a diffusion kernel on a graph
  - Handle discrete attributes
  - Apply to a large margin classifier
  - Achieve good accuracy on 5 data sets from UCI
Lafferty & Lebanon (JMLR 2005)
  - Construct a diffusion kernel on a special manifold
  - Handle continuous attributes
  - Restrict to text classification
  - Apply to SVM
  - Achieve good accuracy on WebKB and Reuters
Belkin & Niyogi (Neural Computation 2003)
  - Reduce dimension by heat kernel and local distance
Tenenbaum et al. (Science 2000)
  - Reduce dimension by local distance

4 Introduction
We inherit the ideas:
  - Local information is relatively accurate in a nonlinear manifold.
  - The way heat diffuses on a manifold is related to the density of the data on the manifold: a point where heat diffuses rapidly is one that has high density.
  - For example, in the ideal case when the manifold is Euclidean space, heat diffuses in the same way as the Gaussian density: [equation]
  - The way heat diffuses on a manifold can be understood as a generalization of the Gaussian density from Euclidean space to a manifold.
  - Learn local information by K nearest neighbors.
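
For reference, the standard heat kernel on m-dimensional Euclidean space (the fundamental solution of the heat equation ∂f/∂t = Δf) has the Gaussian form referred to above. The symbols m, x, y, and t are notation chosen here and are not taken from the slides:

```latex
K_t(x, y) = \frac{1}{(4\pi t)^{m/2}} \exp\!\left(-\frac{\lVert x - y \rVert^2}{4t}\right)
```

As a function of y, this is a Gaussian density with mean x and variance 2t in each coordinate.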

5 Introduction
We think differently:
  - The manifold is unknown in most cases.
  - Even for a known manifold, the solution is usually unknown.
  - The explicit form of the approximation to the solution in (Lafferty & Lebanon, JMLR 2005), [equation], is a rare case.
  - Establish the heat diffusion equation directly on a graph formed by K nearest neighbors.
  - The solution always has an explicit form.
  - Form a classifier directly from the solution.

6 Illustration [figure panels: the first heat diffusion; the second heat diffusion]

7 Illustration [figure]

8 [figure]

9 [figure annotations]
  - Heat received from A class: 0.018; heat received from B class: 0.016
  - Heat received from A class: 0.002; heat received from B class: 0.08
  - SVM

10 Graph Heat Diffusion Model
  - Given a directed weighted graph G=(V,E,W), where V={1,2,…,n}, E={(i,j): there is an edge from i to j}, and W=(w(i,j)) is the weight matrix.
  - The edge (i,j) is imagined as a pipe that connects i and j; w(i,j) is the pipe length.
  - Let f(i,t) be the heat at node i at time t.
  - At time t, node i receives an amount of heat M(i,j,t,dt) from its neighbor j during a period dt.

11 Graph Heat Diffusion Model
  - Suppose that M(i,j,t,dt) is proportional to the time period dt.
  - Suppose that M(i,j,t,dt) is proportional to the heat difference f(j,t)-f(i,t).
  - Moreover, the heat flows from j to i through the pipe, and therefore diffuses in the pipe in the same way as it does in Euclidean space, as described before.
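
The three assumptions above can be sketched as a single function. The Gaussian factor exp(-w(i,j)^2/β), the constants `alpha` and `beta`, and the function name are assumptions made for illustration; the slide only states the two proportionalities and the Euclidean analogy.

```python
import math

def heat_received(f_i, f_j, w_ij, dt, alpha=1.0, beta=1.0):
    """Heat M(i,j,t,dt) that node i receives from neighbor j during dt.

    Assumed form: proportional to dt, proportional to f_j - f_i,
    and damped by a Gaussian factor in the pipe length w_ij.
    """
    return alpha * math.exp(-w_ij ** 2 / beta) * (f_j - f_i) * dt

# A hot neighbor at pipe length 1 versus the same neighbor at pipe length 2.
near = heat_received(f_i=0.0, f_j=1.0, w_ij=1.0, dt=0.01)
far = heat_received(f_i=0.0, f_j=1.0, w_ij=2.0, dt=0.01)
print(near > far)  # a shorter pipe carries more heat
```

No heat flows when the two nodes are equally hot, and a colder neighbor yields a negative value, meaning node i loses heat.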

12 Graph Heat Diffusion Model
  - The heat difference between f(i,t+dt) and f(i,t) can be expressed as: [equation]
  - In matrix form: [equation]
  - Letting dt tend to zero, the above equation becomes: [equation]
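
The formulas for this slide are not present in the transcript. One sketch that is consistent with the two proportionality assumptions and the Gaussian pipe assumption is given below. The constants α and β, the exact exponent, and the matrix name H are assumptions, chosen so that β and H agree with how they are used later in the talk:

```latex
\begin{aligned}
f(i, t + dt) - f(i, t) &= \alpha \sum_{j : (j, i) \in E} e^{-w(i,j)^2 / \beta} \, \bigl(f(j, t) - f(i, t)\bigr) \, dt \\
f(t + dt) - f(t) &= \alpha H f(t) \, dt, \qquad
H_{ij} =
\begin{cases}
-\sum_{k : (k, i) \in E} e^{-w(i,k)^2 / \beta} & j = i \\
e^{-w(i,j)^2 / \beta} & (j, i) \in E \\
0 & \text{otherwise}
\end{cases} \\
\frac{d}{dt} f(t) &= \alpha H f(t)
\end{aligned}
```

Under these assumptions the solution is f(t) = e^{αtH} f(0), which has an explicit form for any graph.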

13 NHDC and PHDC algorithm - Step 1 [Construct the neighborhood graph]
  - Define the graph G over all data points, both in the training data set and in the test data set.
  - Add an edge from j to i if j is one of the K nearest neighbors of i.
  - Set the edge weight w(i,j)=d(i,j) if j is one of the K nearest neighbors of i, where d(i,j) is the Euclidean distance between point i and point j.
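
Step 1 can be sketched in a few lines of standard-library Python. The function name `knn_graph` and the dictionary layout `{i: {j: d(i, j)}}` are choices made for this example, not part of the slides.

```python
import math

def knn_graph(points, k):
    """Map each node i to {j: d(i, j)} for its k nearest neighbors j.

    An entry graph[i][j] stands for the directed edge from j to i
    with weight w(i, j) = d(i, j), the Euclidean distance.
    """
    graph = {}
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = {j: d for d, j in dists[:k]}
    return graph

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
graph = knn_graph(points, k=1)
print(graph)
```

With these three points, node 2 takes node 1 as its nearest neighbor but node 1 takes node 0, so the resulting graph is not symmetric. This is why the graph is directed.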

14 NHDC and PHDC algorithm - Step 2 [Compute the heat kernel]
  - Using equation [equation]
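
The equation for this step is not present in the transcript. As a hedged sketch, assume the heat kernel is the matrix exponential e^{γH}, where H has off-diagonal entries exp(-w(i,j)^2/β) on the edges of the graph, diagonal entries that make each row sum to zero, and γ combines the diffusion constant and the time. The helper names and the truncated Taylor series are choices made for this example; a numerical library would normally be used for the exponential.

```python
import math

def build_H(graph, n, beta):
    """Assumed H: H[i][j] = exp(-w^2 / beta) on edges, rows sum to zero."""
    H = [[0.0] * n for _ in range(n)]
    for i, nbrs in graph.items():
        for j, w in nbrs.items():
            h = math.exp(-w ** 2 / beta)
            H[i][j] = h
            H[i][i] -= h
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm_taylor(A, terms=30):
    """Approximate exp(A) by the first `terms` terms of its Taylor series."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Hypothetical 1-nearest-neighbor graph in the format {i: {j: w(i, j)}}.
graph = {0: {1: 1.0}, 1: {0: 1.0}, 2: {1: 2.0}}
gamma, beta = 0.5, 1.0
H = build_H(graph, n=3, beta=beta)
kernel = expm_taylor([[gamma * x for x in row] for row in H])
```

Node 2 has no direct edge from node 0, so `H[2][0]` is zero, yet `kernel[2][0]` is positive: in the exponential, heat from node 0 reaches node 2 by way of node 1.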

15 NHDC and PHDC algorithm - Step 3 [Compute the heat distribution]
  - Set f(0): for each class c, the nodes labeled with class c have an initial unit heat at time 0; all other nodes have no heat at time 0.
  - In PHDC, use equation [equation] to compute the heat distribution.
  - In NHDC, use equation [equation]
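
The two equations for this step are not present in the transcript. The sketch below assumes that NHDC uses a single local diffusion, γ H f(0), and that PHDC uses e^{γH} f(0), which is approximated here by many small local diffusions instead of an exact matrix exponential. The matrix `H`, the value of `gamma`, and all function names are hypothetical.

```python
def matvec(H, f):
    return [sum(h * x for h, x in zip(row, f)) for row in H]

def nhdc_heat(H, f0, gamma):
    """Assumed NHDC form: one local diffusion, gamma * H f(0)."""
    return [gamma * x for x in matvec(H, f0)]

def phdc_heat(H, f0, gamma, steps=1000):
    """Approximate exp(gamma * H) f(0) by `steps` small diffusion steps."""
    f = list(f0)
    for _ in range(steps):
        f = [x + (gamma / steps) * dx for x, dx in zip(f, matvec(H, f))]
    return f

# Hypothetical graph: node 0 is labeled A, node 1 is labeled B, node 2 is unlabeled.
# Row i holds the rates at which node i receives heat; each row sums to zero.
H = [
    [-0.5, 0.5, 0.0],
    [0.5, -0.5, 0.0],
    [0.6, 0.2, -0.8],
]
f0_A = [1.0, 0.0, 0.0]  # unit heat on the nodes of class A
f0_B = [0.0, 1.0, 0.0]  # unit heat on the nodes of class B
gamma = 0.1

nhdc_A, nhdc_B = nhdc_heat(H, f0_A, gamma)[2], nhdc_heat(H, f0_B, gamma)[2]
phdc_A, phdc_B = phdc_heat(H, f0_A, gamma)[2], phdc_heat(H, f0_B, gamma)[2]
print(nhdc_A, nhdc_B, phdc_A, phdc_B)
```

In this example both variants deliver more heat to node 2 from class A than from class B, because node 2 is more strongly connected to node 0.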

16 NHDC and PHDC algorithm - Step 4 [Classify the nodes]
  - For each node in the test data set, assign it to the class from which it receives the most heat.
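
Step 4 is an argmax over the per-class heat values. A minimal sketch, using the heat values quoted in the illustration earlier in the talk (the function name `classify` is chosen for this example):

```python
def classify(heat_by_class):
    """Return the class label from which the node receives the most heat."""
    return max(heat_by_class, key=heat_by_class.get)

first = classify({"A": 0.018, "B": 0.016})
second = classify({"A": 0.002, "B": 0.08})
print(first, second)
```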

17 Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC.
  - It is a non-parametric method for probability density estimation: [equation]
  - The class-conditional density for class k: [equation]
  - Assign x to the class whose value is maximal.

18 Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC.
  - In our model, let K=n-1; then the graph constructed in Step 1 is a complete graph.
  - The matrix H will be: [equation]
  - Heat that x_p receives from the data points in class k: [equation]
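
The formulas for the Parzen window comparison are not present in the transcript. A hedged sketch of the correspondence follows, assuming that the off-diagonal entries of H are exp(-w(i,j)^2/β). Here m is the data dimension, σ is the window width, C_k is the set of training points in class k, and N_k is its size; all of these symbols are notation introduced for this sketch:

```latex
\begin{aligned}
\hat{p}(x \mid C_k) &= \frac{1}{N_k} \sum_{x_i \in C_k} \frac{1}{(2\pi\sigma^2)^{m/2}} \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right) \\
\text{heat}_k(x_p) &\propto \sum_{x_i \in C_k} \exp\!\left(-\frac{\lVert x_p - x_i \rVert^2}{\beta}\right)
\end{aligned}
```

With β = 2σ², the heat received from class k is proportional to N_k times the Parzen estimate of the class-conditional density. Under these assumptions, picking the class with the most heat matches the Parzen rule weighted by class size, and it matches the unweighted rule when all classes have the same number of training points.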

19 Connections with other models
KNN is a special case of NHDC.
  - For each test point, assign it to the class with the most members among its K nearest neighbors.

20 Connections with other models
KNN is a special case of NHDC.
  - In our model, let β tend to infinity; then the matrix H becomes: [equation]
  - Heat that x_p receives from the data points in class k: [equation]
  - The number of the cases in class q in its K nearest neighbors.
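
The limit can be checked numerically. The sketch below assumes that each of the K nearest neighbors contributes heat exp(-d^2/β) to its own class, so that every contribution tends to 1 as β grows and the heat becomes a neighbor count. The distances, labels, and function name are hypothetical.

```python
import math

def heat_from_class(dists, labels, label, beta):
    """Sum the assumed weights exp(-d^2 / beta) over neighbors with `label`."""
    return sum(math.exp(-d ** 2 / beta) for d, lab in zip(dists, labels) if lab == label)

# Three nearest neighbors of a test point: one close A, two more distant B.
dists = [0.5, 1.0, 2.0]
labels = ["A", "B", "B"]

small_beta = {c: heat_from_class(dists, labels, c, beta=0.5) for c in "AB"}
large_beta = {c: heat_from_class(dists, labels, c, beta=1e9) for c in "AB"}
print(small_beta, large_beta)
```

For the small β the single close neighbor dominates and class A receives more heat. For the very large β the heat values approach the counts 1 and 2, so class B wins, as in a majority vote among the three neighbors.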

21 Connections with other models
PHDC can approximate NHDC.
  - If γ is small, then [equation]
  - Since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small.
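
The approximation referred to here is presumably the first-order expansion of the matrix exponential, assuming that PHDC uses the kernel e^{γH}:

```latex
e^{\gamma H} = I + \gamma H + \frac{\gamma^2}{2!} H^2 + \cdots \approx I + \gamma H
\qquad (\gamma \text{ small})
```

For an unlabeled node p the initial heat f_p(0) is zero, so the identity term contributes nothing at p, and the heat at p is approximately γ (H f(0))_p, which gives the same ranking of classes as a single local diffusion.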

22 Connections with other models [diagram labels: PHDC, NHDC, KNN, PWA]

23 Experiments
  - Two artificial data sets: Spiral-100 and Spiral-1000.
  - Compare with the Parzen window approach (PWA; the window function takes the normal form), KNN, and SVM.
  - Each result is the average over ten-fold cross-validation.
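
The averaging over ten folds can be sketched as follows. The slides do not say how the folds were drawn, so the interleaved split without shuffling, the function names, and the `evaluate` callback are all assumptions made for this example.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k disjoint, nearly equal test folds."""
    return [list(range(n))[i::k] for i in range(k)]

def cross_val_accuracy(n, evaluate, k=10):
    """Average evaluate(train_idx, test_idx) over the k folds."""
    scores = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

folds = k_fold_indices(100)
print(len(folds), len(folds[0]))
```

Here `evaluate` stands for any routine that trains on `train_idx` and returns the accuracy measured on `test_idx`.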

24 Experiments: Results

Data set       NHDC     PHDC     KNN      PWA      SVM
Spiral-100     84       [?]      67       83       34
Spiral-1000    99.6     99.8     99.3     99.7     68.7
Credit-g       76.1     76.06    75.59    72.35    71.5
Diabetes       76.3     76.22    75.78    74.96    76.6
Glass          72.99    73.12    70.64    71.56    68.1
Iris           97.36    97.79    97.36    97.07    96
Sonar          88.75    89.07    82.86    88.28    84.8
Vehicle        72.90    72.93    71.41    72.45    88.5

25 Conclusions and future work
  - Avoid the difficulty of finding an explicit expression for the unknown geometry.
  - Avoid the difficulty of finding a closed-form heat kernel for some complicated geometries.
  - Both NHDC and PHDC are competitive in accuracy.
  - There is room to develop the approach further:
      - The assumption in the local heat diffusion is not fully justified.
      - We currently use a directed graph. Converting it into an undirected graph may be more reasonable, because in reality heat diffuses symmetrically.
      - Apply it to SVM?

