1 Discovering Interesting Patterns Through User's Interactive Feedback
Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han (presented at KDD '06)
Presented by: Jeff Boisvert, April 11, 2007

2 Outline
–Introduction and Background
–The Algorithm
–Examples
–Conclusions/Future Work
–Critique of Paper
"Well begun is half done." (Aristotle)

3 Introduction and Background
Motivation
–discover 'interesting' patterns in data
–'interestingness' is subjective: it depends on the user
–often too many patterns to assess manually
Setting
–assume an available set of candidate patterns (frequent itemsets, etc.)
–have the user rank a subset of the candidate patterns
–learn from the user's rankings
–have the user rank more patterns
–learn
–…

4 Introduction and Background
SVM
–I think we have been presented with this enough
Clustering
–k clusters: minimize the maximum distance from each pattern to the nearest sample in a cluster
Distance measure
–Jaccard distance (between two patterns)
Ranking
–linear, e.g. 2 < 3 (difference in ranking is 3 - 2 = 1)
–log-linear, e.g. log(2) < log(3) (difference in ranking is 0.176)
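The two ingredients above are easy to make concrete. A minimal sketch of the Jaccard distance between two patterns (treated as item sets), together with the slide's linear vs. log ranking differences (the 0.176 on the slide is a base-10 log difference):

```python
import math

def jaccard_distance(p1, p2):
    """Jaccard distance between two patterns viewed as item sets:
    D(P1, P2) = 1 - |P1 ∩ P2| / |P1 ∪ P2|."""
    p1, p2 = set(p1), set(p2)
    if not (p1 | p2):
        return 0.0
    return 1.0 - len(p1 & p2) / len(p1 | p2)

# Ranking differences from the slide: linear uses raw ranks,
# log-linear uses logs (log10(3) - log10(2) ≈ 0.176).
linear_diff = 3 - 2
log_diff = math.log10(3) - math.log10(2)
```

For example, patterns {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct items, giving a distance of 0.5.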

5 Outline
–Introduction and Background
–The Algorithm
–Examples
–Conclusions/Future Work
–Critique of Paper
"An algorithm must be seen to be believed." (Donald Knuth)

6 The Algorithm Overview
1. Prune candidate patterns via micro-clustering
2. Cluster the N patterns into k clusters
3. Present k patterns to the user for ranking
4. Refine the model with the new user rankings
5. Re-rank all N patterns with the new model
6. Reduce N = a·N
7. Go to step 2
Areas to discuss
–(1) Preprocessing: pruning and micro-clustering
–Clustering: see introduction
–(2) Selecting the k patterns to present to the user
–(3) Modeling the user's knowledge/ranking ***
[Loop diagram: cluster N patterns into k clusters → user ranks k patterns → refine model → re-rank all N patterns → N = aN]
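Steps 2–7 above form a simple shrinking loop. The sketch below is only a skeleton: the helpers `cluster`, `user_rank`, `refine`, and `rank_all` are hypothetical names standing in for the components described on the following slides, passed in as functions.

```python
def interactive_mining(patterns, k, a, max_iter, cluster, user_rank, refine, rank_all):
    """Skeleton of the interactive loop (steps 2-7 of the overview).
    Each iteration shows k representatives, refines the model from the
    user's ranking, re-ranks everything, and keeps the top a*N patterns."""
    model = None
    for _ in range(max_iter):
        shown = cluster(patterns, k)          # steps 2-3: pick k representatives
        feedback = user_rank(shown)           # step 3: user ranks the k patterns
        model = refine(model, feedback)       # step 4: refine the model
        patterns = rank_all(patterns, model)  # step 5: re-rank all N patterns
        patterns = patterns[: int(a * len(patterns))]  # step 6: N = a*N
    return patterns                           # step 7: loop until max_iter
```

With 19 patterns and a = 0.9, one iteration leaves the top 17, matching the walk-through in Example 1.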

7 The Algorithm (Preprocessing)
Pruning
–get representative patterns from the candidates
–start with the maximal patterns
–merge candidates into the maximals
–representative pattern = maximal
–discard the merged patterns, keep the micro-clusters (maximals)
Micro-clustering
–two patterns are merged if D(P1, P2) < epsilon
–D is the Jaccard distance
–epsilon is provided by the user (e.g. 0.1)

8 The Algorithm (k patterns)
Which k patterns should be presented to the user?
Clustering patterns
–really we have N micro-clusters, but…
Selecting patterns
–Criterion 1: the patterns presented should not be redundant
 Redundant patterns often rank close to each other
 Patterns are redundant if they have the same composition/frequency
–Criterion 2: the selection should help refine the model of the user's knowledge of interesting patterns (not uninteresting patterns)
Method [Gonzalez, 1985. Clustering to minimize the maximum intercluster distance]
–randomly select the first pattern
–second pattern: maximum distance from the first pattern
–third pattern: maximum distance to the nearest of the first and second patterns
–…
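The Gonzalez-style selection above is a farthest-first traversal. A small sketch (the paper picks the first pattern at random; this version starts from the first pattern in the list so the result is reproducible):

```python
def farthest_first(patterns, k, dist):
    """Farthest-first selection (Gonzalez): each next pattern is the one
    whose distance to its nearest already-chosen pattern is maximal."""
    chosen = [patterns[0]]  # the paper chooses this one at random
    while len(chosen) < k:
        nxt = max(patterns, key=lambda p: min(dist(p, c) for c in chosen))
        chosen.append(nxt)
    return chosen
```

With points 0, 1, 9, 10 on a line and k = 2, the method picks 0 and then 10, the pair that is least redundant under the distance measure.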

9 The Algorithm (refine model 1)
*** main contribution of the paper
–How do we model the user's knowledge?
–So far we have only ranked k out of N patterns…
Interestingness
–the difference between the observed frequency f_o(P) and the expected frequency f_e(P)
–observed: from the input data
–expected: calculated from the model of the user's knowledge, f_e(P) = M(P, θ)
–if f_o(P) and f_e(P) differ, the pattern is interesting
Ranking
–if the user ranks P_i as more interesting than P_j: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
–log-linear model: R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P)
–this is a constraint on the model optimization
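The interestingness measure on this slide is one line of code. A minimal sketch, directly from the formula R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P):

```python
import math

def interestingness(f_obs, f_exp):
    """Log-linear ranking measure: R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P).
    Zero means the model's expectation matches the data (uninteresting);
    the larger the gap between observed and expected, the more interesting."""
    return math.log(f_obs) - math.log(f_exp)
```

A pattern observed 8 times when the user's model expects 2 scores log 4 ≈ 1.39; a pattern whose observed and expected frequencies agree scores 0.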

10 The Algorithm (refine model 2)
Will have k constraints
Log-Linear Model
–say we have a pattern P in a data set of s items; f_e(P) is given by the log-linear model [formula shown on slide]
–recall that the user's ordering of patterns is a constraint: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
–define a weight vector w and a feature representation v(P) so that the constraint above can be rewritten in terms of w
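The slide's f_e(P) formula was an image and did not survive the transcript, but the constraint rewrite it describes can be sketched. Assuming, as the kernel form on the next slide suggests, that the measure is linear in the weight vector, R[f_o(P), f_e(P)] = w · v(P), each of the k user judgments becomes one linear constraint on w:

```latex
% One user judgment: P_i ranked above P_j.
R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
\;\Longleftrightarrow\;
w \cdot v(P_i) > w \cdot v(P_j)
\;\Longleftrightarrow\;
w \cdot \bigl(v(P_i) - v(P_j)\bigr) > 0 .
```

This is exactly the pairwise form that a ranking SVM can consume, which is why the next slide can treat the SVM as a black box.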

11 The Algorithm (Re-rank all N patterns)
Log-Linear Model (cont.)
SVM Black Box
–can now rank ALL N patterns with the interestingness measure: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)], where R[f_o(P), f_e(P)] = K[v(P), w]
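To see how k pairwise constraints let us score all N patterns, here is a deliberately tiny stand-in for the SVM black box: a perceptron-style learner over the difference vectors v(P_i) - v(P_j). The paper uses an SVM with kernel K; this pure-Python sketch only illustrates the idea of learning w from the constraints and then ranking every pattern by its score.

```python
def train_pairwise(pairs, dim, epochs=50, lr=0.1):
    """Learn a weight vector w from pairwise constraints w·(v_i - v_j) > 0
    with a perceptron-style update. `pairs` holds (v_i, v_j) feature vectors
    where the user ranked pattern i above pattern j. Stand-in for the SVM."""
    w = [0.0] * dim
    for _ in range(epochs):
        for v_i, v_j in pairs:
            d = [a - b for a, b in zip(v_i, v_j)]
            if sum(wk * dk for wk, dk in zip(w, d)) <= 0:  # constraint violated
                w = [wk + lr * dk for wk, dk in zip(w, d)]
    return w

def score(w, v):
    """Rank any pattern P, seen or unseen, by w · v(P)."""
    return sum(wk * vk for wk, vk in zip(w, v))
```

After training on even one judgment, every one of the N patterns gets a score, which is what makes re-ranking the full set possible.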

12 The Algorithm (Reduce N)
Reduce the number of patterns
–discard some patterns: N = aN
–a is specified by the user
–this reduces the number of patterns presented to the user at the end
–stop when the maximum number of iterations (also specified by the user) is reached
END OF ALGORITHM
Biased belief model
–not presented here
–identical formulation to the log-linear model, but assigns a user belief probability to each transaction
–m = number of transactions; x_k(P) = 1 if transaction k contains P; p_k = the user's belief probability

13 The Algorithm Overview
1. Pre-process: prune / micro-clustering
2. Cluster the N patterns into k clusters, present to the user
3. Refine the model with the new user rankings, re-rank the patterns
4. Reduce N = a·N
5. Stop when the maximum number of iterations is reached
Input parameters
–a = shrinking ratio
–k = number of user-feedback patterns
–niter = number of iterations (controls the number of patterns in the output)
–epsilon = micro-clustering parameter
–model type: log-linear vs. biased belief
–ranking type: linear vs. log
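The claim that niter controls the output size follows directly from the shrink step: after niter iterations of N → a·N, roughly a^niter of the original patterns remain. A one-liner makes this concrete (flooring at each step, since pattern counts are integers):

```python
def remaining_patterns(n0, a, niter):
    """Number of patterns left after niter shrink steps N -> floor(a * N).
    Shows how the shrinking ratio a and iteration count together determine
    the size of the final output."""
    n = n0
    for _ in range(niter):
        n = int(a * n)
    return n
```

So 19 micro-clusters with a = 0.9 leave 17 after one pass, and 100 patterns with a = 0.5 leave 25 after two.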

14 Outline
–Introduction and Background
–The Algorithm
–Examples
–Conclusions/Future Work
–Critique of Paper
"Few things are harder to put up with than the annoyance of a good example." (Mark Twain)

15 Example 1
–transactions (35)
–get micro-clusters (19)
–pick pattern #1: 0 8 1
–pick pattern #2: 6 4 2
–… pick pattern #k
–if k = 2, present 0 8 1 and 6 4 2 to the user for ranking
–refine the log-linear model
–with the new f_e, use the SVM to rank all 19 patterns
–reduce N: sort the patterns by rank and take the top aN; with a = 0.9, take the top 17 (19 * 0.9)

16 Example 2
Their results on itemsets:
–use the data to simulate a person's prior knowledge
–partition the data into 2 subsets: one for background, one for observed data
–background = the user's prior
–accuracy measured by [formula shown on slide]
–data set: 49,046 transactions, 2,113 items, average length of 74
–the first 1,000 transactions are the observed set
–8,234 closed frequent itemsets
–micro-clustering reduces this to 769
–compare the top-k ranked patterns

17 Example 3
Their results on sequences:
–1,609 sentences
–967 closed sequential patterns
–full feedback: use k = 967

18 Example 4
Their results compared to other algorithms:
–same data as Example 3 (1,609 sentences)
–they claim theirs is better…
–Selective Sampling: Yu, KDD '05
–Top-N: Shen and Zhai, SIGIR '05

19 Outline
–Introduction and Background
–The Algorithm
–Examples
–Conclusions/Future Work
–Critique of Paper
"I would never die for my beliefs because I might be wrong." (Bertrand Russell)

20 Conclusions
–interactive with the user
–tries to learn the user's knowledge
–flexible (but flexible = many parameters)
–does not work well with sparse data
Proposed future work
–study different models for sparse data
–better feedback strategies to maximize learning
–apply to other data types/sets

21 Outline
–Introduction and Background
–The Algorithm
–Examples
–Conclusions/Future Work
–Critique of Paper
"He has a right to criticize, who has a heart to help." (Abraham Lincoln)

22 Critique
–sensitivity to the input parameters
–guidance on selecting the input parameters
–order of the paper
–details/graphs in the examples
–no examples that actually use a 'user's interactive feedback'

23 Questions
"It is better to know some of the questions than all of the answers." (James Thurber)
"It is not the answer that enlightens, but the question." (Eugene Ionesco)
"A wise man's question contains half the answer." (Solomon Ibn Gabirol)

