COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence. Ning Jin, Calvin Young, Wei Wang, University of North Carolina at Chapel Hill.

1 COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence
Ning Jin, Calvin Young, Wei Wang
University of North Carolina at Chapel Hill
11/04/2009

2 What Are Graphs?
Graph: a set of nodes connected by a set of edges.
Nodes and edges can have labels.
Edges can have directions.
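The definition above can be made concrete with a small data structure. This is a minimal sketch, not something taken from the slides: the class name `LabeledGraph`, its fields, and the example labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """A directed graph whose nodes and edges carry labels (hypothetical helper)."""
    node_labels: dict = field(default_factory=dict)  # node id -> label
    edge_labels: dict = field(default_factory=dict)  # (source, target) -> label

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_edge(self, source, target, label):
        # Both endpoints must exist before an edge can connect them.
        if source not in self.node_labels or target not in self.node_labels:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_labels[(source, target)] = label

    def neighbors(self, node_id):
        # Nodes reachable from node_id by following one outgoing edge.
        return sorted(t for (s, t) in self.edge_labels if s == node_id)


g = LabeledGraph()
g.add_node(1, "A")
g.add_node(2, "B")
g.add_edge(1, 2, "bond")
```

An undirected graph could be stored the same way by inserting each edge in both directions.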

3 Graph Classification: Example
[Figure: example graphs in the negative set and in the positive set]

6 Graph Representation
[Figure: objects represented by graphs]

7 Interesting Properties in Data
Interesting properties are determined by structure.
[Figure labels: "some", "most"]

8 Graph Classification
Classifying the original objects becomes classifying graphs as positive or negative.
Function is determined by structure.

9 Graph Classification Using Frequent Subgraph Patterns
The positive graphs should have some common subgraph patterns that negative graphs don’t have.
Generate classifiers:
1. Frequent subgraph mining in the positive set (frequency >= threshold)
2. Feature selection
3. Classification of high-dimensional data points
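The pipeline above can be illustrated on toy data. The sketch below assumes that the subgraph matching has already been done, so each pattern is described only by the set of graphs it occurs in. The pattern names, graph ids, and the 0.5 threshold are made up for the example, and the feature-selection step is left out.

```python
def frequency(support, graph_ids):
    """Fraction of the graphs in graph_ids that contain the pattern."""
    return len(support & graph_ids) / len(graph_ids)


def frequent_patterns(occurrences, positive_ids, threshold):
    """Patterns whose frequency in the positive set reaches the threshold."""
    return sorted(
        name for name, support in occurrences.items()
        if frequency(support, positive_ids) >= threshold
    )


def feature_vector(graph_id, patterns, occurrences):
    """One binary feature per pattern: 1 if the graph contains it, else 0."""
    return [1 if graph_id in occurrences[p] else 0 for p in patterns]


positive_ids = {"P1", "P2", "P3", "P4"}
# Hypothetical occurrence lists: pattern -> graphs that contain it.
occurrences = {
    "A-B": {"P1", "P2", "P3", "N1"},
    "B-C": {"P1", "N1", "N2"},
    "C-D": {"P2", "P3", "P4"},
}

frequent = frequent_patterns(occurrences, positive_ids, threshold=0.5)
vector_p1 = feature_vector("P1", frequent, occurrences)
```

Each graph becomes a point with one dimension per selected pattern, which is what a standard classifier would then be trained on.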

11 Graph Classification Using Discriminative Subgraph Patterns
Frequent subgraph mining in the positive set and feature selection merge into one step: mining discriminative/significant subgraph patterns.
Scoring function: [formula image not preserved]
Pattern redundancy:
Pattern 1: found in positive graphs P1, P2 and in negative graphs N1, N2
Pattern 2: found in positive graphs P1, P2, P3 and in negative graphs N1
Pattern 1 is redundant given pattern 2.
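The redundancy example can be checked with set comparisons. The sketch below reads the example as "pattern 1 is redundant given pattern 2 when pattern 2 covers at least the same positive graphs and at most the same negative graphs". That reading is an assumption drawn from the single example on the slide, and the function name is hypothetical.

```python
def is_redundant(pattern, other):
    """True if `other` is at least as good as `pattern` on both sets.

    Each pattern is a pair (positive_support, negative_support) of sets.
    Assumed rule: other covers every positive graph that pattern covers,
    and no negative graph that pattern does not already cover.
    """
    pos, neg = pattern
    other_pos, other_neg = other
    return pos <= other_pos and other_neg <= neg


pattern_1 = ({"P1", "P2"}, {"N1", "N2"})
pattern_2 = ({"P1", "P2", "P3"}, {"N1"})

redundant_1_given_2 = is_redundant(pattern_1, pattern_2)  # True
redundant_2_given_1 = is_redundant(pattern_2, pattern_1)  # False
```

Under this reading, two patterns with identical supports are redundant given each other, so a miner would need a tie-breaking rule to keep one of them.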

13 Previous Discriminative Pattern Mining Methods
Each tree node represents a subgraph pattern.
Each node is a supergraph of its parent node, with one more edge.
One subgraph pattern corresponds to only one node.
Scoring function: [formula image not preserved]
Pattern redundancy:
Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
Pattern 2: found in positive graphs G1, G2, G3 and in negative graphs G4
Pattern 1 is redundant given pattern 2.

14 1. Heuristic Exploration Order
[Figure labels: Pattern 1, Pattern 2]
Pattern redundancy:
Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
Pattern 2: found in positive graphs G1, G2, G3 and in negative graphs G4
Pattern 1 is redundant given pattern 2.

15 Heuristic Exploration Order: Delta Score
[Figure labels: Pattern p, Pattern p']
Delta score of p = score of p - score of p'
It's like looking for the maximum of a function: large derivative, large absolute value.
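A small numeric example may help with the delta score. The scoring function itself is not preserved in this transcript, so the sketch below substitutes a simple stand-in: frequency in the positive set minus frequency in the negative set. It also assumes that p' is the pattern p was extended from. Both choices are assumptions for illustration, not the authors' definitions.

```python
def score(pos_support, neg_support, n_pos, n_neg):
    """Stand-in score (an assumption): positive frequency minus negative frequency."""
    return len(pos_support) / n_pos - len(neg_support) / n_neg


def delta_score(p, p_prime, n_pos, n_neg):
    """Delta score of p = score of p - score of p'."""
    return score(*p, n_pos, n_neg) - score(*p_prime, n_pos, n_neg)


# Hypothetical supports, with 4 positive and 4 negative graphs in total.
p_prime = ({"P1", "P2", "P3", "P4"}, {"N1", "N2", "N3"})  # score 1.0 - 0.75 = 0.25
p = ({"P1", "P2", "P3"}, {"N1"})                          # score 0.75 - 0.25 = 0.5

delta = delta_score(p, p_prime, n_pos=4, n_neg=4)  # 0.5 - 0.25 = 0.25
```

A positive delta means the extension improved the score, which is the signal the exploration order uses to decide which pattern to grow next.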

17 Workflow of Pattern Exploration
1. Collect frequent edges in the positive set and insert them into a heap H.
2. If H is empty, terminate.
3. Pop from H the pattern p with the highest delta score.
4. Extend pattern p, insert the new non-redundant patterns into H, and go back to step 2.
A frequency threshold t_p is needed.
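The loop above maps naturally onto a priority queue. The following is a minimal sketch using Python's `heapq`, which is a min-heap, so delta scores are negated. The callbacks `extend`, `delta_score`, and `is_redundant` are hypothetical placeholders: real versions would grow a subgraph by one edge, apply the frequency threshold t_p, and compare supports. The toy tables below only imitate that behaviour.

```python
import heapq
from itertools import count


def explore(frequent_edges, extend, delta_score, is_redundant):
    """Visit patterns in order of highest delta score first."""
    tie = count()  # tie-breaker so the heap never compares patterns directly
    seen = list(frequent_edges)  # every pattern ever inserted into the heap
    heap = [(-delta_score(p), next(tie), p) for p in frequent_edges]
    heapq.heapify(heap)
    visited = []
    while heap:  # terminate when the heap is empty
        _, _, p = heapq.heappop(heap)  # pattern with the highest delta score
        visited.append(p)
        for child in extend(p):
            if not is_redundant(child, seen):
                seen.append(child)
                heapq.heappush(heap, (-delta_score(child), next(tie), child))
    return visited


# Toy stand-ins (assumptions): fixed extension table and delta scores.
children = {"A-B": ["A-B-C"], "B-C": ["A-B-C", "B-C-D"]}
deltas = {"A-B": 0.1, "B-C": 0.3, "A-B-C": 0.2, "B-C-D": 0.05}

order = explore(
    ["A-B", "B-C"],
    extend=lambda p: children.get(p, []),
    delta_score=lambda p: deltas[p],
    is_redundant=lambda child, seen: child in seen,  # duplicates only, for the toy
)
```

In this toy run the pattern "A-B-C" is reached from "B-C" first, so it is skipped when "A-B" is extended later.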

18 2. Use Co-occurrences of Patterns
A pattern (shown as a figure) can be approximated by a co-occurrence of patterns.
[Figure: Graph G and Graph G', with nodes labeled A, B, C, D]

19 When Co-occurrence Is Superior
Separately:
A-B: N1, N2, P1, P2, P3, P4
B-C: N3, N4, P1, P2, P3, P4
Co-occurrence of A-B and B-C: P1, P2, P3, P4
No negative graphs
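The example above is a set intersection: a graph supports the co-occurrence only if it contains every pattern in it. The sketch below reproduces the slide's numbers. The function name `co_occurrence_support` is a hypothetical helper, not a name from the presentation.

```python
def co_occurrence_support(patterns, occurrences):
    """Graphs that contain every pattern in the co-occurrence."""
    return set.intersection(*(occurrences[p] for p in patterns))


negative_ids = {"N1", "N2", "N3", "N4"}
occurrences = {
    "A-B": {"N1", "N2", "P1", "P2", "P3", "P4"},
    "B-C": {"N3", "N4", "P1", "P2", "P3", "P4"},
}

support = co_occurrence_support(["A-B", "B-C"], occurrences)
negatives_hit = support & negative_ids  # empty: no negative graph has both patterns
```

Each pattern alone matches two negative graphs, but no negative graph contains both, so the pair separates the classes where neither pattern does by itself.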

20 Co-occurrence Generation
A co-occurrence is a set of subgraph patterns: {p_1, p_2, …, p_m}.
Candidate co-occurrences are kept in a list: candidate co-occurrence 1, 2, 3, 4, …, n.
For each new pattern p:
Find the candidate co-occurrence k such that merging candidate k and pattern p improves the score of p most significantly.
Insert the union of pattern p and candidate co-occurrence k.
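The merge step can be sketched as a greedy search over the candidate list. The score used here (positive frequency minus negative frequency) is a stand-in, since the scoring function is not preserved in the transcript. Keeping p on its own when no merge improves its score is also an assumption, as are all function names.

```python
def support(co_occurrence, occurrences):
    """Graphs containing every pattern in the co-occurrence."""
    return set.intersection(*(occurrences[p] for p in co_occurrence))


def score(co_occurrence, occurrences, positive_ids, negative_ids):
    """Stand-in score (an assumption): positive minus negative frequency."""
    s = support(co_occurrence, occurrences)
    return len(s & positive_ids) / len(positive_ids) - len(s & negative_ids) / len(negative_ids)


def add_pattern(p, candidates, occurrences, positive_ids, negative_ids):
    """Merge p with the candidate that improves its score most, then store the result."""
    best = frozenset([p])
    best_score = score(best, occurrences, positive_ids, negative_ids)
    for candidate in candidates:
        merged = candidate | {p}
        merged_score = score(merged, occurrences, positive_ids, negative_ids)
        if merged_score > best_score:
            best, best_score = merged, merged_score
    candidates.append(best)
    return best


positive_ids = {"P1", "P2", "P3", "P4"}
negative_ids = {"N1", "N2", "N3", "N4"}
occurrences = {
    "A-B": {"N1", "N2", "P1", "P2", "P3", "P4"},
    "B-C": {"N3", "N4", "P1", "P2", "P3", "P4"},
}

candidates = []
first = add_pattern("A-B", candidates, occurrences, positive_ids, negative_ids)
second = add_pattern("B-C", candidates, occurrences, positive_ids, negative_ids)
```

With the supports from the previous slide, "B-C" scores 0.5 alone and 1.0 when merged with {A-B}, so the union is what gets inserted.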

21 3. Use Association Rules to Classify
Association rule: {p_1, p_2, p_3, …, p_n} → "positive"
Input of COM (Co-Occurrence rule Miner):
Positive graph set, negative graph set
Frequency threshold t_p of a classification rule in the positive set; frequency threshold t_n in the negative set
Output of COM:
A set of association rules
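Applying such rules to a new graph reduces to subset tests, assuming the set of patterns contained in the graph has already been computed. The sketch below labels a graph "positive" when it contains every pattern of at least one rule. Falling back to "negative" otherwise is an assumption, since the slide only states the positive direction of the rule.

```python
def classify(graph_patterns, rules):
    """Label a graph using rules of the form {p_1, ..., p_n} -> "positive"."""
    if any(rule <= graph_patterns for rule in rules):
        return "positive"
    return "negative"  # assumed default when no rule fires


# Hypothetical rule set and graphs, each graph given as the set of patterns it contains.
rules = [frozenset({"A-B", "B-C"}), frozenset({"C-D"})]

label_1 = classify({"A-B", "B-C", "D-E"}, rules)  # first rule fires
label_2 = classify({"A-B"}, rules)                # no rule is fully contained
```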

22 Association Rule Generation
Each candidate co-occurrence corresponds to a candidate association rule.
If a rule has frequency >= t_p in the positive set and frequency <= t_n in the negative set, it is a resulting rule.
Terminate when each positive graph is covered.
Remove redundant rules.
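The threshold test and the coverage-based stopping condition can be sketched as a single pass over the candidates. The version below is a simplification under stated assumptions: candidates arrive in a fixed order with precomputed supports, the names and data are hypothetical, and the final redundant-rule removal is not implemented.

```python
def generate_rules(candidates, positive_ids, negative_ids, t_p, t_n):
    """Keep candidates that meet both thresholds until all positives are covered.

    Each candidate is a pair (patterns, support), where support is the set
    of graph ids containing every pattern in the co-occurrence.
    """
    rules = []
    covered = set()
    for patterns, graph_support in candidates:
        pos = graph_support & positive_ids
        neg = graph_support & negative_ids
        pos_freq = len(pos) / len(positive_ids)
        neg_freq = len(neg) / len(negative_ids)
        if pos_freq >= t_p and neg_freq <= t_n:
            rules.append(patterns)
            covered |= pos
        if covered == positive_ids:  # every positive graph is covered
            break
    return rules


positive_ids = {"P1", "P2", "P3", "P4"}
negative_ids = {"N1", "N2", "N3", "N4"}
candidates = [
    (frozenset({"A-B"}), {"P1", "P2", "P3", "P4", "N1", "N2"}),  # too frequent in negatives
    (frozenset({"A-B", "B-C"}), {"P1", "P2"}),
    (frozenset({"C-D"}), {"P3", "P4"}),
    (frozenset({"D-E"}), {"P1", "P2", "P3"}),  # never examined: coverage is already complete
]

rules = generate_rules(candidates, positive_ids, negative_ids, t_p=0.5, t_n=0.0)
```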

23 Experiments: Datasets
Protein datasets: six SCOP families
Chemical datasets: six PubChem bioassays

24 Experiments: Parameters & Evaluation
Protein datasets: t_p = 30%, t_n = 0%
Chemical datasets: t_p = 1%, t_n = 0.4%

25 Experimental Results: Protein Datasets

26 Experimental Results: Chemical Datasets

27 Conclusions
Using a heuristic pattern exploration order and co-occurrences can improve the runtime efficiency of mining discriminative patterns.
Using association rules can achieve competitive classification accuracy.

28 Questions & Suggestions

