Rotate! Base Clique Motifs Bipartite graph, G9.1:


1 Rotate! Base Clique Motifs Bipartite graph, G9.1:
Inv(1,2,3,4,5) recommend Stock(A,B,C,D,E).
Unipartite G1.1: Proteins(1,2,3,4,5) interactions.
[Figure: both graphs drawn in the traditional data structures (SI-Raster and IS-Raster Edge Table, Adjacency Matrix, SI-Raster and IS-Raster Edge Map) and as 3Lev Stride=5 NPZ pTrees (SI and IS; Lev=2, Lev=1, Lev=0), together with the Base SI cTrees, Base IS cTrees, ExpBase SI cTrees, ExpBase IS cTrees, and the unipartite Base cTrees and EB cTrees (oa).]
Create Expanded Base cTrees (EBcTrees). Rotate!
The number of isomorphic copies of an EBC Motif can be counted by analyzing cTree counts:
2 1,4 SI BCMotifs; 2 4,1 SI BCMotifs; 2 1,3 SI BCMotifs; 1 3,1 SI BCMotif; 1 1,2 SI BCMotif.
2 3,3 SI EBCMotifs; 2 4,2 SI EBCMotifs; 1 2,4 SI EBCMotif; 1 5,1 SI EBCMotif.
Bipartite BcTrees are induced subgraphs (also cliques); EBcTrees are max cliques. Mine for other motifs? Is motif mining even useful in the Investor-Stock case? (Maybe it would be useful to know that the 3-3 motif, 3 investors recommending 3 stocks, occurs many times.) Motifs seem to be of greatest interest in the context of Protein-Protein interaction (PPI) graphs, in which the two label sets are the same. There is then just one Base cTreeSet and one EBcTreeSet to create (easier), and the h-k motifs are not distinct from the k-h motifs. Question: in PPI graphs, would the counts of Expanded Base Clique Motifs provide important information?
Thus for this unipartite graph there are: 1 2,2 EBC Motif; 1 4,1 EBC Motif. In addition: ,3 BC Motifs 11 1,2 BC Motifs.

2 cliqueTrees
Bipart G11: Inv(12345) rec Stk(ABCDE).
[Figure: G11 in the traditional data structures (Graph, EdgeTbl, Adj Matrix, EdgeMap) and in the new DSs: NPZ pTree (st=5; L=2, L=1, L=0), Stock BCTs (I S), Investor BCTs (S I), Stock EBCTs (I S) and Inv EBCTs, generated with oa.]
[…] = C a MaxClique. Then 1 of […] must be a BC, say […]. Expanding it gives C. Thus, for Bipartite Graphs, every MaxClique is an EBCT.
H1: On Day(), I(123) recommend S(ABC).
[Figure: H1 as an NPZ pTree (stride=3; L=3, L=2, L=1, L=0) and as DI StockBaseCliqueTrees (D I S), expanded with oaa and aoa.]
[…] = C a MaxClique. Then 1 of […] must be a BC, say […]. Expanding it gives C. Thus, for Tripartite Graphs, every MaxClique is an EBCT.
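The expansion of bipartite base cliques described on this slide can be sketched roughly as follows. This is only one reading of the slide: a base clique is taken to be a vertex with its full neighbor set (the leaves of its cTree), and expanding it grows the single-vertex side to every vertex adjacent to all of those leaves. The edge data and the names `recommends`, `base_cliques` and `expand` are hypothetical, and plain Python sets stand in for pTrees.

```python
from collections import Counter

# Hypothetical investor -> stocks edges (NOT the slide's G11 data).
recommends = {
    1: {"A", "B"},
    2: {"A", "B", "C"},
    3: {"B", "C"},
    4: {"D"},
}

def base_cliques(adj):
    """One base clique per non-isolated vertex v: ({v}, N(v))."""
    return [(frozenset([v]), frozenset(nbrs)) for v, nbrs in adj.items() if nbrs]

def expand(adj, clique):
    """Grow the single-vertex side to all vertices adjacent to every leaf."""
    _, leaves = clique
    grown = frozenset(u for u, nbrs in adj.items() if leaves <= nbrs)
    return grown, leaves

# Expanded base cliques, with duplicates removed by the set.
expanded = {expand(recommends, b) for b in base_cliques(recommends)}

# Tally the (h, k) shapes, in the spirit of counting motifs from cTree counts.
shape_counts = Counter((len(left), len(right)) for left, right in expanded)
```

Under this reading each expanded pair is a maximal biclique, because the leaf set is exactly the common neighborhood of the grown side. The sketch only expands from one side (investor to stocks); the slide builds cTrees in both directions.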

3 Stock Day Investor cTrees, Day Stock Investor cTrees, Stock Investor Day cTrees
Base CliqueTrees for 3HG2
TriEdgeTable (S,D,I,R) has 6 key sort orders: SDI, DSI, SID, ISD, DIS, IDS. The Adjacency Matrix (data cube) has a 1 for each existing TriEdge (that Investor recommended that Stock on that Day). There are 6 Base cTreeSets and 1 operator, aoa, to generate Expanded Base cliqueTrees.
[Figure: 4Level Stride=5 NPZ pTrees (Lev=3, Lev=2, Lev=1, Lev=0) for rasterSDI, rasterDSI and rasterSID, shown beside the Stock Day Investor cTrees (S 1st sort dim, D 2nd, I 3rd; counts CtI), the Day Stock Investor cTrees (CtI) and the Stock Investor Day cTrees (CtD).]
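A minimal sketch of how the six key sort orders lead to six base cTreeSets: each order groups the tri-edges by its first key, then its second, and keeps the third as the leaf set. The tri-edge data, the day labels `d1`/`d2`, and the names `COLUMN` and `base_ctree` are assumptions for illustration; nested dicts of sets stand in for the pTree-backed cTrees on the slide.

```python
from itertools import permutations

# Hypothetical tri-edges (stock, day, investor); not the 3HG2 data.
tri_edges = [
    ("A", "d1", 1),
    ("A", "d1", 2),
    ("A", "d2", 1),
    ("B", "d1", 2),
]

# Position of each key inside a (stock, day, investor) tuple.
COLUMN = {"S": 0, "D": 1, "I": 2}

def base_ctree(edges, order):
    """Group edges as first key -> second key -> set of third-key leaves."""
    tree = {}
    for e in edges:
        k1, k2, leaf = (e[COLUMN[c]] for c in order)
        tree.setdefault(k1, {}).setdefault(k2, set()).add(leaf)
    return tree

# The six key sort orders are the permutations of S, D, I.
orders = ["".join(p) for p in permutations("SDI")]
trees = {order: base_ctree(tri_edges, order) for order in orders}
```

For example, `trees["SDI"]["A"]["d1"]` holds the investors who recommended stock A on day d1, and its size plays the role of the CtI count.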

4 Investor Stock Day cTrees, Day Investor Stock cTrees, Investor Day Stock cTrees
Base CliqueTrees for 3HG2
TriEdgeTable (S,D,I,R), last 3 key sort orders: ISD, DIS, IDS.
[Figure: 4Level Stride=5 NPZ pTrees (Lev=3, Lev=2, Lev=1, Lev=0) for rasterISD (same as SID on the previous slide), rasterDIS and rasterIDS, shown beside the Investor Stock Day cTrees (ISDR; counts CtD), the Day Investor Stock cTrees (DISR; CtS) and the Investor Day Stock cTrees (IDSR; CtS).]

5 Stock Day Investor Base cTrees, Day Stock Investor Base cTrees, Stock Investor Day Base cTrees
Maximal Base CliqueTrees for 3HG2
[Figure: Stock Day Investor Base cTrees (CtI) with their aoa and oaa results (all of these will be Max Cliques); Day Stock Investor Base cTrees (CtI) with aoa and oaa results (all of these Max Cliques, only 3 new ones); Stock Investor Day Base cTrees (CtD) with aoa and oaa results (all of these Max Cliques, only 3 new ones).]
We can count the S=1 D=1 I=4 motifs? COMBO(5,4)=5 = 11
113? 10+6C(4,3)+C(5,3) = 54
112? C3,2+6C4,2+C5,2 = 83

6 Investor Stock Day cTrees, Day Investor Stock cTrees, Investor Day Stock cTrees
Base CliqueTrees for 3HG2, last 3.
[Figure: Investor Stock Day cTrees, Day Investor Stock cTrees and Investor Day Stock cTrees, each with its aoa and oaa results (all of these will be Max Cliques).]

7 Maximal Base CliqueTrees for 3HG2
aoa then oaa on the 6 cTrees (removing duplicates; no covers, since aoa then oaa gives Maximal Cliques only). We get 34 MCs below.
Theorem: These 34 MCs are the only Maximal Cliques.
Proof: Let C be a MaxClique, v1∈Part1(C), w1∈Part2(C), {z1..zn}=Part3(C). Apply aoa to that BaseClique, B. aoa(B)={v1,w1..wm,z1..zn} is a clique […] W={w1..wm} […] Part2(C), else C is not max. oaa(aoa(B))={v1..vk,W,Z} is a clique […] V={v1..vk} […] Part1(C), else C is not max. Thus {V,W,Z} is a MaxClique […] C and therefore {V,W,Z}=C. Thus C is one of the Expanded Base Cliques under aoa then oaa.
General thm: {a..ao(a..oa(…oa..a(B)…)) | B=BaseClique} is the MaxCliqueSet. Thus, for a bipartite graph, the MCS is {ao(B) | B a BaseClique}. (Seems to say that only one of the 6 cTrees will generate all of MCS?)
[Figure: cTrees of the 34 Maximal Cliques.]
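The aoa-then-oaa expansion can be sketched as below, under one reading of the proof: the base clique is a (stock, day) pair with its full investor leaf set Z, aoa grows the middle part (days) against that stock and Z, and oaa then grows the first part (stocks) against the grown days and Z. The tri-edges and the function name `expand_aoa_oaa` are hypothetical, and the sketch does not try to confirm the theorem's claim that this procedure yields every maximal clique.

```python
# Hypothetical tri-edges (stock, day, investor); not the 3HG2 data.
tri_edges = {
    ("A", "d1", 1), ("A", "d1", 2),
    ("A", "d2", 1), ("A", "d2", 2),
    ("B", "d1", 1), ("B", "d1", 2),
    ("B", "d2", 1),
}

def expand_aoa_oaa(edges, s, d):
    """Expand the base clique (s, d, leaves) to a triple (V, W, Z)."""
    stocks = {e[0] for e in edges}
    days = {e[1] for e in edges}
    # Leaves of the base cTree branch: investors with a tri-edge on (s, d).
    Z = {i for (s2, d2, i) in edges if (s2, d2) == (s, d)}
    if not Z:
        return None  # (s, d) is not a branch of the base cTree
    # "aoa" step (assumed meaning): every day joined to s and all of Z.
    W = {d2 for d2 in days if all((s, d2, i) in edges for i in Z)}
    # "oaa" step (assumed meaning): every stock joined to all of W x Z.
    V = {s2 for s2 in stocks
         if all((s2, d2, i) in edges for d2 in W for i in Z)}
    return frozenset(V), frozenset(W), frozenset(Z)

cliques = {expand_aoa_oaa(tri_edges, s, d) for (s, d, _) in tri_edges}
```

Each triple returned this way is a hyperclique that cannot be extended in any part: Z is the whole leaf set of (s, d), W is the largest day set for s and Z, and V is the largest stock set for W and Z.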

8 Maximal Base CliqueTrees for 3HG2
aoa then oaa on the 6 cTrees (removing duplicates; no covers, since aoa then oaa gives Maximal Cliques only). We get 34 MCs below.
Thm: The 34 MCs are the only MaxCliques.
Pf: C=MaxClique={V,W,Z}. aoa({v,w,Z})={v,W′,Z}, W⊆W′. oaa(aoa({v,w,Z}))={V′,W′,Z}, V′⊆V. If w′∈W′-W then ∃v′∈V-V′ (w′∉C, v′∈C) and then aoa({v′,w′,Z})={v′,W″,Z}, {w′,W} […]
[Figure: cTrees of the 34 Maximal Cliques.]

9 Stock-Day-Investor BaseCliqueTrees (leaves Inv)
Base CliqueTrees for 3PART HyperGraph, 3PHG2: {12345}=Investors recommending Stocks={ABCDE} on Days={,,,,}, 74 recommendations.
[Figure: Stock-Day-Investor BaseCliqueTrees (leaves Inv), Stock-Investor-Day BaseCTrees (leaves Days) and Inv-Day-Stock BaseClTrees (leaves Stocks), each shown with its counts (CtS, CtD, CtI) and its aoa, oaa and aao results.]

10 Edge Count Clique Thms
Graph C is a clique iff |EC| = |PUC| = COMB(|VC|,2) = |VC|!/((|VC|-2)!2!)
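The edge-count test can be checked directly: count the edges induced on a vertex set and compare with COMB(|VC|,2). A minimal sketch, assuming a simple undirected graph given as a list of vertex pairs; the function name `is_clique` is illustrative only.

```python
from math import comb

def is_clique(vertices, edges):
    """True iff the edges induced on `vertices` number COMB(|V|, 2)."""
    vs = set(vertices)
    # Keep each undirected edge once; drop self-loops and outside edges.
    induced = {frozenset(e) for e in edges
               if len(set(e)) == 2 and set(e) <= vs}
    return len(induced) == comb(len(vs), 2)

# Hypothetical example graph: a triangle 1-2-3 with a pendant vertex 4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle_ok = is_clique({1, 2, 3}, example_edges)
whole_graph_ok = is_clique({1, 2, 3, 4}, example_edges)
```

Here the triangle has 3 = COMB(3,2) induced edges and passes, while the four-vertex set has only 4 of the required COMB(4,2) = 6 edges.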
(VC,EC) is a k-clique iff every induced (k-1)-vertex subgraph, (VD,ED), is a (k-1)-clique.
Apriori Clique Mining Alg: Uses an ARM-Apriori-like downward closure property. CSk≡kCliqueSet, CCSk+1≡Candidate(k+1)CliqueSet. By SGE, CCSk+1 = all unions of CSk pairs with k-1 common vertices. Let C∈CCSk+1 be a union of 2 k-cliques with k-1 common vertices. Let v,w be the kth vertices (different) of the 2 k-cliques: C∈CSk+1 iff (PE)(v,w)=1.
Breadth-1st Clique Alg: CLQK=all Kcliques. Find CLQ3 w CS0. A Kclique and a 3clique sharing an edge form a (K+1)clique iff all K-2 edges from the non-shared Kclique vertices to the non-shared 3clique vertex exist. Next find CLQ4, then CLQ5, …
Depth-1st Clique Alg: Find a Largest MaxClique […] v. If (x,y)∈E, then by Count(NewPtSet(v,w,x,y)≡CLQ3pTree(v,w)&CLQ3pTree(x,y)):
0: the 4 vertices form a max4Clique (i.e., v,w,x,y).
1: the 5 vertices form a max5Clique (i.e., v,w,x,y,NewPt).
2: the 6 vertices form a max6Clique if NewPair∈E, else they form 2 max5Cliques.
3: the 7 vertices form a max7Clique if each NewPair∈E; elseif 1 or 2 NewPairs∈E, each 6VertexSet (vwxy + 2 EdgeEndpts) forms a Max6Clique; elseif 0 NewPairs∈E, each 5VertexSet (vwxy + 1 NewVertex) forms a maximal 5Clique….
Theorem: ∀ hClique⊆NewPtSet, those h vertices together with v,w,x,y form a maximal h+4Clique, where NPS(v,w,x,y)=CLQ3(v,w)&CLQ3(x,y).
GRAPH (linear edges, 2 vertices). kHYPERGRAPH (edges=k vertices). kPARTITE GRAPH (V=∪!Vi, i=1..k; (x,y)∈E ⇒ x,y ∉ same Vi). kPARTITE HYPERGRAPH (V=∪!Vi, i=1..k; (x1..xk)∈E ⇒ xi,xj ∉ same Vi). 2graph=2hypergraph.
Bipartite Clique Mining finds MaxCliques at the cost of pairwise &s. Each LETpTree ⇒ MCLQ unless ∃ a pairwise & with the same count. A&B, ∀B w Ct(A&B)=Ct(A), is a MCLQ.
∃ potential for a k-plex [k-core] mining alg here. Instead of Ct(A&B)=Ct(A), consider, e.g., Ct(A&B)=Ct(A)-1. Each such pTree, C, would be missing just 1 vertex (1 edge). Taking any MCLQ as above, ANDing in a CpTree would produce a 1-plex. ANDing in k such C's would produce a k-plex. In fact, suppose we have produced a k-plex in such a manner; then ANDing in any C with Ct(C)=Ct(A)-h would produce a (k+h)-plex. &i=1..n Ai is a [… i=1..n Ct(Ai)]-Core.
Tripartite Clique Mining Algorithm? In a Tripartite Graph, edges must start and end in different vertex parts. E.g., PART1=tweeters; PART2=hashtags; PART3=tweets. Tweeters-to-hashtags is many-to-many? Tweeters-to-tweets is many-to-many (incl. retweets)? Hashtags-to-tweets is many-to-many?
Multipartite Graphs: Bipartite, Tripartite (have 2, 3 PARTs resp.), … The rule is that no edge can start and end in the same PART.
HyperClique Mining: A 3hyperGraph has 3 vertex PARTS and each edge is a planar triangle (vertex triple), one vertex from each PART. The Stock recommender is a 3PARThyperGraph (Investors, Stocks, Days). A triangular "edge" connects Investor #k, Stock X, and Day n if k recommended X on day n. A 3PARThyperClique is a community s.t. all the investors in the clique recommend all the stocks in the clique on each of the days in the clique (a strong signal?). Tweet example: PART1=tweeters; PART2=hashtags; PART3=tweets.
Conjecture: KmultiCliques and KhyperCliques are in 1-1 correspondence (K vertex set)? So, only one of the mining processes is needed? Represent these common objects w cliqueTrees (cTrees).
Cliques, Kplexes and Kcores are subgraphs (communities) defined using internal edge count. A Motif is a subgraph defined using external "isomorphisms in the graph" counting. A motif must occur (isomorphically) in the graph more times than "expected".
Criticism: Some authors argue[62] that motif structure does not necessarily determine function. Recent research[64] shows that the connections of a motif to the rest of the network are too important to draw inferences about function from local structure alone.[65] Research shows that certain topological features of biological networks naturally give rise to canonical motifs.[66] Are Stock-Inv or Stock-Inv-Day Motifs useful?
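The Apriori-style candidate step above can be sketched with plain Python sets. Two k-cliques that share k-1 vertices are merged, and the merged set is kept only if the two non-shared vertices v,w are joined by an edge. The set membership test on `E` stands in for the (PE)(v,w)=1 lookup, which is an assumption about what PE denotes; the function name and the example graph are hypothetical.

```python
from itertools import combinations

def apriori_cliques(edges):
    """Return {k: set of k-cliques} for a simple undirected graph."""
    E = {frozenset(e) for e in edges}
    levels = {2: set(E)}          # the 2-cliques are the edges themselves
    k = 2
    while True:
        nxt = set()
        for a, b in combinations(levels[k], 2):
            union = a | b
            # k-1 common vertices <=> the union has exactly k+1 vertices.
            if len(union) == k + 1 and (a ^ b) in E:
                nxt.add(union)    # a ^ b is the pair {v, w} of differing vertices
        if not nxt:
            return levels
        k += 1
        levels[k] = nxt

# Hypothetical graph: a 4-clique on {1,2,3,4} plus a pendant edge (4,5).
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
levels = apriori_cliques(example)
```

This is a breadth-first, level-by-level enumeration, so its cost grows quickly on dense graphs; the slide's pTree version would replace the pairwise set operations with ANDs.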
Some questions/theorems/thoughts:
All K-Paths are isomorphic (thus, there's always a Kpath motif). A ShortestKPath is an Induced subgraph.
What does the sequence FG(1PathMotif)=|V|, FG(2PathMotif), … tell us? The sequence FG(Shortest1Path), FG(Shortest2Path), …? What does the sequence FG(MaxShortest1Path), FG(MaxShortest2Path), … tell us, where a MaxS2P is not part of a S3P?
Extend to HyperEdges? What is a path in, e.g., a 3HyperGraph? Both? 2HGInterface3HyperGraphPath. 1HGI3HGP. (In general, hHGIkHGP, where 0<h<k.) At the other extreme (all SPs are length=1): Or?
I'll bet the most important motifs, M(V′,E′) in G, are "Shortest Path Motifs": ∀x,y∈V′, ∃ a G-ShortestPath in M running from x to y. I.e., M is made up of G-SPs. A Clique is a SPMotif (made up entirely of Shortest1Paths).

11 MOTIFs
Cliques, k-plexes, k-cores and other communities are subgraphs defined by internal edge count. A Motif is a subgraph defined by (external) isomorphism count.
Wikipedia: motifs are recurrent and statistically significant sub-graphs or patterns. They may reflect functional properties. Motif detection is computationally challenging. Most algorithms find induced Motifs.
A graph, G′, is a subgraph of G (G′⊆G) if V′⊆V and E′⊆E∩(V′×V′). If G′⊆G and G′ contains all ‹u,v›∈E with u,v∈V′, G′ is an induced sub-graph. G′ and G are isomorphic (G′↔G) if ∃ a bijection f:V′→V with ‹u,v›∈E′⇔‹f(u),f(v)›∈E ∀u,v∈V′. (If G″⊂G and ∃ an isomorphism between G″ and G′, G′ appears in G.) The number of appearances of G′ in G is the frequency of G′ in G, FG(G′). G′ is recurrent or frequent in G when FG(G′)>threshold (pattern=frequent subgraph).
Motif discovery includes exact counting, sampling, pattern growth. Motif discovery has 2 steps: calculating the # of occurrences; evaluating the significance. Mfinder implements full enumeration and sampling. Brute force exact counting (Milo et al.[3]) was computationally feasible only for small motifs of size < 5 vertices. The Kashtan et al.[9] edge sampling NM alg estimates concentrations of induced subgraphs for directed or undirected networks. It starts from an edge (subgraph size 2), then continues choosing random nbr edges until subgraph size=n. Finally the subgraph is expanded to include all of the edges that exist in the network between these n nodes. It finds motifs up to size=6 and thus most significant motifs.
mfinderSampling: Es=set of picked edges. Vs=set of all nodes that are touched by the edges in Es. Initialize Vs and Es=∅.
1. Pick a random edge, e1=(vi,vj). Update Es={e1}, Vs={vi,vj}.
2. Make a list L of all nbr edges of Es. Omit from L all edges between vertices in Vs.
3. Pick a random edge e={vk,vl} from L. Update Es=Es⋃{e}, Vs=Vs⋃{vk,vl}.
4. Repeat 2-3 until |Vs|=n.
5. Calculate the probability to sample the picked n-node subgraph.
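Steps 1-4 of the sampling procedure, plus the final expansion to the induced subgraph, might look like the sketch below for an undirected graph. Step 5, the sampling probability used to correct the estimate, is deliberately left out. The function name `sample_subgraph`, its `rng` parameter and the path graph are illustrative assumptions, not part of mfinder itself.

```python
import random

def sample_subgraph(edges, n, rng=None):
    """Grow a connected n-node subgraph by random neighbor edges."""
    rng = rng or random.Random()
    edge_list = [tuple(e) for e in edges]
    first = rng.choice(edge_list)             # step 1: random starting edge
    Es, Vs = {frozenset(first)}, set(first)
    while len(Vs) < n:                        # step 4: repeat until |Vs| = n
        # Step 2: neighbor edges of Es, omitting edges inside Vs, i.e.
        # edges with exactly one endpoint already in Vs.
        L = [e for e in edge_list if (e[0] in Vs) != (e[1] in Vs)]
        if not L:
            return None                       # component has fewer than n nodes
        e = rng.choice(L)                     # step 3: pick one and grow
        Es.add(frozenset(e))
        Vs.update(e)
    # Expand to every network edge between the n sampled nodes.
    induced = {frozenset(e) for e in edge_list
               if e[0] in Vs and e[1] in Vs}
    return Vs, induced

# Hypothetical path graph 1-2-3-4-5.
path = [(1, 2), (2, 3), (3, 4), (4, 5)]
nodes, induced_edges = sample_subgraph(path, 3, random.Random(0))
```

On the path graph every sample is three consecutive vertices with the two edges between them; without step 5 the samples are not uniform over subgraphs, which is why the original algorithm weights each one by its sampling probability.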
Apply to G9 below:
[Figure: graph G9 on vertices A..N and 1..9, a..i, each vertex shown with its cTree count.]
Counts as shown: A 3, B 3, C 6, D 4, E 8, F 8, G a, H e, I c, J 5, K 4, L 6, M 3, N 3; 1 8, 2 7, 3 8, 4 7, 5 4, 6 4, 7 4, 8 3, 9 4, a 4, b 4, c 6, d 7, e 8, f 5, g 2, h 2, i 2.

