Published by 윤태 왕. Modified over 6 years ago.
1
The Edge pTree (E), PathTree (PT), ShortestPathTree (SPT), AcyclicPathTree (APT) and CycleList (CL) of the graph, G1

G1 has vertices 1, 2, 3, 4. The predicate for E is NotPureZero. First, construct the stride = |V|, two-level Edge pTree; then all the others are constructed concurrently from it.

[Figure: G1; the one-level edge bitmap EG1 over keys (1,1) ... (4,4); the two-level edge pTree 2LEG1 (stride = |V| = 4); PTG1, which extends 2LEG1; APTG1; SPTG1; CLG1.]

To extend 2LE to PT:
  for each k in ListE_h: PT2_{hk} = E_k after zeroing the h bit of E_k
  for each k in ListPT2_{hj}: PT3_{hjk} = E_k after zeroing the j bit of E_k
  for each k in ListPT3_{hij}: PT4_{hijk} = E_k after zeroing the i and j bits of E_k

CLG1 = 1341, 1431, 3143, 3413, 4134, 4314. All are 3-hop cycles. Each cycle has 3 start points and 2 directions, so each repeats 6 times: 6/6 = 1 distinct 3-hop cycle (1341).

SPTG1 is initially E: E_1 = SP_{1,1} = SPSF_1, E_2 = SP_{2,1} = SPSF_2, E_3 = SP_{3,1} = SPSF_3, E_4 = SP_{4,1} = SPSF_4.

At this point the SPT is completed. For big graphs we could stop here (e.g., Friends has ~1B vertices but a diameter of 4, so we would only need to build the PT out to 4-hop paths), and the result could possibly be expressed as a tree of lists rather than a tree of bitmaps. Also, for sparse big graphs, E could be leveled further and/or stored as a tree of lists (then APT and SPT would be as well).

SPT(G)_k (with the k bit turned on) is a mask (where > 0 means "yes") for the connectivity component, COMP(G)_k, containing the vertex v_k. For a bitmap of COMP_k from the bit-sliced SPT (SPT_{k,h} ... SPT_{k,0}, k = 1...|V|): COMP_k = OR_{j=h..0} SPT_{k,j}.

The SPT structure may also be more useful expressed as separate "categorical" bitmaps, one for each shortest path length (SP_{k,h}, h = 1..H). We also keep a mask of the shortest paths found so far, SPSF_k, for each vertex k. With each new SP bitmap, SPB: SP_{k,h+1} = SPB & SPSF'_k (only the vertices not already reached), and then SPSF_k = SPSF_k | SPB.

SPT is a rich data structure. It provides the Connectivity Component Partition and the Maximal Cliques (go across SP_{k,1} and then look within subsets of those k's for commonality). Note that cliques are 0-plexes. Each mask SP_{k,1} masks a 1-plex. Each SP_{k,1} & SP_{k,2} masks a 2-plex (which is SPSF_{k,2}? So if we save each SPSF instead of overwriting it, will we have the k-plex masks without any further work?), etc.
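The SP/SPSF update above can be sketched with plain integer bitmaps. This is a minimal illustration, not the slides' pTree implementation: it assumes G1 has the edges (1,3), (1,4), (2,4), (3,4) listed in the appendix edge table, it seeds SPSF_k with the k bit itself so that k is never reported as its own neighbor, and the function names are hypothetical.

```python
# Assumption: G1's edges as listed in the appendix edge table.
EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]
N = 4

def edge_masks(n, edges):
    """Return E[1..n]; bit v of E[k] is set iff {k, v} is an edge (index 0 unused)."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E

def members(mask):
    """List the vertex numbers whose bits are set in mask."""
    return [v for v in range(mask.bit_length()) if (mask >> v) & 1]

def shortest_path_layers(E, k):
    """Return [SP_{k,1}, SP_{k,2}, ...]: the vertices first reached at each hop."""
    spsf = 1 << k                 # shortest paths so far (seeded with k itself)
    frontier = 1 << k
    layers = []
    while True:
        spb = 0                   # SPB: OR of the edge bitmaps of the frontier
        for v in members(frontier):
            spb |= E[v]
        new = spb & ~spsf         # SP_{k,h+1} = SPB & SPSF'_k
        if not new:
            return layers
        layers.append(new)
        spsf |= new               # SPSF_k = SPSF_k | SPB (restricted to new bits)
        frontier = new

E = edge_masks(N, EDGES)
diam = [len(shortest_path_layers(E, k)) for k in range(1, N + 1)]
print(diam, max(diam))
```

The number of layers for vertex k is its Diam_k, and OR-ing the layers together with the k bit gives the COMP_k mask.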
2
The Edge pTree (E), PathTree (PT), ShortestPathTree (SPT), AcyclicPathTree (APT) and CycleList (CL) of a graph, G5

[Figure: G5 on vertices 1-8; EG5 (2-level, stride = 8); PTG5; APTG5; SPTG5.]

CLG5 = 1571, 1751, 3683, 3863, 5175, 5715, 6386, 6836, 7157, 7517, 8368, 8638.

PT Clique Miner Algorithm: a clique is all cycles. Extend to a k-plex (k-core) mining algorithm? PT (= APT + CL) and SPT are powerful data mining tools with closure properties (to eliminate branches).

Max clique mining: a kCycle is a kClique iff its vertex set is found in CL_k as PERM(k-1,k-1)/2 = (k-1)!/2 distinct kCycles, counting each cycle once regardless of start point and direction (e.g., 2!/2 = 1 for 3cycles; 3!/2 = 3 for 4cycles; 4!/2 = 12 for 5cycles; 5!/2 = 60 for 6cycles).

Downward closure: once a 4cycle is established as a 4clique (by the fact that {1,2,3,4} occurs 3!/2 = 3 times in CL), all of its 3-vertex subsets are 3cliques ({1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}), so there is no need to check further.

Open questions: k-plex (missing k edges) mining alg? k-core (has k edges) mining alg? Density (internal edge density >> external|avg) mining alg? Degree (internal vertex degree >> external|avg) mining alg?

Diameter of G5 is max{Diameter_k} = max{2,2,1,3,2,1,3,1} = 3.

Connected component containing v1: COMP_1 = {1,2,4,5,7}. Pick the first vertex not in COMP_1, which is 3: COMP_3 = {3,6,8}. Done. The partition is {{1,2,4,5,7}, {3,6,8}}. To pick the first vertex not in COMP_1, mask off COMP_1 with SPT'_{v1} and then pick the first vertex in this complement.
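The (k-1)!/2 counting rule can be checked on small graphs with a brute-force sketch. The code below is only an illustration under stated assumptions: it enumerates orderings directly instead of reading a CycleList, the helper names are hypothetical, and the example graphs are G1 (edges taken from the appendix edge table) and a complete graph on four vertices.

```python
from itertools import combinations, permutations
from math import factorial

def distinct_cycles(vertices, edges):
    """Count cycles through all `vertices`, once per start point and direction."""
    adj = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    count = 0
    for perm in permutations(rest):
        # Fixing the first vertex removes the k start points; keeping only the
        # orientation with perm[0] < perm[-1] removes the 2 directions.
        if len(perm) > 1 and perm[0] > perm[-1]:
            continue
        cycle = (first, *perm, first)
        if all(frozenset(p) in adj for p in zip(cycle, cycle[1:])):
            count += 1
    return count

def is_clique_by_cycle_count(vertices, edges):
    """A vertex set of size k >= 3 is a clique iff it carries (k-1)!/2 cycles."""
    k = len(vertices)
    return distinct_cycles(vertices, edges) == factorial(k - 1) // 2

G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]            # assumption: appendix edge table
K4 = list(combinations([1, 2, 3, 4], 2))         # complete graph on 4 vertices

print(is_clique_by_cycle_count([1, 3, 4], G1))   # the 3-cycle 1341
print(distinct_cycles([1, 2, 3, 4], K4))         # 3!/2 for a 4-clique
```

The test only makes sense for k >= 3, since a cycle needs at least three vertices.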
3
The Edge pTree (E), PathTree (PT), ShortestPathTree (SPT), AcyclicPathTree (APT) and CycleList (CL) of a graph, G3

[Figure: G3 on vertices 1-7; the edge bitmap PE over keys (1,1) ... (7,7); EG3 (2-level, stride = 7); PTG3; APTG3; SPTG3; CLG3.]
4
This is a repeat slide from last week showing a possible way of updating PT when the graph grows an edge.

G3 = G2 with an additional edge, (5,7).

[Figure: G2 and G3 on vertices 1-7, with the path tree of G2 and the paths added for G3.]

PT update alg? Copy the k-paths from G2, then add the new ones. Next we complete the additional level (6-paths exist now whereas they didn't before).
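One way to read "copy the k-paths, add new ones" is that every path that must be added runs through the new edge exactly once, so it can be assembled from two vertex-disjoint old paths. The sketch below illustrates that idea only; it is not the slides' bitmap procedure. The function names are hypothetical, and the example uses G1 (edges from the appendix edge table) with a hypothetical extra edge (2,3) rather than G2, whose edge list is not spelled out in the text.

```python
def simple_paths(n, edges):
    """All acyclic paths with at least one edge, as vertex tuples."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = set()

    def extend(path):
        for w in adj[path[-1]]:
            if w not in path:
                found.add(path + (w,))
                extend(path + (w,))

    for v in adj:
        extend((v,))
    return found

def added_paths(old_paths, new_edge):
    """Paths created by inserting new_edge (assumed not already present)."""
    u, v = new_edge
    added = set()
    for a, b in ((u, v), (v, u)):
        # Old paths ending at a, and old paths starting at b (or the bare vertex).
        lefts = [(a,)] + [p for p in old_paths if p[-1] == a]
        rights = [(b,)] + [p for p in old_paths if p[0] == b]
        for left in lefts:
            for right in rights:
                if not set(left) & set(right):   # keep the joined path acyclic
                    added.add(left + right)
    return added

G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]   # assumption: appendix edge table
NEW = (2, 3)                            # hypothetical added edge
old = simple_paths(4, G1)
new = added_paths(old, NEW)
print(len(old), len(new))
```

The old paths are reused unchanged, which mirrors the copy step; only the paths through the new edge are generated.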
5
The 1-hop paths = the Edge table, E: as an adjacency matrix; as a stride = |V| = 4, 2-level pTree from E.

[Figure: G1 on vertices 1-4; the edge bitmap PE over Ekey (1,1) ... (4,4), drawn against axes V1, V2; the 2-hop and 3-hop bitmaps PE2 and PE3 over E2key and E3key (1,1,1) ... (4,4,4), drawn against axes V1, V2, V3, V4.]
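A small sketch of the two representations named on this slide, using Python lists in place of real pTrees. It assumes G1's edges are the four listed in the appendix edge table, uses the 1-based bit offset n*(v-1)+w that the appendix uses for PE(v,w), and treats the NotPureZero predicate as "the leaf contains at least one 1". The function names are hypothetical.

```python
G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]   # assumption: appendix edge table

def edge_bitmap(n, edges):
    """One-level PE: n*n bits, key (v, w) at 1-based offset n*(v-1)+w."""
    bits = [0] * (n * n)
    for v, w in edges:
        bits[n * (v - 1) + w - 1] = 1
        bits[n * (w - 1) + v - 1] = 1   # undirected: set both orientations
    return bits

def pe(bits, n, v, w):
    """Look up the single bit PE(v, w)."""
    return bits[n * (v - 1) + w - 1]

def endpoints(offset, n):
    """Invert a 1-based offset back to its key (v, w)."""
    return (offset - 1) // n + 1, (offset - 1) % n + 1

def two_level(bits, stride):
    """Root holds one bit per stride-sized leaf: 1 iff the leaf is not purely zero."""
    leaves = [bits[i:i + stride] for i in range(0, len(bits), stride)]
    root = [int(any(leaf)) for leaf in leaves]
    return root, leaves

bits = edge_bitmap(4, G1)
root, leaves = two_level(bits, stride=4)
print(root)
print(leaves)
```

With stride = |V|, leaf k is exactly the adjacency row E_k of vertex k, which is why the later path and clique constructions can work leaf by leaf.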
6
SG Clique Mining

[Figure: the 7-vertex graphs G2 and G3 and the 8-vertex graph G4, each with its edge bitmap PE over keys (1,1) ...; the masks E, EU, C, CU.]

k=2: 2Cliques (2 vertices): find the endpoints of each edge from its bit offset n: (Int((n-1)/7)+1, Mod(n-1,7)+1).

7-vertex example:
  k=3: PE(2,3)=1, so 123 ∈ CS3. PE(2,4)=1, so 124 ∈ CS3. PE(1,4)=1, so 134 ∈ CS3. PE(2,3)=1, so 234 ∈ CS3. PE(6,7)=1, so 567 ∈ CS3. PE(2,6)=0, PE(1,7)=0, PE(1,5)=0. Repeats are marked "have".
  k=4: PE(2,4)=1, so 1234 ∈ CS4. 123, 124, 134, 234 are cliques, so 1234 is a clique; 1234 is the only 4-clique.
  Using the EdgeCount thm on C = {1,2,3,4}: CU = C & EU, and C is a clique since ct(CU) = comb(4,2) = 4!/(2!2!) = 6.

EC requires counting the 1's in the mask pTree of each subgraph (or candidate clique, if we take the time to generate the CCSs; but then clearly the fastest way to finish up is simply to look up the single bit position in E, i.e., use EC).

EdgeCount Algorithm (EC): if |PU_C| = (k+1)!/((k-1)! 2!), then the (k+1)-vertex candidate C is a clique.

The SG alg only needs the Edge Mask pTree, E, and a fast way to find those pairs of subgraphs in CS_k that share k-1 vertices (then check E to see if the two different kth vertices are an edge in G). Again, this is a standard part of the Apriori ARM algorithm and has therefore been optimized and engineered ad infinitum!

8-vertex example, G4:
  k=3: PE(2,3)=1, so 123 ∈ CS3. PE(2,4)=1, so 124 ∈ CS3. PE(2,8)=1, so 128 ∈ CS3. PE(1,4)=1, so 134 ∈ CS3. PE(3,8)=1, so 138 ∈ CS3. PE(4,8)=1, so 148 ∈ CS3. PE(2,3)=1, so 234 ∈ CS3. PE(3,8)=1, so 238 ∈ CS3. PE(4,8)=1, so 248 ∈ CS3. PE(4,8)=1, so 348 ∈ CS3. PE(6,7)=1, so 567 ∈ CS3. PE(2,6)=0, PE(1,7)=0, PE(1,5)=0, PE(6,8)=0.
  k=4: PE(2,4)=1, so 1234 ∈ CS4. PE(3,8)=1, so 1238 ∈ CS4. PE(4,8)=1, so 1248 ∈ CS4. PE(4,8)=1, so 1348 ∈ CS4. PE(4,8)=1, so 2348 ∈ CS4.
  k=5: PE(4,8)=1, so 12348 ∈ CS5.
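The SG procedure can be sketched as an Apriori-style join followed by a single edge lookup. This is a minimal illustration with hypothetical names, where a Python set of edges stands in for the PE bit lookup; the examples are G1 (edges from the appendix edge table) and a complete graph on four vertices.

```python
from itertools import combinations

def mine_cliques(edges):
    """Return {k: set of k-cliques as sorted tuples}, starting from CS_2 = E."""
    edge_set = {frozenset(e) for e in edges}       # stands in for the PE bitmap
    cs = {2: {tuple(sorted(e)) for e in edges}}
    k = 2
    while cs[k]:
        nxt = set()
        for a, b in combinations(sorted(cs[k]), 2):
            common = set(a) & set(b)
            if len(common) != k - 1:               # join only pairs sharing k-1 vertices
                continue
            (v,) = set(a) - common
            (w,) = set(b) - common
            if frozenset((v, w)) in edge_set:      # the single PE(v, w) check
                nxt.add(tuple(sorted(set(a) | set(b))))
        k += 1
        cs[k] = nxt
    del cs[k]                                      # drop the empty last level
    return cs

G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]              # assumption: appendix edge table
K4 = list(combinations([1, 2, 3, 4], 2))

print(mine_cliques(G1))
print(mine_cliques(K4)[4])
```

The join is correct because two k-cliques that share k-1 vertices already contain every pair except (v, w), so one bit decides the candidate.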
7
APPENDIX

[Figure: G1 on vertices 1-4 with the edge bitmap PE over Ekey (1,1) ... (4,4) (E = adjacency matrix, also drawn as a V1/V2 "Rolodex card"), the edge-label bitmaps PEL,1 and PEL,0, the vertex-label bitmaps PVL,1 and PVL,0, the vertex masks P1 ... P4 and PC, and the upper-triangle edge bitmap PU with its bit offsets.]

Always use pop-count for 1-counts as we AND; then C is a clique iff all C level-1 counts are |V_C| - 1. In fact, one can mine out all cliques by just analyzing the PT counts.

Note: if one creates PT, lots of tasks become easy! E.g., clique mining, shortest path mining, degree community mining, density community mining! What else?

A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order.

k-plex existence alg (using the GPpT): C is a k-plex iff, for every v in C, |C_v| >= |V_C| - 1 - k, where |C_v| is the number of neighbors of v inside C.
k-plex inheritance thm: every induced subgraph of a k-plex is a k-plex.
Mine all max k-plexes: use |C_v| for each v in C.
k-core existence alg (using the GPpT): C is a k-core iff, for every v in C, |C_v| >= k.
k-core inheritance thm: if there is a cover of G by induced k-cores, then G is a k-core.
Mine all max k-cores: use |C_v| for each v in C.

Community = subgraph with more edges inside than linked to its outside. Clique = community such that there is an edge between each vertex pair.

Typical graph sizes:
  Recommenders: # edges = 1MB (10^15)
  Gene-Gene Ints: # edges = 1B (10^9)
  Person-Tweet Security: # edges = 7B*10K = 10^14
  Friends Social: # edges = 4BB (10^18)
  Stock-Price: # edges = 10^13

Edge table E of G1:
  Ekey  V1  V2  ELabel
  1,3   1   3   1
  1,4   1   4   2
  2,4   2   4   3
  3,4   3   4   1

Vertex table V of G1:
  Vkey  VL
  1     2
  2     3
  3     2
  4     3

An Induced SubGraph (ISG), C, is a subgraph that inherits all of G's edges on its own vertices. A k-ISG (k vertices), C, is a k-clique iff all of its (k-1)-sub-ISGs are (k-1)-cliques. With C the induced subgraph on V_C = {1,3,4}: PE_C = PE & PC and PU_C = PU & PC.

Clique existence alg: is an induced SG a clique?

Edge Count existence thm (EC): C is a clique iff |E_C| = |PU_C| = COMB(|V_C|,2) = |V_C|!/((|V_C|-2)! 2!).
Apply EC to the 3-vertex ISGs (3-clique iff |PU| = 3!/(2!1!) = 3):
  V_C = {1,3,4}: PU_C count = 3
  V_D = {1,2,3}: PU_D count = 1
  V_F = {1,2,4}: PU_F count = 2
  V_H = {2,3,4}: PU_H count = 2
C is the only 3-clique.

SubGraph existence theorem (SG): (V_C,E_C) is a k-clique iff every induced (k-1)-subgraph, (V_D,E_D), is a (k-1)-clique. SG or EC better? Extend to quasi-cliques? Extend to mine out all cliques?

A clique mining algorithm finds all cliques in a graph. For clique mining we can use an ARM-Apriori-like downward closure property. Let CS_k be the kCliqueSet and CCS_{k+1} the Candidate (k+1)CliqueSet. By the SG clique thm, CCS_{k+1} = all unions of CS_k pairs having k-1 common vertices. Let C in CCS_{k+1} be a union of two k-cliques with k-1 common vertices, and let v and w be the kth vertices (different) of the two k-cliques; then C is in CS_{k+1} iff PE(v,w) = 1. (We just need to check a single bit in PE.)

Form CCS_{k+1}: union the CS_k pairs sharing k-1 vertices, then check a single PE bit. Below, k = 2, so we check edge pairs sharing 1 vertex, then check the 1 new edge bit in PE. CS_2 = E = {13, 14, 24, 34}. The only expensive part of this is forming CCS_k, and that is expensive only for CCS_3 (as in Apriori ARM).
  PE(3,4) = PE(4*[3-1]+4 = 12) = 1, so 134 ∈ CS_3 (further pairs give 134 again: already have it)
  PE(2,3) = PE(4*[2-1]+3 = 7) = 0
  PE(1,2) = PE(4*[1-1]+2 = 2) = 0
Next? List out CS_3 = {134}; form CCS_4 = ∅. Done.

Internal/external degree of v in C: k_v^int / k_v^ext = # edges from v to w in C / C'.
Internal degree of C: k_C^int = sum over v in C of k_v^int. External degree of C: k_C^ext = sum over v in C of k_v^ext. Total degree of C: k_C = k_C^int + k_C^ext.
  k_v1^int = |PC & PE & P_v1| = 2, k_v3^int = |PC & PE & P_v3| = 2, k_v4^int = |PC & PE & P_v4| = 2, so k_C^int = 6
  k_v1^ext = |P'C & PE & P_v1| = 0, k_v3^ext = |P'C & PE & P_v3| = 0, k_v4^ext = |P'C & PE & P_v4| = 1, so k_C^ext = 1
  k_C = 7

Intra-cluster density: δint(C) = |edges(C,C)| / (n_c(n_c-1)/2) = |PE & PC & PLT| / (3*2/2) = 3/3 = 1.
Inter-cluster density: δext(C) = |edges(C,C')| / (n_c(n-n_c)) = |PE & P'C & PLT| / (3*1) = 1/3.
δint(C) - δext(C) = 1 - 1/3 = 2/3.

The tradeoff between large δint(C) and small δext(C) is the goal of many community mining algorithms. A simple approach is to maximize the difference.
Density Difference algorithm for communities: δint(C) - δext(C) > Threshold?
Degree Difference algorithm: k_C^int - k_C^ext > Threshold?
Both are easy to compute with pTrees, even for big graphs. Graphs are ubiquitous for complex data in all of science.

Ignoring subgraphs of 2 vertices, the four 3-vertex subgraphs are C = {1,3,4}, D = {1,2,3}, F = {1,2,4}, H = {2,3,4}:
  δint(D) = |PE & PD & PLT| / (3*2/2) = 1/3; δext(D) = |PE & P'D & PLT| / (3*1) = 3/3 = 1; δint(D) - δext(D) = 1/3 - 1 = -2/3
  δint(F) = |PE & PF & PLT| / (3*2/2) = 2/3; δext(F) = |PE & P'F & PLT| / (3*1) = 2/3; δint(F) - δext(F) = 0
  δint(H) = |PE & PH & PLT| / (3*2/2) = 2/3; δext(H) = |PE & P'H & PLT| / (3*1) = 2/3; δint(H) - δext(H) = 0
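The degree and density figures for the four 3-vertex subgraphs of G1 can be reproduced with a short sketch. It counts edges directly from the edge table rather than AND-ing pTree masks, and the function and key names are hypothetical.

```python
from fractions import Fraction

EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]   # G1, from the edge table
N = 4

def community_measures(n, edges, c):
    """Degree and density measures of vertex set c (a proper subset, |c| >= 2)."""
    c = set(c)
    internal = sum(1 for u, v in edges if u in c and v in c)
    external = sum(1 for u, v in edges if (u in c) != (v in c))
    nc = len(c)
    return {
        "k_int": 2 * internal,             # each internal edge is seen from both ends
        "k_ext": external,
        "d_int": Fraction(internal, nc * (nc - 1) // 2),
        "d_ext": Fraction(external, nc * (n - nc)),
    }

for name, c in [("C", {1, 3, 4}), ("D", {1, 2, 3}), ("F", {1, 2, 4}), ("H", {2, 3, 4})]:
    m = community_measures(N, EDGES, c)
    print(name, m["k_int"] - m["k_ext"], m["d_int"] - m["d_ext"])
```

Exact fractions are used so that the differences compare cleanly against a threshold without floating-point noise.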
8
APPENDIX Path Tree PT

[Figure: G1 on vertices 1-4 with its edge bitmap PE over keys (1,1) ... (4,4) and its path tree; G2 and G3 on vertices 1-7 with their path trees.]

To extend E to the path tree:
  for each k in ListE_h: E2_{hk} = E_k, kill h
  for each k in ListE2_{hj}: E3_{hjk} = E_k, kill j
  for each k in ListE3_{hij}: E4_{hijk} = E_k, kill i and j
  ...

Shortest path V_k to V_h: descend from E_k until you reach the first h. SP_G1(1,2)? At level 1? No. At level 2? Yes: SP_{1,2} = 142.

Graph diameter? diam_k is the max of the min path lengths from k to the other vertices. For each k, record the first occurrence (fo) of each h ≠ k.
  Diam_1 = max{fo12, fo13, fo14} = max{2, 1, 1} = 2
  Diam_2 = max{fo21, fo23, fo24} = max{2, 2, 1} = 2
  Diam_3 = max{fo31, fo32, fo34} = max{1, 2, 1} = 2
  Diam_4 = max{fo41, fo42, fo43} = max{1, 1, 1} = 1
  Diam_G1 = max over k in V of Diam_k = 2

G2: Diam_1 = 2, Diam_2 = 3, Diam_3 = 3, Diam_4 = 3, Diam_5 = 3, Diam_6 = 2, Diam_7 = 3, so Diam_G2 = 3.
SP in G2? SP_G2(7,2) is not found at level 1 or level 2; it is found at level 3: SP_72 = 7612. SP_G2(1,5) is found the same way, by descending from E_1 until the first 5 appears.

G3 = G2 with an additional edge, (5,7). MPP update alg? Copy the k-paths from G2, then add the new ones. Next we complete the additional level (6-paths exist now whereas they didn't before).
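The extension rule (follow E_k, skip the vertices already on the path, and treat a return to the start vertex as a cycle) can be sketched as a recursive enumeration. This is an illustrative reading, not the slides' bitmap code: tuple membership stands in for zeroing bits, the names are hypothetical, and G1's edges are assumed to be those in the appendix edge table.

```python
G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]   # assumption: appendix edge table

def path_tree(n, edges):
    """Return (acyclic_paths, cycles) as lists of vertex tuples."""
    E = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        E[u].add(v)
        E[v].add(u)
    acyclic, cycles = [], []

    def extend(path):
        start, last = path[0], path[-1]
        for k in sorted(E[last]):
            if k == start:
                if len(path) >= 3:       # returning to the start closes a cycle
                    cycles.append(path + (k,))
                continue                 # the trivial h-k-h return is zeroed out
            if k in path:
                continue                 # interior vertices have their bits zeroed
            acyclic.append(path + (k,))
            extend(path + (k,))

    for h in range(1, n + 1):
        extend((h,))
    return acyclic, cycles

def shortest_path(acyclic, k, h):
    """First occurrence of h below k: the shortest listed path from k to h."""
    return min((p for p in acyclic if p[0] == k and p[-1] == h), key=len)

apt, cl = path_tree(4, G1)
print(cl)
print(shortest_path(apt, 1, 2))
```

On G1 the cycle list comes out as the six rotations and reflections of the single 3-hop cycle, matching the CycleList shown for G1.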
9
Unique Path Multilevel Map (UPMM)

UPMM consists of bitmaps having the same levels as the pTrees, but it is not a pTree. Caution: this copy of G2 gets modified during animation. An unaltered copy is below.

To form UPMM from PMP: go top to bottom, left to right, and eliminate the paths ending with the start vertex after all pTrees with that start vertex are included.

Please note that UPMM is not a pTree but a collection of leveled bitmaps (there are no inter-level pointers making it a pTree).

[Figure: G2 on vertices 1-7; PMP(G2) and UPMM(G2), drawn at Level 0 through Level 4.]
10
SubGraph Path pTrees

[Figure: G1 with the subgraph C = {1,3,4} in orange; the edge bitmaps E1 ... E4 and the 2-hop and 3-hop path bitmaps of G1, each ANDed with the mask PC to give C1, C3, C4 and the C path bitmaps.]

To get the C Path pTree, remove all C' pTrees and AND each G pTree with PC. Kill the 2nd bit (or keep vertex 2 as a vertex having no incident edges; then all pTrees are the same depth and can operate on each other).

Diameter of C? CDiam_k is the max of the min path lengths from k to the other C vertices. For each k, proceed down from C_k a level at a time and record the first occurrence (fo) of each h ≠ k.
  CDiam_1 = max{fo13, fo14} = max{1, 1} = 1
  CDiam_3 = max{fo31, fo34} = max{1, 1} = 1
  CDiam_4 = max{fo41, fo43} = max{1, 1} = 1
  Diam_C = max over k in V_C of CDiam_k = 1

Always use pop-count for 1-counts as we AND; then C is a clique iff all C level-1 counts are |V_C| - 1. In fact, one can mine out all cliques by just analyzing the G level-1 counts. Note: if one creates the G Path pTree, lots of tasks become easy! E.g., clique mining, shortest path mining, degree community mining, density community mining! What else?
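The AND-with-PC step and the pop-count clique test can be sketched with integer bitmaps. This is a minimal illustration under stated assumptions: G1's edges are those of the appendix edge table, one Python integer per vertex stands in for a level-1 pTree, and the function names are hypothetical.

```python
G1 = [(1, 3), (1, 4), (2, 4), (3, 4)]   # assumption: appendix edge table

def edge_masks(n, edges):
    """E[k] has bit v set iff {k, v} is an edge (index 0 unused)."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E

def vertex_mask(vertices):
    """PC: the bitmap with one bit per vertex of C."""
    return sum(1 << v for v in vertices)

def induced(E, pc):
    """AND each vertex bitmap with PC and drop the bitmaps of vertices outside C."""
    return {k: E[k] & pc for k in range(1, len(E)) if (pc >> k) & 1}

def popcount(x):
    return bin(x).count("1")

def is_clique(E, vertices):
    """C is a clique iff every level-1 count equals |V_C| - 1."""
    c = induced(E, vertex_mask(vertices))
    return all(popcount(m) == len(vertices) - 1 for m in c.values())

E = edge_masks(4, G1)
print({k: popcount(m) for k, m in induced(E, vertex_mask([1, 3, 4])).items()})
print(is_clique(E, [1, 3, 4]), is_clique(E, [1, 2, 3]))
```

The same per-vertex counts are what the degree-based tests compare against other thresholds, so one pass over the induced bitmaps serves all of them.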
11
The Path Tree, PTG1, is an extension of EG1 (this is an older slide)

The Edge pTree (predicate = NPZ) and the Path Tree of a graph.

[Figure: G1 on vertices 1-4 with EG1 unilevel over keys (1,1) ... (4,4), EG1 multilevel (stride = |V| = 4) and PTG1; G2 on vertices 1-7 with EG2 over keys (1,1) ... (7,7) and PTG2.]

Extending E to PT?
  for each k in E_h: PT2_{hk} = E_k after zeroing the h bit of E_k
  for each k in PT2_{hj}: PT3_{hjk} = E_k after zeroing the j bit of E_k
  for each k in PT3_{hij}: PT4_{hijk} = E_k after zeroing the i and j bits of E_k

Shortest path V_k to V_h: descend from E_k until you reach the first h. SP_12? At level 1? No. At level 2? Yes: SP_12 = 142.

Graph diameter? diam_k = max over h ≠ k of SP_kh.
  Diam_1 = max{SP_12, SP_13, SP_14} = max{2, 1, 1} = 2
  Diam_2 = max{SP_21, SP_23, SP_24} = max{2, 2, 1} = 2
  Diam_3 = max{SP_31, SP_32, SP_34} = max{1, 2, 1} = 2
  Diam_4 = max{SP_41, SP_42, SP_43} = max{1, 1, 1} = 1
  Diam_G1 = max over k in V of Diam_k = 2

SP in G2? SP_72? SP_15? SP_72 is not found at level 1 or level 2; it is found at level 3: SP_72 = 7612. SP_15 is found the same way.
G2: Dia_1 = 2, Dia_2 = 3, Dia_3 = 3, Dia_4 = 3, Dia_5 = 3, Dia_6 = 2, Dia_7 = 3, so Dia_G2 = 3.