In taking the inner product of 32 bitwidth Scalar pTreeSets (e. g


1 In taking the inner product of 32 bitwidth Scalar pTreeSets (e. g
In taking the inner product of 32-bit-width Scalar pTreeSets (e.g., for Oblique or Hull Classification), we want line segments to be tight against the Training Class, but not too tight, because Training Classes are almost always only estimates of the actual classes. That is, we may want to leave room between the Training Class and the bordering line segment because of the approximate-ness of the Training Classes. We can do that as follows.
For the segment on the Minimum side (the segment perpendicular to the unit vector, d, through min{d·x | x ∈ Training Class}), set the 24 LoBits to 0, so that only the 8 HiBits are used in the inner product. This moves the bordering line segment away from that Training Class on that side.
For the segment on the Maximum side, set the 24 LoBits to 1. Better yet, set the 24 LoBits to 0 and add 1 to the 8th (lowest) HiBit. This is almost the same as setting the 24 LoBits to all 1s (it is larger by exactly 1) but gives a much faster inner product calculation. This moves the bordering line segment away from that Training Class on the other side.
This approach is a win-win: it places the line segments better for Classification, and it lowers inner product costs (to 8-bit-width costs instead of 32-bit-width costs?). The split of 32 into 24 and 8 could be varied and could depend on expected Training Set accuracy. For accurate Training Sets, use, e.g., 12 HiBits (a very tight Hull); otherwise use only 4 HiBits (a very loose Hull).
When might we judge that the Training Set is very approximate? When there are few Training points. Remember, for example, that the main criticism of most cancer prediction systems is that they are based on too few expert opinions or experimental cases, because each is very expensive to obtain (i.e., we usually settle for just a few training points).
While we're at it, a new algorithm, Oblique-Hull, might only place hull segments where there is a gap between a pair of classes (separate segment pairs for each class pair).
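A minimal sketch of the rounding described above, assuming the inner products d·x have already been quantized to unsigned 32-bit integers. The function name `hull_bounds` and its parameters are hypothetical, not part of any pTree library. The Minimum-side boundary keeps only the HiBits of the class minimum; the Maximum-side boundary keeps the HiBits of the class maximum and adds 1 at the lowest HiBit.

```python
def hull_bounds(projections, hi_bits=8, width=32):
    """Loosened (lo, hi) boundary values for one class along one unit vector.

    projections: the class's inner products d.x, assumed (for this sketch)
    to be unsigned integers that fit in `width` bits.
    """
    lo_bits = width - hi_bits
    mn, mx = min(projections), max(projections)
    # Minimum side: set the LoBits to 0, i.e., round the class minimum down.
    lo = (mn >> lo_bits) << lo_bits
    # Maximum side: set the LoBits to 0, then add 1 at the lowest HiBit.
    # The result equals "all LoBits set to 1" plus exactly 1.
    hi = ((mx >> lo_bits) + 1) << lo_bits
    return lo, hi


example = [0x12345678, 0x2ABCDEF0]
for k in (4, 8, 12):          # loose, default and tight hulls
    lo, hi = hull_bounds(example, hi_bits=k)
    print(k, hex(lo), hex(hi))
# prints:
# 4 0x10000000 0x30000000
# 8 0x12000000 0x2b000000
# 12 0x12300000 0x2ac00000
```

Fewer HiBits give a looser hull, matching the 4-versus-12 HiBit trade-off in the text. In Python the Maximum-side value can reach 2**width when the HiBits of the maximum are all 1s; a fixed-width implementation would need to handle that overflow.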
Continue to include new unit vectors until each class pair has been separated. The gap placement can use the first k-HiBits value that produces a gap, k = 1, 2, … (the first k-HiBit value between the min inner product for one class and the max inner product for the other class). It seems like Mohammad's 2's complement procedure works that way anyway (proceeding one bit slice at a time from the high side? Or is it the low side?), so we can continually check for a gap and exit early as soon as one appears. So for each pair of Training Classes, we might use the unit vector between the class means, then project the two classes using k HiBits only, k = 1, 2, … (i.e., until a gap appears between the k-HiBit min of the origin (of the unit vector) class and the k-HiBit max (plus 1) of the destination class …
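The early-exit gap search can be sketched as follows. This assumes the two classes have already been projected onto the unit vector and quantized to unsigned 32-bit integers, and that the caller knows which class projects lower; computing the mean-difference unit vector is omitted. Following the rounding rule above, a gap at k HiBits means the lower class's max, rounded up at the k-th HiBit, does not exceed the higher class's min, rounded down. The name `first_gap_k` is hypothetical.

```python
def first_gap_k(low_class, high_class, width=32):
    """Smallest k for which k-HiBit boundaries separate two classes.

    low_class / high_class: unsigned `width`-bit inner products of the class
    that projects lower and the class that projects higher on the unit vector.
    Returns (k, boundary), or None when the projections overlap.
    """
    max_low, min_high = max(low_class), min(high_class)
    for k in range(1, width + 1):          # add one HiBit per pass
        shift = width - k
        # Round the lower class's max up and the higher class's min down.
        boundary = ((max_low >> shift) + 1) << shift
        if boundary <= (min_high >> shift) << shift:
            return k, boundary             # early exit at the first gap
    return None


print(first_gap_k([0x30000000, 0x35000000], [0x60000000, 0x70000000]))
# (2, 1073741824), i.e., a boundary at 0x40000000
```

At k = width the test reduces to max_low < min_high, so the search returns None exactly when the two projections overlap.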

2 cliqueTrees
[Figure: bipartite graph G11, Investors 1–5 recommend Stocks A–E, shown with the traditional data structures (Adjacency Matrix, Edge Table, EdgeMap) and the new data structures (NPZ pTree, stride=5, levels L=2, L=1, L=0; Stock BCTs, Investor BCTs, Stock EBCTs, Inv EBCTs). Tripartite graph H1: on Day D, Investors I(123) recommend Stocks S(ABC), with its NPZ pTree (stride=3, levels L=3 … L=0) and the DI StockBaseCliqueTrees under the operators oaa and aoa.]
Bipartite case: let … = C be a MaxClique. Then one of … must be a BC, say …; expanding it gives C. Thus, for Bipartite Graphs, every MaxClique is an EBCT.
Actually, it is not true that one of them must be a BC, since there could be a different expansion for each of those 4, intersecting in C. In that case, we get each of those different expansions as an EBCT, but then the other operator will give us C (we AND those expansions, yielding the core leaf, but OR the singletons, giving the correct Part of C).
Tripartite case: let … = C be a MaxClique. Then one of … must be a BC, say …; expanding it must give C. Thus, for Tripartite Graphs, every MaxClique is an EBCT.
Thm: Every Maximal Clique is an Expanded Base Clique, i.e., C is a Maximal Clique iff C is an Expanded Base Clique, i.e., MC(G) = EBC(G).
Pf: Let C be any MaxClique. Then some leaf expansion of each of … is a BCT. After we apply a..aoa..a with o in each but the last position, we will have an EBCT with the upper Parts of C and a leaf that covers the leaf of C. However, the leaf of that EBCT cannot strictly cover the leaf of C; otherwise that EBCT would be a clique strictly covering C, contradicting the maximality of C. Thus, that EBCT = C.

3 Vertical Graph Analytics using the Edge pTree (E) and the multi-Level PathPtree (PP)
PP(G), the Path Ptree of graph G (an undirected unipartite graph, though most of this can be modified for directed and bipartite graphs also). We use PP(G) to find diameter, shortest paths, communities (both degree- and density-based, including cliques, k-cores and k-plexes) and motifs. Some of these problems (e.g., finding a maximum clique) are NP-complete or NP-hard. Many assume this means "They can't be done!" That assumption is what we're addressing. By changing the basic data structure (from the traditional, ubiquitous, horizontal RECORD to the beautiful, vertical pTree), the computation comes into harmony with modern computing hardware's strengths, and we can do important Big Data NP computations quickly.
Notes: If one creates PP(G), lots of tasks become easy! We will always use the new pop-count facility, which produces 1-counts during ANDs/ORs for free (timewise). C is a clique iff all C level-1 counts are $|V_C|-1$. In fact, one can mine all cliques by analyzing counts.
A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. There is a whole hierarchy of cores of different order. Below, $|C_v|$ is the level-1 count of v within C (its degree inside C).
k-plex existence: C is a k-plex iff $\forall v \in C:\ |C_v| \ge |V_C| - 1 - k$.
k-plex inheritance: every induced subgraph of a k-plex is a k-plex.
All max k-plexes: use $|C_v|\ \forall v \in C$.
k-core inheritance: if G has a cover by induced k-cores, then G is a k-core.
k-core existence: C is a k-core iff $\forall v \in C:\ |C_v| \ge k$.
All max k-cores: use $|C_v|\ \forall v \in C$.
Clique existence: when is an induced SG a clique?
Edge Count existence thm (EC): C is a clique iff $|E_C| = |PU_C| = \binom{|V_C|}{2} = \frac{|V_C|!}{(|V_C|-2)!\,2!}$.
SubGraph existence thm (SG): $(V_C, E_C)$ is a k-clique iff every induced (k−1)-subgraph $(V_D, E_D)$ is a (k−1)-clique.
A Clique Mining alg: finds all cliques in a graph.
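The count-based tests above can be sketched with Python integers standing in for pTrees: $|C_v|$ is the 1-count of the vertex's edge bitmap ANDed with the vertex mask of C. This sketch checks the existence conditions only, not maximality. The helper names are hypothetical, and the example graph G2, with edges {12, 13, 14, 23, 24, 34, 16, 56, 57, 67}, is an assumption read from the later slide figures.

```python
from math import comb


def edge_bitmaps(n, edges):
    """E[v] is an int bitmap with bit w set iff {v, w} is an edge."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E


def popcount(x):
    return bin(x).count("1")


def internal_degrees(E, C):
    """|C_v| for each v in C: 1-count of E[v] AND the vertex mask of C."""
    mask = sum(1 << v for v in C)
    return {v: popcount(E[v] & mask) for v in C}


def is_clique_ec(E, C):
    # Edge Count theorem: the induced edge count equals COMB(|V_C|, 2).
    return sum(internal_degrees(E, C).values()) // 2 == comb(len(C), 2)


def is_kplex(E, C, k):
    # Every vertex misses at most k of the other |V_C| - 1 vertices.
    return all(d >= len(C) - 1 - k for d in internal_degrees(E, C).values())


def is_kcore(E, C, k):
    # Every vertex has at least k neighbours inside C.
    return all(d >= k for d in internal_degrees(E, C).values())


G2_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
            (1, 6), (5, 6), (5, 7), (6, 7)]
E = edge_bitmaps(7, G2_EDGES)
print(is_clique_ec(E, {1, 2, 3, 4}))      # True
print(is_kplex(E, {1, 2, 3, 4, 6}, 3))    # True
print(is_kcore(E, {1, 2, 3, 4}, 3))       # True
```

With k = 0, the k-plex test reduces to the clique condition "all level-1 counts are $|V_C|-1$".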
For Clique-Mining we can use an ARM-Apriori-like downward closure property: $CLQ_k$ ≡ k-CliqueSet, $CCLQ_{k+1}$ ≡ Candidate (k+1)-CliqueSet. By SG, $CCLQ_{k+1}$ = all unions of $CLQ_k$ pairs having k−1 common vertices. Let $C \in CCLQ_{k+1}$ be a union of two k-cliques with k−1 common vertices, and let v, w be the kth (non-shared) vertices of the two k-cliques; then $C \in CLQ_{k+1}$ iff PE(v,w) = 1. (We just need to check a single bit in PE.)
Int/Ext degree of v ∈ C: $k_v^{int}$ ($k_v^{ext}$) = # edges from v to w ∈ C (w ∈ C').
Intra-cluster density: $\delta_{int}(C) = \frac{|edges(C,C)|}{n_c(n_c-1)/2}$. Inter-cluster density: $\delta_{ext}(C) = \frac{|edges(C,C')|}{n_c(n-n_c)}$.
Internal degree of C: $k_C^{int} = \sum_{v \in C} k_v^{int}$. External degree of C: $k_C^{ext} = \sum_{v \in C} k_v^{ext}$.
The proper tradeoff between large $\delta_{int}(C)$ and small $\delta_{ext}(C)$ is the goal of many community mining algorithms. A simple approach is to maximize differences:
Density Difference algorithm for Communities: $\delta_{int}(C) - \delta_{ext}(C) > Threshold$?
Degree Difference algorithm: $k_C^{int} - k_C^{ext} > Threshold$?
Both are easy to compute with pTrees, even for Big Graphs. Graphs are employed ubiquitously for complex data.
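A minimal sketch of the Apriori-style join and the two difference scores, assuming sorted vertex tuples so that "k−1 common vertices" becomes the usual shared (k−1)-prefix join. The only graph lookup during the join is the single PE bit. For brevity, the community scores use a plain edge list; with pTrees these counts would be 1-counts of ANDed masks. The function names and the G2 edge list (read from later slides) are assumptions.

```python
from itertools import combinations


def edge_bitmaps(n, edges):
    """E[v]: int bitmap with bit w set iff {v, w} is an edge (the PE pTree)."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E


def mine_cliques(n, edges):
    """Apriori-style mining: returns {k: sorted list of k-cliques}, k >= 2."""
    E = edge_bitmaps(n, edges)
    level = sorted({tuple(sorted(e)) for e in edges})   # CLQ_2 = the edges
    cliques, k = {}, 2
    while level:
        cliques[k] = level
        candidates = set()
        for a, b in combinations(level, 2):
            # Join two k-cliques that share their first k-1 (sorted) vertices ...
            if a[:-1] == b[:-1]:
                v, w = a[-1], b[-1]
                # ... and keep the union iff PE(v, w) = 1: one bit lookup.
                if (E[v] >> w) & 1:
                    candidates.add(a[:-1] + (min(v, w), max(v, w)))
        level = sorted(candidates)
        k += 1
    return cliques


def community_scores(n, edges, C):
    """(delta_int - delta_ext, k_C^int - k_C^ext) for a vertex set C, 1 < |C| < n."""
    C = set(C)
    nc = len(C)
    internal = sum(1 for u, v in edges if u in C and v in C)
    external = sum(1 for u, v in edges if (u in C) != (v in C))
    density_diff = internal / (nc * (nc - 1) / 2) - external / (nc * (n - nc))
    # k_C^int sums internal degrees, so every internal edge is counted twice.
    degree_diff = 2 * internal - external
    return density_diff, degree_diff


G2_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
            (1, 6), (5, 6), (5, 7), (6, 7)]
print(mine_cliques(7, G2_EDGES)[4])               # [(1, 2, 3, 4)]
print(community_scores(7, G2_EDGES, {1, 2, 3, 4}))
```

The join is complete because any (k+1)-clique, written in sorted order, is the union of its two k-subsets that drop the last or the second-to-last vertex, and those share a (k−1)-prefix.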

4 The PathPtree for G1, PP(G1)
[Figure: G1 (vertices 1–4; edges 13, 14, 24, 34) with its two-level, stride=4 Edge pTrees E (key v1v2), Unique Edge Mask U, vertex masks M1–M4, and the multi-level path pTrees E2 (key v1v2v3) and E3 (key v1v2v3v4).]
Graph Path: a sequence of edges connecting a sequence of vertices which are distinct from each other except for the endpoints (other definitions?).
2-paths = E2, 3-paths = E3, etc. Below, E_k is the edge pTree of vertex k, ListE_h is the list of neighbors of h, and M'_h is the complement of the vertex mask M_h.
For $k \in ListE_h$: $E2_{hk} = E_k \,\&\, M'_h$ (for other k, $E2_{hk} = 0$).
h=1, ListE1 = {3,4}: EE13 = E3 & M'1, EE14 = E4 & M'1.
h=2, ListE2 = {4}: EE24 = E4 & M'2.
h=3, ListE3 = {1,4}: EE31 = E1 & M'3, EE34 = E4 & M'3.
h=4, ListE4 = {1,2,3}: EE41 = E1 & M'4, EE42 = E2 & M'4 (pure0), EE43 = E3 & M'4.
For $k \in ListE2_{hj}$: $E3_{hjk} = E_k \,\&\, M'_j$.
E3134 = E4 & M'3 (ListE213 = {4}); E3142 = E2 & M'4 (pure0); E3143 = E3 & M'4; E3241 = E1 & M'4 and E3243 = E3 & M'4 (ListE224 = {1,3}); E3314 = E4 & M'1; E3341 = E1 & M'4; E3342 = E2 & M'4 (pure0); E3413 = E3 & M'1; E3431 = E1 & M'3.
For $k \in ListE3_{hij}$: $E4_{hijk} = E_k \,\&\, M'_j \,\&\, M'_i$.
ListE3134 = {1,2}, ListE3143 = {1}, ListE3241 = {3}, ListE3243 = {1}, ListE3314 = {2,3}, ListE3341 = {3}, ListE3413 = {4}, ListE3431 = {4}. The non-cycle continuations E41342 = E2 & M'3 & M'4, E42413 = E3 & M'4 & M'1, E42431 = E1 & M'4 & M'3 and E43142 = E2 & M'1 & M'4 are all pure0.
EE = E2: 3-level, stride=4 pTrees for path length 2 (2 edges, 3 vertices, unique except for the endpoints). Level 3: E2 is the upper 3 levels of E3. Level 2: exactly the Level 1 of E2. Level 1: just E1, E2, E3, E4 with pure0 bits turned off (keys 13 14 24 31 34 41 43; these are exactly the Level 0s of E2).
There are no 5-vertex (4-edge) paths, so creation stops.
The Stride=|V|, Levels=Diam PathPtree (PP): E, E2, E3, …, E_longest_path. For G1, the E3 (Level 0) keys are 134, 143, 241, 243, 314, 341, 413, 431, and the E2 keys are EE13, EE14, EE24, EE31, EE34, EE41, EE43.
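The recurrences above ($E2_{hk} = E_k \& M'_h$, $E3_{hjk} = E_k \& M'_j$, $E4_{hijk} = E_k \& M'_j \& M'_i$) can be sketched with Python integers as bitmaps. This is a flat dictionary keyed by path prefix, not the multi-level, stride-|V| pTree layout of the slide; the names `path_ptree` and `bits` are hypothetical. A path that returns to its start vertex is recorded as a cycle and not extended.

```python
def edge_bitmaps(n, edges):
    """E[v]: int bitmap with bit w set iff {v, w} is an edge."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E


def bits(x):
    """Vertex numbers whose bits are set in bitmap x."""
    return [v for v in range(x.bit_length()) if (x >> v) & 1]


def path_ptree(n, edges):
    """Return (tree, cycles).

    tree maps a path prefix (h, ..., k) to the bitmap of vertices that may
    extend it (E2_hk, E3_hjk, ...); cycles lists closed paths back to h.
    """
    E = edge_bitmaps(n, edges)
    tree, cycles = {}, []
    frontier = [(h,) for h in range(1, n + 1)]
    while frontier:
        nxt = []
        for path in frontier:
            ext = E[path[-1]]              # start from E_k, k = last vertex
            for j in path[1:-1]:           # AND with M'_j for interior vertices
                ext &= ~(1 << j)
            if len(path) == 2:             # E2_hk = E_k & M'_h: no 2-cycles
                ext &= ~(1 << path[0])
            tree[path] = ext
            for x in bits(ext):
                if x == path[0]:
                    cycles.append(path + (x,))   # back at the start: a cycle
                else:
                    nxt.append(path + (x,))
        frontier = nxt                     # creation stops when nothing extends
    return tree, cycles


G1_EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]
tree, cycles = path_ptree(4, G1_EDGES)
print(bits(tree[(1, 3, 4)]))   # [1, 2]  (ListE3134)
print(tree[(4, 2)])            # 0       (EE42 is pure0)
```

For G1 the six recorded cycles are exactly 1341, 1431, 3143, 3413, 4134 and 4314, and no prefix longer than 4 vertices is created.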

5 Apply to more Graphs. Revising PP when edges are added.
Use PP to get other pTrees for Shortest Paths, Diameter, Unique Paths, Cycles, Unique Cycles and Unique Acyclic Cycles. Apply to more graphs. Revising PP when edges are added: in G11 (vertices 1–4), add path (1,2). Do we have to rebuild the entire PP? No.
DIAMETER(G1)? SP(G1)?
[Figure: G1 (vertices 1–4) with PP(G1), its cycle paths (CP) and unique cycle paths (UCP); G3 and G2 (vertices 1–7) with PP(G2).]
Unique PathPtree (UPP): top-bottom, left-right, eliminate paths ending with the starting vertex after all pTrees with that starter have been included (in this example: eliminate 31 and 41, then 42, …).
The PP recurrences kill the vertices a path may not revisit: for $k \in ListE_h$, $E2_{hk} = E_k$ with h killed; for $k \in ListE2_{hj}$, $E3_{hjk} = E_k$ with j killed; for $k \in ListE3_{hij}$, $E4_{hijk} = E_k$ with i and j killed.
Diam1 = max{fo12, fo13, fo14}. What is the first occurrence of 2, fo12? It is the PP depth from E1 at which 2 first appears: fo12 = 2. So Diam1 = max{2, 1, 1} = 2, and SP(1,2) = 1-4-2. Similarly, Diam2 = max{fo21, fo23, fo24} = max{2, 2, 1} = 2, Diam3 = max{fo31, fo32, fo34} = max{1, 2, 1} = 2 and Diam4 = max{fo41, fo42, fo43} = max{1, 1, 1} = 1, so $Diam_{G1} = \max_{k \in V} Diam_k = 2$.
In general, $Diam_k = \max_{h \ne k} \min PathLen(h,k)$: for each k, record the E_k depth level of the first occurrence of each vertex h ≠ k. Then $Diam_G = \max_{k \in G} Diam_k$.
To find a Shortest Path from h to k (a path of length $\min PathLen(h,k)$), go down PP from E_k until h first appears.
Cycles List (CL): CP cycles only; it includes all nonredundant cycles (e.g., 1-3-4, clockwise and counterclockwise).
Can UCP be used to mine all cliques, since a clique must consist of cycles at each level? (In this example there is only one level to check, since there can be no 2-cycles (2 edges, 3 vertices).)
PP Clique Mining Algs: every clique is made up entirely of cycles at all levels. Every 3-cycle is a clique. For a 4-cycle abcda, check edges ac and bd. Also check each acyclic 4-path abcde for edges ac, ad, ae, bd, be, ce. If any is missing, eliminate the branch; else look for acyclic 4-paths, 5-paths, … in its pTree branch, etc. Extend to a Path pTree k-plex (k-core) mining algorithm?
PP update alg? Copy the 2-paths, add new ones, …
Let's see what happens if we add an edge to G2. For G2: Diam1 = max{1, 1, 1, 2, 1, 2} = 2, Diam2 = 3, Diam3 = 3, Diam4 = 3, Diam5 = 3, Diam6 = 2, Diam7 = 3, so $Diam_{G2} = \max_{k \in V} Diam_k = 3$.
SPs in G2? SP(7,2)? SP(1,5)? Yes: SP(1,5) = 1-6-5 and SP(7,2) = 7-6-1-2.
Next, complete the added levels (5-paths exist now whereas they didn't before; also 6-paths).
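A sketch of the first-occurrence idea. It expands level by level from a start vertex and records the depth at which each vertex first appears, together with a parent link for reading off one shortest path. This is plain breadth-first search, which gives the same first-occurrence depths as scanning the PP levels without storing every path. The function names are hypothetical, and the G2 edge list is an assumption reconstructed from the diameters and shortest paths quoted above.

```python
from collections import deque


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def first_occurrence(adj, h):
    """Depth at which each vertex first appears below h, plus parent links."""
    depth, parent = {h: 0}, {h: None}
    queue = deque([h])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in depth:             # keep the first occurrence only
                depth[w] = depth[u] + 1
                parent[w] = u
                queue.append(w)
    return depth, parent


def shortest_path(adj, h, k):
    """One shortest path from h to k (None if k is unreachable)."""
    _, parent = first_occurrence(adj, h)
    if k not in parent:
        return None
    path = [k]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def diameter(adj):
    """Diam_h = max first-occurrence depth from h; Diam_G = max over all h."""
    ecc = {h: max(first_occurrence(adj, h)[0].values()) for h in adj}
    return max(ecc.values()), ecc


G2 = adjacency([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
                (1, 6), (5, 6), (5, 7), (6, 7)])
print(diameter(G2)[0])          # 3
print(shortest_path(G2, 7, 2))  # [7, 6, 1, 2]
```

Running the same functions on G1 (edges 13, 14, 24, 34) gives Diam_G1 = 2 and SP(1,2) = [1, 4, 2].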

6 Retaining the Shortest Path So Far structure?
$SPT(G)_k$ (with k turned on) is a mask (where >0 means "yes") for the connectivity component $COMP(G)_k$ containing the vertex $v_k$. For a bitmap of $COMP_k$ from the bit-sliced SPT ($SPT_{k,h} \ldots SPT_{k,0}$, k = 1…|V|), $COMP_k = \mathrm{OR}_{j=h..0}\, SPT_{k,j}$. The SPT structure may be more useful expressed as separate "categorical" bitmaps for each Shortest Path Length ($SP_{k,h}$, h = 1..H). We keep a mask of Shortest Paths So Far, $SPSF_k$, for each vertex k. With each new SP bitmap, SPB: $SP_{k,h+1} = SPB \,\&\, SPSF'_k$, then $SPSF_k = SPSF_k \mid SP_{k,h+1}$.
[Figure: G1 (vertices 1–4) with its PPT, APPT, one-level and two-level (stride=4) Edge pTree E, SPTG1 (initially E), SPSF_k and CLG1.]
SPT gives the Connectivity Partition. For Maximal Cliques, go across $SP_{k,1}$ and look in subsets of those k's for commonality; cliques are 0-plexes. Each $SP_{k,1}$ masks a 1plex. Each $SP_{k,1} \,\&\, SP_{k,2}$ masks a 2-plex (= $SPSF_{k,2}$?). So if we save each SPSF instead of overwriting it, will we have the k-plex masks without any further work? Etc.
For Big Graphs, we could stop here (e.g., Friends has ~1B vertices but a diameter of 4, so we would only need to build PT for 4-hop paths), possibly expressed as a tree of lists rather than a tree of bitmaps. Also, for sparse Big Graphs, E could be leveled further.
SPTG1 is initially E: E1 = SP1,1 = SPSF1, E2 = SP2,1 = SPSF2, E3 = SP3,1 = SPSF3, E4 = SP4,1 = SPSF4.
CLG1: 1341, 1431, 3143, 3413, 4134, 4314. All are 3-hop cycles. Each has 3 start points and 2 directions, so each repeats 6 times: 6/6 = 1 distinct 3-hop cycle (1341).
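The "each repeats 6 times" count can be checked mechanically. Map every listed cycle to a canonical form: drop the repeated endpoint, then take the smallest tuple over all rotations (start points) and both directions. A minimal sketch; the names `canonical_cycle` and `unique_cycles` are hypothetical.

```python
def canonical_cycle(cycle):
    """Canonical form of a closed path such as (1, 3, 4, 1): every rotation and
    both directions of the same cycle map to the same tuple."""
    ring = list(cycle[:-1])                    # drop the repeated endpoint
    candidates = []
    for seq in (ring, ring[::-1]):             # clockwise and counterclockwise
        for i in range(len(seq)):              # every start point
            candidates.append(tuple(seq[i:] + seq[:i]))
    return min(candidates)


def unique_cycles(cycles):
    """Distinct cycles in a cycle list, each in canonical form."""
    return sorted({canonical_cycle(c) for c in cycles})


CLG1 = ["1341", "1431", "3143", "3413", "4134", "4314"]
cycles = [tuple(int(ch) for ch in s) for s in CLG1]
print(len(cycles), unique_cycles(cycles))      # 6 [(1, 3, 4)]
```

In general a k-cycle appears 2k times in such a list (k start points times 2 directions).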

7 Form UPP(G2) from PP(G2):
More PP, UPP, SPT, CL, UCL…
[Figure: G2 (vertices 1–7) and PP(G2).]
Form UPP(G2) from PP(G2): top-bottom, left-right: after all pTrees with a given start vertex have been included, eliminate paths ending with that start vertex.
[Figure: the Unique PathPtree for G2, UPP(G2); the Cycles List CL(G2) (only 3- and 4-cycles, no 1- or 2-cycles); UCL(G2) (the same sequence with a different start or direction is the same cycle); ACPP(G2); and UACPP(G2), formed by removing path reverses.]
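A sketch of the UPP elimination rule applied to acyclic paths. Start vertices are processed in order. Once a start vertex is done, later paths ending at it are dropped, which keeps exactly one of each path/reverse pair. The helper names are hypothetical, cycles are left out, and the example uses the small graph G1 (edges 13, 14, 24, 34) rather than G2 to keep the output short.

```python
def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def acyclic_paths(adj):
    """Every path with at least one edge and pairwise-distinct vertices."""
    out = []

    def extend(path):
        for w in sorted(adj[path[-1]]):
            if w not in path:
                out.append(path + (w,))
                extend(path + (w,))

    for h in sorted(adj):
        extend((h,))
    return out


def unique_paths(paths, order):
    """UPP rule: start vertices are handled top-bottom in `order`; once a start
    vertex is done, later paths that end at it are eliminated."""
    rank = {v: i for i, v in enumerate(order)}
    return [p for p in paths if rank[p[-1]] > rank[p[0]]]


G1 = adjacency([(1, 3), (1, 4), (2, 4), (3, 4)])
paths = acyclic_paths(G1)
upp = unique_paths(paths, order=[1, 2, 3, 4])
print(len(paths), len(upp))   # 22 11
```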

8 SG Clique Mining
[Figure: graphs G2 and G3 (vertices 1–7) with their Edge pTrees E and PE (stride=7, keys (1,1) … (7,7)), unique edge pTree EU, and clique masks C and CU.]
k=2: 2-cliques (2 vertices) are the edges. Find the endpoints of the edge at bit position n as (Int((n−1)/7)+1, Mod(n−1,7)+1).
k=3: join pairs of 2-cliques that share a vertex and check one PE bit: PE(2,3)=1, so 123 ∈ CS3; PE(2,4)=1, so 124 ∈ CS3; PE(1,4)=1, so 134 ∈ CS3; PE(2,3)=1, so 234 ∈ CS3; PE(6,7)=1, so 567 ∈ CS3; PE(2,6)=0, PE(1,7)=0 and PE(1,5)=0 reject their candidates.
k=4: PE(2,4)=1, so 1234 ∈ CS4; 1234 is the only 4-clique.
Using the EdgeCount thm on C = {1,2,3,4}: CU = C & EU, and C is a clique since ct(CU) = comb(4,2) = 4!/(2!2!) = 6.
EC requires counting the 1s in the mask pTree of each subgraph (or candidate clique, if we take the time to generate the CCSs; but then clearly the fastest way to finish up is simply to look up the single bit position in E, i.e., use SG).
EdgeCount Algorithm (EC): if $|PU_C| = \frac{(k+1)!}{(k-1)!\,2!}$, then $C \in CS_{k+1}$.
The SG alg only needs the Edge Mask pTree, E, and a fast way to find those pairs of subgraphs in CS_k that share k−1 vertices (then check E to see if the two different kth vertices are an edge in G). Again, this is a standard part of the Apriori ARM algorithm and has therefore been optimized and engineered ad infinitum!
PE(2,3)=1 234CS3 key 1,2 1,1 1,4 1,3 1,5 1,7 1,6 2,1 1,8 2,3 2,2 2,4 2,5 2,8 2,7 2,6 3,2 3,1 3,4 3,3 3,5 3,6 3,8 3,7 4,1 4,3 4,2 4,5 4,4 4,6 4,8 4,7 5,2 5,1 5,4 5,3 5,5 5,6 6,1 5,8 5,7 6,3 6,2 6,5 6,4 6,6 6,7 7,1 6,8 7,2 7,4 7,3 7,6 7,5 7,7 8.1 7,8 8,3 8,2 8,5 8,4 8,6 8,7 8.8 E 1 k=3: 2 4 3 6 G4 7 5 8 PE(1,4)=1 134CS3 Have PE(4,8)=1 248CS3 PE(4,8)=1 348CS3 PE(4,8)=1 12348CS5 have have k=2: k=4: PE(2,3)=1 123CS3 PE(2,4)=1 124CS3 PE(2,8)=1 128CS3 PE(2,6)=0 PE(3,8)=1 138CS3 PE(4,8)=1 148CS3 PE(1,7)=0 PE(1,5)=0 PE(6,8)=0 PE(3,8)=1 238CS3 have PE(6,7)=1 567CS3 have k=5: = CS5. PE(3,8)=1 1238CS4 PE(4,8)=1 1248CS4 PE(3,8)=1 1348CS4 Have PE(2,4)=1 1234CS4 PE(4,8)=1 2348CS4

9 The EdgepTree(E), PathTree(PT), ShortestPathTree(SPT), AcyclicPathTree(APT) and CycleList(CL) of a graph, G5
[Figure: G5 (vertices 1–8) with EG5 (2-level, stride=8), PTG5, APTG5, CLG5 and SPTG5.]
PT Clique Miner Algorithm: a clique is all cycles. Extend to a k-plex (k-core) mining alg? PT (= APT + CL) and SPT are powerful data mining tools with closure properties (to eliminate branches).
Open questions: a k-plex (missing ≤ k edges) mining alg? A k-core (has ≥ k edges) mining alg? A Density (internal edge density >> external|avg) mining alg? A Degree (internal vertex degree >> external|avg) mining alg?
Max clique mining: a k-cycle is a k-clique iff its vertex set is found in CL_k as PERM(k−1,k−1)/2 = (k−1)!/2 distinct k-cycles (3-cycles: 2!/2 = 1; 4-cycles: 3!/2 = 3; 5-cycles: 4!/2 = 12; 6-cycles: 5!/2 = 60).
Downward closure: once a 4-cycle is established as a 4-clique (by the fact that {1,2,3,4} occurs 3!/2 = 3 times in CL), all its 3-vertex subsets, {1,2,3}, {1,2,4}, {1,3,4} and {2,3,4}, are 3-cliques, so there is no need to check them further.
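The (k−1)!/2 test can be sketched by listing each simple cycle once (starting at its smallest vertex and taking one of its two directions), grouping cycles by vertex set, and flagging a k-vertex set as a clique when it carries exactly (k−1)!/2 distinct k-cycles. This covers k ≥ 3 only. The names are hypothetical, and the G2 edge list (which has the 4-clique {1,2,3,4} used above) is an assumption read from earlier slides.

```python
from collections import Counter
from math import factorial


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def unique_cycles(adj):
    """Each simple cycle exactly once: it starts at its smallest vertex, and
    its second vertex is smaller than its last (one of the two directions)."""
    cycles = []

    def dfs(path):
        s, u = path[0], path[-1]
        for w in sorted(adj[u]):
            if w == s and len(path) >= 3 and path[1] < path[-1]:
                cycles.append(tuple(path))
            elif w > s and w not in path:
                dfs(path + [w])

    for s in sorted(adj):
        dfs([s])
    return cycles


def cliques_from_cycles(adj):
    """Vertex sets carrying exactly (k-1)!/2 distinct k-cycles: the k-cliques."""
    counts = Counter(frozenset(c) for c in unique_cycles(adj))
    return sorted(sorted(s) for s, c in counts.items()
                  if c == factorial(len(s) - 1) // 2)


G2 = adjacency([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
                (1, 6), (5, 6), (5, 7), (6, 7)])
print(cliques_from_cycles(G2))
# [[1, 2, 3], [1, 2, 3, 4], [1, 2, 4], [1, 3, 4], [2, 3, 4], [5, 6, 7]]
```

A plain 4-cycle graph (edges 12, 23, 34, 41) carries only one 4-cycle on {1,2,3,4}, not 3!/2 = 3, so it is correctly not reported as a clique.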
CLG5: 1571, 1751, 5175, 5715, 7157, 7517, 3683, 3863, 6386, 6836, 8368, 8638.
DiamG5 is max{Diam_k} = max{2, 2, 1, 3, 2, 1, 3, 1} = 3. The connected component containing v1 is COMP1 = {1,2,4,5,7}. Pick the first vertex not in COMP1, 3: COMP3 = {3,6,8}. Done. The partition is { {1,2,4,5,7}, {3,6,8} }. To pick the first vertex not in COMP1, mask off COMP1 with SPT_v1' and then pick the first vertex in this complement.
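The partition procedure can be sketched with integer bitmaps. Grow COMP_v by ORing neighbour bitmaps until no new vertex appears, mask off everything covered so far, and take the lowest set bit of the complement as the next start vertex. The function names are hypothetical, and the G5 edge list {12, 15, 17, 24, 36, 38, 57, 68} is an assumption read from the slide's EG5 figure residue.

```python
def edge_bitmaps(n, edges):
    """E[v]: int bitmap with bit w set iff {v, w} is an edge."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E


def component_mask(E, v):
    """OR successive shortest-path levels from v until no new vertex appears."""
    seen = frontier = 1 << v
    while frontier:
        reach = 0
        for j in range(len(E)):
            if (frontier >> j) & 1:
                reach |= E[j]
        frontier = reach & ~seen           # next level: new vertices only
        seen |= frontier
    return seen


def partition(n, edges):
    E = edge_bitmaps(n, edges)
    all_v = sum(1 << v for v in range(1, n + 1))
    parts, covered = [], 0
    while covered != all_v:
        rest = all_v & ~covered                # mask off components found so far
        v = (rest & -rest).bit_length() - 1    # first vertex in the complement
        comp = component_mask(E, v)
        parts.append([u for u in range(1, n + 1) if (comp >> u) & 1])
        covered |= comp
    return parts


G5_EDGES = [(1, 2), (1, 5), (1, 7), (2, 4), (3, 6), (3, 8), (5, 7), (6, 8)]
print(partition(8, G5_EDGES))   # [[1, 2, 4, 5, 7], [3, 6, 8]]
```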

10 cycles in blue (not in APT)
The EdgepTree(E), PathTree(PT), ShortestPathTree(SPT), AcyclicPathTree(APT) and CycleList(CL) of G5
[Figure: graph G6 (vertices 1–9, a–g) with E = A1Ps, the acyclic path pTrees A2Ps–A6Ps, the shortest path pTrees SP1–SP6, and the cumulative masks SP1&2, SP1&2&3, SP1&2&3&4 and SP1&2&3&4&5 (COMPLETE); cycles are shown in blue (not in APT).]

11 EdgepTree(E), PathTree(PT), ShortestPathTree(SPT), AcyclicPathTree(APT) and CycleList(CL) of G6
[Figure: G6 (vertices 1–9, a–c) with E = SP1, the path pTrees PT2–PT4, and the shortest path pTrees SP2–SP4 and SPT2–SPT4 (SPT: Shortest Path Tree).]
CycleList: 1231, 1241, 5675.

12 [Figure: G6 with E1 = SP1 = PT1 and the shortest path pTrees SP1, SP2 and SP3.]
G6 adjacency lists: 1: 2 3 4; 2: 1 3 4; 3: 1 2 c; 4: 1 2 7; 5: 6 7; 6: 5 7 8; 7: 4 5 6; 8: 6 9 a; 9: 8 a b c; a: 8 9 b c; b: 9 a c; c: 3 9 a b.

13 EdgepTree(E), PathTree(PT), ShortestPathTree(SPT), AcyclicPathTree(APT) and CycleList(CL) of G7 (BASE)
[Figure: G7 with shortest path degree masks SP1 (=1deg) through SP5 (=5deg). Annotations: 10, 25, 26, 28, 29, 33, 34 not shown (only 17 on, 1 = 4dg); 15, 16, 19, 21, 23, 24, 27, 30: only 17 on, 5deg = 1.]
17 is an outlier. Try clustering by SPdeg from 17. The SPk17 pTrees mask the clustering (next slide).

14 Shortest Path Trees Construction
(We don't need the Path Trees to get the Shortest Path Trees! That's because a subpath of a shortest path is a shortest path.)
[Figure: G6 (vertices 1–9, a–c) and the step-by-step masks S1P = E, SPSF1_i, SPSF1'_i, S2P_i, SPSF2_i, … for vertices 1, 2 and 3.]
S1P = E.
$S2P_i = SPSF1'_i \,\&\, \big(\mathrm{OR}_{j \in S1P_i} E_j\big)$ for i = 1, 2, 3.
$S3P_i = SPSF2'_i \,\&\, \big(\mathrm{OR}_{j \in S2P_i} E_j\big)$ for i = 1, 3.
$S4P_i = SPSF3'_i \,\&\, \big(\mathrm{OR}_{j \in S3P_i} E_j\big)$ for i = 1, 3.
Done with the Vertex 1 shortest paths: Diam(1) = 4. Done with the Vertex 3 shortest paths.
What is the cost of creating the SPs? For each v ∈ V there are ~Avg{Diam(v) | v ∈ V} steps. Each step costs 1 complement of SPSF (cost = compl), an OR of ~Avg|E_k| pTrees (cost = OR·AD), 1 AND of SPSF' with that OR result (cost = AND), and 1 OR to update SPSF (cost = OR).
Cost = |V|·AvgDiam·(compl + OR·AD + AND + OR), so O(|V|) pTree operations, i.e., linear in the number of vertices, assuming AD = AvgDeg is small. This is a one-time, parallelizable construction over the vertices. For Friends, it is B·4·(3·pTOP + AD·pTOP) = 4B·(3+AD)·pTOP = B·pTOP·(12+4AD), where pTOP is the cost of a pTree operation (compl, &, OR) and B = billion. Parallelized over an n-node cluster, this one-time Shortest Path Tree construction cost would be B·pTOP·(12+4·AvgDeg)/n.
The SnPs capture only the shortest path lengths between all pairs of vertices. We could capture actual shortest paths (all shortest paths? all paths in PTs?), since we construct (but do not retain) that info along the way. How should we structure it, index it, or residualize it?
The SPs for vertices 4–c are done the same way:
$SPSF1_i = S1P_i \;\mathrm{OR}\; M_i$, where $M_i$ has a 1 only at i;
$SPSF(k{+}1)_i = SPSFk_i \;\mathrm{OR}\; S(k{+}1)P_i$;
$S(k{+}1)P_i = SPSFk'_i \,\&\, \big(\mathrm{OR}_{j \in SkP_i} E_j\big)$.
That is, the mask pTree of the shortest (k+1)-paths starting at vertex i is the complement of the Shortest Paths So Far, ANDed with the OR of the edge pTrees $E_j$ over all j in the ith Shortest k-Path list.
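The recurrence can be sketched with Python integers as pTrees: OR the edge bitmaps of the previous level, AND with the complement of SPSF, then OR the result into SPSF. The function names are hypothetical. The G6 edge list (a=10, b=11, c=12) is taken from the G6 adjacency lists elsewhere in this transcript.

```python
def edge_bitmaps(n, edges):
    """E[v]: int bitmap with bit w set iff {v, w} is an edge."""
    E = [0] * (n + 1)
    for u, v in edges:
        E[u] |= 1 << v
        E[v] |= 1 << u
    return E


def members(x):
    """Vertex numbers whose bits are set in bitmap x."""
    return [v for v in range(x.bit_length()) if (x >> v) & 1]


def shortest_path_levels(E, i):
    """[S1P_i, S2P_i, ...]: bitmaps of the vertices at distance 1, 2, ... from i."""
    if not E[i]:
        return []                          # isolated vertex: no shortest paths
    levels = [E[i]]                        # S1P_i is just E_i
    spsf = E[i] | (1 << i)                 # SPSF1_i = S1P_i OR M_i
    while True:
        reach = 0
        for j in members(levels[-1]):      # OR of E_j over j in SkP_i
            reach |= E[j]
        nxt = reach & ~spsf                # AND with the complement SPSFk'_i
        if not nxt:
            return levels                  # len(levels) == Diam(i)
        levels.append(nxt)
        spsf |= nxt                        # SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i


G6_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 12), (4, 7),
            (5, 6), (5, 7), (6, 7), (6, 8), (8, 9), (8, 10), (9, 10),
            (9, 11), (9, 12), (10, 11), (10, 12), (11, 12)]
E = edge_bitmaps(12, G6_EDGES)
print([members(x) for x in shortest_path_levels(E, 1)])
# [[2, 3, 4], [7, 12], [5, 6, 9, 10, 11], [8]]
```

Each loop iteration performs one complement, about AvgDeg ORs, one AND and one OR, which is the per-step cost counted above. The four levels for vertex 1 agree with Diam(1) = 4.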

15 17 is an outlier. Try clustering by SPdeg from 17
17 is an outlier. Try clustering by SPdeg from 17. The SPk17 pTrees mask the clustering.
[Figure: G7 with the SPdeg_k(17) masks for k = 1…5.]
Now we would want to make this divisive and recursive. The maroon cluster could be broken apart into white and blue. Then one could use DegreeDifference within clusters to trade vertices among clusters to improve the DegDif quality measure. Maybe an agglomerative or divisive approach using SPdeg? Agglomerate two pieces together iff the SPdeg difference is improved (or still exceeds a threshold?). One could use Genetic Algorithm hill climbing to optimize the clustering, applying GAs to the SPdeg arrays. The bottom line is that there is a wealth of value in ShortestPathDegrees. One can easily mask subsets and recalculate SPdeg.
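"Mask subsets and recalculate SPdeg" can be sketched as follows. SPdeg_k(v) counts the vertices at shortest-path distance k from v, computed inside the subgraph induced by an optional vertex mask. This uses Python sets rather than pTrees, and the names are hypothetical. The G7 edge list is not given in this transcript, so the example runs on G6 (a=10, b=11, c=12), whose adjacency lists appear earlier.

```python
def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def spdeg(adj, v, keep=None):
    """[SPdeg_1(v), SPdeg_2(v), ...]: number of vertices at shortest-path
    distance 1, 2, ... from v within the subgraph induced by `keep`."""
    keep = set(adj) if keep is None else set(keep)
    seen, frontier, counts = {v}, {v}, []
    while True:
        nxt = {w for u in frontier for w in adj[u] if w in keep} - seen
        if not nxt:
            return counts
        counts.append(len(nxt))
        seen |= nxt
        frontier = nxt


G6_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 12), (4, 7),
            (5, 6), (5, 7), (6, 7), (6, 8), (8, 9), (8, 10), (9, 10),
            (9, 11), (9, 12), (10, 11), (10, 12), (11, 12)]
G6 = adjacency(G6_EDGES)
print(spdeg(G6, 1))                          # [3, 2, 5, 1]
print(spdeg(G6, 1, keep=set(G6) - {12}))     # [3, 1, 2, 1, 2, 1]
```

Masking out a single vertex (c) stretches vertex 1's SPdeg profile from 4 levels to 6, which is the kind of change a divisive or agglomerative SPdeg criterion would react to.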

16 [Figure: G7 shortest path degree masks SP1 (=1deg) through SP5 (=5deg).]
1 and 34 have the highest SP1deg (most siblings), at 16. Start with clusters S(1) and S(34) of siblings. Break ties with DegreeDiffs, defined below.
intdeg_S(x) = # edges from x to S-vertices. extdeg_S(x) = # edges from x to S'-vertices. DegDif_S(x) = intdeg_S(x) − extdeg_S(x) (or intdeg_S(x)/(1+extdeg_S(x))?).
Start with S (and T, U, … if there are ties) = the siblings of the x of highest SP1degree. So for G7, S = Sibl(1) and T = Sibl(34). Add y ∈ (S'−T) to S iff DegDif_S(y) > thresh1, and subtract z ∈ S from S iff DegDif_S(z) < thresh2.
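One way to run the add/subtract rule is sketched below, for a single seed x (the tie sets T, U, … are omitted). Seeding S with x plus its siblings, and repeating the two steps until S stops changing, are assumptions of this sketch rather than details given on the slide; thresh1 = thresh2 = 0 are placeholder thresholds. The names are hypothetical. Since G7's edges are not in this transcript, the example uses the G2 edge list read from earlier slides.

```python
def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degdif(adj, S, x):
    """DegDif_S(x) = intdeg_S(x) - extdeg_S(x)."""
    intdeg = len(adj[x] & S)
    return intdeg - (len(adj[x]) - intdeg)


def refine(adj, x, thresh1=0, thresh2=0, max_rounds=10):
    """Seed S with x and its siblings; drop members with DegDif < thresh2,
    add outsiders with DegDif > thresh1; stop when S no longer changes."""
    S = {x} | adj[x]
    for _ in range(max_rounds):
        kept = {z for z in S if degdif(adj, S, z) >= thresh2}
        added = {y for y in set(adj) - kept if degdif(adj, kept, y) > thresh1}
        new_S = kept | added
        if new_S == S:
            break
        S = new_S
    return S


G2 = adjacency([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
                (1, 6), (5, 6), (5, 7), (6, 7)])
print(sorted(refine(G2, 1)))   # [1, 2, 3, 4]
```

Vertex 6 starts in Sibl(1) but has DegDif = 1 − 2 = −1, so it is traded out, leaving the 4-clique.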

17 K-plex Search on G6: A k-plex is a Subgraph missing ≤ k edges
K-plex Search on G6: A k-plex is a subgraph missing ≤ k edges. All subgraphs are induced subgraphs (ISGs) defined by their vertex sets. Subgraph S has $|E_S| = s$ edges and $|V_S| = v$ vertices. S is a k-plex iff $C(v,2) - s = v(v-1)/2 - s \le k$. If S is a k-plex and S′ adds one vertex x to S ($V(S') = V(S) \cup \{x\}$), then S′ is a k-plex iff $(v+1)v/2 - (\deg(x,S') + s) \le k$.
[Figure: graph G6 on vertices 1 to 9, a, b, c]
Edges are 1-plexes. |E{123}| = |PE123| = 3, so 123 is a 0-plex (clique) and a 1-plex. |E{124}| = |PE124| = 3, so 124 is a 0-plex (clique).
If H is an ISG with $|V_H| = h$, $|E_H| = H$ and $\bar{H} = h(h-1)/2$, then H is a k-plex iff $\bar{H} - H \le k$. If H is a k-plex and F is an ISG of H, then F is a k-plex: if F is missing an edge, then H is missing that edge too, since F inherits every H edge between its vertices, so F cannot be missing more edges than H.
If G isn't a k-plex, let F1 be the ISG of G with a vertex of least degree removed. If F1 isn't a k-plex, let F2 be the ISG of F1 with a vertex of least degree removed, and so on, until some Fj is a k-plex. Remove Fj and repeat until all vertices are removed.
We did a k-plex search of G6 simply by calculating edge counts (which are just 1-counts of ANDed pTrees), using only SP1 = E.
[Figure: pTree SP1 = E for G6]
G = ISG{123456789abc}: Ḡ = 12·11/2 = 66, |E| = 19, so G is a k-plex for k ≥ 47.
H1 = ISG{12346789abc} (deg 5 = 2): H̄1 = 11·10/2 = 55, |E| = 17, so H1 is a k-plex for k ≥ 38.
H2 = ISG{1234789abc} (deg 6 = 2): H̄2 = 10·9/2 = 45, |E| = 15, so H2 is a k-plex for k ≥ 30.
H3 = ISG{123489abc} (deg 7 = 1): H̄3 = 9·8/2 = 36, |E| = 14, so H3 is a k-plex for k ≥ 22.
H4 = ISG{12389abc} (deg 4 = 2): H̄4 = 8·7/2 = 28, |E| = 12, so H4 is a k-plex for k ≥ 16.
[Figure: pTree SP2 for G6]
H5 = ISG{1239abc} (deg 8 = 2): H̄5 = 7·6/2 = 21, |E| = 10, so H5 is a k-plex for k ≥ 11.
H6 = ISG{239abc} (deg 1 = 2): H̄6 = 6·5/2 = 15, |E| = 8, so H6 is a k-plex for k ≥ 7.
H7 = ISG{39abc} (deg 2 = 1): H̄7 = 5·4/2 = 10, |E| = 7, so H7 is a k-plex for k ≥ 3.
H8 = ISG{9abc} (deg 3 = 1): H̄8 = 4·3/2 = 6, |E| = 6, so H8 is a k-plex for k ≥ 0.
So take out {9abc} and start over.
G = ISG{12345678}: Ḡ = 8·7/2 = 28, |E| = 10, so G is a k-plex for k ≥ 18.
H1 = ISG{1234567} (deg 8 = 1): H̄1 = 7·6/2 = 21, |E| = 9, so H1 is a k-plex for k ≥ 12.
[Figure: pTree SP3 for G6]
H2 = ISG{234567} (deg 1 = 2): H̄2 = 6·5/2 = 15, |E| = 6, so H2 is a k-plex for k ≥ 9; deg = 112223.
H3 = ISG{34567} (deg 2 = 1): H̄3 = 5·4/2 = 10, |E| = 4, so H3 is a k-plex for k ≥ 6; deg = 01222.
H4 = ISG{4567} (deg 3 = 0): H̄4 = 4·3/2 = 6, |E| = 4, so H4 is a k-plex for k ≥ 2; deg = 1222.
H5 = ISG{567} (deg 4 = 1): H̄5 = 3·2/2 = 3, |E| = 3, so H5 is a k-plex for k ≥ 0; deg = 222.
So take out {567} and start over.
G = ISG{12348}: Ḡ = 5·4/2 = 10, |E| = 5, so G is a k-plex for k ≥ 5; deg = 33220.
[Figure: pTree SP4 for G6]
H1 = ISG{1234} (deg 8 = 0): H̄1 = 4·3/2 = 6, |E| = 5, so H1 is a k-plex for k ≥ 1; deg = 3322.
H2 = ISG{124} (deg 3 = 2): H̄2 = 3·2/2 = 3, |E| = 3, so H2 is a k-plex for k ≥ 0; deg = 222.
This is exactly what we want! 1234 is a 1-plex (missing only 1 edge) and 124 was determined to be a clique (0-plex, missing no edges). It would have been great if 123 had revealed itself as a clique too, and if 89abc had been detected as a 1-plex before 9abc was detected as a clique. How might we make progress in these directions? Perhaps by removing all tied lowest-degree vertices before moving on; we try that on the next slide.
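A minimal Python sketch of the least-degree peeling search above. It assumes Python ints stand in for pTrees (one bitmask per adjacency row), so an induced degree deg(x,S) is the 1-count of row x ANDed with the vertex mask of S, and |E_S| is half the sum of those counts. Ties are broken by lowest vertex index, which is our choice; the slide does not fix a tie rule. All function names are hypothetical.

```python
def adjacency_masks(n, edges):
    """Adjacency row v as an int bitmask (a stand-in for the pTree of row v)."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def induced_degree(rows, v, mask):
    """deg(v, S): 1-count of (row v AND the vertex mask of S)."""
    return bin(rows[v] & mask).count("1")


def members(mask, n):
    """Vertex indices whose bit is set in mask."""
    return [v for v in range(n) if (mask >> v) & 1]


def edge_count(rows, mask):
    """|E_S| = half the sum of induced degrees over S."""
    return sum(induced_degree(rows, v, mask) for v in members(mask, len(rows))) // 2


def missing_edges(rows, mask):
    """C(v,2) - s; S is a k-plex iff this is <= k."""
    v = bin(mask).count("1")
    return v * (v - 1) // 2 - edge_count(rows, mask)


def missing_after_adding(rows, mask, x):
    """Incremental test for S' = S ∪ {x}: (v+1)v/2 - (deg(x,S') + s)."""
    v = bin(mask).count("1")
    s = edge_count(rows, mask)
    return (v + 1) * v // 2 - (induced_degree(rows, x, mask | (1 << x)) + s)


def greedy_kplex_search(rows, k):
    """Peel a least-degree vertex until the rest is a k-plex, report it,
    remove it from the graph, and repeat until no vertices remain."""
    n = len(rows)
    remaining = (1 << n) - 1
    found = []
    while remaining:
        current = remaining
        while missing_edges(rows, current) > k:
            # min() keeps the first (lowest-index) vertex among degree ties
            worst = min(members(current, n),
                        key=lambda v: induced_degree(rows, v, current))
            current &= ~(1 << worst)
        found.append(members(current, n))
        remaining &= ~current
    return found
```

On a hypothetical 7-vertex graph (a 4-clique on 0 to 3, a triangle on 4 to 6, and the edge 3-4), greedy_kplex_search with k = 0 peels 5, 6 and 4 and reports [0, 1, 2, 3], then [4, 5, 6]. missing_after_adding evaluates the incremental test without recounting S′ from scratch.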

18 K-plex search on G6 continued
[Figure: graph G6]
k-plex = subgraph missing ≤ k edges. If H is a k-plex and F is an ISG of H, then F is a k-plex. If H is an ISG with $|V_H| = h$, $|E_H| = H$ and $\bar{H} = h(h-1)/2$, then H is a k-plex iff $\bar{H} - H \le k$. If F is missing an edge, H is missing that edge too (F inherits all H edges among its vertices), so F can't be missing more edges than H.
k-core = subgraph containing ≥ k edges. If F is a k-core ISG of H, then H is a k-core.
Mining all k-plexes and k-cores: at each step we [potentially] branch to each of the lowest-degree vertices (note, I skipped many of them in this illustration). We might want the k-plex and/or k-core structure around a particular vertex; use SP1, SP2, …. E.g., find the k-plex and k-core structure around v = 1:
H0 = G = ISG{123456789abc}: H̄0 = 12·11/2 = 66, |E| = 19; H0 is a k-plex for k ≥ 47 and a k-core for k ≤ 19.
H1 = ISG{12346789abc} (deg 5 = 2): H̄1 = 11·10/2 = 55, |E| = 17; a k-plex for k ≥ 38 and a k-core for k ≤ 17.
H26 = ISG{1234789abc} (deg 6 = 2): H̄26 = 10·9/2 = 45, |E| = 15; a k-plex for k ≥ 30 and a k-core for k ≤ 15.
H27 = ISG{1234689abc} (deg 7 = 2): H̄27 = 10·9/2 = 45, |E| = 15; a k-plex for k ≥ 30 and a k-core for k ≤ 15.
(H26 and H27 specify removal of 7 and 6, respectively, so remove both.)
H2 = ISG{123489abc}: H̄2 = 9·8/2 = 36, |E| = 14; a k-plex for k ≥ 22 and a k-core for k ≤ 14.
H34 = ISG{12389abc}: H̄34 = 8·7/2 = 28, |E| = 12; a k-plex for k ≥ 16 and a k-core for k ≤ 12.
[Figure: pTree SP1 for G6]
H38 = ISG{12349abc}: H̄38 = 8·7/2 = 28, |E| = 13; a k-plex for k ≥ 15 and a k-core for k ≤ 13.
H348 = ISG{1239abc}: H̄348 = 7·6/2 = 21, |E| = 10; a k-plex for k ≥ 11 and a k-core for k ≤ 10.
H341 = ISG{2389abc}: H̄341 = 7·6/2 = 21, |E| = 10; a k-plex for k ≥ 11 and a k-core for k ≤ 10.
SPL1(1) = 234, SPL2(1) = 7c, SPL3(1) = 569abc, SPL4(1) = 8.
To check the k-plex/k-core status of 1234, check whether edges 23, 24 and 34 exist: (y, y, n). Thus 123 and 124 are 0-plexes and 3-cores; 134 and 234 are 1-plexes and 2-cores; 1234 is a 1-plex and a 5-core.
H342 = ISG{1389abc}: H̄342 = 7·6/2 = 21, |E| = 10; a k-plex for k ≥ 11 and a k-core for k ≤ 10.
(H341, H342 and H38 specify removal of 1 and 2, so remove both.)
H4 = ISG{389abc}: H̄4 = 6·5/2 = 15, |E| = 9; a k-plex for k ≥ 6 and a k-core for k ≤ 9.
H5 = ISG{89abc}: H̄5 = 5·4/2 = 10, |E| = 8; a k-plex for k ≥ 2 and a k-core for k ≤ 8.
[Figure: pTree SP2 for G6]
H6 = ISG{9abc} (deg 8 = 2): H̄6 = 4·3/2 = 6, |E| = 6; a k-plex for k ≥ 0 and a k-core for k ≤ 6.
This is what we want: 89abc is a 2-plex; 9abc is a 0-plex.
H0 = G = ISG{1234567}: a k-plex for k ≥ 11 and a k-core for k ≤ 9.
H03 = ISG{124567}: a k-plex for k ≥ 7 and a k-core for k ≤ 8.
H05 = ISG{123467}: a k-plex for k ≥ 7 and a k-core for k ≤ 8.
To check the k-plex/k-core status of 12347c, check edges 17, 1c, 27, 2c, 37, 3c, 47, 4c, 7c: (n, n, n, n, n, y, y, n, n). So 12347c is a (Comb(6,2) − 7)-plex = 8-plex and a 7-core.
H06 = ISG{123457}: a k-plex for k ≥ 7 and a k-core for k ≤ 8.
[Figure: pTree SP3 for G6]
H035 = ISG{12467}: a k-plex for k ≥ 7 and a k-core for k ≤ 8.
H036 = ISG{12457}: a k-plex for k ≥ 7 and a k-core for k ≤ 8.
H0356 = ISG{1247}: a k-plex for k ≥ 2 and a k-core for k ≤ 4.
H03567 = ISG{124}: a k-plex for k ≥ 0 and a k-core for k ≤ 3.
This is what we want. Remove 12489abc.
H7 = ISG{3567}: H̄7 = 6, |E| = 3; a k-plex for k ≥ 3 and a k-core for k ≤ 3.
[Figure: pTree SP4 for G6]
H7 = ISG{567}: H̄7 = 3, |E| = 3; a k-plex for k ≥ 0 and a k-core for k ≤ 3.

19 K-Degree-Difference Community Search on G6: A k-degree-difference community of a graph G is a subgraph H such that $dd_H \equiv \mathrm{IntDeg}_H - \mathrm{ExtDeg}_H \ge k$.
Theorem: if $h \in H$, then $dd_{H-h} = dd_H - (2\,id_h - ed_h)$. So we want to remove the h for which $2\,id_h - ed_h$ is minimum.
H = G = {123456789abc}: dd_H = 38, dd_H/|V_H| = 38/12 = 3.17. Remove 5.
H = {12346789abc}: dd_H = 34, dd_H/|V_H| = 34/11 = 3.09. Remove 6, 7.
H = {123489abc}: dd_H = 26, dd_H/|V_H| = 26/9 = 2.89. Remove 4, 8.
H = {1239abc}: dd_H = 16, dd_H/|V_H| = 16/7 = 2.29. Remove 1, 2.
H = {39abc}: id = 13334, 2id − ed = 05568; dd_H = 10, dd_H/|V_H| = 10/5 = 2.0. Remove 3.
H = {9abc}: id = 3333, 2id − ed = 5565; dd_H = 9, dd_H/|V_H| = 9/4 = 2.25. Clique, so start over with:
H = {12345678}: dd_H = 17, dd_H/|V_H| = 17/8 = 2.13. Remove 8.
H = {1234567}: dd_H = 16, dd_H/|V_H| = 16/7 = 2.29. Remove 3, 6.
[Figure: pTree SP1 for G6]
H = {12457}: id = 22312, 2id − ed = 33613; dd_H = 6, dd_H/|V_H| = 6/5 = 1.2. Remove 5.
H = {1247}: id = 2231, 2id − ed = 3360; dd_H = 4, dd_H/|V_H| = 4/4 = 1.0. Remove 7.
H = {124}: id = 222, ed = 111, 2id − ed = 333; dd_H = 3, dd_H/|V_H| = 3/3 = 1.0. Clique, so start over with 35678:
H = {35678}: id = 02321, 2id − ed = −3 4 6 3 0; dd_H = 2, dd_H/|V_H| = 2/5 = 0.4. Remove 3.
H = {5678}: id = 2321, 2id − ed = 4630; dd_H = 5, dd_H/|V_H| = 5/4 = 1.25. Remove 8.
H = {567}: id = 222, ed = 011, 2id − ed = 433; dd_H = 4, dd_H/|V_H| = 4/3 = 1.33. Clique, so remove 567 and start over with 38 (but it has 0 id).
[Figure: graph G6]
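A sketch of one peeling step. The slide does not spell out how IntDeg_H and ExtDeg_H are counted, so this assumes IntDeg_H is the number of edges inside H and ExtDeg_H the number of edges leaving H. Under that assumption the update $dd_{H-h} = dd_H - (2\,id_h - ed_h)$ holds exactly: removing h takes its id_h edges out of the interior and turns them into boundary edges, while its ed_h boundary edges drop out of the count. The slide's own dd values may follow a different counting convention, so this illustrates the identity rather than reproducing those numbers. Function names are hypothetical.

```python
def degree_difference(adj, H):
    """dd_H = (# edges inside H) - (# edges leaving H)  [assumed counting]."""
    internal = sum(1 for u in H for v in adj[u] if v in H) // 2
    external = sum(1 for u in H for v in adj[u] if v not in H)
    return internal - external


def removal_score(adj, H, h):
    """2*id_h - ed_h: how much dd_H drops when h is removed from H."""
    id_h = sum(1 for v in adj[h] if v in H)
    ed_h = len(adj[h]) - id_h
    return 2 * id_h - ed_h


def peel_step(adj, H):
    """Remove the vertex with minimum 2*id_h - ed_h (ties: smallest label)."""
    h = min(sorted(H), key=lambda v: removal_score(adj, H, v))
    return H - {h}, h
```

On a hypothetical graph made of a 4-clique and a triangle joined by one edge, dd of the whole graph is 10; peel_step removes a triangle vertex of score 4, and dd of the remainder is 10 − 4 = 6, as the identity predicts.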

20 Very Simple Weighted SP1 and SP2 K-plex Search on G6
Weighting: 0,1-path nbrs of x times 3; 2-path nbrs of x times 2, until all degrees are weighted; then back to actual subgraph degrees.
x=1: H = {…abc}, deg x=1. H̄ = 15, |E| = 7: a k-plex for k ≥ 8 (deg x=1 after cutting 2,3,4). H̄ = 6, |E| = 5: a k-plex for k ≥ 1 (deg x=1 after cutting 2,3,4,6,8).
x=2: H = {…abc}, deg x=2. H̄ = 15, |E| = 7: a k-plex for k ≥ 8 (deg x=2 after cutting 2,3,4). H̄ = 6, |E| = 5: a k-plex for k ≥ 1 (deg x=2 after cutting 2,3,4,6,8). H̄ = 3, |E| = 3: a 0-plex (deg x=3 after cutting 1, actual subgraph degrees).
x=3: H = {…abc}, deg c x=3. H̄ = 6, |E| = 4: a 2-plex (deg c x=3 after cutting 2,3,6,8).
x=4: H = {…abc}, deg x=4. H̄ = 3, |E| = 3: a 0-plex (deg x=4 after cutting 2,3,4,6).
UNWEIGHTED degrees: H = {…abc}, deg.
x=5: H = {…abc}, deg x=5. H̄ = 10, |E| = 5: a 5-plex (deg x=5 after cutting 3,4). H̄ = 3, |E| = 3: a 0-plex (deg x=5 after cutting 1, from subgraph degrees).
[Figure: pTree SP1 for G6]
x=6: H = {…abc}, deg x=6; deg x=6 after cutting 3,4. H̄ = 3, |E| = 2: a 1-plex (deg x=6 after cutting 1,2; subgraph degrees 211).
x=7: H = {…abc}, deg x=7; deg x=7 after cutting 3,4. H̄ = 3, |E| = 3: a 0-plex (deg x=7 after cutting 1; subgraph degrees).
x=8: H = {…abc}, deg cc68 x=8; deg cc68 x=8 after cutting 3,4. H = {…abc}: a …-plex (deg x=8 after cutting 1,2; subgraph degrees).
[Figure: pTree SP2 for G6]
x=9: H = {…abc}, deg cc9c x=9. H̄ = 10, |E| = 8: a k-plex for k ≥ 2 (deg cc9c x=9 after cutting 2,3,6).
x=a: H = {…abc}, deg cc9c x=a. H̄ = 10, |E| = 8: a k-plex for k ≥ 2 (deg cc9c x=a after cutting 2,3,6).
x=b: H = {…abc}, deg cc9c x=b. H̄ = 6, |E| = 6: a k-plex for k ≥ 0 (deg cc9c x=b after cutting 2,3,6).
[Figure: pTree SP3 for G6]
x=c: H = {…abc}, deg ccpc x=c. H̄ = 6, |E| = 6: a k-plex for k ≥ 0 (deg cc9c x=c after cutting 2,3,6).
By weighting the initial round we got nearly perfect information for this example (G6). The weights, 3 and 2, were chosen arbitrarily but worked here; in general, one should devise a formula to determine them. We could also weight SP3, etc. If we have already paid the price of constructing SPk for k > 1, this is a much simpler approach than the Clique Percolation method of Palla (next slide).
[Figure: pTree SP4 and graph G6]
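One possible reading of this weighting, sketched in Python: the degree of x and of each 1-path neighbor of x is multiplied by 3, that of each 2-path neighbor by 2, and every other degree is left as is; a round then cuts all vertices whose weighted degree falls below a threshold (the "cut<18"-style rule used on slide 24). The function names, the threshold parameter and the toy graph are hypothetical, and distances come from a plain BFS over the whole graph rather than from SP1/SP2 pTrees.

```python
from collections import deque


def bfs_levels(adj, x):
    """Hop distance from x to every reachable vertex (its SP1, SP2, ... levels)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_degrees(adj, H, x, near=3, second=2):
    """deg_H(v) times `near` if v is x or a 1-path neighbor of x, times `second`
    if v is a 2-path neighbor, and times 1 otherwise (hypothetical reading)."""
    dist = bfs_levels(adj, x)

    def weight(v):
        d = dist.get(v)
        if d is not None and d <= 1:
            return near
        if d == 2:
            return second
        return 1

    return {v: weight(v) * sum(1 for u in adj[v] if u in H) for v in H}


def cut_below(adj, H, x, threshold, near=3, second=2):
    """One cut round: keep vertices whose weighted degree is >= threshold."""
    wd = weighted_degrees(adj, H, x, near, second)
    return {v for v in H if wd[v] >= threshold}
```

Weighting inflates the degrees near x, so a single cut tends to keep x's dense neighborhood: on a toy 4-clique with a triangle hanging off it, seeding at a clique vertex and cutting below 7 keeps exactly the clique.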

21 Very Simple Weighted SP1 k-plex Search on G7
[Figure: graph G7 (34 vertices) and its pTree SP1]
Weighting: 0,1-path nbrs of x times 1; 2-path nbrs of x times 0.
H = {…}: H̄ = 561, |E| = 77; a k-plex for k ≥ 484; D g9a bg…; a k-core for k ≤ 77.
Cut 123: H̄ = 120, |E| = 38; a k-plex for k ≥ 82; a k-core for k ≤ 38.
Cut 23: H̄ = 55, |E| = 26; a k-plex for k ≥ 24; a k-core for k ≤ 26.
Cut 24: H̄ = 15, |E| = 12; a k-plex for k ≥ 3; a k-core for k ≤ 12.
Cut 2: H̄ = 10, |E| = 10; a k-plex for k ≥ 0; a k-core for k ≤ 10.
{1,2,3,4,14} is a clique. {1,2,3,4,9,14} is a 3-plex.
Cut 0: H̄ = 21, |E| = 4; a k-plex for k ≥ 17; a k-core for k ≤ 4. Cut 1 leaves 25 only.
Cut 012: H̄ = 55, |E| = 19; a k-plex for k ≥ 36; a k-core for k ≤ 19.
H̄ = 19, |E| = 4; a k-plex for k ≥ 15; a k-core for k ≤ 4.
Cut 03: H̄ = 6, |E| = 4; a k-plex for k ≥ 2; a k-core for k ≤ 6. {24,32,33,34} is a 2-plex.
Cut 0: H̄ = 19, |E| = 4; a k-plex for k ≥ 15; a k-core for k ≤ 4. Cut 0 leaves {9,31} as a 0-plex.
H̄ = 17, |E| = 2; a k-plex for k ≥ 15; a k-core for k ≤ 2. Cut 0 leaves {27,30} as a 0-plex.
Cut 01: H̄ = 15, |E| = 6; a k-plex for k ≥ 9; a k-core for k ≤ 6.
Cut 0: H̄ = 10, |E| = 6; a k-plex for k ≥ 4; a k-core for k ≤ 6. {5,6,7,11,17} is a 4-plex.
H̄ = 14, |E| = 0; a k-plex for k ≥ 14; a k-core for k ≤ 0; no edges left.
The expected communities are mostly not detected as k-plexes or k-cores.
Cut 0: H̄ = 21, |E| = 4; a k-plex for k ≥ 17; a k-core for k ≤ 4.
(Symbols for base 65.)

22 ISG EdgeCount k-plex Search Alg on G8
G8 is a graph of word associations starting from the word BRIGHT, using the USF Free Association norms. An edge AB means some people associate word B with word A. We try to determine the 4 categories: Intelligence, Astronomy, Light and Colors.
[Figure: graph G8 on vertices 1 to 54]
H = {…}: H̄ = 1431, |E| = 197; a k-plex for k ≥ 1234; Deg 44444bb5656h9747c3c864fag4a386e j; a k-core for k ≤ 197.
Cut: H = {…}: H̄ = 45, |E| = 22; a k-plex for k ≥ 13; a k-core for k ≤ 22.
Cut: H = {…}: H̄ = 10, |E| = 8; a k-plex for k ≥ 2; a k-core for k ≤ 8.
So {12,24,25,31,54} = {sun, yellow, color, red, bright} is a 2-plex.
Attempt 2: remove bright and double the weight of the neighbors of 12 (the vertex of max degree).
H = {…}: H̄ = 1431, |E| = 197; a k-plex for k ≥ 1234; Deg 44444ba5645g9746b2b864f9f49386d.
Cut: H = {…}: H̄ = 1431, |E| = 197; a k-plex for k ≥ 1234; Deg 44484mka68agie4cm2b8c4fif49386d e356a349c5.
Cut: H = {…}: H̄ = 1431, |E| = 197; a k-plex for k ≥ 1234.
[Figure: pTree SP1 for G8]
Vertex labels: 1 Scientist, 2 Science, 3 Astronomy, 4 Earth, 5 Space, 6 Moon, 7 Star, 8 Ray, 9 Intelligent, 10 Golden, 11 Glare, 12 Sun, 13 Sky, 14 Moonlight, 15 Eyes, 16 Sunshine, 17 Light, 18 Lit, 19 Dark, 20 Brown, 21 Tan, 22 Orange, 23 Blue, 24 Yellow, 25 Color, 26 Gray, 27 Black, 28 Race, 29 White, 30 Green, 31 Red, 32 Crayon, 33 Pink, 34 Velvet, 35 Flashlight, 36 Glow, 37 Dim, 38 Gifted, 39 Genius, 40 Smart, 41 Inventor, 42 Einstein, 43 Brilliant, 44 Shine, 45 Laser, 46 Telescope, 47 Horizon, 48 Sunset, 49 Ribbon, 50 Violet, 51 Purple, 52 Beam, 53 Night, 54 Bright.

23 SP1 and SP2 for G8
[Figure: pTree SP1 and SP2 matrices for G8]

24 Simple Weighted SP1, SP2 K-plex Search on G8
[Figure: graph G8]
Weighting: 0,1-path neighbors (12012) times 5, 334; 2-path nbrs (39893) times 3. Next cut < 18, x = 1; instead cut < 19, x = 1.
This gives C0 = {1,2,9,39,40,41,42,43}, which is exactly the Intelligence class except that v = 38 (gifted) is missing. It is a k-plex for k ≥ 8 (not that strong a community!), x = 1.
Within the Intelligence class, C1 = {1,2,40,41,42} is a 1-plex (the only missing edge is (2,40)). Thus if we cut next using C1-degrees (cut 2 and 40), we are left with the clique (0-plex) C2 = {1,41,42}.
Cutting C0 and starting over: G − C0 degrees, x = 3. Weighting 0,1-path neighbors (367) times 5; 2-path nbrs (…) times 3. Next cut < 10, x = 3; next cut < 12, x = 3.
This gives C2 = {3,4,5,6,7,…,13,14,15,17,23,25,31,44,…,53}, whereas Astronomy is {3,4,5,6,7,8,10,11,12,13,14,16,17,…,45,46,47,48,52,53}: not a good fit!
With replacement, but using as the starting vertex the remaining vertex of highest degree (first, v = 12): weighting 0,1 SP nbrs times 5, 2 SP nbrs times 3; cut < 20, x = 12; cut < 20, x = 12. Astronomy is {…}.
Weighting 0,1 SP nbrs times 6, 2 SP nbrs times 3; cut < 30. Astronomy is {…}.
Weighting 0,1 SP nbrs times 6, 2 SP nbrs times 1: 5 Astronomy vertices are missing {3,5,45,46,53} and 2 non-Astronomy vertices are included {21,24}.
x = 25: weighting 0,1 SP nbrs times 6. Colors: 4 colors missing, but zero non-colors included.
Deg 44444ba5645g9746b2b864f9f49386d, x = 1.
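Scoring a detected community against a known class, as done here for C0 and the Intelligence class, comes down to two set differences. A small sketch (the function name is ours); the Intelligence class is read off the G8 vertex labels (Scientist, Science, Intelligent, Gifted, Genius, Smart, Inventor, Einstein, Brilliant), which agrees with the slide's statement that C0 is that class minus vertex 38.

```python
def compare_to_class(community, true_class):
    """Return (missing, extra): class members not found, non-members included."""
    community, true_class = set(community), set(true_class)
    return sorted(true_class - community), sorted(community - true_class)


C0 = {1, 2, 9, 39, 40, 41, 42, 43}
# 1 Scientist, 2 Science, 9 Intelligent, 38 Gifted, 39 Genius,
# 40 Smart, 41 Inventor, 42 Einstein, 43 Brilliant
INTELLIGENCE = {1, 2, 9, 38, 39, 40, 41, 42, 43}
```

compare_to_class(C0, INTELLIGENCE) returns ([38], []): one class member missed and no outsiders included, the same kind of tally the slide reports for the Astronomy and Colors attempts.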

25 While constructing Shortest Path pTrees SP2, …, record the Shortest Path Participation Count (SPPC) of each edge. The edge(s) with max SPPC should be the best candidates for removal?
[Figures: edge pTrees E with SPPC counts, SP2, SP3, SP4, and graphs G2 and G5]
Delete (1,2) and {3,6,8} and do over. Delete (1,6) and do over.
SP gives the connectivity-component partition: CC(1) = {1,2,3,4} is a 0-plex since EdgeCt = 12 = 2·COMBO(4,2); CC(5) = {5,6,7} is a 1-plex since EdgeCt = 4 = 2·(COMBO(3,2) − 1). (This EdgeCt counts each edge twice, once from each endpoint.)
SP = SP1 | SP2 | SP3 gives the connectivity-component partition: CC(1) = {1,5,7} is a 0-plex since EdgeCt = 3 = COMBO(3,2) − 0; CC(2) = {2,4} is a 0-plex since EdgeCt = 1 = COMBO(2,2) − 0.
SP gives the connectivity-component partition: CC(1) = {1,2,4,5,7} is a 5-plex since EdgeCt = 5 = COMBO(5,2) − 5; CC(3) = {3,6,8} is a 0-plex since EdgeCt = 3 = COMBO(3,2) − 0.
SP gives the connectivity-component partition: CC(1) = {1} ∪ List(SP(1)) = {1,2,3,4,5,6,7} is a 12-plex since EdgeCt = 9 = COMBO(7,2) − 12.
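A sketch of the "SP gives the connectivity component partition" step, again assuming Python int bitmasks stand in for pTrees: OR-ing neighbor rows level by level builds {v} ∪ SP(v), and each component's k-plex level is COMBO(n,2) minus its edge count. Here each edge is counted once. The function names are hypothetical, and the toy graph (a 4-clique plus a 3-vertex path) was chosen only to mirror the shape of CC(1) = {1,2,3,4} and CC(5) = {5,6,7}.

```python
def adjacency_masks(n, edges):
    """Adjacency row v as an int bitmask (a stand-in for a pTree)."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def reach_mask(rows, v):
    """OR in SP1, SP2, ... from v until no new vertex appears: {v} ∪ SP(v)."""
    seen = frontier = 1 << v
    while frontier:
        nxt = 0
        for u in range(len(rows)):
            if (frontier >> u) & 1:
                nxt |= rows[u]
        frontier = nxt & ~seen   # the next SP level
        seen |= frontier
    return seen


def component_plexes(rows):
    """(members, edge_count, k) per connected component; each is a k-plex."""
    left = (1 << len(rows)) - 1
    result = []
    while left:
        v = (left & -left).bit_length() - 1   # lowest remaining vertex
        comp = reach_mask(rows, v)
        members = [u for u in range(len(rows)) if (comp >> u) & 1]
        edges = sum(bin(rows[u] & comp).count("1") for u in members) // 2
        n = len(members)
        result.append((members, edges, n * (n - 1) // 2 - edges))
        left &= ~comp
    return result
```

For the toy graph this reports the clique [0, 1, 2, 3] with 6 edges as a 0-plex and the path [4, 5, 6] with 2 edges as a 1-plex.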

26 GN: Compute all edge betweennesses (SPPCs)
Remove the edge with the largest betweenness, recalculate betweennesses, and repeat.
[Figures: edge pTrees E (keyed by Ekey = h,k) with SPPC counts, and shortest-path pTrees S2P and S3P (keyed by 2Pkey = h,k,g), for example graphs G1, G1_1, G1_2, G1_3 and G1_4]
Check SPPC(34) = SPPC(43) (verify that SPs traversed backwards from hk get counted): (3,4) ∈ E, so ct = 1; + CountS2P(34) = 1 + CountS2P(43) = 1, so ct = 3; + CtS3P(34g) = 0 + CtS3P(g34) = 1 (g = 1), so ct = 4.
GN says delete (3,4)! GN says delete any edge!
GN says delete 12 | 25 | 34 | 36.
To construct SPPC(hk) = SPPC(kh) (Shortest Path Participation Count), if (h,k) ∈ E, count
$\mathrm{SPPC}(hk) = 1 + \mathrm{OneCount}(S2P(hk)) + \mathrm{OneCount}(S2P(kh)) + \sum_{g}\big[\mathrm{OneCount}(S3P(hkg)) + \mathrm{OneCount}(S3P(ghk))\big] + \sum_{f,m}\big[\mathrm{OneCount}(S4P(hkfm)) + \mathrm{OneCount}(S4P(fhkm)) + \mathrm{OneCount}(S4P(fmhk))\big] + \cdots$
GN: delete 12 | 23 | 25, not 34, 45. Not 23, 16, 45.
SPPC recalculation and repeat steps? Anyone see a shortcut? Or do we just start the calculation over on the reduced graph? Do the pointers help? In S2P(hk) one has to search out S2P(kh), and in S3P(hk) one has to find all S3P(hkg) and S3P(ghk), for all g.
In the appendix I begin work on uniquely representing shortest k-paths using both a fore and an aft pTree. Consider that in G1_4, S3P(16) = 2.
Notes: if any OneCount = 0, no subsequence exists. It might be useful to use pointers to make this procedure easier.
GN edge betweenness specifies pruning (2,4).
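A sketch that computes SPPC and runs the GN loop by the "start the calculation over" route, not a shortcut. It reads SPPC as the total number of shortest paths, over unordered vertex pairs, that traverse an edge, with every shortest path counted once, which matches the 1 + S2P + S3P + … tally above. It counts paths with BFS path counts σ instead of S2P/S3P pTrees: shortest s-to-t paths cross edge (a, b) in that direction in σ(s,a)·σ(b,t) ways when d(s,a) + 1 + d(b,t) = d(s,t). Girvan and Newman's edge betweenness instead weights each of the σ(s,t) shortest paths by 1/σ(s,t); the two agree when shortest paths are unique. Function names and the test graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Hop distances from s and sigma[v] = number of shortest s-v paths."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def sppc(adj):
    """Edge -> number of shortest paths (over unordered pairs) that use it."""
    info = {s: bfs_counts(adj, s) for s in adj}
    counts = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for s, t in combinations(list(adj), 2):
        ds, ss = info[s]
        dt, st = info[t]
        if t not in ds:
            continue  # different components: no path at all
        for e in counts:
            u, v = tuple(e)
            for a, b in ((u, v), (v, u)):  # edge crossed as a -> b on the way to t
                if a in ds and b in dt and ds[a] + 1 + dt[b] == ds[t]:
                    counts[e] += ss[a] * st[b]
    return counts


def components(adj):
    """Connected components as sorted vertex lists."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            dist, _ = bfs_counts(adj, s)
            seen.update(dist)
            comps.append(sorted(dist))
    return comps


def girvan_newman_split(adj):
    """Delete a max-SPPC edge and recompute from scratch until a component splits."""
    adj = {u: set(vs) for u, vs in adj.items()}
    start = len(components(adj))
    removed = []
    while len(components(adj)) == start:
        counts = sppc(adj)
        if not counts:
            break
        # deterministic tie-break: first edge in sorted order among the maxima
        edge = max(sorted(counts, key=sorted), key=counts.get)
        u, v = sorted(edge)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return removed, components(adj)
```

On two triangles {1,2,3} and {4,5,6} joined by the edge (3,4), every one of the 9 cross pairs uses (3,4), so SPPC(3,4) = 9, the maximum, and one GN step removes it and splits the graph into {1,2,3} and {4,5,6}.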

27 CC(9)={9 a b c} is a 3plex since EdgeCt=3=COMBO(4,2)-3
[Figure: graph G6 on vertices 1 to 9 and a to g, with pTrees E, SP2 to SP6 and SPPC (Shortest Path Participation Counts)]
SP gives the connectivity-component partition: CC(1) = {…} is a 20-plex since EdgeCt = 8 = COMBO(8,2) − 20. CC(9) = {9 a b c} is a 3-plex since EdgeCt = 3 = COMBO(4,2) − 3. CC(d) = {d f g} is a 0-plex since EdgeCt = 3 = COMBO(3,2). CC(e) = {e}.
SP2 is all pure0. SP gives the connected components: CC(1) = {1}, CC(5) = {5 6 7}, which is a 0-plex since EdgeCt = 3 = COMBO(3,2) − 0. Done!
Delete (1,3) (SPPC = 16, the max) and delete {d f g} and {e}, and do over. Also delete {9 a b c} as a 4-vertex hub-spoke 3-plex.
SP3 is all pure0. SP gives the connected components: CC(1) = {…} is a 2-plex since EdgeCt = 4 = COMBO(4,2) − 2. CC(2) = {…} is a 3-plex since EdgeCt = 3 = COMBO(4,2) − 3 (a 4-vertex hub-spoke).
Delete {…} (4-vertex hub-spoke 3-plex) and (1,6).

28 [Figure: table of E and SP2 to SP5 counts per vertex of G7, with a weight row (2, −1, −1, −1, −1), a WeightSum column and neighbor columns]
This is an agglomerative method based on a weighted sum of SPk counts, used to identify 1 and 34 as centers. Then, among their individual neighbors, select their communities with a threshold on the weighted sum (−20): If (WtSum ≥ −20 & Nbr(1)) then 1 else 0. This gives the light green "1-community" and the black "34-community" (overlapping). Next, excise those and iterate. When all vertices are in a community, perhaps do a k-means reshuffle to improve the result?
Using weights of 0, 1, 2, 4, 6 for SP1, 2, 3, 4, 5, respectively: WeightSum, SP1|2(17).
Iterate again on the remaining vertices using weights of 5, 5, 1, 1, 0 for SP1, 2, 3, 4, 5, respectively: WeightSum, SP1|2(8), SP1|2(33).
This method uses site betweenness, not edge betweenness (SPPC is not computed), but gives a good overlapping clustering (close to the author's). One could attempt a few k-means rounds to try to improve it.
10, 25, 26, 28, 29, 31, 33 and 34 are not shown (only 17 on; 8: only 27 turned on; 1 SP4 = 4dg). 15, 16, 19, 21, 23, 24, 27, 30: only 17 on; 5deg = 1; 17 SP5; 8 = 5dg.
[Figure: graph G7]
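A sketch of the weighted-sum rule, assuming |SPk(v)| is the number of vertices exactly k hops from v and that the table's weights are 2, −1, −1, −1, −1 for SP1 (= E) through SP5. It computes WeightSum(v) = Σ_k w_k·|SPk(v)| by BFS and then applies "If (WtSum ≥ threshold & Nbr(center)) then 1 else 0"; putting the center itself in its community is our assumption. Function names and the toy graph are hypothetical.

```python
from collections import deque


def level_counts(adj, v, max_level):
    """[|SP1(v)|, ..., |SPmax_level(v)|]: vertices exactly k hops from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0] * max_level
    for d in dist.values():
        if 1 <= d <= max_level:
            counts[d - 1] += 1
    return counts


def weight_sums(adj, weights):
    """WeightSum(v) = sum_k weights[k-1] * |SPk(v)|."""
    return {v: sum(w * c for w, c in zip(weights, level_counts(adj, v, len(weights))))
            for v in adj}


def community_of(adj, center, weights, threshold):
    """If (WeightSum(v) >= threshold and v is a neighbor of center) then 1 else 0.
    Adding the center itself is an assumption, not stated on the slide."""
    sums = weight_sums(adj, weights)
    return {center} | {v for v in adj[center] if sums[v] >= threshold}
```

With weights (2, −1, −1, −1, −1), a vertex scores high when it has many direct neighbors and few distant ones; on a toy graph (a triangle 0, 1, 2 with a tail 2-3-4), vertex 2 gets the largest sum, 5, and its community at threshold 0 is {0, 1, 2, 3}.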

