1
Ideas Oblique FAUST, Barrel (OFLB)
Thresholds: let f be a distance-dominated functional. avgGap = (fmax - fmin)/fct may be a good measurement for setting thresholds, e.g., x is an outlier (anomaly) if the gap around {x} > 3*avgGap?

If the minimum barrel radius >> 0, we have chosen a d-line far from the data. It may be advisable to pick p to be an actual data point.

Formulas from the spreadsheet:
G = (B12-B$6)*B$9 + (C12-C$6)*C$9 + (D12-D$6)*D$9 + (E12-E$6)*E$9, i.e., (x-p)od
H = G12-$G$, i.e., L = (x-p)od - min
I = (B12-B$6)^2 + (C12-C$6)^2 + (D12-D$6)^2 + (E12-E$6)^2, i.e., (x-p)o(x-p)
B = SQRT[(x-p)o(x-p) - ((x-p)od)^2]

Note we don't round, so we are calculating pTree bitslices by truncating. We don't even need to do that! For fixed point, keep going (take bitslices to the right of the decimal point). Floating point? Bitslice the mantissa; the exponent shifts the slice name. E.g., $.1011 \times 2^{5}$, $.0010 \times 2^{3}$, $.1010 \times 2^{-1}$.
[Figure: bit-position table with columns $2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}$ and the example rows 10110. , 10. , .01010]

If d and t are trained over DocTerm, DT, the gradient is G = (Gd, Gt). Instead of a LineSearch using F(s) = F(f + sG), always use a 2dRectangleSearch, F(sd, st) = F(f + sd*Gd + st*Gt). Set $\partial F/\partial s_d = 0$ and $\partial F/\partial s_t = 0$.

Better to find dense cells (sphere, barrel, cone) and then fuse them? It's difficult to position spheres, barrels and cones around clusters (bumps, protrusions, etc.). For outlier clusters (singleton, doubleton) this does not apply.
An algorithm?: start with a small barrel radius and find the dense region between two consecutive gaps within this pipe. This should identify a portion of a dense cluster. How to go from there? Lots of possibilities:
a. Use the centroid of the dense pipe piece as the sphere|barrel center.
b. Move to a better centroid for that cluster by a gradient ascent/descent process.
c. In a "GA mutation" fashion, jump to a nearby centroid, governed by some fitness function (e.g., count in the dense pipe piece).

Alternate Lpqx, Bpqx to produce a cluster dendrogram (top-down). Take p=1st_TR pt? d=vomavg.
Defining Avg Density? $AvD = count / \prod_{k=1..dim}(max_k - min_k)$? This is for the purpose of choosing good thresholds: $MinGapThresh = T_{b,AvD} \equiv b \cdot (1/AvD)^{1/dim}$? (b = adjustable parameter.)
If we're given a TrainingSet, TR, with K classes, is $avg_{k=1..K} vom_k$ a better medoid than VoM? Is taking p=MinCorner, q=MaxCorner of the box circumscribing $\{VoM_k\}_{k=1..K}$ better than using the circumscribing box of TR?

SSPTS = set of all SPTSs (columns of reals); V = n-dim vector space. Code operations on SSPTS (both 1-level and multi-level):
SSPTS x SSPTS -> SSPTS (Binary Algebraic Operations), including: +, -, /, RWP = Row_Wise_Product.
{SPTSk}k=1..n -> SSPTS (Unary ops. Typically SPTSk=Vk), including:
  SDv (Square Distance from a fixed vector, v in V)
  DPv (Dot Product with a fixed vector, v in V)
  ERa = FP's EinRings (n=1, r in R); the result masks rows s.t. row < a.
SPTS -> R includes AGa = YC's Aggregates and iceberg queries: count, sum, avg, max, min, median, rank_k, top_k, IceBergQueries.
SSPTS -> SSPTS (Unary Operations), including: SPc = Scalar_Product (multiply each SPTS row by the same constant, c). Use a const SPTS (all rows = c), then RWP? More efficient without forming the const SPTS, using c's bit pattern only? (A subset of the previous with n = |SSPTS|?)

Note, SSPTS includes SPTSs of all cardinalities (= depths = # of rows). It seems best to code on SSPTS rather than on SSPTSn (card(SPTS)=n). Of course, it is very important to know what the rows represent so as to avoid nonsense results; however, why restrict the operations themselves? When SPTS operands are of different depths, the result SPTS's depth = depth of the shallowest operand (operate from the top of each).
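The truncation remark above can be illustrated with a small sketch. It is not pTree code: it stores each bit slice as a plain Python list, and the helper names `bit_slices` and `from_slices`, the slice range $2^{4}$ to $2^{-4}$, and the sample column are assumptions chosen for illustration.

```python
import math

def bit_slices(values, high_exp, low_exp):
    """Bit-slice non-negative reals into slices named by exponent.

    Bits below 2**low_exp are dropped (truncation, no rounding).
    Returns {exponent: [one bit per row]}.
    """
    # Scale so the lowest kept bit becomes the units bit, then truncate.
    scaled = [int(math.floor(v * 2 ** (-low_exp))) for v in values]
    slices = {}
    for e in range(high_exp, low_exp - 1, -1):
        shift = e - low_exp
        slices[e] = [(s >> shift) & 1 for s in scaled]
    return slices

def from_slices(slices):
    """Rebuild the (truncated) values from their bit slices."""
    n = len(next(iter(slices.values())))
    return [sum(bits[i] * 2.0 ** e for e, bits in slices.items())
            for i in range(n)]

# Hypothetical column: 22 = 10110. , 0.3125 = .0101 , 5.3 (not exactly representable)
column = [22.0, 0.3125, 5.3]
slices = bit_slices(column, high_exp=4, low_exp=-4)
print(from_slices(slices))  # 5.3 comes back truncated to 5.25
```

Taking more slices to the right of the binary point (a smaller `low_exp`) only adds slices; the existing ones are unchanged, which is why no rounding step is needed.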
2
Oblique FAUST (OF) Clustering: Linear (default) OFL, Spherical OFS, Barrel OFB, Conical OFC)
Assume a table of real numbers, TBL(C1..Cn) (= an n-dim vector space). Categorical columns are either coded to real numbers or bitmapped; e.g., a Month column can be coded as {1,...,12} and a Color column can be bitmapped by Red(yes=1|no=0)...Violet(yes=1|no=0). TBL is converted to a PTreeSet.

Define the distance function $d_s: TBL \times TBL \to \mathbb{R}$ by
$d_s(x,y) = \sum_{k \in CR} r_k |x_k - y_k|^2 + \sum_{k \in CC} c_k |x_k - y_k|$
where CR is the set of real columns, CC is the set of categorical columns (consider coded columns as real) and $r_k$, $c_k$ are real coefficients.

Each method uses a real-valued functional from X to R, and all methods are completely data parallel: data can be distributed over a cluster, processed in parallel (dot product), and the partial results sent home to be added.

Oblique FAUST Linear (OFL) clustering (enclose clusters between (n-1)-dimensional hyperplanar gaps):
$L_{p,d}: X \to \mathbb{R}$, $L_{p,d}(x) = (x-p) \circ d$
Find $a_1 < a_2$ such that $GapLower = \{x \mid a_1 < L_{p,d}(x) < a_1 + T\} = \emptyset$ and $GapUpper = \{x \mid a_2 < L_{p,d}(x) < a_2 + T\} = \emptyset$, and let $C = \{x \mid a_1 + T < L_{p,d}(x) < a_2\}$.

Oblique FAUST Spherical (OFS) (enclose clusters with spherical gaps):
$S_p(x) = (x-p) \circ (x-p)$
Search $S_p$ for a spherical gap, $\{x \mid r^2 \le S_p(x) < (r+T)^2\} = \emptyset$, so that the interior of the r-sphere about p encloses a sub-cluster.

Oblique FAUST Barrel (OFB) (enclose clusters with barrel gaps):
$B_{p,d}(x) = (x-p) \circ (x-p) - ((x-p) \circ d)^2$
Search for GapLower > T, GapUpper > T and GapBarrel > $T^2$ (BR ≡ Barrel_Radius).
Note: $B_{p,d}(x) = S_p(x) - L_{p,d}^2(x)$

Oblique FAUST Cone (OFC) (enclose clusters with cone gaps):
$C_{p,d}(x) = (x-p) \circ d \,/\, \sqrt{(x-p) \circ (x-p)}$
Note: $C_{p,d}^2(x) = L_{p,d}^2(x) / S_p(x)$

[Figure: a cluster enclosed between GapLower (at a1) and GapUpper (at a2) along the d-line through p, surrounded by the barrel gap; a sphere of radius r about p. No gaps show on the red, blue or green projection lines.]
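The four functionals can be computed together, which also makes the two "Note" identities easy to check numerically. The following is a minimal NumPy sketch, not the pTree implementation; the function name `faust_functionals` and the sample points are hypothetical, and it assumes d is normalised to a unit vector (otherwise B is not a squared radius) and takes the cone value as L divided by the square root of S.

```python
import numpy as np

def faust_functionals(X, p, d):
    """Return (L, S, B, C) for every row x of X.

    L = (x-p)od, S = (x-p)o(x-p), B = S - L^2, C = L / sqrt(S).
    Assumption: d is rescaled to unit length before use.
    """
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    diff = np.asarray(X, dtype=float) - np.asarray(p, dtype=float)
    L = diff @ d                              # projection onto the d-line
    S = np.einsum("ij,ij->i", diff, diff)     # squared distance from p
    B = S - L ** 2                            # squared distance from the d-line
    with np.errstate(divide="ignore", invalid="ignore"):
        C = np.where(S > 0, L / np.sqrt(S), 0.0)  # cosine of angle to d (0 at x = p)
    return L, S, B, C

# Hypothetical 2-d points, p at the origin, d along the first axis.
X = np.array([[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]])
L, S, B, C = faust_functionals(X, p=[0.0, 0.0], d=[1.0, 0.0])
print(L, S, B, C)
```

Because every functional is a sum over columns, each can be evaluated on separate row partitions and the partial results combined, which is the data-parallel property the slide points out.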
3
OF LB...LB Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate). Assess ST error with L<40, M<60, H. If the 1st B radius >> 0, use p=min_radius_pt. T=MGW=12, d=x-n.
[Figure: top-down cluster dendrogram (without purity) built by alternating (x-p)od/4 rounds and Br/4 rounds, each with a value/Ct/gap table and L/M/H counts per node. Nodes: c0 c1 c2 c3 c4; c11 c12 c21 c22 c23 c24 c25 c26 c27 c31 c32 c33; c211 c231 c241 c251; c2411. Top-level counts as far as they can be read: C1 = L20 M9 H4, C2 = L18 M26 H28, C3 = L2 M12 H17.]
4
OF_LBL: Clustering on Concrete (75% accurate)
OF_LBL: Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate). Cluster on CM, WA, FA, AG; assess error with L: ST<40, M: 41<ST<60.
Rounds: (x-p)od/4 with Ct and Gap>=3, p=mn, q=mx; then (x-p)o(x-p)/4 with Ct and Gap>=3, p=1st. min=p, max=q, T=12. First: LBLBL on C11.
OF_LBL: 75% accurate. Next: OFL..L (just linears) for comparison.
[Figure: value/Ct/Gap tables for each round with L/M/H counts per resulting cluster (C1, C2, C11 and their sub-clusters).]
5
OFL..L: Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate). Cluster on CM, WA, FA, AG; assess error with L: ST<40, M: 41<ST<60.
Rounds: (x-p)od/4 with Ct and Gap>=3; p=naaa, q=xaaa; min=p, max=q, T=12.
OFLL: doesn't look promising. Also tried staying with p=nnnn, q=xxxx, without promising results.
[Figure: value/Ct/Gap tables for each linear round with L/M/H counts per resulting cluster (C1, C2 and their sub-clusters).]
6
OF_LS: Clustering on Concrete
OF_LS: Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate). Cluster on CM, WA, FA, AG; assess error with L: ST<40, M: 41<ST<60.
OF_LS: in the Spherical round, mask off at the first radial thinning, then try the last point. Repeat, alternating first, last, middle, ... At any S-round, if p has no nbrs within T, move up (if last, or down if 1st) and redo with that p. Note: should probably pick p randomly?
Rounds: (x-p)od/4 with Ct and Gap>=3, min=p, max=q, T=12; then (x-p)o(x-p)/4 with Ct and Gap>=3 for p=first, p=last, p=mid, p=3rd last, ...
[Figure: value/Ct/Gap tables for each round with L/M/H counts per resulting cluster (C1, C2 and their sub-clusters); some p=last rounds isolate only outliers.]
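One way to read "mask off at the first radial thinning" is sketched below. This is an illustration under assumptions, not the slide's pTree procedure: it works on plain distances rather than the scaled squared values shown in the tables, treats a thinning as the first gap of at least T between consecutive sorted radii, and the name `spherical_round` and the sample points are hypothetical.

```python
import numpy as np

def spherical_round(X, p, T):
    """Boolean mask of the rows inside the first radial gap of width >= T around p."""
    X = np.asarray(X, dtype=float)
    r = np.sqrt(((X - np.asarray(p, dtype=float)) ** 2).sum(axis=1))
    order = np.sort(r)
    wide = np.nonzero(np.diff(order) >= T)[0]   # indices where a wide gap follows
    if wide.size == 0:
        # No thinning found: nothing to mask off in this round.
        return np.ones(len(X), dtype=bool)
    cut = order[wide[0]]                        # largest radius before the first gap
    return r <= cut

# Hypothetical data: three points near p and two far away.
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
mask = spherical_round(X, p=X[0], T=2.0)
print(mask)
```

The rows where the mask is true form the sub-cluster around p; the next round would be run on the remaining rows with a new p (last, middle, or random, as the slide suggests).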
7
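OFLS: on Conc(STrength, ConcreteMix, WAter, FineAggregate, AGgregate)
Cluster on CM, WA, FA, AG; assess error with L: ST<40, M: 41<ST<60.
Linear round: (x-p)od with Ct, T=8, p(rand), d=|x-n| (min to max). T = Dim*AvgGap = 4*1.58 = 6.3. No gaps.
Next step: do an Sp to find a dense cell at p. Spherical round: (x-p)o(x-p)/4 with Ct, T=2.
[Figure: value/Ct tables for both rounds over the columns ROW, ST, CM, WA, FA, AG, with L/M/H counts per resulting cluster; one L outlier is split off.]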
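The threshold rule T = Dim*AvgGap can be sketched as follows. The sketch assumes avgGap = (fmax - fmin)/count, reading "fct" on the first slide as the number of functional values; the function name `gaps_over_threshold` and the sample values are hypothetical.

```python
def gaps_over_threshold(values, dim):
    """Return (avg_gap, T, gaps) where gaps are consecutive pairs farther apart than T.

    Assumption: avg_gap = (max - min) / count and T = dim * avg_gap.
    """
    vs = sorted(values)
    avg_gap = (vs[-1] - vs[0]) / len(vs)
    T = dim * avg_gap
    gaps = [(a, b) for a, b in zip(vs, vs[1:]) if b - a > T]
    return avg_gap, T, gaps

# Hypothetical functional values forming two groups.
avg_gap, T, gaps = gaps_over_threshold([0, 1, 2, 3, 20, 21, 22, 23], dim=2)
print(avg_gap, T, gaps)
```

With evenly spread values no consecutive pair exceeds T, which matches the "No gaps" outcome reported for the linear round on this slide.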
8
Oblique FAUST Barrel (OFB) Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate)
STrength is the class label, so we assess error with L: ST<40, M: 41<ST<60. p=vom, q=mean.
Here we try a "maximized dense cell" approach. Find dense regions between consecutive pipe gaps (p=1st, q=last pt). The pipe is around the pq-line, with radius r0 chosen as follows.
If pts are evenly spaced, how far apart is each adjacent pair?
N = num of equiwidth subints on a side = $count^{1/n} = 150^{.25} = 3.5$
L = equiwidth of a side = $Avg_{k=1..n}(max_k - min_k) / N$
M = length of the main diagonal of a singleton cube (= dis between evenly spaced points) = $(n \cdot L^2)^{1/2} = 144$
Set pipe radius r0 = M (furthest nbr) or L = 72 (nearest nbr). Let the pipe center be C0.
$GWT = T_{b,AvD} \equiv b \cdot (1/AvD)^{1/dim}$ (b is an adjustable parameter, e.g., b=n; $AvD = count / \prod_{k=1..dim}(max_k - min_k)$)
It may be that the pipe is at the edge of a cluster, so we try to find the center of the cluster as follows:
a. Increase the barrel stave radius to r1, where Count 1st falls precipitously (reached one edge of the cluster).
b. Increase the barrel stave radius further until it changes precipitously again at r2 (if it falls, we are leaving our cluster; if it rises, we are entering another cluster on the "found edge" side of our cluster).
c. Let C1 = VoM of that barrel stave (between r1-T and r1).
d. Find the center, C2, of the dense region of the r0-pipe through C0 and C1 that contains C0 and C1.
e. Find the first spherical gap with center C3 = Avg(e1, e2), where e1, e2 are the pts at the ends of this pipe and on its center line, and radius at least r3 = |e1-e2|/2.
76% accuracy. Alg: L(mn-vom), R(find 1st 2 dense radii), L(AVG(means of densities)).
[Figure: two L/4, Ct, Gap>4 tables for the p-to-q line.]
9
Oblique FAUST Barrel (OFB) Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate)
ST=class. Assess error with L={ST<40}, M, H={ST>59}. p=vom, q=mean.
Very simple MDC alg: L(mn-vom), R(find 1st 2 dense annuli), L(2 means of those annuli).
"Maximized Dense Cell" or MDC approach: find dense regions between consecutive pipe gaps (e.g., p=1st, q=last pt; the pipe is around the pq-line with radius r0).
What r0 defines a pipe? (count >> 0 but not too thick.) If pts are evenly spaced, how far apart is each adjacent pair?
N = # of equiwidth subintervals on a side = $count^{1/n} = 150^{.25} = 3.5$
L = equiwidth of a side = $Avg_{k=1..n}(max_k - min_k) / N$
M = length of the main diagonal of a singleton cube = $(n \cdot L^2)^{1/2}$
Set pipe radius r0 = M (furthest nbr) or L = 72 (nearest nbr). Let C0 = center of the dense region of the pipe. GWT = L or M? (If more dense than uniform spacing, dense cell.)
$GWT = T_{b,AvD} \equiv b \cdot (1/AvD)^{1/dim}$ (b = parameter, e.g., b=n; $AvD = count / \prod_{k=1..dim}(max_k - min_k)$)
It may be that the pipe is at the edge of a cluster, so we try to find the center of the cluster as follows:
a. Increase the radius of the annular gap to r1, where Count 1st falls precipitously (reached an edge of the cluster).
b. Increase the radius further until it changes precipitously again at r2 (if it falls, we are leaving our cluster; if it rises, we are entering another cluster on the "found edge" side of our cluster).
c. Let C1 = VoM of the annulus between r1-T and r1.
d. Find the center, C2 = (C0+C1)/2.
e. Find the first spherical gap with center C3 = Avg(e1, e2), where e1, e2 are the pts at the ends of the C0C1C2 pipe, of radius at least r3 = |e1-e2|/2.
76% accuracy. This is the same accuracy as GV but without any gradient optimizations.
Next we compare the full MDC alg: L(mn-vom); on the L/6 pipe, find C2 as at left; L(2 means of those annuli).
[Figure: two L/4, Ct, Gap>4 tables for the p-to-q line.]
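The uniform-spacing arithmetic behind r0 and GWT can be checked with a few lines. This sketch assumes AvD divides the count by the product of the column ranges (the operator is missing in the transcript) and uses a hypothetical range of 252 per column, chosen only so that L comes out near the slide's 72; the function names are also hypothetical.

```python
import math

def uniform_spacing(count, ranges):
    """Spacing of `count` evenly spaced points in a box with the given side ranges.

    N = count^(1/n), L = mean(range) / N, M = sqrt(n * L^2).
    """
    n = len(ranges)
    N = count ** (1.0 / n)
    L = (sum(ranges) / n) / N
    M = math.sqrt(n * L * L)
    return N, L, M

def gap_width_threshold(count, ranges, b):
    """GWT = b * (1/AvD)^(1/dim), assuming AvD = count / product of ranges."""
    avd = count / math.prod(ranges)
    return b * (1.0 / avd) ** (1.0 / len(ranges))

# count=150 and n=4 come from the slide; the ranges are hypothetical.
N, L, M = uniform_spacing(150, [252.0] * 4)
print(round(N, 2), round(L), round(M))
print(round(gap_width_threshold(150, [252.0] * 4, b=1.0), 2))
```

For n=4 the diagonal M is exactly 2L, consistent with the slide's pair 144 and 72. When all ranges are equal and b=1, GWT reduces to L, so b scales the threshold relative to uniform spacing.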
10
f=p1 and xofM, GT=2^3. First round of finding Lp gaps
FAUST CLUSTER-fmg: O(log n) pTree method for finding P-gaps: P ≡ ScalarPTreeSet( c o fM )

X (surviving rows only):
pt  x1  x2
pa  13   4
pb  10   9
pc  11  10
pd   9  11
pe  11  11
pf   7   8

xofM (surviving values): 11 27 23 34 53 80 118 114 125 110 121 109 83

[Figure: bit slices p6..p0 of xofM and their complements p6'..p0', ANDed from the high-order end. No zero counts (= gaps) appear until width 2^4.]

Gaps found:
width = 2^4 = 16: [64,80), [88,104)
width = 2^3 = 8: [0,8), [40,48), [56,64)

Clusters (OR of the intervals between consecutive gaps):
OR between gap 1 & 2 for cluster C1={p1,p3,p2,p4}
OR between gap 2 & 3 for cluster C2={p5}
OR between gap 3 & 4 for cluster C3={p6,pf}
OR for cluster C4={p7,p8,p9,pa,pb,pc,pd,pe}
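The gap search can be mimicked on plain integers to see which aligned intervals come out empty. This is a sketch rather than the pTree method: it uses a right shift in place of ANDing the high-order bit slices, merges adjacent empty intervals, and runs on the 13 xofM values that survive in the transcript. The names `aligned_gaps` and `clusters_between` are hypothetical.

```python
def aligned_gaps(values, width_exp, total_bits):
    """Empty intervals [k*2^w, (k+1)*2^w) over a total_bits-bit range, adjacent ones merged."""
    width = 1 << width_exp
    # Dropping the low width_exp bits is what fixing the high-order slices does.
    occupied = {v >> width_exp for v in values}
    gaps = []
    for bucket in range(1 << (total_bits - width_exp)):
        if bucket in occupied:
            continue
        lo = bucket * width
        if gaps and gaps[-1][1] == lo:
            gaps[-1] = (gaps[-1][0], lo + width)   # extend the previous gap
        else:
            gaps.append((lo, lo + width))
    return gaps

def clusters_between(values, gaps):
    """Group values by how many gap upper bounds lie at or below them."""
    cuts = [hi for _, hi in gaps]
    groups = {}
    for v in values:
        key = sum(1 for c in cuts if c <= v)
        groups.setdefault(key, []).append(v)
    return [groups[k] for k in sorted(groups)]

xofM = [11, 27, 23, 34, 53, 80, 118, 114, 125, 110, 121, 109, 83]
gaps = aligned_gaps(xofM, width_exp=3, total_bits=7)
print(gaps)                          # [(0, 8), (40, 48), (56, 80), (88, 104)]
print(clusters_between(xofM, gaps))
```

The merged interval (56, 80) is the union of the slide's [56,64) and [64,80), and the four groups have the sizes 4, 1, 2 and 6, in line with C1 to C4 once the two C4 values missing from the transcript are accounted for.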
11
Oblique FAUST Barrel (OFB) Clustering on Concrete(STrength, ConcreteMix, WAter, FineAggregate, AGgregate)
Cluster on CM, WA, FA, AG; assess error with L: ST<40, M: 41<ST<60. min=p, max=q, T=12.
Alg:
1. One OFL round, then one OFB round; in OFB, if the least radius is not 0, set p=first pt.
2. Separate at all large r-gaps.
3. If an R-ring has 1 point, it is an outlier; else analyze the R-ring further.
[Figure: (x-p)od/4, Ct, Gap>=3 table for the linear round, followed by r, Ct, Gp tables (also r/2 and r/5) and pairwise-distance tables for each ring, with L/M/H counts; singleton and doubleton rings are marked as outliers.]
One ring shows no gaps:
R   Ct  Gp
11   3   1
12   7   1
13   6   1
14  10   1
15   4   1
16   9   1
17       1
Notes from the ring analysis: M=3 H=3 (21 between M's and H's); L=1 M=1 H=3 (M and H's separate but d(L,M)=4).
So 4 errors out of 150, about 97% accuracy. But those "errors" are pts close together, not separable.
12
OFB on Abalone(Rings,Length,diameter,height,Shell)
p=vom, q=mean: L*300, Ct, Gap>6,7; gaps only at the end.
p=vomK, q=mnK: L*100, Ct, Gap>2.3.
[Figure: value/Ct/Gap tables over the columns ring, len, diam, heig, Shell, with per-ring counts (e.g., C6, C7, C9) and several rows marked as outliers (rings 12, 13, 16).]