$L_{d,p} \equiv (X-p)\circ d = X\circ d - p\circ d$, $L_d \equiv X\circ d$


1 $L_{d,p} \equiv (X-p)\circ d = X\circ d - p\circ d$, $L_d \equiv X\circ d$
FAUST Analytics. $X(X_1..X_n) \subseteq R^n$, $|X| = N$. If X is a classifier TrainingSet with classes $C = \{C_1..C_K\}$, then $X = X(X_1..X_n, C)$. $d = (d_1..d_n)$, $p = (p_1..p_n) \in R^n$. Functionals $F: R^n \to R$, F = L, S, R : PTS $\to$ SPTS.

$L_{d,p} \equiv (X-p)\circ d = X\circ d - p\circ d$, with $L_d \equiv X\circ d$

$S_p \equiv (X-p)\circ(X-p) = X\circ X + X\circ(-2p) + p\circ p = L_{-2p} + X\circ X + p\circ p$

$R_{d,p} \equiv S_p - L_{d,p}^2 = X\circ X + L_{-2p} + p\circ p - (L_d)^2 + 2(p\circ d)(X\circ d) - (p\circ d)^2 = L_{-2p+2(p\circ d)d} - (L_d)^2 + X\circ X + p\circ p - (p\circ d)^2$

FAUST Hull Classifying. An unclassified y is assigned to $C_k$ iff $y \in Hull_k \equiv \{z \mid \min F(C_k) - \epsilon \le F(z) \le \max F(C_k) + \epsilon\}$ for as many F's as possible ($\epsilon$ depends on the thoroughness of the TrainingSet). We call $\min F(C_k) - \epsilon$ and $\max F(C_k) + \epsilon$ the Hull Cut Points (HCPs). HCPs at precipitous count changes in F give multiple hulls per class, facilitating recursive hulling of non-convex classes. FAUST Hull Classifying can be preceded by an attribute selection step.

FAUST Clustering. Starting with 1 cluster C=X, and until a stop condition (e.g., cluster density > threshold), recursively cut C at each F-gap (e.g., at the midpoint, or adjusted based on, e.g., subcluster variance) using a different F in each recursion step. PrecipitousCountChange gaps can be used instead of Value gaps to deal with suspected aberrations.

FAUST Top K Outlier Detector. Use $rank_{n-1} S_x$.

Mark Wed 11/26: Related: dominant attributes may exist in only some classes. This must be factored in when ascribing weight/value to an attribute.

Mark Wed 11/26 9am: We're adjusting the midpoint as well, based on cluster deviation. This gives us an extra 4 percentage points or so of accuracy over the straight midpoint. The hull is an interesting case, as we are looking at situations like this. We are already able to predict which members are poor matches to a class.

Mark Tue 11/25/14 5PM: FYI, some updated results in text classification. FAUST in and of itself is capable of accuracy as high as anything out there. Using the famous Stanford_newsgroup dataset (7,500 docs).
Plain-Jane FAUST got 80%. An extra boost came from eliminating any term that appears in <= 2 TrainingSet docs and from using chi-squared to reduce the attributes a further 20% (picking the 80% most important attributes). By processing vertically, we can toss attributes easily before we expend a lot of CPU. If we toss them intelligently, we improve accuracy and reduce classification time! We eliminated about 70% of the attributes from the TestSet and achieved better accuracy than the classifiers referenced on the Stanford Natural Language Processing site! We're exploring other approaches to further identify the critical attributes. We are about to turn this loose on datasets approaching 1TB in size.
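The functionals L, S and R and the hull test on this slide can be sketched numerically as follows. This is a minimal illustration, not the authors' vertical (pTree-based) implementation: it assumes X is a dense NumPy array with one row per point, that d is a unit vector, and that the hull margin is a single number `eps`. The names `R_expanded` and `hull_votes` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def L(X, d, p=None):
    """Linear functional: L_d = X o d, or L_{d,p} = (X - p) o d when p is given."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    if p is None:
        return X @ d
    return (X - np.asarray(p, dtype=float)) @ d

def S(X, p):
    """Spherical functional: S_p = (X - p) o (X - p), one value per row of X."""
    diff = np.asarray(X, dtype=float) - np.asarray(p, dtype=float)
    return np.einsum("ij,ij->i", diff, diff)

def R(X, d, p):
    """Radial functional: R_{d,p} = S_p - L_{d,p}^2 (d assumed to be a unit vector)."""
    return S(X, p) - L(X, d, p) ** 2

def R_expanded(X, d, p):
    """Same quantity written as L_{-2p+2(p o d)d} - (L_d)^2 + XoX + pop - (p o d)^2."""
    X, d, p = (np.asarray(a, dtype=float) for a in (X, d, p))
    pd = p @ d
    XoX = np.einsum("ij,ij->i", X, X)
    return L(X, -2 * p + 2 * pd * d) - L(X, d) ** 2 + XoX + p @ p - pd ** 2

def hull_votes(X_train, labels, y, functionals, eps=0.0):
    """For each class, count the functionals F with min F(Ck)-eps <= F(y) <= max F(Ck)+eps."""
    X_train = np.asarray(X_train, dtype=float)
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)[None, :]
    votes = {}
    for k in np.unique(labels).tolist():
        Ck = X_train[labels == k]
        votes[k] = sum(
            1 for F in functionals
            if F(Ck).min() - eps <= F(y)[0] <= F(Ck).max() + eps
        )
    return votes

# Toy data (invented for illustration only).
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
labels = [0, 0, 1, 1]
d = np.array([1.0, 0.0])
p = np.array([1.0, 2.0])
functionals = [lambda Z: L(Z, d), lambda Z: S(Z, p)]
votes = hull_votes(X, labels, [0.5, 0.0], functionals)
print(votes)
```

In this toy case the point falls inside both hull intervals of class 0 and neither interval of class 1, so it would be assigned to class 0. How ties or zero-vote cases should be resolved is not specified on the slide.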

2 There is something wrong here. This does not find all maximal cliques.
WP Wed 11/26: Yes, we have also discovered that one has to think about the quality of the training set. If it is very high quality (expected to fully encompass all borderline cases of all classes), then using exact gap endpoints is probably wise. But if there is reason to worry about the comprehensiveness of the training set (e.g., when there are very few training samples, which is often the case in medical expert systems where getting a sufficient number of training samples is difficult and expensive), then it is probably better to move the cutpoints toward the midpoint (reflecting the vagueness of training set class boundaries). What does one use to decide how much to move away from the endpoints? That's not an easy question. Cluster deviation seems like a useful measure to employ. One last thought on how to decide whether to cut at the gap midpoint, at the endpoints, or to move the cutpoints away from the endpoints toward the midpoint: if one has a time-stamp on training samples, one might assess the "class endpoint" change rate over time. As the training set gets larger and larger, if an endpoint stops moving much and isn't an outlier, then cutting at the endpoint seems wise. If an endpoint is still changing a lot, then moving away from that endpoint seems wise (maybe based on the rate of change of that endpoint as well as other measures?).

A complete subgraph is a clique. A maximal clique is not a proper subset of any other clique. In G=(X,Y,E), a bipartite graph, a clique (Sx, Sy) is a complete bipartite subgraph induced by the bipartite vertex set (Sx, Sy). The Consensus Set or clique of Sx is $CLQ(S_x) \equiv \bigcap_{x \in S_x} N_y(x)$, i.e., the set of all y's that are adjacent (edge connected) to every x in Sx. Clearly, (Sx, CLQ(Sx)) is a clique.

Thm1: (Sx, Sy) is a maximal clique iff Sy=CLQ(Sx) and Sx=CLQ(Sy).

Thm2: $\forall S_y \subseteq Y$ s.t. $CLQ(S_y) \neq \emptyset$, $(CLQ(S_y), CLQ(CLQ(S_y)))$ is maximal.

Find all cliques starting with Sy=singletons. Then examine doubletons $S_{y_1y_2}$ s.t. $P_x(S_{y_1y_2}) \neq \emptyset$. Then tripletons, etc.
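The consensus set CLQ and the two theorems can be illustrated with a small sketch. It assumes the bipartite graph is given as two adjacency dictionaries (docs to words and words to docs); the function names and the three-document toy graph are invented for this example and are not taken from the slides.

```python
def clq(S, adj):
    """Consensus set: all vertices on the other side adjacent to every vertex in S."""
    S = list(S)
    if not S:
        raise ValueError("S must be nonempty")
    return frozenset.intersection(*(frozenset(adj[v]) for v in S))

def is_maximal(Sx, Sy, x_adj, y_adj):
    """Thm1: (Sx, Sy) is a maximal clique iff Sy = CLQ(Sx) and Sx = CLQ(Sy)."""
    return frozenset(Sy) == clq(Sx, x_adj) and frozenset(Sx) == clq(Sy, y_adj)

def maximal_clique_from(Sy, x_adj, y_adj):
    """Thm2: if CLQ(Sy) is nonempty, (CLQ(Sy), CLQ(CLQ(Sy))) is a maximal clique."""
    Sx = clq(Sy, y_adj)          # docs containing every word in Sy
    if not Sx:
        return None              # no doc contains all of Sy
    return Sx, clq(Sx, x_adj)    # words shared by all of those docs

# Toy graph (invented): x = docs, y = words.
x_adj = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"c"}}
y_adj = {"a": {1, 2}, "b": {1, 2}, "c": {2, 3}}

result = maximal_clique_from({"a"}, x_adj, y_adj)
print(result)
```

Starting from the single word "a", the closure yields docs {1, 2} with words {a, b}, which passes the Thm1 test, whereas the unclosed pair ({1, 2}, {a}) does not.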
Examining MGRs (x = docs, y = words): all singleton wordsets, Sy, form a nonempty clique. AND pairwise to find all nonempty doubleton wordset cliques, $S_{y_1y_2}$. AND those nonempty doubleton wordsets with each other singleton wordset to find all nonempty tripleton wordset cliques, $S_{y_1y_2y_3}$, and so on. Start with singleton docs, include another, ... until the result is empty. The last nonempty set is a max-clique and all its subsets are cliques. Remove them. Iterate.

#CLQs  #docs  #words
13     2      2
1      6      1
7      5      1
9      4      1
23     3      1
48     2      1

[Slide figure: listings of document numbers and document pairs with the words (w1..w60) they share. The layout is not recoverable from the transcript.]

There is something wrong here. This does not find all maximal cliques. Next I try the following logic: find all 1WdCs (1-Word Cliques). A kWdC contains each of k (k-1)WdCs, so if a (k-1)-wordset is not the wordset of a clique, then none of its supersets are either (downward closure property). Thus, the wordset of any 2WdC can be composed by unioning the wordsets of two 1WdCs, and any kWdC wordset is the union of a (k-1)WdC wordset with a 1WdC wordset.
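The level-wise logic described above (build k-word cliques by unioning a (k-1)-word clique with a 1-word clique, relying on downward closure) can be sketched as below. This is an Apriori-style illustration under stated assumptions: the input is a plain dictionary from each word to the set of docs containing it, Python set intersection stands in for the AND of vertical bit vectors, and the helper names and toy data are hypothetical. The number of wordsets can grow exponentially, so this is only practical for small vocabularies.

```python
def wordset_cliques(word_docs):
    """Map every wordset with a nonempty common doc set to that doc set."""
    singles = {frozenset([w]): frozenset(ds) for w, ds in word_docs.items() if ds}
    found = dict(singles)
    level = dict(singles)
    while level:
        nxt = {}
        for ws, docs in level.items():
            for single, sdocs in singles.items():
                cand = ws | single
                if single <= ws or cand in nxt:
                    continue
                common = docs & sdocs      # the "AND" step
                if common:                 # downward closure: empty sets are never extended
                    nxt[cand] = common
        found.update(nxt)
        level = nxt
    return found

def maximal_cliques(found):
    """Keep, for each distinct doc set, the largest wordset that has exactly that doc set."""
    best = {}
    for ws, docs in found.items():
        if docs not in best or len(ws) > len(best[docs]):
            best[docs] = ws
    return {(docs, ws) for docs, ws in best.items()}

# Toy data (invented): word -> docs containing it.
word_docs = {"a": {1, 2}, "b": {1, 2}, "c": {2, 3}}
found = wordset_cliques(word_docs)
maximal = maximal_cliques(found)
print(len(found), maximal)
```

The filter in `maximal_cliques` relies on the fact that the doc set stored for a wordset is already CLQ(wordset), so a clique is maximal exactly when no larger wordset shares the same doc set.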

3 [Slide figure: a count table (header "1ct> v [ DC WC") in which every document count is 0 except one 6 and one 3, with the annotation "AND with w1 & w & w & w & w". The layout is not recoverable from the transcript.]

4 FAUST Clustering 1: $2^{-1}$ separates 7, 50; $2^{-2}$ separates the .27s
L-Gap Clusterer: cut C at mid-gap (of F&C) using the next (d,p) from dpSet, where F = L|S|R. $2^{-1}$ separates 7, 50; $2^{-2}$ separates the .27s.

D=d35 (value: docs)
0: d26 d1 d27 d3 d44 d16 d6 d17 d47 d18 d10 d43 d12 d33 d14 d23 d49 d25 d45 d2 d29 d13 d9 d32
0.27: d28 d41 d42 d30 d21 d22 d15 d36 d11 d38 d46 d5 d8 d37 d48 d39 d4
0.55: d50 d7
3.60: d35
35, 7, 50 outliers. 2^?

D=.27s (value: docs)
0: d9 d49 d45
0.09: d6 d3 d33 d18 d44
0.18: d43 d25 d22 d12 d16 d2
0.27: d27 d23 d42 d15 d13 d47
0.36: d26 d29 d36
0.46: d38 d14 d48 d8 d10 d37
0.55: d32 d1 d5
0.64: d21 d4 d11 d17
0.92: d30
1.01: d41 d28
1.10: d39
1.29: d46
{28,30,39,41,46} cluster.

D=.64s (value: docs)
0: d26 d33 d3 d27 d45 d2 d44 d23 d9 d15 d49 d16 d38 d6 d18 d22
0.25: d1 d37 d43 d8 d29 d25 d42 d12 d47 d48
0.51: d32 d14 d4 d36 d13 d5
0.77: d10
1.03: d11
1.29: d17
1.54: d21
The 0's, .25s and .51s are clusters. d10, d11, d17, d21 are outliers.

Going back to D=d35, how close does HOB come?
$2^1$, $2^0$ separate 35.

D=44docs, GT=.08 (value: docs)
0.17: d22 d49
0.21: d42 d2 d16
0.25: d18 d3 d43 d6
0.34: d23 d15 d44 d38 d25 d36
0.38: d33 d48 d8
0.43: d4 d12
0.47: d47 d9 d37
0.51: d5
0.56: d1 d32 d45 d14 d27
0.64: d10 d17 d21 d29 d11
0.69: d26 d50 d13
0.73: d30
0.77: d28
0.82: d41
0.86: d39
0.99: d46
1.16: d7
1.47: d35

C1 ($.17 \le x\circ d \le .25$) = {2,3,6,16,18,22,42,43,49}
C2 ($.34 \le x\circ d \le .56$) = {1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48}
C3 ($.64 \le x\circ d \le .86$) = {10,11,13,17,21,26,28,29,30,39,41,50}
Single: 46 ($x\circ d$=.99); 7 (=1.16); 35 (=1.47)

Next, on each Ck try D=Ck, Thres=.2

D=sum of all C1 docs (value: docs)
0.42: d16 d2 d3 d42 d43 d22
0.63: d18 d49
0.85: d6
C11 ($x\circ d$=.42) = {2,3,16,22,42,43}; 6, 18, 49 outliers.

D=sum of all C11 docs (value: docs)
0.57: d2 d3 d16 d22 d42 d43

D=sum of all C2 docs (value: docs)
0.27: d23
0.36: d25 d4 d38
0.45: d15 d33 d12 d36
0.54: d8 d44 d47
0.63: d1 d37 d5 d32 d50
0.72: d27 d45 d9
0.81: d14

D=sum of all C3 docs (value: docs)
0.56: d11
0.66: d17 d29
0.75: d13
0.85: d30 d10
0.94: d28 d26 d41 d50
1.03: d21
1.41: d39
C31 ($.56 \le x\circ d \le 1.03$) = {10,11,13,17,21,26,28,29,30,41,50}; 39 outlier.

D=sum of all C31 docs (value: docs)
0.63: d17 d29 d11
0.84: d50 d13 d30
0.95: d26 d28 d10 d41
1.16: d21
C311 (.63) = {11,17,29}; C312 (.84) = {13,30,50}; C313 (.95) = {10,26,28,41}; 21 outlier.

Other clustering methods later.

C11: 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string.
If you'll tell me this riddle, I will give you a ring. 22. Had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a little garters to tie his little hose. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. C2: 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 14. If all seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into great sea, what a splish splash it would be! 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. 
If your heels are nimble and light, you may get there by candle light. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 48. One two, buckle my shoe. Three four, knock at the door. Five six, ick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. C311: 11. One misty moisty morning when cloudy was weather, I met an old man clothed all in leather. He began to compliment and I began to grin. How do And how do? And how do again 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 29. 
When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. C312: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! C313: 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
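The gap-cutting step used throughout this slide can be sketched as follows: sort the documents by their functional value and start a new group wherever two consecutive values differ by more than the threshold, recording the midpoint of each such gap as the cut point. The function name `gap_cut` is hypothetical; the values are the "D=sum of all C1 docs" figures from this slide with Thres=.2, and treating the threshold as a strict "greater than" is an assumption.

```python
def gap_cut(values, threshold):
    """Split doc ids into groups at every gap wider than threshold.

    values: dict mapping doc id -> functional value F(doc).
    Returns (clusters, cuts), where cuts holds the mid-gap cut points.
    """
    items = sorted(values.items(), key=lambda kv: kv[1])
    clusters = [[items[0][0]]]
    cuts = []
    for (_, prev_v), (cur_id, cur_v) in zip(items, items[1:]):
        if cur_v - prev_v > threshold:
            cuts.append((prev_v + cur_v) / 2)   # cut at the midpoint of the gap
            clusters.append([])
        clusters[-1].append(cur_id)
    return clusters, cuts

# Values for D = sum of all C1 docs, as listed on the slide.
c1_values = {16: 0.42, 2: 0.42, 3: 0.42, 42: 0.42, 43: 0.42, 22: 0.42,
             18: 0.63, 49: 0.63, 6: 0.85}
clusters, cuts = gap_cut(c1_values, threshold=0.2)
print(clusters, cuts)
```

On these values the sketch separates {2, 3, 16, 22, 42, 43} from {18, 49} and {6}. The slide labels 6, 18 and 49 as outliers, which suggests that small leftover groups are treated as outliers rather than as clusters; that rule is not encoded here.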

5 FAUST Cluster 1.2 real HOB Alternate WS0, DS0
DS1= | WS1= 46 | DS2 46 OUTLIER: 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. real HOB Alternate WS0, DS0 DS0=|WS1= 35 |---| |DS2| |35 | OUTLIER: 35. Sing a song of sixpence, a pocket full of rye. 4 and 20 blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. Queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. WS0= DS1 |WS1= 42(Mother) 7 9 |DS2|WS2=WS1 11 |7 27 | 29 C1: Mother theme 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. DS0|WS 1 |DS1| WS 10 |10 | DS2| WS3 13 | | 10 | DS3 10 OUTLIER: 10. Jack and Jill went up hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. WS DS1 WS1= {fiddle(32 41) man(11 32) old(11 44) 11 DS2 32 11 41 22 44 C2 fiddle old man theme 11. One misty moisty morning when cloudy was weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do How do you do? How do you do again 32. 
Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many a joyous day my fiddle and I have had 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. DS0| WS1= 1 | DS1|WS2= 13 | 39 |DS2 14 | |39 OUTLIER: 39. A little cock sparrow sat on a green tree. He chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie. Oh no, says the sparrow I will not make a stew. So he flapped his wings\,away he flew WS DS1 WS1= 38 52 5 17 23 28 36 48 C3: men three 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? 
Yes, and back again. If your heels are nimble and light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. DS0|WS 13 |DS2|WS 14 |13 |DS3 13 OUTLIER: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. C4: 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not k 21. Lion and Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. 
Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 26. Sleep baby sleep. Our cottage valley is deep.Little lamb is on green with woolly fleece so soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 50. Little Jack Horner sat in the corner, eating of Christmas pie. 
He put in his thumb and pulled out a plum and said What a good boy am I!

WS0= DS1|WS1(17wds)= 4 6 8|DS2=DS1 DS0|WS1= 2 3|DS2=DS1

Each of the 10 words occurs in 1 doc, so all 5 docs are outliers.

OUTLIERS: 2. This little pig went to market. This little pig stayed home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 22. Had little husband no bigger than my thumb. Put him in a pint pot, there I bid him drum. Bought a little handkerchief to wipe his little nose, pair of little garters to tie little hose 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.

OUTLIER: 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.

Notes: Using HOB, the final WordSet is the document cluster theme! When the theme is too long to be meaningful (C4), we can recurse on those (using the opposite, DS0|WS0?). The other thing we can note is that DS0 almost always gave us outliers (except for C5) and WS0 almost always gave us clusters (except for the first one, 46). What happens if we reverse it? What happens if we just use WS0?

6 FAUST Cluster 1.2.1 real HOB Alternate WS0, DS0 recursing on C3 and C4
C Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. DS0|WS1= (on C4) 21|DS2 WS2=41(morn) 57(way) 26| 37 DS3=DS2 30| 47 . C4.2.1 word47(plum) 21. Lion &Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town. 50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! DS0|WS1= 47 (plum) 21 DS2 WS2=WS1 26 21 30 50 50 WS0= DS1|WS1= 4 DS2 WS2= DS3 WS3=WS2 49 50 Final WordSet is too long. Recurse 4.2 WS0= DS1|WS1 = 4 |DS2 WS2= 8 |4 DS3 WS3= 12 |8 8 DS4 WS4= 25 | DS5 WS544 59 26 | DS6=DS5 30 | C word44(old) word59(woman) 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 25. There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. Doc26 and doc30 have none of the 12 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 26. Sleep baby sleep. 
Cottage valley is deep.Little lamb is on green with woolly fleece soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! Cat and the fiddle. Cow jumped over moon.Little dog laughed to see such sport, and dish ran away with spoon. DS0|WS1= 26 |DS1=DS0 30 OUTLIER: 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. WS0= DS WS1=5 22 DS2 6 C4.2.3 (day eat girl) Little Miss Muffet sat on tuffet, eating curd, whey. Came big spider, sat down beside her, frightened Miss Muffet away 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had 2 pigeons bright and gay. They flew from me other day. What was the reason they did go? I can not tell, for I do not know. 33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy ? Buttons, farthing a pair 49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid. DS0|WS1= 4 DS2 =WS1 8 |4 8 15|15 18 Recursing 18|33 49 no change Doc43 and doc44 have none of the 6 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. recurse on C3: C31 [21]cut [38]men [49]run 1. Three blind mice! See how run! All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice? 14. 
If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be! 17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! DS0=|WS1= 1 |DS1 |WS2= 14 |1 |DS3=DS2 17 |14 28 |17 C32: [38]men [52] three 5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men cannot put Humpty Dumpty together again. 23. How many miles to Babylon? 3 score miles and 10. Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. WS0=38 52 DS1|WS1=WS0 5 |

7 What do we want in bioinformatics? (cliques, strong clusters, ...???)
FAUST Cluster 1.2.2 HOB Alternate WS0, DS0 16 OUTLIERS:
Categorize clusters (hub-spoke, cyclic, chain, disjoint...)? Separate disjoint sub-clusters? Each of the 3 C423 words gives a disjoint cluster! Each of the 2 C32 words gives a disjoint sub-cluster also.

C day (docs 15, 18)
15. Great A. little a. This is pancake day. Toss ball high. Throw ball low. Those come after sing heigh ho!
18. I had 2 pigeons bright and gay. They flew from me other day. What was reason they go? I can not tell, I do not know.

C eat (docs 4, 8)
4. Little Miss Muffet sat on tuffet, eat curd, whey. Came big spider, sat down beside her, frightened away
8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean.

C4233 girl (docs 33, 49)
33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy? Buttons, farthing a pair
49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid.

C1: mother (docs 7, 9, 27, 29, 45)
7. Old Mother Hubbard went to cupboard to give her poor dog a bone. When she got there cupboard was bare, so poor dog had none. She went to baker to buy some bread. When she came back dog was dead.
9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs.
45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.

C2: fiddle old men {cyclic} (docs 11, 32, 41)
11. misty moisty morning when cloudy was weather, Chanced to meet old man clothed all leather. He began to compliment, I began to grin. How do you do How do? How do again
32. Jack come give me your fiddle, if ever you mean to thrive. No I'll not give fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many joyous day fiddle and I've had
41. Old King Cole was merry old soul. Merry old soul was he. He called for his pipe, he called for his bowl, he called for his fiddlers 3. And every fiddler, had a fine fiddle, a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.

C11 cut men run {cyclic} (docs 1, 14, 17)
1. Three blind mice! See how run! All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice?
14. If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be!
17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin!

C321 men (docs 5, 36)
5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men can't put Humpty together again.
36. Little Tommy Tittlemouse lived in little house. He caught fishes in other mens ditches.

C322 three (docs 23, 28, 48)
23. How many miles to Babylon? 3 score 10. Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light.
28. Baa baa black sheep, have any wool? Yes sir yes sir, 3 bags full. One for my master and one for my dame, but none for the little boy who cries in the lane.
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty.

C4.1 morn way (docs 37, 47)
37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on cold and frosty morn. This is way wash our hands, wash our hands, wash our hands. This is way wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning.
47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise.

C421 plum (docs 21, 50)
21. Lion & Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town.
50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!

C422 old woman (docs 12, 25)
12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
25. There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.

Let's pause and ask "What are we after?" Of course it depends upon the client.
3 main categories for relationship mining? Text corpuses, market baskets (includes recommenders), bioinformatics? Others?
What do we want from text mining? (anomaly detection, cliques, bicliques?)
What do we want from market basket mining? (future purchase predictions, recommendations...)
What do we want in bioinformatics? (cliques, strong clusters, ...???)

8 FAUST Cluster 1.2.3 word-labeled document graph
FAUST Cluster 1.2.3 word-labeled document graph
[Figure: graph linking numbered documents to word labels. Word labels: always, away, baby, back, bad, boy, bake, bed, bread, bright, brown, buy, child, clean, cloth, crown, cry, cut, day, dish, dog, eat, fall, fiddle, full, girl, green, high, hill, house, king, lady, lamb, maid, men, merry, money, morn, way, mother, nose, old, pie, pig, plum, town, bag, round, cock, run, sing, son, three, tree, two, wife, thumb, woman. Node placement is not recoverable from the transcript.]
Using AVG+1, the sub-graphs captured so far (docs: words):
17, 1, 14: run, cut, men
48, 23, 28: three
50, 21: plum
36, 5: men
49, 33: girl
29, 9, 27, 45, 7: mother
32, 41, 11: fiddle, old, men
37, 47: morn, way
8, 4: eat
12, 25: old, woman
day
We have captured only a few of the salient sub-graphs. Can we capture more of them? Of course we can capture a sub-graph for each word, but that might be 100,000. Let's stare at what we got and try to see what we might wish we had gotten in addition. A bake-bread sub-corpus would have been strong. (docs{ ) There are many others.
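The remark that a sub-graph can be captured for each word can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the `docs` dictionary is a hypothetical, abbreviated stand-in for the corpus, and `word_subgraphs` and `min_docs` are illustrative names, not part of FAUST.

```python
from collections import defaultdict

# Hypothetical mini-corpus: doc id -> set of stop-word-filtered words.
# Ids and words echo the slides, but the sets are abbreviated for illustration.
docs = {
    4: {"eat", "away", "sat"},
    8: {"eat", "clean", "wife"},
    12: {"old", "woman", "child"},
    25: {"old", "woman"},
    6: {"bad"},
}

def word_subgraphs(docs, min_docs=2):
    """Map each word to the sorted docs containing it.

    Only words shared by at least min_docs documents are kept, since a word
    that occurs in a single doc does not link anything.
    """
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return {w: sorted(ids) for w, ids in index.items() if len(ids) >= min_docs}

subgraphs = word_subgraphs(docs)
for word in sorted(subgraphs):
    print(word, subgraphs[word])
```

On a real vocabulary this yields one candidate sub-graph per shared word, which is why the slide warns the count might reach 100,000 and some selection step is needed.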

9 HOB2 Alt (use other HOBs)
FAUST Cluster 1.2.4 HOB2 Alt (use other HOBs)
[Figure: the word-labeled document graph of FAUST Cluster 1.2.3, repeated.]
wAvg+1, dAvg+1: words away, boy, bread, eat, pie, plum; docs 35, 21, 50, 39, 46. [Table cells not recoverable from the transcript.]
recurse: wAvg+2, dAvg-1: words eat, pie; docs 39, 46, 35, 50. [Table cells not recoverable from the transcript.]
And if we want to pull out a particular word cluster, just turn the word-pTree into a list: for doc 12 the words are child, old, woman.
[Tables for w=boy and w=baby: cells not recoverable from the transcript. Nearby graph residue: baby with docs 45, 26, 9, 27; boy with docs 39, 50, 28.]
For a particular doc cluster, just turn the doc-pTree into a list:
12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
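Turning a pTree into a list can be sketched as reading off the positions whose bit is set. The sketch below assumes the pTree has already been expanded into a flat, uncompressed bit vector. The function name `mask_to_list` and the sample vocabulary, doc ids, and masks are hypothetical.

```python
def mask_to_list(bits, labels):
    """Return the labels whose bit is set in an uncompressed bit vector.

    bits:   sequence of 0/1 values, one per label (the expanded pTree)
    labels: the words or doc ids the bit positions stand for
    """
    if len(bits) != len(labels):
        raise ValueError("bits and labels must have the same length")
    return [label for bit, label in zip(bits, labels) if bit]

# Hypothetical word vocabulary and the word mask of one cluster.
words = ["away", "boy", "bread", "child", "old", "woman"]
word_mask = [0, 0, 0, 1, 1, 1]
print(mask_to_list(word_mask, words))    # ['child', 'old', 'woman']

# Hypothetical doc ids and the doc mask of the same cluster.
doc_ids = [9, 12, 27, 45]
doc_mask = [0, 1, 0, 0]
print(mask_to_list(doc_mask, doc_ids))   # [12]
```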

10 FAUST HULL Classification 1
Using the clustering of FAUST Clustering1 as classes, we extract 80% from each class as TrainingSet (w class=cluster#). How accurate is FAUST Hull Classification on the remaining 20% plus the outliers (which should be "Other")? Use Lpd, Sp, Rpd with p=ClassAvg and d=unitized ClassSum.

Full classes from slide FAUST Clustering1:
C11={2,3,16,22,42,43} C311={11,17,29} C312={13,30,50} C313={10,26,28,41}
C2={1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48}
OUTLIERS {18,49} {6} {39} {21} {46} {7} {35}

80% Training Set:
C11={2,16,22,42,43} C311={11,17} C312={30,50} C313={10,28,41}
C2={1,5,8,9,12,15,25,27,32,33,36,37,38,44,47,48}

20% Test Set:
C11={3} C311={29} C312={13} C313={26} C2={4,14,23,45} O={ }

[Hull tables follow. Empty brackets and bare class labels mark interval values lost in the transcript.]

D1=TS p=avTS Lpd MIN MAX CLASS C11 C2 C311 C312 C313 C313 C11 C2 .572 C311 C312
D11=C11 p=avC11 L MIN MAX CLASS C11 0 .63 C2 C311 C312 0 .31 C313 .63 C11 C311 C313 C2 .31 C312 .66 C311
D2=C2 p=avC2 L MIN MAX CLS 0 .22 C11 C2 C311 C312 C313 C11 C312 C313 C2
D311=C311 p=avC311 L MN MX CLAS C11 C2 C311 C312 C313 C311 C11 0 .33 C312 C313 C2 1.58 C312
D312=C312 p=avC312 L MN MX CLAS C11 C2 C311 C312 C313 .31 C11 .31 C2 .31 C311 .31 C313
D313=C313 p=avC313 L MN MX CLAS C11 C2 C311 C312 C313 C11 C2 C311 C313 .22 C312

All 6 class hulls separated using Lpd, p=CLavg, D=CLsum. D311 separates C311, D312 separates C312 and D313 separates C313 from all others. D2 separates C11 and C2. Now, remove some false positives with S and R using the same p's and d's:

D1=TS p=avTS Sp C 1.9 C C 2.4 C 4.6 C
D11=C11 p=avC11 Sp [1.6]C [ ]C [ ]C313 [ ]C2 [5]C312
D2=C2 p=avC2 Sp [2 2.3]C [ ]C313 [ ]C [5 5.1]C312 [ ]C311
D313=C313 p=avC313 Sp [ ]C [6.5]C312 [ ]C2 [ ]C311 [ ]C313
D311=C311 p=avC311 Sp [1.2]C [4.2]C11 [ ]C312 [ ]C2 [ ]C313
D312=C312 p=avC312 Sp [ ]C11 [ ]C313 [ ]C2 [2.5]C [5.5]C311

Sp removes a lot of the potential for false positives. (Many of the classes lie a single distance from p.)

D1=TS p=avTS Rpd [ ]C11 [ ]C2 [ ]C311 [2.1]C312 [ ]C313
D11=C11 p=avC11 Rpd [1.2]C11 [ ]C2 [ ]C311 [2.2 2.]]C312 [ ]C313
D2=C2 p=avC2 Rpd [ ]C11 [ ]C2 [ ]C311 [2.2]C312 [ ]C313
D311=C311 p=avC311 Rpd [1.4]C11 [ ]C2 [1.1]C311 [2.2]C312 [ ]C313
D312=C312 p=avC312 Rpd [ ]C11 [ ]C2 [ ]C311 [1.5]C312 [ ]C313
D313=C313 p=avC313 Rpd [ ]C11 [ ]C2 [ ]C311 [2.2]C312 [ ]C313

Rpd removes even more of the potential for false positives.

11 FAUST Hull Classification 2 (TESTING)
D1=TS p=avTS Lpd [ ]C313 [ ]C11 [ ]C2 [.57]C311 [ ]C312 D1=TS p=avTS Sp [ ]C313 [ ]C11 [ ]C2 [ ]C311 [ ]C312 D1=TS p=avTS Rpd [ ]C11 [ ]C2 [ ]C311 [2.1]C312 [ ]C313 C11={3} C311= {29} C312={13} C313={26} C2 ={4,14,23, 45} O={ } Test Set D11=C11 p=avC11 Lpd [.63]C11 [0]C311 [ ]C313 [ ]C2 [.31]C312 D11=C11 p=avC11 Sp [1.6]C [ ]C [ ]C313 [ ]C2 [5]C312 D11=C11 p=avC11 Rpd [1.2]C11 [ ]C2 [ ]C311 [2.2 2.]]C312 [ ]C313 [.66] C311 D2=C2 p=avC2 Lpd [ ]C11 [ ]C312 .[44 .66]C313 [ ]C2 D2=C2 p=avC2 Sp [2 2.3]C [ ]C313 [ ]C [5 5.1]C312 [ ]C311 D2=C2 p=avC2 Rpd [ ]C [ ]C313 [ ]C2 [ ]C311 [2.2]C312 D311=C311 p=avC311 Lpd [ ]C311 [0]C11 [0 .33]C312 [0 .33]C313 [ ]C2 D311=C311 p=avC311 Sp [1.2]C [4.2]C11 [ ]C312 [ ]C2 [ ]C313 D311=C311 p=avC311 Rpd [1.4]C11 [ ]C2 [1.1]C311 [2.2]C312 [ ]C313 1.58 C312 D312=C312 p=avC312 Lpd .31 C11 .31 C2 .31 C311 .31 C313 D312=C312 p=avC312 Sp [ ]C [ ]C313 [ ]C2 [2.5]C [5.5]C311 D312=C312 p=avC312 Rpd [ ]C [ ]C313 [ ]C2 [ ]C311 [1.5]C312 D313=C313 p=avC313 Lpd [0 .22]C11 [ ]C2 [ ]C311 [ ]C313 [.22]C312 D313=C313 p=avC313 Sp [ ]C [6.5]C312 [ ]C2 [ ]C311 [ ]C313 D313=C313 p=avC313 Rpd [ ]C11 [ ]C2 [ ]C311 [2.2]C312 [ ]C313 D=TS Rpd Sp Lpd trueCL Predicted____CLASS Final R S L predicted d Oth Other d Other d14 Oth Other d23 2| |11 Oth Other d29 Oth | Other d Other d26 Oth Oth Other d6 2| Other d Oth Other d18 2| |11 Oth Other d Oth Oth Other d35 Oth Oth Oth Other d39 Oth Oth Other d46 Oth Oth Other d Oth Other 8/15 = 53% correct just with D=TS p=AvgTS Note: It's likely to get worse as we consider more D's. ε=.8 predicted Class 11 2 311(all 311|2 all) 312(all 312|313 a Other . Let's think about TrainingSet quality resulting from clustering. This a poor quality TrainingSet (from clustering Mother Goose Rythmes. MGR is a difficult corpus to cluster since: 1., in MGR, almost every document is isolated (an outlier), so the clustering is vague (no 2 MGRs deal with the same topic so their word use is quite different.). 
Instead of tightening the class hulls by replacing CLASSmin and CLASSmax by CLASSfpci (fpci = first precipitous count increase) and CLASSlpcd, we might loosen the class hulls (since we know the classes are somewhat arbitrary) by expanding the [CLASSmin, CLASSmax] interval as follows: Let A = Avg{CLASSmin, CLASSmax} and R (for radius) = A-CLASSmin (= CLASSmax-A also). Use [A-R-ε, A+R+ε], which is the same interval as [CLASSmin-ε, CLASSmax+ε]. Setting ε=.8 increases accuracy to 100% (assuming all Other stay Other).
Finally, it occurs to me that clustering to produce a TrainingSet, then setting aside a TestSet, gives a good way to measure the quality of the clustering. If the TestSet part classifies well under the TrainingSet part, the clustering must have been high quality (it produced a good TrainingSet for classification). This clustering quality test method is probably not new (check the literature?). If it is new, we might have a paper here? (Discuss this quality measure and assess it using different ε's?)
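The loosened hull and the proposed quality measure can be sketched together. The sketch assumes each class hull is stored as one [min, max] interval per functional, and it requires a point to fall inside the loosened interval for every functional, which simplifies the "as many Fs as possible" rule. The hull values, test points, and function names are hypothetical and do not reproduce the slide's numbers.

```python
def loosen(lo, hi, eps):
    """Expand [lo, hi] to [A - R - eps, A + R + eps], A = midpoint, R = radius."""
    a = (lo + hi) / 2
    r = a - lo
    return a - r - eps, a + r + eps

def classify(values, hulls, eps=0.0):
    """Return the classes whose loosened hulls contain the point for every functional.

    values: functional name -> F(y) for the unclassified point y
    hulls:  class -> {functional name -> (min, max) over the training class}
    Returns ['Other'] when no class hull contains the point.
    """
    hits = []
    for cls, intervals in hulls.items():
        inside = True
        for name, (lo, hi) in intervals.items():
            lo2, hi2 = loosen(lo, hi, eps)
            if not (lo2 <= values[name] <= hi2):
                inside = False
                break
        if inside:
            hits.append(cls)
    return hits or ["Other"]

def holdout_accuracy(test_points, hulls, eps):
    """Fraction of held-out points whose single predicted class equals the true class."""
    correct = sum(
        1 for values, true_cls in test_points
        if classify(values, hulls, eps) == [true_cls]
    )
    return correct / len(test_points)

# Hypothetical hulls built from the training part of a clustering.
hulls = {"A": {"L": (0.0, 1.0)}, "B": {"L": (3.0, 4.0)}}

# Hypothetical held-out part: (functional values, true class).
test_points = [
    ({"L": 1.5}, "A"),
    ({"L": 3.5}, "B"),
    ({"L": 10.0}, "Other"),
]

for eps in (0.0, 0.8):
    print(eps, holdout_accuracy(test_points, hulls, eps))
```

Sweeping ε and watching the held-out accuracy is one way to carry out the assessment suggested in the last sentence. An ε that is too large makes hulls overlap, so points match several classes and stop counting as correct under this strict rule.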

