Published by Jordan Mathews. Modified over 6 years ago.
1
FAUST Analytics
$X(X_1..X_n) \subseteq \mathbb{R}^n$, $|X| = N$. If X is a classified training set with classes $C = \{C_1..C_K\}$, then $X = X(X_1..X_n, C)$. In either case $d = (d_1..d_n)$ and $p = (p_1..p_n) \in \mathbb{R}^n$. We have functionals $F: \mathbb{R}^n \to \mathbb{R}$, F = L, S, R. (We think of these as mapping n-vectors to 1-vectors of numbers or, in terms of bit columns (compressed or not), as mappings from a PTS to a SPTS.)
$L_{d,p} \equiv (X-p)\circ d = X\circ d - p\circ d$. Letting $L_d \equiv X\circ d$, we have $L_{d,p} = L_d - p\circ d$.
$S_p \equiv (X-p)\circ(X-p) = X\circ X + X\circ(-2p) + p\circ p = L_{-2p} + X\circ X + p\circ p$.
$R_{d,p} \equiv S_p - L_{d,p}^2 = X\circ X + L_{-2p} + p\circ p - (L_d)^2 + 2(p\circ d)(X\circ d) - (p\circ d)^2 = L_{-2p+2(p\circ d)d} - (L_d)^2 + X\circ X + p\circ p - (p\circ d)^2$.
$Fmin_{d,p,k} \equiv \min(F_{d,p}\,\&\,C_k) = \min F_{d,p,k}$, where $F_{d,p,k} = F_{d,p}\,\&\,C_k$.
$Fmax_{d,p,k} \equiv \max(F_{d,p}\,\&\,C_k) = \max F_{d,p,k}$.
$X\circ X$ can be pre-computed, one time.
$FPCC_{d,p,k,j} \equiv$ the jth precipitous count change (from left to right) of $F_{d,p,k}$. Same notation for PCIs and PCDs (precipitous count increases/decreases).
GAP: Gap Clusterer. If the DensityThreshold, DT, isn't reached, cut C mid-gap of $L_{d,p}\,\&\,C$ using the next (d,p) from dpSet.
PCC: Precipitous Count Change Clusterer. If DT isn't reached, cut C at the PCCs of $L_{d,p}\,\&\,C$ using the next (d,p) from dpSet. A fusion step may be required? Use density, proximity, or use Pillar pkMeans (next slide).
TKO: Top K Outlier Detector. Use $rank_{n-1} S_x$ for the TopKOutlier-slider.
LIN: Linear Classifier. $y \in C_k$ iff $y \in LH_k \equiv \{z \mid minL_{d,p,k} \le L_{d,p}(z) \le maxL_{d,p,k}\ \forall (d,p) \in dpSet\}$. $LH_k$ is a linear hull around $C_k$. dpSet is a set of (d,p) pairs, e.g., (Diag, DiagStartPt).
LSR: Linear Spherical Radial Classifier. $y \in C_k$ iff $y \in LSRH_k \equiv \{z \mid minF_{d,p,k} \le F_{d,p}(z) \le maxF_{d,p,k}\ \forall (d,p) \in dpSet,\ F = L, S, R\}$. (Examine and remove outliers first, then use the first PCI instead of min and the last PCD instead of max?)
Express the hulls as decision trees, one for every d. Then y isa k iff y isa k in every d-tree. Build each d-tree using $L_d$ at the root; then, from any multi-class inode, use F = L, R, S with $d = \overrightarrow{AvC_i\,AvC_j}$ and $p = AvC_i$ for every distinct pair $C_i, C_j$ that have nonempty restrictions at that node, using every F = L, S, R except the parent. This assumes convex classes. If it's known or suspected that there are non-convex classes, judicious use of PCCs may provide tighter hulls.
What should we pre-compute besides $X\circ X$? stats (min/avg/max/std); $X\circ p$ with p = class_Avg/Med; $X\circ d$; $X\circ x$; $d^2(X,x)$; Rkid2(X,x); $L_{d,p}$, $R_{d,p}$.
We need a "Basic pTree Operations Timing Manual" to show users the cost of various pTree computations.
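To make the three functionals concrete, here is a minimal pure-Python sketch that evaluates L, S and R row by row on a tiny point set. It is only an illustration: the slides compute these over pTrees (bit columns), which this sketch does not model, and the helper names (`dot`, `L`, `S`, `R`, `R_expanded`) and the sample points are assumptions made for the example. With a unit vector d, R is the squared distance from each point to the line through p in direction d.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def L(X, d, p):
    """Linear functional: (x - p) o d for every row x of X."""
    return [dot(x, d) - dot(p, d) for x in X]

def S(X, p):
    """Spherical functional: (x - p) o (x - p) = x o x - 2 x o p + p o p."""
    return [dot(x, x) - 2 * dot(x, p) + dot(p, p) for x in X]

def R(X, d, p):
    """Radial functional: S_p - L_{d,p}^2."""
    return [s - lin * lin for s, lin in zip(S(X, p), L(X, d, p))]

def R_expanded(X, d, p):
    """Same value via the expansion L_{-2p+2(pod)d} - (L_d)^2 + XoX + pop - (pod)^2."""
    pd = dot(p, d)
    shift = [-2 * pi + 2 * pd * di for pi, di in zip(p, d)]
    return [dot(x, shift) - dot(x, d) ** 2 + dot(x, x) + dot(p, p) - pd ** 2
            for x in X]

# Hypothetical sample data (not from the slides).
X = [(1.0, 2.0), (3.0, 4.0), (0.0, -1.0)]
d = (1.0, 0.0)   # unit direction
p = (1.0, 1.0)   # reference point

print(L(X, d, p))           # [0.0, 2.0, -1.0]
print(S(X, p))              # [1.0, 13.0, 5.0]
print(R(X, d, p))           # [1.0, 9.0, 4.0]
print(R_expanded(X, d, p))  # [1.0, 9.0, 4.0]
```

The expanded form only needs X∘X, X∘d and one extra dot product per (d,p) pair, which is why pre-computing X∘X once pays off.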
2
MG44d60w: 44 MOTHER GOOSE RHYMES with a synonymized vocabulary of 60 WORDS
1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 13. 
A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know. 21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose. 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 26. Sleep baby sleep. 
Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. 
This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew. 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. 
For early to bed and early to rise, is the way to be healthy and wealthy and wise. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
Vocabulary (60 words, numbered 1 to 60 in this order; document frequency df ranges from min=2 to max=6): always away baby back bad bag bake bed boy bread bright brown buy cake child clean cloth cock crown cry cut day dish dog eat fall fiddle full girl green high hill house king lady lamb maid men merry money morn mother nose old pie pig plum round run sing son three thumb town tree two way wife woman wool
3
Pan For Clusters or PFC method
1. Given a KWL0, find DocSet1, the documents having any of those words (i.e., DocSet1 = DocSet(KWL0)).
2. Let KWL1 = KWL(DocSet1). Let DocSet2 = DocSet(KWL1), etc. Repeat step 2 until it converges. If the result is all documents (and all words), start over with a different KWL0.
PFCANY: Let KWL0 = KWL(DocSet0).
PFCALL: DocSet(KWL) = {docs containing ALL KWL words} and KWL(DocSet) = {words in all docs in DocSet}.
PFCANY = PFC1,1.
PFCDT,WT: DocSet(KWL) = {docs containing ≥WT KWL words} and KWL(DocSet) = {words in ≥DT docs in DocSet}.
Example: PFCANY with KWL0 = {baby}.
[Table: document-by-word incidence matrix for this example; rows are the 60 vocabulary words (numbered 1 to 60), columns are the documents. The cell values are not recoverable from the extraction.]
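The alternation between document sets and keyword lists can be sketched as follows. This is a minimal illustration, assuming the thresholds mean "at least WT keywords per document" and "at least DT documents per word"; the function names (`doc_set`, `kw_list`, `pfc`), the `max_rounds` guard and the four-document toy corpus are hypothetical and not taken from the slides.

```python
def doc_set(kwl, docs, wt=1):
    """DocSet(KWL): documents containing at least wt of the keywords."""
    return {d for d, words in docs.items() if len(words & kwl) >= wt}

def kw_list(ds, docs, dt=1):
    """KWL(DocSet): words occurring in at least dt documents of ds."""
    counts = {}
    for d in ds:
        for w in docs[d]:
            counts[w] = counts.get(w, 0) + 1
    return {w for w, c in counts.items() if c >= dt}

def pfc(kwl0, docs, dt=1, wt=1, max_rounds=100):
    """Alternate DocSet and KWL until neither changes (or max_rounds is hit)."""
    kwl = set(kwl0)
    ds = doc_set(kwl, docs, wt)
    for _ in range(max_rounds):
        new_kwl = kw_list(ds, docs, dt)
        new_ds = doc_set(new_kwl, docs, wt)
        converged = new_kwl == kwl and new_ds == ds
        kwl, ds = new_kwl, new_ds
        if converged:
            break
    return ds, kwl

# Hypothetical toy corpus: document id -> set of vocabulary words.
docs = {
    "A": {"baby", "mother"},
    "B": {"baby", "cry"},
    "C": {"king", "men"},
    "D": {"king", "pie"},
}

print(pfc({"baby"}, docs))        # PFCANY = PFC1,1
print(pfc({"baby"}, docs, dt=2))  # words must occur in at least 2 docs
```

On this toy corpus PFCANY from {baby} converges to documents {A, B} with words {baby, mother, cry}; raising DT to 2 keeps the same documents but shrinks the word list to {baby}.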
4
[Table: word count (WC) for each document; the individual counts are not recoverable from the extraction.]
Documents: 01TBM 02TLP 03DDD 04LMM 05HDS 06SPP 07OMH 08JSC 09HBD 10JAJ 11OMM 12OWF 13RRS 14ASO 15PCD 16PPG 17FEC 18HTP 21LAU 22HLH 23MTB 25WOW 26SBS 27CBC 28BBB 29LFW 30HDD 32JGF 33BFP 35SSS 36LTT 37MBB 38YLS 39LCS 41OKC 42BBC 43HHD 44HLH 45BBB 46TTP 47CCM 48OTB 49WLG 50LJH
5
Treat the text corpus as a bipartite graph
V = {DOCs, WORDs}, E = {(doc, word) iff word ∈ doc}.
1st step: partition DOCs and WORDs into connected components. Carve off components (start with the component of the longest doc).
DS0 = {35SSS}
DS2 = {04LMM 05HDS 07OMH 08JSC 11OMM 15PCD 21LAU 22HLH 28BBB 30HDD 35SSS 36LTT 37MBB 38YLS 39LCS 41OKC 42BBC 46TTP 48OTB 50LJH}
DS4 is all documents except 49.
DS6 is all 44 documents.
WS5 is all 60 words.
[Tables: word sets WS1 and WS3, shown as word-by-document matrices; not recoverable from the extraction.]
So there is just one connectivity component and it is the entire graph.
2nd step: in each connectivity component, partition requiring incidence count ≥2.
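The first step (growing a connected component by alternating between documents and words) can be sketched as a breadth-first search over the bipartite graph. This is a minimal illustration under stated assumptions: the function name `component` and the toy corpus are hypothetical, and a document is represented simply as the set of vocabulary words it contains.

```python
from collections import deque

def component(start_doc, docs):
    """Return (docs, words) of the connected component containing start_doc."""
    # Invert the doc -> words map so we can walk word -> docs edges.
    word_to_docs = {}
    for d, words in docs.items():
        for w in words:
            word_to_docs.setdefault(w, set()).add(d)

    seen_docs, seen_words = {start_doc}, set()
    queue = deque([start_doc])
    while queue:
        d = queue.popleft()
        for w in docs[d]:
            if w in seen_words:
                continue
            seen_words.add(w)
            for other in word_to_docs[w]:
                if other not in seen_docs:
                    seen_docs.add(other)
                    queue.append(other)
    return seen_docs, seen_words

# Hypothetical toy corpus with two components.
docs = {
    "A": {"baby", "mother"},
    "B": {"baby", "cry"},
    "C": {"king", "men"},
    "D": {"king", "pie"},
}
print(component("A", docs))

# One bridging document is enough to merge everything into one component,
# which is what happens with the full rhyme corpus on this slide.
bridged = dict(docs, E={"cry", "pie"})
print(component("A", bridged))
```

Because a single shared word merges two components, plain connectivity is a very weak clustering criterion; that is the motivation for requiring an incidence count of at least 2 in the second step.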
6
The text corpus is one connected component.
Next, partition requiring incidence count ≥2 (DS0 excepted, of course).
Start with DS0 = {35SSS}: DS2 = {07OMH 35SSS 50LJH}, DS4 = {35SSS 50LJH}, and the document set stays there. So the incidence≥2 connectivity components of DS0=35SSS are DOC_Comp={35SSS, 50LJH} WORD_comp={eat, pie}.
Is it stable? I.e., if we start with 50LJH do we get the same result? Start with DS0 = {50LJH}: DS2 = DS4 = {35SSS 39LCS 50LJH}. So the incidence≥2 connectivity components of DS0=50LJH are DOC_Comp={35SSS, 39LCS, 50LJH} WORD_comp={boy, eat, pie}. So it is not stable, but there is set containment?
Finally we start with DS0 = {39LCS}: DS2 = {39LCS 50LJH}. So the incidence≥2 connectivity components of DS0=39LCS are DOC_Comp={39LCS, 50LJH} WORD_comp={boy, pie}.
[Tables: intermediate word sets WS1, WS3, WS5, WS6 for each start, shown as word-by-document matrices; not recoverable from the extraction.]
There seems to be some stability here. We could carve off DC={35SSS, 39LCS, 50LJH} and WC={boy, eat, pie}, because if we start with DC or any subset of it as DS0 we stay in DC on convergence, and if we start with WC or any subset of it as WS0 we stay in WC on convergence. Will this always happen? Is there never any overlap of DCs or WCs? If this always happens and there is never any overlap, this makes a great thanksgiving clustering method (i.e., carve off the final DC and WC each time).
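The stability question can be checked mechanically. The sketch below is an illustration only: it assumes that "DS0 excepted" means the starting documents contribute all of their words, after which documents need at least k words from the current word set and words need at least k documents from the current document set. The function names (`converge`, `stays_inside`) and the four-document toy corpus are hypothetical; the toy corpus was chosen so that it reproduces the same containment pattern as the three starts above.

```python
from itertools import combinations

def converge(ds0, docs, k=2, max_rounds=50):
    """Iterate doc-set / word-set refinement with incidence threshold k."""
    ws = set().union(*(docs[d] for d in ds0))  # DS0 excepted: take all its words
    ds = set(ds0)
    for _ in range(max_rounds):
        new_ds = {d for d, words in docs.items() if len(words & ws) >= k}
        counts = {}
        for d in new_ds:
            for w in docs[d]:
                counts[w] = counts.get(w, 0) + 1
        new_ws = {w for w, c in counts.items() if c >= k}
        if new_ds == ds and new_ws == ws:
            break
        ds, ws = new_ds, new_ws
    return ds, ws

def stays_inside(dc, docs, k=2):
    """True if every nonempty subset of dc, used as DS0, converges inside dc."""
    members = sorted(dc)
    return all(
        converge(set(subset), docs, k)[0] <= dc
        for r in range(1, len(members) + 1)
        for subset in combinations(members, r)
    )

# Hypothetical toy corpus (not the rhyme data).
docs = {
    "d1": {"eat", "pie", "king"},
    "d2": {"eat", "pie", "boy"},
    "d3": {"boy", "pie", "tree"},
    "d4": {"away", "run"},
}

print(converge({"d1"}, docs))   # smaller cluster
print(converge({"d2"}, docs))   # the containing cluster
print(converge({"d3"}, docs))   # another smaller cluster
print(stays_inside({"d1", "d2", "d3"}, docs))
```

Here the three single-document starts give different results, yet each is contained in {d1, d2, d3}, and every subset of that set converges inside it. The sketch checks the closure property on one corpus; it does not answer whether the property holds in general.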
7
Partition requiring incidence count ≥2 (DS0 excepted of course).
Start with DS0 = {04LMM}: DS2 = DS4 = {04LMM 46TTP}. So the incidence≥2 connectivity components of DS0=04LMM are DOC_Comp={04LMM 46TTP} WORD_comp={away, eat}.
Next start with DS0 = {46TTP}: DS2 = DS4 = {04LMM 30HDD 46TTP}. So the incidence≥2 connectivity components of DS0=46TTP are DOC_Comp={04LMM 30HDD 46TTP} WORD_comp={away, eat, run}.
Next start with DS0 = {30HDD}: DS2 = DS4 = {30HDD 46TTP}. So the incidence≥2 connectivity components of DS0=30HDD are DOC_Comp={30HDD 46TTP} WORD_comp={away, run}.
[Tables: intermediate word sets WS1 and WS3 for each start, shown as word-by-document matrices; not recoverable from the extraction.]
There seems to be similar stability here. We find there is WC overlap, so we will carve off only the stable DCs. So we carve off the two stable document clusters, a "pie-50LJH" document cluster {35SSS, 39LCS, 50LJH} and an "away-46TTP" document cluster {04LMM 30HDD 46TTP}.
35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
All incidence≥3 connectivity components are singleton clusters! Next we see whether we get the same stable document clusters by starting with WS0=pie and WS0=away.
8
The text corpus is one connected component.
Next, partition requiring incidence count ≥2 (DS0 excepted of course).
Start with WS0 = {pie}: DS1 = DS3 = {35SSS 39LCS 50LJH}. So the incidence≥2 connectivity components of WS0=pie are DOC_Comp={35SSS, 39LCS, 50LJH} WORD_comp={boy, eat, pie}.
Start with WS0 = {away}: the first document set is {04LMM 29LFW 30HDD 39LCS 46TTP}, then DS2 = {30HDD 46TTP}. So the incidence≥2 connectivity components of WS0=away are DOC_Comp={30HDD 46TTP} WORD_comp={away, run}.
[Tables: intermediate word sets for each start, shown as word-by-document matrices; not recoverable from the extraction.]
9
Partition requiring incidence count ≥2 (DS0 excepted of course).
Start with DS0 = {35SSS}: DS2 = {07OMH 35SSS 50LJH}, DS4 = DS6 = {07OMH 35SSS}. So the incidence≥2 connectivity components of DS0=35SSS are DOC_Comp={07OMH 35SSS} WORD_comp={bake, bread}.
Next start with DS0 = {07OMH}: DS2 = DS4 = {07OMH 13RRS 35SSS 45BBB}. So the incidence≥2 connectivity components of DS0=07OMH are DOC_Comp=S={07OMH 13RRS 35SSS 45BBB} WORD_comp={back, bake, bread, buy, mother}.
Next start with bake and then bread (no documents get 2 votes).
Finally, check whether 13RRS ends outside S (yes: {13RRS 21LAU}) and whether 45BBB ends outside S (yes: {07OMH 09HBD 27CBC 45BBB}).
We have to expand using DS0=09OMH 13RRS 21LAU, .
[Tables: remaining document-set steps and the intermediate word sets, shown as word-by-document matrices; not recoverable from the extraction.]
10
Partition requiring incidence count ≥2 (DS0 excepted of course).
Summary of the starts so far:
Incidence≥2 connectivity components of DS0=35SSS are DOC_Comp={07OMH 35SSS} WORD_comp={bake, bread}.
Incidence≥2 connectivity components of DS0=07OMH are DOC_Comp=S={07OMH 13RRS 35SSS 45BBB} WORD_comp={back, bake, bread, buy, mother}.
Incidence≥2 connectivity component of WS0=bake is DOC_Comp={07OMH}, and of WS0=bread there is none.
Incidence≥2 connectivity components of DS0=13RRS are DOC_Comp={13RRS 21LAU}.
Incidence≥2 connectivity components of DS0=45BBB are DOC_Comp={07OMH 09HBD 27CBC 45BBB}.
We have to expand using DS0=09OMH 13RRS 21LAU, .
[Tables: document-set steps and word sets for the 13RRS and 45BBB starts, shown as word-by-document matrices; not recoverable from the extraction.]
11
GOING BACK IN TIME
WS0 = {baby}
DS1 = {09HBD 26SBS 27CBC 45BBB}
WS1 = {1 3=baby(4) 13 42=mother(3) 60}
Documents listed for the next step: 07OMH 08JSC 12OWF 13RRS 27CBC 28BBB 29LFW 33BFP 38YLS 39LCS 44HLH 45BBB 46TTP
WS2 = {1 2 3=baby(7) 13=buy(5) 20=cry(5) 42=mother(7)}
DS2 = 7 8 9HBD(3) 26SBS(7) 27CBC(3) 45BBB(3)
[Table: word-by-document matrix for WS0 and WS1-WS0; not recoverable from the extraction.]
We get a clear "baby" document cluster {<26 Sleep baby sleep>} and a word cluster {3=baby, 42=mother}.
12
WS0 = {king}
DS1 docs: 05HDS 35SSS 41OKC
WS1 = { =king(3) }
DS2 docs: 35SSS(13/13) 41OKC(5/5) 5(3/3) 7(3/7) 11(3/3) 14(2/4) 28(2/6) 30(2/5) 32(2/3) 36(2/2) 39(2/7) 48(2/3) 50(2/5) + lots with 1 word. Also listed: 11OMM 36LTT.
WS2_2 = { king(3) 38=men(3) 44 }
[Table: word-by-document matrix for WS0 and WS1-WS0; not recoverable from the extraction.]
We get a clear "king" document cluster {<Sing a song of sixpence>, <Old King Cole>} and a fairly clear "king" word cluster {king, men}.
13
WS0 = {55=tree(2)}
DS1 docs: 14ASO 39LCS
WS1 = {2 9 18 21 26 30 38 39 45 55=tree(2)}
WS2_2 = {1(2/2) 2(5/5) 9(3/3) 18(3/3) 20 21(2/2) 25(3/5) 26(3/3) 30(2/2) 38(3/6) 45 49(4/4) 52 55(2/2)}
We get a strong "tree" document-cluster { 14ASO(3/4) 39LCS(5/7) } and a strong "tree" word cluster {tree}.
14. If all seas were one sea, what a great sea that would be! And if all trees were one tree, what a great tree that would be! And if all axes were one axe, what a great axe that would be! And if all men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
DS2-DS1: 01TBM 04LMM 05HDS 10JAJ 17FEC 26SBS 28BBB 29LFW 30HDD 46TTP 47CCM 50LJH
DS2: 14(3) 39(5) + (1)s
[Table: word-by-document matrix for the tree start; not recoverable from the extraction.]
We get a weak "tree" word-cluster { 1(2/2)=always 2(5/5)=away 9(3/3)=boy 18(3/3)=cock 21(2/2)=cut 26(3/3)=fall 30(2/2)=green 49(4/4)=run }.
14
WS1 = { 40money(2) }
DS1: 35SSS 38YLS
35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
WS2 = {7baby(2/3) 9boy(2/3) 10bread(2/3) 20cry(3/4) 25child(3/5) 28full(2/2) 40money(2/2) 45pie(2/3)}
DS2 adds: 07OMH 28BBB 46TTP 50LJH
DS2: 7(2/7) 28(2/6) 35SSS(13/13) 38YLS(3/3) 46(2/6) 50(2/5)
[Table: word-by-document matrix for the money start; not recoverable from the extraction.]
We get a strong "money" document-cluster { 35SSS 38YLS } and a strong "money" word cluster {money cry}.
15
WS0 (from the table header) = {18=cock 24=dog 36=lamb 46=pig}
WS1 = { 2(3/5) 18(3/3) 20(2/4) 24(3/3) 30(2/2) 36(2/2) 46(2/2) 49(3/4) 57(2/3) }
DS1: 02TLP 07OMH 17FEC 26SBS 30HDD 38YLS 39LCS 43HHD 46TTP 47CCM
WS2 = {2=away(3/5) 20=cry(2/4) 46pig(2/2) 49run(2/4)}
DS2: 02TLP 30HDD 38YLS 39LCS 46TTP
DS2 counts: 1 2(2/2) (2/4) 26(2/7) (3/5) 37 38(2/3) 39(3/7) 43 46(4/6) 47(2/4)
[Table: word-by-document matrix for this start; not recoverable from the extraction.]
We get an initial document-cluster { 02TLP 07OMH 17FEC 26SBS 30HDD 38YLS 39LCS 43HHD 46TTP 47CCM } and a strong initial word-cluster { 02away 18cock 20cry 24dog 30green 36lamb 46pig 49run 57way }.
We get a strong final document-cluster { 02TLP 30HDD 38YLS 39LCS 46TTP } and a strong final word-cluster { 02away 20cry 46pig 49run }.
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
16
DS0: 01TBM
WS1 (≥2) = { (4/4) (4/4) 52(5/5) 56(2/2) 58(2/2) }; the table rows are cut, run, three, wife, two.
DS1: 01TBM 08JSC 14ASO 17FEC 23MTB 28BBB 30HDD 41OKC 46TTP 48OTB; DS1 count: 1(4)
WS2 (≥2) = {49run(4/4) 52three(5/5) 56two(2/2)}
DS2: 01TBM 17FEC 48OTB; DS2 (≥2) counts: 01(4) 17(2) 48(2)
WS3 (≥2) = {49run(4/4) 52three(5/5) 56two(2/2)}
DS3 (≥2) counts: 01(2) 17(2) 48(2)
[Table: word-by-document matrix for this start; not recoverable from the extraction.]
1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin!
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
doc-cluster { 01TBM 17FEC 48OTB } word-cluster { 49run 52three 56two }
Has it converged (i.e., WS3=WS2, DS3=DS2)? YES! It would be interesting to find that we get the same convergent clusters starting with doc 17FEC and with doc 48OTB (and starting with word run, with word three and with word two). That's asking a lot, but we do know that we get the same result starting with WS={run three two} and starting with DS={01TBM 17FEC 48OTB}.
17
We carve off a baby document cluster, { 09HBD 26SBS 27CBC 45BBB }
WS0 = {baby}
DS1 = {09HBD 26SBS 27CBC 45BBB}
WS1 = {1 3=baby(4) 13 42=mother(3) 60}
Documents listed for the next step: 07OMH 08JSC 12OWF 13RRS 27CBC 28BBB 29LFW 33BFP 38YLS 39LCS 44HLH 45BBB 46TTP
WS2 = {1 2 3=baby(7) 13=buy(5) 20=cry(5) 42=mother(7)}
DS2 = 7 8 9HBD(3/3) 26SBS(7/7) 27CBC(3/3) 45BBB(3/3)
[Table: word-by-document matrix for WS0 and WS1-WS0; not recoverable from the extraction.]
Now use relative count (count/total ≥ 66.67%) and carve off document clusters that converge (producing an authentic partition, or clustering). We carve off a baby document cluster, { 09HBD 26SBS 27CBC 45BBB }.
9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep.
27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.
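The relative-count rule and the carve-off loop can be sketched as follows. This is an illustration under stated assumptions, not the slides' implementation: a document qualifies when at least 2/3 of its vocabulary words are in the current word set, a word qualifies when at least 2/3 of the remaining documents containing it are in the current document set, and carved documents are removed before the next seed is tried. The function names (`relative_cluster`, `carve`), the seed order and the six-document toy corpus are hypothetical.

```python
from fractions import Fraction

def relative_cluster(seed_word, docs, ratio=Fraction(2, 3), max_rounds=50):
    """Grow a (docs, words) cluster from one seed word using relative counts."""
    df = {}  # document frequency of each word within `docs`
    for words in docs.values():
        for w in words:
            df[w] = df.get(w, 0) + 1

    ws = {seed_word}
    ds = {d for d, words in docs.items() if seed_word in words}
    for _ in range(max_rounds):
        # Word stays if count/total >= ratio, total = its document frequency.
        new_ws = {w for w in df
                  if Fraction(sum(w in docs[d] for d in ds), df[w]) >= ratio}
        # Document stays if count/total >= ratio, total = its word count.
        new_ds = {d for d, words in docs.items()
                  if words and Fraction(len(words & new_ws), len(words)) >= ratio}
        converged = new_ws == ws and new_ds == ds
        ws, ds = new_ws, new_ds
        if converged:
            break
    return ds, ws

def carve(seeds, docs):
    """Carve off one converged document cluster per seed; clusters are disjoint."""
    remaining = dict(docs)
    clusters = []
    for seed in seeds:
        ds, _ = relative_cluster(seed, remaining)
        if ds:
            clusters.append(ds)
            for d in ds:
                del remaining[d]
    return clusters

# Hypothetical toy corpus (not the rhyme data).
docs = {
    "a": {"baby", "mother"},
    "b": {"baby", "mother", "cry"},
    "c": {"baby", "cry", "king"},
    "d": {"king", "men"},
    "e": {"king", "men", "pie"},
    "f": {"pie", "men"},
}
print(carve(["baby", "king", "pie"], docs))
```

Exact fractions are used so that a count of 2 out of 3 meets the 2/3 threshold without floating-point rounding. Because each carved cluster is removed before the next seed is processed, the result is a partition of the documents that were reached.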
18
WS0 = {king}
DS1 docs: 05HDS 35SSS 41OKC
WS1 = { =king(3) }
DS2 docs: 35SSS(13/13) 41OKC(5/5) 5(3/3) 7(3/7) 11(3/3) 14(2/4) 28(2/6) 30(2/5) 32(2/3) 36(2/2) 39(2/7) 48(2/3) 50(2/5) + lots with 1 word. Also listed: 11OMM 36LTT.
WS2_2 = { king(3) 38=men(3) 44 }
[Table: word-by-document matrix for WS0 and WS1-WS0; not recoverable from the extraction.]
We carve off a King document cluster {05HDS 11OMM 35SSS 36LTT 41OKC}.
5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again.
11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again
35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other men's ditches.
41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
19
We carve off a "tree" document-cluster { 14ASO(3/4) 39LCS(5/7) }
WS1 = { tree(2) }
DS1 docs: 14ASO 39LCS
WS2_2 = {1(2/2) 2(5/5) 9(3/3) 18(3/3) 20 21(2/2) 25(3/5) 26(3/3) 30(2/2) 38(3/6) 45 49(4/4) 52 55(2/2)}
DS2: 14(3) 39(5) + (1)s
[Table: word-by-document matrix for the tree start; not recoverable from the extraction.]
14. If all seas were one sea, what a great sea that would be! And if all trees were one tree, what a great tree that would be! And if all axes were one axe, what a great axe that would be! And if all men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
20
WS1 = { 40money(2) }
DS1: 35SSS 38YLS
WS2 = {7baby(2/3) 9boy(2/3) 10bread(2/3) 20cry(3/4) 25child(3/5) 28full(2/2) 40money(2/2) 45pie(2/3)}
DS2 adds: 07OMH 28BBB 46TTP 50LJH
DS2: 7(2/7) 28(2/6) 35SSS(13/13) 38YLS(3/3) 46(2/6) 50(2/5)
[Table: word-by-document matrix for the money start; not recoverable from the extraction.]
We carve off a "money" document-cluster { 38YLS }, noting 35SSS has already been carved off.
38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
21
WS0 (from the table header) = {18=cock 24=dog 36=lamb 46=pig}
WS1 = { 2(3/5) 18(3/3) 20(2/4) 24(3/3) 30(2/2) 36(2/2) 46(2/2) 49(3/4) 57(2/3) }
DS1: 02TLP 07OMH 17FEC 26SBS 30HDD 38YLS 39LCS 43HHD 46TTP 47CCM
WS2 = {2=away(3/5) 20=cry(2/4) 46pig(2/2) 49run(2/4)}
DS2: 02TLP 30HDD 38YLS 39LCS 46TTP
DS2 counts: 1 2(2/2) (2/4) 26(2/7) (3/5) 37 38(2/3) 39(3/7) 43 46(4/6) 47(2/4)
[Table: word-by-document matrix for this start; not recoverable from the extraction.]
We get an initial document-cluster { 02TLP 07OMH 17FEC 26SBS 30HDD 38YLS 39LCS 43HHD 46TTP 47CCM } and a strong initial word-cluster { 02away 18cock 20cry 24dog 30green 36lamb 46pig 49run 57way }.
We carve off a final animal document-cluster { 02TLP 30HDD 46TTP }.
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
22
WS1 2 { (4/4) (4/4) 52(5/5) 56(2/2) 58(2/2) }
DS0: 01TBM
DS1 docs: 01TBM 08JSC 14ASO 17FEC 23MTB 28BBB 30HDD 41OKC 46TTP 48OTB
DS1: 1(4)
WS2 2 { 49run(4/4) 52three(5/5) 56two(2/2) }
DS2 docs: 01TBM 17FEC 48OTB
DS2 2: 01(4/4) 17(2/4) 48(2/3)
WS3 2 { 49run(4/4) 52three(5/5) 56two(2/2) }
DS3 2: 01(4/4) 17(2) 48(2/2)
We carve off this 01TBM document cluster { 01TBM 48OTB }.
1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
[Document-word matrix omitted: DOC columns by WORD rows (cut=21, run=49, three=52, wife=58, two=56); the cell layout is not recoverable from the extracted text.]
23
DS0: 03DDD
DS1: 03DDD; 3(2)
WS2: 8(1/3) 13(2/3) 51(2/3)
DS2 docs: 03DDD 13RRS 33BFP
DS2: 3(2/2) 13(3/5) 33(2/3) 45gone 50(2/5)
WS3: 51(2/3)
DS3: 3(2/3) 13(2/3)
WS4: 51(3/3)
DS4: 46TTP
We carve off the 03DDD document cluster { 03DDD 13RRS }. 46TTP is already carved off.
3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
13. A robin and a robin's son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
[Document-word matrix omitted: DOC columns by WORD rows (bed=8, son=51, baby=3, buy=13, girl=29, pie=45, plum=47); the cell layout is not recoverable from the extracted text.]
24
We carve off document cluster {06SPP}
WS1: 5(2/2) 22(4/4)
DS0: 06SPP
DS1 docs: 06SPP 15PCD 18HTP 32JGF 49WLG
DS1: 6(2/2)
[Document-word matrix omitted: DOC columns by WORD rows (bad=5, day=22); the cell layout is not recoverable from the extracted text.]
25
DS0: 07OMH
DS1 docs: 07OMH 10JAJ 12OWF 21LAU 23MTB 25WOW 29LFW 33BFP 42BBC 43HHD
DS1: 7(7/7) (2/7) (2/7)
WS2: (4) 54 59
DS2 docs: 10JAJ 07OMH 12OWF 21LAU 25WOW 43HHD
DS2: 7(7) (5)
WS3: (4) 54 59
DS3: 7(3) 10(3) 12(2) 21(4) 25(2) 43(2)
Converges on DS3 and WS3.
We carve off the DS3 document cluster { 07OMH 10JAJ 12OWF 21LAU 25WOW 43HHD }.
7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead.
10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper.
12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town.
25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.
43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.
[Document-word matrices omitted: DOC columns by WORD rows (back, bake, bread, buy, dog, mother, old for DS1; then brown=12, cake=14, crown=19, town=54, woman=59); the cell layout is not recoverable from the extracted text.]
26
DS0: 08JSC
DS1 docs: 08JSC 04LMM 50LJH
DS1: (3) (reds are not yet carved off)
WS2-WS3: (3)
DS2: 4(2/2) 8(3/3) (3) (5/5)
Converges on DS3 and WS3.
We carve off the DS3 document cluster { 04LMM 08JSC 50LJH }.
4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
[Document-word matrix omitted: DOC columns by WORD rows (clean=16, eat=25, wife=58; then away=2, boy=9, pie=45, plum=47, thumb=53); the cell layout is not recoverable from the extracted text.]
27
WS0: 6 48
DS0: 16PPG
DS1 docs: 16PPG 28BBB 30HDD 37MBB
DS1: 16(2)
WS2 2: 6(2) (2)
DS2 docs: 16PPG 30HDD
DS2: 16(2/2) 28(2/6)
WS3: 16PPG
DS3 docs: 32JGF
DS3: 16(2/2) 30(4) 32(2/3) 46(2)
DS4: 16(2/2) 32(3/3)
Converges on DS3 and WS3.
We carve off the DS3 document cluster { 16PPG 32JGF }.
16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had.
I think this uncovers a potential weakness of the method: even though the WordSet and the DocSet converged, the documents in the resulting DocSet share no words.
[Document-word matrices omitted: DOC columns by WORD rows (bag=6, round=48 for DS1; later steps add away, day, dish, fiddle, run, men); the cell layout is not recoverable from the extracted text.]
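The weakness noted on this slide (a converged DocSet whose documents share no words) can be detected with a simple intersection check. The following is a minimal sketch, not the presentation's method: the function name `shared_words` is hypothetical, and the two word sets are illustrative content words read off the rhyme texts for 16PPG and 32JGF, which may not match the slide's exact vocabulary rows.

```python
def shared_words(cluster, doc_words):
    """Return the words common to every document in the cluster."""
    sets = [set(doc_words[d]) for d in cluster]
    return set.intersection(*sets) if sets else set()

# Illustrative (assumed) content-word sets for the carved-off cluster.
doc_words = {
    "16PPG": {"bag", "round"},
    "32JGF": {"fiddle", "day"},
}

cluster = ["16PPG", "32JGF"]
common = shared_words(cluster, doc_words)

# An empty intersection flags a cluster held together only through the
# alternating WordSet/DocSet iteration, not through any shared word.
is_suspect = len(common) == 0
```

Running such a check before carving off a cluster would be one way to flag cases like { 16PPG 32JGF } for manual review.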
28
DS0: 18HTP
DS1 docs: 06SPP 15PCD 18HTP 32JGF 44HLH
DS2 docs: 06SPP 15PCD 18HTP 32JGF
WS: (4/4)
[Document-word matrices omitted: DOC columns by WORD rows (bright, day; then bad, bright, day, fiddle, lady, men, sing among them); the cell layout is not recoverable from the extracted text.]
29
APPENDIX FAUST KWL Clustering (w or w/o replacement; KWL=KeyWordList)
Let L = L_{KWL,origin}. Then L^{-1}(k) = docs with exactly k KeyWord matches.
It should scale to Big Text Corpuses (billions of docs, thousands of words), because it uses one dot product SPTS over |KWL|, not |Vocab|, then UDR.
Thanksgiving Clustering (no replacement) carves off clusters as we go, so the sequence of KWLs is important. Gradient_of_Variance_Optimization (GVO) may be useful in determining the "next best KWL". (To keep |KWL| small, we might take KWL to be the nearest actual doc to the GVO vector.)

UDR output (Ct = count, L = number of keyword matches):

KWL={boy=9 girl=29 child=15} (Ct: 37, 7)
  L=1: 12OWF 26SBS 28BBB 33BFP 39LCS 49WLG 50LJH

KWL={bake=7 bread=10 cake=14 pie=45 plum=47}
  L=1: 13RRS 39LCS
  L=2: 07OMH 42BBC 50LJH
  L=4: 35SSS 21LAU

KWL={girl=29 lady=35 maid=37 mother=42 wife=58 woman=59}
  L=1: 01TBM 44HLH 45BBB 33BFP 27CBC 07OMH 08JSC 12OWF 48OTB 29LFW 35SSS 25WOW 49WLG
  L=2: 09HBD

D=01TBM (Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?)
  L=1: 28BBB 46TTP 14ASO 30HDD 48OTB 17FEC 23MTB 41OKC 08JSC
  L=4: 01TBM

D=07OMH (Old Mother Hubbard went to cupboard to give her poor dog a bone. When she got there cupboard was bare and so poor dog had none. She went to baker to buy him bread. When she came back dog was dead.)
  L=1: 27CBC 42BBC 43HHD 33BFP 30HDD 09HBD 25WOW 29LFW 21LAU 10JAJ 11OMM 41OKC 23MTB 12OWF
  L=2: 13RRS 45BBB 35SSS
  L=7: 07OMH

KWL without replacement uses a sequence of KWLs, e.g., using the words in the pillar documents? (E.g., initial pillar=FFA; the "next" pillar maximizes the distance from PillarSet (or the sum of pillar distances), until the distance to PillarSet (or sum/min) falls below a threshold.)
How far away from each other are KWLs?
[Figure: the 3-bit vectors 100 010 001 000 110 101 011 111.]
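The functional L and its inverse images L^{-1}(k) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the FAUST implementation: the function names `kwl_counts` and `inverse_image` are hypothetical, each document is reduced to a set of words (so repeated words count once, which may differ from the slide's counting), and the toy corpus is invented for the example. On binary document vectors, the dot product with the KWL indicator vector equals the size of the set intersection used here.

```python
from collections import defaultdict

def kwl_counts(docs, kwl):
    """Map each doc id to the number of distinct KWL words it contains."""
    kw = set(kwl)
    return {doc_id: len(kw & set(words)) for doc_id, words in docs.items()}

def inverse_image(counts):
    """Group doc ids by match count: L^{-1}(k) = docs with exactly k matches."""
    groups = defaultdict(list)
    for doc_id, k in counts.items():
        groups[k].append(doc_id)
    return {k: sorted(ids) for k, ids in groups.items()}

# Hypothetical toy corpus (not the slide's data).
docs = {
    "A": {"boy", "pie", "thumb"},
    "B": {"girl", "curl"},
    "C": {"boy", "girl", "child"},
    "D": {"king", "pipe"},
}
kwl = ["boy", "girl", "child"]

groups = inverse_image(kwl_counts(docs, kwl))
```

Only the |KWL| keyword columns are touched, which is the reason the slide gives for expecting the approach to scale to large vocabularies.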
Let L = L_{100}. Then L(100)=1, L(010)=0, L(011)=0, so the gap is 1, i.e., |L(010)-L(100)|=1 and |L(011)-L(100)|=1.
Euclidean distances: E(100,010)=√2=1.41, E(100,011)=√3=1.73.
Manhattan distances: M(100,010)=2, M(100,011)=3.
(We also note incidentally that |L(010)-L(011)|=0, whereas ED(010,011)=MD(010,011)=1.)

The KWL approach is Hub And SPoke (HASP) clustering, where the affinity is to the hub=KWL (there may be less or no spoke-spoke affinity).
The client might want a more Uniform Affinity (UA) clustering, assuming affinity(x,y) = Count(KeyWordMatch(x,y)).
E.g., define UA_{doc,T} to contain doc and have uniform affinity ≥ T, i.e., Affinity(x,y) ≥ T ∀ x,y ∈ UA_{doc,T}. (T ≤ |KWL_doc|, lest doc ∉ UA_{doc,T}.)
Since doc ∈ UA_{doc,T}, every x ∈ UA_{doc,T} must contain at least T KeyWords of doc, and therefore x ∈ HASP_doc.
So to find UA_{doc,T} we can first construct HASP_doc and then search internal to it for UA_{doc,T}. But this is also Hub and Spoke, since the result will depend upon the search order.
Let RING_{doc,S} be the HASP_doc docs with exactly S doc words. Carve off RING_{doc,T} from HASP_doc, then carve off RING_{doc,T+1}, ...

35SSS: sing full bake pie dish king house money eat bread maid cloth nose
Docs matching 35SSS words: 15PCD sing; 28BBB full; 42BBC bake; 39LCS pie; 30HDD dish; 05HDS king; 41OKC; 38YLS money; 36LTT house; 04LMM eat; 46TTP eat; 21LAU bread; 48OTB maid; 22HLH nose; 08JSC cloth; 11OMM cloth; 37MBB cloth; 50LJH eat pie; 07OMH bake bread
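The distance comparison and the ring carving on this slide can be checked numerically. The sketch below is illustrative only: the helper names (`bits`, `L`, `euclidean`, `manhattan`, `ring`) are hypothetical, and the spoke word sets are a small subset of the 35SSS matches listed on the slide, treated as sets of matched words.

```python
import math

def bits(s):
    """Turn a bit string such as '100' into a list of ints."""
    return [int(c) for c in s]

def L(hub, v):
    """Dot product of v with the hub (KWL) indicator vector."""
    return sum(a * b for a, b in zip(hub, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def ring(doc_words, hub_words, s):
    """Docs sharing exactly s words with the hub document (RING_{doc,S})."""
    hub = set(hub_words)
    return sorted(d for d, w in doc_words.items() if len(hub & set(w)) == s)

hub = bits("100")
gap = abs(L(hub, bits("010")) - L(hub, bits("100")))
e_100_010 = euclidean(bits("100"), bits("010"))   # sqrt(2)
e_100_011 = euclidean(bits("100"), bits("011"))   # sqrt(3)
m_100_011 = manhattan(bits("100"), bits("011"))

# Subset of the slide's 35SSS spokes, as matched-word sets.
hub_35sss = {"sing", "full", "bake", "pie", "dish", "king", "house",
             "money", "eat", "bread", "maid", "cloth", "nose"}
spokes = {
    "15PCD": {"sing"},
    "39LCS": {"pie"},
    "50LJH": {"eat", "pie"},
    "07OMH": {"bake", "bread"},
}
ring1 = ring(spokes, hub_35sss, 1)
ring2 = ring(spokes, hub_35sss, 2)
```

Note that two spokes in the same ring (for example 50LJH and 07OMH) need not share any word with each other, which is the hub-and-spoke limitation the slide describes.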
30
WS0 = baby
WS1 = { 1 3=baby(4) 13 42=mother(3) 60 }
DS1 docs: 09HBD 26SBS 27CBC 45BBB
WS2 = { 1 2 3=baby(7) =buy(5) 20=cry(5) =mother(7) }
Additional docs listed: 07OMH 08JSC 12OWF 13RRS 27CBC 28BBB 29LFW 33BFP 38YLS 39LCS 44HLH 45BBB 46TTP
DS2 = 7 8 9HBD(3) SBS(7) 27CBC(3) BBB(3)
DS3-DS2(high) = { 7(6) 35(6) 39(7) 46(6) }
[Document-word matrices omitted: DOC columns by WORD rows for WS0, WS1-WS0 and WS2-WS1-WS0; the cell layout is not recoverable from the extracted text.]

DS3(0) = Docs with 0 WS3:
5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All Kings horses, and all Kings men cannot put Humpty Dumpty together again.
6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day.
22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had.
36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches.

Docs with 1 WS3:
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again
14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one.
15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho!
18. I had 2 pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know.
37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning.
42. Bat bat, come under my hat and I will give you a slice of bacon. When I bake I will give you a cake, if I'm not mistaken.
47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise.
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid.

DS3(high) = DS3 with 6 or 7:
7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead.
35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow determined to shoot this little cock sparrow. Little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says sparrow, I will not make a stew. So he flapped his wings and away he flew.
46. Tom Tom piper's son, stole a pig and away he run. Pig was eat and Tom was beat and Tom ran crying down the street.

We get a clear "baby" document cluster {<Sleep baby sleep>} and a word cluster {baby, mother}.