Presentation is loading. Please wait.

Presentation is loading. Please wait.

Semi-Supervised Learning With Graphs William Cohen 1.

Similar presentations


Presentation on theme: "Semi-Supervised Learning With Graphs William Cohen 1."— Presentation transcript:

1 Semi-Supervised Learning With Graphs William Cohen 1

2 HW7 Hints m movies n users m movies x1y1 x2y2.. …… xnyn a1a2..…am b1b2……bm v11 … …… vij … vnm ~ V[i,j] = user i’s rating of movie j r W H V 2

3 Homework 7 - Hints What’s parallel and what’s sequential? for epoch t=1,….,T in sequence – for stratum s=1,…,K in sequence for block b=1,… in stratum s in parallel – for triple (i,j,rating) in block b in sequence » do SGD step for (i,j,rating) Hint 1: You need some loops around rdd operations Hint 2: Look at rdd.mapPartitions() to see how to control a parallel/sequential breakdown like the inner loop only needs part of W,H this needs r/w access to (part of) W and H How? 3

4 Homework 7 - Hints What’s in memory and what’s on disk? ratings triples (i,j,r) on disk – W,H in memory as shared synchronized memory: you should worry about transmission cost and blocking Abhinav hint: Python dictionaries are thread-safe 4

5 Homework 7 - Hints What’s in memory and what’s on disk? ratings triples (i,j,r) on disk – W,H in memory as broadcast memory: workers get unsynchronized copies of the data: worry here is extra broadcast cost hint: mapPartitions ~= Hadoop map 5

6 Homework 7 - Hints What is rdd.mapPartitions(foo)? Function foo is an iterator that loops over the entries in a partition (aka shard) of rdd and yields a sequence of entries: def foo(iterator): doSomething() #setup for x in iterator … yield … … doAnotherThing() #breakdown 6

7 Homework 7 - Hints What is rdd.mapPartitions(foo)? Function foo is an iterator that loops over the entries in a partition (aka shard) of rdd and yields a sequence of entries: def foo(iterator): doSomething() #setup: modifiedEntries = set() for (i,j,r) in iterator … yield … # update copies of W[i,*], H[*,j] … doAnotherThing() #output modified entries of W,H input: broadcast W,H, ratings Vij output: updated W,H 7

8 Homework 7 - Hints What’s in memory and what’s on disk? ratings triples (i,j,r) on disk – W,H in memory shared memory individual copies of H, W memory – W,H on disk, joined in with (i,j,r) should worry about inflating the size of the ratings – tis scheme inserts many copies of each W – Hybrid schemes: triples (i,j,r)  (i,{(j1,r1),(j2,r2),…,}, W[i,*]) H in memory 8

9 Homework 7 - Hints Change grain size of computation Rather than streaming through (r,i,j) tasks: – create SGD subtasks W1,H1,V11 W2,H2,V22 W3,H3,V33 – each map task does SGD on one subtask Should worry about subtask size (part of ratings are in memory) 9

10 Homework 7 - Hints What’s in memory and what’s on disk? ratings triples (i,j,r) on disk – W,H in memory shared memory individual copies of H, W memory – W,H on disk, joined in with (i,j,r) should worry about inflating the size of the ratings – tis scheme inserts many copies of each W – Hybrid schemes: triples (i,j,r)  (i,{(j1,r1),(j2,r2),…,}, W[i,*]) H in memory 10

11 Homework 7 – Avuncular Advice This is a really interesting assignment – think and learn! …But don’t stress out We know this assignment is challenging and we’ll curve it appropriately HW8 is being reduced in scope PS. If you’re an undergrad doing carnival stuff talk to me about the (useless) extension 11

12 Semi-Supervised Learning With Graphs William Cohen 12

13 Flashback – Graph Algorithms so far…. PageRank and how to scale it up Personalized PageRank/Random Walk with Restart and – how to implement it – how to use it for extracting part of a graph Other uses for graphs? – not so much 13

14 Main topics today Scalable semi-supervised learning on graphs – SSL with RWR – SSL with coEM/wvRN/HF Scalable unsupervised learning on graphs – Power iteration clustering – … 14

15 Semi-supervised learning A pool of labeled examples L A (usually larger) pool of unlabeled examples U Can you improve accuracy somehow using U? 15

16 Semi-Supervised Bootstrapped Learning/Self-training Paris Pittsburgh Seattle Cupertino mayor of arg1 live in arg1 San Francisco Austin denial arg1 is home of traits such as arg1 anxiety selfishness Berlin Extract cities: 16

17 Semi-Supervised Bootstrapped Learning via Label Propagation Paris live in arg1 San Francisco Austin traits such as arg1 anxiety mayor of arg1 Pittsburgh Seattle denial arg1 is home of selfishness 17

18 Semi-Supervised Bootstrapped Learning via Label Propagation Paris live in arg1 San Francisco Austin traits such as arg1 anxiety mayor of arg1 Pittsburgh Seattle denial arg1 is home of selfishness Nodes “near” seedsNodes “far from” seeds Information from other categories tells you “how far” (when to stop propagating) arrogance traits such as arg1 denial selfishness 18

19 ASONAM-2010 (Advances in Social Networks Analysis and Mining) 19

20 Network Datasets with Known Classes UBMCBlog AGBlog MSPBlog Cora Citeseer 20

21 RWR - fixpoint of: Seed selection 1.order by PageRank, degree, or randomly 2.go down list until you have at least k examples/class Seed selection 1.order by PageRank, degree, or randomly 2.go down list until you have at least k examples/class 21

22 Results – Blog data Random Degree PageRank We’ll discuss this soon…. 22

23 Results – More blog data Random Degree PageRank 23

24 Results – Citation data Random Degree PageRank 24

25 Seeding – MultiRankWalk 25

26 Seeding – HF/wvRN 26

27 What is HF aka coEM aka wvRN? 27

28 CoEM/HF/wvRN One definition [MacKassey & Provost, JMLR 2007]:… Another definition: A harmonic field – the score of each node in the graph is the harmonic (linearly weighted) average of its neighbors’ scores; [X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003] Another definition: A harmonic field – the score of each node in the graph is the harmonic (linearly weighted) average of its neighbors’ scores; [X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003] 28

29 CoEM/wvRN/HF Another justification of the same algorithm…. … start with co-training with a naïve Bayes learner 29

30 CoEM/wvRN/HF One algorithm with several justifications…. One is to start with co-training with a naïve Bayes learner And compare to an EM version of naïve Bayes – E: soft-classify unlabeled examples with NB classifier – M: re-train classifier with soft-labeled examples 30

31 CoEM/wvRN/HF A second experiment – each + example: concatenate features from two documents, one of class A+, one of class B+ – each - example: concatenate features from two documents, one of class A-, one of class B- – features are prefixed with “A”, “B”  disjoint 31

32 CoEM/wvRN/HF A second experiment – each + example: concatenate features from two documents, one of class A+, one of class B+ – each - example: concatenate features from two documents, one of class A-, one of class B- – features are prefixed with “A”, “B”  disjoint NOW co-training outperforms EM 32

33 CoEM/wvRN/HF Co-training with a naïve Bayes learner vs an EM version of naïve Bayes – E: soft-classify unlabeled examples with NB classifier – M: re-train classifier with soft-labeled examples vs an EM version of naïve Bayes – E: soft-classify unlabeled examples with NB classifier – M: re-train classifier with soft-labeled examples incremental hard assignments iterative soft assignments 33

34 Co-Training Rote Learner My advisor pages hyperlinks

35 Co-EM Rote Learner: equivalent to HF on a bipartite graph Pittsburgh contexts NPs lives in _ 35

36 What is HF aka coEM aka wvRN? Algorithmically: HF propagates weights and then resets the seeds to their initial value MRW propagates weights and does not reset seeds Algorithmically: HF propagates weights and then resets the seeds to their initial value MRW propagates weights and does not reset seeds 36

37 MultiRank Walk vs HF/wvRN/CoEM Seeds are marked S MRWHF 37

38 Back to Experiments: Network Datasets with Known Classes UBMCBlog AGBlog MSPBlog Cora Citeseer 38

39 MultiRankWalk vs wvRN/HF/CoEM 39

40 How well does MWR work? 40

41 Parameter Sensitivity 41

42 Semi-supervised learning A pool of labeled examples L A (usually larger) pool of unlabeled examples U Can you improve accuracy somehow using U? These methods are different from EM – optimizes Pr(Data|Model) How do SSL learning methods (like label propagation) relate to optimization? 42

43 SSL as optimization slides from Partha Talukdar 43

44 44

45 yet another name for HF/wvRN/coEM 45

46 match seeds smoothness prior 46

47 47

48 48

49 49

50 How to do this minimization? First, differentiate to find min is at Jacobi method: To solve Ax=b for x Iterate: … or: 50

51 51

52 52

53 precision- recall break even point /HF/… 53

54 /HF/… 54

55 /HF/… 55

56 from mining patterns like “musicians such as Bob Dylan” from HTML tables on the web that are used for data, not formatting 56

57 57

58 58

59 More recent work (AIStats 2014) Propagating labels requires usually small number of optimization passes – Basically like label propagation passes Each is linear in – the number of edges – and the number of labels being propagated Can you do better? – basic idea: store labels in a countmin sketch – which is basically an compact approximation of an object  double mapping 59

60 Flashback: CM Sketch Structure Each string is mapped to one bucket per row Estimate A[j] by taking min k { CM[k,h k (j)] } Errors are always over-estimates Sizes: d=log 1/ , w=2/   error is usually less than  ||A|| 1 +c h 1 (s) h d (s) d=log 1/  w = 2/  from: Minos Garofalakis 60

61 More recent work (AIStats 2014) Propagating labels requires usually small number of optimization passes – Basically like label propagation passes Each is linear in – the number of edges – and the number of labels being propagated – the sketch size – sketches can be combined linearly without “unpacking” them: sketch(av + bw) = a*sketch(v)+b*sketch(w) – sketchs are good at storing skewed distributions 61

62 More recent work (AIStats 2014) Label distributions are often very skewed – sparse initial labels – community structure: labels from other subcommunities have small weight 62

63 More recent work (AIStats 2014) Freebase Flick-10k “self-injection”: similarity computation 63

64 More recent work (AIStats 2014) Freebase 64

65 More recent work (AIStats 2014) 100 Gb available 65


Download ppt "Semi-Supervised Learning With Graphs William Cohen 1."

Similar presentations


Ads by Google