Published by Sophie Gerold. Modified over 2 years ago.

1
Semi-Supervised Learning With Graphs
William Cohen

2
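HW7 Hints
[Figure: matrix factorization V ~ W H. V is the n users x m movies matrix with entries v11 … vij … vnm, where V[i,j] = user i’s rating of movie j. W has one row (xi, yi) per user, for n rows and r columns. H has r rows, (a1 … am) and (b1 … bm), and one column per movie.]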

3
Homework 7 - Hints
What’s parallel and what’s sequential?
for epoch t=1,…,T in sequence
– for stratum s=1,…,K in sequence
  for block b=1,… in stratum s in parallel
  – for triple (i,j,rating) in block b in sequence
    » do SGD step for (i,j,rating)
Notes on the loops: the inner loop only needs part of W,H; the SGD step needs r/w access to (part of) W and H. How?
Hint 1: You need some loops around rdd operations
Hint 2: Look at rdd.mapPartitions() to see how to control a parallel/sequential breakdown like this
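The loop nest above can be sketched in plain Python, with no Spark involved. This is only an illustration under stated assumptions: the function names, learning rate, toy ratings, and the choice of stratum s as the blocks (p, (p+s) mod K) are all hypothetical, and the blocks of a stratum are visited one after another here even though they touch disjoint users and movies and could run in parallel.

```python
import random

def sgd_step(W, H, i, j, rating, lr=0.02):
    """One SGD update on the squared error (rating - W[i].H[j])^2."""
    err = rating - sum(w * h for w, h in zip(W[i], H[j]))
    for k in range(len(W[i])):
        w, h = W[i][k], H[j][k]
        W[i][k] = w + lr * err * h
        H[j][k] = h + lr * err * w

def make_blocks(ratings, n_users, n_movies, K):
    """Group triples into a K x K grid of blocks by user band and movie band."""
    blocks = {}
    for (i, j, r) in ratings:
        key = (i * K // n_users, j * K // n_movies)
        blocks.setdefault(key, []).append((i, j, r))
    return blocks

def dsgd(ratings, n_users, n_movies, rank=2, K=2, T=300, seed=0):
    rng = random.Random(seed)
    # H is stored by column here: H[j] is the factor vector of movie j.
    W = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_movies)]
    blocks = make_blocks(ratings, n_users, n_movies, K)
    for t in range(T):                      # epochs: in sequence
        for s in range(K):                  # strata: in sequence
            # Blocks (p, (p+s) % K) share no users or movies, so a real
            # implementation could process them in parallel.
            for p in range(K):
                for (i, j, r) in blocks.get((p, (p + s) % K), []):  # in sequence
                    sgd_step(W, H, i, j, r)
    return W, H

def rmse(ratings, W, H):
    sq = [(r - sum(w * h for w, h in zip(W[i], H[j]))) ** 2 for (i, j, r) in ratings]
    return (sum(sq) / len(sq)) ** 0.5

# Toy data: users 0-1 like movies 0-1, users 2-3 like movies 2-3.
ratings = [(i, j, 5.0 if (i < 2) == (j < 2) else 1.0) for i in range(4) for j in range(4)]
W0, H0 = dsgd(ratings, 4, 4, T=0)     # untrained factors
W1, H1 = dsgd(ratings, 4, 4, T=300)   # same initialization, trained
```

Comparing rmse(ratings, W0, H0) with rmse(ratings, W1, H1) shows whether the schedule reduces training error on the toy data.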

4
Homework 7 - Hints
What’s in memory and what’s on disk?
ratings triples (i,j,r) on disk
– W,H in memory as shared synchronized memory: you should worry about transmission cost and blocking
Abhinav hint: Python dictionaries are thread-safe

5
Homework 7 - Hints
What’s in memory and what’s on disk?
ratings triples (i,j,r) on disk
– W,H in memory as broadcast memory: workers get unsynchronized copies of the data; the worry here is extra broadcast cost
hint: mapPartitions ~= Hadoop map

6
Homework 7 - Hints
What is rdd.mapPartitions(foo)?
Function foo takes an iterator over the entries in a partition (aka shard) of rdd and yields a sequence of entries:
    def foo(iterator):
        doSomething()        # setup
        for x in iterator:
            …
            yield …
            …
        doAnotherThing()     # teardown

7
Homework 7 - Hints
What is rdd.mapPartitions(foo)?
Function foo takes an iterator over the entries in a partition (aka shard) of rdd and yields a sequence of entries:
    def foo(iterator):
        doSomething()        # setup: modifiedEntries = set()
        for (i,j,r) in iterator:
            …
            yield …          # update copies of W[i,*], H[*,j]
            …
        doAnotherThing()     # output modified entries of W,H
input: broadcast W,H, ratings Vij
output: updated W,H
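A minimal sketch of such a function, run on an ordinary Python iterator instead of an RDD partition. Everything here is an assumption for illustration: the name sgd_partition, the dictionaries standing in for broadcast copies of W and H, the learning rate, and the choice to yield all modified entries once at the end instead of inside the loop.

```python
def sgd_partition(triples, W, H, lr=0.05):
    """Generator in the shape rdd.mapPartitions expects: consumes an iterator
    over one partition's (i, j, r) triples, yields modified entries of W, H."""
    # setup: private copies of only the rows/columns this partition touches
    local_W, local_H = {}, {}
    for (i, j, r) in triples:
        w = local_W.setdefault(i, list(W[i]))   # copy of W[i,*]
        h = local_H.setdefault(j, list(H[j]))   # copy of H[*,j]
        err = r - sum(a * b for a, b in zip(w, h))
        for k in range(len(w)):
            # simultaneous update: both right-hand sides use the old values
            w[k], h[k] = w[k] + lr * err * h[k], h[k] + lr * err * w[k]
    # teardown: output only the modified entries
    for i, w in local_W.items():
        yield ("W", i, w)
    for j, h in local_H.items():
        yield ("H", j, h)

# Stand-ins for the broadcast copies of W and H (hypothetical toy values).
W = {0: [0.1, 0.1], 1: [0.1, 0.1]}
H = {0: [0.1, 0.1], 1: [0.1, 0.1]}
partition = iter([(0, 0, 5.0), (0, 1, 1.0)])
updates = list(sgd_partition(partition, W, H))
```

In PySpark the same generator could presumably be passed as rdd.mapPartitions(lambda it: sgd_partition(it, W_bc.value, H_bc.value)), where W_bc and H_bc are hypothetical names for values created with sc.broadcast.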

8
Homework 7 - Hints
What’s in memory and what’s on disk?
ratings triples (i,j,r) on disk
– W,H in memory
  shared memory
  individual copies of H, W memory
– W,H on disk, joined in with (i,j,r)
  should worry about inflating the size of the ratings
– this scheme inserts many copies of each W
– Hybrid schemes: triples (i,j,r) → (i,{(j1,r1),(j2,r2),…,}, W[i,*]), H in memory

9
Homework 7 - Hints
Change grain size of computation
Rather than streaming through (r,i,j) tasks:
– create SGD subtasks (W1,H1,V11), (W2,H2,V22), (W3,H3,V33)
– each map task does SGD on one subtask
Should worry about subtask size (part of the ratings are in memory)

10
Homework 7 - Hints
What’s in memory and what’s on disk?
ratings triples (i,j,r) on disk
– W,H in memory
  shared memory
  individual copies of H, W memory
– W,H on disk, joined in with (i,j,r)
  should worry about inflating the size of the ratings
– this scheme inserts many copies of each W
– Hybrid schemes: triples (i,j,r) → (i,{(j1,r1),(j2,r2),…,}, W[i,*]), H in memory

11
Homework 7 – Avuncular Advice
This is a really interesting assignment – think and learn!
…But don’t stress out
We know this assignment is challenging and we’ll curve it appropriately
HW8 is being reduced in scope
PS. If you’re an undergrad doing carnival stuff, talk to me about the (useless) extension

12
Semi-Supervised Learning With Graphs
William Cohen

13
Flashback – Graph Algorithms so far….
PageRank and how to scale it up
Personalized PageRank/Random Walk with Restart and
– how to implement it
– how to use it for extracting part of a graph
Other uses for graphs?
– not so much

14
Main topics today
Scalable semi-supervised learning on graphs
– SSL with RWR
– SSL with coEM/wvRN/HF
Scalable unsupervised learning on graphs
– Power iteration clustering
– …

15
Semi-supervised learning
A pool of labeled examples L
A (usually larger) pool of unlabeled examples U
Can you improve accuracy somehow using U?

16
Semi-Supervised Bootstrapped Learning/Self-training
Extract cities:
[Figure: graph linking noun phrases (Paris, Pittsburgh, Seattle, Cupertino, San Francisco, Austin, Berlin, denial, anxiety, selfishness) to contexts (mayor of arg1, live in arg1, arg1 is home of, traits such as arg1)]

17
Semi-Supervised Bootstrapped Learning via Label Propagation
[Figure: graph linking noun phrases (Paris, San Francisco, Austin, Pittsburgh, Seattle, anxiety, denial, selfishness) to contexts (live in arg1, mayor of arg1, arg1 is home of, traits such as arg1)]

18
Semi-Supervised Bootstrapped Learning via Label Propagation
[Figure: graph linking noun phrases (Paris, San Francisco, Austin, Pittsburgh, Seattle, anxiety, denial, selfishness, arrogance) to contexts (live in arg1, mayor of arg1, arg1 is home of, traits such as arg1)]
Nodes “near” seeds
Nodes “far from” seeds
Information from other categories tells you “how far” (when to stop propagating)

19
ASONAM-2010 (Advances in Social Networks Analysis and Mining)

20
Network Datasets with Known Classes
UBMCBlog
AGBlog
MSPBlog
Cora
Citeseer

21
RWR - fixpoint of:
Seed selection
1. order by PageRank, degree, or randomly
2. go down list until you have at least k examples/class
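The fixpoint equation itself is not in the transcript. The sketch below assumes the usual random-walk-with-restart form r = (1 - d) u + d W r, where u is uniform over one class's seeds and W spreads each node's score evenly over its neighbors. It then labels each node with the class whose walk scores it highest, which is one reading of MultiRankWalk. The names, the damping value d = 0.85, and the toy graph are all hypothetical.

```python
def rwr(adj, seeds, d=0.85, iters=100):
    """Iterate r = (1 - d) * u + d * W r to an approximate fixpoint."""
    nodes = list(adj)
    u = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    r = dict(u)
    for _ in range(iters):
        nxt = {v: (1 - d) * u[v] for v in nodes}      # restart mass
        for v in nodes:
            share = d * r[v] / len(adj[v])            # walk mass per neighbor
            for nb in adj[v]:
                nxt[nb] += share
        r = nxt
    return r

def multirankwalk(adj, seeds_by_class, d=0.85):
    """Run one RWR per class and give each node its highest-scoring class."""
    scores = {c: rwr(adj, s, d) for c, s in seeds_by_class.items()}
    return {v: max(scores, key=lambda c: scores[c][v]) for v in adj}

# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
r_x = rwr(adj, {"a"})
labels = multirankwalk(adj, {"x": {"a"}, "y": {"f"}})
```

With one seed per class, each triangle ends up labeled with the class of its own seed.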

22
Results – Blog data
[Figure: results plots labeled Random, Degree, PageRank]
We’ll discuss this soon….

23
Results – More blog data
[Figure: results plots labeled Random, Degree, PageRank]

24
Results – Citation data
[Figure: results plots labeled Random, Degree, PageRank]

25
Seeding – MultiRankWalk

26
Seeding – HF/wvRN

27
What is HF aka coEM aka wvRN?

28
CoEM/HF/wvRN
One definition [Macskassy & Provost, JMLR 2007]: …
Another definition: A harmonic field – the score of each node in the graph is the harmonic (linearly weighted) average of its neighbors’ scores [X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003]

29
CoEM/wvRN/HF
Another justification of the same algorithm….
… start with co-training with a naïve Bayes learner

30
CoEM/wvRN/HF
One algorithm with several justifications….
One is to start with co-training with a naïve Bayes learner
And compare to an EM version of naïve Bayes
– E: soft-classify unlabeled examples with NB classifier
– M: re-train classifier with soft-labeled examples

31
CoEM/wvRN/HF
A second experiment
– each + example: concatenate features from two documents, one of class A+, one of class B+
– each - example: concatenate features from two documents, one of class A-, one of class B-
– features are prefixed with “A”, “B”, so the two feature sets are disjoint

32
CoEM/wvRN/HF
A second experiment
– each + example: concatenate features from two documents, one of class A+, one of class B+
– each - example: concatenate features from two documents, one of class A-, one of class B-
– features are prefixed with “A”, “B”, so the two feature sets are disjoint
NOW co-training outperforms EM

33
CoEM/wvRN/HF
Co-training with a naïve Bayes learner: incremental hard assignments
vs an EM version of naïve Bayes: iterative soft assignments
– E: soft-classify unlabeled examples with NB classifier
– M: re-train classifier with soft-labeled examples

34
Co-Training Rote Learner
[Figure: bipartite graph of pages and hyperlinks, with nodes labeled + or -; one node is labeled “My advisor”]

35
Co-EM Rote Learner: equivalent to HF on a bipartite graph
[Figure: bipartite graph of NPs (e.g. Pittsburgh) and contexts (e.g. “lives in _”), with nodes labeled + or -]

36
What is HF aka coEM aka wvRN?
Algorithmically:
HF propagates weights and then resets the seeds to their initial value
MRW propagates weights and does not reset seeds
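The HF half of this contrast can be sketched as repeated neighbor averaging followed by a seed reset. This is an illustrative version for a two-class problem on an unweighted graph. The function name, the 0.5 starting score for unlabeled nodes, and the toy path graph are assumptions, not taken from the slides.

```python
def harmonic_field(adj, seeds, iters=200):
    """seeds maps a labeled node to its fixed score (1.0 for +, 0.0 for -)."""
    score = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(iters):
        # propagate: every node takes the average of its neighbors' scores
        nxt = {v: sum(score[nb] for nb in adj[v]) / len(adj[v]) for v in adj}
        # reset: seeds go back to their initial values
        nxt.update(seeds)
        score = nxt
    return score

# Path graph a - b - c - d - e with a positive seed at a and a negative one at e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
score = harmonic_field(adj, {"a": 1.0, "e": 0.0})
```

On the path the scores settle to a straight-line interpolation between the two seeds. Dropping the nxt.update(seeds) line removes the reset, which is the behaviour the slide attributes to MRW (the restart term of the random walk is not modelled here).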

37
MultiRank Walk vs HF/wvRN/CoEM
Seeds are marked S
[Figure: two panels, MRW and HF]

38
Back to Experiments: Network Datasets with Known Classes
UBMCBlog
AGBlog
MSPBlog
Cora
Citeseer

39
MultiRankWalk vs wvRN/HF/CoEM

40
How well does MRW work?

41
Parameter Sensitivity

42
Semi-supervised learning
A pool of labeled examples L
A (usually larger) pool of unlabeled examples U
Can you improve accuracy somehow using U?
These methods are different from EM
– EM optimizes Pr(Data|Model)
How do SSL methods (like label propagation) relate to optimization?

43
SSL as optimization
slides from Partha Talukdar

45
yet another name for HF/wvRN/coEM

46
[Equation annotations: match seeds; smoothness; prior]

50
How to do this minimization?
First, differentiate to find min is at
Jacobi method:
To solve Ax=b for x
Iterate: …
or:
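The slide's own formulas are missing from the transcript. As a reminder of the generic technique, the textbook Jacobi update for Ax = b is x_i ← (b_i − Σ_{j≠i} A_ij x_j) / A_ii, applied to all i at once. The sketch below is a plain-Python version with a hypothetical 2 x 2 system. It converges when A is strictly diagonally dominant, as this toy A is.

```python
def jacobi(A, b, iters=100):
    """Solve Ax = b by Jacobi iteration, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # every component is updated from the previous iterate only
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x

A = [[4.0, 1.0],
     [2.0, 3.0]]
b = [1.0, 2.0]
x = jacobi(A, b)
```

When A is a graph Laplacian restricted to the unlabeled nodes, this update reduces to replacing each node's score by a weighted average of its neighbors' scores, which is the label propagation pass.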

53
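[Figure: results plot; measure: precision-recall break-even point; legend includes …/HF/…]

54
[Figure: results plot; legend includes …/HF/…]

55
[Figure: results plot; legend includes …/HF/…]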

56
from mining patterns like “musicians such as Bob Dylan”
from HTML tables on the web that are used for data, not formatting

59
More recent work (AIStats 2014)
Propagating labels usually requires a small number of optimization passes
– Basically like label propagation passes
Each is linear in
– the number of edges
– and the number of labels being propagated
Can you do better?
– basic idea: store labels in a count-min sketch
– which is basically a compact approximation of an object-to-double mapping

60
Flashback: CM Sketch Structure
Each string is mapped to one bucket per row
Estimate A[j] by taking min_k { CM[k, h_k(j)] }
Errors are always over-estimates
Sizes: d = log 1/δ, w = 2/ε; error is usually less than ε||A||_1
[Figure: d x w array of counters; string s is hashed by h_1(s), …, h_d(s) to one bucket per row, and c is added to each of those buckets]
from: Minos Garofalakis
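A small count-min sketch along these lines, as a rough illustration only. The class name, the default sizes, and the use of SHA-256 with a row prefix as the d hash functions are assumptions made for the example. With non-negative updates the estimate can never fall below the true value.

```python
import hashlib

class CountMinSketch:
    def __init__(self, depth=4, width=64):
        self.depth, self.width = depth, width
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        """h_row(key): one bucket index per row, derived from SHA-256."""
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, value=1.0):
        # add value to one bucket in every row
        for k in range(self.depth):
            self.table[k][self._bucket(k, key)] += value

    def estimate(self, key):
        # min over rows: collisions only ever inflate a bucket
        return min(self.table[k][self._bucket(k, key)] for k in range(self.depth))

counts = {"cat": 5.0, "dog": 2.0, "emu": 1.0}
cms = CountMinSketch()
for key, value in counts.items():
    cms.add(key, value)
```

Each estimate, for example cms.estimate("cat"), is at least the true count and at most the total mass added to the sketch.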

61
More recent work (AIStats 2014)
Propagating labels usually requires a small number of optimization passes
– Basically like label propagation passes
Each is linear in
– the number of edges
– and the number of labels being propagated
– the sketch size
– sketches can be combined linearly without “unpacking” them: sketch(av + bw) = a*sketch(v) + b*sketch(w)
– sketches are good at storing skewed distributions
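The linearity property can be checked directly on a toy example. In this sketch a label distribution is a dict from label to weight, and sketch() builds a small count-min table from it. The hashing scheme, the table sizes, and the example labels are all hypothetical.

```python
import hashlib

def sketch(vec, depth=3, width=16):
    """Count-min table for a sparse vector given as {label: weight}."""
    table = [[0.0] * width for _ in range(depth)]
    for key, value in vec.items():
        for k in range(depth):
            h = int(hashlib.sha256(f"{k}:{key}".encode()).hexdigest(), 16) % width
            table[k][h] += value
    return table

def combine(a, s1, b, s2):
    """a*s1 + b*s2, computed cell by cell without unpacking either sketch."""
    return [[a * x + b * y for x, y in zip(r1, r2)] for r1, r2 in zip(s1, s2)]

v = {"city": 0.9, "person": 0.1}
w = {"city": 0.2, "band": 0.8}
a, b = 0.5, 2.0

# Left side: sketch of the explicitly combined vector a*v + b*w.
mixed = {k: a * v.get(k, 0.0) + b * w.get(k, 0.0) for k in set(v) | set(w)}
lhs = sketch(mixed)
# Right side: combine the two sketches directly.
rhs = combine(a, sketch(v), b, sketch(w))
max_diff = max(abs(x - y) for r1, r2 in zip(lhs, rhs) for x, y in zip(r1, r2))
```

max_diff is zero up to floating-point rounding. This lets a propagation pass average the sketches of a node's neighbors instead of averaging their full label vectors.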

62
More recent work (AIStats 2014)
Label distributions are often very skewed
– sparse initial labels
– community structure: labels from other subcommunities have small weight

63
More recent work (AIStats 2014)
[Figure: results on Freebase and Flick-10k]
“self-injection”: similarity computation

64
More recent work (AIStats 2014)
[Figure: results on Freebase]

65
More recent work (AIStats 2014)
100 Gb available
