1 Active Perspectives on Computational Learning and Testing Liu Yang Slide 1

2 Interaction An interactive protocol: the algorithm interacts with an oracle/experts. We care about:
- Query complexity
- Computational efficiency
Question: how much better can a learner do with interaction, versus getting all the data in one shot? Slide 2

3 Notation Instance space X = {0,1}^n. Concept space C: a collection of functions h: X -> {-1,+1}. Distribution D over X. Unknown target function h*: the true labeling function (realizable case: h* in C). err(h) = Pr_{x~D}[h(x) ≠ h*(x)]. Slide 3
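To make the notation concrete, here is a minimal sketch (all names illustrative, not from the talk) of estimating err(h) by sampling from D, taken here to be uniform over {0,1}^n:

```python
import numpy as np

def estimate_err(h, h_star, n, num_samples=10_000, seed=0):
    """Monte Carlo estimate of err(h) = Pr_{x~D}[h(x) != h_star(x)]."""
    rng = np.random.default_rng(seed)
    xs = rng.integers(0, 2, size=(num_samples, n))  # x ~ D, uniform on {0,1}^n
    disagreements = sum(h(x) != h_star(x) for x in xs)
    return disagreements / num_samples

# Example: h* = majority vote over coordinates, h = first coordinate.
h_star = lambda x: 1 if 2 * x.sum() >= len(x) else -1
h = lambda x: 1 if x[0] == 1 else -1
print(estimate_err(h, h_star, n=5))   # ~0.31 for this pair
```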

4 “Active” means Label Request Label request: have a pool of unlabeled examples; pick any x and receive h*(x); repeat. An algorithm achieves query complexity S(ε, δ, h*) for (C, D) if it outputs h_n after ≤ n label requests, and for any h* in C, ε > 0, δ > 0, and n ≥ S(ε, δ, h*), Pr[err(h_n) ≤ ε] ≥ 1 − δ. Motivation: labeled data is expensive to get. Using label requests, we can do:
- Active learning: find h with small err(h)
- Active testing: decide whether h* is in C or far from C
A sketch of the label-request protocol follows. Slide 4
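A minimal pool-based sketch of the protocol (not the talk's algorithm); the `select` strategy is a placeholder parameter:

```python
import random

def active_learn(pool, oracle, select, budget):
    """Make up to `budget` label requests on points from `pool`."""
    labeled = []                        # (x, h*(x)) pairs gathered so far
    unlabeled = set(range(len(pool)))   # indices still unqueried
    for _ in range(min(budget, len(pool))):
        i = select(pool, labeled, unlabeled)        # choose a point to query
        labeled.append((pool[i], oracle(pool[i])))  # one label request
        unlabeled.discard(i)
    return labeled   # a learner would fit its hypothesis h_n on these

# Example strategy: query uniformly at random from the unqueried pool.
random_select = lambda pool, labeled, unlabeled: random.choice(sorted(unlabeled))
```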

5 Thesis Outline Bayesian Active Learning Using Arbitrary Binary-Valued Queries (Published); Active Property Testing (Major results submitted); Self-Verifying Bayesian Active Learning (Published); A Theory of Transfer Learning (Accepted); Learning with General Types of Query (In progress); Active Learning with a Drifting Distribution (Submitted). This talk: see the outline on the next slide. Slide 5

6 Outline Active Property Testing (Submitted); Self-Verifying Bayesian Active Learning (Published); Transfer Learning (Accepted); Learning with General Types of Query (In progress). Slide 6

7 Property Testing What is property testing for?
- Quickly tell whether we have the right function class
- Estimate the complexity of a function without actually learning it
Question: can you do it with fewer queries than learning? Yes! E.g., for unions of d intervals, testing helps: ----++++----+++++++++-----++---+++--------
- Testing tells how big d needs to be to be close to the target
- #Labels: active testing needs O(1), passive testing needs Θ(√d), active learning needs Θ(d)
Slide 7

8 Active Property Testing Passive testing: no control over which examples are labeled. Membership queries: unrealistic to query the function at arbitrary points. An active query asks for labels only on points that exist in the environment. Question: does active testing still get a significant benefit in label requests over passive testing? Slide 8

9 Property Tester
- Accepts w.p. ≥ 2/3 if h* in C
- Rejects w.p. ≥ 2/3 if d(h*, C) = min_{g in C} Pr_{x~D}[h*(x) ≠ g(x)] ≥ ε
Slide 9

10 Testing Unions of d Intervals ----++++----+++++++++-----++---+++-------- Theorem: testing unions of ≤ d intervals in the active testing model uses only a constant number of queries (independent of d; cf. Slide 7). Noise sensitivity := Pr[two close points are labeled differently]. Proof idea:
- all unions of d intervals have low noise sensitivity
- all functions that are far from this class have noticeably larger noise sensitivity
- we introduce a tester that estimates the noise sensitivity of the input function
A sketch of such a tester follows. Slide 10
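A minimal sketch of a noise-sensitivity tester for D uniform on [0,1]; the pairing scale `rho`, sample sizes, and acceptance cutoff are illustrative choices, not the paper's constants:

```python
import numpy as np

def estimate_noise_sensitivity(f, rho, num_pairs=5_000, seed=0):
    """Estimate Pr[f labels two rho-close random points differently]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=num_pairs)
    x_near = np.clip(x + rng.uniform(-rho, rho, size=num_pairs), 0.0, 1.0)
    flips = sum(f(a) != f(b) for a, b in zip(x, x_near))
    return flips / num_pairs

def test_union_of_intervals(f, d, eps):
    """Accept iff f's noise sensitivity looks like that of <= d intervals.
    A union of d intervals has <= 2d boundary points, and a rho-close pair
    disagrees only if it straddles a boundary (probability O(rho) each),
    so true members fall below the cutoff; far functions exceed it."""
    rho = eps / (4 * d)                      # illustrative pairing scale
    ns = estimate_noise_sensitivity(f, rho)
    return ns <= 2 * d * rho                 # illustrative cutoff
```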

11 Summary of Results ★★ Active testing (constant # of queries) is much better than passive testing and active learning on unions of intervals, the cluster assumption, and the margin assumption. ★ Testing is easier than learning for linear threshold functions (LTFs). ✪ For dictator functions (single-variable functions), active testing does not help. Slide 11

12 Outline Active Property Testing (Submitted); Self-Verifying Bayesian Active Learning (Published); Transfer Learning (Accepted); Learning with General Types of Query (In progress). Slide 12

13 Self-Verifying Bayesian Active Learning Self-verifying (a special type of stopping criterion):
- given ε, adaptively decides the # of queries, then halts
- has the property that E[err] < ε when it halts
Question: can you do this with E[#queries] = o(1/ε)? (Passive learning needs Θ(1/ε) labels.) Slide 13

14 Example: Intervals Suppose D is uniform on [0,1]. [Figure: the interval [0,1], labeled '+' inside the target interval and '-' outside.] Slide 14

15 Example: Intervals Suppose h* is the empty interval and D is uniform on [0,1]. Verification lower bound: the algorithm somehow arrives at some h with er(h) < ε; how can it verify that h is ε-close to h*?
- h* being ε-close to h means every h' outside the green ε-ball around h is not h*
- everything on the red circle (radius 2ε around h) is outside the green ball, so we have to verify those are not h*
- if h* is the empty interval, then er(h) = p(h = +) < ε
- in particular, intervals of width 2ε lie on the red circle
- we need one label inside each such interval to verify it is not h*
- so we need Ω(1/ε) labels to verify the target isn't one of these
Slides 15-16

16 Learning with a Prior Suppose we know a distribution from which the target is sampled; call it the prior. Slide 16

17 Interval Example with a Prior [Figure: - - - - - |+++++++| - - - - - -] Algorithm: query random points until the first + is found, then binary search to find the endpoints. Halt upon reaching a pre-specified prior-based query budget N. Output the posterior's Bayes classifier. Let the budget N be high enough that E[err] < ε:
- Case w* > 0 (target interval has positive width): each such target is learned to error → 0 with o(1/ε) queries; by the dominated convergence theorem (DCT), the average over the prior of a collection of o(1/ε) functions is itself o(1/ε), so N = o(1/ε) suffices to make E[err | w* > 0] < ε
- Case w* = 0 (empty interval): after L = O(log(1/ε)) queries with no + found, w.p. > 1 − ε most posterior probability mass is on the empty interval, so the posterior's Bayes classifier has 0 error rate
A sketch of this algorithm follows. Slide 17
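A minimal sketch of the slide's algorithm, assuming D uniform on [0,1]; the budget split and search depths are illustrative choices, not the talk's:

```python
import numpy as np

def learn_interval(oracle, budget, seed=0):
    """Query random points until the first '+', then binary-search both
    endpoints; halt within the prior-based query budget."""
    rng = np.random.default_rng(seed)
    spent, x_plus = 0, None
    while spent < budget // 2:              # phase 1: hunt for a '+'
        x = rng.uniform()
        spent += 1
        if oracle(x) == +1:
            x_plus = x
            break
    if x_plus is None:
        # No '+' found: most posterior mass is on the empty interval
        # (the w* = 0 case above), so predict all '-'.
        return lambda x: -1
    per_side = (budget - spent) // 2        # phase 2: binary searches
    lo, hi = 0.0, x_plus                    # left endpoint lies in [lo, hi]
    for _ in range(per_side):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if oracle(mid) == +1 else (mid, hi)
    left = hi
    lo, hi = x_plus, 1.0                    # right endpoint lies in [lo, hi]
    for _ in range(per_side):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if oracle(mid) == +1 else (lo, mid)
    right = lo
    return lambda x: +1 if left <= x <= right else -1

# Example: target interval [0.3, 0.6].
h_hat = learn_interval(lambda x: +1 if 0.3 <= x <= 0.6 else -1, budget=100)
print(h_hat(0.45), h_hat(0.9))   # -> 1 -1
```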

18 Can Do o(1/ε) for Any VC Class Theorem: with the prior, can do o(1/ε). There are methods that find a good classifier in o(1/ε) queries (though they aren't self-verifying) [see TrueSampleComplexityAL08]. We need to set a stopping criterion for those algorithms. The stopping criterion we use: let the algorithm run until it makes a certain # of queries (set the budget just large enough that E[err] < ε). Slide 18

19 Outline Active Property Testing (Submitted); Self-Verifying Bayesian Active Learning (Published); Transfer Learning (Accepted); Learning with General Types of Query (In progress). Slide 19

20 Model of Transfer Learning Motivation: learners are often not too altruistic. Two-layer model:
- Layer 1: draw targets h_1*, ..., h_T* i.i.d. from an (unknown) prior
- Layer 2: per task t, draw data (x_1^t, y_1^t), ..., (x_k^t, y_k^t) i.i.d., labeled by h_t*
Each completed task gives a better estimate of the prior. Slide 20

21 Insights Using a good estimate of the prior is almost as good as using the true prior:
- we only need a VC-dim # of additional points from each task to get a good estimate of the prior
- we've seen that the self-verifying algorithm, if given the true prior, has guarantees on the error and the # of queries
- as #tasks → ∞, when we call the self-verifying algorithm, it outputs a classifier as good as if it had the true prior
Slide 21

22 Main Result We design an algorithm using this insight:
- uses at most a VC-dim # of additional labeled points per task (vs. learning with the known true prior)
- estimates the joint distribution on (x_1, y_1), ..., (x_d, y_d)
- inverts it to get a prior estimate
Running this algorithm is asymptotically just as good as having direct knowledge of the prior (Bayesian):
- [HKS] showed passive learning saves constant factors in the Θ(1/ε) sample complexity per task (replacing VC-dim with a prior-dependent complexity measure)
- we showed access to the prior can improve active sample complexity, sometimes from Θ(1/ε) to o(1/ε)
Slide 22

23 Estimating the Prior Insight: identifiability of priors from the d-dimensional joint distribution:
- the distribution of the full sequence (x_1, y_1), (x_2, y_2), ... uniquely identifies the prior
- the set of joint distributions on (x_1, y_1), ..., (x_k, y_k) for 1 ≤ k < ∞ identifies the distribution of the full sequence
- for any k > d = VC-dim, the distribution on (x_1, y_1), ..., (x_k, y_k) can be expressed in terms of the distribution of (x_1, y_1), ..., (x_d, y_d)
How to do it when d = 1? E.g., thresholds:
- for two points x_1 < x_2: Pr(+,−) = 0, Pr(+,+) = Pr(+,·), Pr(−,−) = Pr(·,−), Pr(−,+) = Pr(·,+) − Pr(+,+) = Pr(·,+) − Pr(+,·)
- for any k > 1 points, can directly reduce from k to 1: P(−−−−−(−+)+++++) = P((−+)) = P((·+)) − P((+·))
A sketch of the two-point case follows. Slide 23
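A minimal sketch of the d = 1 (threshold) case; the name `p_plus` and the uniform-threshold prior in the example are illustrative, not the talk's notation:

```python
def joint_two_points(p_plus, x1, x2):
    """Joint label distribution on points x1 < x2 under a prior over
    thresholds t (label is '+' iff the point is >= t): the two-point
    joint is determined by the one-point marginals p_plus(x) = Pr[x is +]."""
    assert x1 < x2
    return {
        ('+', '+'): p_plus(x1),               # Pr(+,+) = Pr(+,.)
        ('-', '-'): 1.0 - p_plus(x2),         # Pr(-,-) = Pr(.,-)
        ('-', '+'): p_plus(x2) - p_plus(x1),  # Pr(-,+) = Pr(.,+) - Pr(+,.)
        ('+', '-'): 0.0,                      # impossible for thresholds
    }

# Example: prior uniform over thresholds t in [0,1], so Pr[x is +] = x.
print(joint_two_points(lambda x: x, 0.2, 0.7))
# {('+','+'): 0.2, ('-','-'): 0.3, ('-','+'): 0.5, ('+','-'): 0.0}
```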

24 Outline Active Property Testing (Submitted); Self-Verifying Bayesian Active Learning (Published); Transfer Learning (Accepted); Learning with General Types of Query (In progress):
- Learning DNF formulas
- Learning Voronoi
Slide 24

25 Learning with General Queries Construct problem-specific queries to efficiently learn problems that have no known efficient PAC-learning algorithms. Slide 25

26 Learning DNF Formulas DNF formulas: n is the # of variables; a poly-sized DNF has n^O(1) terms, e.g. f = (x1 ∧ x2) ∨ (x1 ∧ x4).
- A natural form of knowledge representation
- [Valiant 1984]; a great challenge for over 20 years
- PAC-learning DNF formulas appears to be very hard
- The fastest known algorithm [KS01] runs in time exp(n^{1/3} log^2 n)
- If the algorithm is forced to output a hypothesis which is itself a DNF, the problem is NP-hard
A sketch of the representation follows. Slide 26
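A minimal sketch of evaluating such a formula; the term-list encoding is an illustrative choice, not from the talk:

```python
def eval_dnf(terms, x):
    """Evaluate a DNF over a boolean assignment x (1-based variable indices).
    `terms` is a list of terms; each term is a list of (index, negated) literals."""
    def lit(i, neg):
        return (not x[i - 1]) if neg else x[i - 1]
    return any(all(lit(i, neg) for (i, neg) in term) for term in terms)

f = [[(1, False), (2, False)],   # x1 AND x2
     [(1, False), (4, False)]]   # x1 AND x4
print(eval_dnf(f, [True, True, False, False]))   # True  (first term fires)
print(eval_dnf(f, [True, False, False, True]))   # True  (second term fires)
print(eval_dnf(f, [False, True, True, True]))    # False (both need x1)
```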

27 Thanks! Slide 31

