Exact Learning of Boolean Functions with Queries Lisa Hellerstein Polytechnic University Brooklyn, NY AMS Short Course on Statistical Learning Theory, 2007


1 Exact Learning of Boolean Functions with Queries Lisa Hellerstein Polytechnic University Brooklyn, NY AMS Short Course on Statistical Learning Theory, 2007

2 I. Introduction

3 Learning Boolean formula f Problem: Boolean formula f hidden in a black box. Told that f is from class C of formulas. Task: “Learn” f. E.g., f(x₁, x₂, x₃) = x₁ ∧ x₃

4 Learning representation f of Boolean function Problem: f hidden in a black box. Told that f is from class C of representations of Boolean functions. Task: “Learn” f. E.g., f = x₁ + x₂ (mod 2)

5 Boolean functions can represent Whether a person is good or bad Whether an email message is spam Whether a tumor is malignant Whether a book is a romance novel etc.

6 How hard is it to learn target f? Need to specify: Type of information available What’s meant by “learning” Learning Models

7 II. Learning Models

8 Valiant’s PAC Model (1984) PAC = Probably Approximately Correct Type of info available: – Random examples: Value of f on “random” points in its domain Success Criterion: –Approximate learning Output h that is approximately functionally equivalent to f

9 Query Models (this talk) Type of info available: –Oracles that answer questions about f Success Criterion: –Exact learning Output h where h ≡ f Want to learn f within “polynomial” number of queries, in “polynomial” time –polynomial in n and size of f

10 Types of queries Membership queries (point evaluation) Question: What is f(x)? Answer: f(x) Equivalence queries Question: Is h ≡ f? (h is the hypothesis) Answer: Yes if so, else some x such that f(x) ≠ h(x) (x is a counterexample)
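In code, the two oracle types might be simulated as below (a hypothetical sketch; the oracle constructors and the brute-force search over {0,1}ⁿ are illustration only, feasible for small n):

```python
from itertools import product

def make_membership_oracle(f):
    # Membership query: report the value f(x) at any queried point x.
    return lambda x: f(x)

def make_equivalence_oracle(f, n):
    # Equivalence query: return None ("yes") if h agrees with f
    # everywhere, else a counterexample x with f(x) != h(x).
    def equiv(h):
        for x in product((0, 1), repeat=n):
            if f(x) != h(x):
                return x
        return None
    return equiv
```

For the target f(x₁, x₂, x₃) = x₁ ∧ x₃, the equivalence oracle answers the constant-0 hypothesis with a point on which f is 1.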

11 Definition: A membership and equivalence query algorithm learns a class C of representations if given 1. Oracles to answer membership and equivalence queries for some f in C 2. The number n of variables of f the algorithm outputs a representation h s.t. f ≡ h. Say the algorithm runs in polynomial time if its running time is poly(n, size of f)

12 About membership and equivalence queries Assume queries answered perfectly Membership queries –Black-box interpolation –Perfect answers often not available in practice Equivalence queries –Can be simulated in PAC model Test whether f(x) = h(x) on random examples x –Relation to mistake-bound learning in on-line model

13 III. Example Query Algorithm

14 E.g. C = Boolean monomials Boolean monomial = conjunction of literals. E.g., f(x₁, x₂, x₃) = ¬x₁ ∧ x₃ = ¬x₁x₃. Other monomials: x₁¬x₂x₃, ¬x₂

15 Learning monomial f(x₁, x₂, x₃) 1. Ask equiv. query: Is f ≡ 0? Suppose we get counterexample f(1,0,1)=1 2. For each xᵢ, determine whether it appears in monomial f with negation, without negation, or not at all. Since x₁=1 in the counterexample, x₁ appears in f without negation, or not at all. Ask membership query: What is f(0,0,1)? If the answer is 0, x₁ appears in f without negation. If the answer is 1, x₁ does not appear in f at all. Do similarly for x₂ and x₃
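The procedure on this slide can be sketched as follows (hypothetical helper name; the membership and equivalence oracles are assumed to be supplied as functions, with the equivalence oracle returning None for “yes” or a counterexample point):

```python
def learn_monomial(n, membership, equivalence):
    # One equivalence query with the constant-0 hypothesis, then
    # n membership queries, one per variable.
    cex = equivalence(lambda x: 0)
    if cex is None:
        return None                 # f is identically 0
    literals = []                   # (variable index, sign) pairs
    for i in range(n):
        flipped = list(cex)
        flipped[i] ^= 1             # flip the i-th bit of the counterexample
        if membership(tuple(flipped)) == 0:
            # Flipping x_i made f go to 0, so x_i is in the monomial,
            # with the sign it has in the counterexample.
            literals.append((i, cex[i]))
        # otherwise x_i does not appear in f at all
    return literals
```

For f = ¬x₁ ∧ x₃ this yields [(0, 0), (2, 1)]: variable x₁ negated, x₃ unnegated, x₂ absent (0-based indices).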

16 Learning Boolean monomials Previous approach learns Boolean monomials with n+1 equivalence and membership queries, in polynomial time. Can also learn Boolean monomials with equivalence queries alone. Need exponentially many queries (worst case) with membership queries alone: if the monomial includes all n variables, f=1 on only one of the 2ⁿ points in its domain

17 IV. Four interesting representation classes

18 1. DNF Formulas OR of ANDs. E.g., f = ¬x₁x₂x₃ ∨ ¬x₁x₄ ∨ x₁x₂x₃. Natural way of describing a classification rule. Not known whether DNF is learnable in polynomial time with membership and equivalence queries (or in the PAC model). Best known algorithm runs in time 2^{O(n^{1/3})} (Klivans and Servedio, 2004)

19 2. Boolean linear threshold functions f = 1 if x₁ + x₂ + x₃ > 2, = 0 otherwise. Learnable in polynomial time with equivalence queries. 3. Polynomials over GF[2] (integers mod 2) f = x₂x₃ + x₁x₃ + x₁x₂x₃. Learnable in polynomial time with membership + equivalence queries
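A GF[2] polynomial can be encoded as a list of monomials, each a set of variable indices (this encoding is my own, for illustration); its value is the number of satisfied monomials taken mod 2:

```python
def eval_gf2_poly(terms, x):
    # Value of a GF[2] polynomial: the number of monomials that
    # evaluate to 1 under the point x, taken mod 2.
    return sum(all(x[i] for i in t) for t in terms) % 2
```

The slide's example f = x₂x₃ + x₁x₃ + x₁x₂x₃ becomes terms = [{1, 2}, {0, 2}, {0, 1, 2}] with 0-based indices; on x = (1, 1, 1) all three monomials are 1, so f = 3 mod 2 = 1.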

20 4. Boolean decision trees Learnable in polynomial time with membership + equivalence queries. [Figure: a decision tree branching on x₁, x₂, x₃ with 0/1 edges and leaves labeled =0 or =1]

21 Representation and size Can represent every Boolean function as DNF formula, GF[2] polynomial, or decision tree –But sizes of representations can be very different –E.g. the parity function: its representation as a GF[2] polynomial is small, f(x₁,…,xₙ) = x₁ + x₂ + … + xₙ (mod 2), but it requires a DNF formula of size exponential in n, and a decision tree of size exponential in n
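The gap for parity can be checked directly in a small sketch (the helper names are mine; the count relies on the fact that any DNF term consistent with parity must fix all n variables, since dropping one would make the term cover two points of opposite parity):

```python
from itertools import product

def parity(x):
    # As a GF[2] polynomial: x1 + x2 + ... + xn (mod 2), just n terms.
    return sum(x) % 2

def minterm_dnf_size(n):
    # Smallest DNF for parity: one full-length term per odd-weight
    # point of {0,1}^n, i.e. 2^(n-1) terms -- exponential in n.
    return sum(1 for x in product((0, 1), repeat=n) if parity(x))
```

For n = 10 the GF[2] polynomial has 10 terms while the DNF needs 2⁹ = 512.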

22 V. Learning with Polynomial Number of Queries

23 Halving Algorithm Generic algorithm for learning with a polynomial number of queries. Assume (for simplicity) we know the size s of the target f. Keep a set V of all possible f. Initially, V contains all representations in C (on n variables) of size s. Repeat until success: –Use V to construct the Majority Hypothesis h –Ask an equivalence query with h –Either the answer is “yes” (success), or we receive a counterexample –If the latter, update V

24 [Figure: version space V containing candidate functions f₁, f₃, f₅, f₆, f₈] Majority Hypothesis h: for each x in the domain of f, h(x) = 1 if the majority of fᵢ's in V have fᵢ(x) = 1, = 0 if the majority of fᵢ's in V have fᵢ(x) = 0. A counterexample to the majority hypothesis eliminates at least half of the fᵢ's in V. Number of equivalence queries of the Halving Algorithm is log₂(original size of V)
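A minimal halving sketch, assuming the version space is small enough to enumerate explicitly (the function names and the explicit hypothesis list are illustrative assumptions):

```python
def halving_learn(hypotheses, equivalence):
    # Version space V = candidates consistent with all counterexamples
    # seen so far; query with the pointwise-majority hypothesis of V.
    V = list(hypotheses)
    queries = 0
    while True:
        def majority(x, V=tuple(V)):
            votes = sum(h(x) for h in V)
            return 1 if 2 * votes > len(V) else 0
        queries += 1
        cex = equivalence(majority)
        if cex is None:
            return majority, queries
        # The counterexample eliminates at least half of V: keep only
        # candidates that disagree with the majority on cex.
        wanted = 1 - majority(cex)
        V = [h for h in V if h(cex) == wanted]
```

Each counterexample at least halves V, so the number of equivalence queries is at most log₂|V| plus one final “yes”.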

25 VI. Challenge: Learn in polynomial time

26 If we restrict hypotheses to be in C: May be NP-hard to learn (computational hardness of learning) –Tools to prove: complexity theory, NP-completeness reductions, non-approximability. May require an exponential number of queries to learn (informational hardness of learning) –Tools to prove: structural properties of C, combinatorial arguments

27 Example Suppose C is the class of 2-term DNF formulas and we want to learn C with equivalence queries alone. NP-hard to learn 2-term DNF formulas with equivalence queries alone if hypotheses must be 2-term DNF formulas. E.g., f = ¬x₁x₃ ∨ x₁x₂

28 2-term DNF formulas can be factored: f = ¬x₁x₃ ∨ x₁x₂ = (¬x₁ ∨ x₂)(x₃ ∨ x₁)(x₃ ∨ x₂). Result is a 2-CNF formula –AND of ORs in which each OR has at most 2 literals –Size of 2-CNF formula O(n²) –2-CNF formulas can be learned in poly-time with equivalence queries alone (how?) Learn 2-term DNF formula using algorithm for learning 2-CNF formulas.

29 Learning 2-CNF 2-CNF formula f = (¬x₁ ∨ x₂)(x₃ ∨ x₁)(x₃ ∨ x₂) can be viewed as a monomial over a new variable set {y₁, y₂, …}: y₁ = (¬x₁ ∨ x₂), y₂ = (x₁ ∨ x₂), y₃ = (x₂ ∨ ¬x₃), etc. Learn 2-CNF formulas using the algorithm for learning monomials by translating between original vars and new vars
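The change of variables might be sketched like this (the clause encoding and helper names are my own assumptions):

```python
from itertools import combinations

def two_cnf_variables(n):
    # All OR-clauses of at most 2 literals over x_1..x_n: these become
    # the new variables y_1, y_2, ... (O(n^2) of them).
    lits = [(i, s) for i in range(n) for s in (0, 1)]  # (index, 1=positive)
    clauses = [(l,) for l in lits]
    clauses += [c for c in combinations(lits, 2) if c[0][0] != c[1][0]]
    return clauses

def translate(x, clauses):
    # Map an assignment over the x's to an assignment over the new
    # variables: y_j = value of the j-th clause under x.
    def lit(i, s):
        return x[i] if s else 1 - x[i]
    return tuple(int(any(lit(i, s) for (i, s) in c)) for c in clauses)
```

Under this translation a 2-CNF formula is exactly the AND of the y-variables corresponding to its clauses, so the monomial-learning algorithm applies directly.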

30 Two Useful Techniques 1. To show C learnable, find C' s.t. –C' poly-time learnable –each f in C has equivalent f' in C' of size at most polynomially larger –Learn C using algorithm for C' 2. Use existing algorithm with new variable set

31 BUT… Even if we allow hypotheses not from C, it can still be hard to learn C in polynomial time. If C is a sufficiently rich class of Boolean circuits/formulas –Can show that C can represent cryptographic primitives –Learning C is then as hard as breaking cryptographic primitives

32 VII. Learning GF[2] Polynomials and Decision Trees

33 GF[2] Polynomials and Decision Trees Poly-time learnable with membership and equivalence queries using the algorithm for learning Hankel matrix representations (multiplicity automata) –Useful Technique 1 Hankel matrix representations are learnable using a variant of the Deterministic Finite Automaton learning algorithm

34 Hankel Matrix H of f(x₁,…,xₙ) View f as a function on binary strings. Rows/columns of H indexed by all binary strings. H[x,y] = f(x∘y) if |x|+|y| = n, = 0 otherwise

35 Hankel matrix of f(x₁,x₂) = x₁ ∨ x₂ (rows/columns indexed by ε, 0, 1, 00, 01, 10, 11, …; all entries beyond those shown are 0):

      ε   0   1   00  01  10  11
 ε    0   0   0   0   1   1   1
 0    0   0   1   0   0   0   0
 1    0   1   1   0   0   0   0
 00   0   0   0   0   0   0   0
 01   1   0   0   0   0   0   0
 10   1   0   0   0   0   0   0
 11   1   0   0   0   0   0   0
 …
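The finite top-left block of such a matrix can be reproduced with a short sketch (`hankel_block` is a hypothetical helper; row/column indices are binary strings, points are bit tuples):

```python
def hankel_block(f, n, strings):
    # Finite block of the Hankel matrix: rows/columns indexed by the
    # given binary strings; H[x,y] = f(x.y) if |x|+|y| = n, else 0.
    def entry(x, y):
        w = x + y
        return f(tuple(int(b) for b in w)) if len(w) == n else 0
    return [[entry(x, y) for y in strings] for x in strings]
```

For f(x₁,x₂) = x₁ ∨ x₂ and index strings ε, 0, 1, 00, 01, 10, 11, the ε row comes out 0 0 0 0 1 1 1, matching the matrix on slide 35.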

36 Learning Hankel matrices of Boolean functions Can represent the Hankel matrix compactly –Suffices to specify particular O(r²) entries, where r is the rank of the matrix –Running time of the Hankel matrix algorithm is polynomial in r, n Lemma: If f(x₁,…,xₙ) is a GF[2] polynomial with s terms, then the rank of its Hankel matrix is poly(n,s) Lemma: If f(x₁,…,xₙ) is a decision tree with s nodes, then the rank of its Hankel matrix is poly(n,s) Use the Hankel matrix algorithm to learn GF[2] polynomials and decision trees

37 VIII. Summary Definition of query learning models Halving algorithm for learning with a polynomial number of equivalence queries Techniques for polynomial-time learning Examples of classes learnable in polynomial time Barriers to polynomial-time learning

38 Selected References Learning Models –Valiant, L. G. A theory of the learnable. Communications of the ACM 27(11), 1984 –Angluin, D. Queries and concept learning. Machine Learning 2(4), 1988 Learning Algorithms –Beimel, A., Bergadano, F., Bshouty, N. H., Kushilevitz, E., and Varricchio, S. Learning functions represented as multiplicity automata. Journal of the ACM 47(3), 2000 –Maass, W. and Turan, G. On the complexity of learning with counterexamples. Proc. of the 30th IEEE Symposium on Foundations of Computer Science (FOCS), 1989 –Klivans, A. and Servedio, R. Learning DNF in time 2^{O(n^{1/3})}. Journal of Computer and System Sciences 68(2), 2004

39 Hardness of Learning –Kearns, M. J. and Valiant, L. G. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM 41(1), 1994 –Angluin, D. Negative results for equivalence queries. Machine Learning 5(2), 1990 –Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., and Wilkins, D. How many queries are needed to learn? Journal of the ACM 43(5), 1996

