
1 Vapnik-Chervonenkis Dimension: Definition and Lower Bound (adapted from Yishai Mansour)

2 PAC Learning model
There exists a distribution D over domain X.
Examples:
– use c for the target function (rather than c_t)
Goal:
– with high probability (1 - δ)
– find h in H such that
– error(h, c) < ε
– ε and δ arbitrarily small.

3 VC: Motivation
Handle infinite classes: VC-dim “replaces” finite class size.
Previous lecture (on PAC): specific examples
– rectangle
– interval
Goal: develop a general methodology.

4 The VC Dimension
C: a collection of subsets of a universe U.
VC(C) = VC dimension of C: the size of the largest subset T ⊆ U shattered by C.
T is shattered if every subset T′ ⊆ T is expressible as T ∩ (an element of C).
Example: C = {{a}, {a, c}, {a, b, c}, {b, c}, {b}}; VC(C) = 2, since {b, c} is shattered by C (checked in the sketch below).
Plays an important role in learning theory, finite automata, computability theory, and computational geometry.
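A minimal sketch (my own, not from the slides) verifying the example: every subset of T = {b, c} should arise as T ∩ c for some c in C.

```python
from itertools import chain, combinations

# Check that T = {b, c} is shattered by the example class C,
# i.e. every subset of T equals T ∩ c for some c in C.
C = [{"a"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"b"}]
T = {"b", "c"}

subsets = [set(s) for s in chain.from_iterable(combinations(T, r) for r in range(len(T) + 1))]
realized = {frozenset(T & c) for c in C}
print(all(frozenset(s) in realized for s in subsets))  # True: all 4 subsets are realized
```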

5 Definitions: Projection
Given a concept c over X, associate it with a set (all positive examples).
Projection (sets):
– For a concept class C and subset S,
– Π_C(S) = { c ∩ S | c ∈ C }
Projection (vectors):
– For a concept class C and S = {x_1, …, x_m},
– Π_C(S) = { ⟨c(x_1), …, c(x_m)⟩ | c ∈ C }

6 Definition: VC-dim
Clearly |Π_C(S)| ≤ 2^m.
C shatters S if |Π_C(S)| = 2^m (S is shattered by C).
VC dimension of a class C:
– the size d of the largest set S that is shattered by C.
– Can be infinite.
For a finite class C: VC-dim(C) ≤ log_2 |C|.

7 Example: S is shattered by C.
VC: a combinatorial measure of the complexity of a function class.

8 Calculating the VC dimension
The VC dimension is at least d if there exists some sample S with |S| = d which is shattered by C. This does not mean that all samples of size d are shattered by C (e.g., three points on a single line in 2D are not shattered by halfplanes). Conversely, to show that the VC dimension is at most d, one must show that no sample of size d + 1 is shattered. Naturally, proving an upper bound is harder than proving a lower bound on the VC dimension. (A brute-force search for finite classes is sketched below.)
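As a concrete illustration of this methodology for a finite class over a finite universe, a brute-force sketch (my own, not from the lecture): it records the largest size at which some set is shattered, and stops as soon as no set of a given size is shattered, since any subset of a shattered set is itself shattered.

```python
from itertools import combinations

def is_shattered(C, S):
    """S is shattered by C iff the projection Π_C(S) has all 2^|S| subsets."""
    return len({frozenset(set(S) & c) for c in C}) == 2 ** len(S)

def vc_dim(C, U):
    d = 0
    for size in range(1, len(U) + 1):
        if any(is_shattered(C, S) for S in combinations(U, size)):
            d = size      # lower bound: one shattered set of this size suffices
        else:
            return d      # upper bound: no set of this size is shattered,
                          # hence none of any larger size either
    return d

C = [{"a"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"b"}]
print(vc_dim(C, {"a", "b", "c"}))  # 2
```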

9 Example 1: Interval
C_1 = { c_z | z ∈ [0, 1] }
c_z(x) = 1 ⟺ x ≥ z
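A tiny sanity check (my sketch; the direction x ≥ z is my reading of the garbled slide, and either direction gives the same result): a single point is shattered, but for x_1 < x_2 the labeling (1, 0) would require x_1 ≥ z > x_2, which is impossible, so VC-dim(C_1) = 1.

```python
# Thresholds c_z(x) = 1 iff x >= z (assumed direction).
def c(z, x):
    return int(x >= z)

x1, x2 = 0.3, 0.7
seen = {(c(z, x1), c(z, x2)) for z in [i / 100 for i in range(101)]}
print(sorted(seen))  # (1, 0) never appears: {x1, x2} is not shattered
```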

10 Example 2: Line
C_2 = { c_w | w = (a, b, c) }
c_w(x, y) = 1 ⟺ ax + by ≥ c

11 Line (halfplane): VC-dim ≥ 3
Three points in general position can be shattered.

12 VC-dim < 4
Four points cannot be shattered, so the VC dimension of halfplanes in the plane is exactly 3.

13 Example 3: Axis-parallel rectangles

14 VC Dim of Rectangles
The VC dimension is 4: four points in a diamond configuration are shattered, but no set of five points is. The minimal axis-parallel rectangle around five points is determined by at most four extreme points (leftmost, rightmost, top, bottom), and any rectangle containing those four already contains the fifth, so the labeling that marks only the extremes positive cannot be realized.

15 Example 4: Finite unions of intervals
Any labeling of any point set can be realized: cover each positive point with its own small interval. Thus VC-dim = ∞.

16 Example 5: Parity
n Boolean input variables; T ⊆ {1, …, n}
f_T(x) = ⊕_{i ∈ T} x_i (parity of the coordinates in T)
Lower bound: the n unit vectors (see the check below)
Upper bound:
– number of concepts: |C| = 2^n, so VC-dim ≤ log_2 |C| = n
– linear dependency: any n + 1 vectors in GF(2)^n are linearly dependent
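A small check of the lower bound (my sketch): for the unit vector e_j, f_T(e_j) = 1 iff j ∈ T, so choosing T to be the set of positively labeled indices realizes every labeling of the n unit vectors, giving VC-dim ≥ n.

```python
n = 4

def f(T, x):
    # parity of the coordinates indexed by T
    return sum(x[i] for i in T) % 2

unit = lambda j: [int(i == j) for i in range(n)]
for labeling in range(2 ** n):
    desired = [(labeling >> i) & 1 for i in range(n)]
    T = [i for i in range(n) if desired[i]]   # positively labeled indices
    assert [f(T, unit(j)) for j in range(n)] == desired
print(f"all {2 ** n} labelings of the n unit vectors are realized by parities")
```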

17 Example 6: OR
n Boolean input variables; P and N are subsets of {1, …, n}
f_{P,N}(x) = (∨_{i ∈ P} x_i) ∨ (∨_{i ∈ N} ¬x_i)
Lower bound: n unit vectors
Upper bound:
– trivial: 2n
– use ELIM (get n + 1)
– show the second vector removes 2 (get n)

18 Example 7: Convex polygons
Place the points on a circle: for any labeling, the polygon with the positive points as vertices contains exactly those points. Thus VC-dim = ∞.


20 Example 8: Hyperplane
C_8 = { c_{w,c} | w ∈ R^d }, where c_{w,c}(x) = 1 ⟺ ⟨w, x⟩ ≥ c
VC-dim(C_8) = d + 1
Lower bound: the unit vectors and the zero vector are shattered (see the sketch below).
Upper bound: via Radon's theorem, on the following slides.
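A sketch of the lower-bound construction (my own; it assumes the halfspace form c_{w,c}(x) = 1 ⟺ ⟨w, x⟩ ≥ c reconstructed above): given any labels b_0 for the origin and b_1, …, b_d for the unit vectors, set c = -½ if b_0 = 1 else ½, and w_i = c + 1 if b_i = 1 else c - 1.

```python
d = 3

def halfspace_for(labels):            # labels[0] is the origin's label
    c = -0.5 if labels[0] else 0.5    # <w, 0> = 0 >= c iff labels[0] == 1
    w = [c + 1 if b else c - 1 for b in labels[1:]]   # <w, e_i> = w_i
    return w, c

points = [[0.0] * d] + [[float(i == j) for i in range(d)] for j in range(d)]
for labeling in range(2 ** (d + 1)):
    labels = [(labeling >> i) & 1 for i in range(d + 1)]
    w, c = halfspace_for(labels)
    got = [int(sum(wi * xi for wi, xi in zip(w, p)) >= c) for p in points]
    assert got == labels
print(f"the origin and the {d} unit vectors are shattered: VC-dim >= {d + 1}")
```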

21 Complexity Questions
Given C, compute VC(C).
Since VC(C) ≤ log |C|, it can be computed in O(n^{log n}) time (Linial-Mansour-Rivest 88).
Probably can’t do better: the problem is LOGNP-complete (Papadimitriou-Yannakakis 96).
Often C has a small implicit representation: C(i, x) is a polynomial-size circuit such that C(i, x) = 1 iff x belongs to set i.
The implicit version is Σ₃ᵖ-complete (Schaefer 99), i.e., as hard as deciding ∃a ∀b ∃c φ(a, b, c) for a CNF formula φ.

22 Sampling Lemma
Lemma: Let W ⊆ X be such that |W| ≥ ε|X|. A set of O(1/ε ln(1/δ)) points sampled independently and uniformly at random from X intersects W with probability at least 1 - δ.
Proof: Any single sample lands in W with probability at least ε. Thus, the probability that all m samples miss W is at most (1 - ε)^m ≤ e^{-εm} ≤ δ for m = (1/ε) ln(1/δ).
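A quick Monte Carlo illustration of the lemma (a sketch under assumed parameters, with X = {0, …, N-1} and W its first ⌈εN⌉ elements): with m = ⌈(1/ε) ln(1/δ)⌉ samples, the empirical miss rate stays near (1 - ε)^m, below δ.

```python
import math
import random

eps, delta = 0.1, 0.05
m = math.ceil(math.log(1 / delta) / eps)   # (1/eps) ln(1/delta) samples
N = 10_000
W = eps * N                                # W = {0, ..., ceil(eps*N) - 1}

trials = 2_000
misses = sum(
    all(random.randrange(N) >= W for _ in range(m))   # every sample misses W
    for _ in range(trials)
)
print(f"m = {m}, empirical miss rate = {misses / trials:.4f} (bound: {delta})")
```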

23 ε-Net Theorem
Theorem: Let the VC dimension of (X, C) be d ≥ 2 and 0 < ε < ½. There exists an ε-net for (X, C) of size at most O(d/ε ln(1/ε)). If we choose O(d/ε ln(d/ε) + 1/ε ln(1/δ)) points at random from X, then the resulting set N is an ε-net with probability at least 1 - δ.
Exercise 3, submission next week.
This gives a polynomial bound on the sample size for PAC learning.

24 Radon's Theorem
Definitions:
– Convex set.
– Convex hull: conv(S).
Theorem:
– Let T be a set of d + 2 points in R^d.
– There exists a subset S of T such that
– conv(S) ∩ conv(T \ S) ≠ ∅
Proof! (A constructive sketch is given below.)
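A sketch of the underlying construction (my own, using numpy): the system Σ λ_i p_i = 0, Σ λ_i = 0 has d + 1 equations in d + 2 unknowns, so it has a nonzero solution λ, which must have entries of both signs. Taking S = {i : λ_i > 0}, the point Σ_{λ_i>0} (λ_i/Λ) p_i with Λ = Σ_{λ_i>0} λ_i lies in both conv(S) and conv(T \ S).

```python
import numpy as np

def radon_partition(points):
    P = np.asarray(points, dtype=float)      # (d + 2) points in R^d
    A = np.vstack([P.T, np.ones(len(P))])    # rows: sum λ_i p_i = 0, sum λ_i = 0
    lam = np.linalg.svd(A)[2][-1]            # a null-space vector (A has a nontrivial kernel)
    pos = lam > 0
    common = lam[pos] @ P[pos] / lam[pos].sum()   # point in conv(S) ∩ conv(T \ S)
    return [i for i in range(len(P)) if pos[i]], common

S, x = radon_partition([[0, 0], [1, 0], [0, 1], [1, 1]])  # d = 2, d + 2 = 4 points
print(S, x)  # the square's diagonals cross at (0.5, 0.5)
```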

25 Hyperplane: Finishing the Proof
Assume a set T of d + 2 points can be shattered.
Use Radon's theorem to find S such that conv(S) ∩ conv(T \ S) ≠ ∅.
Assign points in S label 1 and points not in S label 0.
If T were shattered, there would be a separating hyperplane. But how would it label a point in conv(S) ∩ conv(T \ S)? A halfspace containing S contains conv(S), and its complement containing T \ S contains conv(T \ S), so a common point would have to be labeled both 1 and 0: contradiction. Hence VC-dim ≤ d + 1.

26 Lower bounds: Setting
Static learning algorithm:
– asks for a sample S of size m(ε, δ)
– based on S, selects a hypothesis

27 Lower bounds: Setting
Theorem: if VC-dim(C) = ∞ then C is not learnable.
Proof:
– Let m = m(0.1, 0.1).
– Find 2m points which are shattered (set T).
– Let D be the uniform distribution on T.
– Set c_t(x_i) = 1 with probability ½, independently.
– Expected error ¼: the learner sees at most m of the 2m points, and on the unseen half the labels are fair coins. Finish the proof!

28 Lower Bound: Feasible
Theorem: if VC-dim(C) = d + 1, then m(ε, δ) = Ω(d/ε).
Proof:
– Let T = {z_0, z_1, …, z_d} be a set of d + 1 points which is shattered.
– D samples: z_0 with probability 1 - 8ε, each z_i with probability 8ε/d.

29 Continued
– Set c_t(z_0) = 1 and c_t(z_i) = 1 with probability ½.
– Expected error: 2ε.
– Bound the confidence for accuracy ε.

30 Lower Bound: Non-Feasible
Theorem: already for two hypotheses, m(ε, δ) = Ω((log 1/δ) / ε²).
Proof:
– Let H = {h_0, h_1}, where h_b(x) = b.
– Two distributions over labels:
– D_0: Prob[label = 1] = ½ - ε and Prob[label = 0] = ½ + ε
– D_1: Prob[label = 1] = ½ + ε and Prob[label = 0] = ½ - ε

