2 The Computational Complexity of Searching for Predictive Hypotheses. Shai Ben-David, Computer Science Dept., Technion.

3 Introduction. The complexity of learning is measured mainly along two axes: information and computation. Information complexity enjoys a rich theory that yields rather crisp sample-size and convergence-rate guarantees. The focus of this talk is the computational complexity of learning: while it plays a critical role in any application, its theoretical understanding is far less satisfactory.

4 Outline of this Talk: 1. Some background. 2. A survey of recent pessimistic hardness results. 3. New efficient learning algorithms for some basic learning architectures.

5 The Label Prediction Problem. Formal definition: given some domain set X, a sample S of labeled members of X is generated by some (unknown) distribution; for a next point x, predict its label. Example: data files of drivers, where the drivers in a sample are labeled according to whether they filed an insurance claim; will the current customer file a claim?

6 The Agnostic Learning Paradigm. Choose a hypothesis class H of subsets of X. For an input sample S, find some h in H that fits S well. For a new point x, predict a label according to its membership in h.
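
As a minimal sketch of this paradigm (not part of the talk), take H to be one-dimensional threshold functions; the helper names below are illustrative:

```python
# Minimal sketch of the agnostic paradigm; H = 1-D thresholds h_t(x) = 1 iff x >= t.
# (Illustrative only; any small hypothesis class could play the role of H.)

def fit_threshold(sample):
    """Return the threshold t in H whose agreement with the sample S is largest."""
    best_t, best_agree = 0.0, -1
    for t, _ in sample:                      # candidate thresholds taken from the sample
        agree = sum(1 for x, y in sample if (x >= t) == (y == 1))
        if agree > best_agree:
            best_t, best_agree = t, agree
    return best_t

def predict(t, x):
    """Label a new point by its membership in the chosen hypothesis h_t."""
    return 1 if x >= t else 0

S = [(0.2, 0), (0.4, 0), (0.9, 1), (1.1, 0), (1.5, 1)]   # toy labeled sample
t = fit_threshold(S)
print(t, predict(t, 1.3))
```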

7 The Mathematical Justification. If H is not too rich (has small VC-dimension), then for every h in H the agreement ratio of h on the sample S is a good estimate of its probability of success on a new point x.

8 The Mathematical Justification, Formally. If S is sampled i.i.d. by some distribution D over X × {0, 1}, then with probability > 1 − δ, the agreement ratio of every h in H on S is close to its probability of success on a new example.
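
One standard way to write this guarantee (a reconstruction in the usual VC-bound form; the exact constants and formulation on the original slide may differ) is:

\[
\forall h \in H:\qquad
\underbrace{\Pr_{(x,y)\sim D}\bigl[h(x)=y\bigr]}_{\text{probability of success}}
\;\ge\;
\underbrace{\frac{\bigl|\{(x,y)\in S:\, h(x)=y\}\bigr|}{|S|}}_{\text{agreement ratio}}
\;-\;
c\,\sqrt{\frac{\mathrm{VCdim}(H)+\ln(1/\delta)}{|S|}}
\]

where c is a universal constant.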

9 The Model Selection Issue. (Diagram: the class H, the best regressor for P, and the output of the learning algorithm, with the gaps between them labeled approximation error, estimation error, and computational error.)

10 The Computational Problem. Input: a finite set S of {0, 1}-labeled points in R^n. Output: some 'hypothesis' h in H that maximizes the number of correctly classified points of S.
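
For half-spaces, the quantity being maximized is the agreement count sketched below (the representation of a hypothesis as a pair (w, b) is an assumption of the sketch):

```python
import numpy as np

def agreement(w, b, S):
    """Number of labeled points (x, y), y in {0, 1}, that the half-space
    {x : w.x + b >= 0} classifies correctly."""
    X = np.array([x for x, _ in S], dtype=float)
    y = np.array([label for _, label in S])
    predictions = (X @ np.asarray(w, dtype=float) + b >= 0).astype(int)
    return int((predictions == y).sum())

# Example: three points in R^2, checked against the hyperplane x1 + x2 >= 0
S = [((1.0, 2.0), 1), ((-1.0, 0.5), 0), ((0.5, -2.0), 0)]
print(agreement([1.0, 1.0], 0.0, S))   # -> 3
```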

11 We shall focus on the class of half-spaces (linear separators). Finding the best hyperplane for a separable sample S is feasible (Perceptron algorithms). Finding the best hyperplane for arbitrary samples S is NP-hard. Finding a hyperplane approximating the optimal one for arbitrary S: ?

12 Hardness-of-Approximation Results. For each of the following classes, approximating the best agreement rate for h in H (on a given input sample S) up to some constant ratio is NP-hard: monomials, monotone monomials, half-spaces, balls, axis-aligned rectangles, and threshold NNs with constant first-layer width [BD-Eiron-Long; Bartlett-BD].

13 The SVM Solution. Rather than bothering with non-separable data, make the data separable by embedding it into some high-dimensional R^n.
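
A toy sketch of the embedding idea (the quadratic feature map and the particular separator below are illustrative choices, not the talk's construction): XOR-labeled points are not linearly separable in R^2, but become separable after the embedding.

```python
import numpy as np

def embed(x):
    """Map (x1, x2) into a higher-dimensional feature space via a simple quadratic map."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

X = [(0, 0), (0, 1), (1, 0), (1, 1)]     # XOR-style data: no separating line in R^2
y = [0, 1, 1, 0]

# In the embedded space, the hyperplane w.z + b = 0 below separates the data.
w, b = np.array([1.0, 1.0, 0.0, 0.0, -2.0]), -0.5
print([int(embed(x) @ w + b >= 0) for x in X])   # -> [0, 1, 1, 0], matching y
```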

14 A Problem with the SVM Method. In "most" cases the data cannot be made separable unless the mapping is into dimension Ω(|X|); this happens even for classes of small VC-dimension. For "most" classes, no mapping for which concept-classified data becomes separable has large margins. In all of these cases generalization is lost!

15 Data-Dependent Success. Note that the definition of success for agnostic learning is data-dependent: the success rate of the learner on S is compared to that of the best h in H. We extend this approach to a data-dependent success definition for approximations: the required success rate is a function of the input data.

16 A New Success Criterion. A learning algorithm A is γ-margin successful if, for every input S ⊆ R^n × {0, 1}, |{(x, y) ∈ S : A(S)(x) = y}| ≥ |{(x, y) ∈ S : h(x) = y and d(h, x) ≥ γ}| for every half-space h.
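
As a sketch, the right-hand side of the criterion, the number of points that a given half-space h = (w, b) classifies correctly with margin at least γ, can be computed as follows (Euclidean distance to the hyperplane is assumed):

```python
import numpy as np

def margin_correct_count(w, b, S, gamma):
    """Count points (x, y) in S, y in {0, 1}, that the half-space
    {x : w.x + b >= 0} labels correctly and whose distance to the
    separating hyperplane is at least gamma."""
    w = np.asarray(w, dtype=float)
    count = 0
    for x, y in S:
        signed = (w @ np.asarray(x, dtype=float) + b) / np.linalg.norm(w)
        correctly_labeled = (signed >= 0) == (y == 1)
        if correctly_labeled and abs(signed) >= gamma:
            count += 1
    return count
```

A γ-margin successful learner must output a hypothesis whose plain (margin-free) agreement count on S is at least this quantity for every half-space h.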

17 Some Intuition. If there exists some optimal h which separates with generous margins, then a γ-margin algorithm must produce an optimal separator. On the other hand, if every good separator can be degraded by small perturbations, then a γ-margin algorithm can settle for a hypothesis that is far from optimal.

18 A New Positive Result. For every positive γ, there is an efficient γ-margin algorithm; that is, an algorithm that classifies correctly as many input points as any half-space can classify correctly with margin γ.

19 The positive result: for every positive γ there is a γ-margin algorithm whose running time is polynomial in |S| and n. A Complementing Hardness Result: unless P = NP, no algorithm can do this in time polynomial in 1/γ (as well as in |S| and n).

20 A γ-margin Perceptron Algorithm. On input S, consider all k-size sub-samples. For each such sub-sample, find its largest-margin separating hyperplane. Among all the (~|S|^k) resulting hyperplanes, choose the one with the best performance on S. (The choice of k is a function of the desired margin γ; k ~ 1/γ².)
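
A sketch of the whole procedure in Python (scikit-learn's large-C linear SVC stands in here for the "largest-margin separating hyperplane" sub-routine; that choice and the function names are assumptions of the sketch, not the talk's exact implementation):

```python
from itertools import combinations

import numpy as np
from sklearn.svm import SVC   # large-C linear SVC approximates a maximum-margin separator

def gamma_margin_perceptron(X, y, k):
    """Enumerate all k-point sub-samples, fit a (near) maximum-margin linear
    separator to each, and keep the one classifying the most points of the
    full sample correctly.  Exhaustive over ~|S|^k sub-samples, so only
    sensible for the small k dictated by the desired margin."""
    best_clf, best_agree = None, -1
    for idx in combinations(range(len(X)), k):
        X_sub, y_sub = X[list(idx)], y[list(idx)]
        if len(set(y_sub)) < 2:               # the SVC needs both labels present
            continue
        clf = SVC(kernel="linear", C=1e6).fit(X_sub, y_sub)
        agree = int((clf.predict(X) == y).sum())
        if agree > best_agree:
            best_clf, best_agree = clf, agree
    return best_clf, best_agree

# Toy usage on a roughly linearly separable sample
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf, agree = gamma_margin_perceptron(X, y, k=3)
print(agree, "of", len(X), "points classified correctly")
```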

21 Other γ-margin Algorithms. Each of the following algorithms can replace the "find the largest-margin separating hyperplane" step: the usual Perceptron Algorithm; "find a point of equal distance from x_1, …, x_k"; Phil Long's ROMMA algorithm. These are all very fast online algorithms.
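
For reference, the first of these sub-routines, the classic Perceptron run on a sub-sample, might look like the sketch below (labels in {0, 1}; the homogeneous form and the iteration cap are additions for the sketch):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic Perceptron in homogeneous form: cycle through the points and add or
    subtract any misclassified point to the weight vector until none remain."""
    X = np.asarray(X, dtype=float)
    signs = 2 * np.asarray(y) - 1             # map labels {0, 1} -> {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for xi, si in zip(X, signs):
            if si * (w @ xi) <= 0:            # misclassified (or on the boundary)
                w += si * xi
                updated = True
        if not updated:                       # converged; a separating w was found
            break
    return w
```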

22 Directions for Further Research. Can similar efficient algorithms be derived for more complex NN architectures? How well do the new algorithms perform on real data sets? Can the 'local approximation' results be extended to more geometric functions?

