1 Machine Learning Week 2 Lecture 2

2 Hand In
The hand-in is online. There is a web board forum for Matlab questions. Comments and corrections are very welcome; I will upload new versions as we go along (currently we are at version 3). Your data is coming, and we might change it over time.

3 Quiz Go through all Questions

4 Recap

5 Impossibility of Learning!
A truth table over Boolean inputs x1, x2, x3 with the value of f(x) unknown outside the sample: what is f? There are 256 potential functions, and 8 of them have in-sample error 0. Assumptions are needed.

6 No Free Lunch
"All models are wrong, but some models are useful." — George Box
Machine learning has many different models and algorithms. There is no single model that works best for all problems (the No Free Lunch theorem). Assumptions that work well in one domain may fail in another.

7 Probabilistic Approach
Flip a coin with unknown heads probability μ, repeating N times independently. Sample: h,h,h,t,t,h,t,t,h. Sample mean: ν = #heads/N. Hoeffding's Inequality: P[|ν − μ| > ε] ≤ 2·exp(−2ε²N), so the sample mean is probably approximately correct (PAC).
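A minimal simulation sketch of this (Python with NumPy; the bias μ and the values of N, ε, and the trial count are illustrative, not from the lecture), comparing the empirical frequency of a large deviation with the Hoeffding bound:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.5, 100, 0.1, 100_000   # illustrative values

# Each trial: flip the coin N times and record the sample mean nu = #heads / N.
nu = rng.binomial(N, mu, size=trials) / N

# Empirical probability that nu deviates from mu by more than eps.
p_bad = np.mean(np.abs(nu - mu) > eps)

# Hoeffding: P[|nu - mu| > eps] <= 2 * exp(-2 * eps^2 * N).
bound = 2 * np.exp(-2 * eps**2 * N)
print(f"empirical: {p_bad:.4f}   Hoeffding bound: {bound:.4f}")
```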

8 Classification Connection: Testing a Hypothesis
Unknown target f, fixed hypothesis h, probability distribution P(x) over the inputs. μ is the probability of picking x such that f(x) ≠ h(x); 1 − μ is the probability of picking x such that f(x) = h(x). In other words, μ is the sum of the probabilities of all points x where the hypothesis is wrong, i.e. the true error rate, and the sample mean ν is the error rate measured on the sample.

9 Learning? Only Verification, not Learning
Testing one fixed hypothesis only verifies it; learning means picking a hypothesis from a set. For finite hypothesis sets we used the union bound: P[|E_in(h) − E_out(h)| > ε for some h in H] ≤ 2M·exp(−2ε²N), with M = |H|. Make sure E_in is close to E_out and minimize E_in.
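To see why the factor M appears, one can treat each of M hypotheses as its own coin and ask how often the worst of them deviates. A sketch under the same assumptions as above (Python/NumPy, illustrative values; note that the union bound may exceed 1 and still be valid):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, N, eps, M, trials = 0.5, 100, 0.1, 50, 100_000   # illustrative values

# Each of the M "hypotheses" gets its own sample of N flips; nu has shape (trials, M).
nu = rng.binomial(N, mu, size=(trials, M)) / N

# Probability that at least one of the M sample means is off by more than eps.
p_some_bad = np.mean(np.any(np.abs(nu - mu) > eps, axis=1))

# Union bound: P[some h is bad] <= 2 * M * exp(-2 * eps^2 * N).
bound = 2 * M * np.exp(-2 * eps**2 * N)
print(f"empirical P(some bad): {p_some_bad:.3f}   union bound: {bound:.3f}")
```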

10 Error Functions
Two applications, same kind of classifier (h(x) vs f(x) in {lying, true}), but different cost matrices: correct predictions cost 0, and the two error types cost 1000 and 1, on opposite errors in the two applications.
Walmart (discount for a given person): rejecting an honest customer is the expensive mistake.
CIA access (Friday bar stock): accepting a liar is the expensive mistake.
The point being: the error function depends on the application.
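A small sketch of such an application-dependent error function (Python/NumPy; the function name, toy labels, and example predictions are made up for illustration — only the costs 1000 and 1 come from the slide):

```python
import numpy as np

def weighted_error(h, f, false_accept_cost, false_reject_cost):
    """Average cost of predictions h vs truth f; labels: +1 = truthful/accept, -1 = lying/reject.

    false_accept_cost: cost when h says +1 but f is -1 (a liar gets accepted).
    false_reject_cost: cost when h says -1 but f is +1 (an honest person gets rejected).
    """
    h, f = np.asarray(h), np.asarray(f)
    cost = np.where((h == 1) & (f == -1), false_accept_cost, 0.0)
    cost = cost + np.where((h == -1) & (f == 1), false_reject_cost, 0.0)
    return cost.mean()

f = np.array([ 1,  1, -1, -1,  1])   # true labels
h = np.array([ 1, -1, -1, -1,  1])   # predictions: one false reject, no false accepts

print(weighted_error(h, f, false_accept_cost=1000, false_reject_cost=1))    # CIA-style:     0.2
print(weighted_error(h, f, false_accept_cost=1, false_reject_cost=1000))    # Walmart-style: 200.0
```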

11 Unknown Probability Distribution P(x)
Final diagram of the learning setup: unknown target (we learn the important part, P(y | x)), unknown probability distribution P(x), data set, error measure e, hypothesis set, learning algorithm, final hypothesis. If x has very low probability then it does not really count.

12 Today
We are still only talking about classification. Test sets. Work towards learning with infinite-size hypothesis spaces for classification: reinvestigate the union bound, dichotomies, break points.

13 The Test Set
Hoeffding: for a fixed hypothesis h, N independent data points, and any ε > 0, P[|E_in(h) − E_out(h)| > ε] ≤ 2·exp(−2ε²N). Split your data into two parts, D-train and D-test. Train on D-train and select a hypothesis h. Test h on D-test, giving the test error E_test(h). Since h is fixed with respect to D-test, apply the Hoeffding bound to E_test(h).

14 Test Set
Strong bound: with 1000 test points, with 98% probability the test error will be within 5% of the out-of-sample error. Unbiased: just as likely to be better than worse. Problems: we lose data for training, and if the error is high it is no consolation that it will also be high in practice. The test set can NOT be used to select h (contamination).
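A quick check of the "1000 points, 98%, 5%" claim using the Hoeffding bound (Python):

```python
import math

N_test, eps = 1000, 0.05
delta = 2 * math.exp(-2 * eps**2 * N_test)   # P[|E_test - E_out| > eps] <= delta
print(f"failure probability <= {delta:.4f}, confidence >= {1 - delta:.1%}")
# failure probability <= 0.0135, confidence >= 98.7%
```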

15 Learning With Probability 1 − δ
Pick a tolerance (risk) δ of failure that you can accept. Set the right-hand side of the bound equal to δ and solve for ε: with M hypotheses, 2M·exp(−2ε²N) = δ gives ε = sqrt(ln(2M/δ) / (2N)). With probability 1 − δ we get the generalization bound E_out(g) ≤ E_in(g) + sqrt(ln(2M/δ) / (2N)). This is why we minimize the in-sample error.
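A sketch of solving for ε, assuming the finite-M right-hand side 2M·exp(−2ε²N) (Python; the values of M, N, and δ are illustrative):

```python
import math

def generalization_eps(M, N, delta):
    """Solve 2 * M * exp(-2 * eps^2 * N) = delta for eps."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# With probability >= 1 - delta:  E_out(g) <= E_in(g) + eps
print(generalization_eps(M=1, N=1000, delta=0.05))     # ~0.043: a single hypothesis
print(generalization_eps(M=1000, N=1000, delta=0.05))  # ~0.073: a larger hypothesis set costs generalization
```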

16 Union Bound
P(A1 ∪ A2 ∪ … ∪ AM) ≤ P(A1) + P(A2) + … + P(AM).

17 Union Bound Learning
The learning algorithm picks some hypothesis h_l. P(h_l is bad) is at most the probability that some hypothesis is bad, which the union bound bounds by the sum over all hypotheses. But we did not subtract overlapping events!!!

18 Hypotheses Seem Correlated
Change h1 slightly into h2: the difference in E_out is only the small triangle between the two decision boundaries. If h1 is bad (poor generalization) then probably so is h2. Hope: improve on the union bound result.

19 Goal
Replace M with something like an effective number of hypotheses. We want a general bound, i.e. independent of the target function and the input distribution. Simple would be nice.

20 Look at Finite Point Sets
Fix a set of N points X = (x1,…,xN). Each hypothesis h in the hypothesis set H gives a dichotomy (h(x1),…,h(xN)), a bit string of length N. How many different dichotomies do we get? At most 2^N. This captures the "expressiveness" of the hypothesis set on X.
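A brute-force sketch of this counting (Python): represent each hypothesis by the label vector it produces on the fixed points and count the distinct vectors. The disc hypothesis set and the points are made up for illustration:

```python
def dichotomies(hypotheses, points):
    """Distinct label vectors (h(x1), ..., h(xN)) produced on a fixed point set."""
    return {tuple(h(x) for x in points) for h in hypotheses}

# Illustration: hypotheses are origin-centred discs in the plane, h_r(x) = +1 iff ||x|| <= r.
points = [(0.15, 0.0), (0.3, 0.4), (0.0, -0.75), (0.6, 0.6)]
radii = [i / 10 for i in range(12)]
hypotheses = [lambda p, r=r: +1 if (p[0] ** 2 + p[1] ** 2) ** 0.5 <= r else -1 for r in radii]

print(len(dichotomies(hypotheses, points)))   # at most 2^4 = 16; these discs realize only 5
```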

21 Growth Function
The growth function is the maximal number of dichotomies over all choices of N points: m_H(N) = max over X = (x1,…,xN) of |{(h(x1),…,h(xN)) : h in H}| ≤ 2^N.

22 Example 1: Positive Rays
1-dimensional input space (points on the real line). Hypotheses h_a(x) = +1 if x > a, else −1. The dichotomy only changes when a moves to a different interval between the points, so m_H(N) = N + 1.

23 Example 2: Intervals
1-dimensional input space (points on the real line). Hypotheses are +1 inside an interval [a1, a2] and −1 outside. Choosing a1 and a2 in separate gaps between the points gives C(N+1, 2) dichotomies; putting both in the same gap adds the all-minus dichotomy, so m_H(N) = C(N+1, 2) + 1.

24 Example 3: Convex Sets
2-dimensional input space (points in the plane). Hypotheses are +1 inside a convex region and −1 outside. Place the N points on a circle (the circle is just for illustration): any subset can be enclosed by a convex polygon, so every dichotomy is realizable and m_H(N) = 2^N.
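A sketch that enumerates the dichotomies of positive rays and of intervals exactly, by placing the threshold(s) in the gaps between the sorted points, and compares the counts with the formulas above (Python; assumes distinct 1-D points):

```python
from itertools import combinations
from math import comb

def gaps(xs):
    """One representative threshold per gap between (and outside) the sorted points."""
    xs = sorted(xs)
    return [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]

def ray_dichotomies(xs):
    """Positive rays h_a(x) = +1 iff x > a."""
    return {tuple(+1 if x > a else -1 for x in sorted(xs)) for a in gaps(xs)}

def interval_dichotomies(xs):
    """Intervals: +1 iff a1 <= x <= a2, with a1 < a2 chosen in the gaps."""
    dich = {tuple(+1 if a1 <= x <= a2 else -1 for x in sorted(xs))
            for a1, a2 in combinations(gaps(xs), 2)}
    dich.add(tuple(-1 for _ in xs))   # both ends in the same gap: the all-minus dichotomy
    return dich

xs = [0.3, 1.1, 2.0, 2.7, 3.5]        # N = 5 distinct points
N = len(xs)
print(len(ray_dichotomies(xs)), N + 1)                     # 6 6
print(len(interval_dichotomies(xs)), comb(N + 1, 2) + 1)   # 16 16
```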

25 Goal Continued
Generalization bound: imagine we can replace M with the growth function m_H(N). The right-hand side of the bound, 2M·exp(−2ε²N), drops exponentially fast in N, so if the growth function is only a polynomial in N, the right-hand side still drops exponentially in N. Bright idea: (1) prove the growth function is polynomial in N, and (2) prove we can replace M with the growth function.

26 Bounding the Growth Function
The growth function might be hard to compute exactly. Instead of computing the exact value, prove that it is bounded by a polynomial.

27 Shattering and Break Point
If H realizes all 2^N dichotomies on (x1,…,xN), we say that H shatters (x1,…,xN). If no data set of size k can be shattered by H, then k is a break point for H. If k is a break point for H, then so is every number larger than k. Why?

28 Revisit Examples
Positive rays: m_H(N) = N + 1, break point k = 2. Intervals: m_H(N) = C(N+1, 2) + 1, break point k = 3. Convex sets: m_H(N) = 2^N, no break point.

29 2D Linear Classification (Hyperplanes)

30 2D Linear Classification
3 points on a line cannot be shattered, but the growth function takes the maximum over point sets, and 3 points in general position can be shattered, so m_H(3) = 8. No set of 4 points can be shattered, so for the 2D linear classification hypothesis set, 4 is a break point.
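A heuristic sketch of this (Python/NumPy): sample many random lines sign(w·x + b) and collect the dichotomies they realize on 3 and on 4 points in general position. Random sampling can only under-count, but it suggests that 3 points are shattered while 4 points never are (the two XOR-style labelings never appear):

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_dichotomies(points, n_lines=200_000):
    """Dichotomies realized on `points` by randomly sampled lines sign(w . x + b)."""
    X = np.hstack([points, np.ones((len(points), 1))])   # append a bias coordinate
    W = rng.normal(size=(n_lines, 3))                     # random (w1, w2, b)
    labels = np.sign(X @ W.T).T                           # shape (n_lines, n_points)
    labels[labels == 0] = 1
    return {tuple(row.astype(int)) for row in labels}

three = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
four  = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

print(len(sampled_dichotomies(three)))   # 8 = 2^3: three points in general position are shattered
print(len(sampled_dichotomies(four)))    # 14 < 2^4: these four points cannot be shattered
```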

31 Break Points and Growth Function
If H has a break point then the growth function is polynomial in N (needs proof). If not, then it is not: if H has no break point then, by definition of break point, m_H(N) = 2^N for all N.

32 Break Point Game
The hypothesis set has break point 2. On three points x1, x2, x3, add dichotomies (rows of a table of labels) one at a time, checking after each row whether some pair of points is shattered; at some point adding any further row becomes impossible. (The slide works through the rows of this table.)
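A brute-force sketch of the game (Python): among the 2^3 = 8 possible dichotomies on x1, x2, x3, find the largest collection in which no pair of points is shattered, i.e. no two columns show all four label combinations. The maximum turns out to be 4, matching B(3,2) from the coming slides:

```python
from itertools import combinations, product

n_points, k = 3, 2
all_dich = list(product([0, 1], repeat=n_points))   # all 8 dichotomies on 3 points

def shatters_some_k_subset(dichs, k):
    """True if some k columns show all 2^k label combinations across the chosen dichotomies."""
    for cols in combinations(range(n_points), k):
        if len({tuple(d[c] for c in cols) for d in dichs}) == 2 ** k:
            return True
    return False

best = max((subset for r in range(len(all_dich) + 1)
            for subset in combinations(all_dich, r)
            if not shatters_some_k_subset(subset, k)),
           key=len)
print(len(best), best)   # 4, e.g. (0,0,0), (0,0,1), (0,1,0), (1,0,0)
```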

33 Proof Coming
If H has a break point, then the growth function is polynomial in N.

34 B(N,k) is the maximal number of dichotomies
Definition: B(N,k) is the maximal number of dichotomies possible on N points such that no subset of k points can be shattered by the dichotomies. This is more general than hypothesis sets: if no data set of size k can be shattered by H, then k is a break point for H, and m_H(N) ≤ B(N,k) for any H with break point k. B stands for binomial.

35 Computing B(N,k) – Boundary Cases
B(N,1) = 1: with break point 1 we cannot shatter any set of size 1, so no two dichotomies may give different classes to any point; since a second dichotomy would differ from the first on at least one point, there is only one dichotomy. B(1,k) = 2 for k > 1: there is only one point, so only 2 dichotomies are possible.

36 Computing B(N,k) – Recursion
Take a list L of B(N,k) dichotomies that attains the maximum. Group them by their first N−1 bits: S1 contains the α dichotomies whose first N−1 bits occur only once in L, and S2 contains the 2β dichotomies whose first N−1 bits occur twice, once with xN labelled 0 (this half is S2_0) and once with xN labelled 1 (S2_1). Then B(N,k) = α + 2β.

37 Recursion
Consider the first N−1 points. On them there are α + β distinct dichotomies (S2_0 and S2_1 are identical on the first N−1 bits by construction). They still cannot shatter any k points, since a shattered subset of size k among the first N−1 points would also be shattered on all N points. Hence α + β ≤ B(N−1, k). Now consider only S2_0 restricted to the first N−1 points. If these β dichotomies could shatter some k−1 points, then together with S2_1 we could shatter k points on the full set: for every dichotomy shattering those k−1 points we have both labels of xN available, so we shatter the same k−1 points plus xN, contradicting break point k. Hence β ≤ B(N−1, k−1), and B(N,k) = α + 2β = (α + β) + β ≤ B(N−1, k) + B(N−1, k−1).

38 Proof Coming
Claim: B(N,k) ≤ sum over i = 0,…,k−1 of C(N,i). Base cases: B(N,1) = 1 = C(N,0), and B(1,k) = 2 = C(1,0) + C(1,1) for k > 1.

39 Induction Step
Assume the bound holds for N0 (and all k); show it for N0 + 1 and k > 1 (k = 1 was a base case). By the recursion, B(N0+1, k) ≤ B(N0, k) + B(N0, k−1) ≤ sum_{i=0}^{k−1} C(N0, i) + sum_{i=0}^{k−2} C(N0, i). The lower limit of the second sum should be 0; change the summation parameter to rewrite it as sum_{i=1}^{k−1} C(N0, i−1).

40 Continue
Make it into one sum: pull out the i = 0 term, combine the remaining terms with the recurrence for binomials, C(N0, i) + C(N0, i−1) = C(N0+1, i), and add the zero index back in: B(N0+1, k) ≤ sum_{i=0}^{k−1} C(N0+1, i). QED
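A sketch that implements the recursion with the two boundary cases and checks the binomial-sum bound numerically (Python). With these boundary cases the recursion reproduces the sum exactly, via Pascal's rule; the proof only needs the "≤" direction:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    """Upper bound on the maximal number of dichotomies on N points with no
    k-subset shattered, via B(N,k) <= B(N-1,k) + B(N-1,k-1)."""
    if k == 1:
        return 1          # cannot shatter any single point: only one dichotomy
    if N == 1:
        return 2          # one point: only two dichotomies
    return B(N - 1, k) + B(N - 1, k - 1)

for N in range(1, 8):
    for k in range(1, 5):
        assert B(N, k) <= sum(comb(N, i) for i in range(k))
print("B(N,k) <= sum_{i=0}^{k-1} C(N,i) holds for all checked N, k")
```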

