# Machine Learning Week 3 Lecture 1. Programming Competition


Programming Competition http://users-cs.au.dk/jasn/algocomp/programming_contest.pdf

Hand In

Student Comparison. After 5 minutes of training:

Train, MCC per class: 0.9481 0.9458 0.8737 0.8686 0.9017 0.8407 0.9316 0.9088 0.8493 0.8644

```
Train confusion matrix:
[[5693    0   23   19   13   44   48    6   68    9]
 [   1 6508   33   33    6   38    7   19   87   10]
 [  47   64 5192  100  123   23  103  108  163   35]
 [  27   32  142 5355    5  252   39   70  141   68]
 [  12   31   45    6 5378    8   61   13   43  245]
 [  91   46   44  220   70 4522  109   29  210   80]
 [  42   18   55    7   54   77 5605    3   54    3]
 [  42   63  108   29   76   10    4 5718   18  197]
 [  29  139   84  154   25  146   48   31 5095  100]
 [  38   31   37   91  211   43    4  192   63 5239]]
```

Test, MCC per class: 0.9525 0.9643 0.8855 0.8848 0.9060 0.8585 0.9308 0.9068 0.8543 0.8777

```
Test confusion matrix:
[[ 957    0    2    2    0    5    9    1    4    0]
 [   0 1102    2    4    0    2    4    1   20    0]
 [  11    4  903   15   15    1   14   18   42    9]
 [   4    0   21  909    0   32    3   11   20   10]
 [   1    2    6    1  913    1   11    1    8   38]
 [  11    2    5   41   11  750   16   11   38    7]
 [  13    3    4    3   15   11  906    1    2    0]
 [   3   12   25    7    9    0    0  936    3   33]
 [   9    8    9   25    8   20   12   15  859    9]
 [  11    8    5   11   42   10    0   20    8  894]]
```

Today:
- Recap
- Learning with infinite hypothesis sets
- Bias-variance
- The end of learning theory…

Recap

The Test Set – Take a look at E_out. Fix a hypothesis h, N independent data points, and any ε > 0. Split your data into two parts, D_train and D_test. Train on D_train and select hypothesis h. Test h on D_test to get the test error E_test(h). Apply the Hoeffding bound: P[|E_test(h) − E_out(h)| > ε] ≤ 2e^{−2ε²N}. The test set cannot be used for anything else.
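A minimal sketch of the Hoeffding computation (Python; the function name and the numbers are illustrative, not from the lecture):

```python
import numpy as np

def hoeffding_bound(n_test, epsilon):
    """Upper bound on P[|E_test - E_out| > epsilon] for a single fixed
    hypothesis evaluated on n_test independent test points."""
    return 2.0 * np.exp(-2.0 * epsilon**2 * n_test)

# With 1000 held-out points, the probability that the test error is off
# from E_out by more than 0.05 is at most about 1.3%:
print(hoeffding_bound(1000, 0.05))
```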

Generalization Bound Goal: Extend to infinite hypothesis spaces

Dichotomies. A dichotomy is a bit string of length N. Fix a set of N points X = (x_1, …, x_N). Each hypothesis h in the hypothesis set H gives a dichotomy (h(x_1), …, h(x_N)).

Growth, Shattering, Breaks. If m_H(N) = 2^N, i.e. H realizes every dichotomy on some set of points (x_1, …, x_N), then we say that H shatters (x_1, …, x_N). If no data set of size k can be shattered by H, then k is a break point for H.

Revisit Examples: positive rays (threshold a), intervals (endpoints a_1, a_2), convex sets. [Figure: an impossible dichotomy for each model.]

2D Hyperplanes. 3 is not a break point: hyperplanes shatter some point set of size 3.

2D Hyperplanes. 4 is a break point: every set of 4 points admits an impossible dichotomy — 3 points on a line, a triangle with a point inside, or else 4 points in convex position where labeling opposite corners alike cannot be realized.

Very Important Theorem. If H has a break point k, then the growth function is polynomial in N, of degree at most k − 1: m_H(N) ≤ Σ_{i=0}^{k−1} C(N, i).
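The polynomial bound (Sauer's lemma, the B(N, k) bound in the book) is easy to evaluate; a small sketch, with the function name as an assumption:

```python
from math import comb

def growth_bound(N, k):
    """Sauer bound: a hypothesis set with break point k satisfies
    m_H(N) <= sum_{i=0}^{k-1} C(N, i), a polynomial of degree k-1."""
    return sum(comb(N, i) for i in range(k))

# For k = 3 the bound is 1 + N + N(N-1)/2, quadratic in N:
print(growth_bound(10, 3))  # 56 = 1 + 10 + 45, far below 2^10 = 1024
```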

Growth Function Generalization Bound. If the growth function is polynomial, we can learn with infinite hypothesis sets!!! VC Theorem. Proof: book appendix (intuition in the book).

VC Dimension. The VC dimension of a hypothesis set H is the maximal number of points it can shatter, i.e. the maximum N such that m_H(N) = 2^N. It is denoted d_vc, and equals the smallest break point minus one.
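For positive rays the VC dimension can be verified by brute force; a sketch (the helper is hypothetical, not from the lecture):

```python
def dichotomies_positive_rays(points):
    """All dichotomies h_a(x) = sign(x - a) can realize on the given points:
    try one threshold in each gap between (sorted) points and beyond the ends."""
    pts = sorted(points)
    thresholds = ([pts[0] - 1.0]
                  + [(pts[i] + pts[i + 1]) / 2.0 for i in range(len(pts) - 1)]
                  + [pts[-1] + 1.0])
    return {tuple(1 if x > a else -1 for x in pts) for a in thresholds}

print(len(dichotomies_positive_rays([0.5])))       # 2 = 2^1: one point shattered
print(len(dichotomies_positive_rays([0.2, 0.8])))  # 3 < 2^2: (+1, -1) impossible
```

Two points are never shattered (the dichotomy +1, −1 would need a threshold both below the left point and above the right), so 2 is a break point and d_vc = 1.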

Revisit Examples:

| Hypothesis set | VC dimension |
| --- | --- |
| Positive rays (threshold a) | 1 |
| Intervals (endpoints a_1, a_2) | 2 |
| Convex sets | ∞ (unbreakable) |
| 2D hyperplanes | 3 |

VC Dimension for Linear Classification. For d-dimensional inputs (d + 1 parameters including the bias), the VC dimension is d + 1. Proof coming up: show d_vc ≥ d + 1, then show d_vc < d + 2.

d_vc ≥ d + 1. Find a set of d + 1 points we can shatter. Idea: make the points (vectors) independent by using one dimension for each point. Let X be the (d+1)×(d+1) matrix whose rows are the points (with the bias coordinate); X is invertible.

d_vc ≥ d + 1. Consider any dichotomy y ∈ {−1, +1}^{d+1}. Find θ such that sign(Xθ) = y: solve Xθ = y, i.e. θ = X^{−1}y, which exists because X is invertible.
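The construction can be checked numerically; a sketch for d = 3, assuming the standard choice of points (the origin plus the d basis vectors):

```python
import numpy as np
from itertools import product

d = 3
# Rows: the origin and the d standard basis vectors, each with a leading 1
# for the bias coordinate -> an invertible (d+1) x (d+1) matrix X.
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

# For every dichotomy y, theta = X^{-1} y satisfies sign(X theta) = y.
shattered = all(
    np.array_equal(np.sign(X @ np.linalg.solve(X, np.array(y))), np.array(y))
    for y in product([-1.0, 1.0], repeat=d + 1)
)
print(shattered)  # True: these d+1 points are shattered
```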

d_vc < d + 2. For any d + 2 points we must prove there is a dichotomy hyperplanes cannot capture. Consider the points as (d+1)-dimensional vectors (with the bias coordinate). They must be linearly dependent (more vectors than dimensions): some point x_j = Σ_{i≠j} a_i x_i, with not all a_i zero (ignore the zero terms).

d_vc < d + 2. Dichotomy: y_i = sign(a_i) for the points with a_i ≠ 0, and y_j = −1. Claim: this dichotomy is impossible to capture with hyperplanes — if sign(θᵀx_i) = sign(a_i) for all i ≠ j, then θᵀx_j = Σ_{i≠j} a_i θᵀx_i > 0, so sign(θᵀx_j) cannot be −1.

d_vc = d + 1. For hyperplanes, d_vc equals the number of free parameters.

Vapnik-Chervonenkis (VC) Generalization Bound. For any tolerance δ > 0, with probability at least 1 − δ: E_out(g) ≤ E_in(g) + √((8/N) ln(4 m_H(2N)/δ)). Quote from the book: "The VC generalization bound is the most important mathematical result in the theory of learning."
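Plugging numbers into the bound; a sketch using the polynomial bound m_H(2N) ≤ (2N)^{d_vc} + 1 (function name illustrative):

```python
import numpy as np

def vc_epsilon(N, d_vc, delta):
    """Generalization error bar: with probability >= 1 - delta,
    E_out <= E_in + sqrt((8/N) * ln(4 * m_H(2N) / delta)),
    bounding the growth function by m_H(2N) <= (2N)^d_vc + 1."""
    mH_2N = (2.0 * N) ** d_vc + 1.0
    return np.sqrt(8.0 / N * np.log(4.0 * mH_2N / delta))

# Even with 100,000 samples and d_vc = 3 the error bar is about 0.057:
print(vc_epsilon(100_000, 3, 0.1))
```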

VC Generalization Bound. Independent of the learning algorithm, the target function, P(x), and the out-of-sample error — general indeed. We need to bound the growth function for all of our hypothesis spaces; for hyperplanes we showed it is governed by the number of free parameters.

Exercise 2.5. With N = 100, what is the probability that E_in is within 0.1 of E_out? The bound only guarantees this "with probability at least 1 − δ" for a δ greater than 1, i.e. 1 − δ < 0 — ridiculous.
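The numbers behind the "ridiculous" remark; a sketch assuming, as in the book's exercise, the simple growth function m_H(N) = N + 1:

```python
import numpy as np

N, eps = 100, 0.1
mH_2N = 2 * N + 1                                  # m_H(2N) with m_H(N) = N + 1
delta = 4 * mH_2N * np.exp(-(eps ** 2) * N / 8.0)  # VC bound on the bad event
print(delta)  # ~709.5 -- "probability at least 1 - delta" is meaningless here
```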

Cost of Generality. The growth function really was worst case: independent of P(x), the target, and the out-of-sample error.

Sample Complexity. Fix the tolerance δ (success probability ≥ 1 − δ) and require generalization error at most ε. How big must N be? Upper bound the growth function further using the VC-dimension polynomial m_H(2N) ≤ (2N)^{d_vc} + 1, giving N ≥ (8/ε²) ln(4((2N)^{d_vc} + 1)/δ).

Sampling Complexity. Plug in ε = δ = 0.1 and plot:

```matlab
function sc_vcdim()
  % Required sample size as a function of N, for eps = delta = 0.1:
  % the bound becomes N >= 800 * ln(40 * ((2N)^dvc + 1)).
  dat = 5000:2500:100000;
  hold off;
  plot(dat, dat, 'r-', 'linewidth', 3);        % reference line N = N
  hold on;
  for i = 3:9                                  % VC dimensions 3..9
    tmp = 800 * log(40 * ((2.*dat).^i + 1));   % right-hand side of the bound
    plot(dat, tmp, 'b-', 'linewidth', 3);
  end
end
```
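The same implicit bound can be driven to a fixed point directly; a Python sketch (function name illustrative), iterating N = (8/ε²) ln(4((2N)^{d_vc} + 1)/δ):

```python
import numpy as np

def sample_complexity(d_vc, eps=0.1, delta=0.1, n0=1000.0, iters=100):
    """Iterate the implicit bound N >= (8/eps^2)*ln(4*((2N)^d_vc + 1)/delta)
    until it settles at a fixed point."""
    N = n0
    for _ in range(iters):
        N = 8.0 / eps ** 2 * np.log(4.0 * ((2.0 * N) ** d_vc + 1.0) / delta)
    return N

# For d_vc = 3 and eps = delta = 0.1 the bound asks for roughly 30,000 samples:
print(round(sample_complexity(3)))
```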

Sampling Complexity. Book statement: the theory suggests N ≈ 10,000 · d_vc. In practice, N ≈ 10 · d_vc is often enough.

VC Interpretation. We can learn with infinite hypothesis sets. The VC dimension captures the effective number of parameters / degrees of freedom. Out-of-sample error ≈ in-sample error + model complexity.

As a figure: [error versus VC dimension — the in-sample error decreases, the model complexity penalty increases, and the out-of-sample error is their sum; balance these.]

Model Selection. Given t models m_1, …, m_t, which is better? Compute E_in(m_1) + Ω(m_1), E_in(m_2) + Ω(m_2), …, E_in(m_t) + Ω(m_t) and pick the minimum one. Problem: slack in Ω.
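A minimal sketch of the selection rule (the error and penalty values are made up for illustration):

```python
# Hypothetical in-sample errors E_in and complexity penalties Omega
# for t = 3 models.
models = {
    "m1": {"E_in": 0.05, "omega": 0.30},
    "m2": {"E_in": 0.10, "omega": 0.10},
    "m3": {"E_in": 0.20, "omega": 0.02},
}

# Pick the model minimizing E_in(m) + Omega(m).
best = min(models, key=lambda m: models[m]["E_in"] + models[m]["omega"])
print(best)  # m2: penalized error 0.20 beats 0.35 (m1) and 0.22 (m3)
```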

Vapnik-Chervonenkis (VC) Theorem vs. the test set. The test-set estimate (a single fixed hypothesis, M = 1) is a lot tighter: for any δ > 0, with probability 1 − δ, E_out(h) ≤ E_test(h) + √((1/(2N)) ln(2/δ)).

Learning Summary. There is theory for regression as well; we will not go there. Move on to bias-variance — the last learning theory in this course.

Bias-Variance Decomposition. Consider the least-squares error measure again and see if we can understand the out-of-sample error. For simplicity, assume our target function is noiseless, i.e. it is an exact function of the input.

Experiments. Take two models, ax + b and cx² + dx + e, and a target function on ¼ < x < ¾. Repeatedly pick 3 random data points (x, target(x)), fit our models, plot, and see.

Bias-Variance. The out-of-sample error we get depends on the hypothesis. The hypothesis is the result of the learning algorithm, which is a function of the training data, so the training data affects the out-of-sample error. Think of the data set as a random variable and analyze what happens if we repeatedly sample data and run our learning algorithm.

Notation: g^(D) = the hypothesis learned on data set D.

Bias-Variance. With ḡ(x) = E_D[g^(D)(x)] the average hypothesis, E_D[E_out(g^(D))] = E_x[bias(x) + var(x)], where bias(x) = (ḡ(x) − f(x))² and var(x) = E_D[(g^(D)(x) − ḡ(x))²].

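The decomposition can be verified by simulation; a sketch using a hypothetical noiseless target sin(πx) and the constant model h(x) = b learned from 2-point data sets:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)   # hypothetical noiseless target
grid = np.linspace(-1.0, 1.0, 200)

# Learning algorithm: fit a constant b = mean of the two observed y values.
hyps = np.array([np.full_like(grid, f(rng.uniform(-1.0, 1.0, 2)).mean())
                 for _ in range(5000)])

g_bar = hyps.mean(axis=0)                # average hypothesis g_bar(x)
bias = ((g_bar - f(grid)) ** 2).mean()   # E_x[(g_bar(x) - f(x))^2]
var = hyps.var(axis=0).mean()            # E_x[E_D[(g^(D)(x) - g_bar(x))^2]]
e_out = ((hyps - f(grid)) ** 2).mean()   # E_D[E_out(g^(D))]
print(bias + var, e_out)  # the two quantities agree: E_D[E_out] = bias + var
```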

Learning Curves. A plot of the in-sample and out-of-sample error as a function of the training set size N.