GAPS IN OUR KNOWLEDGE? A central problem in data modelling and how to get round it. Rob Harrison, AC&SE.


1 GAPS IN OUR KNOWLEDGE? A central problem in data modelling and how to get round it. Rob Harrison, AC&SE

2 Agenda: sampling (density & distribution); representation (accuracy vs generality); regularization (trading off accuracy and generality); cross validation (what happens when we can’t see)

3 The Data Modelling Problem: $y = f(x)$, multivariate & non-linear; $z = y + e$, with measurement errors; data $\{x_i, z_i\}$, $i = 1:N$, where $z_i = f(x_i) + e_i$; infer behaviour everywhere from a few examples; little or no prior information on $f(x)$

4 A Simple Example: one cycle of a sine wave; “well-spaced” samples; enough data (N = 6); noise-free
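The set-up on slides 3 and 4 can be sketched in Python with NumPy. This is an illustration only: the name `sample_sine`, the interval [0, 1] for the single cycle, Gaussian errors, and evenly spaced points as a stand-in for “well-spaced” are all assumptions, not details from the slides.

```python
import numpy as np

def sample_sine(n, noise_std=0.0, seed=0):
    """Return n evenly spaced samples (x_i, z_i) with z_i = f(x_i) + e_i.

    Assumptions: f is one cycle of a sine wave on [0, 1] and the errors
    e_i are Gaussian with standard deviation noise_std.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x)                  # y = f(x)
    z = y + noise_std * rng.standard_normal(n)   # z = y + e
    return x, z

x, z = sample_sine(6)   # the noise-free, N = 6 case
```

Passing a non-zero `noise_std` gives the noisy case that the later slides refer to as measurement error.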

7 “observational” data

12 Sparsely Sampled: two well-spaced samples

16 Densely Sampled: 200 well-spaced samples

18 Large N: 200 poorly-spaced samples

20 What’s So Hard? the gaps: get more (well-spaced) data; lack of prior knowledge; can’t see: dimension!!

21 Two Dimensions: same sampling density ($N = 6^2$); well-spaced?

25 Dimensionality: lose ability to see the shape of $f(x)$ (try it in 13-D); number of samples exponential in d: if N is OK in 1-D, $N^d$ are needed in d-D; how do we know if “well-spaced”? how can we sample where the action is? observational vs experimental data!
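The exponential growth in sample count is easy to tabulate. A minimal sketch, assuming the same six samples per axis as the 1-D example; the helper name `samples_needed` is illustrative:

```python
def samples_needed(n_per_axis, d):
    """Samples needed to keep n_per_axis points along each of d axes."""
    return n_per_axis ** d

for d in (1, 2, 3, 13):
    print(f"d = {d:2d}: {samples_needed(6, d):,} samples")
```

At d = 13 the count is already about 1.3 x 10^10, which is why simply collecting more data stops being practical.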

26 Generic Structure: use e.g. a power series (Stone-Weierstrass); other bases, e.g. Fourier series …; $y = a_5 x^5 + a_4 x^4 + a_3 x^3 + \dots + a_1 x + a_0$; order $p = 5$ & six samples: no error!
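The “no error” claim follows because a fifth-order polynomial has six coefficients, so it can pass through six samples exactly. A small sketch, assuming six evenly spaced noise-free samples of one sine cycle on [0, 1] (the sampling positions are an assumption):

```python
import numpy as np

# Six noise-free samples of one sine cycle (assumed to lie on [0, 1]).
x = np.linspace(0.0, 1.0, 6)
z = np.sin(2.0 * np.pi * x)

# Fit y = a_5 x^5 + ... + a_1 x + a_0: six unknowns, six equations.
coeffs = np.polyfit(x, z, deg=5)
fitted = np.polyval(coeffs, x)

max_sample_error = np.max(np.abs(fitted - z))
print(max_sample_error)   # zero up to rounding error
```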

27 PROBLEM SOLVED!

28 Generic Structure: use e.g. a power series (Stone-Weierstrass); other bases, e.g. Fourier series …; $y = a_5 x^5 + a_4 x^4 + a_3 x^3 + \dots + a_1 x + a_0$; order $p = 5$ & six samples: no error!; $p > 5$: still no error, but …

29 poor inter-sample behaviour; how can we know without looking?
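In 1-D we can “look” by evaluating the fit on a dense grid between the samples. The sketch below does this for polynomial orders 5 and 9. It assumes the six-sample sine example; for orders above 5 there are infinitely many exact fits, and `np.linalg.lstsq` returns the minimum-norm one, which is a choice made here and not something the slides specify.

```python
import numpy as np

def fit_power_series(x, z, order):
    """Least-squares power-series fit; minimum-norm if under-determined."""
    V = np.vander(x, order + 1)              # columns x^order ... x^0
    coeffs, *_ = np.linalg.lstsq(V, z, rcond=None)
    return coeffs

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

x = np.linspace(0.0, 1.0, 6)
z = np.sin(2.0 * np.pi * x)
x_dense = np.linspace(0.0, 1.0, 201)         # points between the samples
f_dense = np.sin(2.0 * np.pi * x_dense)

errors = {}
for order in (5, 9):
    c = fit_power_series(x, z, order)
    errors[order] = (rmse(np.polyval(c, x), z),
                     rmse(np.polyval(c, x_dense), f_dense))
    print(order, errors[order])              # (at samples, between samples)
```

The error at the samples is zero in both cases, so it says nothing about the error in between; only the dense-grid figure reveals that, and it is unavailable once we cannot see the true function.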

30 Generic Structure: use e.g. a power series (Stone-Weierstrass); other bases, e.g. Fourier series …; $y = a_5 x^5 + a_4 x^4 + a_3 x^3 + \dots + a_1 x + a_0$; order $p = 5$ & six samples: no error!; $p > 5$: still no error, but …; measurement error

32 Generic Structure: use e.g. a power series (Stone-Weierstrass); other bases, e.g. Fourier series …; $y = a_5 x^5 + a_4 x^4 + a_3 x^3 + \dots + a_1 x + a_0$; order $p = 5$ & six samples: no error!; $p > 5$: still no error, but …; measurement error; model is as complex as data
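When the model has as many coefficients as there are samples, it reproduces the measurement errors along with the signal. A hedged sketch, assuming Gaussian errors with standard deviation 0.2 (the noise level is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 6)
f = np.sin(2.0 * np.pi * x)              # true values f(x_i)
e = 0.2 * rng.standard_normal(x.size)    # assumed measurement errors e_i
z = f + e                                # what we actually observe

coeffs = np.polyfit(x, z, deg=5)         # six coefficients, six samples
fitted = np.polyval(coeffs, x)

error_vs_data = np.sqrt(np.mean((fitted - z) ** 2))    # essentially zero
error_vs_truth = np.sqrt(np.mean((fitted - f) ** 2))   # equals the noise RMS
noise_rms = np.sqrt(np.mean(e ** 2))
print(error_vs_data, error_vs_truth, noise_rms)
```

The fit error against the data is zero, yet the error against the underlying function equals the noise level: every error has been fitted exactly.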

33 Curse Of Dimension: we can still use the idea, but …; in 2-D we get 21 terms (direct and cross products); in d-D we get $(d+p)!/(d!\,p!)$ terms for order $p$; e.g. transform a 16x16 bitmap by a $p = 3$ polynomial and get ~3 million terms; sample size / distribution; practical for “small” problems
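The term counts quoted on this slide can be checked directly, since $(d+p)!/(d!\,p!)$ is the binomial coefficient "d + p choose p". The sketch assumes the 16x16 bitmap is treated as d = 256 separate inputs; the helper name is illustrative.

```python
from math import comb

def n_poly_terms(d, p):
    """Number of terms, constant included, in an order-p polynomial in d inputs."""
    return comb(d + p, p)   # (d + p)! / (d! p!)

print(n_poly_terms(1, 5))     # 6: the 1-D fifth-order example
print(n_poly_terms(2, 5))     # 21: the 2-D case
print(n_poly_terms(256, 3))   # 2862209: the 16x16 bitmap, order 3
```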

34 Other Basis Functions: Gaussian radial basis functions (additional design choices: how many? where? how wide?); adaptive sigmoidal basis functions (how many?)
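The three design choices for Gaussian radial basis functions map onto the arguments of a design-matrix builder. A minimal sketch: the function name, the common width shared by all basis functions, and the evenly spaced centres are assumptions made for illustration.

```python
import numpy as np

def grbf_design(x, centres, width):
    """Design matrix with Phi[i, j] = exp(-(x_i - c_j)^2 / (2 width^2)).

    len(centres) answers "how many?", their values "where?",
    and width "how wide?".
    """
    x = np.asarray(x, dtype=float)[:, None]
    c = np.asarray(centres, dtype=float)[None, :]
    return np.exp(-0.5 * ((x - c) / width) ** 2)

x = np.linspace(0.0, 1.0, 6)
centres = np.linspace(0.0, 1.0, 4)       # four basis functions
Phi = grbf_design(x, centres, width=0.1)
print(Phi.shape)                         # (6, 4)
```

The model output is then `Phi @ w` for a weight vector `w`, so fitting remains a linear least-squares problem once the centres and width are fixed.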

35 Overfitting: zero error? (sample data); rough components

36 Underfitting: (sample data); over-smooth components

37 Goldilocks: v. small error; “just right” components

38 Restricting “Flexibility”: can we use data to tell the estimator how to behave? regularization/penalization: penalize roughness, e.g. $\mathrm{SSE} + \lambda Q$; use potentially complex structure; data constrains where it can; Q constrains elsewhere
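A penalized fit of this kind can be sketched for a Gaussian RBF model. The slides do not define Q, so the code assumes the simplest stand-in, Q = squared norm of the weights (ridge regression), with penalty weight `lam`; the data, centres and width are likewise illustrative and are not meant to reproduce the RMSE figures on the following slides.

```python
import numpy as np

def grbf_design(x, centres, width):
    """Gaussian RBF design matrix (assumed common width)."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)

def fit_penalized(Phi, z, lam):
    """Minimise SSE + lam * Q, with Q = ||w||^2 as an assumed penalty."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ z)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
z = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

Phi = grbf_design(x, np.linspace(0.0, 1.0, 4), width=0.05)  # four narrow GRBFs
w_free = fit_penalized(Phi, z, lam=0.0)   # plain least squares
w_pen = fit_penalized(Phi, z, lam=1.0)    # penalized fit

sse_free = float(np.sum((Phi @ w_free - z) ** 2))
sse_pen = float(np.sum((Phi @ w_pen - z) ** 2))
print(np.linalg.norm(w_free), np.linalg.norm(w_pen), sse_free, sse_pen)
```

The penalty trades a slightly larger SSE for smaller weights. A genuine roughness penalty, for example one based on the curvature of the fitted function, would replace the identity matrix with a different positive semi-definite matrix.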

39 four narrow GRBFs: RMSE = 0.80

40 four narrow GRBFs + penalty: RMSE = 0.24

41 How Well Are We Doing? so far we know the answer; for d > 2 we are “blind”; what’s happening between samples? test on new samples in between: VALIDATION; compute a performance index: GENERALIZATION

42 Hold-out Method: keep back P% for testing; wasteful; sample dependent; Training RMSE 0.23, Testing RMSE 0.38
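A hold-out split can be sketched as follows. The 25% test fraction, the cubic model and the noisy sine data are assumptions chosen for illustration; they will not reproduce the RMSE values quoted on the slide.

```python
import numpy as np

def holdout_split(n, test_fraction, seed=0):
    """Return (train_idx, test_idx) after a random shuffle of n samples."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(round(test_fraction * n))
    return perm[n_test:], perm[:n_test]

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
z = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

train, test = holdout_split(x.size, test_fraction=0.25)
coeffs = np.polyfit(x[train], z[train], deg=3)   # fit on training part only
train_rmse = rmse(np.polyval(coeffs, x[train]), z[train])
test_rmse = rmse(np.polyval(coeffs, x[test]), z[test])
print(train_rmse, test_rmse)
```

Changing `seed` changes which samples are held back and therefore the test RMSE, which is the sample dependence the slide warns about.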

43 Cross Validation: leave-one-out CV (train on all but one; test that one; repeat N times; compute performance); m-fold CV (divide sample into m non-overlapping sets; proceed as above); all data used for training and testing; more work but realistic performance estimates; used to choose “hyper-parameters”, e.g. $\lambda$, number, width …
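Both variants fit one small routine, because leave-one-out CV is m-fold CV with m = N. The sketch below uses polynomial order as the hyper-parameter being chosen; that choice, the noisy sine data and the name `cv_rmse` are assumptions, and the same loop would apply to a penalty weight or an RBF width.

```python
import numpy as np

def cv_rmse(x, z, order, m, seed=0):
    """m-fold cross-validated RMSE of an order-`order` polynomial fit.

    Setting m = len(x) gives leave-one-out CV.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), m)  # non-overlapping sets
    sq_errors = []
    for test_idx in folds:
        train_mask = np.ones(x.size, dtype=bool)
        train_mask[test_idx] = False                    # train on the rest
        coeffs = np.polyfit(x[train_mask], z[train_mask], deg=order)
        pred = np.polyval(coeffs, x[test_idx])
        sq_errors.append((pred - z[test_idx]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
z = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

scores = {order: cv_rmse(x, z, order, m=5) for order in (1, 3, 5, 7)}
best_order = min(scores, key=scores.get)    # hyper-parameter choice
loo = cv_rmse(x, z, order=3, m=x.size)      # leave-one-out
print(scores, best_order, loo)
```

Every sample is used for testing exactly once and for training in the other folds, at the cost of m separate fits per candidate setting.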

44 Conclusion: gaps in our data cause most of our problems; noise; more data only part of the answer: distribution / parsimony; restricting flexibility helps, if no evidence to contrary; cross validation is a window into d-D: estimates inter-sample behaviour

