A Syntactic Justification of Occam’s Razor 1 John Woodward, Andy Evans, Paul Dempster Foundations of Reasoning Group University of Nottingham Ningbo, China.


1 A Syntactic Justification of Occam’s Razor
John Woodward, Andy Evans, Paul Dempster
Foundations of Reasoning Group, University of Nottingham Ningbo, China 宁波诺丁汉大学

2 Overview
• Occam’s Razor
• Sampling of Program Spaces (Langdon)
• Definitions
• Assumptions
• Proof
• Further Work
• Context

3 Occam’s Razor
• Occam’s Razor has been adopted by the machine learning community to mean:
• “Given two hypotheses which agree with the observed data, pick the simplest, as this is more likely to make the correct predictions”

4 Definitions
• Program: a hypothesis; characterised by its size.
• Function: a set of predictions (a concept); characterised by its complexity.

6 Langdon 1 (Foundations of Genetic Programming)
1. The limiting distribution of functions is independent of program size.
2. There is a correlation between the frequency of a function in the limiting distribution and the complexity of that function.
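One way to see how the distribution of functions changes with program size is to count it exactly for a toy language. The sketch below is not Langdon's experiment: it assumes a hypothetical language with two Boolean inputs x and y and a single NAND primitive, and the names `counts_by_size` and `distribution` are illustrative only. It uses dynamic programming over truth tables to count how many trees with exactly n NAND nodes compute each function.

```python
from collections import Counter
from fractions import Fraction

# Assumed toy language (not from the slides): inputs x, y and one primitive, NAND.
# A function of two Boolean inputs is stored as a 4-bit truth table.
X, Y = 0b1100, 0b1010


def nand(a, b):
    """NAND of two truth tables, kept to 4 bits."""
    return ~(a & b) & 0b1111


def counts_by_size(max_nodes):
    """counts[n][f] = number of trees with exactly n NAND nodes computing f."""
    counts = [Counter({X: 1, Y: 1})]  # size 0: the two leaves
    for n in range(1, max_nodes + 1):
        level = Counter()
        # The root NAND joins a left subtree with i nodes and a right one with n-1-i.
        for i in range(n):
            for fa, ca in counts[i].items():
                for fb, cb in counts[n - 1 - i].items():
                    level[nand(fa, fb)] += ca * cb
        counts.append(level)
    return counts


def distribution(counter):
    """Turn raw counts into exact relative frequencies."""
    total = sum(counter.values())
    return {f: Fraction(c, total) for f, c in counter.items()}


counts = counts_by_size(8)
for n, level in enumerate(counts):
    dist = distribution(level)
    top = max(dist, key=dist.get)
    # size, number of programs, most frequent truth table, its frequency
    print(n, sum(level.values()), format(top, "04b"), float(dist[top]))
```

Printing the frequencies for increasing n lets one inspect whether they settle down in this toy language; the sketch makes no claim about what the limit is.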

7 Langdon 2 (Foundations of Genetic Programming)

8 Hypothesis-Concept Spaces

9 Notation
• P is the hypothesis space (i.e. a set of programs).
• |P| is the size of the hypothesis space (i.e. the cardinality of the set of programs).
• F is the concept space (i.e. the set of functions represented by the programs in P).
• |F| is the size of the concept space (i.e. the cardinality of the set of functions).
• I is the interpretation that maps each program to the function it represents.
• If two programs p_i and p_j map to the same function (i.e. they are interpreted as the same function, I(p_i) = f = I(p_j)), they belong to the same equivalence class (i.e. p_i ∈ [p_j] ↔ I(p_i) = I(p_j)).
• The notation [p] denotes the equivalence class that contains the program p (i.e. if I(p_i) = I(p_j), then [p_i] = [p_j]). The size of an equivalence class [p] is denoted by |[p]|.
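The notation can be made concrete on a very small program space. The following is a minimal sketch under stated assumptions: the toy language (leaves x and y, one binary primitive NAND, trees of depth at most 2) is hypothetical and not taken from the slides, and the helper names `programs`, `run` and `I` are chosen for illustration.

```python
from collections import defaultdict
from itertools import product


def programs(depth):
    """Hypothesis space P: all NAND trees over x, y of depth <= `depth`."""
    if depth == 0:
        return {"x", "y"}
    smaller = programs(depth - 1)
    return {"x", "y"} | {("nand", a, b) for a in smaller for b in smaller}


def run(p, x, y):
    """Evaluate program p on one pair of Boolean inputs."""
    if p == "x":
        return x
    if p == "y":
        return y
    _, a, b = p
    return not (run(a, x, y) and run(b, x, y))


def I(p):
    """Interpretation: map a program to its function, given as a truth table."""
    return tuple(run(p, x, y) for x, y in product((False, True), repeat=2))


P = programs(2)

# Group programs into equivalence classes [p], keyed by the function they compute.
classes = defaultdict(set)
for p in P:
    classes[I(p)].add(p)

F = set(classes)  # concept space: the functions actually represented in P

print("|P| =", len(P), " |F| =", len(F))
for f, members in sorted(classes.items(), key=lambda kv: -len(kv[1])):
    print(f, "|[p]| =", len(members))
```

Every program falls into exactly one class, so the class sizes sum to |P|.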

10 Two assumptions
1. The hypothesis space is sampled uniformly: the probability of sampling a given program is 1/|P|.
2. There are fewer hypotheses that represent complex functions: |[p1]| > |[p2]| ↔ c(f1) < c(f2), where f1 = I(p1), f2 = I(p2), and c(f) is the complexity of the function f.

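11 Proof
• Start from the statement of Assumption 2: |[p1]| > |[p2]| ↔ c(f1) < c(f2)
• Divide both sides of the left-hand inequality by |P| (which is positive, so the inequality is preserved): |[p1]|/|P| > |[p2]|/|P| ↔ c(f1) < c(f2)
• Under uniform sampling (Assumption 1), |[p1]|/|P| = p(I(p1)) = p(f1), and likewise |[p2]|/|P| = p(f2), so we can rewrite this as: p(f1) > p(f2) ↔ c(f1) < c(f2)
• This is a mathematical statement of Occam’s razor.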
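The counting step in the proof can be checked with exact arithmetic on a hand-written example. This is only an illustrative sketch: the six programs, the three functions and the complexity values in `c` are all made up, and the complexities are chosen so that Assumption 2 holds by construction.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Hypothetical interpretation table I: six programs mapping to three functions.
I = {"pa": "f1", "pb": "f1", "pc": "f1", "pd": "f2", "pe": "f2", "pf": "f3"}
# Assumed complexities, chosen so that Assumption 2 holds for this toy space.
c = {"f1": 1, "f2": 2, "f3": 3}

P = list(I)
class_size = Counter(I.values())  # |[p]| for each function

# Assumption 1: each program is drawn with probability 1/|P|, so the
# probability of a function is the summed probability of its programs.
prob = Counter()
for p in P:
    prob[I[p]] += Fraction(1, len(P))

# p(f) equals |[p]| / |P| for every function.
matches = all(prob[f] == Fraction(class_size[f], len(P)) for f in class_size)

# Conclusion of the proof: p(f1) > p(f2) <-> c(f1) < c(f2), for every ordered pair.
razor_holds = all((prob[f] > prob[g]) == (c[f] < c[g]) for f, g in permutations(c, 2))

print(dict(prob), matches, razor_holds)
```

The check does not justify Assumption 2; it only confirms that, once class sizes are ordered inversely to complexity, dividing by |P| carries that ordering over to probabilities.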

12 Restatement of Occam’s Razor
• Often stated as “prefer the shortest consistent hypothesis”.
• Restatement of Occam’s Razor: the preferred function is the one that is represented most frequently. The equivalence class that contains the shortest program is represented most frequently.
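As an illustration of the restated razor, the sketch below picks, among the functions consistent with some observed data, the one represented by the most programs. The toy language (NAND trees over x and y, depth at most 2), the observation in `data` and the helper name `consistent_classes` are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

INPUTS = list(product((False, True), repeat=2))  # all (x, y) input pairs


def programs(depth):
    """All NAND trees over x, y of depth <= `depth` (assumed toy language)."""
    if depth == 0:
        return {"x", "y"}
    smaller = programs(depth - 1)
    return {"x", "y"} | {("nand", a, b) for a in smaller for b in smaller}


def run(p, x, y):
    """Evaluate program p on one pair of Boolean inputs."""
    if p == "x":
        return x
    if p == "y":
        return y
    _, a, b = p
    return not (run(a, x, y) and run(b, x, y))


def consistent_classes(P, data):
    """Count, per function, the programs in P that agree with every observation."""
    counts = Counter()
    for p in P:
        f = tuple(run(p, x, y) for x, y in INPUTS)
        if all(f[INPUTS.index(inp)] == out for inp, out in data.items()):
            counts[f] += 1
    return counts


# Hypothetical observation: the target outputs False when both inputs are True.
data = {(True, True): False}
counts = consistent_classes(programs(2), data)

# Restated razor: prefer the consistent function with the largest class.
best, size = counts.most_common(1)[0]
print(best, size)
```

In this toy space the choice is made purely by counting programs; no hypothesis is discarded for being long.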

13 Summary
• Occam’s razor states “pick the simplest hypothesis consistent with data”.
• We agree, but for a different reason.
• Restatement: pick the function that is represented most frequently (i.e. belongs to the largest equivalence class). Occam’s razor is concerned with probability, and we present a simple counting argument.
• Unlike many interpretations of Occam’s razor, we do not throw out more complex hypotheses; we count them in [p].
• We offer no reason to believe the world is simple; our razor only gives a reason to predict using the simplest hypothesis.

14 Further work
To prove Assumption 2: there are fewer hypotheses that represent complex functions, |[p1]| > |[p2]| ↔ c(f1) < c(f2)

15 Further work
• Prove our assumptions.
• Does the result depend on the primitive set?
• How are the primitives linked together (e.g. trees, lists, directed acyclic graphs…)?

16 How does nature compute?
• Heuristics such as Occam’s razor need not be explicitly present as rules.
• Random searches of an agent’s generating capacity may implicitly carry heuristics.
• Axiomatic reasoning probably comes late.

17 Thanks & Questions?
1) Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons.
2) Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press.
3) William B. Langdon. Scaling of program fitness spaces. Evolutionary Computation, 7(4).
4) Tom M. Mitchell. Machine Learning. McGraw-Hill.
5) S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall.
6) G. I. Webb. Generality is more significant than complexity: Toward an alternative to Occam’s razor. In 7th Australian Joint Conference on Artificial Intelligence – Artificial Intelligence: Sowing the Seeds for the Future, 60-67, Singapore, 1994, World Scientific.
7) Ming Li and Paul Vitanyi. An Introduction to Kolmogorov Complexity and Its Applications (2nd Ed.). Springer Verlag.

