No Free Lunch (NFL) Theorem
Presentation by Kristian Nolde. Many slides are based on a presentation by Y. C. Ho.


General notes
Goal: give an intuitive feeling for the NFL and present some mathematical background.
To keep in mind: the NFL is an impossibility theorem, such as
– Gödel's proof in mathematics (roughly: some facts cannot be proved or disproved in any mathematical system)
– Arrow's theorem in economics (in principle, perfect democracy is not realizable)
Thus, its practical use is limited?!

The No Free Lunch Theorem
Without specific structural assumptions, no optimization scheme can perform better than blind search on average.
But blind search is very inefficient!
Prob(at least one out of N samples is in the top n of a search space of size |X|) ≈ nN/|X|
Example: Prob ≈ 10^-3 for |X| = 10^9, n = 1000, N = 1000.
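As a quick numerical check of the figure above, here is a small Python sketch (my own illustration, not from the slides; the names size_X, n, N and the uniform-sampling-with-replacement model are assumptions) comparing the exact hit probability with the nN/|X| approximation.

```python
# Sketch (assumed model): blind search draws N samples uniformly at random
# with replacement from a search space of size size_X and hopes to hit one
# of the top n points.

def prob_hit_top_n(size_X: int, n: int, N: int) -> float:
    """Exact probability that at least one of N uniform samples lands in the top n."""
    miss_one = 1.0 - n / size_X      # a single sample misses the top n
    return 1.0 - miss_one ** N       # at least one of N samples hits

size_X, n, N = 10**9, 1000, 1000
exact = prob_hit_top_n(size_X, n, N)
approx = n * N / size_X              # the slide's approximation nN/|X|
print(f"exact = {exact:.6f}, approx nN/|X| = {approx:.6f}")
# Both are about 0.001: blind search almost never finds a top-1000 point here.
```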

Assume a finite world
Finite # of input symbols (x's) and finite # of output symbols (y's) => finite # of possible mappings from inputs to outputs (f's): with |X| inputs and |Y| outputs there are |Y|^|X| such mappings.

The Fundamental Matrix F
Rows are indexed by the inputs x_1, x_2, ..., x_|X|, columns by the mappings f_1, f_2, ..., f_|F|; entry (i, j) is f_j(x_i).
FACT: for binary outputs there is an equal number of 0's and 1's in each row! In general, each value of Y appears |Y|^(|X|-1) times in each row, so, averaged over all f, the value is independent of x!
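A minimal sketch of the fundamental matrix for a toy finite world (the sizes |X| = 3 and Y = {0, 1} are assumed for illustration), verifying the two counting facts stated above.

```python
from itertools import product

# Toy finite world (assumed sizes): 3 inputs, binary cost values {0, 1}.
X = [0, 1, 2]
Y = [0, 1]

all_f = list(product(Y, repeat=len(X)))   # every mapping f: X -> Y (the columns)
F = [[f[x] for f in all_f] for x in X]    # F[x][k] = f_k(x), one row per input x

expected_count = len(Y) ** (len(X) - 1)   # |Y|^(|X|-1)
for x in X:
    row = F[x]
    counts = {y: row.count(y) for y in Y}
    avg = sum(row) / len(row)             # average cost over all f
    print(f"x={x}: counts={counts} (each equals {expected_count}), average={avg}")
# Every value of Y appears |Y|^(|X|-1) = 4 times in each row, and the average
# over all f is the same for every x, i.e. independent of x.
```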

Compare Algorithms
Think of two algorithms, a_1 and a_2, e.g.:
– a_1 always selects from x_1 to x_{0.5|X|}
– a_2 always selects from x_{0.5|X|} to x_{|X|}
For a specific f, either a_1 or a_2 may be better. However, if f is not known, the average performance of both is equal:
Σ_f P(d_y | f, m, a_1) = Σ_f P(d_y | f, m, a_2)
where d is a sample of size m and d_y is the cost value associated with d.
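A hedged illustration of this comparison (the toy sizes, the "best cost after N distinct samples" performance measure, and the exact halves sampled are my assumptions, not the slides'): averaged over every possible f, the two algorithms perform identically.

```python
from itertools import product
from statistics import mean

# Assumed toy setup: |X| = 6 points, binary costs Y = {0, 1}, and performance
# measured as the best (lowest) cost seen after N distinct samples.
X = list(range(6))
Y = [0, 1]
N = 2

def performance(sample_points, f):
    return min(f[x] for x in sample_points)   # best cost found by the algorithm

a1_points = X[: len(X) // 2][:N]   # a_1 samples only from the first half of X
a2_points = X[len(X) // 2:][:N]    # a_2 samples only from the second half of X

all_f = list(product(Y, repeat=len(X)))       # every possible cost function f
avg_a1 = mean(performance(a1_points, f) for f in all_f)
avg_a2 = mean(performance(a2_points, f) for f in all_f)
print(avg_a1, avg_a2)   # identical averages: with f unknown, a_1 and a_2 are equivalent
```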

Comparing Algorithms (continued)
Case 1: algorithms can be more specific, e.g. a_1 assumes a certain realization f_k.
Case 2: or they can be more general, e.g. a_2 assumes a (more) uniform distribution over the possible f.
Then the performance of a_1 will be excellent for f_k but catastrophic in all other cases (great performance, no robustness).
In contrast, a_2 performs mediocrely in all cases but does not fail (poor performance, high robustness).
Common sense says: Robustness * Efficiency = constant, or Generality * Depth = constant.

Implication 1
Let x be the optimization variable, f the performance function, and y the performance, i.e. y = f(x). Then, averaged over all possible optimization problems, the result is choice-independent: if you don't know the structure of f (which column of F you are dealing with), blind choice is as good as any!

Implication 2
Let X be the strategy (control law, decision rule) space, of size |decisions|^|information states|, f the performance function, and y the performance, i.e. y = f(x). The same conclusion holds for stochastic optimal control, adaptive control, decision theory, game theory, learning control, etc. A "good" algorithm must be qualified!

Implication 3
Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems. Without an understanding of the problem, blind choice is as good as any. "Understanding" means you know which column of the F matrix you are dealing with.

Implication 4
Even if you know which column or group of columns you are dealing with => you can specialize the choice of rows. But you must accept that you will suffer LOSSES should other columns occur due to uncertainties or disturbances.

The Fundamental Matrix F (revisited)
Rows x_1, x_2, ..., x_|X|; columns f_1, f_2, ..., f_|F|. Assume a probability distribution over the columns, then pick the row that results in minimal expected loss or maximal expected performance. This is stochastic optimization.
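A sketch of that stochastic-optimization step (the particular belief over the columns and the toy sizes are assumptions for illustration): given a distribution over the columns f, choose the row x with minimal expected cost.

```python
from itertools import product

# Assumed toy fundamental matrix: 3 inputs, binary costs.
X = [0, 1, 2]
Y = [0, 1]
all_f = list(product(Y, repeat=len(X)))   # all columns f of the matrix F

# Assumed (non-uniform) belief over the columns: mappings with f(x_0) = 0
# are considered twice as likely as the others.
weights = [2 if f[0] == 0 else 1 for f in all_f]
p_f = [w / sum(weights) for w in weights]

# Expected cost of choosing row x under this belief.
expected_cost = {x: sum(p * f[x] for p, f in zip(p_f, all_f)) for x in X}
best_x = min(expected_cost, key=expected_cost.get)
print(expected_cost, "-> choose x =", best_x)
# Under this belief x = 0 has the lowest expected cost; if the belief is
# wrong, the same choice can do badly, as the next slide warns.
```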

Implication 5
Worse, if you estimate the probabilities incorrectly, your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like. Reason: you have already used up more of the good outcomes in your "optimal" choice; what is left are the bad ones that were not supposed to occur! (HOT design and power laws, Doyle.)

Implication 6
Generality for generality's sake is not very fruitful. Working on a specific problem can be rewarding, because:
– the insight can be generalized
– the problem is practically important
– the effect