By Gil Kalai, Institute of Mathematics and Center for Rationality, Hebrew University, Jerusalem, Israel. Presented by: Yair Cymbalista.

2 In this lecture we would like to study the extent to which concepts of choice used in economic theory imply “learnable” behavior. Learning theory, introduced and developed by Valiant, Angluin and others in the early 1980s, deals with the question of how well a family of objects (functions, in this lecture) can be learned from examples. Valiant's learning concept is statistical, i.e. the examples are chosen randomly.

3 In order to analyze learnability we will use a basic model of statistical learning theory introduced by Valiant, called the model of PAC-learnability (PAC stands for “probably approximately correct”). It is based on choosing examples randomly according to some probability distribution.

4 Let F be a class of functions defined on a ground set U. Let 0 ≤ ε ≤ 1 and let μ be a probability distribution on U. We say that F is learnable from t examples with probability of at least 1 − ε with respect to the probability distribution μ if the following assertion holds:

5 For every f ∈ F: if x_1, x_2, …, x_t are chosen at random and independently according to the probability distribution μ, and if f′ ∈ F satisfies f′(x_i) = f(x_i), i = 1, 2, …, t, then μ({x : f′(x) ≠ f(x)}) ≤ ε with probability of at least 1 − ε. We say that F is learnable from t examples with probability of at least 1 − ε if this is the case with respect to every probability distribution μ on U.
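As an illustration (not from the slides), here is a small Monte Carlo sketch of the definition, using an assumed toy class F of threshold functions on U = {0, …, 9} under the uniform distribution:

```python
import random

def pac_demo(t, eps=0.2, trials=500, seed=0):
    """Monte Carlo illustration of the PAC definition on an assumed toy
    setting: F is the class of thresholds f_k(x) = 1 iff x >= k on
    U = {0,...,9}, and mu is uniform. Returns the observed fraction of
    trials in which a hypothesis consistent with t random examples is
    NOT eps-close to the target."""
    rng = random.Random(seed)
    U = list(range(10))
    F = [lambda x, k=k: int(x >= k) for k in range(11)]
    failures = 0
    for _ in range(trials):
        f = rng.choice(F)                               # target in F
        xs = [rng.randrange(10) for _ in range(t)]      # t examples from mu
        # pick any hypothesis in F consistent with the t examples
        consistent = [g for g in F if all(g(x) == f(x) for x in xs)]
        g = rng.choice(consistent)
        dist = sum(g(x) != f(x) for x in U) / len(U)    # mu({x : g != f})
        if dist > eps:
            failures += 1
    return failures / trials

# More examples make a consistent hypothesis eps-close more reliably:
print(pac_demo(1), pac_demo(30))
```

With one example the consistent set is large and often far from the target; with thirty examples failures essentially disappear.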

6 Given a set X of N alternatives, a choice function c is a mapping which assigns to every nonempty subset S ⊆ X an element c(S) ∈ S. A “rationalizable” choice function is consistent with maximizing behavior, i.e. there is a linear ordering of the alternatives in X, and c(S) is the maximal element of S with respect to this ordering.

7 Rationalizable choice functions are characterized by the Independence of Irrelevant Alternatives condition (IIA): the element chosen from a set is also chosen from every subset which contains it. A class of choice functions is symmetric if it is invariant under permutations of the alternatives.
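A minimal sketch of these two definitions, with an assumed four-element ground set:

```python
from itertools import combinations

def rationalizable(order):
    """Choice function induced by a linear order (best alternative first):
    c(S) is the order-maximal element of S."""
    rank = {x: i for i, x in enumerate(order)}
    return lambda S: min(S, key=lambda x: rank[x])

def satisfies_IIA(c, X):
    """IIA: the element chosen from a set is also chosen from every
    subset which contains it."""
    subsets = [frozenset(s) for r in range(1, len(X) + 1)
               for s in combinations(X, r)]
    return all(c(T) == c(S)
               for S in subsets for T in subsets
               if T <= S and c(S) in T)

X = ['a', 'b', 'c', 'd']
c = rationalizable(['c', 'a', 'd', 'b'])
print(c({'a', 'b', 'd'}))     # 'a': the best available alternative
print(satisfies_IIA(c, X))    # True
```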

8 In this lecture we will concentrate on the proofs of the next two theorems. Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives. Theorem 1.2. Every symmetric class of choice functions requires at least a number of examples which is linear in the number of alternatives for learnability in the PAC-model.

9 Definition. A class F of functions shatters points x_1, …, x_s if there are values y_1, …, y_s such that for every subset S ⊆ {1, 2, …, s} there is a function f ∈ F with f(x_i) = y_i if and only if i ∈ S. The P-dimension of F is the largest s for which F shatters some s points.

10 Theorem 2.1. For a fixed value of 0 < ε < 1, the number of examples t needed to learn a class of functions with probability of at least 1 − ε is bounded above and below by a linear function of the P-dimension.

11 Proposition 2.2. Let s = P-dim(F). Then |F| ≥ 2^s, i.e. P-dim(F) ≤ log_2 |F|. Proof: This is immediate, since for the P-dimension of F to be s we need at least 2^s distinct functions, one for every subset S of the shattered points.

12 Let U′ ⊆ U and let F′ be the restriction of F to U′. If F can be learned from t examples with probability of at least 1 − ε, then so can F′ (since we can choose μ to be supported only on U′).

13 Let C be the class of rationalizable choice functions defined on the nonempty subsets of a set X of alternatives, where |X| = N. Then |C| = N!. By Proposition 2.2, P-dim(C) ≤ log_2(N!) = O(N log N), so by Theorem 2.1 the number of examples required to learn a rationalizable choice function in the PAC-model is O(N log N).
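The log_2(N!) bound is easy to evaluate numerically; `pdim_upper_bound` is an illustrative helper name, not from the slides:

```python
import math

def pdim_upper_bound(N):
    """Proposition 2.2: P-dim(C) <= log2 |C| = log2(N!)."""
    return math.log2(math.factorial(N))

for N in (5, 10, 20):
    print(N, round(pdim_upper_bound(N), 1))
# By Stirling's formula, log2(N!) = Theta(N log N).
```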

14 Theorem 3.1. The P-dimension of the class C of rationalizable choice functions on N alternatives is N − 1. Proof: We will first show that P-dim(C) ≥ N − 1. Consider the elements x_1, x_2, …, x_N and the pairs S_i = {x_i, x_N} with the values y_i = x_i, for i = 1, 2, …, N − 1. For each R ⊆ {1, 2, …, N − 1}, an order relation which satisfies x_i > x_N if and only if i ∈ R gives an appropriate function.
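The lower-bound construction can be checked by brute force for small N (a sketch; alternatives are encoded 0, …, N−1, with x_N encoded as N−1):

```python
from itertools import permutations

def shatters(N):
    """Verify the lower-bound construction: the N-1 pairs {x_i, x_N}
    with target values y_i = x_i are shattered by rationalizable
    choice functions."""
    pairs = [frozenset({i, N - 1}) for i in range(N - 1)]
    patterns = set()
    for order in permutations(range(N)):        # linear order, best first
        rank = {x: j for j, x in enumerate(order)}
        patterns.add(tuple(min(S, key=rank.get) == i
                           for i, S in enumerate(pairs)))
    return len(patterns) == 2 ** (N - 1)

print(shatters(4), shatters(5))   # True True
```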

15 Proof of Theorem 3.1 (cont.) We will now show that P-dim(C) ≤ N − 1. We need to show that for every N sets S_1, …, S_N and N elements a_i, i = 1, 2, …, N, there is a subset S ⊆ {1, 2, …, N} such that there is no linear order on the ground set X for which a_i is the maximal element of S_i if and only if i ∈ S. Clearly we can assume a_k ∈ S_k and |S_k| ≥ 2 for every k.

16 Proof of Theorem 3.1 (cont.) For every j = 1, 2, …, N consider the following vector v_j ∈ R^X, where: v_j(x) = |S_j| − 1 if x = a_j; v_j(x) = −1 if x ∈ S_j \ {a_j}; and v_j(x) = 0 if x ∉ S_j.

17 Proof of Theorem 3.1 (cont.) Note that all the vectors v_1, …, v_N belong to the (N − 1)-dimensional space of vectors whose sum of coordinates is 0. Therefore the N vectors are linearly dependent. Suppose now that Σ_j λ_j v_j = 0 and that not all the λ_j equal zero. Let S = {j : λ_j > 0}. We will now show that there is no c ∈ C such that c(S_k) = a_k when k ∈ S and c(S_k) ≠ a_k when k ∉ S.

18 Proof of Theorem 3.1 (cont.) Assume to the contrary that there is such a function c, realized by a linear order on X. Let x_m be the maximal element, with respect to this order, of the union of the sets S_j with λ_j ≠ 0. Denote by y the coordinate of x_m in the linear combination Σ_j λ_j v_j; we will show that y is positive. If λ_j = 0 or if x_m ∉ S_j, then λ_j v_j contributes 0 to y.

19 Proof of Theorem 3.1 (cont.) Assume now that λ_j ≠ 0 and x_m ∈ S_j; therefore c(S_j) = x_m, since x_m is the maximal element of S_j. There are two cases to consider: if j ∈ S, then a_j = c(S_j) = x_m and the contribution to y is λ_j (|S_j| − 1) > 0; if j ∉ S, then a_j ≠ c(S_j) = x_m, so v_j(x_m) = −1 and the contribution is −λ_j > 0. Since x_m lies in at least one S_j with λ_j ≠ 0, we get y > 0, although Σ_j λ_j v_j = 0 forces y = 0. Contradiction!

20 Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives. Proof: Theorem 1.1 follows from Theorem 3.1 and Theorem 2.1.

21 Let F be a class of functions on U. Let 0 ≤ ε, δ ≤ 1 and let μ be a probability distribution on U. Let g be an arbitrary function. Define the distance from g to F, dist(g, F), to be the minimum over f ∈ F of the probability, with respect to μ, that f(x) ≠ g(x). Given t random elements x_1, …, x_t (drawn independently according to μ), define the empirical distance of g to F as: dist_emp(g, F) = min over f ∈ F of |{i : f(x_i) ≠ g(x_i)}| / t.
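A small sketch of the two distances, with an assumed toy class of threshold functions and the uniform distribution:

```python
import random

def dist(g, F, U, mu):
    """dist(g, F): minimum over f in F of mu({x : f(x) != g(x)})."""
    return min(sum(p for x, p in zip(U, mu) if f(x) != g(x)) for f in F)

def empirical_dist(g, F, xs):
    """Empirical distance of g to F on a sample xs drawn from mu."""
    return min(sum(f(x) != g(x) for x in xs) / len(xs) for f in F)

U = list(range(10))
mu = [0.1] * 10                          # uniform distribution
F = [lambda x, k=k: int(x >= k) for k in range(11)]   # thresholds
g = lambda x: int(x in {3, 4, 8})        # an arbitrary function outside F

rng = random.Random(1)
xs = [rng.randrange(10) for _ in range(2000)]
print(dist(g, F, U, mu))
print(empirical_dist(g, F, xs))   # close to dist(g, F) for large t
```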

22 Theorem 4.1. There exists K(ε, δ) such that for every probability distribution μ on U and every function g, the number of independent random examples t needed so that |dist(g, F) − dist_emp(g, F)| < ε with probability of at least 1 − δ is at most K(ε, δ) · P-dim(F).

23 Corollary 4.2. For every probability distribution μ on U and every function g: if g agrees with a function in F on t independent random examples and t ≥ K(ε, δ) · P-dim(F), then dist(g, F) < ε with probability of at least 1 − δ.

24 The class of rationalizable choice functions is symmetric under relabeling of the alternatives. Mathematically speaking, every permutation σ of X induces a symmetry among all choice functions, given by σ(c)(S) = σ(c(σ⁻¹(S))). A class of choice functions is called symmetric if it is closed under all permutations of the ground set of alternatives X.

25 A choice function defined on pairs of elements is an asymmetric preference relation, and every choice function describes such a relation by restriction to pairs of elements. Every choice function defined on pairs of elements of X describes a tournament whose vertices are the elements of X, such that for two elements x, y ∈ X, c({x, y}) = x if and only if the graph induced by the tournament has an edge oriented from x to y.
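A short sketch of this correspondence; the transitive tournament arising from a rationalizable c is an assumed example:

```python
from itertools import combinations

def tournament_edges(c, X):
    """The tournament described by a choice function on pairs:
    edge x -> y whenever c({x, y}) == x."""
    edges = set()
    for x, y in combinations(X, 2):
        w = c(frozenset({x, y}))
        edges.add((w, y if w == x else x))
    return edges

# A rationalizable c yields a transitive tournament:
order = ['b', 'd', 'a', 'c']                 # linear order, best first
rank = {x: i for i, x in enumerate(order)}
c = lambda S: min(S, key=rank.get)
E = tournament_edges(c, order)
print(sorted(E))
```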

26 Theorem 5.1. (1) The P-dimension of every symmetric class C of preference relations (considered as choice functions on pairs) on N alternatives is at least ⌊N/2⌋. (2) When N ≥ 8 the P-dimension is at least N − 1. (3) When N ≥ 68, if the P-dimension is precisely N − 1, then the class is the class of order relations.

27 Proof of Theorem 5.1. (1) Let m = ⌊N/2⌋ and consider the pairs S_i = {x_{2i−1}, x_{2i}}, i = 1, 2, …, m. Let c ∈ C and assume without loss of generality that c(S_i) = x_{2i−1} for every i. Let R ⊆ {1, 2, …, m}. We define a permutation σ_R as follows: σ_R swaps x_{2i−1} and x_{2i} for every i ∈ R and fixes all other alternatives. (If N is odd, define σ_R(x_N) = x_N.)

28 Proof of Theorem 5.1 (cont.) Therefore σ_R(c)(S_i) = x_{2i} if and only if i ∈ R, and since C is symmetric, σ_R(c) ∈ C; hence the m pairs are shattered and P-dim(C) ≥ ⌊N/2⌋.
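The swapping argument can be replayed in code for N = 6 (the concrete choice function c and the 0-based pair encoding are assumptions of this sketch):

```python
def apply_perm(sigma, c):
    """sigma(c)(S) = sigma(c(sigma^{-1}(S))): the relabeled choice function."""
    inv = {v: k for k, v in sigma.items()}
    return lambda S: sigma[c(frozenset(inv[x] for x in S))]

N = 6
X = list(range(N))
pairs = [frozenset({2 * i, 2 * i + 1}) for i in range(N // 2)]
rank = {x: i for i, x in enumerate(X)}
c = lambda S: min(S, key=rank.get)   # WLOG: c picks x_{2i} from pair i

patterns = set()
for R in range(2 ** (N // 2)):       # bitmask over the pairs
    swap = {x: x for x in X}
    for i in range(N // 2):
        if R >> i & 1:               # swap the i-th pair
            swap[2 * i], swap[2 * i + 1] = 2 * i + 1, 2 * i
    cR = apply_perm(swap, c)
    patterns.add(tuple(cR(S) == 2 * i + 1 for i, S in enumerate(pairs)))
print(len(patterns) == 2 ** (N // 2))   # the floor(N/2) pairs are shattered
```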

29 Proof of Theorem 5.1 (cont.) (2) To prove part (2) we will use the following conjecture, made by Rosenfeld and proved by Havet and Thomassé: when N ≥ 8, for every path P on N vertices with an arbitrary orientation of the edges, every tournament on N vertices contains a copy of P.

30 Proof of Theorem 5.1 (cont.) Let c be a choice function in the class and consider the tournament T described by c. Let S_i = {x_i, x_{i+1}}, i = 1, 2, …, N − 1. Every choice function c′ on these pairs describes a directed path P. By the result above, a copy of P can be found in our tournament; let the vertices of this copy (in the order they appear on the path) be y_1, y_2, …, y_N. Define a permutation σ by σ(y_i) = x_i. The choice function σ(c) will then agree with c′ on the pairs S_1, …, S_{N−1}, so these N − 1 pairs are shattered.

31 Theorem 5.1 implies the following corollary: Corollary 5.2. The P-dimension of every symmetric class of choice functions on N alternatives, N ≥ 8, is at least N − 1.

32 Consider a tournament with N players such that for every two players i and j there is a probability p_ij that i beats j in a match between the two (so p_ji = 1 − p_ij). Among a set A ⊆ {1, …, N} of players let c(A) be the player most likely to win a tournament involving the players in A. Consider the class W of choice functions that arise in this model as the probabilities p_ij vary.

33 Theorem 6.1. The class of choice functions W requires at most O(N³) examples for learning in the PAC-model.

34 Consider m polynomials P_1, …, P_m in r variables. For a point x ∈ R^r the sign pattern of x is the vector (sgn P_1(x), …, sgn P_m(x)), where sgn y = +1 if y > 0 and sgn y = −1 if y < 0.

35 Theorem 6.2 (Warren). If the degree of every P_i is at most D and if 2m > r, then the number of sign patterns given by the polynomials is at most (4eDm/r)^r.
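Warren's bound can be compared with an empirical sign-pattern count for an assumed toy case of five random lines in the plane (sampling only finds a subset of the true patterns, so the empirical count is a lower estimate):

```python
import math, random

def warren_bound(D, m, r):
    """Warren's bound on the number of strict sign patterns of m
    polynomials of degree <= D in r variables (requires 2m > r)."""
    return (4 * math.e * D * m / r) ** r

# Empirical check: m = 5 random linear forms in r = 2 variables (D = 1).
rng = random.Random(3)
m, r, D = 5, 2, 1
polys = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
         for _ in range(m)]
patterns = set()
for _ in range(100000):
    x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
    patterns.add(tuple(a * x + b * y + c > 0 for a, b, c in polys))
print(len(patterns), '<=', warren_bound(D, m, r))
```

Five lines cut the plane into at most 16 regions, comfortably below Warren's (10e)² ≈ 739.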

36 Given a set A of players, |A| = s, the probability that the k-th player will be the winner of a tournament between the players of A is described by a polynomial Q(A, k) in the variables p_ij, as follows. Let M = (m_ij) be an s-by-s 0/1 matrix representing the outcome of all matches between the players of A in a possible tournament, such that m_ij = 1 if player i won the match against player j and m_ij = 0 otherwise.

37 Proof of Theorem 6.1 (cont.) The probability that such a matrix M will represent the results of the matches in a tournament is Pr(M) = ∏_{i<j, i,j∈A} p_ij^{m_ij} (1 − p_ij)^{m_ji}. Define Q(A, k) = Σ Pr(M), the sum ranging over all matrices M in which player k wins the tournament. Then c(A) is the player k ∈ A for which Q(A, k) is maximal.
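A brute-force sketch of Q(A, k) and the induced choice function. Since the slides do not spell out what "winning the tournament" means, this sketch assumes the winner is the player with strictly the most match wins in the round-robin:

```python
from itertools import combinations, product
import random

def Q(A, k, p):
    """Probability that player k wins a round-robin among A, summing
    Pr(M) over all outcome matrices M in which k has strictly the most
    wins (an assumed winning rule). p[i][j] is the chance i beats j."""
    A = sorted(A)
    matches = list(combinations(A, 2))
    total = 0.0
    for outcome in product([0, 1], repeat=len(matches)):
        wins = {i: 0 for i in A}
        prob = 1.0
        for (i, j), o in zip(matches, outcome):
            w, l = (i, j) if o else (j, i)
            wins[w] += 1
            prob *= p[w][l]
        if all(wins[k] > wins[j] for j in A if j != k):
            total += prob
    return total

def c(A, p):
    """The choice: the player most likely to win the tournament on A."""
    return max(A, key=lambda k: Q(A, k, p))

rng = random.Random(2)
N = 4
p = [[0.0] * N for _ in range(N)]
for i, j in combinations(range(N), 2):
    p[i][j] = rng.random()
    p[j][i] = 1 - p[i][j]
print(c({0, 1, 2, 3}, p))
```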

38 Proof of Theorem 6.1 (cont.) Q(A, k) is a polynomial of degree at most N(N − 1)/2 in the N(N − 1)/2 variables p_ij, i, j ∈ A. Now consider Q(A, k, j) = Q(A, k) − Q(A, j) for all subsets A and all k, j ∈ A, k ≠ j. We have altogether fewer than N² · 2^N polynomials in N(N − 1)/2 variables, and the degree of these polynomials is at most N(N − 1)/2.

39 Proof of Theorem 6.1 (cont.) Now c(A) = k if and only if Q(A, k, j) > 0 for every j ∈ A, j ≠ k. Therefore the choice function given by a vector of probabilities (p_ij) is determined by the sign pattern of all the polynomials Q(A, k, j). We can now invoke Warren's theorem with r = D = N(N − 1)/2 and m < N² · 2^N. According to Warren's theorem, the number of different sign patterns of the polynomials is at most (4eDm/r)^r = (4em)^r = 2^{O(N³)}.

40 Proof of Theorem 6.1 (cont.) Therefore |W| ≤ 2^{O(N³)}, and it follows from Theorem 2.1 and Proposition 2.2 that the number of examples needed to learn W in the PAC-model is O(N³).

41 Our main result determined the P-dimension of the class of rationalizable choice functions and showed that the number of examples needed to learn a rationalizable choice function is linear in the number of alternatives. We also described a mathematical method for analyzing the statistical learnability of more complicated choice models.