
1 By Gil Kalai, Institute of Mathematics and Center for Rationality, Hebrew University, Jerusalem, Israel. Presented by Yair Cymbalista.

2 In this lecture we study the extent to which concepts of choice used in economic theory imply "learnable" behavior. Learning theory, introduced and developed by Valiant, Angluin and others in the 1980s, deals with the question of how well a family of objects (in this lecture, functions) can be learned from examples. Valiant's learning concept is statistical, i.e. the examples are chosen at random.

3 In order to analyze learnability we will use a basic model of statistical learning theory introduced by Valiant, called the model of PAC-learnability (PAC stands for "probably approximately correct"). It is based on choosing examples at random according to some probability distribution.

4 Let F be a class of functions defined on a set U, let 0 < ε, δ < 1, and let μ be a probability distribution on U. We say that F is learnable from t examples with probability at least 1 − δ with respect to μ if the following assertion holds:

5 For every f ∈ F: if x_1, x_2, …, x_t are chosen at random and independently according to μ, and if f′ ∈ F satisfies f′(x_i) = f(x_i), i = 1, 2, …, t, then with probability at least 1 − δ, μ{x ∈ U : f′(x) ≠ f(x)} < ε. We say that F is learnable from t examples with probability at least 1 − δ if this is the case with respect to every probability distribution μ on U.

6 Given a set X of N alternatives, a choice function c is a mapping which assigns to every nonempty subset S ⊆ X an element c(S) ∈ S. A "rationalizable" choice function is consistent with maximizing behavior, i.e. there is a linear ordering of the alternatives in X such that c(S) is the maximal element of S with respect to this ordering.
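As a small illustration (my own sketch, not from the slides), a rationalizable choice function can be written directly from its linear order:

```python
# Sketch: a rationalizable choice function induced by a fixed linear order.
def rationalizable_choice(order):
    """Return the choice function of the linear order `order`
    (worst alternative first, best last): c(S) = maximum of S in that order."""
    rank = {x: i for i, x in enumerate(order)}
    return lambda S: max(S, key=lambda x: rank[x])

c = rationalizable_choice(["a", "b", "c", "d"])  # d is the top alternative
print(c({"a", "c"}))             # c
print(c({"a", "b", "c", "d"}))   # d
```

Any function of this form automatically satisfies c(S) ∈ S, since the maximum is taken over S itself.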

7 Rationalizable choice functions are characterized by the Independence of Irrelevant Alternatives (IIA) condition: the element chosen from a set is also chosen from every subset which contains it. A class of choice functions is symmetric if it is invariant under permutations of the alternatives.
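The IIA condition can be checked mechanically. In this sketch (not from the slides) choice functions are stored as dicts from frozensets to chosen elements, and the tampered function `irrational` is a made-up example of a violation:

```python
from itertools import chain, combinations

def nonempty_subsets(X):
    """All nonempty subsets of X as frozensets."""
    return [frozenset(s) for r in range(1, len(X) + 1)
            for s in combinations(X, r)]

def satisfies_iia(c, X):
    """IIA: the element chosen from a set is also chosen from every
    subset of that set which still contains the chosen element."""
    for S in nonempty_subsets(X):
        x = c[S]
        for T in nonempty_subsets(S):
            if x in T and c[T] != x:
                return False
    return True

X = [1, 2, 3]
# choice induced by the order 1 < 2 < 3 (choose the maximum)
rational = {S: max(S) for S in nonempty_subsets(X)}
# tamper with one value: choosing 1 from {1,2,3} but 3 from {1,3} breaks IIA
irrational = dict(rational)
irrational[frozenset(X)] = 1
print(satisfies_iia(rational, X))    # True
print(satisfies_iia(irrational, X))  # False
```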

8 In this lecture we will concentrate on the proofs of the next two theorems. Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives. Theorem 1.2. Every symmetric class of choice functions requires, for learnability in the PAC model, a number of examples at least linear in the number of alternatives.

9

10 Theorem 2.1. For fixed 0 < ε, δ < 1, the number of examples t needed to learn a class of functions with probability at least 1 − δ is bounded above and below by a linear function of the P-dimension.

11 Proposition 2.2. Let s be the P-dimension of F. Then s ≤ log_2 |F|. Proof: Immediate: for the P-dimension of F to be s, F must contain at least 2^s distinct functions.

12 If F can be learned from t examples with probability at least 1 − δ, then so can its restriction F′ to any subset U′ ⊆ U (since we can choose μ to be supported only on U′).

13 Let C be the class of rationalizable choice functions defined on the nonempty subsets of a set X of alternatives, where |X| = N; then |C| = N!. Combining Proposition 2.2 with Theorem 2.1, the number of examples required to learn a rationalizable choice function in the PAC model is O(N log N).
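A quick numeric check of the counting step (my own addition): since |C| = N!, Proposition 2.2 caps the P-dimension at log_2(N!) ≤ N·log_2(N), which is where the O(N log N) sample bound comes from:

```python
import math

# log2(N!) grows like N*log2(N); compare the two for a few values of N.
for N in (5, 10, 20):
    print(N, round(math.log2(math.factorial(N)), 1),
          round(N * math.log2(N), 1))
```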

14 Theorem 3.1. The P-dimension of the class of rationalizable choice functions on N alternatives is N − 1. Proof: We will first show that the P-dimension is at least N − 1. Consider the pairs S_i = {x_i, x_N} with designated elements x_i, i = 1, 2, …, N − 1. For each R ⊆ {1, 2, …, N − 1}, an order relation which places x_i above x_N exactly when i ∈ R gives an appropriate function.

15 Proof of Theorem 3.1 (cont.) We will now show that the P-dimension is at most N − 1. We need to show that for every N sets S_1, …, S_N and N elements x_i ∈ S_i, i = 1, 2, …, N, there is a subset S ⊆ {1, 2, …, N} such that there is no linear order on the ground set X for which x_i is the maximal element of S_i if and only if i ∈ S. Clearly we can assume x_k ∈ S_k for every k.

16 Proof of Theorem 3.1 (cont.) For every j = 1, 2, …, N consider a vector v_j, defined from S_j and x_j, whose coordinates sum to 0.

17 Proof of Theorem 3.1 (cont.) Note that all the vectors v_j belong to the (N − 1)-dimensional space of vectors whose sum of coordinates is 0. Therefore the N vectors v_1, …, v_N are linearly dependent. Suppose now that Σ_j λ_j v_j = 0 and that not all λ_j equal zero. Let S = { j : λ_j > 0 }. We will now show that there is no c ∈ C such that c(S_k) = x_k when k ∈ S and c(S_k) ≠ x_k when k ∉ S.

18 Proof of Theorem 3.1 (cont.) Assume to the contrary that there is such a function c. Denote by y the m-th coordinate, for a suitable index m, of the linear combination Σ_j λ_j v_j; we will show that y is positive.

19 Proof of Theorem 3.1 (cont.) There are two cases to consider; in each case one checks that y > 0, contradicting Σ_j λ_j v_j = 0. This completes the proof.
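As a sanity check on the dimension count (my own brute-force sketch, not from the lecture; the shattering convention with designated elements is spelled out in the comments), the P-dimension can be computed exhaustively for N = 3:

```python
from itertools import combinations, permutations

# A list of (set, designated element) pairs is shattered if every pattern
# "c(S_i) = a_i exactly for the indices i in R" is realized by some
# rationalizable choice function.

def candidate_sets(X):
    # singletons are omitted: c({x}) = x always, so they can never be shattered
    return [frozenset(s) for r in range(2, len(X) + 1)
            for s in combinations(X, r)]

def rationalizable_functions(X):
    funcs = []
    for order in permutations(X):
        rank = {x: i for i, x in enumerate(order)}
        funcs.append({S: max(S, key=lambda x: rank[x])
                      for S in candidate_sets(X)})
    return funcs

def shattered(funcs, pairs):
    patterns = {tuple(f[S] == a for S, a in pairs) for f in funcs}
    return len(patterns) == 2 ** len(pairs)

def p_dimension(X):
    funcs = rationalizable_functions(X)
    items = [(S, a) for S in candidate_sets(X) for a in S]
    best = 0
    for t in range(1, len(items) + 1):
        if any(shattered(funcs, combo) for combo in combinations(items, t)):
            best = t
        else:
            break
    return best

print(p_dimension(["a", "b", "c"]))  # 2, i.e. N - 1 for N = 3
```

For N = 3 the class has 3! = 6 functions, so Proposition 2.2 alone already caps the P-dimension at log_2 6 < 3.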

20 Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives. Proof: Theorem 1.1 follows from Theorem 3.1 and Theorem 2.1.
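A small simulation (my own sketch; the sample sizes are arbitrary) of the learning procedure behind Theorem 1.1: draw random example sets, output any linear order consistent with the observed choices, and estimate the error on fresh sets:

```python
import random
from itertools import permutations

random.seed(1)
X = list(range(6))

def choice(rank, S):
    # the choice function of the order encoded by rank (higher = better)
    return max(S, key=lambda x: rank[x])

def random_subset():
    return frozenset(random.sample(X, random.randint(2, len(X))))

# unknown target: a random linear order on X
true_rank = {x: r for r, x in enumerate(random.sample(X, len(X)))}
examples = []
for _ in range(30):
    S = random_subset()
    examples.append((S, choice(true_rank, S)))

# learner: any order consistent with the examples (brute force; 6! = 720)
for perm in permutations(X):
    learned = {x: r for r, x in enumerate(perm)}
    if all(choice(learned, S) == a for S, a in examples):
        break

# estimate the error of the learned choice function on fresh random sets
test_sets = [random_subset() for _ in range(2000)]
errors = sum(choice(learned, S) != choice(true_rank, S) for S in test_sets)
print(errors / len(test_sets))
```

The loop always terminates, since the true order itself is consistent with every example.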

21 Let F be a class of functions on U, let 0 < ε, δ < 1, and let μ be a probability distribution on U. Let g be an arbitrary function on U. Define the distance from g to F, dist(g, F), to be the minimum over f ∈ F of the probability, with respect to μ, that f(x) ≠ g(x). Given t random elements x_1, …, x_t (drawn independently according to μ), define the empirical distance of g to F as: min over f ∈ F of |{ i : f(x_i) ≠ g(x_i) }| / t.
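These two distances can be transcribed directly; the toy universe, distribution, and functions below are hypothetical, with each function stored as a dict:

```python
# dist(g, F) weights disagreements by the distribution mu;
# the empirical distance uses the fraction of disagreements on a sample.

def dist(g, F, U, mu):
    return min(sum(p for x, p in zip(U, mu) if f[x] != g[x]) for f in F)

def empirical_dist(g, F, sample):
    return min(sum(f[x] != g[x] for x in sample) / len(sample) for f in F)

U = [0, 1, 2, 3]
mu = [0.25, 0.25, 0.25, 0.25]          # uniform distribution on U
f1 = {0: 0, 1: 1, 2: 0, 3: 1}
f2 = {0: 0, 1: 0, 2: 0, 3: 0}
F = [f1, f2]
g = {0: 0, 1: 1, 2: 1, 3: 1}           # disagrees with f1 only at x = 2
print(dist(g, F, U, mu))               # 0.25
print(empirical_dist(g, F, [0, 1, 2, 2]))  # 0.5
```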

22 Theorem 4.1. There exists K(ε, δ) such that for every probability distribution μ on U and every function g, the number of independent random examples t needed so that the empirical distance is within ε of dist(g, F) with probability at least 1 − δ is at most K(ε, δ) times the P-dimension of F.

23 Corollary 4.2. For every probability distribution μ on U and every function g: if g agrees with a function in F on t independent random examples and t is at least the bound of Theorem 4.1, then dist(g, F) < ε with probability at least 1 − δ.

24 The class of rationalizable choice functions is symmetric under relabeling of the alternatives. Mathematically speaking, every permutation π of X induces a symmetry among choice functions, given by (π(c))(S) = π(c(π⁻¹(S))). A class of choice functions will be called symmetric if it is closed under all permutations of the ground set of alternatives X.

25 A choice function defined on pairs of elements is an asymmetric preference relation, and every choice function describes such a relation by restriction to pairs. Every choice function defined on pairs of elements of X describes a tournament whose vertices are the elements of X, such that for two elements x, y ∈ X, c({x, y}) = x if and only if the tournament has an edge oriented from x to y.
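The correspondence between pairwise choice functions and tournaments, as a short sketch (the dict encoding is my own):

```python
from itertools import combinations

# Orient an edge x -> y exactly when c({x, y}) = x.
def tournament(c, X):
    return {(x, y) if c[frozenset({x, y})] == x else (y, x)
            for x, y in combinations(X, 2)}

X = ["a", "b", "c"]
# pairwise choices of the linear order a < b < c (choose the maximum)
c = {frozenset(p): max(p) for p in combinations(X, 2)}
print(sorted(tournament(c, X)))  # [('b', 'a'), ('c', 'a'), ('c', 'b')]
```

For a rationalizable c the resulting tournament is transitive; a general pairwise choice function can produce any tournament, cycles included.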

26 Theorem 5.1. (1) The P-dimension of every symmetric class C of preference relations (considered as choice functions on pairs) on N alternatives is at least ⌊N/2⌋. (2) When N ≥ 8 the P-dimension is at least N − 1. (3) When N ≥ 68, if the P-dimension is precisely N − 1, then the class is the class of order relations.

27 Proof of Theorem 5.1. (1) Split the alternatives into m = ⌊N/2⌋ disjoint pairs. Let c ∈ C, and let R ⊆ {1, 2, …, m}. Define a permutation which swaps the two elements of exactly the pairs indexed by R. (If N is odd, the leftover alternative is fixed.)

28 Proof of Theorem 5.1 (cont.) The image of c under this permutation realizes the required pattern on the m pairs; hence the m pairs are shattered and the P-dimension is at least ⌊N/2⌋.

29 Proof of Theorem 5.1 (cont.) (2) To prove part (2) we will use the following conjecture, made by Rosenfeld and proved by Havet and Thomassé: when N ≥ 8, for every path P on N vertices with an arbitrary orientation of its edges, every tournament on N vertices contains a copy of P.

30 Proof of Theorem 5.1 (cont.) Let c be a choice function in the class and consider the tournament T described by c. Let S_i = {x_i, x_{i+1}}, i = 1, …, N − 1. Every choice function c′ on these pairs describes a directed path P (an orientation of the path x_1 x_2 … x_N). A copy of P can be found in our tournament; suppose the vertices of this copy (in the order they appear on the path) are y_1, …, y_N. Define a permutation π by π(y_i) = x_i. The choice function π(c) will then agree with c′ on S_1, …, S_{N−1}.

31 Theorem 5.1 implies the following corollary: Corollary 5.2. The P-dimension of every symmetric class of choice functions on N alternatives, N ≥ 8, is at least N − 1.

32 Consider a tournament with N players such that for any two players i and j there is a probability p_ij that i beats j in a match between the two. Among a set A ⊆ {1, …, N} of players, let c(A) be the player most likely to win a tournament involving the players in A. Consider the class W of choice functions that arise in this model as the probabilities p_ij vary.

33 Theorem 6.1. The class of choice functions W requires at most a number of examples polynomial in N for learning in the PAC-model.

34 Consider m polynomials P_1, …, P_m in r variables. For a point x ∈ R^r, the sign pattern of x is the vector of signs (sign P_1(x), …, sign P_m(x)).

35 Theorem 6.2 (Warren). If the degree of every P_i is at most D and if 2m > r, then the number of sign patterns given by the polynomials is at most (4eDm/r)^r.
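A numeric illustration (my own, using the (4eDm/r)^r form of the bound; the exact constant is not essential here): for many low-degree polynomials in few variables the bound is far smaller than the trivial count 3^m of all sign vectors:

```python
import math

# Warren-type bound on the number of sign patterns of m polynomials
# of degree at most D in r variables.
def warren_bound(m, D, r):
    return (4 * math.e * D * m / r) ** r

m, D, r = 100, 2, 3
print(warren_bound(m, D, r) < 3 ** m)  # True
```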

36 Given a set A of s players, the probability that the k-th player will be the winner in a tournament between the players of A is described by a polynomial Q(A, k) in the variables p_ij, as follows. Let M = (m_ij) be an s × s matrix representing the outcome of all matches between the players of A in a possible tournament, such that m_ij = 1 if player i won the match against player j and m_ij = 0 otherwise.

37 Proof of Theorem 6.1 (cont.) The probability that such a matrix M will represent the results of the matches in a tournament is ∏_{i<j, i,j ∈ A} p_ij^{m_ij} (1 − p_ij)^{1 − m_ij}. Define Q(A, k) to be the sum of these probabilities over all matrices M in which player k is the winner. Then c(A) is the player k ∈ A for which Q(A, k) is maximal.
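Q(A, k) can be evaluated by brute force over all outcome matrices. This sketch is mine, and its tie-breaking rule (outcomes where the top score is shared count for no player) is an assumption, since the slides do not specify one:

```python
from itertools import combinations, product

def win_probabilities(A, p):
    """Q(A, k): probability that k has strictly the most wins in a
    round-robin among A, given p[(i, j)] = P(i beats j) for i < j."""
    matches = list(combinations(A, 2))
    Q = {k: 0.0 for k in A}
    for outcome in product([0, 1], repeat=len(matches)):
        prob = 1.0
        wins = {k: 0 for k in A}
        for (i, j), ij_won in zip(matches, outcome):
            prob *= p[(i, j)] if ij_won else 1 - p[(i, j)]
            wins[i if ij_won else j] += 1
        leaders = [k for k in A if wins[k] == max(wins.values())]
        if len(leaders) == 1:          # count only a unique winner
            Q[leaders[0]] += prob
    return Q

p = {(1, 2): 0.9, (1, 3): 0.9, (2, 3): 0.5}
Q = win_probabilities([1, 2, 3], p)
print(Q[1])  # ~0.81: player 1 wins uniquely iff it wins both its matches
```

Each term of the sum is a monomial in the p_ij, so Q(A, k) is indeed a polynomial in these variables, as the slide states.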

38 Proof of Theorem 6.1 (cont.) Q(A, k) is a polynomial of degree at most N(N−1)/2 in the at most N(N−1)/2 variables p_ij, i, j ∈ A. Now consider Q(A, k, j) = Q(A, k) − Q(A, j) for all subsets A ⊆ {1, …, N} and k, j ∈ A, k ≠ j. Altogether we have fewer than 2^N · N^2 polynomials in N(N−1)/2 variables, and the degree of these polynomials is at most N(N−1)/2.

39 Proof of Theorem 6.1 (cont.) Now c(A) = k if and only if Q(A, k, j) > 0 for every j ∈ A, j ≠ k. Therefore the choice function given by a vector of probabilities is determined by the sign pattern of all the polynomials Q(A, k, j). We can now invoke Warren's theorem with r = D = N(N−1)/2 and m < 2^N · N^2. According to Warren's theorem, the number of different sign patterns of the polynomials, and hence the number of distinct functions in W, is at most (4eDm/r)^r = 2^{O(N^3)}.

40 Proof of Theorem 6.1 (cont.) Therefore it follows from Theorem 2.1 and Proposition 2.2 that the number of examples needed to learn W in the PAC-model is O(N^3).

41 Our main result determined the P-dimension of the class of rationalizable choice functions and showed that the number of examples needed to learn a rationalizable choice function is linear in the number of alternatives. We also described a mathematical method for analyzing the statistical learnability of complicated choice models.

