Game Theory: Optimal stopping problem I: Introduction Sarbar Tursunova.

Table of contents
1. The definition of the problem
2. The house-selling problem
3. Maximizing the average
4. The one-armed bandit
5. Detecting a change-point

The definition of the problem

Stopping rule problems are associated with two objects:
1. a sequence of random variables, $X_1, X_2, \dots$, whose joint distribution is assumed known;
2. a sequence of real-valued reward functions, $y_0, y_1(x_1), y_2(x_1, x_2), \dots$

Your problem is to choose a time to stop so as to maximize the expected reward.
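In symbols, the objective can be written as follows. This is only a sketch: the stopping time $N$ and the value $V^*$ are notation assumed here, not taken from the slides.

```latex
Y_n = y_n(X_1, \dots, X_n), \qquad
V^* = \sup_{N} \, \mathbb{E}\left[ Y_N \right],
```

where the supremum is taken over stopping rules $N$ for which the decision to stop at time $n$ depends only on $X_1, \dots, X_n$.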

The house-selling problem

Assume:
- $X_n$ is the amount of the offer received on day $n$;
- $c > 0$ is the cost of observing each offer (the cost of living for one more day).

Each day an offer $X_n$ arrives, and you must decide: accept or deny?

The problems with recall were introduced by MacQueen and Miller (1960), Derman and Sacks (1960) and Chow and Robbins (1961), and, with discount rather than cost, by Karlin (1962). In the economics literature, this problem is called the job search problem and is attributed to George Stigler (1961, 1962). An unemployed worker is searching for a job. Each search costs a certain amount in time and lost wages. How many searches should the worker undertake before accepting the best offer found so far? For a review from this viewpoint, see Lippman and McCall (1976).
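A small simulation may help make the house-selling problem concrete. The sketch below rests on assumptions that are not on the slides: offers are independent Uniform(0, 1) draws, past offers cannot be recalled, the payoff for accepting on day n is the offer minus n times c, and the seller uses a fixed threshold rule. The names `sell_house`, `mean_payoff`, `threshold` and `max_days` are hypothetical.

```python
import random


def sell_house(threshold, cost, rng, max_days=10_000):
    """Accept the first offer that is at least `threshold` (no recall).

    Assumed payoff: accepted offer minus `cost` for each day observed.
    Offers are assumed Uniform(0, 1); `max_days` is only a safety cap.
    """
    for day in range(1, max_days + 1):
        offer = rng.random()
        if offer >= threshold:
            return offer - day * cost
    # Safety cap reached: take the last offer seen.
    return offer - max_days * cost


def mean_payoff(threshold, cost, trials=20_000, seed=0):
    """Monte Carlo estimate of the expected payoff of a threshold rule."""
    rng = random.Random(seed)
    total = sum(sell_house(threshold, cost, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for v in (0.5, 0.8, 0.95):
        print(v, round(mean_payoff(v, cost=0.02), 3))
```

Under these assumptions the expected payoff of threshold v is (1 + v)/2 - c/(1 - v): the mean accepted offer minus c times the mean number of days waited. The simulated values can be compared against that formula.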

Maximizing the average

You observe a coin being tossed repeatedly, and when you stop your payoff is the average of the observations so far. What stopping rule should you employ to maximize your expected payoff? And how great an expected payoff can you obtain? This problem was first studied by Y. S. Chow and H. Robbins (1965).

Put the problem of maximizing the average in the form of a stopping rule problem: let $X_1, X_2, \dots$ be independent identically distributed random variables with a known distribution having a finite mean $m$, and let

$y_0 = m$,
$y_n(x_1, \dots, x_n) = (x_1 + \dots + x_n)/n$ for $n = 1, 2, \dots$,
$y_\infty(x_1, x_2, \dots) = m$.

This assumes that if you don't take any observations you receive $m$.
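The reward functions above can be explored numerically with backward induction. The sketch below assumes two things that are not in the slides: the observations are coin tosses with heads probability `p` (so m = p), and play is forced to stop at a finite `horizon`. The original problem has no horizon, so the result is only a lower bound on the value of the unrestricted problem. The function name `average_value` is hypothetical.

```python
def average_value(horizon=200, p=0.5):
    """Optimal expected payoff when stopping is forced by `horizon`.

    value[s] holds the optimal expected payoff at the current stage
    given that s heads have been observed so far.
    """
    # At the horizon you must stop and take the average s / horizon.
    value = [s / horizon for s in range(horizon + 1)]
    for n in range(horizon - 1, 0, -1):
        # At stage n: stop and take s / n, or toss once more.
        value = [
            max(s / n, p * value[s + 1] + (1 - p) * value[s])
            for s in range(n + 1)
        ]
    # Stage 0: stopping without observing pays y_0 = m = p.
    return max(p, p * value[1] + (1 - p) * value[0])


if __name__ == "__main__":
    for h in (1, 2, 10, 200):
        print(h, average_value(h))
```

With a horizon of 2 the recursion can be checked by hand: after a head on the first toss you stop with payoff 1, after a tail you continue and expect 1/4, so the value is 0.5 * 1 + 0.5 * 0.25 = 0.625.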

The one-armed bandit (Bradt, Johnson and Karlin (1956))

Given:
- the standard treatment T2, with known cure probability $p_0$;
- treatment T1, with unknown cure probability $p$;
- a group of $n$ patients.

You must decide which treatment to give each patient. The objective is to cure as many of the patients as possible; your payoff is the number of patients cured. It can be shown that if it is ever optimal to use T2 on a patient, then it is optimal to continue to use T2 on all subsequent patients. The problem is therefore when to start using T2, that is, when to stop using T1.
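The "switch once and never go back" structure can be illustrated with a simulation. The sketch below is not the optimal rule of Bradt, Johnson and Karlin. It assumes a Beta(a, b) prior on the unknown probability p and uses a simple myopic rule, both chosen here for illustration: keep giving T1 while the posterior mean of p is at least p0, then give T2 to every remaining patient. The name `treat_patients` and all parameters are hypothetical.

```python
import random


def treat_patients(n, p0, p_true, rng, a=1.0, b=1.0):
    """Simulate n patients under a myopic switch-once rule.

    Returns (number cured, index of first patient given T2 or None).
    `p_true` is the cure probability of T1, unknown to the decision maker.
    """
    cures = 0
    switched_at = None
    for i in range(n):
        # Switch permanently once the posterior mean of p drops below p0.
        if switched_at is None and a / (a + b) < p0:
            switched_at = i
        if switched_at is None:
            cured = rng.random() < p_true  # treatment T1
            if cured:
                a += 1.0
            else:
                b += 1.0
        else:
            cured = rng.random() < p0  # standard treatment T2
        cures += cured
    return cures, switched_at


if __name__ == "__main__":
    rng = random.Random(0)
    print(treat_patients(n=100, p0=0.4, p_true=0.3, rng=rng))
```

Because T2 yields no information about p, the posterior stops changing after the switch, which is why a rule of this shape never returns to T1.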

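Detecting a change-point (Shiryaev (1963))

- Random variables $X_1, X_2, \dots$ are observed, initially with distribution $F_0$.
- At some unknown time $T$, the distribution changes to another distribution $F_1$.
- Stopping before the change costs $c > 0$.

Total cost: $Y_n = c\,I\{n < T\} + (n - T)^+$ for $n = 0, 1, \dots$, and $Y_\infty = \infty$.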

In this display, $I(A)$ is the indicator function of a set $A$; for example, $I\{n < T\}$ is equal to 1 if $n < T$, and to zero otherwise. Since $T$ is a random unobservable quantity, we may replace $Y_n$ by its conditional expected value given $X_1, \dots, X_n$:

$y_n = c\,P(T > n \mid \mathcal{F}_n) + E((n - T)^+ \mid \mathcal{F}_n)$ for $n = 0, 1, \dots$, and $y_\infty = \infty$,

where $\mathcal{F}_n$ denotes the information contained in $X_1, \dots, X_n$ and $(n - T)^+ = \max(n - T, 0)$.

Applications include monitoring heart patients for a change in pulse rate and monitoring a production line for a change in quality.
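The conditional probability $P(T > n \mid \mathcal{F}_n)$ can be updated recursively once a model is fixed. The sketch below makes assumptions that are not in the slides: $T$ has a geometric prior with parameter `rho`, $X_n$ has distribution $F_1$ exactly when $T \le n$, and $F_0$, $F_1$ are normal with unit variance and means `mean0`, `mean1`. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_no_change(xs, rho, mean0=0.0, mean1=1.0):
    """Return [P(T > n | x_1..x_n) for n = 0..len(xs)].

    Assumes P(T = k) = rho * (1 - rho) ** (k - 1) for k = 1, 2, ...
    """
    changed = 0.0  # P(T <= 0) = 0 under the assumed prior
    out = [1.0]
    for x in xs:
        # Predict: the change may occur at this step with probability rho.
        prior = changed + (1.0 - changed) * rho
        # Update with Bayes' rule using the new observation.
        num = prior * normal_pdf(x, mean1)
        den = num + (1.0 - prior) * normal_pdf(x, mean0)
        changed = num / den
        out.append(1.0 - changed)
    return out


if __name__ == "__main__":
    data = [0.1, -0.3, 0.2, 1.4, 0.9, 1.8, 1.1]
    for n, q in enumerate(prob_no_change(data, rho=0.1)):
        print(n, round(q, 4))
```

One natural rule to try with this output is to stop the first time the probability of no change falls below a chosen level; that level is a tuning parameter assumed here, not something given in the slides.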

Thank you for your attention!!!
