
Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Christopher Olston (CMU and Yahoo! Research)


1 Handling Advertisements of Unknown Quality in Search Advertising
Sandeep Pandey, Christopher Olston (CMU and Yahoo! Research)

2 Sponsored Search
• How does it work?
  - Search engine displays ads next to search results
  - Advertisers pay the search engine per click
• Who benefits from it?
  - Main source of funding for search engines
  - Information flow from advertisers to users

3 Sponsored Search
• Click-through rate (CTR): given an ad and a query, CTR = probability that the ad receives a click
• Optimal policy to maximize the search engine's revenue: display the ads with the highest (CTR × bid) value
[Figure: results page with regions labeled "Search query results" and "Sponsored search results"]
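When the CTRs are known, the ranking rule on this slide is a top-C selection by CTR × bid. The sketch below illustrates that rule only; the ad identifiers, the `(ctr, bid)` tuple layout, and the function name `select_ads` are assumptions made for the example, not part of the presentation.

```python
import heapq


def select_ads(ads, c):
    """Return the c ad ids with the highest CTR * bid, best first.

    `ads` maps a (hypothetical) ad id to a (ctr, bid) pair; the CTRs are
    assumed to be known exactly, which is the idealized case on the slide.
    """
    return heapq.nlargest(c, ads, key=lambda ad: ads[ad][0] * ads[ad][1])


# Illustrative numbers only: a low-CTR ad can still win with a high bid.
ads = {"a1": (0.10, 1.00), "a2": (0.02, 8.00), "a3": (0.05, 1.50)}
top_two = select_ads(ads, 2)
```

Here `a2` ranks first because its expected revenue per impression (0.02 × 8.00) exceeds that of `a1` (0.10 × 1.00), even though `a1` is clicked more often.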

4 Challenges in Sponsored Search
• Problem: CTRs are initially unknown
  - Estimating CTRs requires going around the cycle shown in the figure
• Exploration/exploitation tradeoff:
  - Explore ads to estimate CTRs
  - Exploit known high-CTR ads to maximize revenue
[Figure: cycle diagram with labels "show ads", "record clicks", "refine CTR estimates", "earn revenue"]

5 The Advertisement Problem
• Problem:
  - Advertiser $A_i$ submits ad $a_{i,j}$ for query phrase $Q_j$
  - User clicks on $a_{i,j}$ -> $A_i$ pays $b_{i,j}$ (the "bid value")
  - Queries arrive one after another
  - Select ads to show for each query, in an online fashion
• Constraints:
  - Show at most $C$ ads per query
  - Advertisers have daily budgets: $A_i$ pays at most $d_i$
• Goal: maximize the search engine's revenue
[Figure: bipartite graph linking advertisers $A_1, A_2, A_3$ (with budgets $d_1, d_2, d_3$) to query phrases $Q_1, Q_2, Q_3$ through ads $a_{1,1}, a_{2,1}, a_{1,3}, a_{3,2}$]

6 Our Approach
• Unbudgeted Advertisement Problem
  - Isomorphic to the multi-armed bandit problem
• Budgeted Advertisement Problem
  - Similar to the bandit problem, but with additional budget constraints that span arms
  - Introduce the Budgeted Multi-armed Multi-bandit Problem (BMMP)

7 Unbudgeted Advertisement Problem as Multi-armed Bandit Problem
• Bandit: classical example of online learning under the explore/exploit tradeoff
  - $K$ arms. Arm $i$ has an associated reward $r_i$ and an unknown payoff probability $p_i$
  - Pull $C$ arms at each time instant to maximize the reward accrued over time
• Isomorphism: query phrase ↔ bandit instance; ads ↔ arms; CTR ↔ payoff probability; bid ↔ reward
[Figure: bandit arms with payoff probabilities $p_1, p_2, p_3$]

8 Policy for Unbudgeted Problem
• Policy "MIX" (adopted from [Auer et al., ML'02])
• When query phrase $Q_j$ arrives:
  - Compute the priority $p_{i,j}$ of each ad $a_{i,j}$, where $p_{i,j} = \left(e_{i,j} + \sqrt{2 \ln n_j / n_{i,j}}\right) \cdot b_{i,j}$
    - $e_{i,j}$ is the MLE of the CTR value of $a_{i,j}$
    - $b_{i,j}$ is the price or bid value of ad $a_{i,j}$
    - $n_{i,j}$: number of times ad $a_{i,j}$ has been shown in the past
    - $n_j$: number of times query $Q_j$ has been answered
  - Display the $C$ highest-priority ads
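The MIX priority can be sketched in a few lines. This is a minimal illustration of the formula on the slide, not the authors' implementation: the function names, the `(clicks, shows, bid)` tuple layout, and the choice to give never-shown ads infinite priority (so they are tried first) are assumptions added for the example.

```python
import math


def mix_priority(clicks, shows, query_count, bid):
    """Priority (e + sqrt(2 ln n_j / n_ij)) * b_ij for a single ad.

    clicks / shows is the MLE of the CTR (e_ij), shows is n_ij,
    query_count is n_j, and bid is b_ij.
    """
    if shows == 0:
        # Assumption: the slide does not say how unseen ads are handled;
        # here they get infinite priority so each ad is shown at least once.
        return math.inf
    ctr_estimate = clicks / shows
    exploration_bonus = math.sqrt(2.0 * math.log(query_count) / shows)
    return (ctr_estimate + exploration_bonus) * bid


def mix_select(stats, query_count, c):
    """Return the c highest-priority ad ids for one query phrase.

    `stats` maps a (hypothetical) ad id to a (clicks, shows, bid) tuple.
    """
    ranked = sorted(
        stats,
        key=lambda ad: mix_priority(stats[ad][0], stats[ad][1], query_count, stats[ad][2]),
        reverse=True,
    )
    return ranked[:c]


# Illustrative counts only.
stats = {"a1": (30, 100, 1.0), "a2": (2, 10, 1.0), "a3": (0, 0, 0.5)}
chosen = mix_select(stats, query_count=110, c=2)
```

With these counts, the rarely shown `a2` outranks `a1` despite a lower estimated CTR, because its exploration bonus is larger; that is the explore/exploit balance the priority is meant to encode.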

9 Budgeted Multi-armed Multi-Bandit Problem (BMMP)
• Finite set of bandit instances; each instance has a finite number of arms
• Each arm has an associated type
• Each type $T_i$ has a budget $d_i$
  - Upper limit on the total amount of reward that can be generated by the arms of type $T_i$
• An external actor invokes a bandit instance at each time instant; the policy must choose $C$ arms of the invoked instance

10 Meta Policy for BMMP
• Input: BMMP instance and a policy POL for the conventional multi-armed bandit problem
• Output: the following policy BPOL
  - Run POL in parallel for each bandit instance $B_i$
  - Whenever $B_i$ is invoked:
    - Discard arm(s) with depleted budget
    - If one or more arms were discarded, restart $\mathrm{POL}_i$
    - Let $\mathrm{POL}_i$ decide which of the remaining arms to activate
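The meta policy can be sketched as a thin wrapper around any per-instance policy. Everything named here is hypothetical: the `BPOL` class layout, the `charge` method for recording earned reward, and the trivial `FirstArmsPolicy` used as a stand-in for POL are assumptions for illustration, and a real POL (such as MIX) would also need click feedback.

```python
class FirstArmsPolicy:
    """Hypothetical stand-in for POL: always picks the first c arms."""

    def __init__(self, arms):
        self.arms = list(arms)

    def choose(self, c):
        return self.arms[:c]


class BPOL:
    """Sketch of the meta policy: one POL per bandit instance, budget-aware."""

    def __init__(self, make_pol, instances, arm_type, budgets):
        self.make_pol = make_pol            # factory: list of arms -> POL
        self.arm_type = arm_type            # arm id -> type id
        self.remaining = dict(budgets)      # type id -> remaining budget
        self.live = {b: list(arms) for b, arms in instances.items()}
        self.pol = {b: make_pol(arms) for b, arms in self.live.items()}

    def invoke(self, instance, c):
        """Handle one invocation of `instance`; return the arms to activate."""
        alive = [a for a in self.live[instance]
                 if self.remaining[self.arm_type[a]] > 0]
        if len(alive) < len(self.live[instance]):
            # At least one arm was discarded: restart POL on the survivors.
            self.live[instance] = alive
            self.pol[instance] = self.make_pol(alive)
        return self.pol[instance].choose(c)

    def charge(self, arm, reward):
        """Deduct earned reward from the budget of the arm's type."""
        t = self.arm_type[arm]
        self.remaining[t] = max(0.0, self.remaining[t] - reward)


# Two bandit instances; type "A1" owns an arm in each of them.
instances = {"Q1": ["a11", "a21"], "Q2": ["a12"]}
arm_type = {"a11": "A1", "a21": "A2", "a12": "A1"}
bpol = BPOL(FirstArmsPolicy, instances, arm_type, {"A1": 1.0, "A2": 5.0})
first = bpol.invoke("Q1", 1)
bpol.charge("a11", 1.0)            # exhausts the budget of type A1
second = bpol.invoke("Q1", 1)
third = bpol.invoke("Q2", 1)
```

Exhausting the budget of type `A1` through an arm of `Q1` also removes `a12` from `Q2`, which shows how budget constraints span arms across bandit instances.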

11 Performance Guarantee of BPOL
• OPT = algorithm that knows in advance:
  1. The full sequence of bandit invocations
  2. The payoff probabilities
• Claim: $\mathrm{bpol}(N) \ge \mathrm{opt}(N)/2 - O(f(N))$
  - $\mathrm{bpol}(N)$: total expected reward of the BPOL policy after $N$ bandit invocations
  - $\mathrm{opt}(N)$: total expected reward of OPT
  - $f(N)$: regret of POL after $N$ invocations of the regular bandit problem

12 Proof of Performance Guarantee
• Divide the time instants into 3 categories:
  1. BPOL chooses an arm of higher expected reward than OPT: $\mathrm{opt}_1(N) \le \mathrm{bpol}_1(N)$
  2. BPOL chooses an arm of lower expected reward because OPT's arm has run out of budget: $\mathrm{opt}_2(N) \le \mathrm{bpol}_2(N) + (\#\text{types} \cdot \text{max reward})$
  3. Otherwise: $\mathrm{opt}_3(N) = O(f(N))$
• Claim (follows from the above bounds):
  - $\mathrm{opt}(N) \le \mathrm{bpol}(N) + \mathrm{bpol}(N) + O(1) + O(f(N))$
  - $\mathrm{bpol}(N) \ge \mathrm{opt}(N)/2 - O(f(N))$
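The step from the three per-category bounds to the claim can be written out as follows. This is a reading of the slide under stated assumptions, not a reproduction of the paper's proof: it assumes rewards are nonnegative (so each per-category total is at most $\mathrm{bpol}(N)$), that $\#\text{types} \cdot \text{max reward}$ does not depend on $N$, and that the resulting $O(1)$ term can be absorbed into $O(f(N))$.

```latex
\begin{align*}
\mathrm{opt}(N) &= \mathrm{opt}_1(N) + \mathrm{opt}_2(N) + \mathrm{opt}_3(N) \\
  &\le \mathrm{bpol}_1(N) + \mathrm{bpol}_2(N)
       + (\#\text{types} \cdot \text{max reward}) + O(f(N)) \\
  &\le \mathrm{bpol}(N) + \mathrm{bpol}(N) + O(1) + O(f(N)) \\[4pt]
\Longrightarrow\quad
\mathrm{bpol}(N) &\ge \frac{\mathrm{opt}(N)}{2} - O(f(N))
\end{align*}
```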

13 Advertisement Policies
• BMIX: output of our generic BPOL policy when given MIX as input
• BMIX-E: replace $\sqrt{2 \ln n_j / n_{i,j}}$ in priority $p_{i,j}$ by $\sqrt{\min(0.25,\, V(n_{i,j}, n_j)) \cdot \ln n_j / n_{i,j}}$, where $V(n_{i,j}, n_j) = e_{i,j} \cdot (1 - e_{i,j}) \cdot \sqrt{2 \ln n_j / n_{i,j}}$
  - Suggested in Auer et al., ML'02
  - Purpose: aggressive exploitation
• BMIX-T: replace $b_{i,j}$ in priority $p_{i,j}$ by $b_{i,j} \cdot \mathrm{throttle}(d_i')$, where $\mathrm{throttle}(d_i') = 1 - e^{-d_i'/d_i}$ and $d_i'$ is the remaining budget of advertiser $A_i$
  - Suggested in Mehta et al., FOCS'05
  - Purpose: delay the depletion of advertisers' budgets
• BMIX-ET: with both the E and T modifications
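The throttling factor used by BMIX-T is easy to evaluate directly. The sketch below only illustrates the formula on the slide; the function names and the assumption that the daily budget is strictly positive are added for the example.

```python
import math


def throttle(remaining_budget, daily_budget):
    """throttle(d') = 1 - e^(-d'/d), with d' the remaining and d the daily budget.

    Assumes daily_budget > 0.
    """
    return 1.0 - math.exp(-remaining_budget / daily_budget)


def throttled_bid(bid, remaining_budget, daily_budget):
    """Effective bid b * throttle(d') that replaces b in the MIX priority."""
    return bid * throttle(remaining_budget, daily_budget)


full = throttle(10.0, 10.0)    # untouched budget: 1 - e^(-1)
empty = throttle(0.0, 10.0)    # depleted budget
```

The factor shrinks toward zero as the remaining budget is spent, so ads from nearly exhausted advertisers lose priority gradually instead of being shown at full priority until the budget runs out.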

14 Experiments
• Simulations over real data
• Data:
  - 85,000 query phrases from the Yahoo! query log
  - Yahoo! ads with daily budget constraints
  - CTRs drawn from Yahoo!'s CTR distribution
  - User clicks simulated using the CTR values
• Time horizon = multiple days
  - Policies carried over the CTR estimates from one day to the next

15 Results
• GREEDY: select ads with the highest current reward estimate ($e_{i,j} \cdot b_{i,j}$)
  - Does not explore; only exploits
* Revenue values scaled for confidentiality reasons

16 Conclusion
• Search advertisement problem
  - Exploration/exploitation tradeoff
  - Model as multi-armed bandit
• Introduced new bandit variant
  - Budgeted multi-armed multi-bandit problem (BMMP)
  - New policy for BMMP with performance guarantee
• In paper:
  - Variable set of ads (ads come and go)
  - Prior CTR estimates

