Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski.


1 Bandits for Taxonomies: A Model-based Approach. Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski

2 The Content Match Problem. [Diagram: advertisers submit ads to an ads DB; an ad is shown on a page.] Ad impression: showing an ad to a user.

3 The Content Match Problem. Ad click: a user's click yields revenue for both the ad server and the content provider.

4 The Content Match Problem: match ads to pages so as to maximize clicks.

5 The Content Match Problem. Maximizing the number of clicks means: for each webpage, find the ad with the best click-through rate (CTR), but without wasting too many impressions in learning this.

6 Online Learning. Maximizing clicks requires dimensionality reduction, exploration, and exploitation, and these must occur together. Online learning is needed, since the system must continuously generate revenue.

7 Taxonomies for Dimensionality Reduction. Taxonomies (e.g., Root → Apparel, Computers, Travel) already exist, are actively maintained, and come with existing classifiers that map pages and ads to taxonomy nodes. Learning the matching from page nodes to ad nodes, rather than from individual pages to ads, gives dimensionality reduction.

8 Online Learning. Maximizing clicks requires dimensionality reduction, exploration, and exploitation. Can taxonomies help with explore/exploit as well?

9 Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy Experiments Related Work Conclusions

10 Background: Bandits. Bandit "arms" have unknown payoff probabilities p1, p2, p3. Pull arms sequentially so as to maximize the total expected reward: estimate the payoff probabilities p_i, and bias the estimation process towards the better arms.
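The sequential pull-and-estimate loop on this slide can be sketched with a simple policy. The slide does not fix a specific policy, so the example below uses epsilon-greedy (a standard choice) with made-up payoff probabilities:

```python
import random

def epsilon_greedy(true_probs, pulls=10_000, eps=0.1, seed=0):
    """Pull arms sequentially, estimating each arm's payoff probability
    while biasing pulls towards the arms that look best so far."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n   # pulls per arm
    wins = [0] * n     # observed rewards per arm
    total = 0
    for _ in range(pulls):
        if rng.random() < eps or 0 in counts:   # explore (and try every arm at least once)
            arm = rng.randrange(n)
        else:                                   # exploit the current best estimate
            arm = max(range(n), key=lambda a: wins[a] / counts[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        total += reward
    estimates = [wins[a] / counts[a] if counts[a] else 0.0 for a in range(n)]
    return total, estimates
```

With enough pulls, the estimate for the best arm converges while most impressions go to it, which is exactly the explore/exploit trade-off the slide describes.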

11 Background: Bandits. In content match, each webpage is one bandit and the ads are its arms: roughly 10^6 ads and 10^9 pages.

12 Background: Bandits. Content match is a matrix of webpages (rows) by ads (columns): each row is one bandit, and each cell holds an unknown CTR.

13 Background: Bandits. A bandit policy: 1. Assign a priority to each arm (allocation). 2. "Pull" the arm with the maximum priority, and observe the reward. 3. Update the priorities (estimation).
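One concrete instance of this three-step loop is UCB1; we pick it here purely for illustration, since the slide leaves the priority function open:

```python
import math

def ucb_priority(clicks, pulls, total_pulls):
    """UCB1 priority: estimated CTR plus an exploration bonus that
    shrinks as the arm accumulates observations."""
    if pulls == 0:
        return float("inf")           # unpulled arms get top priority
    return clicks / pulls + math.sqrt(2 * math.log(total_pulls) / pulls)

def pull_step(clicks, pulls, observe):
    """One iteration of the slide's loop: compute priorities, pull the
    max-priority arm, observe the reward, and update the counts."""
    total = sum(pulls)
    arm = max(range(len(pulls)),
              key=lambda a: ucb_priority(clicks[a], pulls[a], total))
    reward = observe(arm)             # 1 if the shown ad was clicked, else 0
    pulls[arm] += 1
    clicks[arm] += reward
    return arm, reward
```

The allocation step is the `max` over priorities; the estimation step is the count update that feeds the next round's priorities.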

14 Background: Bandits. Why not simply apply a bandit policy directly to our problem? Convergence is too slow: ~10^9 bandits, with ~10^6 arms per bandit. Moreover, additional structure is available that can help: taxonomies.

15 Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy Experiments Related Work Conclusions

16 Multi-level Policy. Group ads and webpages into taxonomy classes, and consider only two levels.

17 Multi-level Policy. Consider only two levels: ad parent classes (e.g., Apparel, Computers, Travel) and their ad child classes. The cells whose ads fall under one parent class form a block, and each row is one bandit.

18 Multi-level Policy. Key idea: CTRs within a block are homogeneous.

19 Multi-level Policy. CTRs in a block are homogeneous. This is used in allocation (picking an ad for each new page) and in estimation (updating priorities after each observation).

20 Multi-level Policy. CTRs in a block are homogeneous. This is used in allocation (picking an ad for each new page) and in estimation (updating priorities after each observation).

21 Multi-level Policy (Allocation). Classify the webpage → its page class and parent page class. Run a bandit on the ad parent classes → pick one ad parent class.

22 Multi-level Policy (Allocation). Then run a bandit among the cells of the chosen block → pick one ad class. In general, continue from root to leaf → the final ad.
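The two-level walk can be sketched as follows. The priority rule (UCB1), the data layout, and all names (`select_arm`, `allocate`, the example class names) are our own illustrative choices, not from the talk:

```python
import math

def select_arm(stats):
    """Pick the arm with the max UCB1 priority from {arm: (clicks, pulls)}."""
    total = sum(p for _, p in stats.values()) or 1
    def prio(arm):
        c, p = stats[arm]
        return float("inf") if p == 0 else c / p + math.sqrt(2 * math.log(total) / p)
    return max(stats, key=prio)

def allocate(page_parent, page_class, parent_stats, child_stats):
    """Two-level allocation: a bandit over ad parent classes picks a
    parent class, then a bandit over the cells of that block picks the
    ad class. (In general, the walk continues from root to leaf.)"""
    ad_parent = select_arm(parent_stats[page_parent])
    ad_class = select_arm(child_stats[(page_class, ad_parent)])
    return ad_parent, ad_class
```

Because the parent-level bandit aggregates statistics over whole blocks, it has both fewer arms and more data per arm than a flat page-by-ad bandit.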

23 Multi-level Policy (Allocation). Bandits at higher levels use aggregated information and have fewer arms, so they quickly figure out the best ad parent class.

24 Multi-level Policy. CTRs in a block are homogeneous. This is used in allocation (picking an ad for each new page) and in estimation (updating priorities after each observation).

25 Multi-level Policy (Estimation). CTRs in a block are homogeneous, so observations from one cell also give information about the others in the block. How can we model this dependence?

26 Multi-level Policy (Estimation). Shrinkage model: S_cell | CTR_cell ~ Binomial(N_cell, CTR_cell), with CTR_cell ~ Beta(params_block), where S_cell is the number of clicks and N_cell the number of impressions in the cell. All cells in a block come from the same distribution.
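As a sketch, this generative model can be simulated directly; the block parameters below are made up for illustration:

```python
import random

def simulate_block(n_cells, impressions, alpha, beta, seed=0):
    """Every cell in a block draws its CTR from the same Beta(alpha, beta)
    (the block's shared distribution); clicks are then Binomial(N, CTR)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        ctr = rng.betavariate(alpha, beta)                       # CTR_cell ~ Beta(block params)
        s = sum(rng.random() < ctr for _ in range(impressions))  # S_cell ~ Bin(N_cell, CTR_cell)
        cells.append((s, impressions))
    return cells
```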

27 Multi-level Policy (Estimation). Intuitively, this leads to shrinkage of cell CTRs towards the block CTR: E[CTR_cell] = α · prior_block + (1 − α) · S_cell / N_cell, a weighted average of the Beta prior mean (the "block CTR") and the observed cell CTR.
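Under a Beta(a, b) block prior, the posterior mean has exactly this weighted-average form, with α = (a + b) / (a + b + N_cell). A minimal sketch (parameter names are ours):

```python
def shrunk_ctr(s_cell, n_cell, a_block, b_block):
    """Posterior-mean CTR under the Beta-Binomial shrinkage model:
    S | CTR ~ Bin(N, CTR), CTR ~ Beta(a, b).  Equals the slide's
    alpha * prior + (1 - alpha) * S/N with alpha = (a+b)/(a+b+N),
    which simplifies to (a + S) / (a + b + N)."""
    prior_mean = a_block / (a_block + b_block)
    if n_cell == 0:
        return prior_mean                         # no data: fall back to the block CTR
    alpha = (a_block + b_block) / (a_block + b_block + n_cell)
    return alpha * prior_mean + (1 - alpha) * s_cell / n_cell
```

As N_cell grows, α shrinks towards 0 and the estimate trusts the cell's own observed CTR; with little data it leans on the block.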

28 Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy Experiments Related Work Conclusions

29 Experiments. Taxonomy structure: the root at depth 0, 20 nodes at depth 1, 221 nodes at depth 2, down to ~7000 leaves at depth 7. We use the two levels at depths 1 and 2.

30 Experiments. Data was collected over a one-day period from a single server, under some other ad-matching rules (not our bandit): ~229M impressions. CTR values have been linearly transformed for purposes of confidentiality.

31 Experiments (Multi-level Policy). [Plot: clicks vs. number of pulls.] The multi-level policy gives many more clicks.

32 Experiments (Multi-level Policy). [Plot: mean-squared error vs. number of pulls.] The multi-level policy also gives much better mean-squared error: it has learnt more from its explorations.

33 Experiments (Shrinkage). [Plots: clicks and mean-squared error vs. number of pulls, with and without shrinkage.] Shrinkage improves the mean-squared error, but yields no gain in clicks.

34 Outline Problem Background: Multi-armed bandits Proposed Multi-level Policy Experiments Related Work Conclusions

35 Related Work. Typical multi-armed bandit problems do not consider dependencies and involve very few arms. Bandits with side information cannot handle dependencies among ads. General MDP solvers do not use the structure of the bandit problem; their emphasis is on learning the transition matrix, which is random in our problem.

36 Conclusions. Taxonomies exist for many datasets, and they can be used for dimensionality reduction. The multi-level bandit policy yields higher numbers of clicks, and better estimation via shrinkage models yields better MSE.

