Research © 2008 Yahoo! Statistical Challenges in Online Advertising Deepak Agarwal Deepayan Chakrabarti (Yahoo! Research)


1 Research © 2008 Yahoo! Statistical Challenges in Online Advertising Deepak Agarwal Deepayan Chakrabarti (Yahoo! Research)

2 Research © 2008 Yahoo! Online Advertising Multi-billion dollar industry, high growth –$9.7B in 2006 (17% increase), total $150B Why will this continue? –Broadband is cheap and ubiquitous –“Getting things done” is easier on the internet –Advertisers are shifting dollars Why does it work? –Massive scale, automated, low marginal cost –Key: Monetize more and better, “learn from data” –New discipline: “Computational Advertising”

3 Research © 2008 Yahoo! What is “Computational Advertising”? New scientific sub-discipline, at the intersection of –Large scale search and text analysis –Information retrieval –Statistical modeling –Machine learning –Optimization –Microeconomics

4 Research © 2008 Yahoo! Online advertising: 6000 ft Overview Advertisers Ad Network Ads Content Pick ads User Content Provider Examples: Yahoo, Google, MSN, RightMedia, …

5 Research © 2008 Yahoo! Outline Background on online advertising –Sponsored Search, Content Match, Display, Unified marketplace The Fundamental Problem Statistical sub-problems: –Description –Existing methods –Challenges

6 Research © 2008 Yahoo! Different flavors of Online Advertising: Revenue Models (CPM, CPC, CPA); Advertising Setting (Display, Content Match, Sponsored Search); Misc. (Ad exchanges)

7 Research © 2008 Yahoo! Revenue Models (CPM, CPC, CPA): CPM = Cost Per iMpression. [Diagram: Advertisers, Ad Network, Content Provider, User; the Ad Network picks ads to show with the content; payments marked $$ and $]

8 Research © 2008 Yahoo! Revenue Models (CPM, CPC, CPA): CPC = Cost Per Click. [Same diagram, with the user's click marked]

9 Research © 2008 Yahoo! Revenue Models (CPM, CPC, CPA): CPA = Cost Per Action. [Same diagram, with the user's click and the advertiser landing page marked]

10 Research © 2008 Yahoo! Revenue Models (CPM, CPC, CPA) Example: Suppose we show an ad N times on the same spot. Under CPM: Revenue = N * CPM. Under CPC: Revenue = N * CTR * CPC, where CTR is the click-through rate (probability of a click given an impression) and CPC depends on the auction mechanism.

11 Research © 2008 Yahoo! Auction Mechanism Revenue depends on type of auction –Generalized First-price: CPC = bid on clicked ad –Generalized Second-price: CPC = bid of ad below clicked ad (or the reserve price) CPC could be modified by additional factors [Optimal Auction Design in a Multi-Unit Environment: The Case of Sponsored Search Auctions] by Edelman+/2006 [Internet Advertising and the Generalized Second Price Auction…] by Edelman+/2006
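
The two pricing rules on this slide can be sketched in a few lines. This is a minimal illustration, not the mechanism used by any particular ad network: the function name, the assumption that ads are ranked purely by bid, and the rule that the last slot pays the reserve price are all assumptions made for the example.

```python
def cost_per_click(bids, clicked_slot, mechanism="GSP", reserve=0.0):
    """Price charged for a click on the ad in `clicked_slot` (0 = top slot).

    Assumption: ads are ranked by bid alone, highest bid first.
    """
    ranked = sorted(bids, reverse=True)
    if mechanism == "GFP":
        # Generalized first-price: pay your own bid.
        return ranked[clicked_slot]
    if mechanism == "GSP":
        # Generalized second-price: pay the bid of the ad ranked just below,
        # or the reserve price if there is no ad below (an assumption here).
        below = clicked_slot + 1
        return ranked[below] if below < len(ranked) else reserve
    raise ValueError("unknown mechanism: " + mechanism)


bids = [0.50, 0.75, 0.60]  # hypothetical bids in dollars
print(cost_per_click(bids, 0, "GFP"))  # top ad pays its own bid
print(cost_per_click(bids, 0, "GSP"))  # top ad pays the next-highest bid
```

As the slide notes, real systems may modify the CPC by additional factors, which this sketch ignores.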

12 Research © 2008 Yahoo! Revenue Models (CPM, CPC, CPA) Example: Suppose we show an ad N times on the same spot. Under CPM: Revenue = N * CPM. Under CPC: Revenue = N * CTR * CPC. Under CPA: Revenue = N * CTR * Conv. Rate * CPA, where Conv. Rate is the conversion rate (probability of a user conversion on the advertiser’s landing page given a click).
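
The three revenue formulas can be written as one small function. This is a sketch that follows the slide's convention of treating CPM as the price of a single impression; the function name and the example numbers are hypothetical.

```python
def expected_revenue(model, impressions, price, ctr=0.0, conv_rate=0.0):
    """Expected revenue from showing one ad `impressions` times.

    `price` is the CPM, CPC or CPA, depending on `model`.
    """
    if model == "CPM":
        return impressions * price
    if model == "CPC":
        return impressions * ctr * price
    if model == "CPA":
        return impressions * ctr * conv_rate * price
    raise ValueError("unknown revenue model: " + model)


# Hypothetical numbers: 10,000 impressions, 1% CTR, 2% conversion rate.
print(expected_revenue("CPM", 10_000, 0.002))
print(expected_revenue("CPC", 10_000, 0.50, ctr=0.01))
print(expected_revenue("CPA", 10_000, 20.0, ctr=0.01, conv_rate=0.02))
```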

13 Research © 2008 Yahoo! Revenue Models Revenue dependence: CPM: website traffic; CPC: website traffic + ad relevance; CPA: website traffic + ad relevance + landing page quality. Other dimensions compared across the models: relevance to advertisers, prices and bids, ease of picking ads.

14 Research © 2008 Yahoo! Background Online Advertising: Revenue Models (CPM, CPC, CPA); Advertising Setting (Display, Content Match, Sponsored Search); Misc. (Ad exchanges)

15 Research © 2008 Yahoo! Advertising Setting Advertisers Ad Network Content Pick ads User Content Provider Ads What do you show the user? How does the user interact with the ad system?

16 Research © 2008 Yahoo! Advertising Setting: Display, Content Match, Sponsored Search

17 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Pick ads

18 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Graphical display ads. Mostly for brand awareness. Revenue model is typically CPM.

19 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Content match ad

20 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Pick ads. Text ads. Match ads to the content.

21 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): The user intent is unclear. Revenue model is typically CPC. The “query” (the webpage) is long and noisy.

22 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Search Query; Sponsored Search Ads

23 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): Pick ads. Text ads. Search Query. Match ads to the query.

24 Research © 2008 Yahoo! Advertising Setting (Display, Content Match, Sponsored Search): User “declares” his/her intention. Click rates are generally higher than for Content Match. Revenue model is typically CPC (recently some CPA). Query is short and less noisy than in Content Match.

25 Research © 2008 Yahoo! Summary Different revenue models –Depends on the goal of the advertiser campaign Brand awareness –Display advertising –Pay per impression (CPM) Attracting users to advertised product –Content Match, Sponsored Search –Pay per click (CPC), Pay per action (CPA)

26 Research © 2008 Yahoo! Background Online Advertising: Revenue Models (CPM, CPC, CPA); Advertising Setting (Display, Content Match, Sponsored Search); Misc. (Ad exchanges)

27 Research © 2008 Yahoo! Unified Marketplace Publishers, ad networks, and advertisers participate together in a single exchange. Publishers put impressions in the exchange; advertisers and ad networks bid for them. CPM, CPC, and CPA are all integrated into a single auction mechanism.

28 Research © 2008 Yahoo! Overview: The Open Exchange Transparency and value. [Diagram: a publisher has an ad impression to sell and auctions it. Bids shown: $0.50; $0.75 via Network… … which becomes a $0.45 bid; $0.65 (wins); $0.60. Labels in the diagram include AdSense and Ad.com.]

29 Research © 2008 Yahoo! Unified scale: Expected CPM Campaigns are CPC, CPA, CPM They may all participate in an auction together Converting to a common denomination is a challenge
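
One way to picture the "common denomination" is to convert every bid into an expected value per impression, using the revenue formulas from the Revenue Models slides. The sketch below assumes the CTR and conversion rate are already known, which is exactly the hard estimation problem the talk goes on to discuss; the campaign records and field names are hypothetical.

```python
def expected_value_per_impression(campaign):
    """Convert a CPM, CPC or CPA bid to expected revenue per impression."""
    kind = campaign["type"]
    if kind == "CPM":
        return campaign["bid"]
    if kind == "CPC":
        return campaign["bid"] * campaign["ctr"]
    if kind == "CPA":
        return campaign["bid"] * campaign["ctr"] * campaign["conv_rate"]
    raise ValueError("unknown campaign type: " + kind)


# Hypothetical campaigns competing for the same impression.
campaigns = [
    {"name": "brand", "type": "CPM", "bid": 0.002},
    {"name": "clicks", "type": "CPC", "bid": 0.50, "ctr": 0.01},
    {"name": "sales", "type": "CPA", "bid": 20.0, "ctr": 0.01, "conv_rate": 0.02},
]
winner = max(campaigns, key=expected_value_per_impression)
print(winner["name"])
```

Errors in the estimated rates translate directly into mis-ranked bids, which is one reason the conversion is described as a challenge.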

30 Research © 2008 Yahoo! Outline Background on online advertising The Fundamental Problem Statistical sub-problems: –Description –Existing methods –Challenges

31 Research © 2008 Yahoo! Outline Background on online advertising The Fundamental Problem –Display advertising –Sponsored Search and Content Match Statistical sub-problems: –Description –Existing methods –Challenges

32 Research © 2008 Yahoo! Display Advertising

33 Research © 2008 Yahoo! Display Advertising Main goal of advertisers: Brand Awareness. Revenue Model: Primarily Cost per impression (CPM). Traditional Advertising Model: 1. Ads are targeted at particular demographics (user characteristics), e.g. (a) GM ads on Y! autos shown to “males above 55”; (b) Mortgage ad shown to “everybody on Y! Front page”. 2. Book a slot well in advance –“2M impressions in Jan next year” –These future impressions must be guaranteed by the ad network

34 Research © 2008 Yahoo! Display Advertising Fundamental Problem: Guarantee impressions to advertisers. [Venn diagram of overlapping demographics (Young, US, Female, Y! Mail) with region counts 3, 2, 4, 2, 2, 1, 1] 1. Predict Supply: How many impressions will be available? Demographics overlap. 2. Predict Demand: How much will advertisers want each demographic?

35 Research © 2008 Yahoo! Display Advertising Fundamental Problem: Guarantee impressions to advertisers. [Venn diagram of overlapping demographics (Young, US, Female, Y! Mail) with region counts 3, 2, 4, 2, 2, 1, 1] 1. Predict Supply 2. Predict Demand 3. Find the optimal allocation subject to supply and demand constraints

36 Research © 2008 Yahoo! Display Advertising Fundamental Problem: Guarantee impressions to advertisers 1.Predict Supply 2.Predict Demand 3.Find the optimal allocation, subject to constraints Optimal in terms of what objective function?

37 Research © 2008 Yahoo! Allocation through Optimization Optimal in terms of what objective function? –E.g. Maximize value of remaining inventory: cherry-picks valuable inventory, saves it for later –Fairness: “Spreads the wealth” subject to constraints. [Diagram: supply nodes s_i linked to demand nodes d_j, with x_ij on the link from supply i to demand j]

38 Research © 2008 Yahoo! Example [Venn diagram: Young, US, Female, Y! Mail, with region counts 3, 2, 4, 2, 2, 1, 1] Demand: US & Y (2). Supply Pools: (US, Y, nF) Supply = 2, Price = 1; (US, Y, F) Supply = 3, Price = 5. How should we distribute impressions from the supply pools to satisfy this demand?

39 Research © 2008 Yahoo! Example (Cherry-picking) Cherry-picking: Fulfill demands at least cost. Demand: US & Y (2). Supply Pools: (US, Y, nF) Supply = 2, Price = 1; (US, Y, F) Supply = 3, Price = 5. How should we distribute impressions from the supply pools to satisfy this demand? [Diagram annotation: (2)]

40 Research © 2008 Yahoo! Example (Fairness) Cherry-picking: Fulfill demands at least cost. Fairness: Equitable distribution of available supply pools. Demand: US & Y (2). Supply Pools: (US, Y, nF) Supply = 2, Cost = 1; (US, Y, F) Supply = 3, Cost = 5. How should we distribute impressions from the supply pools to satisfy this demand? [Diagram annotation: (1)]
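
The two objectives can be contrasted on the numbers from this example. The sketch below is illustrative only: cherry-picking is implemented as a greedy least-cost fill, and "fairness" is implemented as an allocation proportional to each pool's supply. The slides do not spell out their exact fairness rule, so the proportional rule is an assumption and may not reproduce the annotation on the slide.

```python
def cherry_pick(demand, pools):
    """Greedy least-cost fill: drain the cheapest pools first.

    `pools` is a list of (name, supply, price) tuples.
    """
    allocation = {}
    remaining = demand
    for name, supply, _price in sorted(pools, key=lambda p: p[2]):
        take = min(supply, remaining)
        allocation[name] = take
        remaining -= take
    return allocation


def proportional(demand, pools):
    """Assumed fairness rule: draw from each pool in proportion to its supply.

    Assumes demand does not exceed total supply.
    """
    total = sum(supply for _name, supply, _price in pools)
    return {name: demand * supply / total for name, supply, _price in pools}


pools = [("US,Y,nF", 2, 1), ("US,Y,F", 3, 5)]
print(cherry_pick(2, pools))   # all demand met from the cheap pool
print(proportional(2, pools))  # demand spread across both pools
```

Cherry-picking leaves the expensive (US, Y, F) pool untouched for later sale, while the proportional rule uses some of it now.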

41 Research © 2008 Yahoo! Objective functions

42 Research © 2008 Yahoo! Display Advertising Fundamental Problem: Guarantee impressions to advertisers 1.Predict Supply 2.Predict Demand 3.Find the optimal allocation, subject to constraints –Pick the right objective function Further issues: –Risk Management: Supply and demand forecasts should have both mean and variance –Forecast aggregation: Forecasts may be needed over multiple resolutions, in time and in demographics

43 Research © 2008 Yahoo! Display Advertising Fundamental Problem: Guarantee impressions to advertisers 1. Predict Supply 2. Predict Demand 3. Find the optimal allocation, subject to constraints –Pick the right objective function Forecasting accuracy is critical! –Overshoot → under-delivery of impressions → unhappy advertisers –Undershoot → loss in revenue

44 Research © 2008 Yahoo! Outline Background on online advertising The Fundamental Problem –Display advertising –Sponsored Search and Content Match Statistical sub-problems: –Description –Existing methods –Challenges

45 Research © 2008 Yahoo! Sponsored Search and Content Match Given a query: –Select the top-k ads to be shown on the k slots to maximize total expected revenue What is total expected revenue?

46 Research © 2008 Yahoo! Example (Content Match) Ad Position 1 Ad Position 2 Ad Position 3

47 Research © 2008 Yahoo! Example (Content Match)

48 Research © 2008 Yahoo! Reminder: Auction Mechanism Revenue depends on type of auction –Generalized First-price: CPC = bid on clicked ad –Generalized Second-price: CPC = bid of ad below clicked ad (or the reserve price) CPC could be modified by additional factors Total expected revenue = revenue obtained in a given time window [Optimal Auction Design in a Multi-Unit Environment: The Case of Sponsored Search Auctions] by Edelman+/2006 [Internet Advertising and the Generalized Second Price Auction…] by Edelman+/2006

49 Research © 2008 Yahoo! Sponsored Search and Content Match Given a query: –Select the top-k ads to be shown on the k slots to maximize total expected revenue What affects the total revenue? –Relevance of the ad to the query –Bids on the ads –User experience on the ad landing page (ad “quality”) –Expected total revenue is some function of these.

50 Research © 2008 Yahoo! Sponsored Search and Content Match Given a query: –Select the top-k ads to be shown on the k slots to maximize total expected revenue Fundamental Problem: –Estimate relevance of the ad to the query

51 Research © 2008 Yahoo! Ad Relevance Computation

52 Research © 2008 Yahoo! Overview Information Retrieval (IR) –Techniques –Challenges Machine Learning using Click Feedback Online Learning

53 Research © 2008 Yahoo! IR-based ad matching “Why not use a search engine to match ads to context?” –Ads are the “documents” –Context (user query or webpage content) is the “query” Three broad approaches: –Vector space models –Probabilistic models –Language models Open-source software is available: –Lemur (www.lemurproject.org)

54 Research © 2008 Yahoo! IR-based ad matching Vector space models: –Each word/phrase in the vocabulary is a separate dimension –Each ad and query is a point in this vector space –Example: cosine similarity Probabilistic models Language models

55 Research © 2008 Yahoo! IR-based ad matching Q1: How can we score the goodness of an ad for a context? Cosine similarity: cos(a, q) = (a · q) / (||a|| ||q||), where a is the ad vector and q is the query vector. Advantages: –Simple and easy to interpret –Normalizes for different ad and context lengths
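
A minimal sketch of cosine scoring over raw term counts follows. Whitespace tokenization and unweighted counts are simplifying assumptions; a real system would typically use weighted terms and phrases.

```python
import math
from collections import Counter


def cosine_similarity(ad_text, query_text):
    """Cosine of the angle between the term-count vectors of an ad and a query."""
    ad = Counter(ad_text.lower().split())
    query = Counter(query_text.lower().split())
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * query[word] for word, count in ad.items())
    norm_ad = math.sqrt(sum(c * c for c in ad.values()))
    norm_query = math.sqrt(sum(c * c for c in query.values()))
    if norm_ad == 0 or norm_query == 0:
        return 0.0
    return dot / (norm_ad * norm_query)


print(cosine_similarity("cheap digital camera", "digital camera"))
print(cosine_similarity("running shoes sale", "digital camera"))
```

Dividing by the two norms is what makes the score insensitive to the lengths of the ad and the context.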

56 Research © 2008 Yahoo! IR-based ad matching Vector space models Probabilistic models: –Predict, for every (ad, query) pair, the probability that the ad is relevant to the query –Example: Okapi BM25 Language models

57 Research © 2008 Yahoo! IR-based ad matching Q1: How can we score the goodness of an ad for a context? Okapi BM25: [formula not preserved in the transcript; its labeled components are: term frequency in ad, term frequency in query, inverse document frequency, normalized document length, and tunable parameters]

58 Research © 2008 Yahoo! IR-based ad matching Q1: How can we score the goodness of an ad for a context? Okapi BM25: [formula not preserved; labeled components: term frequency in ad, term frequency in query, normalized document length] Advantages: –Different terms are weighted differently –Tunable parameters –Good performance
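
Since the BM25 formula itself did not survive in the transcript, the sketch below uses one widely used textbook variant with the same ingredients named on the slide: term frequency in the ad, term frequency in the query, inverse document frequency, length normalization, and tunable parameters. The exact variant shown in the talk may differ; the parameter defaults `k1`, `b`, `k3` and the "+1" form of the IDF are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, ads, k1=1.2, b=0.75, k3=8.0):
    """Score every ad against the query with a common BM25 variant."""
    docs = [Counter(ad.lower().split()) for ad in ads]
    lengths = [sum(doc.values()) for doc in docs]
    avg_len = sum(lengths) / len(docs)
    n_docs = len(docs)
    query_tf = Counter(query.lower().split())
    scores = []
    for doc, length in zip(docs, lengths):
        score = 0.0
        for term, qtf in query_tf.items():
            tf = doc[term]
            if tf == 0:
                continue
            df = sum(1 for d in docs if term in d)  # number of ads containing the term
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Saturating term frequency in the ad, normalized by ad length.
            doc_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
            # Saturating term frequency in the query.
            query_part = qtf * (k3 + 1) / (qtf + k3)
            score += idf * doc_part * query_part
        scores.append(score)
    return scores


ads = ["cheap digital camera", "running shoes sale", "camera lens"]
scores = bm25_scores("digital camera", ads)
print(scores)
```

Rare terms get a larger IDF and therefore more weight, which is the "different terms are weighted differently" advantage listed on the slide.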

59 Research © 2008 Yahoo! IR-based ad matching Vector space models Probabilistic models Language models: –Ads and queries are generated by statistical models of how words are used in the language –What statistical models can be used? –How do we translate query and ad generation probabilities into relevance?

60 Research © 2008 Yahoo! IR-based ad matching What statistical models can be used? –Bigram model –Multinomial model. Given any ad or query, we can compute the parameter setting most likely to have generated the document. For the multinomial model: term probability (model parameter) = term frequency / total length.

61 Research © 2008 Yahoo! IR-based ad matching How do we translate query and ad generation probabilities into relevance? Method 1: Compute the most likely query and ad params, then generate the ad using the query params. High probability → high relevance.

62 Research © 2008 Yahoo! IR-based ad matching How do we translate query and ad generation probabilities into relevance? Method 2: Compute the most likely query and ad params, then generate the query using the ad params. High probability → high relevance.

63 Research © 2008 Yahoo! IR-based ad matching How do we translate query and ad generation probabilities into relevance? Method 3: Compute the most likely query and ad params, then compute the KL-divergence between the params. Low KL-divergence → high relevance.
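
Methods 2 and 3 can be sketched with a multinomial (unigram) model. The slides use the most likely parameters (term frequency divided by length); the sketch adds a small additive smoothing constant `alpha` so that unseen words do not produce zero probabilities. That smoothing, the shared vocabulary, and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def multinomial_params(text, vocabulary, alpha=0.1):
    """Smoothed term probabilities: (count + alpha) / (length + alpha * |V|)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def log_likelihood(text, params):
    """Log-probability of generating `text` word by word from `params`."""
    return sum(math.log(params[w]) for w in text.lower().split())


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


query = "digital camera"
ads = ["cheap digital camera deals", "running shoes sale"]
vocabulary = sorted(set(" ".join([query] + ads).lower().split()))

query_params = multinomial_params(query, vocabulary)
for ad in ads:
    ad_params = multinomial_params(ad, vocabulary)
    print(ad,
          log_likelihood(query, ad_params),        # Method 2: higher is better
          kl_divergence(query_params, ad_params))  # Method 3: lower is better
```

Method 1 is the same computation as Method 2 with the roles of query and ad swapped.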

64 Research © 2008 Yahoo! IR-based ad matching New methods to combine syntactic and semantic information. For example, “A Semantic Approach to Contextual Advertising” by Broder+/SIGIR/2007 –Words only provide syntactic clues –Classify ads and queries into a common taxonomy –Taxonomy matches provide semantic clues

65 Research © 2008 Yahoo! Overview Information Retrieval (IR) –Techniques –Challenges Machine Learning using Click Feedback Online Learning

66 Research © 2008 Yahoo! Challenges of IR-based ad matching Word matches might not always work

67 Research © 2008 Yahoo! Woes of word matching Extract Topical info Increases coverage, more relevant match

68 Research © 2008 Yahoo! Challenges of IR-based ad matching Word matches might not always work Works well for frequent words, what about rare words? Long tail, big revenue impact. –Remedy: Add more matching dimensions (phrase,…) Static, does not capture effect of external factors –E.g. high interest in basketball page due to an event; dies off after the event –Click feedback a powerful way of capturing such latent effects; difficult to do it through relevance only Relevance scores may not correspond to CTR; does not provide estimates of expected revenue

69 Research © 2008 Yahoo! Challenges of IR-based ad matching Heterogeneous corpus (query, ads). Single tfidf scores not applicable. In content match, queries long and noisy Partial feedback does not work –Not scalable Ads are small, relevance of landing page difficult to determine (video, image, text)

70 Research © 2008 Yahoo! Machine Learning using Click Feedback

71 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback –Advantages and Challenges of Click Feedback –Feature-based models Description Case Studies –Hierarchical Models –Matrix Factorization and Collaborative Filtering –Challenges and Open Problems Online Learning

72 Research © 2008 Yahoo! Learning from Click Feedback Learning relevance from partial human-labeled training data –Attractive but not scalable Users provide us direct feedback through ad clicks –Low cost and automated learning mechanism –Large amounts of feedback for big ad-networks Estimation problem: –Estimate CTR = Pr(click| query, ad, user)

73 Research © 2008 Yahoo! Learning from Clicks: Challenges Noisy labels –Clicks (unscrupulous users gaming the system) –Negatives (not clear; e.g. “I never click on ads”) Sparseness –(query, ad) matrix has billions of cells; long tail. Too few data points in a large number of cells; MLE has high variance. Goal is to learn the best cells, not all cells. Dynamic and seasonal effects –CTRs evolve; subject to seasonal effects (Summer, Halloween, ...). Palin ads popular yesterday, not today.

74 Research © 2008 Yahoo! Challenges continued Selection bias –We never showed watch ads on golf pages Positional bias, presentation bias –Same ad performs differently at different positions Slate bias –Performance of ad depends on other ads that were displayed

75 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback –Advantages and Challenges of Click Feedback –Feature-based models Description Case Studies –Hierarchical Models –Matrix Factorization and Collaborative Filtering –Challenges and Open Problems Online Learning

76 Research © 2008 Yahoo! Feature based approach Query, Ad characterized by features –Query: bag-of-words, phrases, topic,… –Ads: bag-of-words, keywords, size,… Query feature vector: q Ad feature vector: a Pr(Click|Q,A) = f(q,a;θ) Example: Logistic regression –log-odds(Pr(Click|Q,A)) = q ’ W a –W estimated from data
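
The bilinear logistic model on this slide, log-odds = q' W a, can be scored as follows. The feature vectors and the weight matrix here are made-up values for illustration; in practice W is estimated from click data, typically with regularization.

```python
import math


def click_probability(q, a, W):
    """Pr(click | Q, A) under the model log-odds = q' W a."""
    log_odds = sum(q[i] * W[i][j] * a[j]
                   for i in range(len(q))
                   for j in range(len(a)))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical 2-dimensional query and ad feature vectors.
q = [1.0, 0.0]
a = [0.0, 1.0]
W = [[0.0, 2.0],   # W[i][j] weights the interaction of query feature i
     [0.0, 0.0]]   # with ad feature j
print(click_probability(q, a, W))
```

Each entry of W is one parameter per (query feature, ad feature) pair, which is why the model becomes high dimensional so quickly.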

77 Research © 2008 Yahoo! Feature based models: Challenges Challenges –High dimensional, need to regularize (Priors) –De-bias for positional and slate effects –Negative events to be weighted appropriately Go through case studies reported in literature

78 Research © 2008 Yahoo! Predicting Clicks: Estimating the Click-through rates of new ads: Richardson et al, WWW 2007 Estimate CTR of new ads in Sponsored Search. Log-odds(CTR(ad)) = Σ_i w_i f_i(ad) Features used: –Bid term: CTRs of related ads (from other accounts), e.g. CTRs of all other ads with keyword “camera” –Appearance, attention, advertiser reputation, landing page quality, relevance of bid terms to ad, bag-of-words in ad. Does not capture interactions between (query, ad); main focus is to estimate CTR of new ads only. Negative events down-weighted based on eye-tracking study.

79 Research © 2008 Yahoo! Combining relevance with Click Feedback, Chakrabarti et al, WWW 08 Content Match application CTR estimation for arbitrary (page, ad) pairs Features : –Bag-of-words in query, ads; relevance scores from IR –Cross-product of words: Occurs in both page and ad Learn to predict click data using such features Prediction function amenable to WAND algorithm –Helps with fast retrieval at serve time

80 Research © 2008 Yahoo! Proposed Method A logistic regression model for CTR: logit(CTR) = main effect for page (how good is the page) + main effect for ad (how good is the ad) + interaction effect (words shared by page and ad), with each term weighted by model parameters.

81 Research © 2008 Yahoo! Proposed Method M_{p,w} = tf_{p,w}; M_{a,w} = tf_{a,w}; I_{p,a,w} = tf_{p,w} * tf_{a,w}. So, IR-based term frequency measures are taken into account.
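
The structure of this model (page main effect, ad main effect, and a page-ad interaction built from term frequencies) can be sketched as below. The per-word weight names `alpha`, `beta` and `delta` are hypothetical labels for the "model parameters" on the slide, and the weights themselves are invented for the example.

```python
import math
from collections import Counter


def logit_ctr(page, ad, alpha, beta, delta):
    """Sum of page effect, ad effect and interaction effect over words.

    alpha[w], beta[w], delta[w] are hypothetical per-word weights;
    words without a weight contribute nothing.
    """
    tf_p = Counter(page.lower().split())
    tf_a = Counter(ad.lower().split())
    score = 0.0
    for w in set(tf_p) | set(tf_a):
        score += alpha.get(w, 0.0) * tf_p[w]          # M_{p,w} = tf_{p,w}
        score += beta.get(w, 0.0) * tf_a[w]           # M_{a,w} = tf_{a,w}
        score += delta.get(w, 0.0) * tf_p[w] * tf_a[w]  # I_{p,a,w}
    return score


def ctr(page, ad, alpha, beta, delta):
    """Map the logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-logit_ctr(page, ad, alpha, beta, delta)))


print(logit_ctr("golf clubs golf", "golf shoes", {}, {}, {"golf": 0.5}))
```

The interaction term is nonzero only for words that occur in both the page and the ad.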

82 Research © 2008 Yahoo! Proposed Method Two sources of complexity –Adding in IR scores –Word selection for efficient learning

83 Research © 2008 Yahoo! Proposed Method How can IR scores fit into the model? –What is the relationship between logit(p_ij) and cosine score? –Quadratic relationship [Plot: logit(p_ij) against cosine score]

84 Research © 2008 Yahoo! Proposed Method How can IR scores fit into the model? This quadratic relationship can be used in two ways –Put in cosine and cosine^2 as features –Use it as a prior

85 Research © 2008 Yahoo! Proposed Method Word selection –Overall, nearly 110k words in corpus –Learning parameters for each word would be: Very expensive Require a huge amount of data Suffer from diminishing returns –So we want to select ~1k top words which will have the most impact

86 Research © 2008 Yahoo! Proposed Method Word selection –Data based: Define an interaction measure for each word Higher values for words which have higher-than-expected CTR when they occur on both page and ad

87 Research © 2008 Yahoo! Experiments [Precision-recall plot] 25% lift in precision at 10% recall

88 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback –Advantages and Challenges of Click Feedback –Feature-based models Description Case Studies –Hierarchical Models –Matrix Factorization and Collaborative Filtering –Challenges and Open Problems Online Learning

89 Research © 2008 Yahoo! Regelson and Fain, 2006 Estimate CTR of terms by “borrowing strength” at multiple resolutions. Hierarchical clustering of related terms –Clustering advertiser keyword matrix. Estimating CTR at finer resolutions by using information at coarser resolutions –Weighted average, more weight to finer resolutions –Weights selected heuristically, no principled approach

90 Research © 2008 Yahoo! Estimation in the “tail” A more principled approach to “Estimating Rates of Rare Events at Multiple Resolutions” [KDD/2007] Contextual Advertising –Show an ad on a webpage (“impression”) –Revenue is generated if a user clicks –Problem: Estimate the click-through rate (CTR) of an ad on a page. Most (ad, page) pairs have very few impressions, if any, and even fewer clicks → Severe data sparsity

91 Research © 2008 Yahoo! Estimation in the “tail” Use an existing, well-understood hierarchy –Categorize ads and webpages to leaves of the hierarchy –CTR estimates of siblings are correlated → The hierarchy allows us to aggregate data. Coarser resolutions –provide reliable estimates for rare events –which then influence estimation at finer resolutions

92 Research © 2008 Yahoo! System overview [Pipeline diagram: Retrospective data [URL, ad, isClicked]; Crawl a sample of URLs; Classify pages and ads; Impute impressions, fix sampling bias; Rare event estimation using hierarchy]

93 Research © 2008 Yahoo! Sampling of webpages Naïve strategy: sample at random from the set of URLs → Sampling errors in impression volume AND click volume. Instead, we propose: –Crawling all URLs with at least one click, and –a sample of the remaining URLs → Variability is only in impression volume

94 Research © 2008 Yahoo! Imputation of impression volume [Table: rows are page classes, columns are ad classes] In each cell, #impressions = n_ij + m_ij + x_ij, where n_ij comes from the clicked pool, m_ij from the sampled non-clicked pool, and x_ij is the excess impressions (to be imputed). Each column sums to #impressions on ads of this ad class [column constraint]; each row sums to ∑n_ij + K·∑m_ij [row constraint]; the whole table sums to the total impressions (known).

95 Research © 2008 Yahoo! Imputation of impression volume Region = (page node, ad node). Region Hierarchy → A cross-product of the page hierarchy and the ad hierarchy. [Diagram: page hierarchy and ad hierarchy from Level 0 to Level i; a region is a cell in the grid of page classes by ad classes]

96 Research © 2008 Yahoo! Imputation of impression volume [Diagram: a block of regions at Level i+1 sums to the corresponding region at Level i: block constraint]

97 Research © 2008 Yahoo! Imputing x_ij Iterative Proportional Fitting [Darroch+/1972]: Initialize x_ij = n_ij + m_ij. Iteratively scale x_ij values to match row/col/block constraints. Ordering of constraints: top-down, then bottom-up, and repeat. [Diagram: a block of the page classes by ad classes grid at Level i and Level i+1]
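
The core scaling step of Iterative Proportional Fitting can be sketched on a single table with row and column constraints. This simplified version omits the block constraints and the top-down/bottom-up ordering across hierarchy levels described on the slide, and the table values and targets are hypothetical.

```python
def iterative_proportional_fitting(table, row_targets, col_targets, iterations=100):
    """Rescale a non-negative table until its row and column sums match the targets.

    Assumes sum(row_targets) == sum(col_targets).
    """
    x = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Scale each row to its target sum.
        for i, target in enumerate(row_targets):
            row_sum = sum(x[i])
            if row_sum > 0:
                x[i] = [v * target / row_sum for v in x[i]]
        # Scale each column to its target sum.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in x)
            if col_sum > 0:
                for row in x:
                    row[j] *= target / col_sum
    return x


initial = [[1.0, 1.0],
           [1.0, 1.0]]
fitted = iterative_proportional_fitting(initial, row_targets=[30, 70], col_targets=[40, 60])
print(fitted)
```

Each pass preserves the relative structure of the initial table while pulling the margins toward the known totals.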

98 Research © 2008 Yahoo! Imputation: Summary Given –n_ij (impressions in clicked pool) –m_ij (impressions in sampled non-clicked pool) –# impressions on ads of each ad class in the ad hierarchy We get –Estimated impression volume Ñ_ij = n_ij + m_ij + x_ij in each region ij of every level

99 Research © 2008 Yahoo! System overview [Pipeline diagram: Retrospective data [page, ad, isclicked]; Crawl a sample of pages; Classify pages and ads; Impute impressions, fix sampling bias; Rare event estimation using hierarchy]

100 Research © 2008 Yahoo! Rare rate modeling 1. Freeman-Tukey transform: –y_ij = F-T(clicks and impressions at ij) ≈ transformed-CTR –Variance stabilizing transformation: Var(y) is independent of E[y] → needed in further modeling
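
The slide does not give the transform itself. The sketch below uses one common square-root form of the Freeman-Tukey transform for counts; whether this is the exact form used in the talk is an assumption.

```python
import math


def freeman_tukey(clicks, impressions):
    """Square-root style Freeman-Tukey transform of a click rate (assumed form).

    Averages sqrt(c / N) and sqrt((c + 1) / N), so zero-click regions
    still get a finite, impression-dependent value.
    """
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))


print(freeman_tukey(0, 100))  # region with no clicks
print(freeman_tukey(4, 100))  # region with a 4% raw CTR
```

For count data, square-root transforms of this kind have variance that depends mainly on the number of impressions rather than on the underlying rate, which is the variance-stabilizing property the slide refers to.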

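101 Research © 2008 Yahoo! Rare rate modeling 2. Generative Model (Tree-structured Markov Model) [Diagram: for region ij, the observation y_ij has covariates β_ij and variance V_ij; the unobserved “state” S_ij has variance W_ij and is linked to the parent state S_parent(ij); the parent region has y_parent(ij), β_parent(ij), V_parent(ij), W_parent(ij)]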

102 Research © 2008 Yahoo! Rare rate modeling Model fitting with a 2-pass Kalman filter: –Filtering: Leaf to root –Smoothing: Root to leaf Linear in the number of regions

103 Research © 2008 Yahoo! Tree-structured Markov model

104 Research © 2008 Yahoo! Scalable Model fitting Multi-resolution Kalman filter

105 Research © 2008 Yahoo! Multi-Resolution Kalman filter: Mathematical overview

106 Research © 2008 Yahoo! Experiments 503M impressions 7-level hierarchy of which the top 3 levels were used Zero clicks in –76% regions in level 2 –95% regions in level 3 Full dataset DFULL, and a 2/3 sample DSAMPLE

107 Research © 2008 Yahoo! Experiments Estimate CTRs for all regions R in level 3 with zero clicks in DSAMPLE. Some of these regions, R_{>0}, get clicks in DFULL. A good model should predict higher CTRs for R_{>0} than for the other regions in R.

108 Research © 2008 Yahoo! Experiments We compared 4 models –TS: our tree-structured model –LM (level-mean): each level smoothed independently –NS (no smoothing): CTR proportional to 1/Ñ –Random: Assuming |R_{>0}| is given, randomly predict the membership of R_{>0} out of R

109 Research © 2008 Yahoo! Experiments [Plot comparing the models; curves labeled TS, Random, and LM, NS]

110 Research © 2008 Yahoo! Experiments Enough impressions → little “borrowing” from siblings. Few impressions → Estimates depend more on siblings.

111 Research © 2008 Yahoo! Related Work Multi-resolution modeling –studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94] Imputation –studied in statistics [Darroch+/1972] Application of such models to estimation of such rare events (rates of ~10^-3) is novel

112 Research © 2008 Yahoo! Summary A method to estimate –rates of extremely rare events –at multiple resolutions –under severe sparsity constraints The method has two parts –Imputation → incorporates hierarchy, fixes sampling bias –Tree-structured generative model → extremely fast parameter fitting

113 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback –Advantages and Challenges of Click Feedback –Feature-based models Description Case Studies –Hierarchical Models –Matrix Factorization and Collaborative Filtering –Challenges and Open Problems Online Learning

114 Research © 2008 Yahoo! Collaborative Filtering –Similarity based methods: [formula not preserved in the transcript; its labeled components are: the rating (CTR) for query u of ad i, the ad-ad similarity matrix, and the local neighborhood of ad i]

115 Research © 2008 Yahoo! Collaborative Filtering –Similarity based methods –Possible adaptation: [formula not preserved; its two labeled parts are a feature-based model and a collaborative filtering model] –Challenges: Learning similarity; simultaneously incorporating query and ad similarities
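
A standard item-based neighborhood predictor matches the ingredients named on these slides: the CTR of ad i for query u is a similarity-weighted average of the CTRs of the most similar ads already observed for that query. The formula from the talk is not preserved, so this particular weighted-average form, the neighborhood size `k`, and all data values are assumptions.

```python
def predict_ctr(query_ctrs, similarity, ad, k=2):
    """Similarity-weighted average CTR over the k most similar observed ads.

    query_ctrs: {ad_id: observed CTR for this query}
    similarity: {(ad_id, other_ad_id): similarity score}
    Returns None if no similar ad has been observed.
    """
    candidates = [other for other in query_ctrs
                  if other != ad and similarity.get((ad, other), 0.0) > 0]
    neighbors = sorted(candidates,
                       key=lambda other: similarity[(ad, other)],
                       reverse=True)[:k]
    weight = sum(similarity[(ad, other)] for other in neighbors)
    if weight == 0:
        return None
    return sum(similarity[(ad, other)] * query_ctrs[other]
               for other in neighbors) / weight


# Hypothetical data: ad "a" has never been shown for this query.
query_ctrs = {"b": 0.02, "c": 0.04, "d": 0.10}
similarity = {("a", "b"): 0.9, ("a", "c"): 0.9, ("a", "d"): 0.1}
print(predict_ctr(query_ctrs, similarity, "a"))
```

The challenge noted on the slide remains: the similarity matrix itself has to be learned, and this sketch uses only ad-ad similarity, not query-query similarity.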

116 Research © 2008 Yahoo! Matrix Factorization –Each query (ad) is a linear combination of latent factors –Solve for factors, under some regularization and constraints Factor coefficients for query Factor coefficients for ad
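
A minimal sketch of matrix factorization fitted by stochastic gradient descent with L2 regularization is shown below. The choice of squared-error loss, SGD, and all hyperparameters are assumptions for illustration, and the observed values are hypothetical responses rather than real CTRs.

```python
import random


def predict(Q, A, u, i):
    """Predicted response: inner product of query and ad factor vectors."""
    return sum(qf * af for qf, af in zip(Q[u], A[i]))


def factorize(observed, n_queries, n_ads, k=2, lr=0.05, reg=0.01,
              epochs=2000, seed=0):
    """Fit factor coefficients for queries (Q) and ads (A) on observed cells only."""
    rng = random.Random(seed)
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_queries)]
    A = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_ads)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (u, i), value in cells:
            err = value - predict(Q, A, u, i)
            for f in range(k):
                q_old = Q[u][f]
                Q[u][f] += lr * (err * A[i][f] - reg * q_old)
                A[i][f] += lr * (err * q_old - reg * A[i][f])
    return Q, A


def squared_error(observed, Q, A):
    return sum((value - predict(Q, A, u, i)) ** 2
               for (u, i), value in observed.items())


# Hypothetical sparse (query, ad) observations.
observed = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.8,
            (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.7}
Q, A = factorize(observed, n_queries=3, n_ads=3)
print(squared_error(observed, Q, A))
print(predict(Q, A, 1, 1))  # prediction for an unobserved cell
```

Unobserved cells get predictions because every query and every ad shares the same small set of latent factors.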

117 Research © 2008 Yahoo! Matrix Factorization Bi-clustering –Predictive Discrete latent factor models, Agarwal and Merugu, KDD 07.

118 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback –Advantages and Challenges of Click Feedback –Feature-based models Description Case Studies –Hierarchical Models –Matrix Factorization and Collaborative Filtering –Challenges and Open Problems Online Learning

119 Research © 2008 Yahoo! Challenges of Feature-based models Learns from clicks but still misses context in many instances as in relevance based approach Introducing features that are too granular makes it hard to learn CTR reliably Does not capture the dynamics of the system Training cost is high Slow prediction functions inadmissible due to latency constraints

120 Research © 2008 Yahoo! Challenges of Feature-based models Other methods –Boosting, Neural nets, Decision Trees, Random Forests, …… Local models –Mixture of experts: Fit local, think global Hierarchical modeling with multiple trees –User interest, query, ad,.. –Each tree is different –How to perform smoothing with multiple disparate trees?

121 Research © 2008 Yahoo! Challenges of Feature-based models Combining cold start with warm start is the main challenge in collaborative-filtering-based methods. We believe that solving the basic issues is more challenging: –Positional bias –Selection bias –Correlation in ads on a slate –Dynamic CTR; seasonal variations

122 Research © 2008 Yahoo! Online learning

123 Research © 2008 Yahoo! Overview Information Retrieval (IR) Machine Learning using Click Feedback Online Learning

124 Research © 2008 Yahoo! Online learning for ad matching All previous approaches learn from historical data This has several drawbacks: –Slow response to emerging patterns in the data due to special events like elections, … –Initial systemic biases are never corrected If the system has never shown “sound system dock” ads for the “iPod” query, it can never learn if this match is good –System needs to be retrained periodically

125 Research © 2008 Yahoo! Online learning for ad matching Solution: Combining exploitation with exploration –Exploitation: Pick ads that are good according to current model –Exploration: Pick ads that increase our knowledge about the entire space of ads Multi-armed bandits –Background –Applications to online advertising –Challenges and Open Problems

126 Research © 2008 Yahoo! Background: Bandits Bandit “arms” with unknown payoff probabilities p_1, p_2, p_3. “Pulling” arm i yields a reward: reward = 1 with probability p_i (success); reward = 0 otherwise (failure).

127 Research © 2008 Yahoo! Background: Bandits Goal: Pull arms sequentially so as to maximize the total expected reward –Estimate payoff probabilities p_i –Bias the estimation process towards better arms. [Diagram: bandit “arms” with unknown payoff probabilities p_1, p_2, p_3]

128 Research © 2008 Yahoo! Background: Bandits An algorithm to sequentially pick the arms is called a bandit policy Regret of a policy = how much extra payoff could be gained in expectation if the best arm is always pulled –Of course, the best arm is not known to the policy –Hence, the regret is the price of exploration –Low regret implies that the policy quickly converges to the best arm What is the optimal policy?

129 Research © 2008 Yahoo! Background: Bandits Which arm should be pulled next? –Not necessarily what looks best right now, since it might have had a few lucky successes –Seems to depend on some complicated function of the successes and failures of all arms: argmax g(s_1, f_1, s_2, f_2, …, s_k, f_k)? (s_i = number of successes, f_i = number of failures of arm i)

130 Research © 2008 Yahoo! Background: Bandits What is the optimal policy? Consider a bandit which –has an infinite time horizon, but –future rewards are geometrically discounted: R_total = R(1) + γ·R(2) + γ^2·R(3) + … (0 < γ < 1). Theorem [Gittins/1979]: The optimal policy decouples and solves a bandit problem for each arm independently: instead of argmax g(s_1, f_1, s_2, f_2, …, s_k, f_k), use argmax {g_1(s_1, f_1), g_2(s_2, f_2), …, g_k(s_k, f_k)}.

131 Research © 2008 Yahoo! Background: Bandits What is the optimal policy? Theorem [Gittins/1979]: The optimal policy decouples and solves a bandit problem for each arm independently –Significantly reduces the dimension of the problem space –Gives a minimum regret bound of O(log T) –But, the optimal functions g_i(s_i, f_i) are hard to compute –Need approximate methods…

132 Research © 2008 Yahoo! Background: Bandits Bandit Policy 1.Assign priority to each arm 2.“Pull” arm with max priority, and observe reward 3.Update priorities Priority 1 Priority 2 Priority 3 Allocation Estimation

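133 Research © 2008 Yahoo! Background: Bandits One common policy is UCB1 [Auer/2002]: $\text{Priority}_i = \frac{s_i}{s_i + f_i} + \sqrt{\frac{2 \ln T}{s_i + f_i}}$, where $s_i$ = number of successes of arm $i$, $f_i$ = number of failures of arm $i$, $T$ = total number of observations, and $s_i + f_i$ = number of observations of arm $i$. The first term is the observed payoff; the second term is a factor representing uncertainty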
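A minimal sketch of UCB1 on simulated Bernoulli arms follows. The priority is the standard UCB1 index from Auer et al. (2002), observed payoff plus sqrt(2 ln T / n_i); the simulation harness, the function names, and the choice to treat unobserved arms as having infinite priority are assumptions made for illustration.

```python
import math
import random


def ucb1_priority(successes, failures, total):
    """UCB1 index: observed payoff plus an uncertainty term.

    total is the number of rounds so far (including the current one).
    Arms with no observations get infinite priority so each is tried once.
    """
    n = successes + failures
    if n == 0:
        return math.inf
    return successes / n + math.sqrt(2.0 * math.log(total) / n)


def run_ucb1(probs, horizon, seed=0):
    """Run UCB1 for `horizon` pulls; return per-arm success and failure counts."""
    rng = random.Random(seed)
    k = len(probs)
    s = [0] * k
    f = [0] * k
    for t in range(1, horizon + 1):
        # Allocation: pull the arm with the highest priority.
        arm = max(range(k), key=lambda i: ucb1_priority(s[i], f[i], t))
        # Estimation: observe the reward and update that arm's counts.
        if rng.random() < probs[arm]:
            s[arm] += 1
        else:
            f[arm] += 1
    return s, f


if __name__ == "__main__":
    s, f = run_ucb1([0.1, 0.5, 0.9], horizon=2000)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Because the uncertainty term shrinks only as an arm accumulates observations, sub-optimal arms keep being revisited occasionally, but at a rate that decays over time.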

134 Research © 2008 Yahoo! Background: Bandits As total observations T becomes large: –Observed payoff tends asymptotically towards the true payoff probability –The system never completely “converges” to one best arm; only the rate of exploration tends to zero (UCB1 priority = observed payoff + factor representing uncertainty)

135 Research © 2008 Yahoo! Background: Bandits Sub-optimal arms are pulled O(log T) times. Hence, UCB1 has O(log T) regret. This is the lowest possible order of regret (UCB1 priority = observed payoff + factor representing uncertainty)

136 Research © 2008 Yahoo! Online learning for ad matching Solution: Combining exploitation with exploration –Exploitation: Pick ads that are good according to current model –Exploration: Pick ads that increase our knowledge about the entire space of ads Multi-armed bandits –Background –Applications to online advertising –Challenges and Open Problems

137 Research © 2008 Yahoo! Background: Bandits Each webpage (Webpage 1, Webpage 2, Webpage 3, …) is a bandit; bandit “arms” = ads. ~$10^6$ ads, ~$10^9$ pages

138 Research © 2008 Yahoo! Background: Bandits Content Match = a matrix (rows: webpages, columns: ads). Each row is a bandit. Each cell has an unknown CTR

139 Research © 2008 Yahoo! Background: Bandits Why not simply apply a bandit policy directly to our problem? Convergence is too slow: ~$10^9$ bandits, with ~$10^6$ arms per bandit. Additional structure is available that can help → Taxonomies

140 Research © 2008 Yahoo! Taxonomies for dimensionality reduction [Figure: taxonomy tree with Root → Apparel, Computers, Travel, …; each page/ad maps to a node] Taxonomies: –Already exist –Actively maintained –Existing classifiers to map pages and ads to taxonomy nodes. A bandit policy that uses this structure can be faster

141 Research © 2008 Yahoo! Outline Multi-level Bandit Policy for Content Match Experiments Summary

142 Research © 2008 Yahoo! Multi-level Policy Consider only two levels [Figure: ads × webpages matrix, with ads and webpages grouped into classes]

143 Research © 2008 Yahoo! Multi-level Policy Consider only two levels [Figure: matrix over the classes Apparel, Computers, Travel, …, annotated with ad parent classes, ad child classes, a block, and one bandit]

144 Research © 2008 Yahoo! Multi-level Policy Key idea: CTRs in a block are homogeneous [Figure: matrix over the classes Apparel, Computers, Travel, …, annotated with ad parent classes, ad child classes, a block, and one bandit]

145 Research © 2008 Yahoo! Multi-level Policy CTRs in a block are homogeneous –Used in allocation (picking ad for each new page) –Used in estimation (updating priorities after each observation)

146 Research © 2008 Yahoo! Multi-level Policy CTRs in a block are homogeneous Used in allocation (picking ad for each new page) –Used in estimation (updating priorities after each observation)

147 Research © 2008 Yahoo! Multi-level Policy (Allocation) –Classify webpage → page class, parent page class –Run bandit on ad parent classes → pick one ad parent class [Figure: page classifier feeding a matrix over classes A, C, T]

148 Research © 2008 Yahoo! Multi-level Policy (Allocation) –Classify webpage → page class, parent page class –Run bandit on ad parent classes → pick one ad parent class –Run bandit among cells → pick one ad class –In general, continue from root to leaf → final ad [Figure: page classifier feeding a matrix over classes A, C, T, ending in the chosen ad]

149 Research © 2008 Yahoo! Multi-level Policy (Allocation) Bandits at higher levels –use aggregated information –have fewer bandit arms → Quickly figure out the best ad parent class [Figure: page classifier feeding a matrix over classes A, C, T, ending in the chosen ad]
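The two-level allocation described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses UCB1 as the bandit at both levels, keeps click and impression counts in plain dictionaries, and all names (`allocate`, `update`, `parent_stats`, `child_stats`, the example classes) are hypothetical.

```python
import math


def ucb1(clicks, impressions, total):
    """UCB1 priority; untried arms get infinite priority."""
    if impressions == 0:
        return math.inf
    # max(total, 2) guards against log(1) = 0 on the very first rounds.
    return clicks / impressions + math.sqrt(2.0 * math.log(max(total, 2)) / impressions)


def pick(stats, keys):
    """Return the key with the highest UCB1 priority; stats maps key -> [clicks, impressions]."""
    total = sum(stats.get(k, [0, 0])[1] for k in keys)
    return max(keys, key=lambda k: ucb1(*stats.get(k, [0, 0]), total))


def allocate(page_parent, page_class, taxonomy, parent_stats, child_stats):
    """Two-level allocation: pick an ad parent class, then an ad child class within it.

    taxonomy: ad parent class -> list of ad child classes
    parent_stats[(page_parent, ad_parent)]: aggregated counts at the upper level
    child_stats[(page_class, ad_child)]: counts for individual cells
    """
    _, ad_parent = pick(parent_stats, [(page_parent, p) for p in taxonomy])
    _, ad_child = pick(child_stats, [(page_class, c) for c in taxonomy[ad_parent]])
    return ad_parent, ad_child


def update(page_parent, page_class, ad_parent, ad_child, clicked, parent_stats, child_stats):
    """Record one impression (and possibly a click) at both levels."""
    for stats, key in ((parent_stats, (page_parent, ad_parent)),
                       (child_stats, (page_class, ad_child))):
        entry = stats.setdefault(key, [0, 0])
        entry[0] += int(clicked)
        entry[1] += 1
```

The upper-level bandit has only as many arms as there are ad parent classes and sees every impression in its block, which is why it can settle on a good parent class with far fewer pulls than a flat bandit over all ads.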

150 Research © 2008 Yahoo! Multi-level Policy CTRs in a block are homogeneous Used in allocation (picking ad for each new page) Used in estimation (updating priorities after each observation)

151 Research © 2008 Yahoo! Multi-level Policy (Estimation) CTRs in a block are homogeneous –Observations from one cell also give information about others in the block –How can we model this dependence?

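152 Research © 2008 Yahoo! Multi-level Policy (Estimation) Shrinkage Model: $S_{\text{cell}} \mid \text{CTR}_{\text{cell}} \sim \text{Bin}(N_{\text{cell}}, \text{CTR}_{\text{cell}})$, $\text{CTR}_{\text{cell}} \sim \text{Beta}(\text{Params}_{\text{block}})$, where $S_{\text{cell}}$ = # clicks in cell and $N_{\text{cell}}$ = # impressions in cell. All cells in a block come from the same distribution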

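153 Research © 2008 Yahoo! Multi-level Policy (Estimation) Intuitively, this leads to shrinkage of cell CTRs towards block CTRs: $E[\text{CTR}] = \alpha \cdot \text{Prior}_{\text{block}} + (1-\alpha) \cdot S_{\text{cell}}/N_{\text{cell}}$, where $E[\text{CTR}]$ is the estimated CTR, $\text{Prior}_{\text{block}}$ is the Beta prior (“block CTR”), and $S_{\text{cell}}/N_{\text{cell}}$ is the observed CTR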
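A short sketch of the shrinkage estimate, assuming the block prior is Beta(a, b). Under the Beta-Binomial model the posterior mean is (a + S) / (a + b + N), which equals the weighted form above with α = (a + b) / (a + b + N) and block CTR a / (a + b). The slides do not say how the block parameters are fitted, so a and b are simply taken as inputs here, and the function name is illustrative.

```python
def shrunk_ctr(clicks, impressions, a, b):
    """Posterior mean CTR of a cell under a Beta(a, b) block prior.

    Written in the shrinkage form alpha * prior + (1 - alpha) * observed,
    with alpha = (a + b) / (a + b + impressions).
    """
    prior_mean = a / (a + b)  # the "block CTR"
    if impressions == 0:
        return prior_mean  # no data yet: fall back entirely on the block
    alpha = (a + b) / (a + b + impressions)
    return alpha * prior_mean + (1.0 - alpha) * clicks / impressions


if __name__ == "__main__":
    # Few impressions: the estimate stays close to the block CTR of 0.1.
    print(shrunk_ctr(3, 10, a=2, b=18))
    # Many impressions: the estimate approaches the observed CTR of 0.3.
    print(shrunk_ctr(3000, 10000, a=2, b=18))
```

The weight α falls as a cell gathers impressions, so sparse cells borrow strength from their block while well-observed cells are driven by their own data.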

154 Research © 2008 Yahoo! Experiments Taxonomy structure: Root (depth 0), 20 nodes (depth 1), 221 nodes (depth 2), …, ~7000 leaves (depth 7). We use 2 of these levels

155 Research © 2008 Yahoo! Experiments Data collected over a 1 day period Collected from only one server, under some other ad-matching rules (not our bandit) ~229M impressions CTR values have been linearly transformed for purposes of confidentiality

156 Research © 2008 Yahoo! Experiments (Multi-level Policy) Multi-level gives much higher #clicks [Plot: clicks vs. number of pulls]

157 Research © 2008 Yahoo! Experiments (Multi-level Policy) Multi-level gives much better Mean-Squared Error → it has learnt more from its explorations [Plot: Mean-Squared Error vs. number of pulls]

158 Research © 2008 Yahoo! Experiments (Shrinkage) Shrinkage → improved Mean-Squared Error, but no gain in #clicks [Plots: Mean-Squared Error and clicks vs. number of pulls, without shrinkage and with shrinkage]

159 Research © 2008 Yahoo! Summary Taxonomies exist for many datasets. They can be used for –Dimensionality Reduction –Multi-level bandit policy → higher #clicks –Better estimation via shrinkage models → better MSE

160 Research © 2008 Yahoo! Online learning for ad matching Solution: Combining exploitation with exploration –Exploitation: Pick ads that are good according to current model –Exploration: Pick ads that increase our knowledge about the entire space of ads Multi-armed bandits –Background –Applications to online advertising –Challenges and Open Problems

161 Research © 2008 Yahoo! Challenges and Open Problems Bandit policies typically assume stationarity But, sudden changes are the norm in the online advertising world: –Ads may be suddenly removed when they run out of budget –New ads are constantly added to the system –The total number of ads is huge, and full exploration may be too costly –Mortal multi-armed bandits [NIPS/2008]

162 Research © 2008 Yahoo! Mortal Multi-armed Bandits Traditional bandit policies like UCB1 spend a large fraction of their initial pulls on exploration –Hard-earned knowledge may be lost due to finite arm lifetimes Method 1 (Sampling): –Pick a random sample from the set of available arms –Run UCB1 on sample, until some fraction of arms in the sample are lost –Pro: Quicker convergence, more exploitation –Con: Best arm in the sample may be worse than best arm overall –Pick sample size to control this tradeoff
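The sampling method above could be organised around two small helpers, sketched here under assumptions: the base policy (e.g. UCB1) runs only on the current sample, and a fresh sample is drawn once a chosen fraction of the sampled arms has expired. The helper names and the `max_lost_fraction` parameter are hypothetical; the slides do not specify how the sample size or the fraction should be set.

```python
import random


def needs_resample(sample, alive, max_lost_fraction):
    """True if there is no sample yet, or too many sampled arms have expired."""
    if not sample:
        return True
    lost = sum(1 for arm in sample if arm not in alive)
    return lost / len(sample) >= max_lost_fraction


def draw_sample(alive, sample_size, rng):
    """Pick a random subset of the currently available arms."""
    pool = sorted(alive)  # sort for reproducibility with a seeded rng
    return rng.sample(pool, min(sample_size, len(pool)))


if __name__ == "__main__":
    rng = random.Random(0)
    alive = set(range(100))
    sample = draw_sample(alive, 10, rng)
    # Simulate half of the sampled arms running out of budget.
    alive -= set(sample[:5])
    if needs_resample(sample, alive, max_lost_fraction=0.5):
        sample = draw_sample(alive, 10, rng)
    print(sample)
```

A smaller sample lets the base policy converge sooner, at the cost of a higher chance that the best available arm was never in the sample.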

163 Research © 2008 Yahoo! Mortal Multi-armed Bandits Traditional bandit policies like UCB1 spend a large fraction of their initial pulls on exploration –Hard-earned knowledge may be lost due to finite arm lifetimes Method 2 (Payoff threshold): –New bandit policy: If the observed payoff of any arm is higher than a threshold, pull it till it expires –Pro: Good arms, once found, are exploited quickly –Con: While exploiting good arms, the best arm may be starving and may expire without being found –Pick threshold to control this tradeoff
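A minimal sketch of the payoff-threshold policy follows. The commit rule (stay with an arm whose observed payoff exceeded the threshold until it expires) follows the slide; the exploration rule (pull the least-observed live arm) and the class and method names are assumptions, since the slides leave exploration unspecified.

```python
class ThresholdPolicy:
    """Commit to any arm whose observed payoff exceeds a threshold until it expires."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.stats = {}        # arm -> (successes, pulls)
        self.committed = None  # arm currently being exploited, if any

    def choose(self, alive):
        """Pick an arm from the non-empty set of currently live arms."""
        if self.committed in alive:
            return self.committed  # keep exploiting until the arm expires
        self.committed = None
        for arm in sorted(alive):
            s, n = self.stats.get(arm, (0, 0))
            if n > 0 and s / n > self.threshold:
                self.committed = arm
                return arm
        # Assumed exploration rule: try the least-observed live arm.
        return min(sorted(alive), key=lambda a: self.stats.get(a, (0, 0))[1])

    def update(self, arm, reward):
        """Record the observed reward (0 or 1) for the pulled arm."""
        s, n = self.stats.get(arm, (0, 0))
        self.stats[arm] = (s + reward, n + 1)
```

In this sketch a single lucky success is enough to trigger a commit; requiring a minimum number of pulls before comparing against the threshold would be one way to trade off the risk the slide describes.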

164 Research © 2008 Yahoo! Mortal Multi-armed Bandits Challenges: –Selecting the critical sample size or threshold correctly, for arbitrary payoff distributions –What if even the payoff distribution is unknown?

165 Research © 2008 Yahoo! Challenges and Open Problems Mortal multi-armed bandits What if the bandit policy has some information about the budget? –The bandit policy can control which arms expire, and when –“Handling Advertisements of Unknown Quality in Search Advertising” by Pandey+/NIPS/2006 Combining budgets with extra knowledge of ad CTRs –E.g., Using an ad taxonomy Using a bandit scheme to infer/correct an ad taxonomy

166 Research © 2008 Yahoo! Conclusions

167 Research © 2008 Yahoo! Conclusions We provided an introduction to Online Advertising –Discussed the eco-system and various actors involved –Discussed different flavors of online advertising Sponsored Search, Content Match, Display Advertising

168 Research © 2008 Yahoo! Conclusions Online Advertising –Revenue Models: CPM, CPC, CPA –Advertising Setting: Display, Content Match, Sponsored Search –Misc.: Ad exchanges

169 Research © 2008 Yahoo! Conclusions Outlined associated statistical challenges –Sponsored search, Content Match, Display We believe the following to be a technical roadmap Offline Modeling Online Models Time series Explore/Exploit Multi-armed bandits Regression, collaborative filtering, mixture of experts Multi-resolution models Selection bias Slate correlation Noisy labels

170 Research © 2008 Yahoo! Conclusions Offline Modeling –By far the best studied so far –No careful study yet of selection bias, slate correlations, noisy labels. Good opportunity here –More emphasis on matrix structure; goal is to estimate interactions Explore/Exploit –Some work using multi-armed bandits; long way to go Time series model to capture temporal aspects –Little work Holistic approach that combines all components in a principled way

