Online Advertising Open lecture at Warsaw University January 7/8, 2011 Ingmar Weber Yahoo! Research Barcelona Please interrupt me.

1 Online Advertising Open lecture at Warsaw University January 7/8, 2011 Ingmar Weber Yahoo! Research Barcelona Please interrupt me at any point!

2 Disclaimers & Acknowledgments This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. or any other entity. Algorithms, techniques, features, etc. mentioned here might or might not be in use by Yahoo! or any other company. Some of the slides in this lecture are based on slides for “Introduction to Computational Advertising”, given by A. Broder and V. Josifovski at Stanford University.

3 Goals of this Presentation Give an overview of the two main types of online advertising: (i) search advertising and (ii) display advertising Explain the key technical aspects behind them, with a focus on computational aspects This time: more breadth Next time: more depth (you tell me where!)

4 Types of Online Advertising Search Advertising Display Advertising Advertising Classifieds Sponsorships … Part 1 Part 2

5 Part 0 Setting the Scene

6 Different Advertising Objectives Brand Advertising You’re not expected to buy a Rolex watch tomorrow. Direct Marketing Tries to cause an (almost) immediate reaction. What’s different?

7 US Online Spending share by objective What’s bigger? Branding or direct response?

8 Lots of $$$ (or zloty) Poland’s state deficit in 2010: ~$11 billion Poland’s agriculture GDP: ~$32 billion

9 Part 1 Search Advertising

10 “impression”/“pageview” The Life of an Ad - Terminology “click” “click-through rate”: (# clicks)/(# impressions) “landing page” “conversion” or “action” “conversion rate”: (# conversions)/(# page visits) “target page” “tracking code”

11 Search Advertising Advertisements are sold in auctions –Advertisers bid on search terms [show live] Different payment models –CPC (cost per click): advertiser pays $X when an ad gets clicked (de-facto standard) –CPA (cost per action): advertiser pays $Y when a click on an ad leads to a (trans-)action/purchase (growing popularity) –CPM (cost per mille [page impressions]): advertiser pays $Z for 1,000 ad displays (used for display ads)

12 Advertisers compete for search terms “warsaw hotels”, “online advertising”, … A click has a different value for different advertisers depends on profit margin and on conversion rate There’s a ranked list of sponsored search results Assumption: higher ranking => more clicks (CTR) Advertisers bid for a (good) slot in the results $ 0.01 per click - $ per click Search engine decides the order/inclusion slots are assigned to (successful) bidders When a user clicks on a sponsored search result … … payment is made by the advertiser Bidding for search terms Search engines need to decide: * How should the slots be assigned? * How much should be paid per click? Advertisers need to decide: * How much to bid? 99% of web site visitors don’t purchase anything 1% buy a computer - conversion rate (from click to transaction) Profit per computer sold $100 Expected profit per visitor $1 (1% × $100) – value of a single visit/click Guess the most expensive search term? How would you do it?
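The value-per-click arithmetic from the computer-shop example can be written out as a small sketch. The function name `value_per_click` is hypothetical; the numbers (1% conversion rate, $100 profit per sale) are the ones on the slide.

```python
def value_per_click(conversion_rate: float, profit_per_conversion: float) -> float:
    """Expected profit from one click = P(conversion) * profit per conversion."""
    return conversion_rate * profit_per_conversion


# Numbers from the slide: 1% of visitors buy, $100 profit per computer sold.
break_even_bid = value_per_click(0.01, 100.0)
print(f"Value of a single click: ${break_even_bid:.2f}")
```

Under these assumptions an advertiser who pays more than this amount per click loses money on average, so it acts as an upper bound on a sensible per-click payment.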

13 How much do people typically pay?


15 How much does X cost? Try to guess some expensive key words –Clear (commercial) intent –Very high value for new customer Keyword tool –Small competition … The winner is … –Mesothelioma

16 Exercise Build six teams Think of terms to bid on (exact match) and corresponding ads. You can choose the target page! You’ll get 5 EUR per team to target the US&Canada search market Ads will go live around 18h00 today (Friday) and we’ll look at the results tomorrow (Saturday) around 16h00

17 Exercise All ads will run under my account All keywords have to be “distinct” (system doesn’t allow self-competition) Assigned in reversing round robin fashion (1,2,3,3,2,1,1,2,3,…) Max 5 key words and 1 ad per team The team with the largest number of clicks by 16h00 on Saturday wins Please, no cheating

18 Pricing of Ads How was it done? What was wrong with that? How is it done now? Does that solve all problems?

19 Historic Overture mechanism Slot assignment by bid order Assign the slots in the order of the bid values higher bid => higher slot When a user clicks, you pay your bid value You bid $1.00 per click? - You pay $1.00 per click! Simple. - Intuitive. - Used for many years. What’s wrong with this?

20 End of story? – No, because … Difficult for advertisers to “play” this “game”: There’s no equilibrium! Scenario: Two available ad slots with CTR 5% and 4% respectively Three bidders with valuations $20, $18, $10 per click What happens? Bidder 2 bids $10.01 to beat Bidder 3 and to get a slot Bidder 1 will not pay more than $10.02 Then bidder 2 bids $10.03 Then bidder 1 bids $10.04 … and the fun continues until $14 … when it all collapses back to $10.01 Difficult to “play” this game optimally. Potential feeling of “being cheated”.

21 End of story? – And no, because … Ads can have different motivations –Motivating an action/purchase/click –Simply placing/marketing a brand ebay could afford to bid for every term … … because no one will click the ad! “Buy * on ebay!” * = world peace, grandmother, happiness, … ebay cares more about page impressions Want to get rid of high-bidding free riders.

22 Addressing the first problem: Second price auction If only a single slot exists, do the following: Assign the slot to the highest bidder. Ex: Slot goes to Bidder 1 who bid $17. Let him pay the second highest bid. Ex: Bidder 1 pays $15, Bidder 2’s bid. Theorem (Vickrey ’61): Bidding truthfully is a dominant strategy in this setting. (cf. stamp auctions 1878+)
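A minimal sketch of the single-slot second price rule described above. The function name and the third bidder are assumptions added for illustration; the $17 and $15 bids are the ones from the slide.

```python
def second_price_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Single-slot auction: highest bidder wins and pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders to define a second price")
    # Sort bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid, not their own
    return winner, price


# Bidder 1 and Bidder 2 are from the slide; bidder3 is a made-up extra bidder.
winner, price = second_price_auction({"bidder1": 17.0, "bidder2": 15.0, "bidder3": 9.0})
print(winner, price)
```

Because the price does not depend on the winner's own bid, raising or lowering a bid can only change whether you win, which is the intuition behind truthful bidding being dominant here.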

23 Second Price Auction Explained This ad slot is worth €1 to me. I bid €0.80! I bid €0.90! I bid €1.50! I bid €0.70! Pays €0.70. But could have bid €1.00. Loses item. Should have bid €1.00. Loses item. But could have bid €1.00. He’s “lying”. Only works for a single slot … Bidding “truthfully” is always best. Regardless of what others do. Your title here Your cool ad text goes here.

24 Addressing the first problem: Generalized second price auction If many slots exist, do the following: Assign the slots in (decreasing) order of the bids. Let each one pay the next (lower) bid. Called: Generalized second price (GSP) auction Is bidding “truthfully” a dominant strategy? Are there any dominant strategies?

25 Addressing the first problem: Generalized second price auction Same scenario again: Two available ad slots with CTR 5% and 4% respectively Three bidders with valuations $20, $18, $10 per click What happens if everyone bids truthfully ($20, $18, $10 respectively)? Bidder 1: ($20-$18)*0.05 = $0.10 profit per page impression Bidder 2: ($18-$10)*0.04 = $0.32 profit per page impression Bidder 3: $0.00 profit per page impression If bidder 1 bids $11 instead … … his profit is ($20-$10)*0.04 = $0.40 per page impression Bidding “truthfully” is not a dominant strategy in GSP. In fact, no dominant strategy exists for GSP.
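The GSP example above can be checked numerically. This is a sketch under the slide's assumptions (bid-ordered slots, each winner pays the next lower bid per click); the function name `gsp_outcome` and its return layout are hypothetical.

```python
def gsp_outcome(bids, values, ctrs):
    """Generalized second price auction with slots assigned in bid order.

    Returns {bidder index: (slot index, price per click, profit per impression)}
    for every bidder that wins a slot.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    outcome = {}
    for slot, bidder in enumerate(order[: len(ctrs)]):
        # Each winner pays the bid of the bidder ranked directly below.
        next_bid = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        profit = (values[bidder] - next_bid) * ctrs[slot]
        outcome[bidder] = (slot, next_bid, profit)
    return outcome


values = [20.0, 18.0, 10.0]  # valuations per click from the slide
ctrs = [0.05, 0.04]          # CTRs of the two slots

truthful = gsp_outcome(values, values, ctrs)
shaded = gsp_outcome([11.0, 18.0, 10.0], values, ctrs)  # bidder 1 bids $11 instead

print("truthful:", truthful)
print("bidder 1 shades to $11:", shaded)
```

With truthful bids the first bidder (index 0) earns about $0.10 per impression in the top slot; after shading to $11 the same bidder lands in the second slot and earns about $0.40, which is why truthful bidding is not dominant in GSP.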

26 So, still saw-tooth under GSP? As long as you bid less than the higher bid, your payment doesn’t change … … but the guy above gets charged more. So: Bidder 2 increases bid to stay just slightly below bidder 1 No difference for his position/payment But payment of other bidder 1 goes up Bidder 1 can “retaliate” by underbidding bidder 2 Bidder 1 now pays less (for a worse slot) Bidder 2 now pays more (for a better slot) Bidder 1 and bidder 2 have swapped position and (kind of) bids. “locally envy-free” if these games don’t happen.

27 Locally envy-free equilibria “Internet Advertising and the GSP Auction: Selling Billions of Dollars Worth of Keywords”, Edelman et al., 2006 A (pure Nash) equilibrium is locally envy-free if for any rank i: \( \alpha_i s_{g(i)} - p^{(i)} \ge \alpha_{i-1} s_{g(i)} - p^{(i-1)} \) \( \alpha_i \) = CTR at rank i (think “volume”) \( p^{(i)} \) = cost for rank i \( s_{g(i)} \) = value per click of the advertiser g(i) placed at rank i small i = low rank = high CTR

28 Locally envy-free equilibria Lemma 1: A locally envy-free equilibrium of the GSP game corresponds to a stable assignment. Stable assignment: nobody wants to swap position and payment with anybody else Proof: No swap with positions below as we have an equilibrium: could just undercut advertiser to make this swap. Remains to show: no swap with positions (far) above.

29 Locally envy-free equilibria Proof (ctd): Claim: resulting order is “assortative”, i.e. in the order of the \( s_{g(i)} \): \( \alpha_i s_{g(i)} - p^{(i)} \ge \alpha_{i+1} s_{g(i)} - p^{(i+1)} \) (equilibrium) \( \alpha_{i+1} s_{g(i+1)} - p^{(i+1)} \ge \alpha_i s_{g(i+1)} - p^{(i)} \) (envy-free) Gives: \( (\alpha_i - \alpha_{i+1}) s_{g(i)} \ge (\alpha_i - \alpha_{i+1}) s_{g(i+1)} \)

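30 Locally envy-free equilibria Proof (ctd): Suppose i wants to go to m < i. For every rank j with m ≤ j < i, local envy-freeness at rank j+1 gives \( p^{(j)} - p^{(j+1)} \ge (\alpha_j - \alpha_{j+1}) s_{g(j+1)} \ge (\alpha_j - \alpha_{j+1}) s_{g(i)} \) (as \( s_{g(j+1)} \ge s_{g(i)} \) and \( \alpha_j \ge \alpha_{j+1} \)). Then add and cancel. Get: \( \alpha_i s_{g(i)} - p^{(i)} \ge \alpha_m s_{g(i)} - p^{(m)} \)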

31 Locally envy-free equilibria Lemma 2: When there are more advertisers than slots, then any stable assignment corresponds to a locally envy-free equilibrium of the GSP game. Could be an empty set …but Theorem: Bidding \( b_j = p^{V,(j-1)} / \alpha_{j-1} \) gives a locally envy-free equilibrium with VCG payments. Here \( p^{V,(j-1)} \) are VCG payments. Why is this of little practical relevance?

32 So, still saw-tooth under GSP? At least GSP has equilibria, though not in dominant strategies. GSP is “reasonably stable”. Payment depends on position, not on bid directly.

33 “Correct” generalization of SP: Vickrey-Clarke-Groves Mechanism Assume “no ebay”: CTR depends only on slot Assign the slots in bid order … (again) Advertiser X has to pay for loss in (bid * clicks) (Sum of \( b_i \cdot \mathrm{CTR}_i \) before X enters the game - sum of \( b_i \cdot \mathrm{CTR}_i \) of other players after X enters) / \( \mathrm{CTR}_X \) Example: …. next slide …

34 “Correct” generalization of SP: Vickrey-Clarke-Groves Mechanism Same scenario again: 3 advertisers: bids $20, $18, $10 (their valuations) Two slots: CTR 5%, CTR 4% [think: 5 clicks, 4 clicks] Slots go to bids $20 and $18 respectively. Corresponding payments? Advertiser 1: W/o adv. 1, sum over adv. 2 and 3 $18*0.05 + $10*0.04 = $1.30 W/ adv. 1, sum only over adv. 2 $18*0.04 = $0.72 Payment by advertiser 1: ($1.30-$0.72)/0.05 = $11.60 (per click) Advertiser 2: Without adv. 2, sum over adv. 1 and 3 $20*0.05 + $10*0.04 = $1.40 With adv. 2, sum only over adv. 1 $20*0.05 = $1.00 Payment by advertiser 2: ($1.40-$1.00)/0.04 = $10 (per click)
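The VCG payments in this example can be reproduced with a short sketch. It follows the rule stated on the slides (welfare of the other advertisers without X, minus their welfare with X, divided by X's CTR) and assumes CTR depends only on the slot; the helper names `assign` and `vcg_price_per_click` are hypothetical.

```python
def assign(bids, ctrs):
    """Bid-ordered slot assignment: list of (bidder index, slot CTR) pairs."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return list(zip(order, ctrs))  # zip stops once the slots run out


def vcg_price_per_click(bids, ctrs, x):
    """Per-click VCG payment of bidder x (0.0 if x does not win a slot)."""
    slots_with_x = assign(bids, ctrs)
    ctr_x = dict(slots_with_x).get(x)
    if ctr_x is None:
        return 0.0
    # Sum of bid * CTR over the other bidders if x were absent.
    rest = [b for i, b in enumerate(bids) if i != x]
    without_x = sum(rest[i] * ctr for i, ctr in assign(rest, ctrs))
    # Sum of bid * CTR over the other bidders with x present.
    with_x = sum(bids[i] * ctr for i, ctr in slots_with_x if i != x)
    return (without_x - with_x) / ctr_x


bids = [20.0, 18.0, 10.0]
ctrs = [0.05, 0.04]
payments = [vcg_price_per_click(bids, ctrs, x) for x in range(len(bids))]
print(payments)
```

For the scenario on the slide this yields roughly $11.60 and $10 per click for the two winners and nothing for the third advertiser.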

35 “Correct” generalization of SP: Vickrey-Clarke-Groves Mechanism Theorem: Bidding “truthfully” is a dominant strategy in this mechanism. Vickrey got the Nobel prize in economics in ’96 (a few days before his death) VCG mechanism not used for web advertising! Still have ebay problem …

36 Addressing the “ebay problem” Slot assignment by revenue order Have weights for different advertisers Measure probability of click (= quality of ad) \( \mathrm{ctr}_{\text{ebay}} = 0.001 \), \( \mathrm{ctr}_{\text{ingmar}} = 0.01 \) Assign slots in (decreasing) order of \( \mathrm{ctr}_i \cdot b_i \) (~ revenue for search engine) Pay minimum bid needed to stay ahead: \( p_i = \mathrm{ctr}_{i+1} \cdot b_{i+1} / \mathrm{ctr}_i \) Revenue ordering vs. bid ordering 30% more revenue per page impression
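Revenue ordering can be sketched as follows. The CTRs are the two values from the slide; the bids ($5.00 and $1.00) and the function name `rank_by_revenue` are assumptions made up for illustration.

```python
def rank_by_revenue(ads, num_slots):
    """Rank ads by ctr * bid and charge the minimum bid needed to stay ahead.

    ads: list of (name, bid per click, estimated ctr).
    Returns a list of (name, price per click), best slot first.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for pos in range(min(num_slots, len(ranked))):
        name, _bid, ctr = ranked[pos]
        if pos + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[pos + 1]
            price = next_ctr * next_bid / ctr  # p_i = ctr_{i+1} * b_{i+1} / ctr_i
        else:
            price = 0.0  # nobody below: no reserve price in this sketch
        results.append((name, price))
    return results


# CTRs from the slide; the bids are hypothetical.
ads = [("ebay", 5.00, 0.001), ("ingmar", 1.00, 0.01)]
single_slot = rank_by_revenue(ads, num_slots=1)
print(single_slot)
```

Although "ebay" bids five times as much per click, its low CTR gives it the lower score, so with these assumed bids "ingmar" takes the slot and pays about $0.50 per click.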

37 GSP in Practice GSP with revenue ordering used by all major search engines But with modifications … –minimum price (“reserve price”) –number of slots is variable –quality of landing page to avoid frustration –positional constraints –…

38 “Putting Nobel Prize-winning theories to work” ? Google’s unique auction model uses Nobel Prize-winning economic theory to eliminate the winner’s curse – that feeling that you’ve paid too much. While the auction model lets advertisers bid on keywords, the AdWords™ Discounter makes sure that they only pay what they need in order to stay ahead of their nearest competitor.

39 Knowing the Click-Through Rates How do we know the click-through rates? –Estimated from past performance What if a new advertiser arrives? –If we show his ads, lose chance to show other good ads. –If we don’t show his ads, might not discover a new high-performing ad. Solution: Explore-Exploit What is the problem?

40 Multi-Armed Bandits First, explore! Observed payouts: $1 $3 $10 $4 $2 $10 $8 $4 $6 Now, exploit! Estimates per bandit: Expect $2 Expect $8 Expect $6

41 Multi-Armed Bandits Set of K bandits, i.e. real distributions \( B = \{R_1, \dots, R_K\} \) \( \mu_k = \mathrm{mean}(R_k) \) \( \mu^* = \max_k \{\mu_k\} \) Game is played for H rounds Regret: \( \rho(H) = H\mu^* - \sum_{t=1}^{H} r_t \) where \( r_t \) is the (random) reward at time t Want \( \rho(H)/H \to 0 \) with probability 1 as \( H \to \infty \) Suggestions?

42 Multi-Armed Bandits Epsilon-greedy strategy: The currently best bandit is selected for a fraction \( 1-\epsilon \) of the rounds, and a bandit selected uniformly at random for a fraction \( \epsilon \). Restless Bandit Problem – distributions change Arm Acquiring Bandit – new bandits arrive
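A minimal sketch of the epsilon-greedy strategy, assuming Gaussian rewards with means $2, $8 and $6 and unit standard deviation (the reward model, the seed and the rule of trying every arm once first are assumptions, not part of the lecture).

```python
import random


def epsilon_greedy(arms, rounds, epsilon, rng):
    """Play `rounds` rounds; arms is a list of zero-argument reward samplers.

    Returns (pull counts per arm, total reward collected).
    """
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    collected = 0.0
    for _ in range(rounds):
        untried = [k for k, c in enumerate(counts) if c == 0]
        if untried:
            k = untried[0]                 # assumption: try every arm once first
        elif rng.random() < epsilon:
            k = rng.randrange(len(arms))   # explore: uniformly random arm
        else:
            # exploit: arm with the best empirical mean so far
            k = max(range(len(arms)), key=lambda j: totals[j] / counts[j])
        reward = arms[k]()
        counts[k] += 1
        totals[k] += reward
        collected += reward
    return counts, collected


rng = random.Random(0)
means = [2.0, 8.0, 6.0]
arms = [lambda m=m: rng.gauss(m, 1.0) for m in means]

H = 5000
counts, collected = epsilon_greedy(arms, rounds=H, epsilon=0.1, rng=rng)
regret = H * max(means) - collected  # rho(H) = H * mu_star - sum of rewards
print(counts, regret / H)
```

With a fixed epsilon the per-round regret does not vanish, because a constant fraction of rounds is still spent on random arms; driving it to zero would need an exploration rate that decreases over time.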

43 Practical CTR Complications CTR also depends on the presence/absence of other ads And on what the user has seen in the past And on quality of search results Should we show the worst search results so that users are “desperate” and click the ads?

44 Fraud Click fraud –On opponent's paid search results (10%-20%) –On the contextual ads of your homepage Impression fraud –Give your opponent a lower CTR –Lowers the amount you’ll have to bid What should search engines do? –All search engines do not bill for fraudulent clicks –See case “Lane’s Gifts v. Google” Other kinds?

45 Does CPA Solve Fraud? Click fraud no longer works. Only get charged for “actions”, aka conversion. Now advertisers can cheat by underreporting conversions. Can Y!/G trust advertisers? Have to hand over monitoring to search engine. Can advertisers trust Y!/G? Very, very sparse data to derive estimates. Hard for Y!/G to make optimal decisions. End of story?

46 Mobile Sponsored Search Mobile devices offer more context –Location –More short-term needs -> more monetizable More focused user attention –Can’t just open another tab while loading More positive associations –People tend to feel “closer” to their mobile

47 Summary of Part 1 Search advertising is a multi-billion dollar business Allows very targeted advertising Fair payment model: you only pay for clicks (CPC) How much you pay depends on –Your bid –Fraction of people clicking your ad (CTR) Payment reasonably stable and “gaming” is difficult Practical problems such as learning CTRs and avoiding click fraud

48 Exercise 6 teams …

49 Part 2 Display Advertising


51 Historical note: banners Banners seem to be the oldest standard format in use According to Wikipedia the first banner ad ever was sold in 1993 by Global Network Navigator (GNN) to Heller, Ehrman, White, & McAuliffe, a legal firm popular in Silicon Valley. GNN was a popular pre-Yahoo! directory eventually sold to AOL in 1995 Heller Ehrman White & McAuliffe was started in 1890 and went bankrupt in In 1929 they negotiated the financing of the Bay Bridge.

52 Display Advertising Usually sold on a CPM basis Guaranteed delivery (GD): deliver 30 million impressions on in Feb ’11 –Typically large, “premium” campaigns Non-guaranteed delivery (NGD): sold in auctions on the spot market at varying prices –Typically smaller, ad-hoc campaigns

53 How much does it cost?

54 Components of a GD system 1. Forecast supply and demand How many users will visit a page in a certain period? 2. Forecast NGD pricing How much could we get on the spot market? 3. Admission control & pricing 30m impressions in July 2011 on Should we accept the contract? Can we meet the guarantee? What price should we charge? How are other contracts impacted? 4. “Optimal” allocation of impressions to active contracts What is the objective function? Cannot re-run after every impression due to scalability. 5. Ad serving Demand (long term) depends on quality of allocation! “females, 30-50, high income” more valuable than “teenager drop-outs” Cannot only use low value impressions to satisfy contract “Simple” (stochastic) packing problem?

55 Optimal Allocation Optimal allocation –Maximize a stated objective function subject to supply and demand constraints What objective? –Value of the remaining inventory? - Good for publisher –Maximize quality? - Good for advertiser Need to balance utilities: publisher, advertiser, user, & network!

56 Representative Allocations A. Ghosh et al., “Randomized Bidding for Maximally Representative Allocation”, Yahoo! Research Technical Report Unless the targeting is very fine-grained there is a wide spectrum of quality of impressions matching a typical contract Contract says: Male, US, auto interests. What should be supplied to this contract? –Is it OK to supply 100% 15 year-old males, daydreaming about cars, weekly allowances $25 ? –Advertiser probably wants/expects a representative sample of car-buying US male population

57 Publisher’s potential strategies Assume publisher has just one GD contract Suboptimal strategy: –Deliver first all impressions to the contract –Only after the contract is met, sell in spot market Bad for the publisher because some of the GD pageviews may fetch a lot more money on the spot market than the contract value Better strategy –Put up every pageview on auction (as a seller) –Also place a bid on it for the contract (as a buyer) –Value determined by probability & penalty of not fulfilling the contract Why suboptimal?

58 Publisher-optimal bid strategy If target is 30 million, place the smallest constant bid in each round so that exactly 30 million pageviews are won All excess inventory will be sold to someone else (not the GD contract) at a higher price. “Unfair” to the GD contract –All impressions delivered are of low value 2 a.m. viewers viewers from poor neighborhoods basically, viewers nobody wanted!

59 Volume vs. price of winning bids on spot market Volume = number of impressions sold at p ~ price density Price p Price on spot market used as proxy for “quality” of impression

60 Publisher-Optimal Volume Price Find position for the arrow such that area before the arrow = d (GD Advertiser gets the cheapest stuff)

61 Advertiser-Optimal Volume Price Find position for the arrow such that area after the arrow = d (GD Advertiser gets the most expensive stuff)

62 Compromises The GD contract could get half of the bottom stuff and half of the top stuff More fine-grained: –Of the supply selling at every price, give d/s fraction to the GD contract. –Then, price distribution in GD mirrors the intrinsic distribution in the total supply. –Objective function must penalize deviation from this ideal.
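The difference between the publisher-optimal allocation and the representative d/s compromise can be illustrated with a discretized sketch. The price buckets and supply figures are made-up assumptions, as are the function names.

```python
def publisher_optimal(supply_by_price, demand):
    """Fill the GD contract with the cheapest impressions first."""
    allocation = {}
    remaining = demand
    for price in sorted(supply_by_price):
        take = min(supply_by_price[price], remaining)
        allocation[price] = take
        remaining -= take
    return allocation


def representative(supply_by_price, demand):
    """Give the GD contract the same fraction d/s of the supply at every price."""
    total_supply = sum(supply_by_price.values())
    return {price: supply * demand / total_supply
            for price, supply in supply_by_price.items()}


def average_price(allocation):
    """Average spot price of the impressions handed to the contract."""
    return sum(p * n for p, n in allocation.items()) / sum(allocation.values())


# Hypothetical spot market: millions of impressions available at each price.
supply = {1.0: 40.0, 2.0: 40.0, 5.0: 20.0}
demand = 30.0  # GD contract volume d, in millions

cheap = publisher_optimal(supply, demand)
fair = representative(supply, demand)
print(average_price(cheap), average_price(fair))
```

In this toy market the publisher-optimal allocation hands the contract only $1 impressions, while the representative one matches the average price of the whole supply ($2.20), which is the sense in which the GD price distribution mirrors the total supply.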

63 Problem setting Assume the publisher knows the distribution of the external winning bid on the spot market Notation –p = price (winning bid) –f(p) = price density = the highest bid is drawn i.i.d. from f –s = total supply (inventory) of impressions –d = demand (GD volume) for the contract –t = target spend per impression (budget) d/s is the fraction of the total supply that needs to be delivered to the (unique!) contract

64 Find an allocation a(p) a(p)/s = fractional allocation to GD at price p, that is: –There are s*f(p)*dp impressions available at price p (or rather in the interval [p,p+dp)) –The GD contract gets a(p)/s * s*f(p)*dp = a(p)*f(p)*dp impressions at price p Ideal: a(p)/s = d/s for all p Objective: stay close to this ideal; u measures the distance

65 Allocation Constraints The GD contract must receive d impressions in total: \( \int a(p) f(p)\, dp = d \) a() is not assumed continuous a priori If indeed a(p)/s = d/s for all p, constraint is satisfied!

66 Allocation Constraints \( \int p\, a(p) f(p)\, dp \) = the dollar amount “lost” due to meeting the contract. So we must have \( \int p\, a(p) f(p)\, dp \le t \cdot d \) Recall t = the average budget per impression. Publisher does get more than this per impression.

67 Final Optimization Problem Minimize over a() Subject to No solution if t (cost per impression) is too small.

68 Possible distance: Kullback-Leibler divergence K-L divergence between two nonnegative functions is

69 K-L Optimization Problem Minimize over a() Subject to Parameter t governs revenue-fairness trade-off

70 Bidding strategy Now we have found an optimal allocation –At price p give fraction a(p)/s to GD How can we implement the optimal allocation a(p) in the auction environment? –We have to bid randomly –Bidding the same amount each round is suboptimal

71 Stochastic Bidding Recall a(p)/s is the fraction of supply available at price p that should be won for GD At price p, what fraction of the supply will be won for GD? Fraction won = prob{GD bid > p} = 1 – H(p) –H(p) is the GD bid distribution (cdf) –a(p)/s = 1 – H(p) Get a(p)/s from optimization, convert to H(p) –a(p) non-increasing Enter auction with probability a(0)/s
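The conversion from an allocation a(p)/s to a randomized bid can be sketched with inverse-transform sampling. The exponential form of a(p)/s and the entry probability 0.6 are purely illustrative assumptions; the optimal a(p) from the optimization problem is not reproduced here.

```python
import math
import random

ENTRY_PROB = 0.6  # hypothetical a(0)/s: probability of entering the auction at all


def target_fraction(p):
    """Hypothetical a(p)/s, non-increasing in the spot price p."""
    return ENTRY_PROB * math.exp(-p)


def sample_gd_bid(rng):
    """Draw a GD bid with cdf H(p) = 1 - a(p)/s; None means 'do not bid'."""
    u = rng.random()
    if u < 1.0 - ENTRY_PROB:
        return None  # probability mass H(0): stay out of this auction
    # Invert u = H(p) = 1 - ENTRY_PROB * exp(-p) for p.
    return -math.log((1.0 - u) / ENTRY_PROB)


rng = random.Random(0)
draws = [sample_gd_bid(rng) for _ in range(20000)]

spot_price = 1.0  # external winning bid on the spot market
won_fraction = sum(1 for b in draws if b is not None and b > spot_price) / len(draws)
entered_fraction = sum(1 for b in draws if b is not None) / len(draws)
print(won_fraction, target_fraction(spot_price), entered_fraction)
```

The simulated share of auctions won at a given spot price should track a(p)/s at that price, which is the property the slide asks of the bid distribution H.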

72 Targeting Which ads could be shown on a page via the spot market? Only they participate in bidding for the impressions.

73 Contextual Targeting

74 Taken from: How would you do it?

75 Demo Show textual ads Also sold on a CPC basis Which “queries” should be triggered by page?

76 Phrase Extraction for Contextual Advertising “Finding Advertising Keywords on Web Pages”, Yih et al., 2006 Goal: given a page find phrases that are good for placing ads Reverse search problem: given a page, find the queries that would match (summarize) the content of this page Select ads based on a single selected keyword: –Contextual Advertising translated into database approach of Sponsored Search –Reuse of the Sponsored Search infrastructure – lower cost –Ad Networks earn less per impression in CA Lower click-through rates (high-variance) Lower conversion (less clear intent) revenue share with the publisher

77 System Architecture Input: web page 1.Preprocessor process html -> text 2.Candidate Selector generate candidates = candidate bid phrases 3.Classifier score the candidates 4.Postprocessor Combine scores -> probability of being “useful” Output: bid phrases Machine learning?

78 1. Preprocessor Translate HTML into plain text Preserve the blocks in the original document Preserve info about outgoing anchor text, meta tags Open source HTML parser for scraping – BeautifulSoup Part-of-Speech (POS) tagger – record the type of the word Chunker – detecting noun phrases

79 2. Candidate Selection All phrases of length up to 5 (including single words) –Within a single page block (sentence) Two dimensions of candidate selection: –Individual occurrences extracted separately vs. combining all occurrences into entry per page (separate vs. combined) –Keep phrases or break up into individual words Label individual words with their relationship with a phrase (if phrases are broken up): –Beginning of a phrase –Inside a phrase –Last word of a phrase –…

80 3. Classifier Given a phrase predict if it is a “keyword” usable for selecting ads –“adverse affects of coffee” vs. “sat down on breakfast table” For the whole phrase a single binary classifier –Logistic regression model \( P(Y=1 \mid x) = 1/(1 + e^{-w \cdot x}) \) –x is vector of features of a given phrase –w is a vector of importance weights learned from the training set Decomposed – multi label classifier (B,I,L,…) –\( P(Y_i=1 \mid x) = e^{x \cdot w_i} / \sum_j e^{x \cdot w_j} \)
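Both scoring formulas on this slide can be written down directly. The sketch below uses only the standard library; the weights and feature values are invented for illustration and are not taken from the paper.

```python
import math


def logistic_probability(weights, features):
    """Binary model: P(Y=1 | x) = 1 / (1 + exp(-w . x))."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def softmax_probabilities(weight_vectors, features):
    """Multi-label model: P(Y_i=1 | x) = exp(x . w_i) / sum_j exp(x . w_j)."""
    scores = [sum(w * x for w, x in zip(wv, features)) for wv in weight_vectors]
    top = max(scores)  # subtracting the max avoids overflow, result is unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical features of one phrase: [is_noun_phrase, in_title, log_tf].
x = [1.0, 0.0, 1.2]
w = [1.5, 2.0, 0.4]                       # hypothetical learned weights
keyword_prob = logistic_probability(w, x)

# One hypothetical weight vector per label (B, I, L, outside).
label_weights = [[0.9, 0.1, 0.3], [0.2, 0.0, 0.5], [0.4, 0.3, 0.1], [-1.0, -0.5, 0.0]]
label_probs = softmax_probabilities(label_weights, x)
print(keyword_prob, label_probs)
```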

81 3. Classifier: Features Linguistic features: is a noun; is a proper name; is a noun phrase; are all words in the phrase of the same type Capitalization: any/all/first word capitalization Section based features: –Hypertext – is the feature extracted from anchor text –Title, Meta tags, URL IR features: tf, idf, log(tf), log(idf), sentence length, phrase length, relative location in the document Query log features: log(phrase frequency), log(first/second/interior word frequency) Feature reconciliation –Binary features – OR of all occurrences –Real valued features – min Which features are most important?

82 4. Postprocessor Score reconciliation: instance with the highest score Separate words -> phrase probability: –p1= probability of a phrase: product of the confidence of the classification of each term –p0 = probability of all the words of the phrase being outside a keyword –score = p1/(p1+p0)
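The phrase score for the broken-up (per-word) classifier can be sketched as below. The per-word probabilities are invented numbers, and treating both p1 and p0 as plain products over the words is this sketch's reading of the slide.

```python
from math import prod


def phrase_score(label_confidences, outside_probabilities):
    """Combine per-word classifier outputs into one phrase score.

    label_confidences: per word, confidence of its assigned in-phrase label.
    outside_probabilities: per word, probability of being outside any keyword.
    """
    p1 = prod(label_confidences)        # probability of the phrase
    p0 = prod(outside_probabilities)    # probability that all words are outside
    if p1 + p0 == 0.0:
        return 0.0                      # guard against division by zero
    return p1 / (p1 + p0)


# Hypothetical three-word phrase.
score = phrase_score([0.8, 0.7, 0.9], [0.1, 0.2, 0.05])
print(score)
```

Normalizing by p1 + p0 turns the two products, which shrink with phrase length, into a value between 0 and 1 that can be compared across phrases of different lengths.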

83 Experiments: Data 828 pages Indexed by MSN Have ads In the Internet Archive One page per domain Eliminate foreign and adult pages Editors (8) instructed to seek highly prominent keywords with advertising potential

84 Experiments: Metrics Editorial judgments Precision-recall – might be too difficult –Too long for the judges to find all the relevant phrases –Given a phrase – influence the judges A proxy for Precision-Recall –top-1 = top-1 result is in the list selected by the editor, count across the set of pages –top-10 = % of top-10 results in the editor set, averaged over the set of pages

85 Experiments: Results Best performance for combining occurrences and not breaking up into words.

86 Demographic Targeting Image is taken from:

87 A Glimpse at my Own Work

88 Behavioral Targeting […] for instance, if a visitor has a recent history of researching SUVs and is a regular visitor of Yahoo! Music, Yahoo! BT will have the insights to serve up a relevant SUV ad while the visitor is browsing the Yahoo! Music homepage.

89 Summary of Part 2 Display ads usually less targeted than search ads Translates to lower CTRs Ads sold in contracts (GD) and on the spot (NGD) Different targeting options Need lots of user data for good targeting –Yahoo!, Google, Facebook, …

90 Part 3 Afterthoughts

91 Banner Blindness People learn to ignore ads … … even when they are highly relevant –“Banner Blindness: The Irony of Attention Grabbing on the World Wide Web”, Benway ’98 Danger of falling CTRs due to over-exposure –Might be beneficial to show less advertising

92 Search Result Bidding In current sponsored search systems, advertisers bid on query terms Could also bid on the search results –Show my ad whenever is returned –Show my ad whenever xyz appears in a snippet Why could this be useful?

93 Next time, Feb 25/26, 2011 This time I focused on breadth Next time I’ll focus on depth Which topics did you find most interesting? Do you want more theory? More of an “economic overview”? More hands-on insights? More academic papers?

94 Paid Summer Internships at Y! Research Barcelona Cool location –Best beach City in the world (NG) Cool colleagues –international, dynamic, open environment Cool data –search, mail, toolbar, finance, Flickr, … Cool projects –The goal is *always* to publish at top venues Deadline JANUARY 15

95 Dziękuję! (Thank you!)
