Thore Graepel Online Services and Advertising Group Microsoft Research Cambridge
Complex large-scale data in the enterprise
– What kind of data is available?
– What technologies are used?
– Tasks and enterprise-specific challenges?
Methodology:
– Bayesian Inference in Factor Graph Models
– PQL: Using SQL to describe probability models
Applications:
– Gamer Rating and Matchmaking: TrueSkill
– Click-Through Rate Prediction: AdPredictor
– Large-Scale Recommendations: Matchbox
Joint work with Tom Minka & Phillip Trelford
Online Services Division
– Web index
– Search and ad click logs (12-15 TB / day)
– Hotmail, instant messaging, Internet Explorer (100s of millions of users)
– MSN portal and Bing maps
Xbox Live Gaming Service
– User transaction log data
– Ranking and matchmaking data
– Game instrumentation for user testing
Development and Software Instrumentation
– Watson (customer feedback data)
– Source depot (MS source code, e.g., Office, Windows)
– Multilingual technical documentation
Business
– Customer databases
– Sales and Marketing
Prediction of user behaviour and preferences
– Improve web search
– Improve targeting for advertising
– Spam filtering and content prioritisation
Improve user experience
– Matchmaking for games
– Multi-modal user interfaces (Natal, speech)
Improve software development process
– Improve productivity of developers
– Analyse software for defects
Relational Databases/SQL
– Great agility for analysis and reliability for business
– Limited scalability
– Need to import data into SQL
Windows HPC
– Complex computations / fine-grained parallelism
– Need to move data to HPC cluster
Cosmos
– Take the computation to the data
– Super-efficient stream-based computations
Privacy
– Privacy limits the ways in which data can be used
– Interesting trade-offs (differential privacy)
Incentives
– Data produced by self-interested agents
– Need to design incentive-compatible mechanisms
Exploration/Exploitation
– Results of inference feed back into the business process and determine future observations
– Need to aim at long-term benefits
Definition: Graphical representation of product structure of a function (Wiberg, 1996)
– Nodes: factors and variables
– Edges: dependencies of factors on variables
Question:
– What are the marginals of the function (all but one variable are summed out)?
– What is the mode of the function?
[Figure: factor graph with variables s1, s2, t1, t2, d, y]
– Bayes' law
– Factorising prior
– Factorising likelihood
– Sum out latent variables
[Figure: factor graph with variables v, w, x, y, z and factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]
Observation: Sum of products becomes product of sums of all messages from neighbouring factors to variable!
[Figure: subgraph with variables w, x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
Observation: Factors only need to sum out all their local variables!
[Figure: subgraph with variables x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
Observation: Variables pass on the product of all incoming messages!
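The three observations can be checked numerically on the example graph with factors f1(v,w), f2(w,x), f3(x,y), f4(x,z). The following is a minimal sketch, assuming discrete variables with K states each and random factor tables (K, the random tables and all variable names in the code are illustrative assumptions, not part of the talk). It compares the brute-force marginal of x with the one obtained from local messages.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # number of states per variable (hypothetical choice)

# Random non-negative factor tables, indexed in argument order.
f1 = rng.random((K, K))  # f1(v, w)
f2 = rng.random((K, K))  # f2(w, x)
f3 = rng.random((K, K))  # f3(x, y)
f4 = rng.random((K, K))  # f4(x, z)

# Brute force: build the full table over (v, w, x, y, z), then sum out all but x.
joint = np.einsum("vw,wx,xy,xz->vwxyz", f1, f2, f3, f4)
brute = joint.sum(axis=(0, 1, 3, 4))

# Message passing: each factor only sums out its own local variables.
m_f1_w = f1.sum(axis=0)   # sum over v of f1(v, w)
m_w_f2 = m_f1_w           # w has one other neighbour, so it passes the message on
m_f2_x = m_w_f2 @ f2      # sum over w of m(w) * f2(w, x)
m_f3_x = f3.sum(axis=1)   # sum over y of f3(x, y)
m_f4_x = f4.sum(axis=1)   # sum over z of f4(x, z)

# The (unnormalised) marginal of x is the product of all incoming messages.
marginal_x = m_f2_x * m_f3_x * m_f4_x

print(np.allclose(brute, marginal_x))
```

The brute-force table has K^5 entries, while the message-passing route only ever touches K-by-K tables, which is the practical payoff of the distributive law.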
Three update equations (Aji & McEliece, 1997)
Update equations can be directly derived from the distributive law.
Efficient for messages in the exponential family.
Calculate all marginals at the same time.
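For reference, the three updates are commonly written as below in standard sum-product notation. The symbols are assumptions of this sketch rather than taken from the slide: F_x denotes the set of factors neighbouring variable x, and the bold y denotes the other variables of factor f.

```latex
\begin{align}
p(x) &\propto \prod_{f \in F_x} m_{f \to x}(x) \\
m_{f \to x}(x) &= \sum_{\mathbf{y}} f(x, \mathbf{y}) \prod_{y \in \mathbf{y}} m_{y \to f}(y) \\
m_{x \to f}(x) &= \prod_{g \in F_x \setminus \{f\}} m_{g \to x}(x)
\end{align}
```

The first line is the marginal as a product of incoming messages, the second is a factor summing out its local variables, and the third is a variable passing on the product of its other incoming messages.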
Problem: The exact messages from factors to variables may not be closed under products.
Solution: Approximate the marginal as well as possible in the sense of minimal KL divergence.
Expectation Propagation (Minka, 2001): Approximate the marginal by moment-matching resulting in
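As an illustration of the moment-matching step, the sketch below approximates a Gaussian N(mu, sigma^2) multiplied by a step factor I(x > 0) with a single Gaussian that has the same mean and variance. The choice of factor, the helper names and the grid check are assumptions made for this example; the closed form is the standard result for a truncated Gaussian.

```python
import math
import numpy as np

def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def match_truncated_gaussian(mu, sigma):
    """Mean and variance of N(mu, sigma^2) restricted to x > 0."""
    t = mu / sigma
    v = pdf(t) / cdf(t)   # additive correction to the mean
    w = v * (v + t)       # multiplicative correction to the variance
    return mu + sigma * v, sigma * sigma * (1.0 - w)

mu, sigma = 0.5, 2.0  # hypothetical incoming Gaussian
mean, var = match_truncated_gaussian(mu, sigma)

# Numerical check on a fine grid over x > 0.
xs = np.linspace(0.0, mu + 12.0 * sigma, 400001)
dens = np.exp(-0.5 * ((xs - mu) / sigma) ** 2)
grid_mean = (xs * dens).sum() / dens.sum()
grid_var = ((xs - grid_mean) ** 2 * dens).sum() / dens.sum()

print(mean, grid_mean)
print(var, grid_var)
```

Matching the first two moments is what minimising the KL divergence from the exact marginal to a Gaussian amounts to, which is why the projection reduces to computing a mean and a variance.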
Map-Reduce for IID data
– Map: Nodes compute messages m_{f_i→s} from data y_i and m_{f_i→s}
– Reduce: Combine messages m_{f_i→s} into p(s) by multiplication
Caveats:
– All approximate data factors need the incoming message m_{s→f_i}!
– All messages m_{f_i→s} need to be stored if the same data point is considered multiple times
[Figure: variable s connected to data factors for y1, y2, y3]
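The map and reduce steps can be sketched for the simplest case where the data factors are exact: a Gaussian prior on s and observations y_i drawn from N(s, noise_var). Everything named in the code (the prior, noise_var, the shard layout, the helper functions) is a hypothetical choice for illustration. Because these factors are exact, the messages do not depend on the incoming message, so the first caveat does not arise here.

```python
from functools import reduce

def to_natural(mean, var):
    """Gaussian as (precision, precision * mean); products become sums."""
    return (1.0 / var, mean / var)

def from_natural(msg):
    precision, precision_mean = msg
    return (precision_mean / precision, 1.0 / precision)

def multiply(a, b):
    """Product of two Gaussian messages in natural parameters."""
    return (a[0] + b[0], a[1] + b[1])

def data_message(y, noise_var):
    """Map step: message from the factor N(y; s, noise_var) to s."""
    return to_natural(y, noise_var)

prior = to_natural(0.0, 100.0)   # hypothetical prior on s
noise_var = 4.0                  # hypothetical observation noise
shards = [[1.0, 2.0], [3.0], [2.5, 1.5, 2.0]]  # data split across nodes

# Map on each node, then a local reduce per shard.
partials = [
    reduce(multiply, (data_message(y, noise_var) for y in shard))
    for shard in shards
]

# Global reduce: multiply the prior with all partial products.
posterior = reduce(multiply, partials, prior)
post_mean, post_var = from_natural(posterior)
print(post_mean, post_var)
```

Since multiplication of messages is associative and commutative, the result does not depend on how the data are sharded.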
Joint work with Ralf Herbrich & Jurgen Van Gael
People = AUGMENT DB.People ADD weight FLOAT
FACTOR Normal(p.weight, 75.0, 25.0) FROM People p
FACTOR Normal(g.weight, p.weight, 1.0) FROM People p, DrVisit g WHERE p.PersonID = g.PersonID
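To make the meaning of the two FACTOR statements concrete, here is a small Python sketch of the posterior they imply for each person's latent weight. It assumes that the third argument of Normal is a variance (the slides do not say whether it is a variance or a standard deviation), and the table rows are invented for illustration.

```python
PRIOR_MEAN, PRIOR_VAR = 75.0, 25.0  # from Normal(p.weight, 75.0, 25.0); variance assumed
NOISE_VAR = 1.0                     # from Normal(g.weight, p.weight, 1.0); variance assumed

people = [1, 2, 3]                                # hypothetical PersonIDs
dr_visit = [(1, 82.0), (1, 81.0), (2, 60.5)]      # hypothetical (PersonID, measured weight)

def posterior_weights(people, dr_visit):
    """Gaussian posterior (mean, variance) of each person's latent weight."""
    # One prior factor per row of People.
    precision = {pid: 1.0 / PRIOR_VAR for pid in people}
    precision_mean = {pid: PRIOR_MEAN / PRIOR_VAR for pid in people}
    # One measurement factor per joined row (p.PersonID = g.PersonID).
    for pid, measured in dr_visit:
        precision[pid] += 1.0 / NOISE_VAR
        precision_mean[pid] += measured / NOISE_VAR
    return {
        pid: (precision_mean[pid] / precision[pid], 1.0 / precision[pid])
        for pid in people
    }

post = posterior_weights(people, dr_visit)
for pid, (mean, var) in post.items():
    print(pid, round(mean, 3), round(var, 3))
```

A person with no visits keeps the prior, while each additional visit pulls the mean towards the measurements and reduces the variance.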
Joint work with Tom Minka & Phillip Trelford
Given:
– Match outcomes: Orderings among k teams consisting of n_1, n_2, ..., n_k players, respectively
Questions:
– Skill s_i for each player such that
– Global ranking among all players
– Fair matches between teams of players
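For the simplest case of two single-player teams and no draws, the skill update has a closed form based on Gaussian moment matching. The sketch below follows the commonly published two-player TrueSkill update. The function names, the tuple representation and the starting values (mean 25, standard deviation 25/3, performance noise beta = 25/6) are assumptions of this example, and the skill dynamics and draw margin used in practice are left out.

```python
import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def update_two_player(winner, loser, beta):
    """Update (mu, sigma) beliefs after `winner` beats `loser`, no draws."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c2 = 2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # size of the mean shift; large for surprising results
    w = v * (v + t)       # fraction of variance removed
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c2 * w))
    return new_winner, new_loser

start = (25.0, 25.0 / 3.0)   # assumed starting belief for a new player
beta = 25.0 / 6.0            # assumed performance noise
new_winner, new_loser = update_two_player(start, start, beta)
print(new_winner, new_loser)
```

The more uncertain a player's skill, the larger the step taken after each game, which is one way to read the fast convergence claimed for new players.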
TrueSkill: Superfast convergence to true skills
[Figure: level versus games played for the players char and SQLwildman, each under the Halo 2 Beta ranking and under TrueSkill]
Leaderboard
– Global ranking of all players
Matchmaking
– For gamers: Most uncertain outcome
– For inference: Most informative
– Both are equivalent!
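One simple way to read "most uncertain outcome" is to pick the opponent for whom the predicted win probability is closest to 50%. The sketch below does this for Gaussian skill beliefs. It is a simplification under stated assumptions: the helper names, the candidate pool and the value of beta are invented, and the match-quality criterion used in the deployed system may differ (for example by being based on draw probability).

```python
import math

def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def win_probability(a, b, beta):
    """P(a beats b) for (mu, sigma) skill beliefs and performance noise beta."""
    mu_a, sigma_a = a
    mu_b, sigma_b = b
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return cdf((mu_a - mu_b) / c)

def most_uncertain_opponent(player, candidates, beta):
    """Name of the candidate whose match outcome is closest to a coin flip."""
    return min(
        candidates,
        key=lambda name: abs(win_probability(player, candidates[name], beta) - 0.5),
    )

beta = 25.0 / 6.0                       # assumed performance noise
player = (25.0, 25.0 / 3.0)             # hypothetical new player
candidates = {                          # hypothetical pool of opponents
    "a": (30.0, 2.0),
    "b": (25.5, 3.0),
    "c": (10.0, 1.0),
}
choice = most_uncertain_opponent(player, candidates, beta)
print(choice)
```

A game whose outcome is hard to predict is also the one whose result changes the skill beliefs the most, which is the sense in which the two matchmaking goals coincide.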
Joint work with Joaquin Quiñonero Candela, Onno Zoeter, Tom Borchert, Phillip Trelford
Advantages of improved probability estimates:
– Increase user satisfaction by better targeting
– Fairer charges to advertisers
– Increase revenue by showing ads with high click-through rate
Display (according to expected revenue) and charge (per click):
– $1.00 * 10% = $0.10, charge $0.80
– $2.00 * 4% = $0.08, charge $1.25
– $0.10 * 50% = $0.05, charge $0.05
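The numbers on this slide are consistent with ranking ads by bid times click probability and charging each ad the smallest amount per click that keeps it above the next ad, that is, the next ad's expected revenue divided by the ad's own click probability. The sketch below reproduces the figures under that reading. The ad names and the function are hypothetical, and the $0.05 charge for the last ad is passed in as an assumed minimum charge because the slide does not explain it.

```python
def rank_and_price(ads, min_charge):
    """Rank ads by expected revenue and compute per-click charges.

    ads: list of (name, bid, p_click). Returns (name, expected_revenue, charge).
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for i, (name, bid, p_click) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_p = ranked[i + 1]
            # Smallest per-click price that still outranks the next ad.
            charge = next_bid * next_p / p_click
        else:
            charge = min_charge  # assumption: no competitor below the last ad
        results.append((name, bid * p_click, charge))
    return results

ads = [("A", 1.00, 0.10), ("B", 2.00, 0.04), ("C", 0.10, 0.50)]  # names are hypothetical
results = rank_and_price(ads, min_charge=0.05)
for name, revenue, charge in results:
    print(name, round(revenue, 2), round(charge, 2))
```

Because both the ranking and the charges depend on the click probability, a miscalibrated estimate affects which ads are shown and what advertisers pay, which is why calibrated predictions matter here.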
[Figure: input features (Client IP; Match Type: Exact Match, Broad Match; Position: ML-1, SB-1, SB-2) combined by addition (+) into P(pClick)]
[Figure: factor graph with weights w1, w2 combined by addition (+) into s, which leads to the outcome c (Click / No Click)]
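A model of this shape, with Gaussian beliefs over one weight per active feature value and a click decided by the sign of a noisy sum, admits a simple online update. The sketch below follows the Bayesian probit update as commonly published for this kind of click model. The feature names, the prior N(0, 1) on each weight, the value of beta and the function names are all assumptions of this example.

```python
import math

def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def predict(weights, active, beta):
    """Click probability for the active features; weights map name -> (mean, var)."""
    total_mean = sum(weights[f][0] for f in active)
    total_var = beta ** 2 + sum(weights[f][1] for f in active)
    return cdf(total_mean / math.sqrt(total_var))

def update(weights, active, y, beta):
    """Return updated weights after observing y = +1 (click) or -1 (no click)."""
    total_mean = sum(weights[f][0] for f in active)
    total_var = beta ** 2 + sum(weights[f][1] for f in active)
    sd = math.sqrt(total_var)
    t = y * total_mean / sd
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    new_weights = dict(weights)
    for f in active:
        mean, var = weights[f]
        new_weights[f] = (mean + y * var / sd * v,
                          var * (1.0 - var / total_var * w))
    return new_weights

beta = 1.0  # assumed noise scale
weights = {"match=exact": (0.0, 1.0), "position=ML-1": (0.0, 1.0)}  # hypothetical features
active = ["match=exact", "position=ML-1"]

before = predict(weights, active, beta)
weights_after = update(weights, active, +1, beta)
after = predict(weights_after, active, beta)
print(before, after)
```

Each weight moves in proportion to its own variance, so rarely seen features adapt quickly while well-established ones change little.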
AdPredictor is now running on 100% of Paid Search traffic in Microsoft's Bing search engine
Relevance and Click-Through Rate of ads improved
Calibrated CTR prediction provides solid foundation for further improvements
AdPredictor explored for other tasks such as contextual and display advertising
Joint work with David Stern and Ralf Herbrich
[Figure: matrix of Users (A, B, C, D) by Items with unknown entries (?)]
Metadata?
User features: User ID; Gender (Male, Female); Country (UK, USA); Height (1.2m)
Item features: Item ID; Movie Genre (Horror, Drama, Documentary, Comedy)
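[Figure: factor graph in which user weights u11, u21, u12, u22 feed through sums (+) into s1, s2, item weights v11, v21, v12, v22 feed through sums (+) into t1, t2, and these, together with u01, u02, are combined by products (*) and a sum (+) into r]
Message update functions powered by Infer.NET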
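The structure in this graph (sums of feature weights into latent traits, then products and a final sum) corresponds to a bilinear prediction. The sketch below computes such a prediction with fixed point estimates using NumPy. This is a deliberate simplification: it ignores the uncertainty that the message-passing model maintains over every weight, and the matrix shapes, the numbers and the bias term are assumptions made for illustration only.

```python
import numpy as np

def predict_rating(U, x, V, y, bias=0.0):
    """Bilinear score: user traits s = U x, item traits t = V y, r = s . t + bias."""
    s = U @ x   # each user trait is a sum of weights of the active user features
    t = V @ y   # each item trait is a sum of weights of the active item features
    return float(s @ t + bias)

# Two latent traits; three user features and two item features (all hypothetical).
U = np.array([[0.5, -0.2, 0.1],
              [0.3, 0.4, -0.1]])
V = np.array([[1.0, 0.5],
              [-1.0, 2.0]])

x = np.array([1.0, 0.0, 1.0])  # active user features, e.g. a user ID and a country
y = np.array([0.0, 1.0])       # active item feature, e.g. one genre

r = predict_rating(U, x, V, y)
print(r)
```

Because traits are built from metadata features as well as IDs, a new user or item with known metadata still receives a prediction, which is the role the "Metadata?" question plays earlier in this section.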
Preference Cone for user
Great variety of data sources and tasks
Challenges: privacy, incentives, exploration
Tools: SQL, No-SQL, HPC
Modelling platform (Factor Graphs & PQL):
– Represent uncertainty
– Composable models
– Distributed, data-centric computation
Applications: TrueSkill, AdPredictor, Matchbox
Thanks!