Some Vignettes from Learning Theory
Robert Kleinberg, Cornell University
Microsoft Faculty Summit, 2009

Prelude: Tennis or Boxing?
You're designing a sporting event with n players of unknown quality.
Spectators want to see matches between the highest-quality players; no preference for variety or for seeing upsets.
Tennis solution: single-elimination tournament.
Boxing solution: players challenge the current champion until he/she is defeated.
Which is optimal? Or is a third alternative better?
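As an aside, the two designs can be compared in a toy Monte-Carlo simulation. Everything below (the Bradley-Terry noise model, the skill values, the function names) is an illustrative assumption of mine, not from the talk:

```python
import random

def match(a, b, skills, rng):
    # Bradley-Terry noise model (an assumption): higher skill wins more often
    p = skills[a] / (skills[a] + skills[b])
    return a if rng.random() < p else b

def tennis(skills, rng):
    # single-elimination bracket (n assumed to be a power of two)
    players = list(range(len(skills)))
    rng.shuffle(players)
    while len(players) > 1:
        players = [match(players[i], players[i + 1], skills, rng)
                   for i in range(0, len(players), 2)]
    return players[0]

def boxing(skills, rng):
    # challenge format: each player in turn fights the current champion
    order = list(range(len(skills)))
    rng.shuffle(order)
    champ = order[0]
    for challenger in order[1:]:
        champ = match(champ, challenger, skills, rng)
    return champ

rng = random.Random(0)
skills = [1.0] * 7 + [3.0]   # player 7 is clearly the best
trials = 2000
hits_tennis = sum(tennis(skills, rng) == 7 for _ in range(trials))
hits_boxing = sum(boxing(skills, rng) == 7 for _ in range(trials))
```

Both formats spend exactly n - 1 matches per tournament; the talk's real question is subtler (noisy comparisons, regret over time), but the simulation makes the design space concrete.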

Online Learning
Algorithms that make decisions with uncertain consequences, guided by past experience.

Multi-armed Bandits
A decision maker picks one of k actions (slot machines) in each step and observes a random payoff.
Try to minimize "regret": the opportunity cost of not knowing the best action a priori.
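As a concrete illustration of a regret-minimizing strategy (not one named in the talk), here is the classic UCB1 rule, which adds an optimism bonus to each arm's empirical mean; the three Bernoulli payoff probabilities are made-up examples:

```python
import math
import random

def ucb1(pull, k, horizon):
    """UCB1: after trying every arm once, always play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i); regret grows only logarithmically."""
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1            # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# hypothetical Bernoulli slot machines with unknown success rates
random.seed(0)
means = [0.3, 0.5, 0.7]
counts = ucb1(lambda i: 1.0 if random.random() < means[i] else 0.0,
              k=3, horizon=5000)
```

After 5000 pulls, almost all plays concentrate on the best arm, which is exactly the "low regret" guarantee.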

Multi-armed Bandits
Studied for more than 50 years, but the theory is experiencing a renaissance influenced by the Web.

Example: Learning to Rank
You have many different ranking functions for constructing a list of search results.
Interactively learn which is best for a user or population of users.
Elicit quality judgments using "interleaving experiments" (Radlinski, Kurup, Joachims, CIKM'08).

Example: Learning to Rank
Much more reliable than other ways of detecting retrieval quality from "implicit feedback," e.g. abandonment rate, query reformulation rate, position of the clicked links.
This is like multi-armed bandits, but with a twist: you can compare two slot machines, but you can't just pick one and observe its payoff.

Interleaved Filter
Choose an arbitrary "incumbent."
Play matches against all other players in round-robin fashion (noting mean, confidence interval)…
… until a challenger is better with high confidence.
Eliminate the old incumbent and all empirically worse players.
Repeat the process with the new incumbent…
… until only one player is left.
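The steps above can be sketched in code. This is my own simplified reading of the Interleaved Filter, with Hoeffding-style confidence radii and a simulated preference oracle; see Yue et al.'s dueling-bandits papers for the exact algorithm and constants:

```python
import math
import random

def interleaved_filter(duel, arms, delta=0.05):
    """Simplified sketch of the Interleaved Filter.
    duel(a, b) -> True when arm a beats arm b in one noisy comparison."""
    candidates = list(arms)
    incumbent = candidates.pop(0)          # arbitrary incumbent
    while candidates:
        wins = {b: 0 for b in candidates}
        plays = 0
        while candidates:
            plays += 1
            for b in candidates:           # round-robin against incumbent
                if duel(b, incumbent):
                    wins[b] += 1
            radius = math.sqrt(math.log(1.0 / delta) / (2 * plays))
            winners = [b for b in candidates if wins[b] / plays - radius > 0.5]
            if winners:
                # a challenger is better with high confidence: eliminate the
                # old incumbent and all empirically worse arms, then restart
                new_inc = winners[0]
                candidates = [b for b in candidates
                              if b != new_inc and wins[b] / plays > 0.5]
                incumbent = new_inc
                break
            # prune challengers that lose to the incumbent with high confidence
            candidates = [b for b in candidates
                          if wins[b] / plays + radius >= 0.5]
    return incumbent

# hypothetical preference oracle: arm 2 beats every other arm most of the time
random.seed(2)
quality = {0: 0.0, 1: 0.15, 2: 0.35}
duel = lambda a, b: random.random() < 0.5 + quality[a] - quality[b]
best = interleaved_filter(duel, [0, 1, 2])
```

Note the "boxing" structure: there is always a single incumbent defending against all remaining challengers at once.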

Interleaved Filter
This algorithm is information-theoretically optimal. Boxing is better than tennis!
Thank you, Microsoft! Yisong Yue, the lead student on the project, is supported by a Microsoft Graduate Research Fellowship.

Vignette #2: Learning with Similarity Information
Recall the multi-armed bandit problem. Can we use this for web advertising? Slot machines are banner ads: which one should I display on my site?
Scalability issue: there are 10^5 bandits, not 3! On the other hand, some ads are similar to others, and this should help.

Solution: The Zooming Algorithm
The set of alternatives (ads) forms a metric space.
We designed a bandit algorithm for metric spaces that starts out exploring a "coarse" action set and "zooms in" on regions that are performing well.
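A minimal sketch of the zooming idea on the unit interval, under my own simplifications (a fixed candidate grid stands in for the covering argument, and the confidence radius is only schematic, not the paper's):

```python
import math

def zooming_bandit(payoff, horizon):
    """Sketch of a zooming bandit on the metric space [0, 1]: activate a
    new arm whenever some point escapes every active arm's confidence
    ball, so exploration concentrates in high-payoff regions."""
    arms = []  # each arm: [position, pull count, reward sum]
    radius = lambda n: math.sqrt(2 * math.log(horizon) / (n + 1))
    for _ in range(horizon):
        # activation rule, checked on a coarse grid of candidate points
        for x in (i / 20 for i in range(21)):
            if all(abs(x - pos) > radius(n) for pos, n, _ in arms):
                arms.append([x, 0, 0.0])
                break
        # play the active arm with the highest optimistic index
        arm = max(arms, key=lambda a: (a[2] / a[1] if a[1] else 1.0)
                                      + radius(a[1]))
        arm[2] += payoff(arm[0])
        arm[1] += 1
    return arms

# hypothetical payoff landscape peaking at x = 0.7
payoff = lambda x: 1.0 - abs(x - 0.7)
arms = zooming_bandit(payoff, 3000)
```

Arms near the peak end up with small confidence balls (fine resolution), while low-payoff regions stay covered by a few coarse arms.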

Thank you, Microsoft!!
One of many collaborations with MSR over six years… a major influence on my development as a computer scientist.
Alex Slivkins

What Next?
Often, the systems we want to analyze are composed of many interacting learners. How does this influence the system behavior?
Answering these questions requires combining: game theory, learning theory, and analysis of algorithms.

Thank you, Microsoft!!!
Joining our team next year: Katrina Ligett (Ph.D. CMU, 2009) and Shahar Dobzinski (Ph.D. Hebrew U., 2009), the top graduates this year in online learning theory and algorithmic game theory.
An unprecedented postdoc recruiting success for myself and Cornell.
Brought to you by the Microsoft Research New Faculty Fellowship!