TAU Agent Team: Yishay Mansour Mariano Schain Tel Aviv University TAC-AA 2010.

Presentation transcript:

Overview
Machine Learning approach:
  – Regret Minimization
Simple: Adaptive scheme
  – Robust: Performance Bounds
Low dependency on the exact models
Started (very) late.
  – 3 weeks (for everything)
Influenced many of the strategic decisions

Regret Minimization: Overview
Setting: Single player, multiple actions
At every time step:
  – Player chooses a distribution over actions
  – Observes the gain of each action
Can even be an adversarial model
Partial information model (MAB)
Goal:
  – Maximize cumulative gain
Benchmarks:
  – Best static choice of action (external regret)
Guarantee:
  – Near optimal w.r.t. benchmark
Vanishing average regret
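One way to write out the benchmark and guarantee above, as a sketch rather than the slide's own formula: it borrows the symbols $g_{i,t}$ (gain of action $i$ at step $t$), $p_{i,t}$ (probability placed on action $i$) and $m$ (number of actions) from the Polynomial Weights slide, and the horizon $T$ is an assumed name. External regret compares the best single action in hindsight with the player's expected cumulative gain, and "vanishing average regret" says that this gap grows sublinearly in $T$.

```latex
R_T \;=\; \max_{1 \le i \le m} \sum_{t=1}^{T} g_{i,t}
      \;-\; \sum_{t=1}^{T} \sum_{i=1}^{m} p_{i,t}\, g_{i,t},
\qquad
\frac{R_T}{T} \;\to\; 0 \quad \text{as } T \to \infty .
```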

RM Algorithm (full information)
Main idea: Smoothed Greedy
  – Best action – Highest weight
  – Near-best action – High weight
  – Inferior actions – Low weight
Non-trivial analysis
Many algorithms
Polynomial Weights:
  – Parameter $u$
  – Maintain weights $w_{i,t}$, with $p_{i,t} = w_{i,t} / W_t$
  – Initially $w_{i,1} = 1$, $W_1 = m$
  – At time step $t$, with observed gains $g_{i,t-1}$: $w_{i,t} = w_{i,t-1} (1 + u \cdot g_{i,t-1})$
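The Polynomial Weights update above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the team's code: the class and method names are hypothetical, gains are assumed to lie in [0, 1], and $W_t$ is taken to be the sum of the current weights (consistent with $W_1 = m$).

```python
from typing import List


class PolynomialWeights:
    """Full-information Polynomial Weights over m actions (illustrative sketch)."""

    def __init__(self, num_actions: int, u: float) -> None:
        if num_actions <= 0:
            raise ValueError("num_actions must be positive")
        self.u = u
        # Initially w_{i,1} = 1 for every action, so W_1 = m.
        self.weights = [1.0] * num_actions

    def distribution(self) -> List[float]:
        # p_{i,t} = w_{i,t} / W_t, where W_t is the total weight.
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def update(self, gains: List[float]) -> None:
        # w_{i,t} = w_{i,t-1} * (1 + u * g_{i,t-1}); gains assumed in [0, 1].
        if len(gains) != len(self.weights):
            raise ValueError("one gain per action is required")
        self.weights = [w * (1.0 + self.u * g)
                        for w, g in zip(self.weights, gains)]


if __name__ == "__main__":
    pw = PolynomialWeights(num_actions=3, u=0.1)
    for _ in range(50):
        pw.update([1.0, 0.5, 0.0])  # action 0 is consistently best
    print(pw.distribution())
```

Actions with consistently higher gains accumulate weight multiplicatively, which is the "smoothed greedy" behaviour the slide describes: the best action ends up with the highest probability while the others keep a nonzero share.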

Applying Regret Minimization to AA: Challenges
Partial Information
  – Explore vs. Exploit
  – Partial Information (MAB) Regret Minimization algorithms exist
  – Similar regret bounds, but:
      Higher dependency on the action space
      More time for initial exploration
Very Large Action Space
  – Action = (bid, ad type, budget limit) for every query
  – Observed gain = Value Per Unit Sold for every query
  – Theoretical results may not directly apply
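To make the partial-information setting concrete, here is a sketch of Exp3, one well-known bandit regret-minimization algorithm. The slides do not say which MAB algorithm, if any, the team used, so this is a stand-in example only. The function name, the `gain_fn` callback and the parameter `gamma` are assumptions for illustration, and gains are assumed to lie in [0, 1].

```python
import math
import random
from typing import Callable, List


def exp3(num_actions: int, gamma: float, rounds: int,
         gain_fn: Callable[[int, int], float],
         rng: random.Random) -> List[float]:
    """Run Exp3 and return the final action distribution (illustrative sketch)."""
    weights = [1.0] * num_actions

    def probabilities() -> List[float]:
        # Mix the weight-based distribution with uniform exploration.
        total = sum(weights)
        return [(1.0 - gamma) * w / total + gamma / num_actions
                for w in weights]

    for t in range(rounds):
        probs = probabilities()
        action = rng.choices(range(num_actions), weights=probs)[0]
        gain = gain_fn(t, action)  # only the chosen action's gain is observed
        # Importance-weighted estimate keeps the gain estimate unbiased.
        estimate = gain / probs[action]
        weights[action] *= math.exp(gamma * estimate / num_actions)
        # Rescale to avoid overflow; the distribution is unchanged.
        top = max(weights)
        weights = [w / top for w in weights]

    return probabilities()


if __name__ == "__main__":
    final = exp3(3, 0.1, 2000,
                 lambda t, a: 0.9 if a == 0 else 0.1,
                 random.Random(0))
    print(final)
```

The uniform term `gamma / num_actions` is the price of exploration: every action keeps being probed, and the regret bound picks up a stronger dependence on the number of actions than in the full-information case, which is the concern the slide raises for a large action space.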

The elements of the TAU scheme
(Almost) constant high bids on specialty queries:
  – Reduce action space!
  – Win impression for every user in population – ease exploration!
  – Also… High conversion rate, High click-through rate, High revenue
Adaptive score, based on Value Per Unit Sold:
  – Main limitation is capacity units
  – Use regret minimization to select action distribution
Fractional allocation of capacity based on score
  – Based on regret minimization output
Profitable queries get most of the capacity
  – Maintain exploration: a minimum budget to probe all queries and adapt to trends
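The fractional allocation with a minimum exploration budget can be illustrated as follows. This is a sketch under assumptions, not the agent's actual rule: the function name `allocate_capacity` and the parameter `explore_fraction` are hypothetical, and the query distribution is assumed to come from a regret-minimization step such as Polynomial Weights.

```python
from typing import List


def allocate_capacity(distribution: List[float], capacity: float,
                      explore_fraction: float) -> List[float]:
    """Split capacity across queries, reserving a uniform exploration floor.

    distribution: per-query probabilities (assumed to sum to 1).
    explore_fraction: share of capacity spread evenly over all queries
    (hypothetical parameter, assumed to be in [0, 1]).
    """
    if not 0.0 <= explore_fraction <= 1.0:
        raise ValueError("explore_fraction must be in [0, 1]")
    num_queries = len(distribution)
    floor = explore_fraction / num_queries
    # Profitable queries get most of the capacity; every query keeps a floor.
    return [capacity * ((1.0 - explore_fraction) * p + floor)
            for p in distribution]


if __name__ == "__main__":
    shares = allocate_capacity([0.7, 0.2, 0.1], capacity=100.0,
                               explore_fraction=0.1)
    print(shares)
```

Because the floor is uniform, even a query whose score has collapsed still receives `capacity * explore_fraction / num_queries` units, which is what lets the scheme notice when a previously unprofitable query becomes attractive again.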

Software
[Architecture diagram. Component labels: Overall Capacity Control; Analysis (score, estimates); Allocation (quota); Bid (cpc, limit). Data labels: reports, sales, scores, est. sales, est. cpc, est. convrate.]

Plans / Enhancements
Features:
  – Burst identification
  – Bottom fishing
  – Tuning parameters to capacity
  – ML to estimate sales
  – Reinforcement learning of capacity allocation decisions
Post-competition analysis: validate robustness
  – Varying game simulation parameters

Thank You
Mariano Schain