Nonstochastic Multi-Armed Bandits With Graph-Structured Feedback Noga Alon, TAU Nicolo Cesa-Bianchi, Milan Claudio Gentile, Insubria Shie Mannor, Technion.

Presentation transcript:

Nonstochastic Multi-Armed Bandits With Graph-Structured Feedback
Noga Alon (TAU), Nicolò Cesa-Bianchi (Milan), Claudio Gentile (Insubria), Shie Mannor (Technion), Yishay Mansour (TAU and MSR), Ohad Shamir (Weizmann)

Nonstochastic sequential decision-making
K actions and T time steps
l_t(a): loss of action a at time t
At time t, the player
– picks action X_t
– incurs loss l_t(X_t)
– observes feedback on the losses
Multi-armed bandit (MAB): only l_t(X_t) is observed
Experts (full information): l_t(j) is observed for every j

Nonstochastic sequential decision-making
Goal:
– minimize losses
– benchmark: the best single action, i.e. the action j that minimizes the loss
– no stochastic assumptions on the losses
Regret: [formula]
Known regret bounds:
– MAB: [formula]
– Experts: [formula]
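The regret formula and the two bounds on this slide were images and are not in the transcript. As a hedged reconstruction, the standard definitions from the nonstochastic bandit literature read as follows; they are not necessarily the exact expressions shown on the slide, and the symbol R_T is introduced here for illustration.

```latex
% Regret against the best single action in hindsight
R_T = \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(X_t)\Big]
      - \min_{j \in \{1,\dots,K\}} \sum_{t=1}^{T} \ell_t(j)

% Classical bounds (Exp3 for bandits, Hedge for experts)
\text{MAB: } R_T = O\big(\sqrt{T K \ln K}\big),
\qquad
\text{Experts: } R_T = O\big(\sqrt{T \ln K}\big)
```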

Motivation: observability
[figure: an undirected and a directed observation graph]

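Undirected observation graph
[figure: observation graph on eight actions, all losses hidden: ? ? ? ? ? ? ? ?]

[figure: one loss revealed: ? 3 ? ? ? ? ? ?]

[figure: further losses revealed: 5 3 ? 1 ? 7 ? ?]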

MAB: no edges
Experts: clique
[figure: ? 3 ? ? ? ? ? ?]
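The two extreme cases can be illustrated with a small sketch. The function name `observed_losses`, the adjacency-dictionary representation, and the loss values are assumptions made for illustration only; they do not come from the slides.

```python
def observed_losses(neighbors, losses, played):
    """Return {action: loss} for the played action and its graph neighbors."""
    visible = {played} | set(neighbors.get(played, ()))
    return {a: losses[a] for a in sorted(visible)}

K = 4
losses = [0.5, 0.3, 0.9, 0.1]  # hypothetical losses for a single round

# Bandit feedback: the observation graph has no edges.
bandit_graph = {a: [] for a in range(K)}
# Full information: the observation graph is a clique.
experts_graph = {a: [b for b in range(K) if b != a] for a in range(K)}

bandit_view = observed_losses(bandit_graph, losses, played=1)
experts_view = observed_losses(experts_graph, losses, played=1)
```

With no edges the player sees only the loss of the played action; with a clique every loss is revealed, whichever action is played.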

Modeling
Directed vs. undirected
– different types of dependencies
Different measures
– independent set
– dominating set
– max acyclic subgraph
Informed vs. uninformed: when does the learner observe the graph?
– before
– after (only the neighbors)

Our Results
Uninformed setting
Undirected graph
– only the neighbors of the node
– independent sets
Directed graph
– max acyclic subgraph (not tight)
– random Erdős–Rényi graphs
Informed setting
Directed graphs
Regret characterization
– dominating sets and independent sets
Both in expectation and with high probability

EXP3-SET
Online algorithm: [formula]
Theorem: [formula]
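The update equations on this slide are not in the transcript. The following is a minimal sketch of an exponential-weights learner with graph feedback in the spirit of EXP3-SET, assuming the usual importance-weighted loss estimate in which an observed loss is divided by the probability that it is observed. The names `exp3_set`, `eta`, `neighbors` and `loss_fn` are hypothetical, and the code should not be read as the authors' exact algorithm.

```python
import math
import random

def exp3_set(K, T, eta, neighbors, loss_fn, rng=None):
    """Exponential weights with graph-structured feedback (illustrative sketch).

    neighbors[i] lists the actions whose losses are revealed when i is played
    (i itself is always revealed). loss_fn(t, a) returns a loss in [0, 1].
    """
    rng = rng or random.Random(0)
    reveals = [set(neighbors[i]) | {i} for i in range(K)]
    weights = [1.0] * K  # not renormalized; fine for short horizons only
    total_loss = 0.0
    for t in range(T):
        W = sum(weights)
        p = [w / W for w in weights]
        played = rng.choices(range(K), weights=p)[0]
        total_loss += loss_fn(t, played)
        for a in reveals[played]:
            # q = probability that the loss of a is observed in this round
            q = sum(p[j] for j in range(K) if a in reveals[j])
            estimate = loss_fn(t, a) / q  # importance-weighted loss estimate
            weights[a] *= math.exp(-eta * estimate)
    return total_loss, weights

# Usage on a clique (full information): action 0 is the only good action.
K = 3
clique = {i: [j for j in range(K) if j != i] for i in range(K)}
total, w = exp3_set(K, T=50, eta=0.5, neighbors=clique,
                    loss_fn=lambda t, a: 0.0 if a == 0 else 1.0)
```

On the clique every loss is observed with probability 1, so the estimates equal the true losses and the weight of the two bad actions decays geometrically.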

EXP3-SET regret: key lemma
Lemma: [formula]
Note: MAB: Q = K; full information: Q = 1
Proof: build an independent set S
– consider the action a with minimal Pr[a observed]
– add a to S
– delete a and its neighbors
Note: [formula]
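The greedy construction in the proof can be sketched as follows. The statement of the lemma is not in the transcript; the sketch assumes that Q is the sum over actions of p_i divided by Pr[i observed], and that the lemma bounds Q by the size of an independent set on undirected graphs. The function name and the example graph are hypothetical.

```python
def greedy_independent_set(adj, p):
    """adj: node -> set of neighbors (undirected); p: node -> probability > 0.

    Returns the independent set S built as in the proof sketch, and Q.
    """
    # q[i] = Pr[i observed] = mass on i plus mass on its neighbors
    q = {i: p[i] + sum(p[j] for j in adj[i]) for i in adj}
    Q = sum(p[i] / q[i] for i in adj)
    remaining = set(adj)
    S = []
    while remaining:
        # pick the remaining action that is least likely to be observed
        a = min(remaining, key=lambda i: (q[i], i))
        S.append(a)
        remaining -= adj[a] | {a}  # delete a and its neighbors
    return S, Q

# Path graph 0 - 1 - 2 - 3 with the uniform distribution.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
uniform = {i: 0.25 for i in path}
S, Q = greedy_independent_set(path, uniform)
```

Each deleted group contributes at most 1 to Q, because every deleted node has observation probability at least that of the chosen action, so Q is at most the size of S.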

EXP3-SET: directed case
Directed graph
– the lemma does not hold
Example:
– tournament graph: the arc between j and i is oriented by whether j < i
– probabilities p_i = 2^{-i}
– α(G) = 1
Random graph
– Erdős–Rényi with edge parameter r
– Regret: [formula]
– MAB: r = 0; Experts: r = 1
– Note: [formula]
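A short numeric check shows why the lemma fails on the tournament. The arrow symbol on the slide is lost, so the orientation used here is an assumption: action i is observed whenever any action j ≥ i is played. The last action is also assumed to take the leftover probability mass so that the p_i sum to one. Under these assumptions Q grows linearly in K although α(G) = 1.

```python
def tournament_Q(K):
    """Q = sum_i p_i / Pr[i observed] on a total-order tournament (sketch)."""
    # p_i = 2^{-i} for i < K; the last action gets 2^{-(K-1)} (assumption)
    p = [2.0 ** -i for i in range(1, K)] + [2.0 ** -(K - 1)]
    Q = 0.0
    for i in range(K):
        q_i = sum(p[i:])  # assumed orientation: any j >= i reveals i
        Q += p[i] / q_i
    return Q

Q8 = tournament_Q(8)
```

Every action except the last contributes exactly 1/2 and the last contributes 1, so the total is (K + 1)/2.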

EXP3-SET: directed case
Upper bound (directed)
– mas(G) = size of a maximum acyclic subgraph of G
– tournament: mas(G) = K and α(G) = 1
– Regret: [formula]
Lower bound (directed)
– any fixed graph G
– Regret: [formula]
– ... the graph in advance

Dominating set: directed graph
[figure: directed observation graph on eight actions]

EXP3-DOM
Simplified version
– fixed graph G
– D is a dominating set (logarithmic approximation)
Main modification
– add probability mass to D to induce observability
Probabilities: [formula]
Select X_t using p_t
Observe l_t(a) for a in S_{X_t,t}
Weights: [formula]
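The two ingredients named on this slide can be sketched as follows. The greedy set-cover heuristic is one standard way to obtain a logarithmic approximation of a minimum dominating set; the slide does not say which method is used. The mixing rule, which spreads an exploration mass γ uniformly over D, is likewise an assumption standing in for the lost formula. All function names are hypothetical.

```python
def greedy_dominating_set(out_neighbors):
    """out_neighbors[i]: set of actions observed when i is played."""
    covers = {i: set(out_neighbors[i]) | {i} for i in out_neighbors}
    uncovered = set(out_neighbors)
    D = []
    while uncovered:
        # take the action that reveals the most still-uncovered actions
        best = max(sorted(covers), key=lambda i: len(covers[i] & uncovered))
        D.append(best)
        uncovered -= covers[best]
    return D

def mix_with_dominating_set(weights, D, gamma):
    """Exponential-weights distribution plus uniform exploration on D."""
    W = sum(weights.values())
    return {i: (1 - gamma) * w / W + (gamma / len(D) if i in D else 0.0)
            for i, w in weights.items()}

# Tournament in which action 0 observes every other action.
graph = {0: {1, 2, 3}, 1: {2, 3}, 2: {3}, 3: set()}
D = greedy_dominating_set(graph)
p = mix_with_dominating_set({i: 1.0 for i in graph}, D, gamma=0.2)
```

Because every action is observed by some member of D, the extra mass on D keeps every observation probability bounded away from zero.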

EXP3-DOM: simple example
Transitive observability
– tournament: action 1 observes all actions
– D = {1}
EXP3-DOM
– sample action 1 with probability γ (action 1 is the exploration)
– otherwise run a MAB algorithm, specifically EXP3-SET
Intuition
– action 1 replaces the mixture with the uniform distribution

Conclusion
Observability model
– between MAB and Experts
– more work to be done
Uninformed setting
– undirected graph
Informed setting
– directed graph
[Kocák, Neu, Valko and Munos]: improved the uninformed setting

EXP3-DOM: main theorem
Theorem: [formula]
Tuning γ: [formula]
Corollary: [formula]

Outline
– Model and motivation
– Symmetric observability
– Non-symmetric observability

EXP3-DOM: key lemma
Lemma: [formula]
– G is a directed graph
– d_i^- is the indegree of i
– α = α(G)
Turán's theorem: [formula]
– undirected graph G(V, E)
Proof (high level)
– shrink the graph: G_K, G_{K-1}, …
– delete nodes; at step s, delete the node with maximum indegree
From Turán's theorem: [formula]
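The formulas on this slide are not in the transcript. As a hedged reconstruction: the first line below is the standard form of Turán's theorem as a lower bound on the independence number, and the second is the form of the lemma assumed from the published version of this work rather than taken from the slide.

```latex
% Turan's theorem for an undirected graph G = (V, E)
\alpha(G) \;\ge\; \frac{|V|}{\frac{2|E|}{|V|} + 1}

% Key lemma (assumed form) for a directed graph G on K nodes,
% with d_i^- the indegree of node i and \alpha = \alpha(G)
\sum_{i=1}^{K} \frac{1}{1 + d_i^-} \;\le\; 2\alpha \ln\Big(1 + \frac{K}{\alpha}\Big)
```

Turán's theorem guarantees that a graph with small independence number has many edges, so the node of maximum indegree deleted at each step has large indegree, which is what drives the shrinking argument.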

EXP3-DOM: key lemma (proof)
Completing the proof: [formula]
Note: due to edge elimination, [formula]

EXP3-DOM: key lemma (modified)
Lemma (what we really need!): [formula]
G(V, E) is a directed graph
– IN_i: indegree of i
– r: size of the dominating set; α: size of the independent set
– p: distribution over V with p_i ≥ β

EXP3-DOM: changing graphs
Simple case
– all dominating sets have the same size
– or approximately the same size
Problem
– dominating sets of different sizes: the size can be 1 or K
Solution
– keep logarithmically many levels, depending on ⌊log_2 |D_t|⌋
– one algorithm per level
Complications
– parameters depend on the level
– setting the learning rate needs a delicate doubling
Main technical challenge
– handling a dynamic adversary

EXP3-DOM
Receive the observation graph
– find a dominating set D_t (logarithmic approximation)
Run the right copy
– let b_t = ⌊log_2 |D_t|⌋
– run copy b_t (logarithmically many copies)
For copy b_t
– parameters depend on b_t
– probabilities: [formula]
– select X_t using p
– observe l_t(a) for a in S_{X_t,t}
– weights: [formula]
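The choice of copy can be sketched in a few lines. The brackets around log 2 (D t) are garbled in the transcript; the sketch assumes they denote the floor and that D t inside them stands for the size of the dominating set. The function name `copy_index` is hypothetical.

```python
def copy_index(dominating_set_size):
    """floor(log2(n)) for an integer n >= 1, using exact integer arithmetic."""
    return dominating_set_size.bit_length() - 1

# Dominating sets whose sizes fall in [2^b, 2^(b+1) - 1] share copy b.
levels = [copy_index(n) for n in (1, 2, 3, 4, 7, 8)]

# With K actions there are at most floor(log2 K) + 1 copies.
num_copies = copy_index(16) + 1
```

Grouping rounds by level means each copy only sees dominating sets whose sizes differ by less than a factor of two, so its parameters can be tuned for that size range.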

EXP3-DOM: main theorem
Theorem: [formula]
Tuning γ_b: [formula]

Independent set
Independent set α(G) [Mannor & Shamir 2012]
Tight regret
– α(G) "replaces" K
Cons:
– requires observing G
– solves an LP at each step
[figure: observation graph on eight actions]