Online Learning: An Introduction

Online Learning: An Introduction Travis Mandel University of Washington

Overview of Machine Learning Supervised Learning Decision Trees, Naïve Bayes, Linear Regression, K-NN, etc. Unsupervised Learning K-means, Association Rule Learning, etc. What else is there????

Overview of Machine Learning Supervised Learning Unsupervised Learning Labeled Training Data ML Algorithm Classifier/ Regression function Unlabeled Training Data ML Algorithm Clusters/Rules Is this the best way to learn?

Overview of Machine Learning Supervised Learning Unsupervised Learning Semi-Supervised Learning Active Learning Interactive Learning Online Learning Reinforcement Learning

Online Classification Observe one training instance → ML algorithm issues a prediction → observes the label, receives loss/reward

More generally… Online Learning! Observe observations → ML algorithm selects an output → observe the loss/reward function, receive loss/reward
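To make the loop concrete, here is a minimal sketch of this protocol in Python; the observation stream, predictor, loss function, and update rule are all hypothetical placeholders rather than anything specified on the slides.

```python
# A minimal sketch of the online learning protocol (hypothetical placeholders).
def online_learning_loop(observations, predict, loss, update):
    total_loss = 0.0
    for x in observations:      # observe an observation
        y_hat = predict(x)      # ML algorithm selects an output
        l = loss(x, y_hat)      # observe the loss/reward function, receive loss/reward
        total_loss += l
        update(x, y_hat, l)     # algorithm adapts before the next round
    return total_loss
```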

Typical Assumption: Statistical Feedback Instances and labels drawn from a fixed distribution D

Recall: Distribution [Table: joint probabilities (Prob column) over the features “drug”?, “nigeria”? and the label Spam?]

Alternate Assumption: Adversarial Feedback Instances and labels drawn adversarially Spam detection, anomaly detection, etc. Change over time is a HUGE issue!

Change over time example

How can we measure performance? What we want to maximize: Sum of rewards Why could this be a problem?

Alternative: Discounted Sum of Rewards Undiscounted: r_0 + r_1 + r_2 + r_3 + … Discounted: r_0 + γ r_1 + γ² r_2 + γ³ r_3 + … Guaranteed to be a finite sum if γ < 1 and r < c! One problem… If a small percentage of the time you never find the best option (incomplete learning), it doesn’t matter much! But intuitively it should matter a LOT!
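As a quick worked comparison (the reward sequence and discount factor below are made up, not from the slides), the plain and discounted sums differ like this:

```python
# Plain vs. discounted sum of rewards on a hypothetical reward sequence.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]
gamma = 0.9
plain_sum = sum(rewards)                                       # r_0 + r_1 + r_2 + ...
discounted = sum(gamma**t * r for t, r in enumerate(rewards))  # r_0 + gamma*r_1 + gamma^2*r_2 + ...
print(plain_sum, discounted)   # 3.0 vs 1.0 + 0.81 + 0.729 = 2.539
```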

Regret Sum of rewards compared to the BEST choice in hindsight! If training a linear model, the comparator is the optimal weights in hindsight (even if they overfit)! See other examples in a second Can get good regret even if we perform very poorly in absolute terms: Regret = ∑_{t=1}^∞ r*_t − ∑_{t=1}^∞ r_t
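As a tiny illustration (hypothetical rewards and choices, not from the slides), regret is just the gap between the best fixed choice in hindsight and what the learner actually collected:

```python
# Regret = best fixed choice in hindsight minus what the learner actually earned.
rewards_per_choice = {
    "A": [1.0, 0.0, 1.0, 1.0],   # hypothetical reward sequence for choice A
    "B": [0.0, 1.0, 0.0, 0.0],   # hypothetical reward sequence for choice B
}
our_choices = ["B", "A", "B", "A"]
our_reward = sum(rewards_per_choice[c][t] for t, c in enumerate(our_choices))
best_in_hindsight = max(sum(seq) for seq in rewards_per_choice.values())
print(our_reward, best_in_hindsight, best_in_hindsight - our_reward)   # 1.0 3.0 2.0
```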

We really like theoretical guarantees! Anyone know why? Running online → uncertain future Hard to tell if you reached the best possible performance by looking at the data

Recall… Observe observations → ML algorithm selects an output → observe the loss/reward function, receive loss/reward

Experts Setting N experts Assume rewards in [0,1] Observe expert recommendations → ML algorithm selects an expert → observe the loss/reward of each expert, receive loss/reward

Regret in the Experts Setting Do as well as the best expert in hindsight! Hopefully you have (at least some) good experts!

“Choose an Expert?” Each expert could make a prediction (different ML models, classifiers, hypotheses, etc.) Each expert could tell us to perform an action (such as buy a stock) Experts can learn & change over time!

Simple strategies Pick Expert that’s done the best in the past? Pick Expert that’s done the best recently?

No! Adversarial! For ANY deterministic strategy… The adversary can know which expert we will choose, and give it low reward/high loss, but all others high reward/low loss! Every step we get further away from the best expert → linear regret

Solution: Randomness What if the algorithm has access to some random bits? We can do better!

EG: Exponentiated Gradient w_1 = (1, …, 1) w_{t,i} = w_{t−1,i} · e^{η (r_{t,i} − 1)} η = √(2 log N / T) Probabilities are normalized weights
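Under the slides’ assumptions (N experts, rewards in [0, 1], full feedback, known horizon T), a minimal sketch of this update might look like the following; `expert_rewards` is a hypothetical stand-in for the environment, returning every expert’s reward at round t.

```python
import math
import random

# A minimal sketch of Exponentiated Gradient / Hedge over N experts.
def exponentiated_gradient(expert_rewards, N, T):
    eta = math.sqrt(2.0 * math.log(N) / T)   # learning rate from the slide
    w = [1.0] * N                            # w_1 = (1, ..., 1)
    total = 0.0
    for t in range(T):
        probs = [wi / sum(w) for wi in w]    # probabilities = normalized weights
        i = random.choices(range(N), weights=probs)[0]   # randomized choice of expert
        r = expert_rewards(t)                # full feedback: reward of every expert
        total += r[i]
        w = [w[j] * math.exp(eta * (r[j] - 1.0)) for j in range(N)]  # multiplicative update
    return total
```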

EG: Regret Bound O(√(2 T log N))

Problem: What if the feedback is a little different? In the real world, we usually don’t get to see what “would have” happened Shift from predicting future to taking actions in an unknown world Simplest Reinforcement Learning problem

Multi-Armed Bandit (MAB) Problem N arms Assume statistical feedback Assume rewards in [0,1] ML algorithm pulls an arm → observes the loss/reward for the chosen arm only → suffers that loss/reward
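To make the feedback model concrete, here is a minimal sketch of a simulated bandit with Bernoulli rewards (the arm means below are made up for illustration); the later algorithm sketches assume a `pull(arm)` function like this one.

```python
import random

# A minimal sketch of the bandit interaction with Bernoulli rewards in {0, 1}.
# The learner only ever sees the reward of the single arm it pulled.
class BernoulliBandit:
    def __init__(self, means):
        self.means = means                    # hidden true mean of each arm

    def pull(self, arm):
        return 1.0 if random.random() < self.means[arm] else 0.0

bandit = BernoulliBandit([0.2, 0.5, 0.7])     # 3 arms; arm 2 is best
reward = bandit.pull(1)                       # feedback: one number, for one arm
```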

Regret in the MAB Setting Since we are stochastic, each arm has a true mean Do as well as if we had always pulled the arm with the highest mean

MABs have a huge number of applications! Internet Advertising Website Layout Clinical Trials Robotics Education Videogames Recommender Systems … Unlike most of Machine Learning, how to ACT, not just how to predict!

A surprisingly tough problem… “The problem is a classic one; it was formulated during the war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage. “ Prof. Peter Whittle,  Journal of the Royal Statistical Society, 1979

MAB algorithm #1: ε-first Phase 1: Explore arms uniformly at random Phase 2: Exploit the best arm so far Only 1 parameter: how long to wait! Problem? Not very efficient (though it makes statistics easy to run)
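A minimal sketch of ε-first, assuming a `pull(arm)` function that returns rewards in [0, 1] (e.g. the simulator sketched above); `explore_steps` is the slide’s single “how long to wait” parameter.

```python
import random

# A minimal sketch of epsilon-first.
def epsilon_first(pull, n_arms, explore_steps, total_steps):
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(total_steps):
        if t < explore_steps:
            arm = random.randrange(n_arms)    # Phase 1: explore uniformly at random
        else:                                 # Phase 2: exploit the best arm so far
            means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return sums, counts
```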

MAB algorithm #2: ε-greedy Flip a coin: With probability ε, explore uniformly With probability 1- ε, exploit Only 1 parameter: ε Problem? Often used in robotics and reinforcement learning
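A minimal sketch of ε-greedy under the same assumptions (a `pull(arm)` function with rewards in [0, 1]); ε is the single parameter named on the slide.

```python
import random

# A minimal sketch of epsilon-greedy.
def epsilon_greedy(pull, n_arms, epsilon, total_steps):
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for _ in range(total_steps):
        if random.random() < epsilon:         # with probability epsilon: explore
            arm = random.randrange(n_arms)
        else:                                 # with probability 1 - epsilon: exploit
            means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return sums, counts
```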

Idea The problem is uncertainty… How to quantify uncertainty? Error bars If an arm has been sampled n times, with probability at least 1 − δ: |μ̂ − μ| < √( log(2/δ) / (2n) )
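A minimal sketch of this Hoeffding-style error bar; rewards are assumed to lie in [0, 1] as on the earlier slides.

```python
import math

# Error bar from the slide: after n samples of an arm, the empirical mean is
# within this radius of the true mean with probability at least 1 - delta.
def error_bar(n, delta):
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

print(error_bar(100, 0.05))   # about 0.136 after 100 pulls at 95% confidence
```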

Given Error bars, how do we act?

Given Error bars, how do we act? Optimism under uncertainty! Why? If bad, we will soon find out!

One last wrinkle How to set the confidence δ? Decrease it over time If an arm has been sampled n times, with probability at least 1 − δ: |μ̂ − μ| < √( log(2/δ) / (2n) ), with δ = 2/t

Upper Confidence Bound (UCB) 1. Play each arm once 2. Play the arm i that maximizes: μ̂_i + √( 2 log(t) / n_i ) 3. Repeat Step 2 forever
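A minimal sketch of UCB under the same `pull(arm)` assumption; the bonus term is the √(2 log t / n_i) shown on the slide.

```python
import math

# A minimal sketch of UCB with rewards assumed in [0, 1].
def ucb(pull, n_arms, total_steps):
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm in range(n_arms):                 # Step 1: play each arm once
        sums[arm] += pull(arm)
        counts[arm] += 1
    for t in range(n_arms + 1, total_steps + 1):
        # Step 2: play the arm maximizing empirical mean + optimism bonus
        def score(i):
            return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        arm = max(range(n_arms), key=score)
        sums[arm] += pull(arm)
        counts[arm] += 1
    return sums, counts
```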

UCB Regret Bound Problem-independent: O(√(N T log T)) Problem-dependent: O( log T / min_i (μ* − μ_i) ) Can’t do much better!

A little history… William R. Thompson (1933): Was the first to examine the MAB problem, proposed a method for solving it 1940s-50s: MAB problem studied intensively during WWII, Thompson ignored 1970s-1980s: “Optimal” solution (Gittins index) found but is intractable and incomplete. Thompson ignored. 2001: UCB proposed, gains widespread use due to simplicity and “optimal” bounds. Thompson still ignored. 2011: Empirical results show Thompson’s 1933 method beats UCB, but little interest since no guarantees. 2013: Optimal bounds finally shown for Thompson Sampling

Thompson’s method was fundamentally different!

Bayesian vs Frequentist Bayesians: You have a prior, probabilities interpreted as beliefs, prefer probabilistic decisions Frequentists: No prior, probabilities interpreted as facts about the world, prefer hard decisions (p<0.05) UCB is a frequentist technique! What if we are Bayesian?

Bayesian review: Bayes’ Rule p(θ | data) = p(data | θ) p(θ) / p(data) Posterior ∝ Likelihood × Prior: p(θ | data) ∝ p(data | θ) p(θ)

Bernoulli Case What if rewards are in the set {0,1} instead of the range [0,1]? Then each observation is a coin flip with probability p → Bernoulli distribution! To estimate p, we count up the numbers of ones and zeros Given the observed ones and zeros, how do we calculate the distribution of possible values of p?

Beta-Bernoulli Case Beta(a,b) → given a 0’s and b 1’s, what is the distribution over means? Prior → pseudocounts Likelihood → observed counts Posterior → pseudocounts + observed counts Image credit: Wikipedia.
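A minimal sketch of this pseudocount bookkeeping, with a made-up observation sequence; the counts are named explicitly since conventions for which Beta parameter counts the 1’s vary.

```python
# Beta-Bernoulli update: posterior pseudocounts = prior pseudocounts + observed counts.
prior_ones, prior_zeros = 1, 1          # Beta(1,1) prior = one pseudocount each
observations = [1, 0, 1, 1, 0, 1]       # hypothetical 0/1 rewards from one arm
post_ones = prior_ones + sum(observations)                        # + observed 1's
post_zeros = prior_zeros + len(observations) - sum(observations)  # + observed 0's
posterior_mean = post_ones / (post_ones + post_zeros)             # here: 5 / 8 = 0.625
```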

How does this help us? Thompson Sampling: 1. Specify prior (in the Beta case, often Beta(1,1)) 2. Sample from each arm’s posterior distribution to get an estimated mean for each arm. 3. Pull the arm with the highest sampled mean. 4. Repeat steps 2 & 3 forever
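A minimal sketch of Thompson Sampling for Bernoulli rewards, assuming a `pull(arm)` function that returns 0 or 1 and using the Beta(1,1) prior mentioned on the slide.

```python
import random

# A minimal sketch of Thompson Sampling with Beta posteriors.
def thompson_sampling(pull, n_arms, total_steps):
    ones = [1] * n_arms                       # Beta(1,1) prior pseudocounts
    zeros = [1] * n_arms
    for _ in range(total_steps):
        samples = [random.betavariate(ones[i], zeros[i]) for i in range(n_arms)]  # sample each posterior
        arm = max(range(n_arms), key=lambda i: samples[i])    # pull arm with highest sampled mean
        r = pull(arm)
        ones[arm] += r                        # posterior update: add the observed counts
        zeros[arm] += 1 - r
    return ones, zeros
```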

Thompson Empirical Results Image taken from (Chapelle & Li 2011). And shown to have optimal regret bounds just like (and in some cases a little better than) UCB!

Problems with standard MAB formulation What happens if adversarial? (studied elsewhere) What if arms are factored and share values? What happens if there is delay? What if we know a heuristic that we want to incorporate without harming guarantees? What if we have a prior dataset we want to use? What if we want to evaluate algorithms/parameters on previously collected data?

A look toward more complexity: Reinforcement Learning Sequence of actions instead of take one and reset Rich space of observations which should influence our choice of action Much more powerful, but a much harder problem!

Summary Online Learning Adversarial vs Statistical Feedback Regret Experts Problem EG algorithm Multi-Armed Bandits ε-greedy, ε-first UCB Thompson Sampling

Questions?