Online Learning Yiling Chen

Machine Learning: use past observations to automatically learn to make better predictions or decisions in the future. Machine learning is a large field; this lecture only scratches the surface of one part of it.

Example: Click Prediction

Example: Recommender System (e.g., the Netflix challenge)

Spam Prediction. Features of an email: Unknown Sender; Sent to more than 10 people; contains "Cheap" or "Sale"; contains "Dear Sir". Label: Spam? We need some reasonable concept classes, for example:
- Disjunction: Spam if "Dear Sir" or "Sent to more than 10 people"
- Threshold: Spam if ("Dear Sir") + (Sent to more than 10 people) + (Unknown Sender) > 2
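
To make the two rule types concrete, here is a small Python sketch (not part of the original slides); the feature ordering and function names are my own illustrations.

def disjunction_rule(x):
    # Spam if "Dear Sir" or "Sent to more than 10 people"
    unknown_sender, sent_to_many, cheap_or_sale, dear_sir = x
    return int(dear_sir or sent_to_many)

def threshold_rule(x):
    # Spam if ("Dear Sir") + (Sent to more than 10 people) + (Unknown Sender) > 2
    unknown_sender, sent_to_many, cheap_or_sale, dear_sir = x
    return int(dear_sir + sent_to_many + unknown_sender > 2)

# Example: an email sent to many people from a known sender, no other flags:
# disjunction_rule((0, 1, 0, 0)) == 1, while threshold_rule((0, 1, 0, 0)) == 0.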

Batch Learning: a batch of labeled training emails, each described by the features (Unknown Sender, Sent to more than 10 people, "Cheap" or "Sale", "Dear Sir") together with its Spam/Not-Spam label, is fed to a learning algorithm, which outputs a prediction rule used to label new examples.

Online Learning: examples arrive one at a time. For instance, an email with feature vector (Unknown Sender, Sent to more than 10 people, "Cheap" or "Sale", "Dear Sir") = (0, 0, 0, 0) arrives; we predict whether it is spam and then observe its true label, 0. Next, an email with feature vector (1, 1, 1, 1) arrives, and its true label turns out to be 1. How should we update the prediction rule after each example?

Competitive Ratio. The optimal offline algorithm is the algorithm that is optimal in hindsight. Competitive ratio = (performance of the online algorithm) / (performance of the optimal offline algorithm).

Why Do We Care? The "learning from expert advice" setting is an information aggregation problem. Each prediction rule, e.g., Spam if "Dear Sir" or "Sent to more than 10 people"; Spam if ("Dear Sir") + (Sent to more than 10 people) + (Unknown Sender) > 2; or Yahoo!'s spam filter, can be viewed as an "expert". Can we make use of the predictions of these experts?

Basic Online Learning Setting. At each round, the learning algorithm sees a new example; the algorithm predicts a label for this example; after the prediction, the true label is observed; the algorithm makes a mistake if its predicted label differs from the true label; finally, the algorithm updates its prediction rule.
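
The protocol above can be written as a simple loop. This is an illustrative sketch, and the learner interface (predict/update) is an assumption, not code from the course.

def run_online(learner, stream):
    """Run the basic online learning protocol and count mistakes.
    learner is assumed to expose predict(x) and update(x, y_true);
    stream yields (x, y_true) pairs one at a time."""
    mistakes = 0
    for x, y_true in stream:
        y_pred = learner.predict(x)   # see the example, predict a label
        if y_pred != y_true:          # true label revealed; was it a mistake?
            mistakes += 1
        learner.update(x, y_true)     # update the prediction rule
    return mistakes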

Two Goals
- Minimize the number of mistakes: hope that (# of mistakes) / (# of rounds) -> 0. This assumes that there is a perfect target function.
- Minimize regret: hope that (# of mistakes - # of mistakes by the comparator) / (# of rounds) -> 0. This makes sense even in an adversarial setting (a small sketch of both quantities follows).
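
A tiny illustrative sketch of the two quantities just defined (the function names are mine):

def average_mistake_rate(mistakes, rounds):
    # (# of mistakes) / (# of rounds)
    return mistakes / rounds

def average_regret(mistakes, comparator_mistakes, rounds):
    # (# of mistakes - # of mistakes by the comparator) / (# of rounds)
    return (mistakes - comparator_mistakes) / rounds

# e.g., 130 mistakes vs. a comparator with 100 mistakes over 1000 rounds:
# average_regret(130, 100, 1000) == 0.03, and we want this to shrink toward 0.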

Minimizing the Number of Mistakes

Halving Algorithm (mistake bound). Let C be a finite concept class. Assume there exists a concept c* in C such that c*(x_t) = y_t for every example, i.e., some concept in C labels the data perfectly. Then the number of mistakes made by Halving is at most log_2 |C|.

Halving Algorithm. Maintain a version space containing all functions in C that are consistent with the observations so far. At each round t, predict the label chosen by the majority of the functions in the current version space. After the true label is revealed, update the version space by discarding every function that labeled the example incorrectly.
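
A minimal Python sketch of Halving, assuming the concept class is small enough to enumerate as a list of functions (names are illustrative):

def halving(concept_class, stream):
    """concept_class: list of functions mapping x to 0/1; stream yields (x, y_true)."""
    version_space = list(concept_class)   # concepts consistent with the data so far
    mistakes = 0
    for x, y_true in stream:
        votes_for_1 = sum(c(x) for c in version_space)
        y_pred = int(2 * votes_for_1 > len(version_space))   # majority vote
        if y_pred != y_true:
            mistakes += 1
        # keep only the concepts that labeled this example correctly
        version_space = [c for c in version_space if c(x) == y_true]
    return mistakes

Whenever the algorithm errs, at least half of the version space voted for the wrong label and is discarded, so each mistake at least halves the version space; that is exactly where the log_2 |C| bound comes from.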

Monotone Disjunctions. The concept class can be the set of disjunctions over r of the n variables. |C| can be very large, so maintaining the version space explicitly makes Halving computationally intractable.

The Winnow Algorithm

Mistake bound: # mistakes <= O(r log n) for learning a disjunction of r of the n variables. We can treat each variable (feature) as an "expert"; Winnow updates the weights of these experts dynamically.
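
A minimal sketch of a Winnow-style learner for monotone disjunctions; the threshold n and the promote/demote factor of 2 are one common choice and may differ from the constants used in the lecture.

def winnow(n, stream):
    """n Boolean features; stream yields (x, y_true) with x a 0/1 vector of length n."""
    w = [1.0] * n                        # one weight per feature ("expert")
    mistakes = 0
    for x, y_true in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_pred = int(score >= n)         # predict 1 iff the weighted sum reaches n
        if y_pred != y_true:
            mistakes += 1
            if y_true == 1:              # missed a positive: promote active features
                w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
            else:                        # false positive: demote active features
                w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
    return mistakes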

Minimizing Regret: no assumption on the distribution of examples; no assumption on the target function; the setting may be adversarial.

Weighted Majority (deterministic): # mistakes <= 2.41 (m + log n), where m is the number of mistakes made by the best expert and n is the number of experts.
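
Assuming this bound refers to the deterministic Weighted Majority algorithm with a halving penalty (the classic variant that achieves 2.41(m + log n)), a minimal sketch is:

def weighted_majority(n_experts, stream):
    """stream yields (advice, y_true); advice is a list of n_experts 0/1 predictions."""
    w = [1.0] * n_experts
    mistakes = 0
    for advice, y_true in stream:
        vote_1 = sum(wi for wi, a in zip(w, advice) if a == 1)
        vote_0 = sum(wi for wi, a in zip(w, advice) if a == 0)
        y_pred = int(vote_1 >= vote_0)    # side with the weighted majority
        if y_pred != y_true:
            mistakes += 1
        # halve the weight of every expert that was wrong this round
        w = [wi * 0.5 if a != y_true else wi for wi, a in zip(w, advice)]
    return mistakes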

Randomized Weighted Majority: expected # of mistakes <= m + log n + O(sqrt(m log n)), so the average regret per round (relative to the best expert) goes to 0 as the number of rounds grows.
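
A minimal sketch of Randomized Weighted Majority; the penalty factor (1 - eps) and the default eps below are illustrative, and in the analysis eps is tuned (roughly sqrt((log n) / m)) to obtain a bound of the form above.

import random

def randomized_weighted_majority(n_experts, stream, eps=0.1):
    """Follow a random expert drawn in proportion to its current weight."""
    w = [1.0] * n_experts
    mistakes = 0
    for advice, y_true in stream:               # advice: list of 0/1 expert predictions
        i = random.choices(range(n_experts), weights=w)[0]
        y_pred = advice[i]                      # predict with the sampled expert
        if y_pred != y_true:
            mistakes += 1
        # penalize every expert that was wrong by a factor (1 - eps)
        w = [wi * (1 - eps) if a != y_true else wi for wi, a in zip(w, advice)]
    return mistakes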