
1
**Crash Course on Machine Learning**

Several slides from Luke Zettlemoyer, Carlos Guestrin and Ben Taskar

2
**Typical Paradigms of Recognition**

Feature computation → Model

3
**Visual Recognition: Identification (Classification)**

Is this your car?

4
**Visual Recognition: Verification (Classification)**

Is this a car?

5
**Visual Recognition: Classification**

Is there a car in this picture?

6
**Visual Recognition: Detection (Classification + Structure Learning)**

Where is the car in this picture?

7
**Visual Recognition: Activity Recognition (Classification)**

What is he doing?

8
**Visual Recognition: Pose Estimation (Regression + Structure Learning)**

9
**Visual Recognition: Object Categorization (Structure Learning)**

Sky, Person, Tree, Horse, Car, Bicycle, Road

10
**Visual Recognition: Segmentation (Classification + Structure Learning)**

Sky, Tree, Car, Person

11
**What kind of problems?**

Classification: generative vs. discriminative; supervised, unsupervised, semi-supervised, or weakly supervised; linear or nonlinear; ensemble methods; probabilistic. Regression: linear regression; structured-output regression. Structure learning: graphical models; margin-based approaches.


15
**Let’s play with probability for a bit: remembering simple stuff**

16
**Thumbtack & Probabilities**

P(Heads) = θ, P(Tails) = 1 − θ. Flips are i.i.d.: independent events, identically distributed according to a binomial distribution. A sequence D of H heads and T tails: D = {x_i | i = 1…n}, P(D | θ) = Π_i P(x_i | θ).
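The i.i.d. likelihood can be sketched in a few lines of Python; the flip sequence and θ value below are illustrative, not from the slides:

```python
def likelihood(flips, theta):
    """P(D | theta) = prod_i P(x_i | theta) for i.i.d. coin flips."""
    p = 1.0
    for x in flips:
        p *= theta if x == 'H' else (1.0 - theta)
    return p

L = likelihood(['H', 'H', 'T'], 0.6)  # 0.6 * 0.6 * 0.4
```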

17
**Maximum Likelihood Estimation**

Data: an observed set D of H heads and T tails. Hypothesis: binomial distribution. Learning: finding θ is an optimization problem. What’s the objective function? MLE: choose θ to maximize the probability of D: θ̂ = argmax_θ P(D | θ).

18
Parameter learning: set the derivative of the log-likelihood to zero and solve, giving θ̂ = H / (H + T).

19
**But, how many flips do I need?**

3 heads and 2 tails. θ̂ = 3/5, I can prove it! What if I flipped 30 heads and 20 tails? Same answer, I can prove it! What’s better? Umm… the more the merrier???
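A minimal sketch of the point being made: the closed-form Bernoulli MLE θ̂ = H / (H + T) gives the same answer for 3/2 as for 30/20.

```python
def theta_mle(heads, tails):
    # Closed-form Bernoulli MLE: the fraction of heads.
    return heads / (heads + tails)

small = theta_mle(3, 2)    # 3 heads, 2 tails
large = theta_mle(30, 20)  # ten times the data, same ratio
```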

20
**A bound (from Hoeffding’s inequality)**

For N = H + T, let θ* be the true parameter; for any ε > 0: P(|θ̂ − θ*| ≥ ε) ≤ 2e^(−2Nε²). The probability of a mistake decays exponentially in N! Hoeffding’s inequality provides a bound on the probability that a sum of random variables deviates from its expectation.
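The bound can be used directly to pick a sample size; a sketch (the helper names are ours, not from the slides):

```python
import math

def mistake_bound(N, eps):
    # Hoeffding: P(|theta_hat - theta*| >= eps) <= 2 exp(-2 N eps^2)
    return 2.0 * math.exp(-2.0 * N * eps * eps)

def flips_needed(eps, delta):
    # Smallest N for which the bound is at most delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))
```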

21
**What if I have prior beliefs?**

Wait, I know that the thumbtack is “close” to some value. What can you do for me now? Rather than estimating a single θ, we obtain a distribution over possible values of θ: start with a prior, observe flips (e.g., {tails, tails}), and update to a posterior after the observations.

22
**How to use the prior? Use Bayes’ rule:**

Posterior ∝ data likelihood × prior: P(θ | D) = P(D | θ) P(θ) / P(D), where P(D) is a normalization constant. Or equivalently: P(θ | D) ∝ P(D | θ) P(θ). Also, for uniform priors, P(θ) ∝ 1, so MAP reduces to the MLE objective.

23
**Beta prior distribution – P(θ)**

Prior: θ ~ Beta(β_H, β_T), P(θ) ∝ θ^(β_H − 1) (1 − θ)^(β_T − 1). Likelihood function: P(D | θ) = θ^H (1 − θ)^T. Posterior: P(θ | D) ∝ θ^(H + β_H − 1) (1 − θ)^(T + β_T − 1), i.e. Beta(β_H + H, β_T + T).
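Conjugacy makes the update trivial: a Beta prior plus observed counts gives a Beta posterior. A minimal sketch:

```python
def beta_posterior(prior_h, prior_t, heads, tails):
    # Beta(prior_h, prior_t) prior + data -> Beta(prior_h + H, prior_t + T)
    return prior_h + heads, prior_t + tails

# e.g. a Beta(2, 2) prior after observing {tails, tails}:
post = beta_posterior(2, 2, 0, 2)
```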

24
**MAP for Beta distribution**

MAP: use the most likely parameter: θ̂_MAP = argmax_θ P(θ | D) = (H + β_H − 1) / (H + T + β_H + β_T − 2).
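As a sketch, the MAP estimate is just the mode of the Beta posterior; note how it sits between the prior’s mode and the MLE:

```python
def beta_map(a, b):
    # Mode of Beta(a, b), valid for a, b > 1: the MAP estimate.
    return (a - 1) / (a + b - 2)

# Beta(2, 2) prior, 3 heads and 2 tails -> posterior Beta(5, 4).
theta_map = beta_map(2 + 3, 2 + 2)
```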

25
**What about continuous variables?**

26
**We like Gaussians because**

Affine transformations (multiplying by a scalar and adding a constant) of a Gaussian are Gaussian: X ~ N(μ, σ²) and Y = aX + b imply Y ~ N(aμ + b, a²σ²). The sum of independent Gaussians is Gaussian: X ~ N(μ_X, σ²_X), Y ~ N(μ_Y, σ²_Y), Z = X + Y imply Z ~ N(μ_X + μ_Y, σ²_X + σ²_Y). And they are easy to differentiate.
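These closure properties can be tracked at the level of (mean, variance) pairs; a minimal sketch:

```python
def affine(mu, var, a, b):
    # X ~ N(mu, var)  =>  aX + b ~ N(a*mu + b, a^2 * var)
    return a * mu + b, a * a * var

def indep_sum(mu_x, var_x, mu_y, var_y):
    # Independent X, Y  =>  X + Y ~ N(mu_x + mu_y, var_x + var_y)
    return mu_x + mu_y, var_x + var_y
```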

27
**Learning a Gaussian**

Collect a bunch of data, hopefully i.i.d. samples, e.g. exam scores x_i for students i = 1…99 (85, 95, 100, 12, …, 89). Learn the parameters: mean μ and variance σ².

28
**MLE for Gaussian**

Probability of i.i.d. samples D = {x_1, …, x_N}: P(D | μ, σ) = Π_i (1 / (σ√(2π))) e^(−(x_i − μ)² / (2σ²)). Log-likelihood of the data: ln P(D | μ, σ) = −N ln(σ√(2π)) − Σ_i (x_i − μ)² / (2σ²).

29
**MLE for mean of a Gaussian**

What’s the MLE for the mean? Setting the derivative with respect to μ to zero gives μ̂_MLE = (1/N) Σ_i x_i.

30
MLE for variance: again, set the derivative to zero and solve, giving σ̂²_MLE = (1/N) Σ_i (x_i − μ̂)².

31
**Learning Gaussian parameters**

MLE: μ̂ = (1/N) Σ_i x_i, σ̂² = (1/N) Σ_i (x_i − μ̂)².
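The two estimates in code (the toy scores below are illustrative):

```python
def gaussian_mle(xs):
    # MLE for a Gaussian: the sample mean and the (biased, 1/N) sample variance.
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

mu, var = gaussian_mle([85, 95, 100, 80])
```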

32
**MAP: conjugate priors**

Prior for the mean: Gaussian. Prior for the variance: Wishart distribution.

33
**Supervised Learning: find f**

Given: a training set {(x_i, y_i) | i = 1 … n}. Find: a good approximation to f : X → Y. What is x? What is y?

34
**Simple Example: Digit Recognition**

Input: images / pixel grids. Output: a digit 0–9. Setup: get a large collection of example images, each labeled with a digit (note: someone has to hand-label all this data!). We want to learn to predict the labels of new, future digit images. Features: ? (“Screw you, I want to use pixels :D”)

35
**Let’s take a probabilistic approach!!!**

Can we directly estimate the data distribution P(X, Y)? How do we represent it? How many parameters? Prior P(Y): suppose Y is composed of k classes. Likelihood P(X | Y): suppose X is composed of n binary features.

36
**Conditional Independence**

X is conditionally independent of Y given Z if the probability distribution of X is independent of the value of Y, given the value of Z: P(X | Y, Z) = P(X | Z). Equivalent to: P(X, Y | Z) = P(X | Z) P(Y | Z).

37
**Naïve Bayes**

Naïve Bayes assumption: features are independent given the class: P(X_1, X_2 | Y) = P(X_1 | Y) P(X_2 | Y). More generally: P(X_1, …, X_n | Y) = Π_i P(X_i | Y).

38
**The Naïve Bayes Classifier**

Given: a prior P(Y) and n features X_i that are conditionally independent given the class Y, with likelihoods P(X_i | Y) for each X_i. Decision rule: y* = argmax_y P(y) Π_i P(x_i | y). (Graphical model: Y with arrows to X_1, X_2, …, X_n.)
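The decision rule in code, using log-probabilities for numerical stability; the toy parameters below are hypothetical:

```python
import math

def nb_predict(x, prior, like):
    # like[y][i] = P(X_i = 1 | Y = y) for binary features;
    # pick the class maximizing log P(y) + sum_i log P(x_i | y).
    best, best_score = None, -math.inf
    for y, p in prior.items():
        score = math.log(p)
        for i, xi in enumerate(x):
            p1 = like[y][i]
            score += math.log(p1 if xi == 1 else 1.0 - p1)
        if score > best_score:
            best, best_score = y, score
    return best
```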

39
A Digit Recognizer. Input: pixel grids. Output: a digit 0–9.

40
**Naïve Bayes for Digits (Binary Inputs)**

Simple version: one feature F_ij for each grid position <i, j>. Possible feature values are on / off, based on whether the intensity is more or less than 0.5 in the underlying image. Each input maps to a feature vector; here there are lots of features, each binary-valued. Naïve Bayes model: P(Y | F) ∝ P(Y) Π_ij P(F_ij | Y). Are the features independent given the class? What do we need to learn?

41
**Example Distributions**

Per-class probabilities for three features, as shown on the slide (entries not shown are left blank):

| Y | P₁ | P₂ | P₃ |
|---|------|------|------|
| 1 | 0.1 | 0.01 | 0.05 |
| 2 | | 0.05 | 0.01 |
| 3 | | | 0.90 |
| 4 | | 0.30 | 0.80 |
| 5 | | 0.80 | |
| 6 | | 0.90 | |
| 7 | | | 0.25 |
| 8 | | 0.60 | 0.85 |
| 9 | | 0.50 | 0.60 |

42
**MLE for the parameters of NB**

Given a dataset, let Count(A = a, B = b) be the number of examples where A = a and B = b. The MLE for discrete NB is simply: Prior: P(Y = y) = Count(Y = y) / n. Likelihood: P(X_i = x | Y = y) = Count(X_i = x, Y = y) / Count(Y = y).
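The counting estimates can be sketched as follows (the tiny binary-feature dataset is illustrative):

```python
from collections import Counter, defaultdict

def nb_mle(data):
    # data: list of (x, y) pairs, x a tuple of discrete feature values.
    n = len(data)
    y_counts = Counter(y for _, y in data)
    prior = {y: c / n for y, c in y_counts.items()}   # Count(Y=y) / n
    like = defaultdict(lambda: defaultdict(float))
    for x, y in data:
        for i, xi in enumerate(x):
            # Count(X_i = xi, Y = y) / Count(Y = y)
            like[(i, xi)][y] += 1.0 / y_counts[y]
    return prior, like

prior, like = nb_mle([((1, 0), 'a'), ((1, 1), 'a'), ((0, 0), 'b'), ((0, 1), 'b')])
```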

43
**Violating the NB assumption**

Usually, features are not conditionally independent: P(X_1, …, X_n | Y) ≠ Π_i P(X_i | Y). NB often performs well even when the assumption is violated; [Domingos & Pazzani ’96] discuss some conditions for good performance.

44
**Smoothing**

2 wins!! Does this happen in vision?
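Laplace (add-k) smoothing is the standard fix for the zero-count likelihoods that raw MLE produces; a minimal sketch:

```python
def smoothed_likelihood(count_xy, count_y, num_values, k=1):
    # P(X_i = x | Y = y) with add-k smoothing: never exactly 0 or 1.
    # count_xy = Count(X_i = x, Y = y); count_y = Count(Y = y);
    # num_values = number of values X_i can take.
    return (count_xy + k) / (count_y + k * num_values)
```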

45
**NB & the bag-of-words model**

46
**What about real features? What if we have continuous X_i?**

E.g., character recognition: X_i is the ith pixel intensity. Gaussian Naïve Bayes (GNB): P(X_i = x | Y = y_k) = (1 / (σ_ik √(2π))) e^(−(x − μ_ik)² / (2σ_ik²)). Sometimes we assume the variance is independent of Y (i.e., σ_i), or independent of X_i (i.e., σ_k), or both (i.e., σ).

47
**Estimating Parameters**

Maximum likelihood estimates: Mean: μ̂_ik = Σ_j x_i^j δ(y^j = y_k) / Σ_j δ(y^j = y_k). Variance: σ̂_ik² = Σ_j (x_i^j − μ̂_ik)² δ(y^j = y_k) / Σ_j δ(y^j = y_k), where x^j is the jth training example and δ(x) = 1 if x is true, else 0.

48
**What you need to know about Naïve Bayes**

The Naïve Bayes classifier: what the assumption is, why we use it, and how we learn it. Why Bayesian estimation is important. The bag-of-words model. Gaussian NB: features are still conditionally independent, and each feature has a Gaussian distribution given the class. Optimal decisions using the Bayes classifier.

49
**Another probabilistic approach!!!**

Naïve Bayes: directly estimate the data distribution P(X, Y)! This is challenging due to the size of the distribution, so we make the Naïve Bayes assumption: we only need P(X_i | Y)! But wait, we classify according to argmax_Y P(Y | X). Why not learn P(Y | X) directly?
