Machine Learning 10601 Recitation 6 Sep 30, 2009 Oznur Tastan.

Outline
- Multivariate Gaussians
- Logistic regression

Multivariate Gaussians (also called the "multinormal distribution" or "multivariate normal distribution")
Univariate case: a single mean μ and variance σ²:
p(x) = 1/(√(2π) σ) · exp( −(x − μ)² / (2σ²) )
Multivariate case: a vector of observations x, a vector of means μ, and a covariance matrix Σ:
p(x) = 1/((2π)^(d/2) |Σ|^(1/2)) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )
where d is the dimension of x and |Σ| is the determinant of Σ.
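To make the formula concrete, here is a minimal Matlab sketch (not from the original slides; the point, mean, and covariance values are made up) that evaluates the multivariate Gaussian density:

x     = [1; 2];                    % point at which to evaluate the density
mu    = [0; 0];                    % mean vector
Sigma = [1 0.5; 0.5 2];            % covariance matrix
d     = length(mu);
dx    = x - mu;                    % difference from the mean
p     = exp(-0.5 * dx' * (Sigma \ dx)) / ((2*pi)^(d/2) * sqrt(det(Sigma)));
% If the Statistics Toolbox is available, mvnpdf(x', mu', Sigma) should give the same value.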

Multivariate Gaussians
In both cases the density splits into two parts. The normalization constant, 1/(√(2π) σ) in the univariate case and 1/((2π)^(d/2) |Σ|^(1/2)) in the multivariate case, does not depend on x. The exponential term does depend on x and is always positive.

The mean vector
μ = E[x], i.e., the i-th entry of μ is the expectation of the i-th variable: μi = E[xi].

Covariance of two random variables
Recall that for two random variables xi, xj:
Cov(xi, xj) = E[(xi − μi)(xj − μj)] = E[xi xj] − μi μj

The covariance matrix
Σ = E[(x − μ)(x − μ)ᵀ], where ᵀ is the transpose operator. Its (i, j) entry is Cov(xi, xj), and the diagonal entries are the variances: Var(x_m) = Cov(x_m, x_m).
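A minimal Matlab sketch (not from the original slides; the toy data are made up) estimating the mean vector and covariance matrix from data, where each row of X is one observation and each column one variable:

X  = [1 2; 2 1; 3 4; 4 3; 5 5];         % 5 observations of 2 variables
mu = mean(X);                            % 1-by-2 mean vector
Xc = X - repmat(mu, size(X,1), 1);       % subtract the mean from every row
S  = Xc' * Xc / (size(X,1) - 1);         % unbiased sample covariance matrix
% cov(X) computes the same matrix; the diagonal entries diag(S) are the variances Var(x_m).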

An example: the 2-variable case
Take x = (x1, x2) with mean μ = (μ1, μ2) and a diagonal covariance matrix
Σ = [ σ1²  0 ; 0  σ2² ], whose determinant is |Σ| = σ1² σ2².
The pdf of the multivariate Gaussian is then
p(x1, x2) = 1/(2π σ1 σ2) · exp( −(x1 − μ1)²/(2σ1²) − (x2 − μ2)²/(2σ2²) ).

An example: the 2-variable case
Recall that in general independence implies uncorrelatedness, but uncorrelatedness does not necessarily imply independence. The multivariate Gaussian is a special case where uncorrelatedness implies independence as well: with a diagonal Σ the pdf factorizes into two independent univariate Gaussians, so x1 and x2 are independent.
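To make the factorization explicit, here is the diagonal 2-variable density written as a product of the two univariate densities (a worked step added for clarity, following the formulas above):

\[
p(x_1,x_2)
= \frac{1}{2\pi\sigma_1\sigma_2}
  \exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)
= \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_1}
  \exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}\right)}_{p(x_1)}
  \cdot
  \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_2}
  \exp\!\left(-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)}_{p(x_2)}
\]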

Diagonal covariance matrix
A diagonal matrix is an m × m matrix whose off-diagonal entries are all zero. If all the variables are independent of each other, the covariance matrix is diagonal. For Gaussians the reverse is also true: if the covariance matrix is diagonal, the variables are independent.

Gaussian Intuitions: size of Σ
Three Gaussians, all with μ = [0 0], and covariances Σ = I, Σ = 0.6 I, and Σ = 2 I (I is the identity matrix). As Σ becomes larger, the Gaussian becomes more spread out.

Gaussian Intuitions: off-diagonal entries
As the off-diagonal entries increase, there is more correlation between the value of x and the value of y.

Gaussian Intuitions: off-diagonal and diagonal entries
Decreasing the off-diagonal entries (panels 1 and 2); increasing the variance of one dimension on the diagonal (panel 3).

Isocontours

Isocontours example
We have shown that for the 2-variable diagonal case
p(x1, x2) = 1/(2π σ1 σ2) · exp( −(x1 − μ1)²/(2σ1²) − (x2 − μ2)²/(2σ2²) ).
Now, for some constant c, let us find the isocontour: the set of points where p(x1, x2) = c.

Isocontours continued
Set p(x1, x2) = c, divide out the normalization constant, and take the log of both sides.
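A worked version of this step (added for clarity; it follows directly from the 2-variable density above):

\[
\frac{1}{2\pi\sigma_1\sigma_2}
\exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right) = c
\;\Longleftrightarrow\;
\frac{(x_1-\mu_1)^2}{2\sigma_1^2}+\frac{(x_2-\mu_2)^2}{2\sigma_2^2}
= \log\frac{1}{2\pi\sigma_1\sigma_2\,c}
\]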

Define r1 = σ1 √(2 log(1/(2π σ1 σ2 c))) and r2 = σ2 √(2 log(1/(2π σ1 σ2 c))). Then the isocontour is
(x1 − μ1)²/r1² + (x2 − μ2)²/r2² = 1,
the equation of an ellipse centered on (μ1, μ2) with axis lengths 2r1 and 2r2.

We had started with a diagonal covariance matrix. In the diagonal covariance case the isocontour ellipses are axis-aligned.

Don't confuse multivariate Gaussians with mixtures of Gaussians
A mixture of Gaussians is a weighted sum of Gaussian components:
p(x) = Σ_k π_k N(x | μ_k, Σ_k), for k = 1, …, K,
where N(x | μ_k, Σ_k) is the k-th component and π_k is its mixing coefficient (the π_k are non-negative and sum to 1). In the example, K = 3.
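For contrast, a minimal Matlab sketch (not from the slides; all values are made up) evaluating a K = 3 mixture of one-dimensional Gaussians at a point:

pis    = [0.5 0.3 0.2];                  % mixing coefficients, must sum to 1
mus    = [-2  0   3  ];                  % component means
sigmas = [0.5 1   0.8];                  % component standard deviations
x  = 1.0;                                % point at which to evaluate the mixture density
px = 0;
for k = 1:3
    px = px + pis(k) * exp(-(x - mus(k))^2 / (2*sigmas(k)^2)) / (sqrt(2*pi) * sigmas(k));
end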

Logistic regression
Linear regression: the outcome variable Y is continuous.
Logistic regression: the outcome variable Y is binary.

Logistic (sigmoid) function
σ(z) = 1 / (1 + exp(−z))
Notice that σ(z) is always bounded between 0 and 1 (a nice property): the term exp(−z) lies in (0, ∞), so as z increases σ(z) approaches 1, and as z decreases σ(z) approaches 0.
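A minimal Matlab sketch (added for illustration) of the logistic function and its limiting behaviour:

z = -10:0.1:10;
s = 1 ./ (1 + exp(-z));     % sigma(z), always strictly between 0 and 1
% s approaches 1 as z grows, approaches 0 as z decreases, and equals 0.5 at z = 0
plot(z, s);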

Logistic regression
Learn a function that maps X values to Y given data. The function we try to learn is P(Y | X). X can be continuous or discrete; Y is discrete (binary).

Logistic regression
The model applies the logistic function to a linear function of X (using the common convention that models P(Y = 1 | X)):
P(Y = 1 | X) = σ(w0 + Σ_i wi Xi) = exp(w0 + Σ_i wi Xi) / (1 + exp(w0 + Σ_i wi Xi))
P(Y = 0 | X) = 1 − P(Y = 1 | X) = 1 / (1 + exp(w0 + Σ_i wi Xi))

Classification
Predict the more probable label. If P(Y = 0 | X) / P(Y = 1 | X) > 1, then Y = 0 is more probable than Y = 1 given X. With the model above this ratio equals exp(−(w0 + Σ_i wi Xi)).

Classification
Take the log of both sides: the condition becomes −(w0 + Σ_i wi Xi) > 0. Classification rule: if w0 + Σ_i wi Xi < 0, predict Y = 0; otherwise predict Y = 1.
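A small Matlab sketch of the resulting rule (the weight values here are made up; in practice they come from training, described below):

w0 = -2;  w = [1; 0.5];              % assumed, already-learned weights
x  = [0.7; -0.2];                    % a new example
score = w0 + w' * x;                 % the linear score w0 + sum_i wi*Xi
y_hat = double(score >= 0);          % predict Y = 1 if the score is >= 0, else Y = 0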

Logistic regression is a linear classifier
The decision boundary w0 + Σ_i wi Xi = 0 is linear (a line in two dimensions, a hyperplane in general): the Y = 0 region lies on one side and the Y = 1 region on the other.

Classification
A one-dimensional example: σ(z) = σ(w0 + w1 X1). With weights such that σ(z) = 0.5 at X1 = 2 (for instance w0 = −2, w1 = 1; to check, evaluate at X1 = 0: σ(−2) ≈ 0.1), inputs with σ(z) < 0.5 are classified as Y = 0. With w0 = 0, σ(z) = 0.5 at X1 = 0 instead, so w0 shifts the decision boundary.

Estimating the parameters
Given data {(x^1, y^1), …, (x^n, y^n)}, the objective is to train the model to find the w that maximizes the conditional likelihood Π_j P(y^j | x^j, w), or equivalently the conditional log-likelihood Σ_j ln P(y^j | x^j, w).
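A minimal Matlab sketch of one way to do this (not from the slides: plain batch gradient ascent on the conditional log-likelihood, with a made-up learning rate and toy data, using the convention P(Y = 1 | x) = σ(w0 + wᵀx)):

X = [0.5 1.2; -1.0 0.3; 2.0 -0.5; 1.5 1.0];   % n-by-d, one example per row (toy data)
y = [1; 0; 1; 1];                              % n-by-1 labels in {0, 1}
[n, d] = size(X);
w0 = 0;  w = zeros(d, 1);
eta = 0.1;                                     % learning rate (an assumption)
for iter = 1:500
    p  = 1 ./ (1 + exp(-(w0 + X*w)));          % P(Y = 1 | x^j) for every example
    w0 = w0 + eta * sum(y - p);                % gradient step for the intercept
    w  = w  + eta * (X' * (y - p));            % gradient step for the weights
end

On separable data the unregularized weights can grow without bound; in practice one would add a penalty term or use an off-the-shelf optimizer, but the gradient steps above are the core of the method.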

Difference between Naïve Bayes and Logistic Regression
The loss function! They optimize different objectives and therefore obtain different solutions.
Naïve Bayes: argmax P(X | Y) P(Y)
Logistic Regression: argmax P(Y | X)

Naïve Bayes and Logistic Regression
Have a look at Tom Mitchell's book chapter, linked under the Sep 23 lecture readings as well.

Some Matlab tips for the last question in HW3
- The logical function might be useful for dividing the data into splits. An example of logical in use (please read the Matlab help):
  S = X(logical(X(:,1)==1), :)
  This will also work:
  S = X(X(:,1)==1, :)
  This subsets the portion of the X matrix where the first column has value 1 and puts it in matrix S (like Data > Filter in Excel).
- Matlab has functions for mean, std, sum, inv, log2.
- Scaling data to zero mean and unit variance: shift by the mean (subtract the mean from every element of the vector) and scale so that it has variance 1 (divide every element of the vector by the standard deviation).
- To do this on matrices you will need the repmat function; have a look at it, otherwise the sizes of the matrices will not match.
- For elementwise multiplication use .*
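A minimal sketch of the scaling step described above, using repmat so the matrix sizes match (X is assumed to be n-by-d with one example per row; the data are made up):

X  = [1 2; 2 4; 3 6; 4 8];                         % toy data
n  = size(X, 1);
mu = mean(X);                                      % 1-by-d column means
sd = std(X);                                       % 1-by-d column standard deviations
Xs = (X - repmat(mu, n, 1)) ./ repmat(sd, n, 1);   % each column now has zero mean, unit variance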

References
- ogReg.pdf
- Carlos Guestrin lecture notes
- Andrew Ng lecture notes