Linear regression with one variable: Model representation


Linear regression with one variable: Model representation (Machine Learning)

Housing Prices (Portland, OR)
[Scatter plot: price in 1000s of dollars vs. size in feet^2]
Supervised Learning: given the "right answer" for each example in the data.
Regression problem: predict real-valued output.

Training set of housing prices (Portland, OR)

Size in feet^2 (x)   Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

Training Set → Learning Algorithm → h
The hypothesis h maps the size of a house to an estimated price.
How do we represent h? hθ(x) = θ0 + θ1x.
Linear regression with one variable, also called univariate linear regression.
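The univariate hypothesis h(x) = θ0 + θ1·x can be sketched in a few lines of Python. This is a minimal illustration; the parameter values below are made up, not fitted:

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: predicted price for size x."""
    return theta0 + theta1 * x

# Hypothetical parameters: intercept 50 (in $1000s), slope 0.2 ($1000s per ft^2)
print(h(50.0, 0.2, 2104))  # predicted price for a 2104 ft^2 house
```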

Linear regression with one variable: Cost function (Machine Learning)

Training set:

Size in feet^2 (x)   Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…

Hypothesis: hθ(x) = θ0 + θ1x
θ0, θ1: parameters
How do we choose θ0, θ1?

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y). Formally, minimize over θ0, θ1 the average squared error (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2.

Linear regression with one variable: Cost function intuition I (Machine Learning)

Simplified (set θ0 = 0):
Hypothesis: hθ(x) = θ1x
Parameter: θ1
Cost function: J(θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2
Goal: minimize J(θ1) over θ1
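The simplified cost J(θ1) = (1/2m) Σ (θ1·x^(i) − y^(i))^2 is easy to evaluate directly. The sketch below assumes the toy points (1,1), (2,2), (3,3) often used for this intuition; the data set is an illustrative assumption, not from this transcript:

```python
def cost_simplified(theta1, xs, ys):
    """J(theta1) = (1/2m) * sum((theta1*x - y)^2) for h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]  # toy training set (assumed for illustration)
for t1 in (0.0, 0.5, 1.0):
    # sweeping theta1 traces out the bowl-shaped cost curve J(theta1)
    print(t1, cost_simplified(t1, xs, ys))
```

For this data, J is zero at θ1 = 1 (the line passes through every point) and grows as θ1 moves away from 1.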

[Paired plots, repeated for several values of θ1: left, hθ(x) plotted against x for a fixed θ1 (a function of x); right, J(θ1) plotted as a function of the parameter θ1. Each choice of θ1 gives one line on the left and one point on the cost curve on the right.]

Linear regression with one variable: Cost function intuition II (Machine Learning)

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2
Goal: minimize J(θ0, θ1) over θ0, θ1
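The full cost function J(θ0, θ1) can be evaluated on the four housing examples from the training-set table above. A minimal sketch, with an arbitrary (not fitted) choice of parameters:

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Housing data from the training-set table
sizes = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]
print(cost(0.0, 0.2, sizes, prices))  # J for a hypothetical theta0=0, theta1=0.2
```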

[Paired plots, repeated for several choices of (θ0, θ1): left, hθ(x) over the training data (Price ($) in 1000's vs. Size in feet^2), a function of x for fixed θ0, θ1; right, the contour plot of J(θ0, θ1) as a function of the parameters. Each (θ0, θ1) gives one fitted line on the left and one point on the contours on the right.]

Linear regression with one variable: Gradient descent (Machine Learning)

Have some function J(θ0, θ1). Want: min over θ0, θ1 of J(θ0, θ1).
Outline:
Start with some θ0, θ1.
Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.

[Surface plots of J(θ0, θ1) over the (θ0, θ1) plane: starting gradient descent from different initial points can lead to different local minima.]

Gradient descent algorithm:
repeat until convergence {
    θj := θj − α ∂J(θ0, θ1)/∂θj    (simultaneously for j = 0 and j = 1)
}

Correct: simultaneous update
temp0 := θ0 − α ∂J(θ0, θ1)/∂θ0
temp1 := θ1 − α ∂J(θ0, θ1)/∂θ1
θ0 := temp0
θ1 := temp1

Incorrect:
temp0 := θ0 − α ∂J(θ0, θ1)/∂θ0
θ0 := temp0
temp1 := θ1 − α ∂J(θ0, θ1)/∂θ1    (evaluated with the already-updated θ0)
θ1 := temp1
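The simultaneous update is easy to get right in code by computing both partials before assigning either parameter. A sketch, where the two derivative functions are placeholders supplied by the caller:

```python
def gd_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One simultaneous gradient-descent update.

    dJ_dtheta0 and dJ_dtheta1 are functions evaluating the partial
    derivatives of J at a given (theta0, theta1).
    """
    # Evaluate BOTH partials at the old (theta0, theta1) before updating either.
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1  # assigned together: the simultaneous update
```

For example, with the toy objective J(θ0, θ1) = θ0^2 + θ1^2 (partials 2θ0 and 2θ1), one step from (1, 1) with α = 0.25 moves to (0.5, 0.5).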

Linear regression with one variable: Gradient descent intuition (Machine Learning)

Gradient descent algorithm (one-parameter intuition): θ1 := θ1 − α dJ(θ1)/dθ1. When the derivative (slope) is positive, θ1 decreases; when it is negative, θ1 increases. Either way, θ1 moves downhill toward the minimum.

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
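Both failure modes are easy to demonstrate on the simple quadratic J(θ) = θ^2 (gradient 2θ), rather than the housing cost function; this toy objective is an illustrative assumption:

```python
def run_gd(theta, alpha, steps):
    """Minimize J(theta) = theta^2 (gradient 2*theta) from a starting point."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run_gd(1.0, 0.01, 10))  # small alpha: slow but steady progress
print(run_gd(1.0, 0.4, 10))   # moderate alpha: rapid convergence toward 0
print(run_gd(1.0, 1.1, 10))   # too-large alpha: overshoots, |theta| grows
```

Each update multiplies θ by (1 − 2α), so for α > 1 the iterates alternate sign and grow in magnitude, which is exactly the divergence described above.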

What if the current value of θ1 is already at a local optimum? The derivative there is zero, so the update θ1 := θ1 − α · 0 leaves θ1 unchanged: gradient descent stays put at local optima.

Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.

Linear regression with one variable: Gradient descent for linear regression (Machine Learning)

Gradient descent algorithm:
repeat until convergence {
    θj := θj − α ∂J(θ0, θ1)/∂θj    (for j = 0 and j = 1)
}

Linear regression model:
hθ(x) = θ0 + θ1x
J(θ0, θ1) = (1/2m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2

The partial derivatives work out to:
∂J/∂θ0 = (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))
∂J/∂θ1 = (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i)

Gradient descent algorithm for linear regression:
repeat until convergence {
    θ0 := θ0 − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α (1/m) Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i)
}
Update θ0 and θ1 simultaneously.
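These update rules can be sketched as a small Python implementation. The data and hyperparameters below are illustrative assumptions (a toy set sampled exactly from the line y = 2x + 1), not from the lecture:

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # tuple assignment gives the simultaneous update
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data on the line y = 2x + 1: the recovered parameters
# should approach theta0 = 1, theta1 = 2.
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7], alpha=0.1, iters=5000)
print(t0, t1)
```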

[Surface and contour plots of J(θ0, θ1): for linear regression, the squared-error cost is a convex, bowl-shaped function with a single global minimum, so gradient descent cannot get stuck in a bad local optimum.]

[Sequence of paired plots showing gradient descent in action: left, hθ(x) over the training data for the current fixed θ0, θ1 (a function of x); right, the contour plot of J(θ0, θ1) (a function of the parameters). Each gradient descent step moves (θ0, θ1) one point closer to the minimum of J, and the fitted line closer to the data.]

“Batch” Gradient Descent
“Batch”: each step of gradient descent uses all the training examples.