CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/


5: Multivariate Regression
The slides in this presentation are adapted from Andrew Ng's ML course.
CSC M.A. Papalaskari - Villanova University

Regression topics so far:
- Introduction to linear regression
- Intuition: least squares approximation
- Intuition: gradient descent algorithm
- Hands on: simple example using Excel
- How to apply gradient descent to minimize the cost function for regression
- Linear algebra refresher

What's next?
- Multivariate regression
- Gradient descent revisited: feature scaling and normalization; selecting a good value for α
- Non-linear regression
- Solving for θ analytically (Normal Equation)
- Using Octave to solve regression problems

What's next? We are not in univariate regression anymore: the housing data now has multiple input columns: Size (feet²), Number of bedrooms, Number of floors, Age of home (years), and the target Price ($1000).


Multiple features (variables). Training data columns: Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000).
Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example

For comparison, the univariate case had a single feature, Size (feet²), and the target Price ($1000).

Multivariate linear regression. For convenience of notation, define x_0 = 1.
Hypothesis:
Previously: h_θ(x) = θ_0 + θ_1 x
Now: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n = θᵀx
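The hypothesis above can be sketched in Python/NumPy (the course uses Octave; this is an illustrative translation, and all numeric values below are made up for the example):

```python
import numpy as np

# Illustrative 4-feature input: size, bedrooms, floors, age (made-up values)
x = np.array([2104.0, 5.0, 1.0, 45.0])
# Illustrative parameters theta_0 ... theta_4 (made-up values)
theta = np.array([80.0, 0.1, 1.0, 2.0, -1.5])

# Prepend x_0 = 1 so the hypothesis becomes a single inner product: h(x) = theta^T x
x_aug = np.concatenate(([1.0], x))
h = theta @ x_aug
```

Defining x_0 = 1 is exactly what makes the compact θᵀx form work: the intercept θ_0 is just one more term in the inner product.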

Hypothesis: h_θ(x) = θᵀx = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n (with x_0 = 1)
Parameters: θ_0, θ_1, …, θ_n
Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Gradient descent:
Repeat { θ_j := θ_j − α ∂/∂θ_j J(θ) } (simultaneously update for every j)

Gradient Descent
Previously (n=1):
Repeat {
  θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
  θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i)
} (simultaneously update θ_0, θ_1)

New algorithm (n ≥ 1):
Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
} (simultaneously update θ_j for j = 0, …, n)
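The update rule can be sketched in Python/NumPy (an illustrative translation of what the course does in Octave; the tiny dataset is made up so the answer is known exactly):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression.
    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    Returns the learned parameter vector theta."""
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        error = X @ theta - y                  # h_theta(x^(i)) - y^(i) for all i at once
        theta -= (alpha / m) * (X.T @ error)   # simultaneous update of every theta_j
    return theta

# Tiny illustrative dataset where y = 1 + 2x exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, alpha=0.1, iters=2000)
```

Computing the whole error vector at once (rather than looping over j) is the vectorized form of "simultaneously update every θ_j".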

Feature Scaling
Idea: make sure features are on a similar scale. E.g. x_1 = size (feet²) and x_2 = number of bedrooms (1–5): dividing each feature by its range (x_1 = size/2000, x_2 = #bedrooms/5) gets every feature into approximately the range −1 ≤ x_i ≤ 1.

Mean normalization
Replace x_i with x_i − μ_i to make features have approximately zero mean (do not apply to x_0 = 1).
E.g. x_1 = (size − μ_1)/2000, x_2 = (#bedrooms − μ_2)/5.
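Mean normalization and scaling can be sketched as follows (Python/NumPy rather than the course's Octave; the feature values are illustrative):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-1, 1] with zero mean.
    X: (m, n) raw feature matrix WITHOUT the x_0 = 1 column
    (never normalize x_0). Returns (X_norm, mu, sigma) so the same
    transformation can be reapplied to new inputs at prediction time."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Illustrative features: size in feet^2 and number of bedrooms
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])
X_norm, mu, sigma = mean_normalize(X)
```

Returning mu and sigma matters: a prediction on a new house must be normalized with the training-set statistics, not its own.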

Gradient descent in practice:
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.

Making sure gradient descent is working correctly: plot J(θ) against the number of iterations.
- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
- Example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold ε in one iteration.

Summary: choosing α
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
To choose α, try a range of values increasing by roughly 3× each step (e.g. 0.001, 0.003, 0.01, 0.03, 0.1, …).
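The debugging advice can be sketched by recording J(θ) per iteration and comparing a small and a too-large α (Python/NumPy; the dataset and the two α values are made up for illustration):

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum of squared errors."""
    m = len(y)
    err = X @ theta - y
    return err @ err / (2 * m)

def gd_history(X, y, alpha, iters=50):
    """Run gradient descent and record J(theta) after every iteration,
    so the learning curve can be plotted against the iteration number."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        theta -= (alpha / len(y)) * (X.T @ (X @ theta - y))
        history.append(cost(X, y, theta))
    return history

# Tiny illustrative dataset where y = 1 + 2x exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
J_small = gd_history(X, y, alpha=0.1)   # J should fall on every iteration
J_large = gd_history(X, y, alpha=0.6)   # too large here: J grows instead of shrinking
```

Plotting J_small and J_large against the iteration number shows exactly the two behaviors described above: steady decrease versus divergence.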

Housing prices prediction: features need not be used exactly as given. E.g. instead of treating the frontage and depth of a lot as two separate features, one can define a new feature, area = frontage × depth, and fit h_θ(x) = θ_0 + θ_1 · area.

Polynomial regression: with a single input (Size x, target Price y), a curve can be fit by introducing powers of x as features, e.g. h_θ(x) = θ_0 + θ_1 x + θ_2 x² + θ_3 x³ with x_1 = size, x_2 = size², x_3 = size³. Feature scaling becomes especially important here, since the powers have very different ranges.

Choice of features: the model is not limited to integer powers. E.g. h_θ(x) = θ_0 + θ_1 · size + θ_2 · √size also yields a curve for Price (y) versus Size (x).
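Building polynomial features can be sketched as follows (Python/NumPy; the size values are illustrative):

```python
import numpy as np

def poly_features(x, degree):
    """Build the columns [x, x^2, ..., x^degree] from a single feature.
    With these columns, ordinary *linear* regression fits a polynomial
    in x. Feature scaling matters here: x^3 has a vastly larger range
    than x, so normalize the columns before gradient descent."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

size = np.array([1.0, 2.0, 3.0])   # illustrative sizes
X_poly = poly_features(size, 3)    # columns: size, size^2, size^3
```

The point is that "polynomial regression" is still linear regression: the model is linear in θ, only the features are nonlinear in x.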

Normal equation: a method to solve for θ analytically, in one step, rather than iteratively as in gradient descent.

Intuition: if θ is a single number (1D), J(θ) is a quadratic in θ; minimize it by setting the derivative dJ/dθ to zero and solving for θ. For a vector θ, set the partial derivative ∂J/∂θ_j to zero for every j and solve for θ_0, θ_1, …, θ_n.

Examples: from the housing table with columns Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000), build the design matrix X (one row per training example, with a leading 1 for x_0) and the target vector y; then θ = (XᵀX)⁻¹ Xᵀ y.

m examples (x^(1), y^(1)), …, (x^(m), y^(m)); n features. E.g. if each x^(i) has n+1 entries (including x_0 = 1), then X is the m × (n+1) matrix whose i-th row is (x^(i))ᵀ, and y is the m-vector of targets.

θ = (XᵀX)⁻¹ Xᵀ y, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX. Octave: pinv(X'*X)*X'*y
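The same computation can be sketched in Python/NumPy, mirroring the Octave expression above (the tiny dataset is made up so the exact answer is known):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudoinverse,
    the NumPy analogue of the Octave expression pinv(X'*X)*X'*y.
    pinv stays well-defined even when X^T X is singular
    (e.g. redundant features)."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Tiny illustrative dataset where y = 1 + 2x exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = normal_equation(X, y)   # exact solution: no alpha, no iterations
```

Unlike gradient descent, there is nothing to tune and nothing to iterate; the trade-off is the cost of the matrix inversion when n is large.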

With m training examples and n features:

Gradient Descent:
- Need to choose α.
- Needs many iterations.
- Works well even when n is large.

Normal Equation:
- No need to choose α.
- Don't need to iterate.
- Need to compute (XᵀX)⁻¹; slow if n is very large.

Notes on supervised learning and regression; see also the Octave wiki and documentation.

Exercise for next class:
1. Download and install Octave. (Alternative: if you have MATLAB, you can use it instead.)
2. Verify that it is working by typing, in an Octave command window:
   x = [ ]
   y = [ ]
   plot(x,y)
   This example defines two vectors, x and y, and should display a plot showing a straight line (the line y = 2x). If you get an error at this point, it may be that gnuplot is not installed or cannot access your display. If you are unable to get this to work, you can still do the rest of this exercise, because it does not involve any plotting (just restart Octave). You might refer to the Octave wiki for installation help, but if you are stuck, you can get some help troubleshooting this on Friday afternoon 3-4pm in the software engineering lab (Mendel 159).
3. Create a few matrices and vectors, e.g.:
   A = [1 2; 3 4; 5 6]
   V = [ ]
4. Try some of the elementary matrix and vector operations from our linear algebra slides (adding, multiplying between matrices, vectors and scalars).
5. Print out a log of your session.