RECITATION 1 (APRIL 9): Polynomial regression, Ridge regression, Lasso

Polynomial regression
- lm(y ~ poly(x, degree = d), data = dataset)
- Find the optimal degree:
  - Check the residual plots
  - Training and test set
  - Cross-validation
- R demo 1 (a sketch follows below)
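A minimal sketch of what R demo 1 might look like, assuming a data frame named dataset with columns x and y (both names are placeholders); the degree is picked on a held-out test set and the residual plot of the chosen fit is inspected:

# Polynomial regression: pick the degree with a training/test split (placeholder data frame "dataset")
set.seed(1)
n <- nrow(dataset)
train <- sample(n, size = floor(0.7 * n))
test_mse <- sapply(1:10, function(d) {            # candidate degrees 1..10 (adjust as needed)
  fit <- lm(y ~ poly(x, degree = d), data = dataset, subset = train)
  mean((dataset$y[-train] - predict(fit, newdata = dataset[-train, ]))^2)
})
best_d <- which.min(test_mse)                      # degree with the smallest test error
fit <- lm(y ~ poly(x, degree = best_d), data = dataset)
plot(fitted(fit), resid(fit))                      # residual plot for the chosen degree

K-fold cross-validation (e.g. cv.glm() in library("boot") with glm() fits) can replace the single split.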

Ridge regression – R package
- lm.ridge() in library("MASS")
- lm.ridge(y ~ ., data = dataset, lambda = seq(0, 0.01, by = 0.001))
- R demo 2 (a sketch follows below)
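A minimal sketch of R demo 2 along the same lines, again assuming the placeholder data frame dataset; lm.ridge() fits the whole lambda grid at once and stores a GCV score per lambda:

library(MASS)
ridge_fit <- lm.ridge(y ~ ., data = dataset, lambda = seq(0, 0.01, by = 0.001))
plot(ridge_fit)                                    # coefficient paths over the lambda grid
select(ridge_fit)                                  # prints the HKB, L-W, and GCV choices of lambda
best_lambda <- ridge_fit$lambda[which.min(ridge_fit$GCV)]
coef(ridge_fit)[which.min(ridge_fit$GCV), ]        # coefficients at the GCV-optimal lambda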

Ridge regression – from scratch
- Ridge regression estimators have a closed-form solution: β̂(λ) = (X'X + λI)^{-1} X'y
- How to deal with the intercept?
- Tuning parameter: effective degrees of freedom, df(λ) = trace( X (X'X + λI)^{-1} X' )
- Implement: HW 2 (a sketch follows below)
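A minimal sketch of the closed-form estimator and the effective degrees of freedom for HW 2, assuming y has been centered and the columns of X centered and scaled so that the intercept can be handled separately as mean(y); the function names are placeholders, not the homework's required interface:

# Ridge coefficients: (X'X + lambda*I)^{-1} X'y, for centered/scaled X and centered y
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

# Effective degrees of freedom: df(lambda) = trace( X (X'X + lambda*I)^{-1} X' )
ridge_edf <- function(X, lambda) {
  p <- ncol(X)
  sum(diag(X %*% solve(crossprod(X) + lambda * diag(p), t(X))))
}

For a target df, lambda can be found numerically, e.g. with uniroot() applied to the function lambda -> ridge_edf(X, lambda) - df.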

Lasso – R package
- l1ce() in library("lasso2") or lars() in library("lars")
- l1ce(y ~ ., data = dataset, bound = shrinkage.factor)
- The lasso does not have a simple closed-form expression for its effective degrees of freedom (why?), so we use the shrinkage factor to get a sense of how strong the penalty is.
- R demo 3 (a sketch follows below)
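A minimal sketch of R demo 3, assuming the placeholder data frame dataset and an illustrative shrinkage factor of 0.5; by default the bound argument of l1ce() is interpreted as a relative (fractional) L1 bound, and lars() traces out the whole solution path instead:

library(lasso2)
lasso_fit <- l1ce(y ~ ., data = dataset, bound = 0.5)    # shrinkage factor 0.5
coef(lasso_fit)                                          # some coefficients are exactly zero

library(lars)
X <- model.matrix(y ~ ., data = dataset)[, -1]           # design matrix without the intercept column
lars_fit <- lars(X, dataset$y, type = "lasso")
plot(lars_fit)                                           # coefficient paths vs. shrinkage factor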

Lasso – from scratch
- Shooting algorithm (randomized coordinate descent): at each iteration, randomly sample one dimension j and update β_j to S( x_j'(y - X_{-j}β_{-j}), λ ) / (x_j'x_j), where S(z, λ) = sign(z) max(|z| - λ, 0) is the soft-thresholding operator
- How to deal with the intercept: center x and y, standardize x
- Tuning parameter: shrinkage factor for a given λ
- Convergence criterion
- Implement: HW 2 bonus problem (a sketch follows below)
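A minimal sketch of the shooting algorithm for the HW 2 bonus problem, written for the objective (1/2)||y - Xβ||^2 + λ||β||_1 with y centered and the columns of X centered and standardized; shooting_lasso and soft_threshold are placeholder names, and this sketch sweeps all coordinates in a random order on each pass rather than sampling a single coordinate per iteration:

soft_threshold <- function(z, lambda) sign(z) * max(abs(z) - lambda, 0)

shooting_lasso <- function(X, y, lambda, max_iter = 1000, tol = 1e-6) {
  p <- ncol(X)
  beta <- rep(0, p)
  xtx <- colSums(X^2)                              # x_j' x_j for each column
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in sample(p)) {                         # coordinates in random order
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]  # partial residual without coordinate j
      z_j <- sum(X[, j] * r_j)
      beta[j] <- soft_threshold(z_j, lambda) / xtx[j]
    }
    if (max(abs(beta - beta_old)) < tol) break     # convergence criterion
  }
  beta
}

After fitting, the shrinkage factor for the chosen λ is sum(abs(beta)) divided by the L1 norm of the least-squares coefficients on the same centered/standardized data.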