RECITATION 1 (APRIL 9): Polynomial regression, Ridge regression, Lasso

Polynomial regression
- lm(y ~ poly(x, degree = d), data = dataset)
- Find the optimal degree:
  - Check the residual plots
  - Training and test set
  - Cross-validation
- R demo 1 (a sketch follows below)
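A minimal sketch of what R demo 1 might look like, assuming a data frame named dataset with columns x and y (both names are placeholders); the degree is picked on a held-out test set and the residual plot of the chosen fit is inspected:

# Polynomial regression: pick the degree with a training/test split (placeholder data frame "dataset")
set.seed(1)
n <- nrow(dataset)
train <- sample(n, size = floor(0.7 * n))
test_mse <- sapply(1:10, function(d) {            # candidate degrees 1..10 (adjust as needed)
  fit <- lm(y ~ poly(x, degree = d), data = dataset, subset = train)
  mean((dataset$y[-train] - predict(fit, newdata = dataset[-train, ]))^2)
})
best_d <- which.min(test_mse)                      # degree with the smallest test error
fit <- lm(y ~ poly(x, degree = best_d), data = dataset)
plot(fitted(fit), resid(fit))                      # residual plot for the chosen degree

K-fold cross-validation (e.g. cv.glm() in library("boot") with glm() fits) can replace the single split.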

Ridge regression – R package
- lm.ridge() in library("MASS")
- lm.ridge(y ~ ., data = dataset, lambda = seq(0, 0.01, by = 0.001))
- R demo 2 (a sketch follows below)
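A minimal sketch of R demo 2 along the same lines, again assuming the placeholder data frame dataset; lm.ridge() fits the whole lambda grid at once and stores a GCV score per lambda:

library(MASS)
ridge_fit <- lm.ridge(y ~ ., data = dataset, lambda = seq(0, 0.01, by = 0.001))
plot(ridge_fit)                                    # coefficient paths over the lambda grid
select(ridge_fit)                                  # prints the HKB, L-W, and GCV choices of lambda
best_lambda <- ridge_fit$lambda[which.min(ridge_fit$GCV)]
coef(ridge_fit)[which.min(ridge_fit$GCV), ]        # coefficients at the GCV-optimal lambda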

Ridge regression – from scratch
- Ridge regression estimators have a closed-form solution: β̂(λ) = (X'X + λI)^{-1} X'y
- How to deal with the intercept?
- Tuning parameter: effective degrees of freedom, df(λ) = trace( X (X'X + λI)^{-1} X' )
- Implement: HW 2 (a sketch follows below)
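A minimal sketch of the closed-form estimator and the effective degrees of freedom for HW 2, assuming y has been centered and the columns of X centered and scaled so that the intercept can be handled separately as mean(y); the function names are placeholders, not the homework's required interface:

# Ridge coefficients: (X'X + lambda*I)^{-1} X'y, for centered/scaled X and centered y
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

# Effective degrees of freedom: df(lambda) = trace( X (X'X + lambda*I)^{-1} X' )
ridge_edf <- function(X, lambda) {
  p <- ncol(X)
  sum(diag(X %*% solve(crossprod(X) + lambda * diag(p), t(X))))
}

For a target df, lambda can be found numerically, e.g. with uniroot() applied to the function lambda -> ridge_edf(X, lambda) - df.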

Lasso – R package
- l1ce() in library("lasso2") or lars() in library("lars")
- l1ce(y ~ ., data = dataset, bound = shrinkage.factor)
- The lasso does not have a simple closed-form expression for its effective degrees of freedom (why?), so we use the shrinkage factor to get a sense of how strong the penalty is.
- R demo 3 (a sketch follows below)
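A minimal sketch of R demo 3, assuming the placeholder data frame dataset and an illustrative shrinkage factor of 0.5; by default the bound argument of l1ce() is interpreted as a relative (fractional) L1 bound, and lars() traces out the whole solution path instead:

library(lasso2)
lasso_fit <- l1ce(y ~ ., data = dataset, bound = 0.5)    # shrinkage factor 0.5
coef(lasso_fit)                                          # some coefficients are exactly zero

library(lars)
X <- model.matrix(y ~ ., data = dataset)[, -1]           # design matrix without the intercept column
lars_fit <- lars(X, dataset$y, type = "lasso")
plot(lars_fit)                                           # coefficient paths vs. shrinkage factor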

Lasso – from scratch
- Shooting algorithm (randomized coordinate descent): at each iteration, randomly sample one dimension j and update β_j to S( x_j'(y - X_{-j}β_{-j}), λ ) / (x_j'x_j), where S(z, λ) = sign(z) max(|z| - λ, 0) is the soft-thresholding operator
- How to deal with the intercept: center x and y, standardize x
- Tuning parameter: shrinkage factor for a given λ
- Convergence criterion
- Implement: HW 2 bonus problem (a sketch follows below)
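A minimal sketch of the shooting algorithm for the HW 2 bonus problem, written for the objective (1/2)||y - Xβ||^2 + λ||β||_1 with y centered and the columns of X centered and standardized; shooting_lasso and soft_threshold are placeholder names, and this sketch sweeps all coordinates in a random order on each pass rather than sampling a single coordinate per iteration:

soft_threshold <- function(z, lambda) sign(z) * max(abs(z) - lambda, 0)

shooting_lasso <- function(X, y, lambda, max_iter = 1000, tol = 1e-6) {
  p <- ncol(X)
  beta <- rep(0, p)
  xtx <- colSums(X^2)                              # x_j' x_j for each column
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in sample(p)) {                         # coordinates in random order
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]  # partial residual without coordinate j
      z_j <- sum(X[, j] * r_j)
      beta[j] <- soft_threshold(z_j, lambda) / xtx[j]
    }
    if (max(abs(beta - beta_old)) < tol) break     # convergence criterion
  }
  beta
}

After fitting, the shrinkage factor for the chosen λ is sum(abs(beta)) divided by the L1 norm of the least-squares coefficients on the same centered/standardized data.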