Peter Fox, Data Analytics – ITWS-4963/ITWS-6965, Week 12a, April 21, 2015: Revisiting Regression – local models, and non-parametric…


1 Peter Fox, Data Analytics – ITWS-4963/ITWS-6965, Week 12a, April 21, 2015: Revisiting Regression – local models, and non-parametric…

2 Why local?

3 Sparse?

4 Remember this one? How would you apply local methods here?

5 SVM types
–one-classification: this model tries to find the support of a distribution and thus allows for outlier/novelty detection
–eps-regression: here, the data points lie in between the two borders of the margin, which is maximized under suitable conditions to avoid outlier inclusion
–nu-regression: with analogous modifications of the regression model, as in the classification case

6 Reminder: SVM and margin (figure)

7 Loss functions… (figure: loss-function shapes for classification, outlier detection, and regression)

8 Regression: by using a different loss function, the ε-insensitive loss ‖y − f(x)‖_ε = max{0, ‖y − f(x)‖ − ε}, SVMs can also perform regression. This loss function ignores errors that are smaller than a certain threshold ε > 0, thus creating a tube around the true output.
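As a quick worked example (not on the slide itself), the ε-insensitive loss is one line of R; the function name eps_loss is ours:

```r
# epsilon-insensitive loss: errors inside the eps-tube cost nothing
eps_loss <- function(y, fx, eps = 0.1) {
  pmax(0, abs(y - fx) - eps)
}

eps_loss(y = c(1.0, 1.5, 3.0), fx = c(1.05, 1.2, 2.0))
#> 0.0 0.2 0.9
```

Only the third point lies outside the ε-tube, so only it contributes to the loss.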

9 Example: lm v. svm

10 (figure only, no text)

11 Again, SVM in R
–e1071: the svm() function provides a rigid interface to libsvm, along with visualization and parameter-tuning methods.
–kernlab features a variety of kernel-based methods and includes an SVM method based on the optimizers used in libsvm and bsvm.
–klaR includes an interface to SVMlight, a popular SVM implementation that additionally offers classification tools such as Regularized Discriminant Analysis.
–svmpath – you get the idea…
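Slide 9's lm-vs-svm comparison can be reproduced as a minimal sketch; the slides don't say which data were used, so the built-in cars dataset stands in here:

```r
# Compare a global linear fit with an eps-regression SVM on the cars data
library(e1071)

fit_lm  <- lm(dist ~ speed, data = cars)
fit_svm <- svm(dist ~ speed, data = cars,
               type = "eps-regression",   # the eps-insensitive loss from slide 8
               kernel = "radial", epsilon = 0.1, cost = 1)

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(fit_lm, col = "blue")                                 # global, linear
ord <- order(cars$speed)
lines(cars$speed[ord], predict(fit_svm)[ord], col = "red")   # local, nonlinear

# e1071 also ships tune(), which grid-searches hyperparameters by cross-validation
tuned <- tune(svm, dist ~ speed, data = cars,
              ranges = list(epsilon = c(0.1, 0.5), cost = c(1, 10)))
```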

12 KNN is local – right? K-nearest neighbors is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., distance functions). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the beginning of the 1970s.

13 Distance… A simple implementation of KNN regression calculates the average of the numerical target over the K nearest neighbors. Another approach uses an inverse-distance-weighted average of the K nearest neighbors. Choosing K matters! KNN regression uses the same distance functions as KNN classification. See knn.reg (in FNN) and also kknn (a sketch of both follows below): http://cran.r-project.org/web/packages/kknn/kknn.pdf
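A minimal sketch of both KNN-regression flavors mentioned above, again assuming the cars data:

```r
# KNN regression two ways: plain average (FNN::knn.reg) and
# inverse-distance-weighted average (kknn with kernel = "inv")
library(FNN)
library(kknn)

# with test omitted, knn.reg predicts by leave-one-out over the training set
fit_knn <- knn.reg(train = cars["speed"], y = cars$dist, k = 5)
head(fit_knn$pred)

fit_kknn <- kknn(dist ~ speed, train = cars, test = cars,
                 k = 5, kernel = "inv")   # weights neighbors by 1/distance
head(fitted(fit_kknn))
```

Varying k trades variance (small k) against bias (large k), which is why choosing K matters.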

14 Classes of local regression: locally (weighted) scatterplot smoothing
–LOESS
–LOWESS
Fitting is done locally: for the fit at point x, the fit is made using points in a neighborhood of x, weighted by their distance from x (with differences in 'parametric' variables being ignored when computing the distance).

15 (figure only, no text)

16 Classes of local regression: the size of the neighborhood is controlled by α (set by span). For α < 1, the neighborhood includes proportion α of the points, and these have tricubic weighting. For α > 1, all points are used, with the 'maximum distance' assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.

17 Classes of local regression: for the default family, fitting is by (weighted) least squares. For family = "symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that, as the initial value is the least-squares fit, this need not be a very resistant fit. It can be important to tune the control list to achieve acceptable speed.
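A minimal loess sketch on the cars data, illustrating both knobs from the last two slides, span (α) and family:

```r
# span controls the neighborhood size alpha; family = "symmetric"
# adds Tukey-biweight M-estimation for resistance to outliers
fit_narrow <- loess(dist ~ speed, data = cars, span = 0.3)   # wiggly, very local
fit_wide   <- loess(dist ~ speed, data = cars, span = 0.9)   # smooth, near-global
fit_robust <- loess(dist ~ speed, data = cars, span = 0.75,
                    family = "symmetric")

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
ord <- order(cars$speed)
lines(cars$speed[ord], fitted(fit_narrow)[ord], col = "red")
lines(cars$speed[ord], fitted(fit_wide)[ord], col = "blue")
```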

18 Friedman's super smoother (supsmu, formerly in modreg, now in stats) is a running-lines smoother which chooses between three spans for the lines. The running-lines smoothers are symmetric, with k/2 data points on each side of the predicted point, and values of k of 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. If span is specified, a single smoother with span span * n is used.

19 Friedman: the best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running-lines smoother and the final prediction chosen by linear interpolation. "For small samples (n < 40) or if there are substantial serial correlations between observations close in x-value, then a pre-specified fixed-span smoother (span > 0) should be used. Reasonable span values are 0.2 to 0.4."
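A short supsmu sketch showing both the default cross-validated spans and a forced fixed span:

```r
# Friedman's super smoother: spans chosen by cross-validation by default;
# a fixed span (here 0.3) is advisable for small or autocorrelated samples
fit_cv    <- supsmu(cars$speed, cars$dist)
fit_fixed <- supsmu(cars$speed, cars$dist, span = 0.3)

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(fit_cv, col = "red")
lines(fit_fixed, col = "blue")
```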

20 Local non-param: lplm (in the Rearrangement package) – a local nonparametric method, the local linear regression estimator with box kernel (default), for conditional mean functions.

21 Ridge regression
–addresses ill-posed regression problems using filtering approaches (e.g. high-pass)
–often called "regularization"
–lm.ridge (in MASS)
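A minimal lm.ridge sketch; the slide names no dataset, so the longley data (the classic collinear example from the MASS documentation) stands in:

```r
# Ridge regression over a grid of regularization strengths lambda
library(MASS)

fit_ridge <- lm.ridge(Employed ~ ., data = longley,
                      lambda = seq(0, 0.1, by = 0.001))
select(fit_ridge)   # lambda suggested by the HKB, L-W and GCV criteria
matplot(fit_ridge$lambda, coef(fit_ridge), type = "l",
        xlab = "lambda", ylab = "coefficient")   # shrinkage paths
```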

22 Quantile regression
–is desired if conditional quantile functions are of interest. One advantage of quantile regression, relative to ordinary least squares regression, is that the quantile regression estimates are more robust against outliers in the response measurements.
–In practice we often prefer using different measures of central tendency and statistical dispersion to obtain a more comprehensive analysis of the relationship between variables.
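The slide names no package; quantreg's rq() is the usual choice, so a hedged sketch with it:

```r
# Quantile regression: fit conditional quantiles rather than the mean
library(quantreg)

fit_med <- rq(dist ~ speed, tau = 0.5, data = cars)              # median regression
fit_q   <- rq(dist ~ speed, tau = c(0.1, 0.5, 0.9), data = cars) # several quantiles
summary(fit_med)   # estimates robust to outliers in the response dist
```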

23 More…
–Partial Least Squares Regression (PLSR): mvr (in pls)
–Principal Component Regression (PCR)
–Canonical Powered Partial Least Squares (CPPLS)

24 PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all. On the other hand, PLSR does take the response variable into account, and therefore often leads to models that are able to fit the response variable with fewer components. Whether or not that ultimately translates into a better model, in terms of its practical use, depends on the context.
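A minimal sketch contrasting PCR and PLSR with the pls package (whose mvr() underlies both pcr() and plsr()); the yarn data shipped with pls stands in:

```r
# PCR builds components from the predictors alone; PLSR also uses the response
library(pls)

data(yarn)   # NIR spectra -> yarn density
fit_pcr  <- pcr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
fit_plsr <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")

RMSEP(fit_pcr)    # compare cross-validated error per number of components;
RMSEP(fit_plsr)   # PLSR often reaches low error with fewer components
```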

25 Splines: smooth.spline, splinefun (in stats, formerly modreg) and ns (in splines)
–http://www.inside-r.org/r-doc/splines
A spline is a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (known as knots).

26 Splines: for interpolation, splines are often preferred to polynomial interpolation – they yield similar results to interpolating with higher-degree polynomials while avoiding instability due to overfitting. Features: simplicity of construction, ease and accuracy of evaluation, and capacity to approximate complex shapes. Most common: the cubic spline, i.e., of degree 3 – in particular, the cubic B-spline.

27 cars (figure)
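Slide 27 presumably shows spline fits on the built-in cars data (the same example used in the smooth.spline documentation); a minimal sketch:

```r
# Smoothing spline: smoothing parameter picked by generalized CV by default
fit_ss <- smooth.spline(cars$speed, cars$dist)

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(fit_ss, col = "red")

# Natural cubic spline basis inside an ordinary lm, via splines::ns
library(splines)
fit_ns <- lm(dist ~ ns(speed, df = 4), data = cars)
```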

28 Smoothing / local …
–https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/00Index.html
–http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

