Lecture 9: Smoothing and filtering data


Time series: smoothing, filtering, rejecting outliers, interpolation
- moving average, splines, penalized splines, wavelets
- autocorrelation in time series: variance increase, pattern generation; ar(), arima() …
Image data

[Figure: data series from the OMS and QCLS instruments]

sig = 5
x0 = 1:100
# Gaussian "signal", standard deviation sig, centered at x = 50
y0 = 1/(sig*sqrt(2*pi))*exp(-(x0-50)^2/(2*sig^2))
plot(x0, y0, type="l", col="green", lwd=3, ylim=c(-.02,.1))
# add noise to y0
x = x0
y = y0 + rnorm(100)/50
points(x, y, pch=16, type="o")

--- 5-point moving average

--- 30-point moving average
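These moving averages can be reproduced with stats::filter; a minimal sketch (overlay colors are arbitrary):

ma5 = stats::filter(y, rep(1/5, 5), sides=2)     # 5-point centered moving average
ma30 = stats::filter(y, rep(1/30, 30), sides=2)  # 30-point: smoother, but flattens the peak badly
lines(x, ma5, col="blue", lwd=2)
lines(x, ma30, col="red", lwd=2)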

Some signal filtering concepts:

What is "feed-forward"? A filter "feeds forward" when data at one time influence the smoothed estimate at neighboring times, smearing features along the series. In this particular example the feed-forward is not too severe, but it can be.
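One way to see the distinction, as a sketch (this is not the slide's own example): a trailing, causal moving average is phase-shifted, while a centered one avoids the phase shift at the cost of letting each point feed forward into the smoothed estimates around it.

ma.trail = stats::filter(y, rep(1/5, 5), sides=1)   # current point + 4 past points: causal
ma.center = stats::filter(y, rep(1/5, 5), sides=2)  # 2 past + 2 future points: centered
plot(x, y, type="o", pch=16, cex=0.5)
lines(x, ma.trail, col="red", lwd=2)    # peak shifted to the right (phase shift)
lines(x, ma.center, col="blue", lwd=2)  # peak stays centered, but "future" data feed in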

More advanced filters. Splines: splines use a collection of basis functions (usually polynomials of order 3 or 4) to represent a functional form for the time series to be filtered. They are fitted piecewise, so they are locally determined. We choose K points in the interior of the domain ("knots") and subdivide it into K+1 intervals. A spline of order m is a piecewise polynomial of degree m − 1 that is continuous through its first m − 2 derivatives; the continuous derivatives give a smooth function. More complex shapes emerge as we increase the degree of the spline and/or add knots.
- Few knots / low degree: the function may be too restrictive (biased) or oversmoothed.
- Many knots / high degree: risk of overfitting, false maxima, etc.
Penalized splines add a penalty for curvature, with strength λ (λ = 0: a regular spline, i.e. interpolation; λ → ∞: a straight line, the linear-regression fit). A fitting sketch follows below.
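A minimal fitting sketch with base R's smooth.spline (the pspline package's sm.spline, used later in these notes, behaves similarly; colors and the spar value are illustrative):

fit = smooth.spline(x, y)                # smoothing parameter chosen automatically
lines(fit, col="purple", lwd=2)
fit.stiff = smooth.spline(x, y, spar=1)  # stronger roughness penalty
lines(fit.stiff, col="orange", lwd=2)    # tends toward the straight-line fit as the penalty grows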

More advanced filters (continued).
Locally-weighted least squares ("lowess", "loess"): fit a polynomial (usually a straight line) to the points in a sliding window, accepting as the smoothed value the fitted value at the central point, with a taper to capture the ends. Points are usually weighted inversely as a function of distance from the window center, very often with the tricube weight (1 − |x|³)³, for x scaled to the range −1 to 1 over the window.
Savitzky-Golay filter: fits a polynomial of order n in a moving window, requiring that the fitted curve at each point have the same moments as the original data up to order n − 1. It partakes of both lowess and penalized-spline features. (It was designed for integrating chromatographic peaks.) Nomenclature: 4.11.11.0, i.e. n.nl.nr.o (polynomial order, points to the left, points to the right, derivative order). It allows direct computation of the derivatives. Parameters are tabulated on the web or computed. Sketches of both smoothers follow below.
(Still to add: Gaussian wavelets, Haar wavelets, and first-derivative Gaussian wavelets.)
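A minimal sketch of both smoothers, assuming the CRAN 'signal' package for Savitzky-Golay (window and span settings are illustrative, not the slide's exact values):

lw = lowess(x, y, f=0.15)             # f = fraction of the data in each local window
lines(lw, col="brown", lwd=2)
library(signal)                       # assumed: CRAN package providing sgolayfilt
sg = sgolayfilt(y, p=4, n=23)         # order-4 polynomial, 23-point window (11 left + 11 right),
lines(x, sg, col="darkgreen", lwd=2)  #   roughly the 4.11.11.0 notation above (derivative order 0)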

sig     noisy_sig   10-point MA   savgol.4.11.11.0   lowess   pspline   supsmu
0.010   0.012       NA            NA                 0.0117   0.012     0.0124
0.010   0.012       0.0132        0.0155             0.0117   0.012     0.0124
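Entries like these can be computed by comparing each smoothed series against the known signal y0; a sketch of the metric only (the exact definitions of the two rows are not given above):

rms = function(est, truth) sqrt(mean((est - truth)^2, na.rm=TRUE))
ma10 = stats::filter(y, rep(1/10, 10), sides=2)   # 10-point MA, as in the table
c(noisy = rms(y, y0),
  ma10 = rms(ma10, y0),
  lowess = rms(lw$y, y0))   # lw from the lowess sketch above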

Rough, but the noise is reduced….

The others are not worth trying…. You expect some attenuation; within that envelope, this is OK – the penalized spline wins.

# Summary:
#X Moving Average: crude, phase shift, peaks severely flattened, ends discarded <Don't use>
## Centered Moving Average: crude, peaks severely flattened, no phase shift*, feed-forward, ends discarded
## Block Averages: not too crude, not phase shifted*, no feed-forward*, conserved properties*, information discarded (Maybe OK)
## Savitzky-Golay: not crude, not phase shifted*, small feed-forward (localized), conserved properties, ends discarded; derivative available
## Locally weighted least squares (lowess/loess): not crude or phase shifted, nice taper at ends, no derivative
## supsmu: analytical properties murky, but a nice smoother for many signals; no derivative
## Penalized splines: effective, differentiable; adjusting the parameters may be tricky
#X Regular splines: either false maxima or oversmoothed <Don't use>
Packages: pspline; sm; sreg (fields)
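supsmu (Friedman's "super smoother"), listed above, is in base R's stats; a one-line sketch on the same series:

ss = supsmu(x, y)
lines(ss, col="magenta", lwd=2)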

Assessing different sources of variance (EPS 236 Workshop, 2014): extracting trends, cycles, etc. by data filtering and conditional averaging. For CO2, the measurement has a high signal-to-noise ratio, but the system (e.g. the atmosphere) has a lot of variability; in other settings the measurement itself has a low signal-to-noise ratio.

"Ancillary measurements", conditional sampling, and suitable filtering or averaging reveal the key features of the data when system variability is the dominant factor. For example, conditioning the wlef record on year, month, hour, and height:

Zum = tapply(wlef[,"value"],
             list(wlef[,"yr"], wlef[,"mo"], wlef[,"hr"], wlef[,"ht(magl)"]),
             median, na.rm=TRUE)
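A self-contained sketch of the same conditional-averaging idea on synthetic data (the data frame df and its columns are hypothetical, standing in for the wlef record):

set.seed(1)
df = data.frame(hr = rep(0:23, times=50),                          # hour of day
                ht = sample(c(30, 122, 396), 1200, replace=TRUE),  # height (m agl)
                value = 380 + 5*sin(2*pi*rep(0:23, times=50)/24) + rnorm(1200))
Zum = tapply(df$value, list(df$hr, df$ht), median, na.rm=TRUE)
dim(Zum)   # 24 hours x 3 heights: the median diurnal cycle at each level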

Noisy data: which filter is "best" (and for what purpose)? Judge by the residuals? By how well events are preserved?

If spar is given, that smoothing parameter is used directly; in the default mode, the sm.spline model is selected using "leave-one-out cross-validation". See the article by Rob Hyndman (http://robjhyndman.com/hyndsight/crossvalidation/) for a description. (Related topic: the Kalman filter.)
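A sketch with base R's smooth.spline, where cv=TRUE requests ordinary leave-one-out CV rather than the default generalized cross-validation (pspline's sm.spline offers a similar cv option):

fit.cv = smooth.spline(x, y, cv=TRUE)
fit.cv$spar      # the selected smoothing parameter
fit.cv$cv.crit   # the minimized cross-validation criterion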

Interpolation: linear (approx; predict.loess); penalized splines; akima's aspline

# Interpolating across a gap in the HIPPO CO2 record
XX = HIPPO.1.1[lsel & l.uct, "UTC"]
YY = HIPPO.1.1[lsel & l.uct, "CO2_OMS"]
ZZ = HIPPO.1.1[lsel & l.uct, "CO2_QCLS"]
YY[1379:1387] = NA                     # knock out a segment to create the gap
require(pspline)
lna1 = !is.na(YY)
YY.i = approx(x=XX[lna1], y=YY[lna1], xout=XX)      # linear interpolation
YY.spl = sm.spline(XX[lna1], YY[lna1])              # penalized smoothing spline
require(akima)
YY.aspline = aspline(XX[lna1], YY[lna1], xout=XX)   # Akima spline
# YY.lowess = lowess(XX[lna1], YY[lna1], f=.1)
ddd = data.frame(x=XX[lna1], y=YY[lna1])
YY.loess = loess(y ~ x, data=ddd, span=.055)
YY.loess.pred = predict(YY.loess, newdata=data.frame(x=XX))  # loess prediction at all XX
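A usage sketch overlaying the interpolants on the gapped series (colors arbitrary; predict on the sm.spline fit returns a one-column matrix, hence the c()):

plot(XX, YY, pch=16, cex=0.4)
lines(YY.i, col="blue")                          # linear interpolation
lines(YY.aspline, col="red")                     # Akima spline
lines(XX, c(predict(YY.spl, XX)), col="purple")  # penalized spline
lines(XX, YY.loess.pred, col="green")            # loess prediction across the gap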

"Leave-one-out" CV: minimize CV to select the "best" model. Source: http://robjhyndman.com/hyndsight/crossvalidation/
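The criterion being minimized, in the notation of Hyndman's note: for a linear smoother with hat matrix H, the leave-one-out residuals can be obtained from a single fit via the standard identity

\mathrm{CV} \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(-i)}\bigr)^2
\;=\; \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_{ii}}\right)^2 ,

where \hat{y}_{(-i)} is the prediction at x_i with the i-th point left out, and h_{ii} is the i-th diagonal element of H.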