What Could We Do Better? Alternative Statistical Methods
Jim Crooks and Xingye Qiao

Review of the Statistical Model
We take measurements of the beam displacement, y, at times t_1, …, t_n. What we actually observe is Y_j = y(t_j) + ε_j, which is a noisy version of y:
– ε_j is the error resulting from imperfect measurement at time t_j

A Statistical Model for Displacement
Under the spring model it is assumed that the displacement over time is governed by the spring equation ÿ(t) + C ẏ(t) + K y(t) = 0, with damping parameter C and stiffness parameter K. So we could write: Y_j = y(t_j; C, K) + ε_j.

A Statistical Model for Displacement
Remember the assumptions:
– Data at different time points are independent
– Residuals are normally distributed
– Residual variance is constant over time
With these assumptions the model can be written: Y_j = y(t_j; C, K) + ε_j, with the ε_j independent and identically distributed N(0, σ²).
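
To make the model concrete, here is a minimal MATLAB sketch that simulates data from it for a single replicate. Everything in it is a hypothetical stand-in, not workshop code: the parameter values, the initial conditions y(0) = 1 and ẏ(0) = 0, and the closed-form underdamped solution of the spring equation used in place of a numerical ODE solver.

% Closed-form displacement for y'' + C*y' + K*y = 0, y(0) = 1, y'(0) = 0,
% in the underdamped case K > C^2/4 (an assumed, hypothetical setup):
yspring = @(C, K, t) exp(-C*t/2) .* (cos(sqrt(K - C^2/4)*t) ...
    + C/(2*sqrt(K - C^2/4)) * sin(sqrt(K - C^2/4)*t));
C = 0.5; K = 20; sigma = 0.1;                          % hypothetical "true" values
tobs = linspace(0, 5, 200)';                           % measurement times t_1, ..., t_n
yobs = yspring(C, K, tobs) + sigma*randn(size(tobs));  % Y_j = y(t_j; C, K) + eps_j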

What if we have Replicates?
We may have repeated measurements of the same beam. Notation: let t_j be the j-th time point; then i indexes the repeats at t_j. Denote the i-th repeated measurement of the beam displacement at t_j by Y_i(t_j) = Y_ij. If we believe C and K are the same across replicates then we may write the model as: Y_ij = y(t_j; C, K) + ε_ij, with the ε_ij independent over time and replicate.

The Likelihood
Because of our independence assumption, the likelihood of the model (which we think of as a function of the parameters C and K, not the data) is the product of the individual density functions evaluated at the data:
L(C, K) = Π_ij N(Y_ij; y(t_j; C, K), σ²),
where N(x; μ, σ²) denotes the normal density with mean μ and variance σ² evaluated at x.
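
As a sketch, the log-likelihood is then just a sum of log normal densities. The snippet below assumes σ² is known and reuses the hypothetical yspring, tobs, and yobs from the simulation sketch above (single replicate, so the product runs over j only).

% log L(C, K) = sum_j log N(Y_j; y(t_j; C, K), sigma^2), written out by hand:
loglik = @(C, K) sum(-0.5*log(2*pi*sigma^2) ...
    - (yobs - yspring(C, K, tobs)).^2 / (2*sigma^2));
loglik(0.5, 20)   % evaluate at the hypothetical true parameter values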

Maximum Likelihood Estimates
The Maximum Likelihood Estimates (MLEs) for C and K, denoted Ĉ and K̂, are the values of C and K that maximize the likelihood function. Given σ² known, the MLEs are the same as what you'd get with a least-squares procedure (the former tends to justify the use of the latter).
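
A minimal sketch of the maximization, under the same assumptions: with σ² known, maximizing the likelihood is the same as minimizing the sum of squared residuals, which can be handed to MATLAB's fminsearch. The starting values are hypothetical, and this is a stand-in rather than the workshop's own fitting code.

sse = @(p) sum((yobs - yspring(p(1), p(2), tobs)).^2);  % p = [C; K]; -log L up to constants
phat = fminsearch(sse, [1; 15]);                        % hypothetical starting guess
Chat = phat(1); Khat = phat(2);                         % the MLEs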

How Good is the Model?
We can assess the goodness of fit by using the spring model to predict the observed measurements. The predicted (AKA fitted) values ŷ(t) are obtained by evaluating the spring model with Ĉ and K̂: ŷ(t) = y(t; Ĉ, K̂). We can compare the fitted values at the observed times, ŷ_ij = ŷ(t_j), to the observed values Y_ij.
Run the MATLAB file inv_beam.m by typing:
> inv_beam
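
A sketch of the fitted-value comparison, as a stand-in for inv_beam.m (whose contents are not shown here), using Chat and Khat from the fminsearch sketch above:

yfit = yspring(Chat, Khat, tobs);        % fitted values yhat(t_j)
plot(tobs, yobs, '.', tobs, yfit, '-');  % data vs. fitted spring model
xlabel('time'); ylabel('displacement'); legend('observed', 'fitted');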

Spring Model

Model Residuals
We need to know the difference between the beam displacement data and our model's predictions for the beam displacement: e_ij = Y_ij − ŷ_ij. These are called the model residuals. The residuals are our best guess for the values of ε_ij. Hence from our current model we would expect the e_ij to look independent and normally distributed with constant variance.

Spring Model Residuals
Are the residuals normally distributed? Are they independent (i.e., is there correlation in time)? Is their variance constant?
Run the MATLAB file plotresidual.m by typing:
> plotresidual
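
A sketch of the three checks, as a stand-in for plotresidual.m, using yfit from the sketch above (qqplot requires the Statistics and Machine Learning Toolbox):

e = yobs - yfit;                                    % model residuals e_j
subplot(1, 3, 1); plot(tobs, e, '.');               % constant variance? trends in time?
subplot(1, 3, 2); plot(e(1:end-1), e(2:end), '.');  % lag-1 scatter: serial correlation?
subplot(1, 3, 3); qqplot(e);                        % roughly straight line suggests normality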

Spring Model Residuals

Spring Model QQ-plot

Coefficient of Determination R²
One criterion to use when judging a model is the fraction of the variability in the data it can explain:
R² = 1 − SSE/SSTot
SSTot = Σ_ij (Y_ij − Ȳ)² is the total variability in the data; SSE = Σ_ij (Y_ij − ŷ_ij)² is the variability left over after fitting the model (SSE ≤ SSTot). So R² represents the fraction of variability in the data that is explained by the model.

Coefficient of Determination R²
In the example shown above we can find that R² ≈ 52%. This means the spring model accounts for about 52% of the variability in our displacement measurements. Is 52% a lot?

Coefficient of Determination R²
Brief aside: note that we can also get an estimate of σ² from SSE: σ̂² = SSE/(n − df), where n is the number of data points and df is the number of degrees of freedom (AKA the number of unknown parameters).
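
A sketch of both quantities, reusing the residuals e from the diagnostics sketch above; df = 2 here because the spring model has two unknown parameters, C and K:

SSE   = sum(e.^2);                    % variability left over after fitting
SSTot = sum((yobs - mean(yobs)).^2);  % total variability in the data
R2    = 1 - SSE/SSTot;                % coefficient of determination
df    = 2;                            % number of unknown parameters (C and K)
sigma2hat = SSE/(length(yobs) - df);  % estimate of sigma^2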

How Good is this Model?
Is R² = 52% any good? It depends. It can be useful (or even necessary) to set up a naïve straw-man alternative against which to compare a physical model. There are many possible alternatives, and choosing between them is subjective. To illustrate, we will use a smoothing spline alternative.

Smoothing Splines
A cubic spline is a function that is a piecewise cubic polynomial:
– Between each sequential pair of time points the function is a cubic polynomial
– At each time point the function is continuous and has continuous first and second derivatives
– The time points are called knots

Smoothing Splines
A smoothing spline is a type of cubic spline where:
– The time points are specifically those at which measurements are made, t_j
– Given (y_j, t_j) for all j, the spline s is determined to be that cubic spline that minimizes Σ_j (y_j − s(t_j))² + α ∫ s''(t)² dt
– Here α is called the smoothing parameter
What happens to the smoothness as α → ∞?

Smoothing Splines
The value of α parameterizes the relative importance of smoothness to fit:
– Larger values of α result in a bigger penalty for curvature and hence in a smoother fit that may not follow the data closely (as α → ∞: a straight line)
– Smaller values of α result in a wigglier spline that more closely follows the data (as α → 0: an exact interpolator)
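
A sketch of a smoothing-spline fit using csaps from MATLAB's Curve Fitting Toolbox. Note that csaps's smoothing parameter p in [0, 1] runs opposite to α (roughly α = (1 − p)/p): p = 1 gives the exact interpolator and p = 0 the least-squares straight line. The value used below is hypothetical.

p  = 0.99;                               % hypothetical smoothing level
sp = csaps(tobs, yobs, p);               % fit the smoothing spline to the data
yspl = fnval(sp, tobs);                  % evaluate the spline at the t_j
plot(tobs, yobs, '.', tobs, yspl, '-');  % data vs. fitted spline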

Smoothing Splines
But how do we choose the value of α? Another choice without an objectively correct answer! One useful answer is the value that minimizes the leave-one-out predictive error:
– Fit the spline to all the displacement data except one point
– Use the spline to predict the displacement at this time point
– Repeat over all displacement points and sum the squared prediction errors
This is called leave-one-out cross-validation (sketched below).
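
A sketch of the leave-one-out loop over a hypothetical grid of csaps smoothing levels; this is a stand-in for splineplot.m, whose contents are not shown here.

pgrid = [0.5 0.9 0.99 0.999 0.9999];  % hypothetical candidate smoothing levels
n  = length(tobs);
cv = zeros(size(pgrid));
for k = 1:numel(pgrid)
    for j = 1:n
        keep = [1:j-1, j+1:n];                             % leave out point j
        sp = csaps(tobs(keep), yobs(keep), pgrid(k));      % fit without (t_j, Y_j)
        cv(k) = cv(k) + (yobs(j) - fnval(sp, tobs(j)))^2;  % predictive squared error
    end
end
[~, kbest] = min(cv);  % index of the smoothing level with smallest CV error
pbest = pgrid(kbest);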

Fitted Spline

Fitted Spline Residuals
Are the residuals normally distributed? Are they independent (i.e., is there correlation in time)? Is their variance constant?
You can make your own cross-validated spline using the MATLAB file splineplot.m. Don't do it now!!!

Spline Residuals

Spline QQ-plot

Compare the Spring Model to Splines
If you compare residuals, those for the spline are generally smaller (i.e., it fits the data better). The spline's coefficient of determination is R² ≈ 88%. Our spring model explains less of the variation than does a naive spline (52% < 88%).
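
For reference, the spline's R² can be computed exactly as before, reusing yspl and SSTot from the sketches above; on the workshop's beam data it comes out near 88%, though the simulated data in these sketches will give a different number.

R2spline = 1 - sum((yobs - yspl).^2) / SSTot;  % fraction of variability explained by the spline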

Compare the Spring Model to Splines
Is the difference big enough to reject the use of the spring model? Again, this is subjective, but we can use statistical tests to answer the question as objectively as possible. Such tests are beyond the scope of this workshop, but if you are interested in supercharging your group project using them, please ask me.

Good Luck!!!