Course round-up: Statistical model building. Marian Scott, University of Glasgow. Glasgow, Aug 2013.

Step 1: Why do you want to build a model?
- What is your objective?
- What data are available and how were they collected?
- Is there a natural response or outcome, and are there other explanatory variables or covariates?

Modelling objectives
- Explore relationships
- Make predictions
- Improve understanding
- Test hypotheses

[Diagram: conceptual system, data, model and policy, connected by inputs & parameters, model results and feedbacks]

Value judgements
- Different criteria, of unequal importance.
- The key comparison is often with observational data (RSS, AIC, ...), but such comparisons must include the model uncertainties and the uncertainties on the observational data.
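As a rough illustration of comparing candidate models against observations, the sketch below computes RSS and AIC for a constant-mean model and a straight-line model. The data are made up, the function names are hypothetical, and the AIC form assumes independent Gaussian errors (it is AIC up to an additive constant). It does not account for the measurement uncertainty the slide asks for.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rss(y, fitted):
    """Residual sum of squares between observations and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic_gaussian(rss_value, n, k):
    """AIC up to an additive constant, assuming independent Gaussian errors."""
    return n * math.log(rss_value / n) + 2 * k

# Illustrative observations (made up for this sketch)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(y)

# Model A: constant mean. Model B: straight line.
mean_y = sum(y) / n
rss_const = rss(y, [mean_y] * n)
a, b = fit_line(x, y)
rss_line = rss(y, [a + b * xi for xi in x])

aic_const = aic_gaussian(rss_const, n, k=2)  # mean + error variance
aic_line = aic_gaussian(rss_line, n, k=3)    # intercept + slope + error variance
```

The model with the lower AIC is preferred under this criterion; RSS alone always favours the larger model, which is why a penalised criterion is used.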

Questions we ask about models
- Is the model valid?
- Are the assumptions reasonable?
- Does the model make sense based on best scientific knowledge?
- Is the model credible?
- Do the model predictions match the observed data?
- How uncertain are the results?

Stages in modelling
- Design and conceptualisation:
  - Visualisation of structure
  - Identification of processes
  - Choice of parameterisation
- Fitting and assessment:
  - Parameter estimation (calibration)
  - Goodness of fit

A visual model: atmospheric flux of pollutants
- Atmospheric pollutants dispersed over Europe
- In the 1970s, considerable environmental damage was caused by acid rain
- International action
- Development of the EMEP programme, models and measurements

The mathematical flux model
- L: Monin-Obukhov length
- u*: friction velocity of wind
- c_p: constant (= 1.01)
- ρ: constant (= 1246 g m^-3)
- T: air temperature (in kelvin)
- k: constant (= 0.41)
- g: gravitational acceleration (= 9.81 m/s^2)
- H: rate of heat transfer per unit area
- gasht: current height at which measurements are taken
- d: zero-plane displacement
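The equation on this slide did not survive in the transcript. As a hedged sketch only, the code below evaluates the Monin-Obukhov length using the common textbook form L = -(u*^3 ρ c_p T) / (k g H), which uses exactly the symbols listed above. That this is the form used in the slides is an assumption, as are the units (c_p in J g^-1 K^-1, ρ in g m^-3, H in W m^-2) and the function name.

```python
def monin_obukhov_length(u_star, T, H, rho=1246.0, c_p=1.01, k=0.41, g=9.81):
    """Monin-Obukhov length in metres (textbook form; assumed, not from the slide).

    u_star: friction velocity (m/s)
    T:      air temperature (K)
    H:      heat flux per unit area (W m^-2), positive upwards
    rho:    air density (g m^-3); c_p: specific heat (J g^-1 K^-1)
    """
    if H == 0:
        raise ValueError("H = 0 (neutral conditions): L is undefined")
    return -(u_star ** 3 * rho * c_p * T) / (k * g * H)

# Illustrative inputs (made up): daytime upward heat flux, night-time downward flux
L_unstable = monin_obukhov_length(u_star=0.3, T=288.0, H=100.0)
L_stable = monin_obukhov_length(u_star=0.3, T=288.0, H=-20.0)
```

With this sign convention, an upward heat flux gives a negative L (unstable conditions) and a downward flux gives a positive L (stable conditions).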

What would a statistician do if confronted with this problem?
- Look at the data
- Understand the measurement processes
- Think about how the scientific knowledge and the conceptual model relate to what we have measured

Step 2: Understand your data
- Study your data
- Learn its properties
- Tools: graphical

Measured atmospheric fluxes for 1997
- The measured fluxes for 1997 are still noisy. Is there a statistical signal, and at what timescale?

Key properties of any measurement
- Accuracy refers to the deviation of the measurement from the true value.
- Precision refers to the variation in a series of replicate measurements (obtained under identical conditions).
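The two definitions above can be made concrete with a small sketch: accuracy is summarised here as the bias of the mean against a known true value, and precision as the standard deviation of the replicates. The data and the function name are made up for illustration.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread): bias measures accuracy, spread measures precision."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

true_value = 10.0

# Replicates centred on the truth but widely scattered
accurate_imprecise = [8.9, 11.2, 9.6, 10.7, 9.8, 9.8]
# Replicates tightly clustered but offset from the truth
inaccurate_precise = [12.0, 12.1, 11.9, 12.0, 12.1, 11.9]

bias_a, spread_a = accuracy_and_precision(accurate_imprecise, true_value)
bias_b, spread_b = accuracy_and_precision(inaccurate_precise, true_value)
```

In practice the true value is rarely known, so accuracy is usually assessed against a reference material or standard.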

Accuracy and precision
[Figure: panels labelled Accurate, Imprecise, Inaccurate, Precise]

Data properties
- Nature and distribution of the data: continuous, counts, ...
- Normal, exponential, Poisson; a transformation may be needed
- Missing data, outliers, limits of detection
- Use pictures to explore

Step 3: Build the statistical model
- Outcomes or responses
- Causes or explanations: these are the conditions or environment within which the outcomes or responses have been observed (the covariates).
- This has been the focus of much of the week: whether a linear model, a smooth flexible model, a time series model, a Bayesian model, ...

Are you a Bayesian?
- What does that mean? It means you have prior information (belief) that you want to include in your statistical model.
- You need to find a way of capturing this in the prior distribution.
- The model output is then a posterior distribution on the quantity of interest, which automatically incorporates uncertainty.
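A minimal sketch of the prior-to-posterior step, assuming the simplest conjugate case: a normal prior on an unknown mean and normally distributed observations with known standard deviation. The numbers and the function name are illustrative, not from the course material.

```python
import statistics

def normal_posterior(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean and sd for a normal mean with known observation sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / obs_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * statistics.mean(data)
    )
    return post_mean, post_var ** 0.5

# Prior belief: mean about 5, quite uncertain (sd 2). Data: four observations.
data = [7.1, 6.8, 7.4, 6.9]
post_mean, post_sd = normal_posterior(prior_mean=5.0, prior_sd=2.0,
                                      data=data, obs_sd=1.0)
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, and the posterior standard deviation is smaller than both the prior standard deviation and the standard error of the data alone.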

Calibration: using the data
- A good idea, if possible, is to have a training set and a test set of data: split the data (90%/10%).
- Fit the model using the training set; evaluate the model using the test set.
- Why? Because if we assess how well the model performs on the data that were used to fit it, we are being over-optimistic.
- Other methods: bootstrap and jackknife.
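The 90%/10% split can be sketched as follows. The data are simulated (a straight line plus noise), the model is a simple least-squares line, and all function names are hypothetical. The point is only that the fit uses the training rows and the error is then reported on rows the fit never saw.

```python
import random

def train_test_split(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Least-squares intercept and slope for (x, y) pairs."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_squared_error(rows, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows)

# Illustrative data: y = 2 + 0.5 x plus Gaussian noise (made up)
rng = random.Random(42)
data = [(x, 2.0 + 0.5 * x + rng.gauss(0, 1)) for x in range(100)]

train, test = train_test_split(data, test_fraction=0.1)
intercept, slope = fit_line(train)
mse_train = mean_squared_error(train, intercept, slope)  # optimistic
mse_test = mean_squared_error(test, intercept, slope)    # honest assessment
```

With only 10 test rows the test error is itself noisy, which is one reason resampling methods such as the bootstrap and jackknife are mentioned as alternatives.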

Which variables to include
- Use your science knowledge
- Use pictures to look for patterns
- Maybe use some of the more algorithmic ways to select the set (stepwise, BSR, ...)
- How to compare models? Nested models (ANOVA, likelihood ratio test)
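For nested linear models, the ANOVA comparison reduces to an F statistic built from the two residual sums of squares. The sketch below assumes Gaussian errors and that the smaller model is a special case of the bigger one; the RSS values are invented and the function name is hypothetical.

```python
def nested_f_statistic(rss_small, rss_big, p_small, p_big, n):
    """F statistic for comparing a smaller model nested inside a bigger one.

    rss_small, rss_big: residual sums of squares of the two fits
    p_small, p_big:     number of regression coefficients in each model
    n:                  number of observations
    """
    if p_big <= p_small or n <= p_big:
        raise ValueError("need p_small < p_big < n")
    numerator = (rss_small - rss_big) / (p_big - p_small)
    denominator = rss_big / (n - p_big)
    return numerator / denominator

# Illustrative values (made up): adding two covariates reduces RSS from 120 to 80
f_stat = nested_f_statistic(rss_small=120.0, rss_big=80.0,
                            p_small=2, p_big=4, n=30)
```

The statistic is compared with an F distribution on (p_big - p_small, n - p_big) degrees of freedom; a large value suggests the extra variables are worth including.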

Uncertainty and sensitivity analysis

Uncertainty (in variables, models, parameters, data)
- What are uncertainty and sensitivity analyses?

Modelling tools: SA/UA
- Sensitivity analysis: determining the amount and kind of change produced in the model predictions by a change in a model parameter.
- Uncertainty analysis: an assessment/quantification of the uncertainties associated with the parameters, the data and the model structure.
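The definition of sensitivity analysis above can be illustrated with a very simple local version: change each parameter by a fixed relative amount, one at a time, and record the change in the model output. The toy model, the baseline values and the 10% perturbation are all assumptions made for this sketch.

```python
def one_at_a_time_sensitivity(model, baseline, relative_change=0.1):
    """Change in model output when each parameter is perturbed on its own."""
    base_output = model(**baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1 + relative_change)
        effects[name] = model(**perturbed) - base_output
    return effects

def toy_model(a, b, c):
    """Stand-in for a real simulator (hypothetical)."""
    return a ** 2 + 3 * b + 0.01 * c

baseline = {"a": 2.0, "b": 1.0, "c": 5.0}
effects = one_at_a_time_sensitivity(toy_model, baseline)
```

Here the output is most sensitive to `a` and barely responds to `c`. This kind of analysis is local: it says nothing about interactions or about behaviour far from the baseline.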

SA flow chart (Saltelli, Chan and Scott, 2000)

Design of the SA experiment
- Simple factorial designs (one at a time)
- Factorial designs (including potential interaction terms)
- Fractional factorial designs
- Important difference: in the context of computer code experiments, random variation due to variation in experimental units does not exist.
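A sketch of how the first two designs differ in the runs they require. The factor names and levels are invented. Because a deterministic computer code returns the same output for the same inputs, each design point is run once, with no replication.

```python
from itertools import product

def full_factorial(levels):
    """Every combination of factor levels (allows interactions to be estimated)."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

def one_at_a_time(levels, baseline):
    """Baseline run plus runs that change a single factor from its baseline."""
    runs = [dict(baseline)]
    for name, values in levels.items():
        for value in values:
            if value != baseline[name]:
                run = dict(baseline)
                run[name] = value
                runs.append(run)
    return runs

# Hypothetical factors and levels
levels = {
    "emission": [0.8, 1.0, 1.2],
    "wind_speed": [2.0, 5.0],
    "temperature": [280.0, 290.0],
}
baseline = {"emission": 1.0, "wind_speed": 2.0, "temperature": 280.0}

factorial_runs = full_factorial(levels)       # 3 * 2 * 2 runs
oat_runs = one_at_a_time(levels, baseline)    # 1 + 2 + 1 + 1 runs
```

The one-at-a-time design is cheaper but cannot detect interactions; a fractional factorial design sits between the two by running a chosen subset of the full grid.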

Global SA
- Global SA apportions the output uncertainty to the uncertainty in the input factors, covering their entire range space.
- A global method evaluates the effect of x_j while all other x_i, i ≠ j, are varied as well.

How is a sampling-based (global) SA implemented?
- Step 1: define the model, input factors and outputs.
- Step 2: assign p.d.f.s to the input parameters/factors and, if necessary, a covariance structure. DIFFICULT
- Step 3: simulate realisations from the parameter p.d.f.s to generate a set of model runs, giving the set of output values.
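The three steps can be sketched as below. The model, the input factors and their distributions are all invented for illustration, and the inputs are sampled independently, so the covariance structure mentioned in Step 2 is not handled here.

```python
import random

# Step 1: define the model, its input factors and its output (toy stand-in)
def model(emission, deposition_rate):
    return emission / deposition_rate

# Step 2: assign a distribution to each input factor (assumed, independent)
input_pdfs = {
    "emission": lambda rng: rng.gauss(100.0, 10.0),
    "deposition_rate": lambda rng: rng.uniform(0.5, 1.5),
}

# Step 3: draw realisations of the inputs and run the model for each draw
def run_experiment(model, input_pdfs, n_runs, seed=1):
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        inputs = {name: draw(rng) for name, draw in input_pdfs.items()}
        runs.append((inputs, model(**inputs)))
    return runs

runs = run_experiment(model, input_pdfs, n_runs=500)
outputs = [output for _, output in runs]
```

Each element of `runs` pairs one set of sampled inputs with the corresponding model output, which is the data layout needed for the analysis stage.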

SA: analysis
- At the end of the computer experiment, the data are of the form (y_ij, x_1i, x_2i, ..., x_ni), where x_1, ..., x_n are the realisations of the input factors.
- Analysis includes regression analysis (on raw and ranked values), standard hypothesis tests of distribution (mean and variance) for subsamples corresponding to given percentiles of x, and analysis of variance.
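A sketch of the raw-versus-ranked comparison, using a made-up model that is monotone but strongly non-linear in one input. For a single input, the slope of a regression on standardised values equals the correlation coefficient, so correlations are used here as a shortcut; that simplification, the model and the function names are assumptions of this sketch.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def ranks(values):
    """Rank of each value (1 = smallest); ties are not treated specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Simulated computer experiment: output depends strongly and non-linearly on x1
rng = random.Random(7)
x1 = [rng.random() for _ in range(200)]
x2 = [rng.random() for _ in range(200)]
y = [math.exp(3 * a) + 0.1 * b for a, b in zip(x1, x2)]

raw = {"x1": pearson(x1, y), "x2": pearson(x2, y)}
ranked = {
    "x1": pearson(ranks(x1), ranks(y)),
    "x2": pearson(ranks(x2), ranks(y)),
}
```

Working on ranks linearises a monotone relationship, so the ranked measure picks up the dominance of `x1` more clearly than the raw one.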

How can SA/UA help?
SA/UA have a role to play in all modelling stages:
- We learn about model behaviour and robustness to change.
- We can generate an envelope of outcomes and see whether the observations fall within the envelope.
- We can tune the model and identify reasons/causes for differences between model and observations.
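The envelope idea can be sketched as follows: take an ensemble of model runs, form a band from the ensemble quantiles at each time step, and count how many observations fall inside it. The ensemble, the observations and the choice of a 5% to 95% band are all assumptions for illustration.

```python
import random
import statistics

def envelope(ensemble, n_quantiles=20):
    """Per-time-step band across runs; with n_quantiles=20 this is 5% to 95%."""
    bands = []
    for values in zip(*ensemble):  # all runs' values at one time step
        cuts = statistics.quantiles(values, n=n_quantiles)
        bands.append((cuts[0], cuts[-1]))
    return bands

def coverage(observations, bands):
    """Fraction of observations that fall inside the envelope."""
    inside = sum(lo <= obs <= hi for obs, (lo, hi) in zip(observations, bands))
    return inside / len(observations)

# Simulated ensemble: 100 runs over 12 time steps (made up)
rng = random.Random(3)
ensemble = [[10.0 + t + rng.gauss(0, 1) for t in range(12)] for _ in range(100)]

# Illustrative observations: on the ensemble centre, except one large departure
observations = [10.0 + t for t in range(12)]
observations[5] += 10.0

bands = envelope(ensemble)
fraction_inside = coverage(observations, bands)
```

Observations that fall outside the envelope, like the one at time step 5 here, point to where the model or its input uncertainties need another look.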

On the other hand: uncertainty analysis
- Parameter uncertainty: usually quantified in the form of a distribution.
- Model structural uncertainty: more than one model may be fitted, expressed as a prior on model structure.
- Scenario uncertainty: uncertainty on future conditions.

An uncertainty example (Ron Smith)
[Figure panels: Original; Mean of 100 simulations; Standard deviation]

An uncertainty example
[Figure panels: CV from 100 simulations; Possible bias from 100 simulations]
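The summaries named in these panels can be computed per location from a set of simulated outputs. The sketch below uses 10 invented values rather than 100, and it assumes "possible bias" means the simulation mean minus the original model output; the slides do not define it.

```python
import statistics

def summarise_simulations(original, simulations):
    """Mean, standard deviation, CV and (assumed) bias for one location."""
    mean = statistics.mean(simulations)
    sd = statistics.stdev(simulations)
    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,           # coefficient of variation
        "bias": mean - original,   # assumed definition of "possible bias"
    }

# Illustrative values for a single grid cell (made up)
original = 50.0
simulations = [48, 52, 55, 47, 53, 51, 49, 56, 54, 50]

summary = summarise_simulations(original, simulations)
```

Mapping the CV shows where the output is relatively most uncertain, while the bias map shows where propagating input uncertainty shifts the result away from the single original run.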

An uncertainty example
- Model sensitivity analysis identifies weak areas
- Lack of knowledge of the accuracy of inputs is a significant problem
- There may be biases in the model output which, although probably small in this case, may be important
- Model emulators have become popular

Take home message
- We are only able to give you a flavour of what might be possible
- Good environmental science and good statistical science are key for all problems
- Think critically: test and re-test your hypotheses and assumptions

Take home message: resources
- Many good books (you have seen some of these over the sessions); not one size fits all
- JISC mail list: Envstat (worth joining)
- The Royal Statistical Society has an Environmental Statistics section, which sometimes holds tutorial meetings on topics.