Some statistical ideas Marian Scott Statistics, University of Glasgow June 2012.

What shall we cover? Why we might need some statistical skills; statistical inference (what is it?); how to handle variation; exploring data; probability models; inferential tools (hypothesis tests and confidence intervals); regression and relationships.

Why quantify? We need statistical skills to: make sense of numerical information; summarise data; present results (graphically); test hypotheses; construct models. Decision making: which areas should be restricted? Prediction: what is the trend in temperature, and can we predict its level in 2050? Decision making: is it safe to eat fish? Regulatory: have emission control agreements reduced air pollutants? Understanding: when did things happen in the past?

Observed nitrogen signals in rivers, lakes and groundwater in Europe (EEA). What is a trend and how should we evaluate it? How sure are we?

Trends in seasons over Europe (Global Change Biology, 2006): 21 countries, 125,000 studies, 542 plant and 19 animal species. Spring is on average 6 to 8 days earlier than it was 30 years ago. In an analysis of 254 national time series, the pattern of observed change in spring matches measured national warming (correlation coefficient –0.69, P<0.001). What do the statistical terms mean?

Spatial patterns of change: spatial patterns of change may be important. Changes in the start and end of the growing season between two years (1961, 2004) are heterogeneous.

The statistical process: a process that allows inferences about properties of a large collection of things (the population) to be made based on observations on a small number of individuals belonging to the population (the sample). The use of valid statistical sampling techniques increases the chance that a set of specimens (the sample, in the collective sense) is collected in a manner that is representative of the population.

What is the population? The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20-g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability

What are the sampling units? In some cases, sampling units are discrete entities (e.g. animals, trees), but in others the sampling unit might be investigator-defined and arbitrarily sized. Example: technetium in shellfish. The objective here is to provide a measure (the average) of technetium in shellfish (e.g. lobsters for human consumption) for the west coast of Scotland. The population is all lobsters on the west coast. The sampling unit is an individual animal. Variability exists amongst the sampling units and hence within the population.

Some other terminology. Parameter: a number that describes the population, usually what we want to know about. Example: the population mean technetium level. Statistic: a number that we calculate from the sample. Example: the sample mean technetium level. Variability exists amongst the sampling units and hence within the population, so if we imagine hypothetically drawing different samples, each would in general give a different sample mean.
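The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below is made up (lognormal values in arbitrary units, not real technetium measurements), and the sample size of 30 and the random seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 made-up "technetium levels" (arbitrary units).
population = [random.lognormvariate(3.0, 0.5) for _ in range(10_000)]

# The parameter: one fixed number describing the whole population.
population_mean = statistics.fmean(population)

# The statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.fmean(random.sample(population, 30)) for _ in range(5)
]

print(f"population mean (parameter): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each run of the loop draws a different set of 30 animals, so the five sample means scatter around the single population mean.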

Summarising data- means, medians and other such statistics

Data types. Numerical: a variable may be either continuous or discrete. For a discrete variable, the values taken are whole numbers (e.g. number of invertebrates). For a continuous variable, the values taken are real numbers (e.g. pH, alkalinity, DOC, temperature). Categorical: a limited number of categories or classes exist, and each member of the sample belongs to one and only one of the classes. Compliance is a nominal categorical variable since the categories are unordered. Level of diluent (e.g. recorded as low, medium, high) would be an ordinal categorical variable since the different classes are ordered.

plotting data- histograms, boxplots, stem and leaf plots, scatterplots

Figure labels: median, lower quartile, upper quartile.
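The median and quartiles can be computed directly. The sketch below uses made-up measurements with one unusually large value, and the "inclusive" quartile method of Python's statistics module, which is one of several conventions in use.

```python
import statistics

# Made-up measurements; the last value is a deliberate outlier.
values = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 3.8, 4.2, 19.5]

mean = statistics.fmean(values)
median = statistics.median(values)

# quantiles(n=4) returns the three cut points: lower quartile, median, upper quartile.
lower_quartile, middle, upper_quartile = statistics.quantiles(
    values, n=4, method="inclusive"
)

print(f"mean           = {mean:.2f}")
print(f"median         = {median:.2f}")
print(f"lower quartile = {lower_quartile:.2f}")
print(f"upper quartile = {upper_quartile:.2f}")
```

With these values the single outlier pulls the mean above the upper quartile, while the median and quartiles are barely affected, which is why they are often preferred for skewed data.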

Example: bathing water quality. All bathing water sites are classified as Excellent, Good, Sufficient or Poor in terms of the quantities of two microbiological indicator bacteria: Faecal Streptococci (FS) and Faecal Coliforms (FC). Sufficient is the minimum standard that bathing water sites are required to meet. Classification for each site is based on the 90th and 95th percentiles of samples over the most recent 4 bathing seasons. (Joint work with Ruth Haggarty and Claire Ferguson.)

Preliminary analysis: there is considerable variation, both across different sites and within the same site across different years. The distribution of the data is highly skewed, with evidence of outliers and in some cases bimodality.

probability models- the Normal especially

checking distributional assumptions

Modelling continuous variables, checking normality: a Normal probability plot should show a straight line if the data are Normally distributed. The p-value of a test is also reported (null hypothesis: the data are Normally distributed).
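The coordinates of a Normal probability plot can be built by hand, which may help show why a straight line is expected. The sketch below is an illustration only: the plotting positions (i - 0.5)/n are one common convention, the two data sets are simulated, and the correlation of the plotted points is used as a rough straightness summary rather than as the formal test whose p-value the slide mentions.

```python
import math
import random
import statistics


def normal_plot_points(data):
    """Return (theoretical Normal quantile, ordered observation) pairs."""
    n = len(data)
    std_normal = statistics.NormalDist()
    ordered = sorted(data)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the plotted pairs; values near 1 mean near-linear."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(2)
normal_like = [random.gauss(10, 2) for _ in range(200)]     # simulated Normal data
skewed = [random.lognormvariate(0, 1) for _ in range(200)]  # simulated skewed data

r_normal = straightness(normal_plot_points(normal_like))
r_skewed = straightness(normal_plot_points(skewed))

print(f"Normal-like data: r = {r_normal:.3f}")
print(f"skewed data:      r = {r_skewed:.3f}")
```

Plotting the pairs returned by `normal_plot_points` gives the probability plot itself; the skewed data bend away from the line, which shows up as a lower correlation.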

Statistical inference: confidence intervals; hypothesis testing and the p-value; statistical significance vs real-world importance; building statistical models.

a formal statistical procedure- confidence intervals

Confidence intervals, an alternative to hypothesis testing: a confidence interval is a range of plausible values for the population parameter. The confidence coefficient is the percentage of times that the method will, in the long run, capture the true population parameter. A common form is: sample estimate ± 2 × estimated standard error.
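As a worked illustration of the "estimate plus or minus two standard errors" form, the sketch below computes an approximate 95% interval for a population mean. The measurements are made up, and the multiplier 2 is the rule of thumb from the slide rather than an exact t-distribution quantile.

```python
import math
import statistics

# Made-up measurements from a sample of 12 animals (arbitrary units).
values = [18.2, 21.5, 19.8, 24.1, 20.3, 22.7, 17.9, 23.4, 21.0, 19.2, 22.1, 20.6]

n = len(values)
estimate = statistics.fmean(values)                     # sample mean
standard_error = statistics.stdev(values) / math.sqrt(n)

lower = estimate - 2 * standard_error
upper = estimate + 2 * standard_error

print(f"sample mean       = {estimate:.2f}")
print(f"standard error    = {standard_error:.2f}")
print(f"approximate 95% CI = ({lower:.2f}, {upper:.2f})")
```

For small samples a t-distribution multiplier would give a slightly wider interval than the factor of 2 used here.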

another formal inferential procedure- hypothesis testing

Hypothesis testing. Null hypothesis: usually no effect. Alternative hypothesis: an effect. Make a decision based on the evidence (the data). There is a risk of getting it wrong! Two types of error: Type I, rejecting the null when we shouldn't; Type II, not rejecting the null when we should.

Significance levels: we cannot reduce the probabilities of both Type I and Type II errors to zero, so we control the probability of a Type I error. This probability is referred to as the significance level; the p-value calculated from the data is compared against it. Generally a significance level of 0.05 (rejecting the null when the p-value is below 0.05) is considered a reasonable risk of a Type I error (beyond reasonable doubt).
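One way to see what a p-value measures is a permutation test, sketched below. The permutation test is a technique chosen here for illustration and is not taken from the slides; the two groups of measurements, the function name `permutation_p_value` and the number of permutations are all assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up pollutant concentrations before and after a control measure.
before = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2]
after = [10.9, 11.2, 10.5, 11.0, 11.4, 10.8, 11.1, 10.6]

p_value = permutation_p_value(before, after)
significance_level = 0.05

print(f"p-value = {p_value:.4f}")
print("reject null" if p_value < significance_level else "do not reject null")
```

The significance level is fixed in advance, while the p-value is computed from the data; the decision comes from comparing the two.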

Statistical Significance vs. Practical Importance Statistical significance is concerned with the ability to discriminate between treatments given the background variation. Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.

Power: power is related to the Type II error probability, power = 1 - probability of making a Type II error. Aim: to keep power as high as possible (also related to sample size calculations).
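The link between power and sample size can be made concrete for one simple case. The sketch below assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification chosen for illustration; the function name `z_test_power` and the effect size of 0.5 are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known standard deviation.

    effect: true difference between the population mean and the null value
    sigma:  known population standard deviation
    n:      sample size
    alpha:  significance level (Type I error probability)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


for n in (10, 20, 40, 80):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, Type II error = {1 - power:.3f}")
```

Power rises with sample size for a fixed effect, which is the basis of sample size calculations: choose the smallest n that reaches the power you need.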

relationships- linear or otherwise

Correlations and linear relationships: the Pearson correlation measures the strength of a linear relationship. It is a simple indicator lying between –1 and +1. Check your plots for linearity.

Interpreting correlations: the correlation coefficient is a measure of the strength of the linear association between two variables. If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
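A small numerical example shows why plotting first matters. The data below are made up; `pearson_r` is a hand-written helper so the calculation is visible.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Case 1: an exactly linear relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])

# Case 2: a perfect but symmetric curve; the linear measure sees nothing.
r_symmetric_curve = pearson_r(xs, [x ** 2 for x in xs])

# Case 3: a clearly curved relationship that still gives a high coefficient.
xs_pos = [0, 1, 2, 3, 4, 5, 6]
r_one_sided_curve = pearson_r(xs_pos, [x ** 2 for x in xs_pos])

print(f"linear:           r = {r_linear:.2f}")
print(f"symmetric curve:  r = {r_symmetric_curve:.2f}")
print(f"one-sided curve:  r = {r_one_sided_curve:.2f}")
```

The second case gives a coefficient of zero despite an exact relationship, and the third gives roughly 0.96 despite obvious curvature, so the number alone cannot tell you whether a straight line is appropriate.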

what is a statistical model?

Statistical models. Outcomes or responses: these are the results of the practical work and are sometimes referred to as dependent variables. Causes or explanations: these are the conditions or environment within which the outcomes or responses have been observed and are sometimes referred to as independent variables, but more commonly known as covariates.

Specifying a statistical model: models specify the way in which outcomes and causes link together, e.g. Chl-a ~ Temperature. There should be an additional item on the right-hand side, giving the formula Chl-a ~ Temperature + Error. This says that Chl-a depends on temperature, but that there is also some random variability (error).
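The formula Chl-a ~ Temperature + Error can be read as a straight line plus random scatter. The sketch below simulates hypothetical data from such a model (the intercept 2.0, slope 0.8 and error standard deviation 1.0 are invented) and recovers the line by ordinary least squares; `fit_line` is a hand-written helper, not a library function.

```python
import random
import statistics

random.seed(3)

# Hypothetical data generated from: chl_a = 2.0 + 0.8 * temperature + error.
temperature = [5 + 0.5 * i for i in range(40)]
chl_a = [2.0 + 0.8 * t + random.gauss(0, 1.0) for t in temperature]


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


intercept, slope = fit_line(temperature, chl_a)

# The residuals are the estimated "Error" part of the model.
residuals = [y - (intercept + slope * t) for t, y in zip(temperature, chl_a)]

print(f"fitted model: chl_a = {intercept:.2f} + {slope:.2f} * temperature")
print(f"residual standard deviation: {statistics.stdev(residuals):.2f}")
```

In R, the software used on the course, the same model would be written with the formula notation shown on the slide.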

Example 1: are atmospheric SO2 concentrations declining? Measurements were made at a monitoring station over a 20 year period. A complex statistical model was developed to describe the pattern; the model partitions the variation into trend, seasonality and residual variation.
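The idea of partitioning variation into trend, seasonality and residual can be sketched with a much simpler decomposition than the model referred to on the slide. Everything below is an assumption for illustration: the monthly series is simulated (not real SO2 data), the trend is taken to be linear, and the seasonal pattern is estimated as the average detrended value for each calendar month.

```python
import math
import random
import statistics

random.seed(4)

# Hypothetical monthly series over 20 years: downward trend + annual cycle + noise.
n_months = 240
months = list(range(n_months))
series = [
    30.0 - 0.05 * t + 3.0 * math.cos(2 * math.pi * t / 12) + random.gauss(0, 1.0)
    for t in months
]

# Step 1: linear trend by least squares.
mean_t = statistics.fmean(months)
mean_y = statistics.fmean(series)
sxx = sum((t - mean_t) ** 2 for t in months)
sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, series))
slope = sxy / sxx
intercept = mean_y - slope * mean_t
trend = [intercept + slope * t for t in months]

# Step 2: seasonal effect = average detrended value for each calendar month.
detrended = [y - tr for y, tr in zip(series, trend)]
seasonal_effect = [statistics.fmean(detrended[m::12]) for m in range(12)]
seasonal = [seasonal_effect[t % 12] for t in months]

# Step 3: residual variation is what trend and seasonality leave unexplained.
residual = [y - tr - s for y, tr, s in zip(series, trend, seasonal)]

print(f"estimated trend: {slope * 12:.2f} units per year")
print(f"residual standard deviation: {statistics.stdev(residual):.2f}")
```

By construction the three components add back up to the original series, so each observation is split into a trend part, a seasonal part and a residual.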

Summary: hypothesis tests and confidence intervals are used to make inferences; we build statistical models to explore relationships and explain variation; a general linear modelling framework is very flexible; assumptions should be checked.

Statistics might be needed where? Designing and evaluating monitoring and sampling networks (sampling strategies); the analysis of observational records, e.g. past climate indicators, water quality, pollutant trends (trends, spatio-temporal modelling, dealing with variation); the study and modelling of extreme events, e.g. sea levels, flood prediction, for prediction and management of future occurrences (extremes, risk modelling, uncertainty); evaluating the state of the environment (trends, uncertainty, prediction).

Statistics might be needed where? The use of complex computer models to simulate the whole earth system, e.g. climate change and the carbon cycle (uncertainty, model evaluation); the analysis of observational records, e.g. past climate indicators, water quality, pollutant trends (trends, spatio-temporal modelling, dealing with variation); the study and modelling of extreme events, e.g. sea levels, flood prediction, for prediction and management of future occurrences (extremes); the evaluation and quantification of risk and uncertainty, e.g. volcanic or earthquake prediction (uncertainty, prediction).

Statistics and the environment: appropriate statistical models can give added value to your data, give better descriptions of complex change behaviour, begin to tease out climate-change-driven effects in environmental quality, and handle natural variation. Greater, innovative statistical analysis is needed for environmental science.

Statistics and the environment: as environmental scientists, we need to try to ensure that data are gathered under good statistical principles and that they are not left in the filing cabinet. We need to ensure that good environmental science is served by good statistical science. Environmental science should be data and information rich.

Statistics training: we have chosen a few key statistical topics to cover; there are many others. We will also have some practical examples for you to work through with guidance. The main software tool will be R, which is freely available; it is very similar, but not identical, to S+. There should be lots of opportunities to ask questions.