Ecology reporting and statistical analysis


Ecology reporting and statistical analysis
Chris Luszczek, Biol2050

Introduction
Please treat this slide show as a statistics manual for Biol2050. This tutorial will provide you with the basics of various common statistical methods and examples of how to perform these tests using SPSS statistical software, which is available in York computer labs and accessible from home using York’s remote File Access System (FAS).
*WARNING* The FAS may involve a lengthy installation procedure and I have found it to be finicky, sometimes requiring multiple attempts at installation. Be aware of this if you are downloading the software at home… at midnight the evening before your report is due.

Outline
1) Hypothesis Building
- Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common statistical tests and how to run them
- Correlation
- t-test
- ANOVA
4) Graphing: how to present your findings
- Types of graphs and usage
- Formatting

1) Hypothesis Building
Creating testable hypotheses is central to the scientific method.
Null hypothesis (Ho): ‘no effect’ or ‘no difference’ between samples or treatments.
Alternative hypothesis (Ha): the experimental treatment has a statistically significant effect. This is the claim for which we are trying to find evidence.
Example:
Ho: “Different habitats on the York University campus display no differences in diversity” (Ho: x2 = x1, or x2 - x1 = 0)
Ha: “Grassland habitats at York University contain higher diversity than managed or landscaped areas” (Ha: x2 > x1, or x2 - x1 > 0)
Hypotheses can never be proven, only disproven. A hypothesis cannot be proved true until all possible antitheses are indisputably disproved, and even then no one can be sure that every possible antithesis has been considered and dealt with.

2) Hypothesis Testing
Based on statistical testing, either reject or fail to reject H0.
Statistical testing compares the p-value of the observed data to an assigned significance level (α).
p-value: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.
α: the probability of rejecting the null hypothesis when it is actually true (a Type I error, or false positive) that we are willing to accept.
Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001).
IF the p-value is smaller than α, reject the null hypothesis (H0).
The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false (i.e. the probability of not committing a Type II error, or making a false negative decision).
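The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow; the function name `decide` and the example p-values are made up for the sketch.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p is smaller than alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values, checked against two common significance levels.
for p in (0.03, 0.20):
    print(p, decide(p, alpha=0.05), "|", decide(p, alpha=0.01))
```

Note that the same p-value can lead to different decisions at different significance levels, which is why α should be chosen before looking at the data.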

Hypothesis Testing Visual Summary
[Figure: two sample distributions (sample distribution 1 with mean 1, sample distribution 2 with mean 2). Are the means significantly different?]
Identifying differences that are likely due to forces beyond chance.
Topics shown: distributions; errors; significance level; power (1-β); testing for and assuming normality.

3) Common Statistical Tests in Ecology
- T-tests
- ANOVA (Analysis Of VAriance)
- Correlation

Common Statistical Tests in Ecology
T-tests: used to determine whether two sets of data (2 means) are significantly different from each other. The test assumes that the data are normally distributed and that the two samples have equal variances.
2 decisions must be made when selecting a t-test:
- Paired vs. independent
- 1-tailed vs. 2-tailed
ANOVA (Analysis Of VAriance)
Correlation

3A) T-test
Paired t-test: compares two samples in cases where each value in one sample has a natural partner in the other (the data are not independent). Used on pre/post data. It is equivalent to a one-sample t-test on the pairwise differences; a one-sample t-test also compares a sample mean to a specified value.
Example: comparing patient performance before and after the application of a drug (repeated measures sampling: the same subjects are measured before and after treatment).
Two-sample (independent) t-test: compares means for two groups of cases.
Example: comparing patient performance in a group receiving a drug versus a separate group receiving a trial drug.
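A paired t-test can be computed as a one-sample t-test on the pairwise differences, which is one way to see how the two ideas on this slide relate. The sketch below uses only the Python standard library; the function names and the before/after scores are made up for illustration, and SPSS remains the tool used in this course.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean of `values` against mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / se, n - 1


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the pairwise differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, mu0=0.0)


# Hypothetical scores for the same five subjects before and after a treatment.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t, df = paired_t(before, after)
print(f"paired t({df}) = {t:.3f}")
```

The sketch stops at the t statistic and degrees of freedom; the p-value would still come from a t-table or from SPSS.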

3A) T-test
One-tailed/sided t-test: expect the effect to be in a certain direction.
“Is the sample mean greater than µ0?” or “Is the sample mean less than µ0?”
H0: µ = µ0, where µ0 is known
HA: µ > µ0 or µ < µ0
Two-tailed/sided t-test: testing for different means regardless of direction.
“Is there a significant difference?”
H0: µ1 = µ2
HA: µ1 ≠ µ2
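Because the t distribution is symmetric, a one-tailed p-value can be obtained from a two-tailed one once the direction of the effect is known. The helper below is a minimal sketch of that arithmetic; the function name `one_tailed_p` and the example numbers are hypothetical.

```python
def one_tailed_p(two_tailed_p: float, t: float, expect_greater: bool) -> float:
    """Convert a two-tailed p-value to a one-tailed one for a symmetric test statistic.

    expect_greater=True  corresponds to HA: mu > mu0 (a positive t supports HA);
    expect_greater=False corresponds to HA: mu < mu0 (a negative t supports HA).
    """
    in_expected_direction = (t > 0) if expect_greater else (t < 0)
    half = two_tailed_p / 2.0
    return half if in_expected_direction else 1.0 - half


# Hypothetical result: t = 2.1 with a two-tailed p of 0.06.
print(one_tailed_p(0.06, t=2.1, expect_greater=True))   # effect in the expected direction
print(one_tailed_p(0.06, t=2.1, expect_greater=False))  # effect in the opposite direction
```

This shows why the choice of tails must follow from the hypothesis: an effect in the expected direction halves the p-value, while an effect in the wrong direction makes it very large.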

Match Your Hypothesis and Test!
A carefully stated experimental hypothesis will indicate the type of effect you are looking for.
For example, the hypothesis that "Coffee improves memory" suggests a paired, one-tailed test because you will repeatedly measure the same participants and expect an improvement.
"Men weigh a different amount from women" suggests an independent, two-tailed test as no direction is implied.
So remember, don't be vague with your hypothesis if you are looking for a specific effect! Be careful with the null hypothesis too: avoid "A does not affect B" if you really mean "A does not improve B".

Running a T-test in SPSS
Question: Do the fish in lake 1 and lake 2 weigh the same?
Null hypothesis: µ1 = µ2 (the fish in lake 1 weigh the same as the fish in lake 2)
Alternative hypothesis: µ1 ≠ µ2 (the fish in lake 1 and lake 2 DO NOT weigh the same)
An independent, 2-tailed test!

[Screenshots: steps 1) and 2).]
SPSS can sometimes be tricky about accessing directories on your computer when you use it remotely through the Citrix Receiver. I find it best to save your data on your C: drive or on a removable USB drive, as these seem to be the most easily accessed drives.
SPSS has 2 views: the data view, and a type of metadata sheet called the variable view.
Data from an Excel sheet can be opened in SPSS. Sometimes you will automatically see a summary of your data (the variable view) rather than the data itself. To correct this, click the Data View tab rather than the Variable View tab.

Data View / entry

Select Analyze → Compare Means → Independent-Samples T Test.
Weight is the test variable.
Lake is the grouping variable (click on Define Groups and type the two names used in the data view).

Output
Levene’s test: assesses whether the variances are equal. If its p-value is greater than 0.05, the variances can be treated as equal and you can interpret the t-test results.
*Given the quality of data collected in these labs, assume that the data pass Levene’s test and go on to interpret the t-test.*
Our example: Levene’s test gives p = 0.669, so we can interpret the t-test. The t-test gives p = 0.01, which is smaller than α (e.g. 0.05), so we can reject the null hypothesis: the fish from lake 1 and lake 2 DO NOT weigh the same.
How to report: two-sample t(df) = t-value, p = p-value
(two-sample t(12) = -3.065, p = 0.01)
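To show where the t-value and df in the report line come from, here is a rough standard-library sketch of the independent two-sample t statistic with pooled variance (the equal-variances case). The fish weights are made up and are not the data behind the slide's output; the critical value 2.179 is the standard t-table entry for df = 12 at a two-tailed α of 0.05.

```python
import math
import statistics


def independent_t(group1, group2):
    """Two-sample t statistic with pooled variance (the equal-variances case)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, df


# Made-up fish weights (g), seven fish per lake, so df = 7 + 7 - 2 = 12.
lake1 = [210, 225, 198, 240, 215, 205, 220]
lake2 = [250, 245, 238, 262, 255, 230, 248]
t, df = independent_t(lake1, lake2)

# |t| must exceed the tabulated critical value for a two-tailed test at alpha = 0.05.
T_CRIT_DF12 = 2.179  # standard t-table value for df = 12
print(f"two-sample t({df}) = {t:.3f}")
print("reject H0" if abs(t) > T_CRIT_DF12 else "fail to reject H0")
```

A negative t here simply means the first group's mean is lower than the second's; for a two-tailed test only the size of t matters.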

Common Statistical Tests in Ecology
T-tests
ANOVA (Analysis Of VAriance):
- Compares the means of more than two groups
- Compares the variance within groups with the variance between groups
- Parametric; an extension of the two-tailed, two-sample t-test
Correlation

3B) ANOVA
Analysis Of VAriance (ANOVA)
Examples:
- Is tree density at all York habitats the same?
- Does insect diversity in York grasslands differ from insect diversity in York woodlots and human-impacted areas? (3 means being compared)
H0: µ1 = µ2 = µ3 = … = µk, where k = number of groups
HA: at least one mean is different from the others

Running an ANOVA
You sample four fish from each of three lakes to determine whether the fish from the three lakes all weigh the same.
H0: There is no difference in fish weight between lakes
H0: µLake 1 = µLake 2 = µLake 3
HA: at least one lake mean differs from the others
Select Analyze → Compare Means → One-Way ANOVA.
*Important* Select Post Hoc → Tukey → Continue → OK.
Running the ANOVA will identify IF differences between groups exist. Running a post hoc test will test all combinations to determine WHICH groups are different from each other.

[SPSS output: significant difference between groups.]
Lakes 1 and 2 are not significantly different from each other, but both are significantly different from lake 3 (based on α = 0.05).
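The comparison of between-group and within-group variation that ANOVA performs can be sketched as follows. This is a minimal standard-library illustration with made-up fish weights; it produces only the F statistic and its degrees of freedom, so the p-value and the Tukey post hoc comparisons would still come from SPSS.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up weights (g) for four fish from each of three lakes.
lakes = {
    "Lake 1": [200, 210, 205, 215],
    "Lake 2": [208, 202, 212, 206],
    "Lake 3": [250, 245, 260, 255],
}
f_stat, df_b, df_w = one_way_anova(list(lakes.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by much more than the scatter within each group would suggest.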

Common Statistical Tests in Ecology
T-tests
ANOVA (Analysis Of VAriance)
Correlation: indicates the strength and direction of a linear relationship between two random variables.
H0: no relationship between variables
HA: there is a relationship between variables

3C) Correlation
Pearson’s Correlation Coefficient (r) measures the linear relationship between two variables.
- r always lies between -1 and +1.
- Positive r-values mean that the two variables increase together. Negative r-values mean that one variable decreases as the other increases.
- r-values close to zero mean the variables have no linear relationship. r-values close to either -1 or +1 mean the relationship is strong.
- Generally, for ecological data, an r greater than 0.5 is considered very strong and an r less than 0.2 is considered weak.
- R² (coefficient of determination) is the proportion of the variation in one variable that is explained by the line of best fit, i.e. a measure of how well the regression line represents the data.
r = correlation coefficient
The Pearson product-moment correlation coefficient is a common measure of the correlation (linear dependence) between two variables X and Y. It is very widely used in the sciences as a measure of the strength of linear dependence between two variables, giving a value between -1 and +1 inclusive.
Correlation coefficient (r) vs. coefficient of determination (r² or R²)
- The coefficient of determination, r², is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.
- The coefficient of determination is the ratio of the explained variation to the total variation.
- The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y.
- For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
- The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation. The further the line is away from the points, the less it is able to explain.
r² = coefficient of determination
There are several different definitions of R² which are only sometimes equivalent. One class of cases where they are equivalent is linear regression. In this case, R² is simply the square of the sample correlation coefficient between the outcomes and their predicted values or, in the case of simple linear regression, between the outcome and the values being used for prediction. In such cases, the values vary from 0 to 1. Depending on the definition used, the computational definition of R² can yield negative values where the predictions being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. R² is a statistic that gives some information about the goodness of fit of a model. In regression, R² is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data.
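As a concrete illustration of how r and r² relate, the sketch below computes Pearson's r from its definition using only the Python standard library. The diversity values are made up for the example; the last line simply re-checks the slide's own arithmetic for r = 0.922.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired measurements, for illustration only.
plant_diversity = [2, 4, 5, 7, 9]
bird_diversity = [1, 3, 2, 6, 7]
r = pearson_r(plant_diversity, bird_diversity)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# The slide's own example: r = 0.922 gives r^2 of about 0.850.
print(round(0.922 ** 2, 3))
```

Squaring r discards its sign, so r² describes how much variation is explained but not whether the relationship is positive or negative.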

Correlation Example 1
Is there a relationship between the bird diversity and plant diversity in a given habitat?
H0: no relationship between variables
HA: there is a relationship between variables
r = 0.3

Correlation Example 2
Is there a relationship between plant density and a) bare ground, b) soil pH, c) species richness?
H0: no relationship between variables
HA: there is a relationship between variables

Running a Correlation
We hypothesize that there is a relationship between mean fish length and lake size (larger lakes might have larger fish). Data were collected from 21 lakes.
Select Graphs → Legacy Dialogs → Scatter → Define (lake size as the x variable and fish length as the y variable).


r = 0.824, p < 0.001
Therefore, there is a HIGHLY SIGNIFICANT, STRONG, POSITIVE relationship between fish length and lake size.
The p-value is calculated behind the scenes, but it is essentially produced by running a 2-tailed t-test with the null hypothesis that the correlation coefficient is 0 (no correlation). It is tested as a 2-tailed test because the correlation under Ha can be positive or negative.
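The t-test behind a correlation's p-value uses the standard conversion t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to the slide's values (r = 0.824, 21 lakes); the function name is made up, and the critical value 3.883 is the standard t-table entry for df = 19 at a two-tailed α of 0.001.

```python
import math


def correlation_t(r: float, n: int):
    """Return (t, df) for testing H0: correlation = 0, given sample r and sample size n."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Values from the slide: r = 0.824 measured across 21 lakes.
t, df = correlation_t(0.824, 21)
T_CRIT_DF19_P001 = 3.883  # standard t-table value: df = 19, two-tailed alpha = 0.001
print(f"t({df}) = {t:.2f}")
print("p < 0.001" if abs(t) > T_CRIT_DF19_P001 else "p >= 0.001")
```

The same r would be less convincing with fewer lakes, because t shrinks as n gets smaller.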

Outline
1) Hypothesis Building
- Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common statistical tests and how to run them
- Correlation
- t-test
- ANOVA
4) Graphing: how to present your findings
- Types of graphs and usage
- Formatting

Choosing Graphs
Your hypothesis and statistical test should guide your choice of figures! As we have seen, some tests are related to specific figures (e.g. correlations and scatter plots).
The following slides outline the basic use of several common graphs:
- Scatter plots
- Line graphs
- Bar graphs
- Histograms
All graphs are figures, not tables. Figures should be accompanied by captions. Figure captions should be placed below the image and should state:
- what the Table or Figure tells the reader: what results are being shown in the graph(s), including the summary statistics plotted
- the organism studied in the experiment (if applicable)
- context for the results: the treatment applied or the relationship displayed, etc.
- location (ONLY if a field experiment)
- specific explanatory information needed to interpret the results shown (in tables, this is frequently done as footnotes)
- culture parameters or conditions, if applicable (temperature, media, etc.)
- sample sizes and statistical test summaries, as they apply
Do not simply restate the axis labels with a "versus" written in between.

Scatter plot
Displays 2 variables for a set of data.
Dependent vs. independent: one variable is under the control of the other variable (Regression Analysis)
OR
If we have no dependent variable, a scatter plot will show the degree of correlation (NOT CAUSATION!)

Line graph
Shows the relationship between the values plotted on each axis (dependent vs. independent).
Used for continuous variables.

Bar graph
Used for discrete quantitative variables which are similar but not necessarily related.
ANOVA is often used to test for differences.

Making Proper Error Bars in Excel
Excel will apply the same error to all bars if you use the automatic error bar feature. To produce proper, interpretable error bars you must:
1) Calculate the standard error for your data:
- First calculate the standard deviation using the “STDEV.S” function
- Then divide the standard deviation by the square root of n (observations per group) to give you the Standard Error.
2) Find the ‘custom error bar’ option, which different versions of Excel hide in different places:
- try selecting the data bars → right click → select ‘format data series’ → ‘error bars’ OR
- try clicking the graph → move to ‘layout’ under the ‘chart tools’ tab → ‘error bars’
3) Select ‘custom’ and ‘specify value’.
4) Be sure to select the ‘range’ of SE values to match the range of selected data for both the positive and negative error value.
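The standard error calculation in step 1 can be checked outside Excel with a short script. This is a sketch using only the Python standard library; the group names and observations are made up. `statistics.stdev` computes the sample standard deviation, the same quantity as Excel's STDEV.S.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev is the sample standard deviation, like Excel's STDEV.S.
    return statistics.stdev(values) / math.sqrt(n)


# Made-up observations for three groups; each bar gets its own error value.
groups = {
    "Grassland": [12, 15, 11, 14],
    "Woodlot": [9, 8, 12, 10],
    "Landscaped": [5, 7, 6, 4],
}
for name, values in groups.items():
    print(f"{name}: mean = {statistics.mean(values):.2f}, SE = {standard_error(values):.2f}")
```

Each group ends up with its own SE, which is why the custom error bar range in Excel must line up with the bars one to one.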

Proper Error Bars
[Screenshots: steps 1) to 4).]
See previous slide for explanation of steps.

Histogram
Used exclusively for showing the distribution of data that are continuous.

Conclusion
This tutorial has provided you with the basic theory, mechanics and applications of common statistical tests. You should now be able to carry out scientific reporting from hypothesis formation to statistical testing and figure formatting.