Biol 500: basic statistics


Biol 500: basic statistics

Goals:
1) understand basics of experimental design
   - controls
   - replication
2) understand how to report quantitative data
3) be able to interpret the reporting of statistical results in a journal article

Replication: allows you to determine if the difference between treatments or groups of samples is greater than the variation within a treatment or group. Is there a difference in how effective the 3 drugs are in curing headaches?

Generally, overlapping error bars indicate no significant difference between the mean values being graphed. Bars that don't overlap = probably different.

Controls: From these data, could you tell if the least effective drug has any effect at all?

Controls: Including a control that is identical in all respects except the one variable you are manipulating is essential to interpreting your results

Controls: Procedural controls allow you to diagnose problems in your experiment, samples, or technique. When we amplify DNA from unknown samples by PCR, we include a positive control (a DNA sample that always works) and a negative control (all the PCR reagents, but no DNA). This allows us to interpret the results of our gels and to troubleshoot any problems.

Do squirrels bury acorns? My experiment: I remove all the squirrels from 3 clumps of trees in one park, but leave the squirrels in 3 control clumps of trees in another park on the other side of town (Park A vs. Park B)

Pseudoreplication: In this example the unit of replication is the park, not the clump of trees – I have no actual replication. Any difference that I measure could be due to differences between the two parks, and not due to my squirrel-removal treatment.

Avoiding pseudoreplication: The correct design would be to have squirrel-removal and control areas in each of several replicate parks (Parks A, B, and C). This lets you assess differences between treatment and control areas, while simultaneously measuring variation among parks.

Did these two classes do differently on my 418 midterm? (n = 10)

n = 20

n = 44

n = 44
X = 133.9 ± 29.7 SD, range: 59 - 183
X = 126.3 ± 38.8 SD, range: 42 - 188
The statistical approach is to ask if the means of these two populations are significantly different

n = 44
X = 133.9 ± 29.7 SD, range: 59 - 183
X = 126.3 ± 38.8 SD, range: 42 - 188
The standard deviation (SD) is what you should report if you are actually interested in the variation – ie, for purposes of deciding where to draw the line between grades

n = 44
X = 133.9 ± 29.7 SD, or ± 4.3 SE, range: 59 - 183
The standard error (SE, or SEM) is SD / √n, where n is the sample size

n = 44
X = 133.9 ± 29.7 SD, or ± 4.3 SE, range: 59 - 183
X = 126.3 ± 38.8 SD, or ± 5.8 SE, range: 42 - 188
The standard error is what you report when you want to compare the means of different treatments or samples
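The SD-to-SE relationship above can be sketched in a few lines of Python with numpy. The scores below are invented stand-ins, not the actual midterm data:

```python
import numpy as np

# Hypothetical scores standing in for one class's midterm results
scores = np.array([59, 102, 118, 125, 133, 140, 147, 155, 168, 183])

sd = scores.std(ddof=1)          # sample standard deviation (n - 1 in the denominator)
se = sd / np.sqrt(len(scores))   # standard error of the mean: SD / sqrt(n)

print(f"mean = {scores.mean():.1f} +/- {sd:.1f} SD, or +/- {se:.1f} SE")
```

Because SE divides SD by √n, the SE shrinks as you collect more data, while the SD stays roughly constant.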

unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
X = 133.9 ± 29.7 SD
X = 126.3 ± 38.8 SD
A t-test compares 2 populations by calculating a test statistic called t and determining the probability (P, or p) of getting that value of t, with that sample size, by chance alone

unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
Paired would be if you compared each student's % score on the midterm versus the final; most tests are unpaired

unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
One-tailed is for when you have some reason to think, in advance, that the 2009 scores will only be higher (or lower) than the 2007 scores – it cuts your P-value in half, but you need a reason to do this!

unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
The power of your test will depend on your degrees of freedom, which is (sample size) – (number of groups) – in this case: (44 + 44 students) – (2 groups) = 88 – 2 = 86

unpaired two-tailed t-test: df = 86, t = 1.20, P = 0.23
P values below 0.05 are accepted as significant, meaning there is less than a 5% chance of getting a test statistic this large if the groups are not really any different
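As a sketch of how this test is run in practice, assuming Python with scipy is available: the two classes below are simulated from the slide's means and SDs (not the real scores), so the exact t and P will differ from the slide's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated scores for two classes of 44, using the slide's means and SDs
class_2007 = rng.normal(loc=133.9, scale=29.7, size=44)
class_2009 = rng.normal(loc=126.3, scale=38.8, size=44)

# Unpaired, two-tailed t-test (the default); df = 44 + 44 - 2 = 86
t, p = stats.ttest_ind(class_2007, class_2009)
print(f"t = {t:.2f}, P = {p:.2f}")
```

A paired design would use `stats.ttest_rel` instead, and a one-tailed test would pass `alternative="greater"` or `"less"`.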

F2,129 = 7.12, P < 0.001 (overall P for 3-way comparison of means; df subscripted under the F ratio)
n = 44, n = 44, n = 44
3 or more samples can be compared using a one-way Analysis of Variance, or ANOVA. Instead of calculating a t statistic, ANOVA calculates an F-ratio, which compares variation within groups (error bars) to the differences in mean values among groups.
2 degrees of freedom: 1st = (# of groups – 1); 2nd = (total sample size) – (# of groups)
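A minimal one-way ANOVA sketch in Python with scipy, using three invented drug groups of n = 44 (the effect sizes are made up, so the F and P won't match the slide):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three hypothetical drug groups, n = 44 each (numbers invented for illustration)
drug_a = rng.normal(10, 3, 44)
drug_b = rng.normal(12, 3, 44)
drug_c = rng.normal(15, 3, 44)

# One-way ANOVA: F compares among-group differences to within-group variation
# df1 = 3 - 1 = 2, df2 = 132 - 3 = 129
f, p = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F(2,129) = {f:.2f}, P = {p:.4f}")
```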

Scheffe: P = 0.002; Scheffe: P = 0.050 (n = 44 per group)
If your overall P-value is significant, you can then do a post-hoc ("after the fact") test to work out which specific means are different from each other:
Bonferroni - not too conservative; may see differences that aren't real
Scheffe - very conservative; if it sees a difference, there really is one
Dunnett - compares each mean to a control; most powerful
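Of the tests named above, Bonferroni is simple enough to write by hand: run each pairwise t-test, then multiply each P by the number of comparisons (capping at 1). A sketch, reusing invented drug data:

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Invented drug-response data, n = 44 per group
groups = {
    "drug_a": rng.normal(10, 3, 44),
    "drug_b": rng.normal(12, 3, 44),
    "drug_c": rng.normal(15, 3, 44),
}

# Bonferroni correction: multiply each raw pairwise P by the number of
# comparisons (3 pairs for 3 groups), capping the result at 1.
# Only done after the overall ANOVA P is significant.
n_comparisons = len(list(combinations(groups, 2)))
for name1, name2 in combinations(groups, 2):
    t, p = stats.ttest_ind(groups[name1], groups[name2])
    p_adj = min(p * n_comparisons, 1.0)
    print(f"{name1} vs {name2}: adjusted P = {p_adj:.4f}")
```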

2-way ANOVA tests for interactions among 2 or more factors. Factors: aspirin (yes/no); tylenol (yes/no)

2-way ANOVA tests for interactions among 2 or more factors. When the response to two treatments combined is not what you would expect from adding their individual effects, this is an interaction. Interactions are usually the most biologically interesting result!

2-way ANOVA tests for interactions among 2 or more factors (treatments A, B, C, D). It is NOT appropriate to do a 1-way ANOVA on these data, because that requires that each treatment be independent of the other treatments – since 2 treatments involve aspirin, they are not independent – and you would also miss the interaction, which is the important result.
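A sketch of the 2 x 2 aspirin/tylenol design, assuming the statsmodels library is available; all numbers are invented, with an interaction deliberately built into the simulated data so the aspirin:tylenol row of the table has something to detect:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
# Hypothetical 2 x 2 headache experiment: aspirin (0/1) crossed with
# tylenol (0/1), 20 subjects per cell -- all numbers invented
data = pd.DataFrame([
    {"aspirin": a, "tylenol": t}
    for a in (0, 1) for t in (0, 1) for _ in range(20)
])
# Build in an interaction: the two drugs together help less than the
# sum of their individual effects
data["relief"] = (3 * data["aspirin"] + 3 * data["tylenol"]
                  - 2 * data["aspirin"] * data["tylenol"]
                  + rng.normal(0, 1, len(data)))

# 'aspirin * tylenol' expands to aspirin + tylenol + aspirin:tylenol;
# the aspirin:tylenol row of the ANOVA table tests the interaction
model = smf.ols("relief ~ aspirin * tylenol", data=data).fit()
print(anova_lm(model))
```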

Correlation analysis is appropriate when you think 2 variables are related, but not in a cause-and-effect way – arm length and leg length are related, but longer arms do not cause you to have longer legs; both are due to your height. Regression analysis is appropriate when you believe a change in one predictor variable (what you manipulate) causes a change in the response variable (the thing you measure) – adding more water makes plants grow taller.
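The arm/leg example can be sketched as a correlation in Python with scipy; the heights and limb lengths below are simulated, with both limbs driven by a shared height variable rather than by each other:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical arm and leg lengths (cm), both driven by overall height
height = rng.normal(170, 10, 50)
arm = 0.44 * height + rng.normal(0, 2, 50)
leg = 0.48 * height + rng.normal(0, 2, 50)

# Correlation: a measure of association, with no cause-and-effect claim
r, p = stats.pearsonr(arm, leg)
print(f"r = {r:.2f}, P = {p:.4f}")
```

Arm and leg are strongly correlated here even though neither causes the other, which is exactly why correlation (not regression) is the right framing.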

Output of a regression analysis includes: 1) ANOVA table – tells you if your model explains a significant amount of the variation in the response

Output of a regression analysis includes: 1) ANOVA table 2) equation of the best-fit line – summarizes the relationship between predictor and response

Output of a regression analysis includes: 1) ANOVA table 2) equation of the best-fit line 3) table testing the effect of each predictor – in multiple regression, you can test many possible predictors that might matter, and see which significantly affect the response variable

Output of a regression analysis includes: 1) ANOVA table 2) equation of the best-fit line 3) table testing the effect of each predictor 4) r2 – r2 is the % of variation in the response that is due to a change in the predictor
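A sketch of these outputs for the water/plant-height example, assuming Python with scipy; the data are simulated, so the fitted numbers are illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical experiment: water added (predictor) vs plant height (response)
water = np.repeat([10, 20, 30, 40, 50], 6)          # mL per day
height = 2.0 + 0.3 * water + rng.normal(0, 2, 30)   # cm, with noise

fit = stats.linregress(water, height)
print(f"best-fit line: height = {fit.intercept:.2f} + {fit.slope:.2f} * water")
print(f"r^2 = {fit.rvalue**2:.2f} (fraction of variation explained)")
print(f"P for slope = {fit.pvalue:.4g}")
```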

More scatter = lower r2. You can have a low r2, but still have a significant slope.

ANOVA and regression are both types of linear models, which test the same basic equation:
response variable = model + error
- response variable: the thing you measure
- model: the predictors, and coefficients that tell you how they affect the response
- error: variance in the response that is not explained by the model
A simple linear regression model looks like: response = intercept + slope × predictor + error

Does predictor X affect the response? The test is to set its coefficient = 0, which drops the predictor out of the model, and see if the model (now just the residual error term) is really any worse
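That comparison can be written out by hand: fit the full model, fit the reduced model with the coefficient set to 0, and ask whether the residual sum of squares got significantly worse. A sketch with numpy and scipy on invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Invented predictor/response data with a real slope built in
x = np.linspace(0, 10, 40)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 40)

# Full model: y = a + b*x ; reduced model sets b = 0, leaving y = mean(y)
b, a = np.polyfit(x, y, 1)
rss_full = np.sum((y - (a + b * x)) ** 2)
rss_reduced = np.sum((y - y.mean()) ** 2)

# F asks whether dropping the predictor makes the fit significantly worse
df_resid = len(y) - 2                                   # two parameters fit
f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
p = stats.f.sf(f_stat, 1, df_resid)
print(f"F(1,{df_resid}) = {f_stat:.1f}, P = {p:.4g}")
```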

Parametric versus non-parametric tests All of the tests we have discussed are parametric tests - they use the numerical values of your actual data - however, they also have built-in assumptions that your data, and the residual errors, fit a normal distribution (bell curve)

Parametric versus non-parametric tests: If your data do not fit a normal distribution, you can transform the raw numbers to make them more "normal" – put the data through a mathematical function. arcsine(square-root(%)) is a standard transformation for %'s, which stop at 100% and are often not normally distributed
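The arcsine-square-root transformation is one line of numpy; the percent values below are invented examples of data piled up near the 100% ceiling:

```python
import numpy as np

# Percent data (e.g., % survival) are bounded at 100% and often skewed
percents = np.array([88, 92, 95, 97, 99, 100, 100])

# Standard arcsine-square-root transformation: convert % to a proportion,
# take the square root, then the arcsine (result is in radians, 0 to pi/2)
proportions = percents / 100.0
transformed = np.arcsin(np.sqrt(proportions))
print(transformed)
```

The transformation stretches out values near 0% and 100%, where raw percentages are squeezed.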

Parametric versus non-parametric tests: Alternatively, there are non-parametric versions of most common statistical tests that use ranked values instead of the raw data
- they are typically more conservative: if they see a difference, it is real
- they make no assumptions about the shape of the distribution

raw    ranked (high to low)
 3     5
 2     6
 6     3
 4     4
 9     2
 1     7
12     1
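The ranking step, and a non-parametric analog of the unpaired t-test (the Mann-Whitney U test), can be sketched with scipy. The raw values are the ones in the table above; the split into two groups is invented for the example:

```python
import numpy as np
from scipy import stats

# The slide's raw values, ranked high to low
raw = np.array([3, 2, 6, 4, 9, 1, 12])
ranks_high_to_low = len(raw) + 1 - stats.rankdata(raw)  # rankdata ranks low to high
print(ranks_high_to_low)  # [5. 6. 3. 4. 2. 7. 1.]

# Non-parametric analog of the unpaired t-test: Mann-Whitney U,
# which works on ranks and assumes nothing about the distribution
group1 = [3, 2, 6, 4]
group2 = [9, 1, 12, 5]   # invented second sample
u, p = stats.mannwhitneyu(group1, group2)
print(f"U = {u}, P = {p:.2f}")
```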