Comparing Two Samples Harry R. Erwin, PhD


Comparing Two Samples Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

Resources
Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
Gentle, JE (2002) Elements of Computational Statistics. Springer.
Gonick, L., and W. Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).
Freund, RJ, and WJ Wilson (1998) Regression Analysis. Academic Press.

Why Test? Statistics is an experimental science, not really a branch of mathematics. It is a tool that can tell you whether an apparent difference between data sets is real or could have arisen by chance. It does not give you certainty. This lecture discusses the comparison of samples.

Don't Complicate Things
Use the classical tests:
- var.test to compare two variances (Fisher's F test)
- t.test to compare two means (Student's t test)
- wilcox.test to compare two means with non-normal errors (Wilcoxon's rank-sum test)
- prop.test (binomial test) to compare two proportions
- cor.test (Pearson's or Spearman's rank correlation) to correlate two variables
- chisq.test (chi-square test) or fisher.test (Fisher's exact test) to test for independence in contingency tables

Comparing Two Variances Before comparing means, verify that the variances are not significantly different: var.test(set1, set2). This performs Fisher's F test. If the variances are significantly different, you can transform the response (y) variable to equalise the variances, or you can still use t.test, which defaults to Welch's modified test.
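The F test above can be sketched with simulated data (the sample names and parameters are illustrative, not from the lecture):

```r
# Two samples with the same mean but different spread
set.seed(42)
set1 <- rnorm(30, mean = 10, sd = 1)
set2 <- rnorm(30, mean = 10, sd = 2)

# Fisher's F test for equality of variances
res <- var.test(set1, set2)
res$p.value   # a small value suggests the variances differ
```

A small p-value here is the cue to keep the Welch form of t.test rather than setting var.equal = TRUE.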

Comparing Two Means Student's t test (t.test) assumes the samples are independent, the variances constant, and the errors normally distributed. By default it uses the Welch-Satterthwaite approximation (slightly less power), which does not assume equal variances. The test can also be used for paired data. The Wilcoxon rank-sum test (wilcox.test) is used for independent samples whose errors are not normally distributed. If you transform the data to obtain constant variance, you will probably have to use this test.
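A minimal sketch of both tests on simulated samples (names and parameters are illustrative):

```r
set.seed(1)
a <- rnorm(25, mean = 5)   # normal errors, sd = 1
b <- rnorm(25, mean = 6)

# Welch's t test: R's default does not assume equal variances
t.test(a, b)

# Non-parametric alternative for non-normal errors
wilcox.test(a, b)
```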

Paired Observations The measurements will not be independent. Use t.test with paired = TRUE; you are then doing a single-sample test of the differences against zero. When you can do a paired t test, you should always do so: it is more powerful, and it deals with blocking, spatial correlation, and temporal correlation.
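The equivalence to a one-sample test of the differences can be seen directly (simulated data, illustrative names):

```r
set.seed(7)
before <- rnorm(12, mean = 100, sd = 10)
after  <- before + rnorm(12, mean = 3, sd = 2)  # same units re-measured

# Paired t test ...
t.test(after, before, paired = TRUE)

# ... is the same as testing the differences against zero
t.test(after - before, mu = 0)
```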

Sign Test Used when you can't measure a difference but can see it. Use the binomial test (binom.test) for this. Binomial tests can also be used to compare proportions (prop.test).
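A sketch with made-up counts (the numbers are illustrative):

```r
# Sign test: suppose 16 of 20 paired comparisons favour treatment A
binom.test(16, 20, p = 0.5)

# Comparing two proportions: 45/100 successes versus 30/100
prop.test(c(45, 30), c(100, 100))
```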

Chi-square Contingency Tables Deal with count data. Suppose each observation is classified by two characteristics (hair colour and eye colour); the null hypothesis is that the two are independent. Create a matrix containing the counts and apply chisq.test(matrix). This gives you a p-value for the observed counts under the assumption of independence.
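A sketch with invented counts (the hair/eye figures are illustrative, not real data):

```r
# Hypothetical hair-colour x eye-colour counts
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("dark", "fair"),
                                 eyes = c("brown", "blue")))
chisq.test(counts)
```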

Fisher's Exact Test Used for the analysis of contingency tables when one or more of the expected frequencies is less than 5. Use fisher.test(x).
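With counts this small the chi-square approximation is unreliable, so the exact test applies (illustrative numbers):

```r
# A 2 x 2 table whose expected counts are all below 5
x <- matrix(c(3, 1, 1, 3), nrow = 2)
fisher.test(x)
```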

Correlation and Covariance Are two variables correlated significantly? Create and attach the data.frame, then apply cor(data.frame) to get the correlation matrix. To test the significance of a correlation, apply cor.test(x, y) to the two columns. You have three options: Kendall's tau (method = "k"), Spearman's rank (method = "s"), or (the default) Pearson's product-moment correlation (method = "p").
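A sketch with a simulated data.frame (names and parameters are illustrative):

```r
set.seed(3)
df <- data.frame(x = 1:20)
df$y <- 2 * df$x + rnorm(20, sd = 4)

cor(df)                              # correlation matrix
cor.test(df$x, df$y)                 # Pearson (the default)
cor.test(df$x, df$y, method = "s")   # Spearman's rank
```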

Kolmogorov-Smirnov Test Are two sample distributions significantly different? Or does a sample arise from a specified distribution? Use ks.test(A, B) for two samples, or ks.test(A, "pnorm") to test against a named distribution.
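Both forms sketched on simulated data (illustrative samples):

```r
set.seed(9)
A <- rnorm(50)   # standard normal sample
B <- runif(50)   # uniform sample

ks.test(A, B)          # two-sample form: are A and B from the same distribution?
ks.test(A, "pnorm")    # one-sample form: does A come from N(0, 1)?
```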

Statistical Problems
- Outliers
- Unequal variances
- Correlated errors

Outliers and Influential Observations Extreme responses are called outliers, and extreme inputs are called leverage points. An observation that has great influence on the estimates is usually both an outlier and a leverage point. Use the residual plot to detect them (discussed in the modelling presentation). Fix by verifying the correctness of the observation. If it turns out to be correct, it may reflect a factor not present in any of the other observations.

Unequal variances Mentioned earlier. Your options:
- non-parametric statistics (usually not effective for regression)
- robust methods
- rescaling
- living with it

Correlated errors The measurements in the data were not independent, usually because selection of the sample units was not strictly random. This is a frequent problem with time-series data, but it can also reflect spatial correlation or simply sloppy data collection. The errors may follow an autoregressive model; try special models that account for the correlation, and avoid the problem in the first place with a careful experimental design.
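As a sketch of one such special model (assuming the nlme package, which ships with R): a generalised least-squares fit with AR(1) errors, on simulated data with illustrative names.

```r
library(nlme)

# Simulated regression with autocorrelated (AR(1)) errors
set.seed(11)
n <- 50
d <- data.frame(t = 1:n)
d$y <- 0.5 * d$t + as.numeric(arima.sim(list(ar = 0.6), n))

# GLS models the within-series correlation instead of ignoring it
fit <- gls(y ~ t, data = d, correlation = corAR1(form = ~ t))
summary(fit)
```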