Philip Twumasi-Ankrah, PhD

Presentation transcript:

Non-parametric statistical methods for testing questionable data-population assumptions
Philip Twumasi-Ankrah, PhD
November 15, 2012

Parametric or Non-Parametric Tests
Choosing the right test to compare measurements is a bit tricky, because you must choose between two families of tests: parametric and nonparametric.

Parametric Tests
Parametric statistical tests are based on the assumption that the data are sampled from a Gaussian distribution. These tests include the t test and analysis of variance.

Non-Parametric Tests
Tests that do not make assumptions about the population distribution are referred to as nonparametric tests. Most commonly used nonparametric tests rank the outcome variable from low to high and then analyze the ranks. These tests include the Wilcoxon, Mann-Whitney, and Kruskal-Wallis tests. They are also called distribution-free tests.

Validity of Assumptions
For parametric statistical tests, it is important that the assumptions made about the probability distribution are valid. If these assumptions hold, parametric tests are more powerful than their equivalent non-parametric counterparts: they can detect differences with smaller sample sizes, or detect smaller differences with the same sample size.

Tests of Normality
It is usually important to assure yourself of the validity of the normality assumption. This involves tests of univariate normality, which include:
- Graphical methods
- Back-of-envelope tests
- Some historical tests
- Diagnostic tests

Graphical Tests
Graphical methods:
- The normal quantile-quantile (Q-Q) plot: constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.
- Kernel density plot: a plot of an estimate of the probability density function, computed from the observed data.
- The probability-probability plot (P-P plot or percent plot): compares the empirical cumulative distribution function of a variable with a specific theoretical cumulative distribution function (e.g., the standard normal distribution function).
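
As a rough illustration of how the coordinates of a normal Q-Q plot are built, the sketch below pairs each sorted observation with the matching quantile of a normal distribution fitted to the sample. The function name `normal_qq_points`, the plotting positions (i - 0.5)/n, and the sample values are assumptions made for this example and do not come from the slides.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical quantile, empirical quantile) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Normal distribution with the sample mean and sample standard deviation.
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 9.9]  # made-up values
for theoretical, empirical in normal_qq_points(sample):
    print(f"{theoretical:6.2f}  {empirical:6.2f}")
```

If the data are close to normal, the printed pairs fall near the line where both coordinates are equal; systematic curvature points to skewness or heavy tails.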

More Graphical Tests
Graphical methods:
- Histogram of the data.
- Box plot of the data, which should indicate the nature of any skewness.
- Stem-and-leaf plot.

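Fast-and-Easy Tests
Back-of-envelope tests: take the sample maximum and minimum, compute their z-scores, and compare them with the 68–95–99.7 rule. Under normality about 99.7% of values lie within three standard deviations of the mean, so a |z| well above 3 in a small sample suggests a departure from normality.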
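
The back-of-envelope check can be sketched in a few lines. The helper name `extreme_z_scores` and the sample values are hypothetical, and the sample mean and standard deviation are used as stand-ins for the population values.

```python
from statistics import mean, stdev

def extreme_z_scores(data):
    """Return the z-scores of the sample minimum and maximum."""
    m, s = mean(data), stdev(data)
    return (min(data) - m) / s, (max(data) - m) / s

# Made-up sample with one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3,
          10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 14.0]
z_min, z_max = extreme_z_scores(sample)
print(f"z(min) = {z_min:.2f}, z(max) = {z_max:.2f}")
# A |z| beyond 3 in a sample this small is unusual under the 68-95-99.7 rule.
```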

Historically Relevant Tests
Some historical tests: the third and fourth standardized moments (skewness and kurtosis) were some of the earliest tests for normality. Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and the ratio of the range to the standard deviation.
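
For reference, the third and fourth standardized moments can be computed directly from their definitions. This is a minimal sketch using population (divide-by-n) moments; the function name `skewness_kurtosis` is an assumption, and statistical packages often apply small-sample corrections that are not shown here.

```python
def skewness_kurtosis(data):
    """Return (skewness, kurtosis) as the 3rd and 4th standardized moments."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # variance (divide by n)
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)
# A normal distribution has skewness 0 and kurtosis 3; large departures
# from these values count as evidence against normality.
```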

Diagnostic Tests
Diagnostic tests: D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–Darling test, the Cramér–von Mises criterion, the Lilliefors test for normality (an adaptation of the Kolmogorov–Smirnov test), the Shapiro–Wilk test, Pearson's chi-squared test, and the Shapiro–Francia test. More recent tests include the energy test and tests based on the empirical characteristic function, such as those by Henze and Zirkler and the BHEP tests.

Choosing Between Parametric and Non-Parametric Tests: Does it Matter?
Does it matter whether you choose a parametric or nonparametric test? The answer depends on sample size. There are four cases to think about:

Choosing Between Parametric and Non-Parametric Tests: Does it Matter?
For large samples:
- Using a parametric test with data from a non-Normal population: the central limit theorem ensures that parametric tests work well with large samples even if the population is non-Gaussian. That is, parametric tests are robust to deviations from Normal distributions, so long as the samples are large. It is impossible to say in general how large is large enough.
- Using a nonparametric test with data from a Normal population: nonparametric tests work well with large samples from Normal populations. The P values tend to be a bit too large, but the discrepancy is small. In other words, nonparametric tests are only slightly less powerful than parametric tests with large samples.

Choosing Between Parametric and Non-Parametric Tests: Does it Matter?
For small samples:
- Using a parametric test with data from a non-Normal population: you can't rely on the central limit theorem, so the P value may be inaccurate.
- Using a nonparametric test with data from a Gaussian population: the P values tend to be too high. Nonparametric tests lack statistical power with small samples.

Choosing Between Parametric and Non-Parametric Tests: Does it Matter?
Does it matter whether you choose a parametric or nonparametric test? Large data sets present no problems. Small data sets present a dilemma.

Non-Parametric Tests…
- Assume that your data have an underlying continuous distribution.
- Assume that, for the groups being compared, the parent distributions are similar in all characteristics other than location.
- Are usually less sensitive than parametric methods.
- Are often more robust than parametric methods when their own assumptions are properly met.
- Can run into problems when there are many ties (data with the same value).
- Are more powerful when they take into account the magnitude of the difference between categories (e.g. the Wilcoxon signed-ranks test) than when they do not (e.g. the sign test).

Choice of Non-Parametric Test
The choice depends on the level of measurement obtained (nominal, ordinal, or interval), the power of the test, whether the samples are related or independent, the number of samples, and the availability of software support (e.g. SPSS). Related samples usually refer to matched-pair samples (using randomization) or before-after samples. Other cases are usually treated as independent samples. For instance, in a survey using random sampling, we have a sub-sample of males and a sub-sample of females. They can be considered independent samples because they are all randomly selected.

Non-Parametric Tests in SPSS

One-sample case
Binomial – tests whether the observed distribution of a dichotomous variable (a variable that has only two values) is the same as that expected from a given binomial distribution. The default value of p is 0.5, but you can change it. For example, if a couple has had 8 baby girls in a row and you would like to test whether their probability of having a girl is greater than 0.6 or 0.7, you can test the hypothesis by changing the default value of p in SPSS.
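
The baby-girl example can be worked through by hand with an exact binomial tail probability. This is only a sketch of the underlying calculation, not the SPSS procedure itself; the function name `binomial_upper_tail` is hypothetical, and a one-sided alternative (probability greater than the hypothesized value) is assumed.

```python
from math import comb

def binomial_upper_tail(successes, trials, p0):
    """P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# 8 girls in 8 births, tested against different hypothesized values of p.
for p0 in (0.5, 0.6, 0.7):
    print(p0, round(binomial_upper_tail(8, 8, p0), 4))
# With p0 = 0.6 the one-sided P value is 0.6**8, about 0.0168;
# with p0 = 0.7 it is 0.7**8, about 0.0576.
```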

One Sample Test Continued
Kolmogorov-Smirnov – compares the distribution of a variable with a uniform, normal, Poisson, or exponential distribution. Null hypothesis: the observed values were sampled from a distribution of that type.
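
The quantity behind the one-sample Kolmogorov-Smirnov test is the largest vertical gap between the empirical distribution function and the hypothesized one. The sketch below computes that gap for a continuous reference distribution; the function name `ks_statistic` and the data are assumptions for illustration, and no P value is computed.

```python
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and a reference CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]  # made-up values
print(ks_statistic(sample, NormalDist(0, 1).cdf))
```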

More One Sample Tests
Runs – a run is defined as a sequence of cases on the same side of the cut point (an uninterrupted course of some state or condition, e.g. a run of good luck). Use the Runs Test procedure when you want to test the hypothesis that the values of a variable are ordered randomly with respect to a cut point of your choosing (default cut point: the median).

Example: Suppose you ask 20 students how well they understand a lecture on a scale from 1 to 5, and the median in the class is 3. If the first 10 students give a value higher than 3 and the second 10 give a value lower than 3, there are only 2 runs:
5445444545 2222112211
In a random situation there should be more runs, but the number will not be close to 20, which would mean the values are ordered in an exactly alternating fashion (a value below 3 is followed by one above it, and vice versa):
2,4,1,5,1,4,2,5,1,4,2,4
The Runs Test is often used as a precursor to running tests that compare the means of two or more groups, including:
- The Independent-Samples T Test procedure.
- The One-Way ANOVA procedure.
- The Two-Independent-Samples Tests procedure.
- The Tests for Several Independent Samples procedure.
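
Counting runs for the two sequences in the example takes only a few lines. The helper name `count_runs` is hypothetical, and values equal to the cut point are grouped with the values above it, which is an assumption that does not affect these sequences because neither contains a 3.

```python
def count_runs(values, cut):
    """Count runs of consecutive values on the same side of the cut point."""
    if not values:
        return 0
    sides = [v >= cut for v in values]  # True: at or above the cut point
    changes = sum(a != b for a, b in zip(sides, sides[1:]))
    return changes + 1

clustered = [int(c) for c in "54454445452222112211"]
alternating = [2, 4, 1, 5, 1, 4, 2, 5, 1, 4, 2, 4]
print(count_runs(clustered, 3))    # 2 runs
print(count_runs(alternating, 3))  # 12 runs, one per value
```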

Runs Test (SPSS output for the variable "siblings")
Test Value (a): 1.00
Cases < Test Value: 4
Total Cases: 40
Number of Runs: 7
Z: -.654
Asymp. Sig. (2-tailed): .513
a. Median
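
The Z and significance values in this output can be checked with the usual large-sample approximation for the number of runs. The sketch below assumes that the 36 remaining cases (40 minus 4) are at or above the test value and that a continuity correction of 0.5 is applied; the function name `runs_test_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def runs_test_z(n_below, n_at_or_above, runs, continuity=True):
    """Normal approximation for the runs test: returns (z, two-tailed p)."""
    n1, n2 = n_below, n_at_or_above
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    diff = runs - expected
    if continuity:
        # Shrink the difference towards zero by 0.5 (assumed correction).
        if abs(diff) <= 0.5:
            diff = 0.0
        else:
            diff -= 0.5 if diff > 0 else -0.5
    z = diff / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = runs_test_z(4, 36, 7)
print(round(z, 3), round(p, 3))  # -0.654 0.513
```

With these inputs the expected number of runs is 8.2, so observing 7 runs is well within what random ordering would produce.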

Two-sample cases (Related Samples)
McNemar – tests whether the changes in proportions are the same for pairs of dichotomous variables. McNemar’s test is computed like the usual chi-square test, but only the two cells in which the classifications don’t match are used. Null hypothesis: people are equally likely to fall into either of the two contradictory classification categories.
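
A minimal sketch of the McNemar calculation is shown below. It uses only the two discordant cells, here called `b` and `c` (hypothetical names), applies no continuity correction, and obtains the P value from the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from math import sqrt
from statistics import NormalDist

def mcnemar(b, c):
    """McNemar chi-square (1 df) and P value from the two discordant counts."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(chi-square > x) equals the two-tailed normal tail at sqrt(x).
    p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p

# Made-up counts: 1 person changed from "yes" to "no", 9 from "no" to "yes".
print(mcnemar(1, 9))
```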

Related Sample Cases
Sign test – tests whether the numbers of positive and negative differences between two samples are approximately the same. Each pair of scores (before and after) is compared: when “after” > “before” the pair gets a + sign, when it is smaller a - sign, and when both are the same it is a tie. The sign test does not use all the information available (the size of the difference), but it requires fewer assumptions about the sample and avoids the influence of outliers.
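
The sign test reduces to a binomial calculation on the counts of + and - signs, with ties dropped. The sketch below computes an exact two-sided P value; the function name `sign_test` and the before/after scores are made up for illustration.

```python
from math import comb

def sign_test(before, after):
    """Return (number of + signs, number of - signs, exact two-sided P value)."""
    plus = sum(a > b for b, a in zip(before, after))
    minus = sum(a < b for b, a in zip(before, after))
    n = plus + minus  # ties are discarded
    if n == 0:
        return plus, minus, 1.0
    k = min(plus, minus)
    # Under the null hypothesis each sign is + or - with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

before = [3, 4, 2, 5, 3, 4]
after = [4, 5, 3, 5, 4, 5]
print(sign_test(before, after))  # (5, 0, 0.0625)
```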

Sign Test
To test the association between the following two perceptions: “Social workers help the disadvantaged” and “Social workers bring hope to those in adverse situations”.

More Related Sample Cases
Wilcoxon matched-pairs signed-ranks test – similar to the sign test, but takes into consideration the ranking of the magnitude of the differences among the pairs of values. (The sign test considers only the direction of the differences, not their magnitude.) The test requires that the differences (of the true values) be a sample from a symmetric distribution, but it does not require normality. It is advisable to run a stem-and-leaf plot of the differences to check this.
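
To make the ranking step concrete, the sketch below computes the sums of positive and negative ranks and an exact two-sided P value by enumerating every possible assignment of signs, which is only practical for small samples. The function name `wilcoxon_signed_rank` and the data are assumptions; zero differences are dropped and tied magnitudes receive average ranks.

```python
from itertools import product

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, exact two-sided P value) for paired samples."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    mags = [abs(d) for d in diffs]
    # Average rank of each magnitude: count smaller values, then split ties.
    ranks = [sum(m < x for m in mags) + (sum(m == x for m in mags) + 1) / 2
             for x in mags]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under the null hypothesis every sign pattern is equally likely.
    n = len(ranks)
    hits = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) <= w
    )
    return w_plus, w_minus, min(1.0, 2 * hits / 2 ** n)

before = [10, 12, 9, 11, 14]
after = [12, 11, 13, 14, 19]
print(wilcoxon_signed_rank(before, after))  # (14.0, 1.0, 0.125)
```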

Two-sample case (independent samples)
Mann-Whitney U – similar to the Wilcoxon matched-pairs signed-ranks test except that the samples are independent and not paired. It is the most commonly used alternative to the independent-samples t test. Null hypothesis: the population means are the same for the two groups (more precisely, that the two groups come from the same distribution). The actual computation of the Mann-Whitney test is simple: you rank the combined data values for the two groups, then find the average rank in each group. Requirement: the population variances for the two groups must be the same, but the shape of the distribution does not matter.
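
The ranking procedure described on this slide can be sketched as follows. The function name `mann_whitney_u`, the returned dictionary keys, and the data are all hypothetical; ties get average ranks and no P value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Rank the combined data, then return mean ranks and the U statistic."""
    combined = list(group_a) + list(group_b)

    def rank(x):
        # Average rank of x within the combined sample.
        smaller = sum(v < x for v in combined)
        equal = sum(v == x for v in combined)
        return smaller + (equal + 1) / 2

    ranks_a = [rank(x) for x in group_a]
    ranks_b = [rank(x) for x in group_b]
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks_a) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return {
        "mean_rank_a": sum(ranks_a) / n_a,
        "mean_rank_b": sum(ranks_b) / n_b,
        "u": min(u_a, u_b),
    }

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
# Complete separation of the groups gives mean ranks 2.0 and 5.0 and U = 0.
```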

Two Independent Sample Cases
Kolmogorov-Smirnov Z – tests whether two distributions are different. It is used when there are only a few values available on the ordinal scale. The K-S test is more powerful than the M-W U test if the two distributions differ in dispersion rather than central tendency.

More Two Independent Sample Cases
Wald-Wolfowitz runs – based on the number of runs within each group when the cases are placed in rank order.
Moses test of extreme reactions – tests whether the range (excluding the lowest 5% and the highest 5%) of an ordinal variable is the same in the two groups.

K-sample case (Independent samples)
Kruskal-Wallis one-way ANOVA – it is more powerful than the chi-square test when an ordinal scale can be assumed. It is computed exactly like the Mann-Whitney test, except that there are more groups. The data must be independent samples from populations with the same shape (but not necessarily normal).
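
Extending the rank-sum idea to several groups gives the Kruskal-Wallis H statistic. The sketch below uses the textbook formula without a tie correction, which is a simplification; the function name `kruskal_wallis_h` and the data are hypothetical. Under the null hypothesis, H is usually compared with a chi-square distribution with one fewer degree of freedom than the number of groups.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from rank sums (no tie correction applied)."""
    combined = [x for g in groups for x in g]
    n_total = len(combined)

    def rank(x):
        # Average rank of x within the combined sample.
        smaller = sum(v < x for v in combined)
        equal = sum(v == x for v in combined)
        return smaller + (equal + 1) / 2

    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank(x) for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

print(kruskal_wallis_h([1, 2], [3, 4], [5, 6]))  # 32/7, about 4.57
```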

K Related samples
Friedman two-way ANOVA – tests whether the k related samples could plausibly have come from the same population with respect to mean rank.
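
In the Friedman test the ranking is done within each subject rather than across the whole sample. The sketch below assumes the data arrive as one row per subject and one column per condition, and it omits any tie correction; the function name `friedman_statistic` and the data are hypothetical.

```python
def friedman_statistic(rows):
    """Friedman chi-square from within-row ranks (no tie correction)."""
    n = len(rows)     # number of subjects
    k = len(rows[0])  # number of related samples (conditions)
    rank_sums = [0.0] * k
    for row in rows:
        for j, x in enumerate(row):
            # Average rank of x within this subject's own row.
            rank_sums[j] += sum(v < x for v in row) + (sum(v == x for v in row) + 1) / 2
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Every subject ranks the three conditions in the same order.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(friedman_statistic(scores))  # approximately 6, the maximum for n = 3, k = 3
```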

More K Related Samples Cases
Cochran Q – determines whether it is likely that the k related samples could have come from the same population with respect to the proportion or frequency of “successes” in the various samples. In other words, it requires dichotomous variables.
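
Cochran's Q can be computed from the row and column totals of a table of 0/1 outcomes. The sketch below assumes one row per subject and one column per related sample, with 1 marking a "success"; the function name `cochran_q` and the data are hypothetical.

```python
def cochran_q(rows):
    """Cochran's Q for a subjects-by-conditions table of 0/1 outcomes."""
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand = sum(row_totals)
    denom = k * grand - sum(r ** 2 for r in row_totals)
    if denom == 0:
        raise ValueError("every subject has all successes or all failures")
    return (k - 1) * (k * sum(c ** 2 for c in col_totals) - grand ** 2) / denom

outcomes = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(cochran_q(outcomes))  # 6.0
```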

Other Interesting Use of Non-Parametrics
Non-parametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. It requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.
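
One simple way to let the data shape the fitted curve is a kernel-weighted average of nearby responses, often called a Nadaraya-Watson smoother. The slides do not name a particular method, so this choice, the Gaussian kernel, the `bandwidth` parameter, and the function name `kernel_regression` are all assumptions made for illustration.

```python
from math import exp

def kernel_regression(xs, ys, x0, bandwidth=1.0):
    """Estimate the response at x0 as a Gaussian-weighted average of ys."""
    weights = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 4.2, 8.8, 16.1]  # made-up, roughly quadratic data
for x0 in (0.5, 2.0, 3.5):
    print(x0, round(kernel_regression(xs, ys, x0, bandwidth=0.7), 2))
```

No functional form such as a line or a parabola is specified in advance; the bandwidth controls how local the averaging is, which is why more data are needed than for a parametric fit.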

Questions