1 Overview of Major Statistical Tools UAPP 702 Research Methods for Urban & Public Policy Based on notes by Steven W. Peuquet, Ph.D.

2 Topics to be covered: The Normal Distribution; Parametric vs nonparametric statistics; Correlation; Correlational vs experimental research; Analysis of variance; Regression analysis; Factor analysis.

3 Check this out! Electronic Statistics Textbook. Much of the content of this lecture is drawn from this source: StatSoft, Inc. (2008). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB:

4 The Normal Distribution
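The figure on this slide did not survive extraction. As a minimal sketch of the property the following slides rely on, the snippet below uses Python's standard-library `statistics.NormalDist` to compute the share of a normal distribution lying within ±z standard deviations of the mean; `share_within` is a hypothetical helper name chosen for illustration.

```python
from statistics import NormalDist


def share_within(z: float) -> float:
    """Proportion of a normal distribution lying within +/- z standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(z) - std_normal.cdf(-z)


for z in (1.0, 1.96, 3.0):
    print(f"within +/-{z} SD: {share_within(z):.3f}")
```

It should report roughly 0.683, 0.950 and 0.997, the familiar 68-95-99.7 pattern; 1.96 is the cutoff behind the 95% figures used in later slides.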

5 Parametric Versus Nonparametric Statistics Parametric statistical tests require basic knowledge of the underlying distribution of a variable. With that knowledge, we can predict how a particular statistic will "behave" in repeated samples of equal size, that is, how it is distributed.

6 Parametric Versus Nonparametric Statistics For example, if we draw 100 random samples of 100 adults each from the general population and compute the mean height in each sample, then the distribution of the standardized means across samples will likely approximate the normal distribution (to be precise, Student's t distribution with 99 degrees of freedom). Now imagine that we take an additional sample in a particular city where we suspect that people are taller than the average population. If the mean height in that sample falls in the upper 5% tail of the t distribution (beyond its 95th percentile), then we conclude that, indeed, the people of this city are taller than the average population.
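A minimal sketch of the height example, assuming a one-sample, one-tailed t-test at the 5% level. The heights, the population mean of 170 cm, and the helper names are hypothetical, and the critical value 1.660 is hard-coded from a t table because Python's standard library has no t quantile function.

```python
import math
import random
import statistics

# One-tailed 5% critical value of Student's t with 99 degrees of freedom,
# approximately 1.660 from a standard t table. Valid only for n = 100.
T_CRIT_99_DF = 1.660


def one_sample_t(sample: list[float], population_mean: float) -> float:
    """t statistic: (sample mean - hypothesized mean) / standard error."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - population_mean) / standard_error


def taller_than_average(sample: list[float], population_mean: float) -> bool:
    """One-tailed test at the 5% level: True when t falls in the upper 5% tail."""
    if len(sample) != 100:
        raise ValueError("critical value above assumes n = 100 (99 df)")
    return one_sample_t(sample, population_mean) > T_CRIT_99_DF


# Hypothetical data: 100 adults from one city, heights in cm.
random.seed(7)
city_heights = [random.gauss(172.0, 7.0) for _ in range(100)]
t = one_sample_t(city_heights, population_mean=170.0)
print(f"t = {t:.2f}; conclude city is taller? {taller_than_average(city_heights, 170.0)}")
```

With a library such as SciPy, `scipy.stats.ttest_1samp` computes the same statistic along with a p-value.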

7 Parametric Versus Nonparametric Statistics Are most variables normally distributed? In the example just given we relied on our knowledge that, in repeated samples of equal size, the standardized means (for height) will be distributed following the t distribution (with a particular mean and variance). However, this will only be true if in the population the variable of interest (height in our example) is normally distributed, that is, if the distribution of people of particular heights follows the normal distribution (the bell-shaped distribution).

8 Parametric Versus Nonparametric Statistics For many variables of interest, we simply do not know for sure that this is the case. For example, is income normally distributed in the population? Probably not. The incidence rates of rare diseases are not normally distributed in the population, the number of car accidents is also not normally distributed, and neither are many other variables in which a researcher might be interested.
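To make "probably not normal" concrete, this sketch compares the sample skewness of a simulated income-like variable with that of a simulated normal variable. The log-normal shape for income is an assumption chosen because it has a long right tail; all data are simulated, not real. Skewness near 0 is consistent with symmetry, while a large positive value signals a long right tail.

```python
import random


def skewness(values: list[float]) -> float:
    """Moment-based sample skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(42)
# Hypothetical "income-like" variable: log-normal, i.e. long right tail.
incomes = [random.lognormvariate(10.5, 0.8) for _ in range(10_000)]
# Hypothetical "height-like" variable: normal.
heights = [random.gauss(170.0, 7.0) for _ in range(10_000)]

print(f"income skewness: {skewness(incomes):.2f}")
print(f"height skewness: {skewness(heights):.2f}")
```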

9 Parametric Versus Nonparametric Statistics The Issue of Sample Size Another factor that often limits the applicability of tests based on the assumption that the sampling distribution is normal is the size of the sample of data available for the analysis (sample size; n). By the central limit theorem, we can assume that the sampling distribution of the mean is approximately normal even if we are not sure that the distribution of the variable in the population is normal, as long as our sample is large enough (e.g., 100 or more observations).

10 Parametric Versus Nonparametric Statistics The Issue of Sample Size (continued) However, if the sample is very small, then those tests can be used only if we are sure that the variable is normally distributed, and a small sample gives us little ability to check this assumption.
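The sample-size argument can be checked by simulation. This sketch draws repeated samples from a right-skewed exponential population (a hypothetical stand-in for a non-normal variable) and records how often the normal-theory 95% interval, mean ± 1.96 standard errors, actually contains the true mean. If the normal approximation holds, coverage should be close to 0.95; with very small samples it typically falls noticeably short.

```python
import math
import random


def z_interval_coverage(n: int, reps: int = 2000, seed: int = 0) -> float:
    """Share of nominal 95% normal-theory intervals (mean +/- 1.96 SE) that
    contain the true mean when samples of size n come from a right-skewed
    exponential population whose true mean is 1.0."""
    rng = random.Random(seed)
    true_mean = 1.0
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        se = math.sqrt(variance / n)
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / reps


for n in (5, 30, 100):
    print(f"n = {n:3d}: coverage = {z_interval_coverage(n):.3f}")
```

The exact coverage figures depend on the seed and the number of replications, so treat them as illustrative.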

11 Parametric Versus Nonparametric Statistics Problems in Measurement Applications of tests that are based on the normality assumption are further limited by a lack of precise measurement. For example, consider a study where grade point average (GPA) is measured as the major variable of interest. Is an A average twice as good as a C average? Is the difference between a B and an A average comparable to the difference between a D and a C average? In practice, GPA is a crude measure of scholastic accomplishment that only allows us to establish a rank ordering of students from "good" students to "poor" students.

12 Parametric Versus Nonparametric Statistics Problems in Measurement (continued) Most common statistical techniques, such as analysis of variance (and t-tests) and regression, assume that the underlying measurements are at least on an interval scale, meaning that equally spaced intervals on the scale can be compared in a meaningful manner (e.g., B minus A is equal to D minus C). However, as in our example, this assumption is very often not tenable, and the data represent a rank ordering of observations (ordinal) rather than precise measurements.
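Nonparametric tests usually work on ranks, which use only the ordering that a crude measure such as GPA supports. A minimal sketch of converting scores to ranks, giving tied values the average of the ranks they occupy (the usual convention in rank-based tests); the GPA values and the helper name `average_ranks` are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from smallest (rank 1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


# Hypothetical GPAs: only their order is trusted, not the spacing between them.
gpas = [3.9, 2.1, 3.0, 3.0, 1.7]
print(average_ranks(gpas))  # [5.0, 2.0, 3.5, 3.5, 1.0]
```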

13 Parametric Versus Nonparametric Statistics There is a clear need for statistical procedures that allow us to process data of "low quality," from small samples, on variables whose distribution is unknown.

14 Parametric Versus Nonparametric Statistics Specifically, nonparametric methods were developed to be used in cases when the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric). In more technical terms, nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population.
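As an illustration of a statistic that estimates no population parameter, here is the Mann-Whitney U statistic written as a pair count: for every pair of observations drawn one from each group, it scores 1 when the first group's value is larger and 1/2 for a tie. This is a sketch of the statistic only; turning U into a p-value requires its null distribution (exact tables or a normal approximation), which is not shown. The commute-time data are hypothetical.

```python
def mann_whitney_u(group_a: list[float], group_b: list[float]) -> float:
    """U statistic for group_a: number of (a, b) pairs with a > b, counting ties as 1/2.

    Only the ordering of observations matters; no mean or standard
    deviation is estimated.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical commute times (minutes) in two neighborhoods.
north = [12, 15, 18, 22, 30]
south = [25, 28, 35, 40, 41]
u_north = mann_whitney_u(north, south)
u_south = mann_whitney_u(south, north)
# The two U values always sum to len(north) * len(south).
print(u_north, u_south, u_north + u_south)
```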

15 Type of question | Parametric tests | Nonparametric tests
Differences between independent groups | t-test for independent samples; analysis of variance | Wald-Wolfowitz runs test; Mann-Whitney U test; Kolmogorov-Smirnov two-sample test; Kruskal-Wallis analysis of ranks; Median test
Differences between dependent groups | t-test for dependent samples | Sign test; Wilcoxon's matched pairs test
Relationships between variables | Correlation coefficient | Spearman R; Kendall Tau; Coefficient Gamma; Chi-square
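For the dependent-groups row, the sign test is simple enough to compute exactly. This sketch, with hypothetical before/after scores, keeps only the sign of each paired difference, drops zero differences, and computes a two-sided binomial p-value with success probability 1/2, using `math.comb` (Python 3.8+). The function name is a hypothetical helper.

```python
from math import comb


def sign_test_p_value(before: list[float], after: list[float]) -> float:
    """Exact two-sided sign test for paired observations.

    Zero differences are dropped; under the null hypothesis each remaining
    difference is equally likely to be positive or negative.
    """
    diffs = [post - pre for pre, post in zip(before, after) if post != pre]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical satisfaction scores for 10 residents before and after a program.
before = [3, 4, 2, 5, 3, 4, 2, 3, 4, 3]
after = [4, 5, 2, 6, 4, 5, 3, 3, 5, 4]
print(f"two-sided p = {sign_test_p_value(before, after):.4f}")
```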

16 Correlation Correlation is a measure of the relation between two or more variables. The measurement scales used should be at least interval scales, but other correlation coefficients are available to handle other types of data. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation, while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.
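A small sketch of the two ends of the scale, plus the contrast between the Pearson coefficient and Spearman R from the nonparametric column of the earlier table. Spearman R is computed here as the Pearson correlation of the ranks; this simplified version assumes there are no tied values. For a monotone but curved relationship, Spearman R reaches +1 while Pearson's r stays below it. The data are made up for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation; always between -1 and +1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman R: Pearson correlation of the ranks (this sketch assumes no ties)."""
    def ranks(v: list[float]) -> list[float]:
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for position, index in enumerate(order, start=1):
            r[index] = float(position)
        return r
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v for v in x]))  # perfect positive linear relation
print(pearson_r(x, [-v for v in x]))     # perfect negative linear relation
cubed = [v ** 3 for v in x]              # monotone but curved
print(pearson_r(x, cubed), spearman_rho(x, cubed))
```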


18 Correlational Versus Experimental Research In correlational research we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between some set of variables, such as blood pressure and cholesterol level. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level.

19 Correlational Versus Experimental Research Data analysis in experimental research also comes down to calculating "correlations" between variables, specifically, between those manipulated and those affected by the manipulation. However, experimental data can provide qualitatively better information: only experimental data can conclusively demonstrate causal relations between variables.

20 Correlational Versus Experimental Research Example: If we find that whenever we change variable A, variable B changes, then we can conclude that "A influences B." Data from correlational research can only be "interpreted" in causal terms based on some theories that we have, but correlational data cannot conclusively prove causality.

21 Analysis of Variance In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. If we are only comparing two means, then ANOVA will give the same results as the t test for independent samples (if we are comparing two different groups of cases or observations) or the t test for dependent samples (if we are comparing two variables in one set of cases or observations).

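22 Analysis of Variance Why the name analysis of variance? It may seem odd that a procedure that compares means is called analysis of variance. However, the name derives from the fact that, to test the statistical significance of differences between means, we actually compare (i.e., analyze) variances: the variance between the group means and the variance within the groups. If the between-group variance is large relative to the within-group variance, the group means are unlikely to be equal in the population.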
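The F ratio behind ANOVA can be written directly from this description: the between-group mean square divided by the within-group mean square. The sketch below also computes the pooled-variance independent-samples t statistic to illustrate the earlier point that, with only two groups, ANOVA and the t test agree (F equals t squared). The data are hypothetical, and a p-value would additionally require the F distribution, which is not computed here.

```python
import math


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far observations sit from their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def pooled_t(a: list[float], b: list[float]) -> float:
    """Independent-samples t statistic with a pooled variance estimate."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(one_way_anova_f([a, b]), pooled_t(a, b) ** 2)
```

For these two groups both quantities come out to 13.5, up to floating-point rounding.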

23 Regression Analysis The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.

24 Regression Analysis Example A real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of appeal of the house. Once this information has been compiled for various houses, it would be interesting to see whether and how these measures relate to the price for which a house is sold. One might learn that the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). One may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics. This is an example of a “hedonic price index.”
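A minimal least-squares sketch of the real estate example, assuming only two predictors (square feet and bedrooms) and hypothetical listings constructed so that price = 50 + 0.1 × sqft + 5 × bedrooms (in $1,000s) holds exactly, so the fit should recover those coefficients. It solves the normal equations X'X b = X'y with Gaussian elimination to stay dependency-free; real analyses generally use a QR- or SVD-based solver such as `numpy.linalg.lstsq` for better numerical stability. Residuals (observed minus predicted) are how the "outliers" mentioned above would be spotted: a large negative residual marks a house that sold for less than its characteristics predict.

```python
def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via X'X b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def residuals(predictors: list[list[float]], y: list[float], coef: list[float]) -> list[float]:
    """Observed minus predicted value for each observation."""
    return [yi - (coef[0] + sum(c * v for c, v in zip(coef[1:], row)))
            for row, yi in zip(predictors, y)]


# Hypothetical listings: [square feet, bedrooms] -> sale price in $1,000s,
# built to follow price = 50 + 0.1 * sqft + 5 * bedrooms exactly.
listings = [[1200, 2], [1500, 3], [1800, 3], [2100, 4], [2400, 4]]
prices = [180, 215, 245, 280, 310]
coef = fit_ols(listings, prices)
print([round(c, 3) for c in coef])
print([round(r, 6) for r in residuals(listings, prices, coef)])
```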

25 Regression Analysis In the social and natural sciences multiple regression procedures are very widely used in research. In general, multiple regression allows the researcher to ask the general question "what is the best predictor of...". For example, educational researchers might want to learn what are the best predictors of success in high-school. Psychologists may want to determine which personality variable best predicts social adjustment. Sociologists may want to find out which of the multiple social indicators best predict whether or not a new immigrant group will adapt and be absorbed into society.

26 Factor Analysis The main applications of factor analytic techniques are (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is, to classify variables. Hence, factor analysis is applied as a data reduction or structure detection method.
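Factor analysis proper (for example, principal-axis or maximum-likelihood factoring, usually followed by rotation) is more involved than a short example allows. As a simplified stand-in, this sketch extracts only the first principal component of a correlation matrix by power iteration; its loadings (eigenvector × √eigenvalue) show which variables move together, which is the data-reduction idea in its simplest form. The neighborhood indicators and helper names are hypothetical.

```python
import math


def correlation_matrix(columns: list[list[float]]) -> list[list[float]]:
    """Pearson correlations between every pair of variables (one list per variable)."""
    def corr(x: list[float], y: list[float]) -> float:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)
    return [[corr(x, y) for y in columns] for x in columns]


def leading_component(r: list[list[float]], iters: int = 500) -> tuple[float, list[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    k = len(r)
    v = [1.0] * k  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    rv = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical neighborhood indicators.
income = [30.0, 45.0, 50.0, 62.0, 80.0]
home_value = [150.0, 210.0, 240.0, 300.0, 390.0]
commute = [25.0, 20.0, 35.0, 22.0, 30.0]
eig, vec = leading_component(correlation_matrix([income, home_value, commute]))
loadings = [math.sqrt(eig) * x for x in vec]
print(f"eigenvalue {eig:.2f}, loadings {[round(x, 2) for x in loadings]}")
```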

27 The End. Thank you!