GS/PPAL 6200 3.00 Section N Research Methods and Information Systems A QUANTITATIVE RESEARCH PROJECT - (1)DATA COLLECTION (2)DATA DESCRIPTION (3)DATA ANALYSIS.

1 GS/PPAL 6200 3.00 Section N Research Methods and Information Systems A QUANTITATIVE RESEARCH PROJECT - (1)DATA COLLECTION (2)DATA DESCRIPTION (3)DATA ANALYSIS II

2 Agenda Correlations Correlation Coefficient: a quantitative measure of linear correlations Correlation Strength versus Statistical Significance Simple Regression Analyses Quantitative Research Project – Recap Course Conclusion

3 Correlations Is CGPA related in some way to total hours studied (H)? Statistically, does the mean value of CGPA vary in some way with H? Remember, we need to account for the fact that each variable tends to deviate randomly from its true mean. The “correlation coefficient” for a set of observations is a function of how much the observed values deviate from their sample means, adjusted for (i.e., not explained by) random deviation

4 Correlation Images "Correlation examples2" by Denis Boigelot, original uploader was Imagecreator - Own work. Licensed under CC0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Correlation_examples2.svg#/media/File:Correlation_examples2.svg

5 Representing Linear Correlation
1. For a population of size N, the typical notation is:
$$\rho(H,C) = \operatorname{corr}(H,C) = \frac{\operatorname{cov}(H,C)}{\sigma_H \sigma_C} = \frac{\frac{1}{N}\sum (H-\mu_H)(C-\mu_C)}{\sigma_H \sigma_C}$$
2. For a sample of size n from that same population (sample statistics replace the population parameters):
$$r(H,C) = \frac{\frac{1}{n-1}\sum (H-\bar{H})(C-\bar{C})}{s_H s_C}$$
Excel functions to calculate (2) above: = CORREL (data array (H), data array (CGPA)), OR = PEARSON (data array (H), data array (CGPA))
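
As a rough illustration of formula (2), the sample correlation can be computed by hand in a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `pearson_r` and the toy data are hypothetical and chosen only so the result is easy to check.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation: covariance / (s_x * s_y), all with n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical toy data lying exactly on a rising straight line
toy_hours = [10, 20, 30, 40]
toy_cgpa = [5.0, 6.0, 7.0, 8.0]

r_up = pearson_r(toy_hours, toy_cgpa)          # perfect positive correlation
r_down = pearson_r(toy_hours, toy_cgpa[::-1])  # perfect negative correlation
print(round(r_up, 4), round(r_down, 4))
```

Because the (n - 1) factors cancel between numerator and denominator, the same value results if every sum is divided by n instead.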

6 Population Correlation Coefficient The Pearson correlation coefficient (numbers above images) measures only the linear relationship between two variables

7 Correlation Coefficient (= 0.816) versus Visual Inspection of Data "Anscombe's quartet 3" by Anscombe.svg: Schutz; derivative work (label using subscripts): Avenue (talk) - Anscombe.svg. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg#/media/File:Anscombe%27s_quartet_3.svg

8 Correlations and Predictions Presence of a (linear) correlation may offer predictive information that may be useful It may (but may not) suggest causality to be examined further - “correlation does not imply causation” (when there is no control group) It may suggest policy considerations (policy action, spillover effects, consequences)

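9 10-case Study: Raw Data and Scatter Plot with Linear Trend
Case   CGPA   Total Hours Studied
1      7.67   35
2      6.83   29
3      4.17   23
4      7.67   50
5      5.00   32
6      4.17   22
7      5.00   17
8      7.33   40
9      6.83   44
10     6.33   38
[Figure: scatter plot of CGPA against total hours studied, with linear trend line]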

10 Correlation for 10-case Study = CORREL (CGPA, HOURS) = PEARSON (CGPA, HOURS) = 0.7944 R-squared = 0.7944 * 0.7944 = 0.63 If CGPA is a linear function of HOURS and CGPA is normally distributed, then R-squared gives the “explained variance”: 63% of the variation in CGPA can be “explained” by variation in HOURS

11 Strength versus Significance A “strong” correlation may or may not be significant A “weak” correlation may or may not be significant Key is the size of the sample – for small samples a strong correlation may still be by chance; for large samples it is easy to achieve significance for weak correlations
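
To make the point concrete, the sketch below compares two hypothetical cases using the usual test statistic t = r / sqrt((1 - r^2)/(n - 2)). The values of r and n are invented for illustration, and the critical values are approximate two-tailed 5% figures from a standard t-table, hard-coded here as assumptions.

```python
from math import sqrt

def t_stat(r, n):
    """t-statistic for testing whether a correlation differs from zero."""
    return r / sqrt((1 - r ** 2) / (n - 2))

# Hypothetical case A: strong correlation, tiny sample
t_strong_small = t_stat(0.79, 4)
# Hypothetical case B: weak correlation, large sample
t_weak_large = t_stat(0.10, 1000)

# Approximate two-tailed 5% critical values (assumed from a t-table)
CRIT_DF_2 = 4.303    # n - 2 = 2 degrees of freedom
CRIT_DF_998 = 1.962  # n - 2 = 998 degrees of freedom

strong_is_significant = t_strong_small > CRIT_DF_2
weak_is_significant = t_weak_large > CRIT_DF_998
print(strong_is_significant, weak_is_significant)
```

Under these assumptions the strong correlation fails the test while the weak one passes it, which is exactly the distinction between strength and significance.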

12 T-test for Significance Null Hypothesis: H0: r = 0 Alternative Hypothesis: Ha: r ≠ 0 (i.e., there is a positive or negative correlation that is significant) Correlation coefficient (r) Adjust by weighting (dividing) r by its standard error: $se(r) = \sqrt{(1-r^2)/(n-2)}$ T-stat* = r/se(r) Compare t-stat* to critical t-value for (n-2) degrees of freedom and chosen significance level

13 10-case Study Correlation coefficient (r) for our study = 0.79 $se(r) = \sqrt{(1-0.63)/(10-2)} = \sqrt{0.046} = 0.214$ T-stat = r/se(r) = 0.79/0.214 = 3.69 For 8 df, two-tailed test at the 5% significance level (95% confidence), critical t-value = T.INV.2T(.05,8) = 2.306 Since 3.69 > 2.306, a correlation this large would arise by chance less than 5% of the time if the true correlation were zero; therefore reject the null hypothesis and conclude that hours studied is (positively) correlated with CGPA
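
The arithmetic on this slide can be checked with a short script. This is a sketch only: it takes r = 0.7944 and n = 10 from the slides and hard-codes the critical value 2.306 quoted above instead of computing it, since the Python standard library has no t-distribution function.

```python
from math import sqrt

r = 0.7944  # correlation coefficient from the 10-case study
n = 10      # sample size

se_r = sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t = r / se_r                         # test statistic

T_CRITICAL = 2.306  # T.INV.2T(0.05, 8), as quoted on the slide

reject_null = abs(t) > T_CRITICAL
print(f"se(r) = {se_r:.3f}, t = {t:.2f}, reject H0: {reject_null}")
```

Using the unrounded r gives a t-statistic of about 3.70, slightly above the 3.69 obtained from the rounded inputs on the slide.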

14 Representing Linear Relationships Since CGPA and HOURS appear to be strongly positively correlated (though this may only be an artifact of the small sample size), and the correlation is statistically significant (despite the small sample), examine the relationship more closely. General linear relationship: Y = mX + b, for Y the dependent variable, X the independent or explanatory variable, m the slope, and b some constant (the intercept)

15 Graphically Locate coordinates (2, 4) that is, X = 2, Y = 4 Locate coordinates (3, 5) When X increases by +1 (from 2 to 3) how much does Y increase by? (=m) When X = 0, what does Y equal? (= b) Therefore model is Y = 1*X + 2
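
The two questions on this slide amount to computing a slope and an intercept from two points. A minimal sketch using the slide's own coordinates (variable names are illustrative):

```python
# The two points from the slide
x1, y1 = 2, 4
x2, y2 = 3, 5

m = (y2 - y1) / (x2 - x1)  # change in Y per +1 change in X
b = y1 - m * x1            # value of Y when X = 0

print(f"Y = {m:g}*X + {b:g}")
```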

16 CGPA and HOURS For the linear trend line, CGPA = Intercept (b) + coefficient (m) * HOURS CGPA = 2.6 + 0.105*HOURS For every +1 hour studied per month, by how much does CGPA increase? How did we obtain the linear trend line?

17 Regression Analysis - Intuition The estimated linear trend line specifies the linear relationship that “best fits” the data A “best fit” model is one that minimizes the amount an observation deviates from the hypothesized model “Best fit” here means to minimize the sum of the squared deviations between the data points and the linear trend line (model) “Linear Least Squares Regression Model”
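
One way to see what “best fit” means is to compute the least-squares line directly and confirm that nudging it makes the sum of squared deviations larger. The sketch below uses the slide 9 data and the standard closed-form formulas for simple regression; the helper name `sse` is hypothetical.

```python
# Raw data from the 10-case study (slide 9)
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]

n = len(hours)
mean_h = sum(hours) / n
mean_c = sum(cgpa) / n

# Closed-form least-squares estimates for a single explanatory variable
s_hc = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, cgpa))
s_hh = sum((h - mean_h) ** 2 for h in hours)
m = s_hc / s_hh          # slope (coefficient on HOURS)
b = mean_c - m * mean_h  # intercept

def sse(intercept, slope):
    """Sum of squared deviations between the data and a candidate line."""
    return sum((c - (intercept + slope * h)) ** 2 for h, c in zip(hours, cgpa))

best = sse(b, m)
print(f"CGPA = {b:.2f} + {m:.3f}*HOURS, SSE = {best:.3f}")
# Any other line does worse, e.g. a slightly steeper or a shifted one
print(sse(b, m + 0.01) > best, sse(b + 0.1, m) > best)
```

The fitted values round to an intercept of about 2.6 and a slope of about 0.106, in line with the trend line reported in the slides.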

18 Regression Analysis - Mechanics In Excel: “Data Analysis” → “Regression” Coefficients: values of “b” (intercept) and “m” (coefficient on the explanatory variable) Standard Error, t-stat, P-value and CI (95%) for each estimate

19 Data Interpretation (again) From the Regression Output we know: CGPA = 2.6 + 0.105*HOURS For every +1 hour studied per month, CGPA on graduation increases by 0.105 Graduating students with a CGPA one grade point higher than other graduating students studied, on average, 9.52 more hours per month (9.52 = 1 / 0.105)
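
Both readings of the fitted line can be expressed as simple arithmetic. In this sketch the coefficients are the rounded values from the slide, while the function name `predicted_cgpa` and the example inputs of 30 and 31 hours are hypothetical.

```python
INTERCEPT = 2.6  # b, from the regression output on the slide
SLOPE = 0.105    # m, coefficient on HOURS

def predicted_cgpa(hours):
    """Predicted graduating CGPA for a given number of hours studied."""
    return INTERCEPT + SLOPE * hours

# Reading 1: one extra hour raises predicted CGPA by the slope
gain_per_hour = predicted_cgpa(31) - predicted_cgpa(30)

# Reading 2: hours associated with one extra grade point
hours_per_point = 1 / SLOPE

print(f"{gain_per_hour:.3f} grade points per hour")
print(f"{hours_per_point:.2f} hours per grade point")
```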

20 P-Value Approach to Statistical Significance of Total Hours Studied H0: coefficient on HOURS = 0; HA: coefficient ≠ 0 P-value approach: P-value = 0.0061 < .05, i.e., if H0 were true, the probability of obtaining a coefficient at least this far from zero purely by chance is less than 5% → reject H0 → data support HA Note: for a 1-sided test (e.g., coefficient > 0) divide the reported P-value by 2

21 Critical Value Approach to Statistical Significance of Total Hours Studied H0: coefficient on HOURS = 0; HA: coefficient ≠ 0 Critical value approach: with n - 2 = 8 degrees of freedom, critical value = T.INV.2T (0.05, 8) = 2.306 t-stat = 3.699 > 2.306 → reject H0
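
The t-statistic on the HOURS coefficient can be rebuilt from the slide 9 data as the slope divided by its standard error. This sketch assumes the usual simple-regression formulas with n - 2 = 8 residual degrees of freedom, and hard-codes 2.306 (the value of T.INV.2T(0.05, 8) quoted on slide 13) as the critical value.

```python
from math import sqrt

# Raw data from the 10-case study (slide 9)
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]

n = len(hours)
mean_h = sum(hours) / n
mean_c = sum(cgpa) / n
s_hh = sum((h - mean_h) ** 2 for h in hours)
s_hc = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, cgpa))

slope = s_hc / s_hh
intercept = mean_c - slope * mean_h

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
residual_ss = sum((c - (intercept + slope * h)) ** 2 for h, c in zip(hours, cgpa))
se_slope = sqrt(residual_ss / (n - 2) / s_hh)

t_slope = slope / se_slope
T_CRITICAL = 2.306  # assumed: two-tailed 5% critical value for 8 df

print(f"t-stat = {t_slope:.3f}, reject H0: {t_slope > T_CRITICAL}")
```

In a regression with a single explanatory variable, this t-statistic coincides with the one from the correlation test on slide 13, which is why both give roughly 3.70.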

22 A Quantitative Research Project: Recapitulation Research Topic: Academic Performance Research Questions: How well do graduating students perform academically? What explains that performance? Measure “academic performance” by graduating CGPA Research Design: Cross-sectional analysis of graduating students in a given year Data Collection: Survey a random sample of 10 students graduating in 2014 Data Description: Describe the data with basic statistics Data Analysis: Reasons for attending university and performance; Total hours studied and CGPA

23 Research in Public Policy Excerpted from Morçöl and Ivanova (2010)
Categories of Methods | Quantitative Orientation | Qualitative Orientation
Empirical Inquiry - Design Methods | Experimental, Cross-sectional, Longitudinal | Case study
Empirical Inquiry - Data Collection Methods | Surveys, Secondary Data | Qualitative (long, in-depth, or semi-structured) Interviews
Empirical Inquiry - Data Analysis Methods | Statistical, Regression, or Time-series Analyses | (Computer-assisted) Qualitative Data Analyses
Empirical Inquiry - Combined Methods | Game Theory, Simulations, Systems Analysis, Meta-Analyses, Network Analyses | Case study, Legal Analyses, Archival, Ethnography, Grounded Theory, Textual Analyses
Methods of Decision Making and Planning | Cost-benefit, Decision Analyses, Linear Programming | Brainstorming, Delphi

24 Quality of Quantitative (Qualitative) Research: Reliability, Relevance, Validity Reliability: can we replicate the research results? (are the results dependable?) Relevance: are results of practical significance? (are results trustworthy or authentic?) Construct Validity: do quantities observed reflect research variables of interest? (is there objectivity?) Internal Validity: is there a causal relationship between the independent and dependent variables? (is there credibility?) External Validity: can we generalize beyond the one study? (are results transferable?)

25 Achieving Learning Outcomes Basic user familiarity requires familiarity with – research ethics – existing data sets – the collection of qualitative and quantitative data – data measurement – sampling – advantages and disadvantages of different research methods – descriptive and inferential statistics

26 Learning Outcomes? Understand key concepts in research Apply critical analytical skills to published research Understand the application, value and limits of quantitative and qualitative research methodologies and techniques / tools Develop skills in devising and designing research methods suitable for different policy contexts and for rigorous analysis Provide a grounding in ethical issues related to: – academic research – the role of the public servant as a custodian of data and information, balancing the public’s right to know against the personal data and information that an individual citizen has a right to keep confidential

27 Good Luck! And THANK YOU… …for the journey, …for your patience, …your curiosity, …your humour!

