Analysing and Presenting Quantitative Data: Inferential Statistics
Objectives
After this session you will be able to choose and apply the most appropriate statistical techniques for exploring relationships and trends in data (correlation and inferential statistics).
Stages in hypothesis testing
1. Hypothesis formulation.
2. Specification of the significance level (how much risk we are prepared to take of rejecting the null hypothesis when it is actually true).
3. Identification of the probability distribution and definition of the region of rejection.
4. Selection of appropriate statistical tests.
5. Calculation of the test statistic and the decision to reject, or not reject, the null hypothesis.
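The five stages can be traced in a small worked sketch. The Python below is illustrative only: it assumes a one-sample, two-tailed z-test with a known population standard deviation, and the function name `one_sample_z_test`, the scores and the hypothesised mean of 50 are all hypothetical.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test, annotated with the five stages.

    Assumes (for illustration) that the population standard deviation
    `sigma` is known and that the data are normally distributed.
    """
    # Stage 1 (hypothesis): H0 says the population mean equals mu0;
    # H1 says it differs from mu0 (two-tailed).
    # Stage 2 (significance level): alpha is the accepted risk of
    # rejecting H0 when it is in fact true.
    # Stage 3 (distribution and rejection region): under H0 the z statistic
    # follows the standard normal distribution, so reject when |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Stage 4 (test selection): a z-test, because sigma is assumed known.
    # Stage 5 (test statistic and decision).
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return z, z_crit, abs(z) > z_crit


# Hypothetical scores, tested against a hypothetical population mean of 50.
scores = [52, 55, 48, 51, 54, 56, 53, 50, 49, 52]
z, z_crit, reject = one_sample_z_test(scores, mu0=50, sigma=3)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With these hypothetical numbers z is about 2.11, which exceeds the critical value of about 1.96, so H0 is rejected at α = 0.05. At α = 0.01 the critical value rises to about 2.58 and H0 is not rejected.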
Hypothesis formulation
Hypotheses come in essentially three forms, those that:
- examine the characteristics of a single population (which may involve calculating the mean, median and standard deviation, and describing the shape of the distribution);
- explore contrasts and comparisons between groups;
- examine associations and relationships between variables.
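For the first form of hypothesis, which describes a single population, Python's standard `statistics` module covers the mean, median and standard deviation. The sketch below adds a moment coefficient of skewness as one simple description of shape. `describe` and the sample values are hypothetical, and other shape measures (or a histogram) would serve equally well.

```python
import statistics


def describe(sample):
    """Summarise a single sample: centre, spread and a simple shape measure.

    Assumes the sample contains at least two distinct values.
    """
    m = statistics.mean(sample)
    sd_pop = statistics.pstdev(sample)
    # Moment coefficient of skewness (population form): 0 for a symmetric
    # distribution, positive when there is a long tail to the right.
    skew = sum((x - m) ** 3 for x in sample) / len(sample) / sd_pop ** 3
    return {
        "mean": m,
        "median": statistics.median(sample),
        "sd": statistics.stdev(sample),  # sample standard deviation (n - 1)
        "skewness": skew,
    }


# Hypothetical sample.
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```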
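Specification of significance level – potential errors
Significance level is not about importance. It sets how unlikely a result must be, if chance alone were operating (that is, if the null hypothesis were true), before we treat it as a genuine effect.
Typical significance levels:
- p = 0.05: if the null hypothesis were true, a result at least this extreme would occur by chance in 5% of samples, so there is a 5% risk of wrongly rejecting it (a Type I error).
- p = 0.01: the corresponding chance, and so the risk of a Type I error, is 1%.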
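What a significance level means can be checked by simulation. This hypothetical sketch assumes normally distributed data with a known standard deviation of 1 and a two-tailed z-test. It repeats a study many times and counts how often H0 (population mean = 0) is rejected, first when H0 is true and then when the true mean is 0.5; `rejection_rate` and all its parameter values are illustrative choices.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mean, n=30, alpha=0.05, trials=10_000, seed=1):
    """Share of simulated studies in which H0 (mean = 0) is rejected.

    Each study draws n values from a normal population with the given true
    mean and standard deviation 1, then runs a two-tailed z-test of H0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_1 = rejection_rate(true_mean=0.0)      # H0 true: each rejection is a Type I error
type_2 = 1 - rejection_rate(true_mean=0.5)  # H0 false: each non-rejection is a Type II error
print(f"Type I error rate ~ {type_1:.3f}, Type II error rate ~ {type_2:.3f}")
```

When H0 is true the rejection rate should come out close to α (about 0.05). The Type II error rate, by contrast, depends on the true effect size and the sample size, not on α alone.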
Selection of statistical tests – examples

| Research question | Independent variable | Dependent variable | Statistical test |
|---|---|---|---|
| Is stress counselling effective in reducing stress levels? | Nominal (time: before and after counselling) | Attitude scores (stress levels) | Paired t-test |
| Do women prefer skin care products more than men do? | Nominal (gender) | Attitude scores (product preference levels) | Mann-Whitney U (data not normally distributed) |
| Does gender influence choice of coach? | Nominal (gender) | Nominal (choice of coach) | Chi-square |
| Do two interviewers judge candidates the same? | Nominal | Rank order scores | Spearman's rho (data not normally distributed) |
| Is there an association between rainfall and sales of face creams? | Rainfall (ratio data) | Ratio data (sales) | Pearson product-moment correlation (data normally distributed) |

Nominal groups and quantifiable data (normally distributed)
To compare the performance or attitudes of one group over a period of time (or of matched pairs) using quantifiable variables such as scores, use the paired t-test. It tests whether the mean of the paired differences is significantly different from zero. To compare two independent groups, use the independent-samples t-test instead.
Assumption: the data (strictly, the paired differences) are normally distributed.
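A paired t-test works on the differences between each person's two scores: t is the mean difference divided by its standard error, with n - 1 degrees of freedom. The sketch below uses only the Python standard library, so it obtains the two-tailed p-value by numerically integrating the t density with Simpson's rule rather than calling a statistics package. The function names and the five pairs of scores are hypothetical.

```python
import math
import statistics


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of Student's t, by Simpson's rule on the density."""
    # Log of the density's normalising constant (lgamma avoids overflow).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(before, after):
    """Paired t-test on two lists of scores from the same subjects.

    Assumes the differences are not all identical (their SD must be > 0).
    """
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean difference
    t = statistics.mean(diffs) / se
    df = n - 1
    return t, df, t_two_tailed_p(t, df)


# Hypothetical stress scores for five people before and after counselling.
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]
t, df, p = paired_t_test(before, after)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

Here t is about -3.14 with 4 degrees of freedom and p falls between 0.02 and 0.05. The sign of t only reflects the order of subtraction: this sketch takes after minus before, whereas SPSS reports the first variable of the pair minus the second.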
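Data outputs: test for normality

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| StressTime1 | 92 | 98.9% | 1 | 1.1% | 93 | 100.0% |
| StressTime2 | | | | | | |

Tests of Normality

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| StressTime1 | .095 | 92 | .041 | .983 | | .289 |
| StressTime2 | .096 | | .034 | .985 | | .363 |

a Lilliefors Significance Correction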
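The Kolmogorov-Smirnov "Statistic" in the output is the largest vertical gap between the sample's cumulative distribution and a normal curve fitted to it, with the mean and standard deviation estimated from the sample. A standard-library sketch of that D statistic follows. `ks_statistic_normal` and the sample are hypothetical, and the sketch does not compute the Lilliefors-corrected significance that SPSS reports, which needs special tables or a statistics package.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Kolmogorov-Smirnov D statistic against a normal distribution whose
    mean and standard deviation are estimated from the sample itself."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample.
print(f"D = {ks_statistic_normal([1, 2, 3, 4, 5]):.3f}")
```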
Statistical output
Paired Samples Test, Pair 1 (StressTime1 - StressTime2): t = 7.270, df = 91, Sig. (2-tailed) = .000 (that is, p < .001).
The output also reports Paired Samples Statistics (mean, N, standard deviation and standard error of the mean for StressTime1 and StressTime2) and, for the paired differences, the mean, standard deviation, standard error of the mean and the 95% confidence interval of the difference.
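Nominal groups and quantifiable data (not normally distributed)
To compare the performance or attitudes of two independent groups using quantifiable variables such as scores, use the Mann-Whitney U test, which compares the ranks of the scores in the two groups rather than their means. To compare one group over a period of time, the equivalent rank-based test is the Wilcoxon signed-rank test.
Assumption: normality is not required, which is why these tests suit data that are not normally distributed.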
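The Mann-Whitney U test pools both groups, ranks every score (tied scores share the average rank), and asks whether one group's ranks are systematically higher. In this hypothetical sketch, U is derived from the rank sum of the first group and converted to a z-score with the usual normal approximation. Unlike SPSS it applies no tie correction, and for very small samples exact tables would normally be used instead; the function names and scores are illustrative.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Mann-Whitney U with a normal approximation (no tie correction)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for group 1
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * NormalDist().cdf(-abs(z))  # two-tailed
    return u, z, p


# Hypothetical attitude scores for two independent groups.
u, z, p = mann_whitney_u([1, 2, 3, 4], [5, 6, 7, 8])
print(f"U = {u}, z = {z:.3f}, p = {p:.3f}")
```

For these two hypothetical groups every score in the second group outranks every score in the first, so U = 0 and z is about -2.31.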
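Statistical output

Tests of Normality (Attitude, by Sex)

| Sex | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| 1 | .298 | 32 | .000 | .815 | | |
| 2 | .167 | 68 | | .909 | | |

a Lilliefors Significance Correction

Ranks (Attitude)

| Sex | N | Mean Rank | Sum of Ranks |
|---|---|---|---|
| 1 | 32 | 31.89 | |
| 2 | 68 | 59.26 | |
| Total | 100 | | |

Test Statistics(a) (Attitude): Z = -4.419, Asymp. Sig. (2-tailed) = .000. The Mann-Whitney U and Wilcoxon W values are also reported.
a Grouping Variable: Sex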
Association between two nominal variables
We may want to investigate relationships between two nominal variables, for example:
- educational attainment and choice of career;
- type of recruit (graduate/non-graduate) and level of responsibility in an organization.
Use the chi-square test when you have two nominal variables, each of which has two or more categories.
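The chi-square test compares the observed count in each cell with the count expected if the two variables were unrelated (row total × column total ÷ grand total). This sketch computes the Pearson chi-square (without the continuity correction), its degrees of freedom, a p-value from a series expansion of the incomplete gamma function, and Cramér's V. The function names and the 2×2 counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable, via the series for the
    regularised lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, y = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson chi-square test of association for a table of counts
    (rows = categories of one variable, columns = the other)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Cramer's V: 0 = no association, 1 = perfect association.
    cramers_v = math.sqrt(stat / (n * (min(len(table), len(table[0])) - 1)))
    return stat, df, chi2_sf(stat, df), cramers_v, expected


# Hypothetical counts: graduate / non-graduate recruits (rows) by
# low / high level of responsibility (columns).
stat, df, p, v, expected = chi_square_independence([[10, 20], [30, 40]])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}, Cramer's V = {v:.3f}")
```

As a check, `chi2_sf(0.382, 1)` is about 0.54, in line with the Asymp. Sig. of .536 reported for the Pearson chi-square of .382 in the statistical output that follows. For a 2×2 table Cramér's V equals the absolute value of phi.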
Statistical output

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .382(b) | 1 | .536 | | |
| Continuity Correction(a) | .221 | | .638 | | |
| Likelihood Ratio | .383 | | | | |
| Fisher's Exact Test | | | | .556 | .320 |
| Linear-by-Linear Association | .380 | | .537 | | |
| N of Valid Cases | 201 | | | | |

a Computed only for a 2x2 table
b 0 cells (.0%) have expected count less than 5. The minimum expected count is

Symmetric Measures

| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | .044 | .536 |
| | Cramer's V | | |
| N of Valid Cases | | | |

a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
Correlation analysis
Correlation analysis is concerned with associations between variables, for example:
- Does the introduction of performance management techniques to specific groups of workers improve morale compared to other groups? (Relationship: performance management/morale.)
- Is there a relationship between size of company (measured by size of workforce) and efficiency (measured by output per worker)? (Relationship: company size/efficiency.)
- Do measures to improve health and safety inevitably reduce output? (Relationship: health and safety procedures/output.)
Perfect positive and perfect negative correlations
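Strength of association based upon the value of a coefficient
In increasing order of strength, a correlation is described as none, negligible, weak, moderate, strong, very strong or perfect, running from a coefficient of 0 (none) to +1 or -1 (perfect).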
Calculating a correlation for a set of data
A correlation can be calculated to explore a relationship when:
- the subjects are independent and not chosen from the same group;
- the values for X and Y are measured independently;
- the X and Y values are sampled from populations that are normally distributed;
- neither X nor Y is controlled by the researcher (if one of them is, calculate a linear regression rather than a correlation).
Associations between two ordinal variables
For data that are ranked, or where a relationship is non-linear but monotonic (consistently increasing or decreasing), Spearman's rank-order correlation (Spearman's rho) can be used.
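Spearman's rho is Pearson's correlation applied to ranks instead of raw values, which is why it suits ranked data and relationships that are curved but consistently rising or falling. Here is a standard-library sketch, with hypothetical function names and hypothetical rankings of five candidates by two interviewers:

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical rankings of five candidates by two interviewers.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

This prints 0.8. Because only ranks matter, a curved but monotonic pair such as x = [1, 2, 3, 4, 5] and y = [1, 4, 9, 16, 25] gives rho = 1.0.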
Statistical output

Correlations

| | | | MrJones | MrsSmith |
|---|---|---|---|---|
| Spearman's rho | MrJones | Correlation Coefficient | 1.000 | .779(**) |
| | | Sig. (2-tailed) | . | .000 |
| | | N | 30 | |

** Correlation is significant at the 0.01 level (2-tailed).
Association between numerical variables
We may wish to explore a relationship when there are potential associations between, for example:
- income and age;
- spending patterns and happiness;
- motivation and job performance.
Use the Pearson product-moment correlation if the relationship between the variables is linear. If the relationship is curved but still consistently rising or falling, use Spearman's rho.
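Pearson's r divides the covariance of X and Y by the product of their standard deviations, giving +1 for a perfect positive straight-line relationship and -1 for a perfect negative one. The sketch below computes r and an approximate two-tailed p-value via Fisher's z-transformation. This is a swapped-in approximation, not the t-based test usually reported by statistics packages, and the names and data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-tailed p-value for H0: population correlation = 0,
    using Fisher's z-transformation (needs n > 3)."""
    if abs(r) >= 1:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical data: a perfect linear rise and a perfect linear fall.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]), pearson_r(x, [10, 8, 6, 4, 2]))
```

This prints 1.0 -1.0. Applied to the rainfall and sales coefficient in the output that follows, `fisher_p_value(-0.813, 30)` is far below 0.001, consistent with the reported Sig. of .000.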
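Statistical output

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| Rainfall | 48.17 | 11.228 | 30 |
| Sales | 132.47 | 28.311 | |

Correlations

| | | Rainfall | Sales |
|---|---|---|---|
| Rainfall | Pearson Correlation | 1 | -.813(**) |
| | Sig. (2-tailed) | | .000 |
| | N | 30 | |

** Correlation is significant at the 0.01 level (2-tailed).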
Summary
Inferential statistics are used to draw conclusions from the data; they involve specifying a hypothesis and selecting appropriate statistical tests.
Among the inherent dangers in hypothesis testing are Type I errors (rejecting the null hypothesis when it is, in fact, true) and Type II errors (failing to reject the null hypothesis when it is false).
For categorical and ranked data, and for quantifiable data that are not normally distributed, non-parametric tests can be used; for quantifiable data that meet their assumptions, the more powerful parametric tests should be applied. Parametric tests usually require that the data are normally distributed.