Data analysis: Explore GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 9.


1 Data analysis: Explore
GAP Toolkit 5: Training in basic drug abuse data management and analysis
Training session 9

2 Objectives
– To define a standard set of descriptive statistics used to analyse continuous variables
– To examine the Explore facility in SPSS
– To introduce the analysis of a continuous variable according to the values of a categorical variable, an example of bivariate analysis
– To introduce further SPSS Help options
– To reinforce the use of SPSS syntax

3 SPSS Descriptive Statistics
– Analyse/Descriptive Statistics/Frequencies
– Analyse/Descriptive Statistics/Explore
– Analyse/Descriptive Statistics/Descriptives

4 Exercise: continuous variable Generate a set of standard summary statistics for the continuous variable Age

5 Explore: Age

6 Explore: Descriptive Statistics
Descriptives: AGE
Statistic                                        Value      Std. Error
Mean                                             31.78      .315
95% Confidence Interval for Mean, Lower Bound    31.16
95% Confidence Interval for Mean, Upper Bound    32.40
5% Trimmed Mean                                  31.31
Median                                           31.00
Variance                                         154.614
Std. Deviation                                   12.434
Minimum                                          1
Maximum                                          77
Range                                            76
Interquartile Range                              20.00
Skewness                                         .427       .062
Kurtosis                                         -.503      .124

7 Exercise: Help
– What’s This?
– Results Coach
– Case Studies

8 Measures of central tendency
Most commonly:
– Mode
– Median
– Mean
5 per cent trimmed mean

9 The mode
The mode is the most frequently occurring value in a dataset.
Suitable for nominal data and above.
Example:
– The mode of the variable recording the first most frequently used drug is Alcohol, with 717 cases, approximately 46 per cent of valid responses

10 Bimodal
Describes a distribution in which two categories each have a large number of cases.
Example:
– The distribution of Employment is bimodal: employment and unemployment have a similar number of cases, and more cases than the other categories
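As a minimal illustration of the mode and of a bimodal variable, the sketch below uses Python's standard `statistics` module rather than SPSS. The two response lists are invented for illustration and are not taken from the training dataset.

```python
from statistics import mode, multimode

# Hypothetical nominal responses (invented, not the training data).
first_drug = ["Alcohol", "Cannabis", "Alcohol", "Cocaine", "Alcohol", "Cannabis"]
employment = ["Employed", "Unemployed", "Student", "Employed", "Unemployed"]

# mode() returns the single most frequent value.
drug_mode = mode(first_drug)

# multimode() returns every value tied for the highest frequency,
# so a bimodal variable yields two entries.
employment_modes = multimode(employment)

print(drug_mode)
print(employment_modes)
```

Neither function needs the categories to be ordered, which is why the mode suits nominal data.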

11 The median
– The middle value when the data are ordered from low to high is the median
– Half the data values lie below the median and half above
– The data have to be ordered, so the median is not suitable for nominal data but is suitable for ordinal levels of measurement and above

12 Example: median
Seizures of opium in Germany, 1994-1998 (kilograms)
Year      1994   1995   1996   1997   1998
Seizure   36     15     45     42     286
Source: United Nations (2000). World Drug Report 2000 (United Nations publication, Sales No. GV.E.00.0.10).

13 Sort the seizure data in ascending order
Year      1995   1994   1997   1996   1998
Seizure   15     36     42     45     286
Rank      1      2      3      4      5
The middle value is the median; the median annual seizure of opium for Germany between 1994 and 1998 was 42 kilograms.
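The ranking step above can be sketched in a few lines of Python (standard library only, used here in place of SPSS). The figures are the seizure values from the slide; the variable names are illustrative.

```python
from statistics import median

# Opium seizures in Germany, 1994-1998 (kilograms), as given on the slide.
seizures = {1994: 36, 1995: 15, 1996: 45, 1997: 42, 1998: 286}

# Order the values from low to high, then take the middle one.
ranked = sorted(seizures.values())
middle = ranked[len(ranked) // 2]   # valid as written only for an odd count

print(ranked)   # [15, 36, 42, 45, 286]
print(middle)   # 42

# The library function gives the same result and also handles even counts.
library_median = median(seizures.values())
```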

14 The mean
– Add the values in the data set and divide by the number of values
– The mean is only truly applicable to interval and ratio data, as it involves adding the values
– It is sometimes applied to ordinal data, or to ordinal scales constructed from a number of Likert scales, but this requires the assumption that the difference between adjacent values in the scale is the same, e.g. that the difference between 1 and 2 is the same as between 5 and 6

15 Example: mean
Seizures of opium in Germany, 1994-1998
Year      1994   1995   1996   1997   1998
Seizure   36     15     45     42     286
Sample size = 5
36 + 15 + 45 + 42 + 286 = 424
424/5 = 84.8
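The same arithmetic, as a short Python sketch (standard library only, not SPSS), using the seizure values from the slide:

```python
from statistics import fmean

# Opium seizures in Germany, 1994-1998 (kilograms), as given on the slide.
seizures = [36, 15, 45, 42, 286]

total = sum(seizures)           # 36 + 15 + 45 + 42 + 286
mean = total / len(seizures)    # divide by the sample size

print(total, mean)

# fmean() performs the same calculation in one call.
library_mean = fmean(seizures)
```

The mean (84.8) is about twice the median (42) because the single large 1998 value pulls it upwards.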

16 The 5 per cent trimmed mean
– The 5 per cent trimmed mean is the mean calculated on the data set with the top 5 per cent and bottom 5 per cent of values removed
– An estimator that is more resistant to outliers than the mean
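A minimal Python sketch of the idea follows. It is a simplification, not SPSS's implementation: it drops a whole number of values from each end (rounding down), whereas a statistical package may handle fractional counts differently. The function name `trimmed_mean` and the list of ages are hypothetical.

```python
from statistics import fmean

def trimmed_mean(values, percent=5):
    """Mean after dropping `percent` per cent of the values from each end.

    Simplification: the number dropped from each tail is rounded down to a
    whole number, so small samples may lose no values at all.
    """
    ordered = sorted(values)
    k = len(ordered) * percent // 100       # values to drop from each tail
    kept = ordered[k:len(ordered) - k]
    return fmean(kept)

# Hypothetical ages: nineteen unremarkable values plus one extreme value.
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
        28, 29, 30, 31, 32, 33, 34, 35, 36, 90]

plain = fmean(ages)            # pulled upwards by the value 90
trimmed = trimmed_mean(ages)   # lowest and highest value removed first
print(plain, trimmed)
```

With 20 values, 5 per cent is exactly one value per tail, so the extreme value 90 no longer influences the result.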

17 95 per cent confidence interval for the mean
– An indication of the expected error (precision) when estimating the population mean with the sample mean
– In repeated sampling, the interval calculated around each sample mean will contain the population mean about 95 times out of 100
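As a rough check, the interval shown in the Explore output for Age can be approximated from the reported mean and standard error. This Python sketch assumes a large sample, so it uses the normal critical value (about 1.96) rather than the t distribution; that choice is an assumption made for this example.

```python
from statistics import NormalDist

mean_age = 31.78    # sample mean from the Explore output for Age
std_error = 0.315   # standard error of the mean from the same output

# Two-sided 95 per cent interval: 2.5 per cent in each tail.
z = NormalDist().inv_cdf(0.975)

lower = mean_age - z * std_error
upper = mean_age + z * std_error
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimal places, the bounds agree with the 31.16 and 32.40 reported in the output.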

18 Measures of dispersion
– The range
– The inter-quartile range
– The variance
– The standard deviation

19 The range
– A measure of the spread of the data
– Range = maximum – minimum

20 Quartiles
– 1st quartile: 25 per cent of the values lie below the value of the 1st quartile and 75 per cent above
– 2nd quartile (the median): 50 per cent of the values lie below and 50 per cent above
– 3rd quartile: 75 per cent of the values lie below and 25 per cent above

21 Inter-quartile range
– IQR = 3rd quartile – 1st quartile
– The inter-quartile range measures the spread or range of the middle 50 per cent of the data
– Requires an ordinal level of measurement or above
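The range and inter-quartile range can be sketched in Python for the opium seizure figures used earlier in this session. Quartiles have several competing definitions; this example uses the default ("exclusive") method of `statistics.quantiles`, and other methods or packages can give slightly different values on a sample this small.

```python
from statistics import quantiles

# Opium seizures in Germany, 1994-1998 (kilograms).
seizures = [36, 15, 45, 42, 286]

# Range = maximum - minimum.
data_range = max(seizures) - min(seizures)

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(seizures, n=4)
iqr = q3 - q1

print(data_range)        # 271
print(q1, q2, q3, iqr)   # 25.5 42.0 165.5 140.0
```

The second quartile equals the median (42), and the IQR ignores the distance between the extreme 1998 value and the rest of the data, unlike the range.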

22 Variance
– The average squared difference from the mean
– Measured in units squared
– Requires interval or ratio levels of measurement

23 Standard deviation
– The square root of the variance
– Returns the units to those of the original variable

24 Example: standard deviation and variance
Seizures of opium in Germany, 1994-1998
Year     Seizure   Deviation   Squared deviation
1994     36        -48.8       2381.44
1995     15        -69.8       4872.04
1996     45        -39.8       1584.04
1997     42        -42.8       1831.84
1998     286       201.2       40481.44
Total    424       0           51150.8
Count    5                     5
Mean     84.8
Variance (51150.8/5): 10230
Standard deviation: 101
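The table can be reproduced step by step in Python (standard library only). As on the slide, the variance here divides the sum of squared deviations by the count, 5. The last line shows the alternative divisor of n - 1, which gives the sample variance and a larger value; it is included for comparison only.

```python
from statistics import pstdev, pvariance, variance

# Opium seizures in Germany, 1994-1998 (kilograms).
seizures = [36, 15, 45, 42, 286]

mean = sum(seizures) / len(seizures)                      # 84.8
deviations = [x - mean for x in seizures]                 # sum to zero
squared_deviations = [d ** 2 for d in deviations]

# Variance as on the slide: average squared deviation (divide by n).
pop_variance = sum(squared_deviations) / len(seizures)    # about 10230
pop_sd = pop_variance ** 0.5                              # about 101

# Library equivalents of the two lines above.
lib_variance = pvariance(seizures)
lib_sd = pstdev(seizures)

# For comparison: sample variance, dividing by n - 1 instead of n.
sample_variance = variance(seizures)

print(round(pop_variance), round(pop_sd), round(sample_variance, 1))
```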

25 Distribution or shape of the data
The normal distribution
Skewness:
– Positive or right-hand skewed
– Negative or left-hand skewed
Kurtosis:
– Platykurtic
– Mesokurtic
– Leptokurtic

26 The normal distribution
Symmetrical data: the mean, the median and the mode coincide
[Figure: normal curve, f(X) against X, with the mean, median and mode marked at the same point]

27 Right-hand skew (+)
Right-hand skew: the extreme large values drag the mean towards them
[Figure: right-skewed curve, f(X) against X, with the mode, median and mean marked]

28 Left-hand skew (-)
Left-hand skew: the extreme small values drag the mean towards them
[Figure: left-skewed curve, f(X) against X, with the mode, mean and median marked]
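The pull of extreme values on the mean can be seen in the opium seizure figures used earlier in this session. The Python sketch below compares the mean with the median and computes a simple moment-based skewness coefficient. That coefficient is one common definition chosen for illustration; it is not necessarily the adjusted formula that SPSS reports, and the function name `moment_skewness` is hypothetical.

```python
from statistics import fmean, median

def moment_skewness(values):
    """Third central moment divided by the second central moment to the power 1.5."""
    m = fmean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Opium seizures in Germany, 1994-1998 (kilograms).
seizures = [36, 15, 45, 42, 286]

# One extreme large value: the mean exceeds the median and skewness is positive.
right_skew = moment_skewness(seizures)

# Mirroring the data turns the large value into an extreme small one,
# so the sign of the skewness flips.
mirrored = [-x for x in seizures]
left_skew = moment_skewness(mirrored)

print(fmean(seizures) > median(seizures), right_skew > 0, left_skew < 0)
```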

29 Bivariate analysis
– Continuous dependent variable
– Categorical independent variable

30 Explore

31 Explore: Options button

32 Explore: Plots button

33 Explore: Statistics button

34 Descriptives: AGE by Gender
Statistic                                        Male      Std. Error   Female    Std. Error
Mean                                             31.43     .340         33.39     .789
95% Confidence Interval for Mean, Lower Bound    30.76                  31.84
95% Confidence Interval for Mean, Upper Bound    32.09                  34.94
5% Trimmed Mean                                  31.03                  32.77
Median                                           30.00                  33.00
Variance                                         144.286                193.593
Std. Deviation                                   12.012                 13.914
Minimum                                          1                      14
Maximum                                          70                     77
Range                                            69                     63
Interquartile Range                              19.00                  23.00
Skewness                                         .370      .069         .472      .138
Kurtosis                                         -.573     .138         -.602     .376

35 [Figure: separate panels for Male and Female]

36 Boxplot of Age vs Gender
[Figure: boxplot of Age by Gender, annotated to show the median, the inter-quartile range and an outlier]

37 Syntax: Explore
EXAMINE VARIABLES=age BY gender
  /ID=id
  /PLOT BOXPLOT HISTOGRAM
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
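For readers without SPSS to hand, the idea behind `EXAMINE VARIABLES=age BY gender` (summarising a continuous variable separately for each category) can be sketched in Python with the standard library. This is only an analogue of the grouping logic, not a reimplementation of Explore, and the records are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean, median, stdev

# Hypothetical (gender, age) records; not the training dataset.
records = [("Male", 25), ("Male", 31), ("Male", 40), ("Male", 22),
           ("Female", 29), ("Female", 35), ("Female", 44), ("Female", 30)]

# Split the continuous variable by the values of the categorical variable.
ages_by_gender = defaultdict(list)
for gender, age in records:
    ages_by_gender[gender].append(age)

# Compute the same descriptive statistics within each group.
summary = {
    gender: {
        "n": len(ages),
        "mean": fmean(ages),
        "median": median(ages),
        "std_dev": stdev(ages),          # sample standard deviation (n - 1)
        "minimum": min(ages),
        "maximum": max(ages),
        "range": max(ages) - min(ages),
    }
    for gender, ages in ages_by_gender.items()
}

for gender, stats in summary.items():
    print(gender, stats)
```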

38 Summary
– Measures of central tendency
– Measures of variation
– Quantiles
– Measures of shape
– Bivariate analysis for a categorical independent variable and continuous dependent variable
– Histograms
– Boxplots

