LIS 570 Summarising and presenting data - Univariate analysis
Summary: basic definitions; descriptive statistics; describing frequency distributions (shape, central tendency, dispersion).
Selecting analysis and statistical techniques (De Vaus, p. 133).
Basic definitions. Values: the categories developed for a variable. Levels of measurement: nominal, ordinal, interval. Data: observations (measurements) taken on the units of analysis.
Basic definitions. Statistics: methods for dealing with data. Descriptive statistics summarise sample or census data. Inferential statistics draw conclusions about the population from the results of a random sample drawn from that population.
Frequency distributions. Ungrouped frequency distribution: a list of each of the values of the variable, with the number of times and/or the percentage of times each value occurs. Grouped frequency distribution: a table or graph that shows the frequencies or percentages for ranges of values.
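The two kinds of frequency distribution can be sketched in a few lines of Python. This is an illustration only: the function names, the `ages` data and the class width of 10 are made up for the example and do not come from the slides.

```python
from collections import Counter


def ungrouped_frequencies(values):
    """Map each distinct value to (count, percent of all observations)."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in sorted(counts.items())}


def grouped_frequencies(values, width):
    """Map each range [start, start + width) to (count, percent)."""
    # Integer division assigns every value to the lower bound of its range.
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    return {
        (start, start + width): (c, 100 * c / n)
        for start, c in sorted(counts.items())
    }


# Hypothetical data: ages of ten respondents.
ages = [21, 22, 22, 25, 31, 34, 34, 34, 40, 47]

for value, (count, percent) in ungrouped_frequencies(ages).items():
    print(value, count, f"{percent:.0f}%")

for (low, high), (count, percent) in grouped_frequencies(ages, 10).items():
    print(f"{low}-{high - 1}", count, f"{percent:.0f}%")
```

Grouping trades detail for readability: the ungrouped table has seven rows here, while the grouped one has three.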
Central tendency: a typical or representative value or score. Mean (arithmetic mean, x̄): the sum of all the observations divided by n; use for interval variables when appropriate. Median: the value that divides the distribution so that an equal number of values lie above and below it. Mode: the value with the greatest frequency; a distribution may be uni-modal, bi-modal, etc.
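As a rough illustration, the three measures can be computed with Python's standard `statistics` module. The `scores` and `bimodal` lists are invented for the example.

```python
import statistics

# Hypothetical interval-level data.
scores = [3, 4, 5, 5, 6, 7]

mean = statistics.mean(scores)        # sum of the observations divided by n
median = statistics.median(scores)    # middle value of the sorted observations
modes = statistics.multimode(scores)  # every value with the greatest frequency

print(mean, median, modes)

# A bi-modal example: two values tie for the greatest frequency.
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))
```

`multimode` returns a list, so a distribution with more than one mode is reported in full rather than reduced to a single value.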
Mode: best for nominal variables. Problems: the most common value may not measure typicality; there may be more than one mode; it is unstable and can be manipulated. Dispersion: the variation ratio (v) is the percentage of people not in the modal category.
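A minimal sketch of the variation ratio, expressed as a percentage to match the definition above. The function name and the `transport` data are hypothetical.

```python
from collections import Counter


def variation_ratio(values):
    """Percentage of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())
    n = len(values)
    return 100 * (n - modal_count) / n


# Hypothetical nominal data: main mode of transport for ten people.
transport = ["bus"] * 5 + ["car"] * 3 + ["bike"] * 2

print(variation_ratio(transport))
```

A value near 0 means almost everyone is in the modal category, so the mode describes the data well; a high value means the mode is a poor summary.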
Median: preferred for ordinal variables. People are ranked from low to high; the median is the middle case; the median category is the one that the middle person belongs to.
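The ranking procedure for an ordinal variable might look like the sketch below. The function name, the category labels and the responses are invented; taking the lower of the two middle cases when n is even is an assumption of this sketch, not something the slides specify.

```python
def median_category(responses, order):
    """Return the category that the middle case belongs to.

    `order` lists the categories from low to high. With an even number of
    cases there are two middle cases; this sketch takes the lower one.
    """
    rank = {category: i for i, category in enumerate(order)}
    ranked = sorted(responses, key=lambda r: rank[r])
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical ordinal data: seven answers on a three-point scale.
order = ["low", "medium", "high"]
responses = ["high", "low", "medium", "medium", "low", "high", "medium"]

print(median_category(responses, order))
```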
Dispersion. The cth percentile of a set of numbers is a value such that c percent of the numbers fall below it and the rest fall above. The median is the 50th percentile, the lower quartile is the 25th percentile, and the upper quartile is the 75th percentile. Five-number summary: the median, the quartiles and the extremes.
Dispersion [figure: lower quartile, median and upper quartile].
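A five-number summary and the interquartile range (IQR) can be sketched with `statistics.quantiles`. Textbooks and software use several slightly different rules for locating quartiles; the `"inclusive"` method used here is one of them and is a choice made for this example, as is the function name.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


data = [6, 7, 5, 3, 4]
low, q1, median, q3, high = five_number_summary(data)
iqr = q3 - q1  # spread of the middle half of the data

print(low, q1, median, q3, high, iqr)
```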
Mean: uses the actual numerical values of the observations; the most common measure of centre; makes sense only for interval or ratio data, although it is frequently computed for ordinal variables as well.
Dispersion. The standard deviation and variance measure spread about the mean as centre. Variance: the mean of the squares of the deviations of the observations from the mean. Standard deviation: the positive square root of the variance.
Example. Data: (6, 7, 5, 3, 4). Mean = (6 + 7 + 5 + 3 + 4) / 5 = 25 / 5 = 5. Variance (s²): (1) calculate the mean for the variable; (2) take each observation and subtract the mean from it; (3) square each result; (4) add (sum) all the squared results; (5) divide by n.
Variance (s²) = sum of the squared deviations / number of observations = 10 / 5 = 2.
Standard deviation (s): the square root of the variance, √2 ≈ 1.4. It is an average deviation of the observations from their mean, is influenced by outliers, and is best used with symmetrical distributions.
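The worked example can be reproduced step by step in Python. The function name is made up for this sketch; the data and the division by n follow the slides.

```python
import math


def variance_dividing_by_n(observations):
    """Mean of the squared deviations of the observations from their mean."""
    n = len(observations)
    mean = sum(observations) / n                      # step 1
    deviations = [x - mean for x in observations]     # step 2
    squared = [d ** 2 for d in deviations]            # step 3
    total = sum(squared)                              # step 4
    return total / n                                  # step 5


data = [6, 7, 5, 3, 4]
variance = variance_dividing_by_n(data)
standard_deviation = math.sqrt(variance)

print(variance, round(standard_deviation, 1))
```

Many packages divide by n - 1 instead of n when the data are a sample (for example `statistics.variance` in Python's standard library), which would give 2.5 for this data rather than 2.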
Summary. Determine whether the variable is nominal, ordinal or interval. Nominal: frequency tables; mode. Ordinal: frequency tables (grouped frequency tables, histogram); median and five-number summary plus IQR; mode.
Summary. Interval: determine whether the distribution is skewed or symmetrical. Compare the median and the mean. Use the mean and the standard deviation if the distribution is not markedly skewed; otherwise use the median and five-number summary plus IQR. Use the mode in addition if it adds anything.
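The decision rule in the two summary slides could be sketched roughly as follows. Several things here are assumptions of the sketch rather than part of the slides: the function name, the requirement that ordinal values are coded as numbers, and above all the `skew_cutoff` of 0.2, which is an arbitrary stand-in for "markedly skewed" (the slides give no numeric threshold).

```python
import statistics


def summarise(values, level, skew_cutoff=0.2):
    """Pick summary statistics for a 'nominal', 'ordinal' or 'interval' variable.

    skew_cutoff is a hypothetical threshold: the distribution is treated as
    markedly skewed when |mean - median| exceeds that fraction of the
    standard deviation.
    """
    if level == "nominal":
        return {"mode": statistics.multimode(values)}

    # Ordinal values are assumed to be numeric codes (1 = lowest category).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    five_number = {
        "min": min(values), "q1": q1, "median": median,
        "q3": q3, "max": max(values), "iqr": q3 - q1,
    }
    if level == "ordinal":
        return five_number

    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # divides by n, as in the worked example
    if sd == 0 or abs(mean - median) / sd <= skew_cutoff:
        return {"mean": mean, "sd": sd}
    return five_number


symmetric = summarise([6, 7, 5, 3, 4], "interval")
skewed = summarise([1, 1, 2, 2, 3, 50], "interval")
print(symmetric)
print(skewed)
```

In practice a histogram is usually a better guide to skewness than any single cutoff; the comparison of mean and median is only a quick check.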