# LIS 570 Summarising and presenting data - Univariate analysis.

LIS 570 Summarising and presenting data - Univariate analysis

Summary

- Basic definitions
- Descriptive statistics
- Describing frequency distributions
  - shape
  - central tendency
  - dispersion

Selecting analysis and statistical techniques (De Vaus, p. 133)

Basic definitions

- Values: the categories developed for a variable
  - Nominal
  - Ordinal
  - Interval
- Data: observations (measurements) taken on the units of analysis

Basic definitions

- Statistics: methods for dealing with data
  - Descriptive statistics summarise sample or census data
  - Inferential statistics draw conclusions about the population from the results of a random sample drawn from that population

Methods of analysis (De Vaus, p. 134)

Frequency distributions

- Ungrouped frequency distribution
  - A list of each of the values of the variable
  - The number of times and/or the percent of times each value occurs
- Grouped frequency distribution
  - A table or graph which shows the frequencies or percent for ranges of values
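The two kinds of frequency distribution can be sketched in a few lines of Python. This is a minimal illustration, not a method from the slides: the function names, the `ages` data, and the choice of equal-width ranges starting at `start` are all assumptions made for the example.

```python
from collections import Counter


def ungrouped_frequencies(values):
    """Map each distinct value to (count, percent of all cases)."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in sorted(counts.items())}


def grouped_frequencies(values, start, width):
    """Map each range [low, high) to (count, percent of all cases).

    Ranges are equal-width bins beginning at `start` (an assumption of
    this sketch; other grouping schemes are possible).
    """
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    return {
        (start + k * width, start + (k + 1) * width): (c, 100 * c / n)
        for k, c in sorted(counts.items())
    }


ages = [21, 22, 22, 25, 31, 34, 34, 34, 41, 47]  # hypothetical data
ungrouped = ungrouped_frequencies(ages)
grouped = grouped_frequencies(ages, start=20, width=10)
```

Here `grouped` counts the cases falling in the ranges 20 to 29, 30 to 39 and 40 to 49, while `ungrouped` lists every distinct age separately.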

Frequency distributions

Required information for frequency tables

- table number and title
- labels for the categories of the variables
- column headings
- the number of missing cases

Histograms

Describing frequency distributions

- Shape
  - Symmetrical (mirror image)
  - Skewed
    - Negative skew: tail toward lower scores
    - Positive skew: tail toward higher scores
- Dispersion
- Central tendency

Shape (for ordinal or interval variables): positively skewed distribution. Cases cluster towards the low end of the variable.

Shape (for ordinal or interval variables): negatively skewed distribution. Cases cluster towards the high end of the variable.

Shape - Symmetry

Central tendency

- A typical or representative value or score
- Mean (arithmetic mean, $\bar{x}$)
  - Sum of all the observations divided by n
  - Use for interval variables when appropriate
- Median
  - The value that divides the distribution so that an equal number of values lie above and below it
- Mode
  - The value with the greatest frequency
  - A distribution may be uni-modal, bi-modal, etc.
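As a rough illustration, the three measures can be computed with Python's standard `statistics` module. The `scores` data are hypothetical; `multimode` is used so that bi-modal data would return both modes.

```python
import statistics

scores = [6, 7, 5, 3, 4, 7]  # hypothetical data

mean = statistics.mean(scores)        # sum of the observations divided by n
median = statistics.median(scores)    # middle of the sorted values
modes = statistics.multimode(scores)  # every value with the greatest frequency
```

With an even number of observations, `statistics.median` averages the two middle values, which is one common convention.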

Mode

- Best for nominal variables
- Problems
  - the most common value may not measure typicality
  - there may be more than one mode
  - unstable: can be manipulated
- Dispersion
  - variation ratio (v): the % of people not in the modal category
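The variation ratio can be sketched as follows. This version returns a proportion between 0 and 1 (multiply by 100 for the percentage on the slide); the function name and the `formats` data are assumptions for the example.

```python
from collections import Counter


def variation_ratio(values):
    """Proportion of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())
    return 1 - modal_count / len(values)


formats = ["book", "book", "journal", "video"]  # hypothetical nominal data
v = variation_ratio(formats)
```

A value near 0 means almost every case sits in the modal category, so the mode describes the data well; a larger value means the mode is less representative.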

Median

- Preferred for ordinal variables
  - people are ranked from low to high
  - the median is the middle case
  - the median category is the one that the middle person belongs to
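A minimal sketch of finding the median category for an ordinal variable. The category names and data are hypothetical, and for an even number of cases this sketch simply takes the lower of the two middle cases, which is an assumption rather than something the slides specify.

```python
def median_category(values, order):
    """Return the category of the middle case after ranking low to high.

    `order` lists the categories from lowest to highest.
    """
    rank = {category: i for i, category in enumerate(order)}
    ranked = sorted(values, key=rank.__getitem__)
    # Middle case; for even n this picks the lower of the two middle cases.
    return ranked[(len(ranked) - 1) // 2]


order = ["low", "medium", "high"]
responses = ["high", "low", "medium", "medium", "low"]  # hypothetical data
middle = median_category(responses, order)
```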

Dispersion

- The cth percentile of a set of numbers is a value such that c percent of the numbers fall below it and the rest fall above
  - The median is the 50th percentile
  - The lower quartile is the 25th percentile
  - The upper quartile is the 75th percentile
- Five number summary
  - Median, quartiles and extremes
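The five number summary can be sketched with `statistics.quantiles` from the standard library. Quartile conventions differ between textbooks and packages; the `"inclusive"` method used here is one choice among several, and the function names and `scores` data are assumptions for the example.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def interquartile_range(values):
    """Upper quartile minus lower quartile."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
summary = five_number_summary(scores)
iqr = interquartile_range(scores)
```

The interquartile range covers the middle half of the cases, so it is not affected by the extremes.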

Dispersion: lower quartile, median, upper quartile

Boxplot (figure): Variable 1, Variable 2 and Variable 3, with the interquartile range (IQR) marked

Mean

- uses the actual numerical values of the observations
- most common measure of centre
- makes sense only for interval or ratio data
- frequently computed for ordinal variables as well

Dispersion

- The standard deviation and variance measure spread about the mean as centre
- Variance: the mean of the squares of the deviations of the observations from the mean
- Standard deviation: the positive square root of the variance

Example

Data (6, 7, 5, 3, 4)

$\bar{x} = \frac{6+7+5+3+4}{5} = \frac{25}{5} = 5$

Variance ($s^2$)

1. Calculate the mean for the variable
2. Take each observation and subtract the mean from it
3. Square the result from the above
4. Add (sum) all the individual results
5. Divide by n

Variance ($s^2$)

$s^2 = \frac{\text{sum of the squared deviations}}{\text{number of observations}} = \frac{10}{5} = 2$

Standard deviation (s)

- Square root of the variance: $s = \sqrt{2} \approx 1.4$
- an average deviation of the observations from their mean
- influenced by outliers
- best used with symmetrical distributions
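The worked example can be reproduced step by step in Python. This sketch divides by n, as the slides do; many packages divide by n - 1 by default for sample data. The function name is an assumption for the example.

```python
import math


def variance_and_sd(values):
    """Variance (dividing by n) and standard deviation, step by step."""
    n = len(values)
    mean = sum(values) / n                    # calculate the mean
    deviations = [x - mean for x in values]   # subtract the mean from each observation
    squared = [d ** 2 for d in deviations]    # square each deviation
    variance = sum(squared) / n               # sum the squares and divide by n
    return variance, math.sqrt(variance)      # SD is the positive square root


data = [6, 7, 5, 3, 4]  # the data from the slide
variance, sd = variance_and_sd(data)
```

For these data the squared deviations are 1, 4, 0, 4 and 1, which sum to 10, so the variance is 2 and the standard deviation is about 1.4. The same values come from `statistics.pvariance` and `statistics.pstdev`.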

Summary

- Determine if the variable is nominal, ordinal or interval
- Nominal
  - Frequency tables
  - Mode
- Ordinal
  - Frequency tables (grouped frequency tables)
  - Histogram
  - Median and five number summary plus IQR
  - Mode

Summary

- Interval
  - Determine whether the distribution is skewed or symmetrical: compare the median and the mean
  - Use the mean and the standard deviation if the distribution is not markedly skewed
  - Otherwise use the median and five number summary plus IQR
  - Use the mode in addition if it adds anything
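Comparing the median and the mean can be sketched as a simple check. This is a rule of thumb rather than a formal test: the mean is usually pulled toward the longer tail, so a mean above the median suggests positive skew. The function name and the `tolerance` parameter are assumptions for the example, and what counts as "markedly" skewed remains a judgement call.

```python
import statistics


def skew_direction(values, tolerance=0.0):
    """Guess the direction of skew by comparing the mean with the median.

    `tolerance` is a hypothetical threshold: differences no larger than
    this are treated as roughly symmetrical.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "positive"   # tail toward higher scores
    if median - mean > tolerance:
        return "negative"   # tail toward lower scores
    return "roughly symmetrical"


loans = [1, 2, 2, 3, 20]  # hypothetical data with one very high value
direction = skew_direction(loans)
```

If the result is "roughly symmetrical", the mean and standard deviation are reasonable summaries; otherwise the median and five number summary are the safer choice.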