Published by Emma Aleesha O’Brien. Modified over 4 years ago.

1
Programming in R: Describing Univariate and Multivariate Data

2
Describing univariate data
In this session I will explain:
– Measures of central tendency and variation
– How to use figures to summarize a single variable (univariate data)
– How to create these in R

3
Characteristics of numeric variables
– Center: where do we find most of the data?
– Distribution or shape, such as a bell-shaped curve
– Variation or dispersion: how spread out are the data? On average, how far are observations from the center?
– Outliers: have we got Bill Gates in our salary sample?

4
Measures of central tendency
The “center” of a data set can be described using two different measures:
1. Mean – the commonly known “average”
2. Median – the midpoint

5
The mean
The sample mean is sometimes called “x bar”:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Translation: add up all the values and divide by the number of values.
Usually, this is what people call the average.
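
As a minimal sketch of the calculation described on this slide, the R snippet below computes the mean by hand and compares it with the built-in mean(). The `prices` vector is made-up illustration data, not taken from the presentation.

```r
# Hypothetical sample of prices (illustration only, not from the slides)
prices <- c(12, 15, 11, 18, 14)

# Add up all the values and divide by the number of values
manual_mean <- sum(prices) / length(prices)

# The built-in function should give the same result
builtin_mean <- mean(prices)

print(manual_mean)
print(builtin_mean)
```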

6
The median
The middle of the data is called the median.
– Sort the data from smallest to largest
– If there is an odd number of observations, the middle number is the median
– If there is an even number of observations, the median is the midpoint between the two middle numbers

7
Median price = (7521 + 8139) / 2 = 7830
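
The odd and even cases of the median rule can be sketched in R as follows. The two small vectors are hypothetical illustration data; only the last line reuses the two middle prices quoted on the slide.

```r
# Hypothetical data (illustration only, not from the slides)
odd_n  <- c(9, 3, 7, 1, 5)        # 5 observations
even_n <- c(9, 3, 7, 1, 5, 11)    # 6 observations

# Odd count: after sorting, the single middle value is the median
print(sort(odd_n))
med_odd <- median(odd_n)

# Even count: the median is the midpoint of the two middle values
print(sort(even_n))
med_even <- median(even_n)

# Midpoint of the two middle prices quoted on the slide
slide_median <- (7521 + 8139) / 2

print(c(med_odd, med_even, slide_median))
```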

8
The mode
The most commonly occurring value.
– There can be more than one mode (bimodal, multimodal)
– Sometimes there is no mode
– For nominal (unordered) categorical variables, the mode is the only possible measure of central tendency
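
Base R has no built-in function for the statistical mode (its mode() function reports an object's storage type instead), so a small helper is needed. The sketch below is one possible approach; `stat_modes`, `colours`, and `scores` are hypothetical names and data introduced only for illustration.

```r
# Hypothetical helper: returns every value tied for the highest frequency.
# Values come back as character strings because they are taken from the
# names of the frequency table.
stat_modes <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts) == max(counts)]
}

# Made-up categorical data with two modes (bimodal)
colours <- c("red", "blue", "red", "green", "blue")
colour_modes <- stat_modes(colours)

# Made-up numeric data with a single mode
scores <- c(62, 60, 62, 61, 63)
score_mode <- stat_modes(scores)

print(colour_modes)
print(score_mode)
```

If every value occurs exactly once, this helper returns all of them, which corresponds to the "no mode" case on the slide.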

9
The median and the mode for the variable “table” are both 62, while the mean is 61. “table” may be a fairly symmetrical variable with a slight left skew, since the mean sits just below the median.

10
Shape and skewness

11
Normal variables and standard deviation
In a symmetric, bell-shaped distribution, we are able to describe the entire distribution using only two numbers: the mean and the standard deviation.
The standard deviation is roughly the average distance of the observations from their mean.

12
Calculating the standard deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Translation: find the difference between the mean and each value in the dataset, square each difference, add these up, divide by the total number of values minus 1, then take the square root of that (or get R to do it for you).
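
The steps in the translation can be followed one at a time in R and checked against the built-in sd(), which also divides by n minus 1. The vector `x` is hypothetical illustration data.

```r
# Hypothetical data (illustration only, not from the slides)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Step 1: difference between each value and the mean
deviations <- x - mean(x)

# Step 2: square each difference and add them up
sum_sq <- sum(deviations^2)

# Step 3: divide by the number of values minus 1, then take the square root
manual_sd <- sqrt(sum_sq / (length(x) - 1))

# Or get R to do it for you
builtin_sd <- sd(x)

print(manual_sd)
print(builtin_sd)
```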

13
And we care because? The Empirical Rule
For any normal curve, approximately:
– 68% of the values fall within 1 standard deviation of the mean
– 95% of the values fall within 2 standard deviations of the mean
– 99.7% of the values fall within 3 standard deviations of the mean
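
The three percentages in the Empirical Rule can be checked against the standard normal distribution with pnorm(). This is a small sketch; `within_sd` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: probability that a normal variable falls within
# k standard deviations of its mean
within_sd <- function(k) pnorm(k) - pnorm(-k)

# Coverage for 1, 2 and 3 standard deviations
coverage <- sapply(1:3, within_sd)

# Expressed as percentages: close to 68%, 95% and 99.7%
print(round(coverage * 100, 1))
```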

14
Other things to describe
– How many modes?
– The range, minimum and maximum
This histogram shows a bimodal shape. The data have a minimum of 1.67 minutes and a maximum of 4.93 minutes, for a range of 3.26 minutes.
http://wps.aw.com/wps/media/objects/15/15719/projects/ch3_faithful/index.html
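
A sketch of how the minimum, maximum, and range might be obtained in R. It assumes R's built-in `faithful` data set (Old Faithful eruption durations) as a stand-in; that is not necessarily the same sample as the one behind the slide's histogram, so the numbers it reports may differ from those quoted above.

```r
# Built-in Old Faithful data, used here as an assumed stand-in
eruptions <- faithful$eruptions

# range() returns the minimum and the maximum together
min_max <- range(eruptions)

# The range as a single number is maximum minus minimum
spread <- diff(min_max)

print(min_max)
print(spread)

# A histogram shows how many modes (peaks) the variable has
hist(eruptions, breaks = 20,
     main = "Eruption duration", xlab = "Minutes")
```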

15
The five number summary
Minimum, lower quartile, median, upper quartile and maximum.
The visual representation of the five number summary is the box plot, also called the box and whiskers plot.
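
In R, the five number summary might be obtained with fivenum() or quantile(), and drawn with boxplot(). The `sleep_hours` vector below is hypothetical illustration data, not the data behind the slides.

```r
# Hypothetical hours of sleep (illustration only)
sleep_hours <- c(3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 16)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(sleep_hours)
print(five)

# quantile() returns the 0%, 25%, 50%, 75% and 100% points
quartiles <- quantile(sleep_hours)
print(quartiles)

# The box plot is the picture of these five numbers
boxplot(sleep_hours, horizontal = TRUE, xlab = "Hours of sleep")
```

The hinges from fivenum() and the quartiles from quantile() agree for this data, but they can differ slightly for other sample sizes because the two functions use different interpolation rules.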

16
Interpreting box plots
Outlier: any value more than 1.5 × the interquartile range (IQR) beyond the closest quartile, shown with stars.
– ¼ of students slept between 3 and 6 hours
– ¼ slept between 6 and 7 hours
– ¼ slept between 7 and 8 hours
– ¼ slept between 8 and 16 hours
Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc.
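
The 1.5 × IQR rule can be applied by hand in R, as sketched below. The `sleep_hours` vector is hypothetical data chosen to resemble the slide's description; it is not the original sample.

```r
# Hypothetical hours of sleep (illustration only)
sleep_hours <- c(3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 16)

# Lower and upper quartiles, and the interquartile range
q <- unname(quantile(sleep_hours, c(0.25, 0.75)))
iqr <- IQR(sleep_hours)

# Fences: 1.5 IQRs beyond the closest quartile
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
outliers <- sleep_hours[sleep_hours < lower_fence | sleep_hours > upper_fence]
print(outliers)

# boxplot.stats() applies the same kind of rule (based on hinges)
print(boxplot.stats(sleep_hours)$out)
```

Note that R's boxplot() draws outliers as open circles by default rather than the stars shown in the slide's figure.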

17
Other ways to visualize data
When developing a visual representation of a single variable, the most common tools are histograms, pie charts, bar charts, box plots, and stem and leaf plots.
We’ve already seen a histogram and a box plot.

18
Pie charts
Excellent for categorical variables with 5 or fewer categories.

19
Bar charts
Can be used to illustrate categories, or means and medians by category.

20
How to produce these in R
– summary() to get the mean, median, first quartile, third quartile, minimum, and maximum
– table() to get frequency counts
– prop.table() to get proportions (multiply by 100 for percentages)
– pie(), barplot(), hist(), and boxplot() to get pie charts, bar plots, histograms, and box plots, respectively
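
The functions listed on this slide can be tried together on a small data frame. This is a minimal sketch: the `survey` data frame and its columns `hours` and `major` are made up for illustration and do not come from the presentation.

```r
# Hypothetical survey data (illustration only)
survey <- data.frame(
  hours = c(6, 7, 8, 5, 7, 9, 6, 8),
  major = factor(c("Arts", "Science", "Arts", "Science",
                   "Business", "Arts", "Science", "Business"))
)

# Numeric variable: minimum, quartiles, median, mean, maximum
print(summary(survey$hours))

# Categorical variable: frequency counts and proportions
counts <- table(survey$major)
props  <- prop.table(counts)
print(counts)
print(round(props * 100, 1))   # proportions converted to percentages

# Plots for the categorical variable
pie(counts, main = "Majors")
barplot(counts, main = "Majors", ylab = "Count")

# Plots for the numeric variable
hist(survey$hours, main = "Hours of sleep", xlab = "Hours")
boxplot(survey$hours, main = "Hours of sleep")

# Bar chart of a mean by category
mean_hours <- tapply(survey$hours, survey$major, mean)
barplot(mean_hours, main = "Mean hours by major", ylab = "Hours")
```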
