Describing Quantitative Variables Presentation 3.


1 Describing Quantitative Variables Presentation 3

2 What is a quantitative variable? Quantitative variables are recorded as numerical values. They are measurements or counts taken on each unit in the sample. Consider the following examples:
1. Age of a person.
2. Number of times a person sees a dentist in a year.
3. Weight of a dog.
4. Number of credits a student takes in a semester.
Note: Quantitative variables can be either continuous or discrete. Continuous variables can take on any numerical value in a range. Discrete variables can take on only separate, fixed values, such as counts (0, 1, 2, ...).

3 What is not a quantitative variable? Numbers that represent categories are NOT quantitative variables. Your Social Security number (SSN), for example, is a label, not a measurement.
Helpful Hint: To decide whether something is a quantitative variable, ask whether an average of the variable is meaningful. The average height in a sample would certainly be of interest; the average SSN would not.

4 The Four Features of Quantitative Data
Location: What is the center or average value?
Spread: What is the spread or variability of the values? Do they fall closely around the center or far apart?
Shape: What is the shape of the data? Bell-shaped or skewed? Symmetric?
Outliers: Are there any extreme or unusual observations?

5 Tools to Describe Quantitative Data
Five Number Summary: A table of the minimum, first quartile, median, third quartile, and maximum of a sample. Used to describe both the center and the spread of the values.
Graphs: Dotplots, histograms, and boxplots are useful to illustrate the location, spread, and shape of data, as well as to identify outliers.
Numerical Summaries: All five members of the five number summary, plus the sample mean and standard deviation.

6 How to Construct a Five Number Summary: Finding the Median
Consider the following data set: exercise hours per week for 11 men. First, sort the values in increasing order.
Data: 0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25
The median is the middle value in the data, such that half the observations are greater and half are less. It is the middle value for an odd number of observations, or the average of the two middle values for an even number of observations.
In this case there are 11 observations, so the median is the middle, or 6th, value, which is 7.
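The median rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `median` is our own (Python's standard `statistics.median` follows the same odd/even rule).

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    data = sorted(values)                    # sort in increasing order first
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # odd n: the single middle value
    return (data[mid - 1] + data[mid]) / 2   # even n: average of the middle two


exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
print(median(exercise_hours))  # 7
```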

7 How to Construct a Five Number Summary: Finding the Quartiles
The median divides the data into halves, and the quartiles further divide the data into quarters. The first quartile (Q1) is the median of the lower half of the data, and the third quartile (Q3) is the median of the upper half.
Data: 0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25
Lower half: 0, 1, 1, 2, 5    Upper half: 8, 10, 11, 14, 25
Because there is an odd number of observations, the median value 7 is left out of both halves.
Q1 = 1    Q3 = 11
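A minimal Python sketch of the "median of each half" quartile rule above, assuming the slide's convention of leaving the median out of both halves when n is odd. The names `quartiles` and `_median` are hypothetical helpers. Other software (for example `statistics.quantiles` or spreadsheet functions) may use a different convention and give slightly different quartiles.

```python
def _median(sorted_data):
    """Median of a list that is already sorted."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]               # values below the median position
    upper = data[half + n % 2:]       # skip the median itself when n is odd
    return _median(lower), _median(upper)


exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
print(quartiles(exercise_hours))  # (1, 11)
```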

8 How to Construct a Five Number Summary: Min and Max
The last part of the five number summary is the minimum and maximum. It is easy to see that the minimum is 0 and the maximum is 25. The summary is usually displayed in a table as follows:
Five Number Summary: Outline
        Median
  Q1            Q3
  Min           Max
Five Number Summary: Our Example
        Median = 7
  Q1 = 1        Q3 = 11
  Min = 0       Max = 25

9 Interpreting the Five Number Summary
[Table: first rows of the "handheight" data set from the text CD, with columns Sex, Height, and Hand Span.]
[Figure: five number summary (FNS) of hand span for 89 women; Min, Q1, Median, Q3, and Max divide the sample into four parts of about 25% each.]

10 Interpreting the Five Number Summary
50% of the sample falls below the median, and 50% of the sample falls above the median.
50% of the sample falls between Q1 and Q3.
25% of the sample falls below Q__.
25% of the sample falls above Q__.
75% of the sample falls below Q__.
75% of the sample falls above Q__.

11 Example: What is the five number summary for this data?
Number of hours spent on the internet per week: 12, 4, 16, 18, 1, 6, 10, 8
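To check a hand calculation, here is a minimal Python sketch that assembles the whole five number summary using the median and median-of-halves quartile rules from slides 6 and 7. The function names are hypothetical; this data set has an even number of observations, so both halves contain exactly four values.

```python
def _median(sorted_data):
    """Median of a list that is already sorted."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def five_number_summary(values):
    """Min, Q1, Median, Q3, Max using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]               # lower half
    upper = data[half + n % 2:]       # upper half (median excluded when n is odd)
    return {
        "Min": data[0],
        "Q1": _median(lower),
        "Median": _median(data),
        "Q3": _median(upper),
        "Max": data[-1],
    }


internet_hours = [12, 4, 16, 18, 1, 6, 10, 8]
for name, value in five_number_summary(internet_hours).items():
    print(f"{name}: {value}")
```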

12 Graphs for Quantitative Variables
There are 4 main graphs for quantitative variables:
1. Stem-and-Leaf Plot
2. Dotplot
3. Histogram
4. Boxplot
Stem-and-leaf plots and dotplots show individual data points and are okay for small data sets. Histograms and boxplots are better for large data sets and are the most commonly used.

13 Example of a Stem-and-Leaf Plot and a Dotplot Using Hand Span Data for Women
[Figure: software output "Stem-and-leaf of Handspan" (Hand Span, Female) and a dotplot of the same data.]

14 Creating a Stem-and-Leaf Plot
1. Determine the stem values: all but the last of the displayed digits of a number. It is reasonable to have between 6 and 15 stems defining equally spaced intervals.
2. Attach a "leaf" for each individual to the appropriate stem. The leaf is the last displayed digit of the number.
3. At each stem value, put the leaves in increasing order.

15 Example: Create stem-and-leaf plots for the following samples:
(a) 75, 84, 68, 95, 87, 93, 56, 87, 83, 82, 80, 62, 91, 84
One line per stem:
5 | 6
6 | 2 8
7 | 5
8 | 0 2 3 4 4 7 7
9 | 1 3 5
OR, with each stem split into two lines (leaves 0-4, then leaves 5-9):
5 | 6
6 | 2
6 | 8
7 |
7 | 5
8 | 0 2 3 4 4
8 | 7 7
9 | 1 3
9 | 5
(b)
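The three construction steps from slide 14 can be sketched in Python as follows. This assumes non-negative whole numbers, with the stem being all digits but the last and the leaf the last digit; `stem_and_leaf` and its `split` option are hypothetical names. Empty rows in the middle are kept so the stems still define equally spaced intervals.

```python
from collections import defaultdict


def stem_and_leaf(values, split=False):
    """Lines of a stem-and-leaf plot for non-negative whole numbers.

    Stem = all digits but the last, leaf = last digit. With split=True each
    stem gets two lines: leaves 0-4 first, then leaves 5-9.
    """
    groups = defaultdict(list)
    for v in sorted(values):                  # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)
        upper = split and leaf >= 5           # second line of a split stem?
        groups[(stem, upper)].append(leaf)

    parts = (False, True) if split else (False,)
    first, last = min(values) // 10, max(values) // 10
    rows = [(stem, p) for stem in range(first, last + 1) for p in parts]

    # Trim empty rows at either end; keep empty rows in the middle.
    while not groups.get(rows[0]):
        rows.pop(0)
    while not groups.get(rows[-1]):
        rows.pop()

    lines = []
    for stem, p in rows:
        leaves = " ".join(str(leaf) for leaf in groups.get((stem, p), []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


sample_a = [75, 84, 68, 95, 87, 93, 56, 87, 83, 82, 80, 62, 91, 84]
print("\n".join(stem_and_leaf(sample_a)))
print()
print("\n".join(stem_and_leaf(sample_a, split=True)))
```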

16 Histogram
Horizontal axis: Determine equally spaced intervals to divide the data (5 to 15 intervals).
Vertical axis: Frequencies or relative frequencies (percentages).
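The counts behind a histogram's bars can be sketched in Python as below. This is an illustration only: `frequency_table` is a hypothetical helper, and the interval convention (left end included, right end excluded, except that the last interval also includes its right end) is our assumption, since conventions vary between textbooks and software. It uses sample (a) from slide 15 with five intervals of width 10.

```python
def frequency_table(values, start, width, bins):
    """Frequencies and relative frequencies for equally spaced intervals.

    Intervals are [start, start + width), [start + width, start + 2*width), ...
    The last interval also includes its right end so the maximum is counted.
    """
    end = start + width * bins
    counts = [0] * bins
    for v in values:
        if not start <= v <= end:
            raise ValueError(f"{v} is outside [{start}, {end}]")
        i = min(int((v - start) // width), bins - 1)   # interval index for v
        counts[i] += 1
    n = len(values)
    return [(start + i * width, start + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]


sample_a = [75, 84, 68, 95, 87, 93, 56, 87, 83, 82, 80, 62, 91, 84]
for lo, hi, count, rel in frequency_table(sample_a, start=50, width=10, bins=5):
    print(f"{lo} to under {hi}: {count:2d}  ({rel:.0%})")
```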

17 How to Draw a Boxplot
Step 1: Label either a vertical axis or a horizontal axis with numbers from the min to the max of the data.
Step 2: Draw a box with its lower end at Q1 and its upper end at Q3.
Step 3: Draw a line through the box at the median.
Step 4: Draw a line from the Q1 end of the box to the smallest data value that is not further than 1.5 × (Q3 - Q1) from Q1. Draw a line from the Q3 end of the box to the largest data value that is not further than 1.5 × (Q3 - Q1) from Q3.
Step 5: Mark data points further than 1.5 × IQR from either edge of the box with an asterisk. Points represented with asterisks are considered to be "outliers".
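Steps 4 and 5 can be sketched in Python as follows. `boxplot_parts` and its return keys are hypothetical; Q1 and Q3 are passed in rather than computed, so any quartile convention can be used. The usage applies it to the exercise-hours data with Q1 = 1 and Q3 = 11 from slide 7.

```python
def boxplot_parts(values, q1, q3):
    """Fences, whisker ends, and outliers for a boxplot (Steps 4 and 5)."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Values no further than 1.5 * IQR from the box can be reached by a whisker.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "fences": (low_fence, high_fence),
        "lower_whisker_end": min(inside),
        "upper_whisker_end": max(inside),
        # Points beyond the fences are drawn individually with asterisks.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
print(boxplot_parts(exercise_hours, q1=1, q3=11))
```

Here the fences are -14 and 26, so the whiskers reach the min (0) and the max (25) and no point is flagged as an outlier.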

18 Boxplot
[Figure: boxplot with Min, Q1, Median, Q3, and Max labeled.]
NOTE: Min = 16 is greater than Q1 - 1.5(Q3 - Q1) = 18.5 - 1.5(2) = 15.5, so the lower whisker stops at the min. Max = 23 is less than Q3 + 1.5(Q3 - Q1) = 20.5 + 1.5(2) = 23.5, so the upper whisker stops at the max.

19 Shape of Data
We can use graphs to look at the shape of a quantitative variable's distribution. Below is an example of a bell-shaped, or normal, distribution, a shape that appears often in nature:
[Figure: example of a bell-shaped distribution.]

20 Skewed Distributions
Example: Exam Scores
[Figures: scores from an easy exam, skewed left; scores from a hard exam, skewed right.]
Skewed data often occur when the variable is naturally bounded in some way and a great many units fall close to the boundary. For example, the number of pets a person owns cannot be less than 0; when many people own 0 or 1 pet, the distribution is skewed right.

21 Numerical Summaries: Location
Median: The middle value, such that half the observations are greater and half are less.
Mean: The average value in the data set. The mean equals the sum of all observations divided by the number of observations: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Symbol: $\bar{x}$ = sample mean.
If the distribution is symmetric, the mean equals the median (for sample data, the two are close).
If the data are right skewed, the mean is ___________ than the median.
If the data are left skewed, the mean is ___________ than the median.
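To see how the mean and median compare in practice, this short sketch uses Python's standard `statistics` module on the exercise-hours data from slide 6, whose largest value (25) sits far above the rest. It is an illustration, not part of the original slides.

```python
from statistics import mean, median

exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
x_bar = mean(exercise_hours)   # sum of observations / number of observations = 84 / 11
m = median(exercise_hours)
print(f"mean = {x_bar:.2f}, median = {m}")
```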

22 Numerical Summaries: Spread
Range: The distance between the most extreme values in the data set. Range = Maximum - Minimum.
Interquartile Range (IQR): The distance between the first and third quartiles. IQR = Q3 - Q1.
Standard Deviation: Roughly, the average distance a value falls from the mean. Symbol: s = sample standard deviation.
The square of the standard deviation is called the variance of the sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$
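The three spread measures can be computed together with a minimal Python sketch. `spread_summary` is a hypothetical helper; the IQR uses the median-of-halves quartiles from slide 7, and `statistics.stdev` divides by n - 1, matching the sample formula above.

```python
from statistics import stdev


def _median(sorted_data):
    """Median of a list that is already sorted."""
    n = len(sorted_data)
    mid = n // 2
    return sorted_data[mid] if n % 2 else (sorted_data[mid - 1] + sorted_data[mid]) / 2


def spread_summary(values):
    """Range, IQR (median-of-halves quartiles), and sample standard deviation."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = _median(data[:half])
    q3 = _median(data[half + n % 2:])   # skip the median itself when n is odd
    return {
        "range": data[-1] - data[0],
        "IQR": q3 - q1,
        "s": stdev(data),               # sample SD: divides by n - 1
    }


exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
print(spread_summary(exercise_hours))
```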

23 Example: Calculate the Variance by Hand
Suppose we ask 5 people how many high school friends they have and plot their responses below. What is the sample variance?
[Figure: plot of the 5 responses.]
1. Find the difference between each data point and the mean. ______, ______, ______, ______, ______
2. Square the differences, and add them up. ______ + ______ + ______ + ______ + ______ = _______
3. Divide by one less than the number of data points to get the variance. variance = _______ / ________ = _________
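Because the five responses are only shown in the plot, this minimal Python sketch applies the same three steps to the internet-hours data from slide 11 instead. `variance_by_hand` is a hypothetical name; the result is checked against the standard `statistics.variance`, which also divides by n - 1.

```python
from statistics import mean, variance


def variance_by_hand(values):
    """Sample variance following the three steps on the slide."""
    x_bar = mean(values)
    deviations = [x - x_bar for x in values]    # step 1: difference from the mean
    total = sum(d * d for d in deviations)      # step 2: square and add up
    s2 = total / (len(values) - 1)              # step 3: divide by n - 1
    return deviations, total, s2


internet_hours = [12, 4, 16, 18, 1, 6, 10, 8]
deviations, total, s2 = variance_by_hand(internet_hours)
print("deviations:", deviations)
print("sum of squared deviations:", total)
print("variance:", round(s2, 3), "| statistics.variance:", round(variance(internet_hours), 3))
```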

24 Outliers
Definition: An outlier is a data point that is not consistent with the bulk of the data.
Possible reasons for outliers:
1. An error was made while taking the measurement or entering it into the computer.
2. The individual belongs to a different group than the bulk of individuals measured.
3. The outlier is a legitimate, though extreme, data value.

25 Identifying Outliers
Graphs are one of the best methods for identifying outliers. In the boxplot below, the outlying observation is indicated by an asterisk.
[Figure: boxplot with one outlier marked by an asterisk.]
Boxplot Outlier Rule: Any observation that lies more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3 is considered an outlier and receives an asterisk.

26 Resistant Statistics  Resistant statistics are those that are “resistant” to the influence of outliers. Resistant: Median, IQR Non-Resistant: Mean, Std. Deviation, and Range
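A quick way to see resistance in action is to change one extreme value and compare the statistics. In this sketch the change is hypothetical: the largest exercise-hours value from slide 6 is "mistyped" as 250 instead of 25, as in reason 1 on slide 24. It uses only Python's standard `statistics` module.

```python
from statistics import mean, median, stdev

exercise_hours = [0, 1, 1, 2, 5, 7, 8, 10, 11, 14, 25]
# Hypothetical data-entry error: the largest value recorded as 250 instead of 25.
with_typo = exercise_hours[:-1] + [250]

for label, data in [("original", exercise_hours), ("with typo", with_typo)]:
    rng = max(data) - min(data)
    print(f"{label:>9}: mean = {mean(data):6.2f}  median = {median(data)}  "
          f"range = {rng}  s = {stdev(data):6.2f}")
```

The median stays at 7 (and the quartiles 1 and 11 are unchanged), while the mean, standard deviation, and range all jump.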

27 The most appropriate measure of variability depends on the shape of the data's distribution.
If data are symmetric, with no serious outliers, use the range and standard deviation.
If data are skewed and/or have serious outliers, use the IQR.

28 The Empirical Rule
The Empirical Rule states that for any bell-shaped curve, approximately:
68% of the values fall within 1 standard deviation of the mean in either direction (i.e., plus or minus s).
95% of the values fall within 2 standard deviations of the mean in either direction.
99.7% of the values fall within 3 standard deviations of the mean in either direction.
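One way to check the Empirical Rule is by simulation. This sketch draws hypothetical bell-shaped data (20,000 values from a normal distribution with mean 100 and standard deviation 15, parameters chosen only for illustration) and counts the share of values within 1, 2, and 3 sample standard deviations of the sample mean. It uses `random.gauss` and `statistics.fmean` (Python 3.8+).

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is reproducible
# Hypothetical bell-shaped data, not taken from the slides.
data = [random.gauss(100, 15) for _ in range(20_000)]

x_bar, s = fmean(data), stdev(data)
shares = {}
for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    lo, hi = x_bar - k * s, x_bar + k * s
    shares[k] = sum(lo <= x <= hi for x in data) / len(data)
    print(f"within {k} s of the mean: {shares[k]:.3f}  (rule: about {rule})")
```

For skewed data the same counts can differ noticeably from 68/95/99.7, which is why the rule is stated only for bell-shaped distributions.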

