Agresti/Franklin Statistics, 1 of 63 Section 2.4 How Can We Describe the Spread of Quantitative Data?
Agresti/Franklin Statistics, 2 of 63 Measuring Spread: Range Range: difference between the largest and smallest observations
Agresti/Franklin Statistics, 3 of 63 Measuring Spread: Standard Deviation Creates a measure of variation by summarizing the deviations of each observation from the mean: the squared deviations are combined into an adjusted average (dividing by n - 1 rather than n), and the square root of that average is the standard deviation
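The two measures of spread introduced so far can be computed directly. The sketch below is only an illustration: the data values are hypothetical (they do not come from the slides), and the "adjusted average" is assumed to mean the usual sample formula with divisor n - 1.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
s = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(data_range)
print(round(s, 3))
# Cross-check against the standard library's sample standard deviation.
print(math.isclose(s, statistics.stdev(data)))
```

The range uses only the two extreme observations, while the standard deviation uses every observation's distance from the mean.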
Agresti/Franklin Statistics, 4 of 63 Empirical Rule For bell-shaped data sets: approximately 68% of the observations fall within 1 standard deviation of the mean; approximately 95% fall within 2 standard deviations of the mean; all or nearly all (about 99.7%) fall within 3 standard deviations of the mean
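One way to see the empirical rule at work is to simulate bell-shaped data and count how many observations land within 1, 2, and 3 standard deviations of the mean. This is a rough sketch: the normal distribution, its mean of 100, its standard deviation of 15, and the sample size are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Simulated bell-shaped data: 10,000 draws from a normal distribution.
# The mean of 100 and standard deviation of 15 are arbitrary choices.
data = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(data)
s = statistics.stdev(data)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The printed proportions should land close to the rule's percentages. Data that are skewed or otherwise not bell-shaped can depart from them noticeably.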
Agresti/Franklin Statistics, 5 of 63 Parameter and Statistic A parameter is a numerical summary of the population A statistic is a numerical summary of a sample taken from a population
Agresti/Franklin Statistics, 6 of 63 Section 2.5 How Can Measures of Position Describe Spread?
Agresti/Franklin Statistics, 7 of 63 Quartiles Split the data into four parts. The median is the second quartile, Q2. The first quartile, Q1, is the median of the lower half of the observations. The third quartile, Q3, is the median of the upper half of the observations.
Agresti/Franklin Statistics, 8 of 63 Example: Find the first and third quartiles. Prices per share of the 10 most actively traded stocks on the NYSE (rounded to the nearest $): 2 4 11 12 13 15 31 31 37 47. a. Q1 = 2, Q3 = 47 b. Q1 = 12, Q3 = 31 c. Q1 = 11, Q3 = 31 d. Q1 = 11.5, Q3 = 32
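The quartile rule from the previous slide can be applied to these ten prices in a few lines. The `quartiles` helper below is a hypothetical name, and leaving the middle value out of both halves when the number of observations is odd is an assumed convention (software packages differ on this point).

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median position
    upper = ordered[half + n % 2:]    # skips the middle value when n is odd
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
q1, q2, q3 = quartiles(prices)
print(q1, q2, q3)  # 11 14.0 31
```

Under this rule the lower half is 2, 4, 11, 12, 13 and the upper half is 15, 31, 31, 37, 47, so the result matches the choice with Q1 = 11 and Q3 = 31.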
Agresti/Franklin Statistics, 9 of 63 Measuring Spread: Interquartile Range The interquartile range is the distance between the third quartile and first quartile: IQR = Q3 – Q1
Agresti/Franklin Statistics, 10 of 63 Detecting Potential Outliers An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile
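As a worked example of the 1.5 x IQR criterion, the sketch below applies it to the ten stock prices from the earlier quartile example. The variable names are illustrative only.

```python
import statistics

prices = sorted([2, 4, 11, 12, 13, 15, 31, 31, 37, 47])
half = len(prices) // 2            # n is even here, so the halves split cleanly
q1 = statistics.median(prices[:half])
q3 = statistics.median(prices[half:])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

potential_outliers = [x for x in prices if x < lower_fence or x > upper_fence]
print(iqr, lower_fence, upper_fence)  # 20 -19.0 61.0
print(potential_outliers)             # []
```

None of the prices falls below -19 or above 61, so this data set has no potential outliers under the criterion.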
Agresti/Franklin Statistics, 11 of 63 The Five-Number Summary The five-number summary of a data set: minimum value, first quartile, median, third quartile, maximum value
Agresti/Franklin Statistics, 12 of 63 Boxplot A box is constructed from Q1 to Q3. A line is drawn inside the box at the median. A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier. A line extends outward from the upper end of the box to the largest observation that is not a potential outlier.
Agresti/Franklin Statistics, 13 of 63 Boxplot for Sodium Data
Sodium Data: 0 70 125 125 140 150 170 170 180 200 200 210 210 220 220 230 250 260 290 290
Five-Number Summary: Min: 0, Q1: 145, Med: 200, Q3: 225, Max: 290
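The five-number summary and the boxplot whiskers for the sodium data can be reproduced as follows. This sketch assumes the slide's two data columns together list 20 observations, and it uses the median-of-halves rule for the quartiles.

```python
import statistics

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

ordered = sorted(sodium)
half = len(ordered) // 2           # 20 observations, so two halves of 10
q1 = statistics.median(ordered[:half])
med = statistics.median(ordered)
q3 = statistics.median(ordered[half:])
five_number = (ordered[0], q1, med, q3, ordered[-1])
print(five_number)  # (0, 145.0, 200.0, 225.0, 290)

# Potential outliers lie more than 1.5 x IQR outside the quartiles.
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low or x > high]

# Whiskers stop at the most extreme observations that are not outliers.
inside = [x for x in ordered if low <= x <= high]
print(outliers, inside[0], inside[-1])  # [0] 70 290
```

With an IQR of 80, the cutoffs are 25 and 345, so the observation 0 is a potential outlier. The lower whisker therefore ends at 70 rather than at the minimum.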
Agresti/Franklin Statistics, 15 of 63 Z-Score The z-score for an observation measures how far the observation is from the mean in standard deviation units: z = (observation - mean) / standard deviation. An observation in a bell-shaped distribution is a potential outlier if its z-score is less than -3 or greater than +3
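A short sketch of the z-score calculation follows. The mean of 100, the standard deviation of 15, and the three observations are hypothetical values, and the flagging rule assumes the cutoff is a z-score beyond 3 in either direction.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical bell-shaped example: mean 100, standard deviation 15.
mean, sd = 100, 15
for x in (85, 100, 151):
    z = z_score(x, mean, sd)
    flag = "potential outlier" if abs(z) > 3 else "not flagged"
    print(x, round(z, 2), flag)
```

An observation of 85 sits exactly one standard deviation below the mean, while 151 sits 3.4 standard deviations above it and is flagged.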
Agresti/Franklin Statistics, 16 of 63 Chapter 3 Association: Contingency, Correlation, and Regression Learn how to examine links between two variables
Agresti/Franklin Statistics, 17 of 63 Variables Response variable: the outcome variable. Explanatory variable: the variable that explains the outcome variable.
Agresti/Franklin Statistics, 18 of 63 Association An association exists between the two variables if a particular value for one variable is more likely to occur with certain values of the other variable
Agresti/Franklin Statistics, 19 of 63 Section 3.1 How Can We Explore the Association Between Two Categorical Variables?
Agresti/Franklin Statistics, 20 of 63 Example: Food Type and Pesticide Status
Agresti/Franklin Statistics, 21 of 63 Example: Food Type and Pesticide Status What is the response variable? What is the explanatory variable?
Food Type      Pesticides: Yes   Pesticides: No
Organic                     29               98
Conventional             19485             7086
Agresti/Franklin Statistics, 22 of 63 Example: Food Type and Pesticide Status What proportion of organic foods contain pesticides? What proportion of conventionally grown foods contain pesticides?
Food Type      Pesticides: Yes   Pesticides: No
Organic                     29               98
Conventional             19485             7086
Agresti/Franklin Statistics, 23 of 63 Example: Food Type and Pesticide Status What proportion of all sampled items contain pesticide residues?
Food Type      Pesticides: Yes   Pesticides: No
Organic                     29               98
Conventional             19485             7086
Agresti/Franklin Statistics, 24 of 63 Contingency Table The Food Type and Pesticide Status table is called a contingency table. A contingency table displays 2 categorical variables: the rows list the categories of one variable, the columns list the categories of the other variable, and the entries in the table are frequencies.
Agresti/Franklin Statistics, 25 of 63 Example: Food Type and Pesticide Status Contingency Table Showing Conditional Proportions
Agresti/Franklin Statistics, 26 of 63 Example: Food Type and Pesticide Status What is the sum over each row? What proportion of organic foods contained pesticide residues? What proportion of conventional foods contained pesticide residues?
Food Type      Pesticides: Yes   Pesticides: No
Organic                   0.23             0.77
Conventional              0.73             0.27
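The conditional proportions in this table follow from the counts in the earlier contingency table by dividing each count by its row total. A minimal sketch, with illustrative variable names:

```python
# Counts from the Food Type and Pesticide Status contingency table.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}

# Conditional proportions: divide each count by its row total.
conditional = {}
for food_type, row in counts.items():
    total = sum(row.values())
    conditional[food_type] = {k: v / total for k, v in row.items()}
    print(food_type, {k: round(p, 2) for k, p in conditional[food_type].items()})

# Overall proportion of sampled items with pesticide residues.
total_yes = sum(row["Yes"] for row in counts.values())
grand_total = sum(sum(row.values()) for row in counts.values())
print(round(total_yes / grand_total, 2))
```

Each row of conditional proportions sums to 1. The overall proportion sits very close to the conventional row's value because conventional items make up nearly the whole sample (26,571 of 26,698).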
Agresti/Franklin Statistics, 27 of 63 Example: Food Type and Pesticide Status
Agresti/Franklin Statistics, 28 of 63 Example: For the following pair of variables, which is the response variable and which is the explanatory variable? College grade point average (GPA) and high school GPA. a. College GPA: response variable; high school GPA: explanatory variable b. College GPA: explanatory variable; high school GPA: response variable
Agresti/Franklin Statistics, 29 of 63 Section 3.2 How Can We Explore the Association Between Two Quantitative Variables?
Agresti/Franklin Statistics, 30 of 63 Scatterplot Graphical display of two quantitative variables: Horizontal Axis: Explanatory variable, x Vertical Axis: Response variable, y
Agresti/Franklin Statistics, 31 of 63 Example: Internet Usage and Gross Domestic Product (GDP)
Agresti/Franklin Statistics, 32 of 63 Positive Association Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y
Agresti/Franklin Statistics, 33 of 63 Negative Association Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y
Agresti/Franklin Statistics, 34 of 63 Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?
Agresti/Franklin Statistics, 35 of 63 Linear Correlation: r Measures the strength of the linear association between x and y. A positive r-value indicates a positive association; a negative r-value indicates a negative association. An r-value close to +1 or -1 indicates a strong linear association; an r-value close to 0 indicates a weak linear association.
Agresti/Franklin Statistics, 36 of 63 Calculating the correlation, r
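The formula on this slide did not survive in the transcript. One standard form of the correlation is r = (1 / (n - 1)) x (sum of z_x times z_y), where z_x and z_y are the z-scores of each observation's x and y values. The sketch below implements that form. The `correlation` helper is a hypothetical name, and the car ages and mileages are made-up values for illustration.

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation: average product of z-scores, with divisor n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data: car age in years and odometer reading in thousands of miles.
age = [1, 2, 3, 5, 8]
mileage = [12, 25, 33, 60, 95]
r = correlation(age, mileage)
print(round(r, 3))
```

Because the z-scores are unit-free, r does not change if mileage is recorded in miles instead of thousands of miles.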
Agresti/Franklin Statistics, 37 of 63 Example: 100 cars on the lot of a used-car dealership. Would you expect a positive association, a negative association, or no association between the age of the car and the mileage on the odometer? a. Positive association b. Negative association c. No association