AP Statistics Chapters 0 & 1 Review
Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of several groups or categories. A quantitative variable takes numeric values for which arithmetic operations make sense.
The distribution of a variable tells us what values the variable takes on and how often it takes on those values.
Statistical inference involves drawing conclusions about a large group, called the population, by gathering information from a smaller subgroup, called the sample.
The main statistical designs for producing data are surveys, experiments and observational studies. In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses. In an experiment, we deliberately do something to individuals in order to observe their response.
What two types of graphs are typically used for categorical variables? What two types of graphs are typically used for quantitative variables?
Please know: Cumulative frequency histogram Relative frequency histogram
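As a rough illustration of what those two histograms plot, the sketch below tabulates relative frequencies (count divided by the total) and cumulative relative frequencies (a running total of the relative frequencies). The data set `scores` and the helper name `frequency_table` are made up for this example and are not from the slides.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, count, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    keys = sorted(counts)
    relative = [counts[k] / n for k in keys]
    cumulative = list(accumulate(relative))  # running total of the relative frequencies
    return [(k, counts[k], r, c) for k, r, c in zip(keys, relative, cumulative)]


# Hypothetical data, chosen only for illustration.
scores = [1, 2, 2, 3, 3, 3, 4, 4]

for value, count, rel, cum in frequency_table(scores):
    print(f"{value}: count={count}, relative={rel:.3f}, cumulative={cum:.3f}")
```

The last cumulative relative frequency is always 1, since by then every observation has been counted.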
When you describe a distribution, pay special attention to the following. Shape: the overall pattern, symmetric or skewed. The lengths of the “tails” tell us whether a distribution is left-skewed (the left tail is longer) or right-skewed (the right tail is longer). Modes: the values that occur most often (i.e. peaks). Unimodal: one major peak. Bimodal: two major peaks.
Center: the middle. The two most common measures of center are the mean and the median. Spread: how varied (i.e. spread out) the data are. The IQR and standard deviation are probably the two most common measures of spread. Outliers: any value(s) that fall outside the overall pattern.
When you have to describe the shape of a distribution, don’t get mad, C U S S: Center, Unusual, Spread, Shape.
Measuring Center: The Mean & Median To calculate the mean, add the values of the observations and divide by the number of observations. The mean of a sample is denoted $\bar{x}$, pronounced x-bar. The mean of a population is denoted $\mu$, the Greek letter mu.
Measuring Center: The Median The median (denoted by M) is the midpoint of a distribution. To calculate the median: 1. Order the observations from smallest to largest. 2. If the number of observations is odd, the median is simply the middle value in the list. You can find the location by counting (n+1)/2 observations from the bottom (or top). 3. If the number of observations is even, you should average the two middle numbers. The location of the median is again (n+1)/2 from the bottom or top of the list, which now falls halfway between the two middle observations.
EXAMPLE: Consider the following set of numbers… 13, 25, 28, 36, 47 M= _______=________ Now, consider adding a 6th number, say 104. M= _______=________ We say that the median is an outlier-resistant measure of center, while the mean is not.
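The example can be checked with a short sketch that follows the mean and median rules described on the previous slides. The function names `mean` and `median` are written here for illustration (Python's standard `statistics` module offers equivalents).

```python
def mean(values):
    """Add the observations and divide by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered list, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


data = [13, 25, 28, 36, 47]
with_outlier = data + [104]

print("median:", median(data), "->", median(with_outlier))  # 28 -> 32.0
print("mean:  ", mean(data), "->", mean(with_outlier))
```

Adding 104 moves the median from 28 to 32, while the mean jumps from 29.8 to about 42.2, which is what "resistant" means in practice.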
Mean versus Median The mean and median of a roughly symmetrical distribution will be close together. If the distribution is exactly symmetric, the mean and median are equal. In a skewed distribution, the mean is farther out in the long tail than the median. In a skewed distribution, the median is the more accurate measure of center. In descriptions of data, the “average” value of a variable is usually referred to as the mean whereas the “typical” value is usually referred to as the median.
Measuring Spread: The Quartiles One way to measure spread, or variability, is to calculate the range, which is the difference between the largest and smallest observations. Another way to describe the spread of a distribution is by considering different percentiles. The pth percentile of a distribution is the value that has p percent of the observations at or below it. The median is the 50th percentile. The 25th percentile is called the 1st quartile while the 75th percentile is called the 3rd quartile.
The Five-Number Summary and Boxplots The five-number summary of a set of observations consists of the smallest value, the 1st quartile, the median, the 3rd quartile and the largest value. The five-number summary can be presented visually by a boxplot.
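A minimal sketch of computing a five-number summary follows. Quartile conventions vary between textbooks and software; this version assumes the common classroom rule that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of observations is odd. The function names are illustrative, and the sketch expects at least two observations.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the overall median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary([13, 25, 28, 36, 47, 104]))  # (13, 25, 32.0, 47, 104)
```

Statistical software may report slightly different quartiles for the same data because it interpolates between observations.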
The 1.5 x IQR Rule for Outliers The distance between the 1st and 3rd quartiles is called the interquartile range, abbreviated IQR: IQR = Q3 − Q1. The quartiles and IQR are resistant to changes in either tail of a distribution. Note: since the median and the IQR are resistant to outliers, they should be used when describing a skewed distribution.
We will call a data value a “suspected” outlier if it falls more than 1.5 x IQR above Q3 or below Q1. In a modified boxplot, the whiskers extend only to values not “flagged” as outliers and asterisks are used to denote any outliers.
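The rule can be sketched as follows, again assuming the median-of-halves convention for quartiles (other quartile conventions can flag slightly different values). The name `suspected_outliers` is made up for this example.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def suspected_outliers(values):
    """Flag values more than 1.5 x IQR below Q1 or above Q3."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


# Q1 = 25, Q3 = 47, IQR = 22, so the fences are -8 and 80.
print(suspected_outliers([13, 25, 28, 36, 47, 104]))  # [104]
```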
Measuring Spread: The Standard Deviation The standard deviation measures spread by determining how far each value is from the mean and then “averaging” these distances. The standard deviation of a sample is denoted by s. The standard deviation of a population is denoted $\sigma$, the Greek letter sigma.
The following formula is used to compute the standard deviation of a sample: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. The variance of a set of observations, $s^2$, is simply the square of the standard deviation.
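A minimal sketch of the sample calculation follows: square each deviation from the mean, add the squares, divide by n − 1 to get the variance, then take the square root. The function names are illustrative; Python's `statistics.stdev` uses the same n − 1 definition.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """The sample standard deviation s is the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [13, 25, 28, 36, 47]
print("variance:", sample_variance(data))  # approximately 160.7
print("s:", sample_std(data))
```

For this data the squared deviations add up to 642.8, so the variance is 642.8 / 4 = 160.7.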
Properties of the Standard Deviation 1. s measures spread about the mean and should be used only when the mean is used as the measure of center. 2. s = 0 only when there is no spread/variability (i.e. all the values are the same). Otherwise, s > 0. As the observations become more spread out about their mean, s gets greater. 3. s, like the mean, is not resistant to outliers. A few outliers can make s very large. Distributions with outliers and strongly skewed distributions can have very large standard deviations. As such, the number s does not give much helpful information about such distributions.
Choosing Measures of Center and Spread The five-number summary, in particular the median and the IQR, is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use $\bar{x}$ and s only for reasonably symmetric distributions that are free of outliers.
Adding the same number, a, to each observation adds a to the measures of center (mean and median) but does not affect the measures of spread. Multiplying each observation by the same number, b, multiplies the measures of center by b and the measures of spread (range, IQR, standard deviation) by |b|.
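These two rules can be checked numerically with a short sketch. The data and the constants `a = 10` and `b = 3` are arbitrary choices for illustration, and `center_and_spread` is a made-up helper built on Python's standard `statistics` module.

```python
import statistics


def center_and_spread(values):
    """Return (mean, sample standard deviation) for a list of observations."""
    return statistics.mean(values), statistics.stdev(values)


data = [13, 25, 28, 36, 47]
a, b = 10, 3  # arbitrary constants for illustration

shifted = [x + a for x in data]  # add a to every observation
scaled = [x * b for x in data]   # multiply every observation by b

print("original:", center_and_spread(data))
print("shifted: ", center_and_spread(shifted))  # mean goes up by a, s is unchanged
print("scaled:  ", center_and_spread(scaled))   # mean and s are both multiplied by b
```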