# Descriptive Statistics Stoney Pryor

AP Statistics Review Descriptive Statistics Stoney Pryor

Stoney Pryor

Husband for 17 years, father of 3
Teacher in CSISD for 19 years
Taught AP Statistics since 1998
About 110 in 4 sections two years, only 6 sophomores last year, 14 students this year
Varsity football coach for 16 years, including 5 years as offensive coordinator
Head girls soccer coach since 1999

Descriptive Statistics
Free response questions on this topic address constructing and interpreting graphical displays of data, summarizing distributions, and comparing distributions of univariate data along with exploring categorical data. All responses must be in context.

When creating a graph, be sure to title, label, and scale the horizontal and vertical axes (if appropriate). If the graph includes multiple data sets, label each plot.

C.U.S.S. the data: Always comment on center, shape, and spread when asked to provide information about a distribution based on a graph. If there are unusual features such as outliers, clusters, or gaps, comment on these. When describing the shape of a mound-shaped, approximately symmetric distribution, do not say that the distribution is normal based only on a mound-shaped, symmetric graph. Stating that the distribution is approximately normal is acceptable.

Knowing that the mean and median are unequal does not mean that the shape of the distribution is skewed.

When asked to compare two distributions based on graphs, compare and describe the center, shape, and spread. Use comparative language such as larger, higher, less variable. C.U.S.S. them both.

Multiple Choice

1. A small town employs 34 salaried, nonunion employees. Each employee receives an annual salary increase of between \$500 and \$2000 based on a performance review by the mayor’s staff. Some employees are members of the mayor’s political party, and the rest are not. Students at the local high school form two lists, A and B, one for the raises granted to employees who are in the mayor’s party, and the other for raises granted to employees who are not. They want to display a graph (or graphs) of the salary increases in the student newspaper that readers can use to judge whether the two groups of employees have been treated in a reasonably equitable manner.

1. Which of the following displays is least likely to be useful to readers for this purpose?
A. Back-to-back stemplots of A and B
B. Scatterplot of B versus A
C. Parallel boxplots of A and B
D. Histograms of A and B that are drawn to the same scale
E. Dotplots of A and B that are drawn to the same scale

Really, these are two lists of univariate data, so a univariate graph is most appropriate. That makes the scatterplot (B) the least useful display.

2. The figure above shows a cumulative relative frequency histogram of 40 scores on a test given in an AP Statistics class. Which of the following conclusions can be made from the graph?

A. There is greater variability in the lower 20 test scores than in the higher 20 test scores.
B. The median test score is less than 50. (No, it is about 80.)
C. Sixty percent of the students had test scores above 80. (No, about half did.)
D. If the passing score is 70, most students did not pass the test. (No, nearly 75% did.)
E. The horizontal nature of the graph for test scores of 60 and below indicates that those scores occurred most frequently. (Huh? Backwards.)

3. The stemplot below shows the yearly earnings per share of stock for two different companies over a sixteen-year period. Which of the following statements is true?
A. The median of the earnings of Company A is less than the median of the earnings of Company B.
B. The range of the earnings of Company A is less than the range of the earnings of Company B.
C. The third quartile of Company A is smaller than the third quartile of Company B.
D. The mean of the earnings of Company A is greater than the mean of the earnings of Company B.
E. The interquartile range of Company A is twice the interquartile range of Company B.

4. The following dotplot shows the Vitamin A content, in IUs or International Units, for 36 well known fruits. Which of the following statements is true?
A. Since the data is skewed right, the mean is less than the median.
B. Since the data is skewed right, the median is less than the mean.
C. Since the data is skewed left, the mean is less than the median.
D. Since the data is skewed left, the median is less than the mean.
E. Since the data is skewed right, the mean and median are both greater than 1200 IUs.

5. The boxplots shown summarize two data sets, I and II. Based on the boxplots, which of the following statements about these two data sets CANNOT be justified?
A. The range of data set I is equal to the range of data set II. (About the same length.)
B. The interquartile range of data set I is equal to the interquartile range of data set II. (The boxes are about the same size.)
C. The median of data set I is less than the median of data set II. (Yep. Middle line.)
D. Data set I and data set II have the same number of data points. (Whoa!)
E. About 75% of the values in data set II are greater than or equal to about 50% of the values in data set I. (Yes.)

6. A random sample of the weights of 50 bears in a national park is taken, generating the following summary statistics. What can be said about the number of outliers for this data set?
A. 0
B. At least 1
C. No more than 1
D. At least 2
E. No more than 2

7. The histogram below displays the times, in minutes, needed for each chimpanzee in a sample of 26 to complete a simple navigational task. It was determined that the largest observation, 93, is an outlier since Q3 + 1.5(Q3 – Q1) = … . Which of the following boxplots could represent the information in the histogram?
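The outlier check mentioned in question 7 follows the usual 1.5 × IQR rule. The sketch below shows that rule in Python. The function names are made up for illustration, and the quartiles are hypothetical placeholders, not values from the chimpanzee data, which are not given here.

```python
def upper_fence(q1, q3):
    """Upper cutoff of the 1.5 x IQR rule: Q3 + 1.5(Q3 - Q1)."""
    return q3 + 1.5 * (q3 - q1)


def lower_fence(q1, q3):
    """Lower cutoff of the 1.5 x IQR rule: Q1 - 1.5(Q3 - Q1)."""
    return q1 - 1.5 * (q3 - q1)


def is_outlier(value, q1, q3):
    """True if the value lies beyond either fence."""
    return value < lower_fence(q1, q3) or value > upper_fence(q1, q3)


# Hypothetical quartiles, chosen only to illustrate the arithmetic.
q1, q3 = 40.0, 60.0
print(upper_fence(q1, q3))     # 90.0
print(is_outlier(93, q1, q3))  # True
```

In a boxplot that shows outliers, the whisker stops at the largest observation that is not beyond the fence, and any outlier is drawn as a separate point.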

8. The histograms below represent the distribution of five different data sets, each containing 28 integers, from 1 through 7, inclusive. The horizontal and vertical scales are the same for all graphs. Which graph represents the data set with the largest standard deviation?

Free Response

9. The Better Business Council of a large city has concluded that students in the city’s schools are not learning enough about economics to function in the modern world. These findings were based on test results from a random sample of 20 twelfth-grade students who completed a 46-question multiple-choice test on basic economic concepts. The data set below shows the number of questions that each of the 20 students in the sample answered correctly. 12, 16, 18, 17, 18, 33, 41, 44, 38, 35, 19, 36, 19, 13, 43, 8, 16, 14, 10, 9

(a) Display these data in a stemplot.
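One way to build the stemplot for part (a) is sketched below in Python, using the 20 scores from the question. Tens digits are the stems and ones digits are the leaves. The function name `stemplot` and the wording of the key are illustrative choices, not part of the original solution.

```python
from collections import defaultdict

# The 20 scores given in question 9.
scores = [12, 16, 18, 17, 18, 33, 41, 44, 38, 35,
          19, 36, 19, 13, 43, 8, 16, 14, 10, 9]


def stemplot(data):
    """Return stemplot rows: tens digit as stem, ones digits as leaves."""
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)
    rows = []
    # List every stem from smallest to largest so empty stems (gaps) stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


for row in stemplot(scores):
    print(row)
print("Key: 1 | 2 represents 12 questions correct")
# 0 | 8 9
# 1 | 0 2 3 4 6 6 7 8 8 9 9
# 2 |
# 3 | 3 5 6 8
# 4 | 1 3 4
```

Keeping the empty stem for the 20s matters: dropping it would hide the gap between the two groups.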

(b) Use your stemplot for part (a) to describe the main features of this score distribution.
The most striking feature of the plot is that the scores cluster into two groups, one concentrated in the mid-teens and the other in the high 30s (or one with relatively low scores on the exam and one with relatively high scores). There are no scores in the 20s.

(c) Why would it be misleading to report only a measure of center for this score distribution?
A measure of center might fall between the two groups (as does the mean of 22.95 here), where there are no data, and would not provide an accurate picture of student performance on the exam. It would not indicate that students tended to score either very well or very poorly on the exam.
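The point in part (c) can be checked numerically. The short Python sketch below computes the mean and median of the 20 scores and counts how many scores fall in the 20s, where the mean lands. The variable names are illustrative.

```python
from statistics import mean, median

# The 20 scores given in question 9.
scores = [12, 16, 18, 17, 18, 33, 41, 44, 38, 35,
          19, 36, 19, 13, 43, 8, 16, 14, 10, 9]

score_mean = mean(scores)      # 459 / 20
score_median = median(scores)  # average of the 10th and 11th sorted scores

# Scores that actually fall in the 20s, where the mean sits.
in_twenties = [s for s in scores if 20 <= s <= 29]

print(score_mean)        # 22.95
print(score_median)      # 18.0
print(len(in_twenties))  # 0
```

The mean of 22.95 describes a score that no student in the sample came close to, which is why a single measure of center misrepresents this distribution.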

Scoring: This question is scored in four sections: section 1 is part (a), and sections 2 to 4 consist of elements of parts (b) and (c). Section 1 is scored as either essentially correct (E) or incorrect (I). Section 1 is essentially correct (E) if in part (a) the student gives a correctly constructed stemplot. Any other type of plot is incorrect (I). Note: One or two misplaced or omitted leaves can still be considered essentially correct as long as the important features of the display are not altered.

Parts (b) and (c) are scored together in three sections, each of which is scored as essentially correct (E), partially correct (P), or incorrect (I). Section 2 is essentially correct (E) if in either part (b) or (c) the student clearly notices:
1. that there are two groups;
2. that there is a gap in the middle of the distribution;
3. the relative or specific positions of the two groups OR the location of the gap OR a general measure of location (such as mean, median, or the fact that most scores fall between 10 and 19). (Median = 18, mean = 22.95.)

Section 2 is partially correct (P) if the student notes two out of the three.

Section 3 is essentially correct (E) if in part (b) or part (c) the solution is given in the context of the problem and is communicated well. Section 3 is partially correct (P) if the student mentions the context (for instance, using the word “scores”), but communication of the context is weak. Section 3 is incorrect (I) if the context is not mentioned at all.

Section 4 is essentially correct (E) if in part (c) a valid reason is given for why a measure of center is not sufficient for data of this type (with the two groups and a gap). If, for instance, the reasoning would apply equally well to other shapes, it is not sufficient. Section 4 can be at most partially correct (P) if a student does not recognize the groups or gap. It is partially correct if the student compares the mean and median and cites outliers or skewness as the reason why a measure of center is not sufficient, or if a general reason is given for why a measure of center is not sufficient. (For instance, the student may say that center alone without some measure of spread is never sufficient.)

4 Complete Response: All four sections essentially correct
3 Substantial Response: Three sections essentially correct and no sections partially correct OR Two sections essentially correct and two sections partially correct
2 Developing Response: Two sections essentially correct and no sections partially correct OR One section essentially correct and two sections partially correct
Note: A score cannot exceed 2 if (1) the student fails to notice either of the two distinct groups of scores or the gap between the groups, and (2) the response to part (c) mentions neither the two groups nor the gap.
1 Minimal Response: One section essentially correct and no sections partially correct OR No sections essentially correct and two sections partially correct

If a response is between two scores (for example, 2 ½ points), use a holistic approach to determine whether to score up or down depending on the strength of the response and communication.

10. As gasoline prices have increased in recent years, many drivers have expressed concern about the taxes they pay on gasoline for their cars. In the United States, gasoline taxes are imposed by both the federal government and by individual states. The boxplot below shows the distribution of the state gasoline taxes, in cents per gallon, for all 50 states on January 1, 2006. (a) Based on the boxplot, what are the approximate values of the median and the interquartile range of the distribution of state gasoline taxes, in cents per gallon? Mark and label the boxplot to indicate how you found the approximated values.

The median and quartiles are marked and labeled on the boxplot above. The median is approximately 21 cents per gallon. The first and third quartiles are approximately 18 cents per gallon and 25 cents per gallon, respectively. The IQR is Q3 – Q1, which is approximately 25 – 18 = 7 cents per gallon.

(b) The federal tax imposed on gasoline was 18.4 cents per gallon at the time the state taxes were in effect. The federal gasoline tax was added to the state gasoline tax for each state to create a new distribution of combined gasoline taxes. What are approximate values, in cents per gallon, of the median and interquartile range of the new distribution of combined gasoline taxes? Justify your answer.

After adding 18.4 cents per gallon to each of the state taxes, the median of the combined gasoline taxes would be the median of the state tax plus the federal tax, which is approximately 21 + 18.4 = 39.4 cents per gallon. Although the quartiles of the combined gasoline taxes will change (Q1 = 18 + 18.4 = 36.4 cents per gallon and Q3 = 25 + 18.4 = 43.4 cents per gallon), the IQR will remain the same as it was for the state taxes at 7 cents per gallon (43.4 – 36.4 = 7).
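The arithmetic in part (b) can be sketched in Python as below. It uses the approximate quartiles read from the boxplot in part (a) (18, 21, and 25 cents per gallon) and the 18.4-cent federal tax. The function name `shift_summary` is illustrative, and results are rounded to one decimal place to avoid floating-point noise.

```python
FEDERAL_TAX = 18.4  # cents per gallon, from part (b)


def shift_summary(q1, med, q3, shift):
    """Summary after adding the same constant to every observation.

    Q1, the median, and Q3 each move by `shift`, so the IQR (Q3 - Q1)
    is unchanged.
    """
    new_q1, new_med, new_q3 = q1 + shift, med + shift, q3 + shift
    return {
        "q1": round(new_q1, 1),
        "median": round(new_med, 1),
        "q3": round(new_q3, 1),
        "iqr": round(new_q3 - new_q1, 1),
    }


# Approximate state-tax values read from the boxplot in part (a).
combined = shift_summary(q1=18, med=21, q3=25, shift=FEDERAL_TAX)
print(combined)
# {'q1': 36.4, 'median': 39.4, 'q3': 43.4, 'iqr': 7.0}
```

The same reasoning holds for any constant added to every value: measures of position shift by that constant, while measures of spread such as the IQR, range, and standard deviation stay the same.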

Scoring: Parts (a) and (b) are each scored as essentially correct (E), partially correct (P), or incorrect (I).

Part (a) is scored as follows:
Essentially correct (E) if the student identifies reasonable values for the median and IQR and justifies them by marking and labeling the boxplot.
Partially correct (P) if the student identifies reasonable values for the median and IQR but does not mark or label the boxplot OR identifies, marks, and labels only one value (median or IQR).
Incorrect (I) if the student identifies neither value OR identifies only one value but fails to mark and label the boxplot.

Part (b) is scored as follows:
Essentially correct (E) if the student gives a median that is 18.4 cents per gallon larger than the median identified in part (a), gives an IQR that is the same single number found in part (a), AND provides a reasonable justification for at least one of these values.
Partially correct (P) if the student provides only one correct value (either the median or the IQR) AND provides a justification.
Incorrect (I) if the student gives incorrect values for the median and IQR OR provides only one correct value with no justification.

Scoring
4 Complete Response: Both parts essentially correct
3 Substantial Response: One part essentially correct and one part partially correct
2 Developing Response: One part essentially correct and one part incorrect OR Both parts partially correct
1 Minimal Response: One part partially correct and one part incorrect