Why is Statistical Literacy So Important? Elizabeth Johnson, Department of Statistics, George Mason University
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write!” Samuel S. Wilks (1906-1964), paraphrasing Herbert G. Wells (1866-1946)
What is Statistical Literacy? There are numerous (and sometimes inconsistent) definitions of what constitutes statistical literacy. Katherine Wallman (1993) defined it as including the cognitive abilities of understanding and critically evaluating statistical results as well as appreciating the contributions statistical thinking can make. In 1998, David Moore asked “What statistical ideas will educated people who are not specialists require in the twenty-first century? That is the issue of statistical literacy. What specific concepts and skills will be needed in the context of specific jobs? That is the issue of statistical competence.”
Joan Garfield’s (1999) definition focused on understanding statistical language (words, symbols, and terms), being able to interpret graphs and tables, and being able to read and make sense of statistics in the news, media, and polls. The International Statistical Literacy Project (http://iase-web.org/), by contrast, used the Oxford English Dictionary definition of "numeracy," which is more general than statistical literacy.
Iddo Gal (2000) defined it broadly as two interrelated components: primarily, (a) people's ability to interpret and critically evaluate statistical information, data-related arguments, or stochastic phenomena, which they may encounter in diverse contexts; and, when relevant, (b) their ability to discuss or communicate their reactions to such statistical information, such as their understanding of the meaning of the information, their opinions about the implications of this information, or their concerns regarding the acceptability of given conclusions.
In 2001, Richard Scheaffer, President of the American Statistical Association, stated, “In the information age of today, statistics is essential but statisticians are not.” He pointed out that statisticians have much to offer and must be proactive in approaching leaders in education, producers and users of data, the scientific community of scholars, and the public at large. Properly constructed bridges to these constituencies can convey the positive contributions of statistical thinking for all, strong academic programs in statistics, the value-added practice of statistics, and the infusion of statistics into interdisciplinary research.
Outline
- A brief history of the AP Statistics program and the development of the Guidelines for Assessment and Instruction in Statistics Education (GAISE)
- Some “new” concepts covered by the K-12 Common Core State Standards for Mathematics
- The impact these programs will have on undergraduate statistics curricula, teacher certification programs, and future statisticians
Creation of the Advanced Placement (AP) Statistics Program
In 1997, the Educational Testing Service administered the first AP exam in Statistics to 7,667 high school students. The exam was created by a task force that included Dick Scheaffer, among others. The exam consists of two parts: (1) a 40-item multiple-choice section worth 50% of the total grade and (2) six free-response items worth the other 50%. It covers four topic areas: exploring data, sampling and experimentation, anticipating patterns, and statistical inference.
Growth of the AP Statistics Exam
Sample Free-Response Question: Experimental Design
As dogs age, diminished joint and hip health may lead to joint pain and thus reduce a dog’s activity level. Such a reduction in activity can lead to other health concerns such as weight gain and lethargy due to lack of exercise. A study is to be conducted to see which of two dietary supplements, glucosamine or chondroitin, is more effective in promoting joint and hip health and reducing the onset of canine osteoarthritis. Researchers will randomly select a total of 300 dogs from ten different large veterinary practices around the country. All of the dogs are more than 6 years old, and their owners have given consent to participate in the study. Changes in joint and hip health will be evaluated after 6 months of treatment.
Three-Part Question
(a) What would be the advantage to adding a control group in the design of this study?
(b) Assuming a control group is added to the other two groups in the study, explain how you would assign the 300 dogs to these three groups using a completely randomized design.
(c) Rather than using a completely randomized design, one group of researchers proposes blocking on clinics, and another group of researchers proposes blocking on breed of dog. How would you decide which one of these two variables to use as a blocking variable?
Model Solution for Part (b)
“Each dog will be assigned a unique random number, 001-300, using a random number generator on a calculator, statistical software, or a random number table. The numbers will be sorted from smallest to largest. The dogs assigned the first 100 numbers in the ordered list will receive glucosamine. The dogs with the next 100 numbers in the ordered list will be assigned to the control group. Finally, the dogs with the numbers 201-300 will receive chondroitin.”
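The rubric's procedure can be sketched in Python as an illustration, not as anything the exam requires. Shuffling the list of dogs is equivalent to giving each dog a unique random number and sorting; the dog IDs 1-300, the seed, and the function name `completely_randomized_assignment` are hypothetical.

```python
import random


def completely_randomized_assignment(dog_ids, treatments, seed=None):
    """Split the dogs into equal-sized treatment groups at random.

    Shuffling plays the role of 'assign unique random numbers and sort';
    consecutive slices of the shuffled list then form the groups.
    """
    if len(dog_ids) % len(treatments) != 0:
        raise ValueError("number of dogs must be a multiple of the number of treatments")
    rng = random.Random(seed)
    order = list(dog_ids)
    rng.shuffle(order)
    size = len(order) // len(treatments)
    return {t: order[i * size:(i + 1) * size] for i, t in enumerate(treatments)}


# Hypothetical dog IDs 1-300 and an arbitrary seed for reproducibility.
groups = completely_randomized_assignment(
    range(1, 301), ["glucosamine", "control", "chondroitin"], seed=2011
)
for name, dogs in groups.items():
    print(name, len(dogs), sorted(dogs)[:5])
```

Every one of the possible balanced allocations is equally likely here, which is what makes the design completely randomized with equal group sizes.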
Teaching Moment
A large number of student solutions to this problem used a standard six-sided die to randomize the subjects into three groups. If the dogs were assigned one at a time by rolling the die for each dog, with two faces of the die corresponding to each of the three groups, then the assignment is a completely randomized design with possibly unequal treatment groups. A student using this procedure received full credit under the scoring rubric. Note that the question did not require the treatment groups to be the same size.
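A minimal sketch of the die procedure that earned full credit, assuming (hypothetically) that faces 1-2 mean glucosamine, 3-4 control, and 5-6 chondroitin. Each dog's group is decided by its own independent roll, so the group sizes are themselves random; the function name `die_assignment` is hypothetical.

```python
import random
from collections import Counter

# Hypothetical face-to-group mapping: two faces per group.
FACE_TO_GROUP = {
    1: "glucosamine", 2: "glucosamine",
    3: "control", 4: "control",
    5: "chondroitin", 6: "chondroitin",
}


def die_assignment(n_dogs, seed=None):
    """Roll a fair die once per dog and assign the dog to the matching group."""
    rng = random.Random(seed)
    return [FACE_TO_GROUP[rng.randint(1, 6)] for _ in range(n_dogs)]


assignment = die_assignment(300, seed=1)
# Group sizes vary from run to run and are usually not exactly 100 each.
print(Counter(assignment))
```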
The “Stopping Rule” Holdup
However, if a student attempted to balance the treatment groups in the sequential die procedure by requiring 100 dogs in each group, then the procedure did not produce a completely randomized design, and the student received no credit.
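For contrast, here is a sketch of the balanced (stopping-rule) version, under the assumption that a roll pointing to a group that already has its 100 dogs is simply rolled again. The face mapping and the function name are hypothetical. Every group ends with exactly 100 dogs, but the balanced allocations are no longer all equally likely.

```python
import random
from collections import Counter

GROUPS = ["glucosamine", "control", "chondroitin"]


def die_assignment_with_stopping_rule(n_dogs, seed=None):
    """Roll a die for each dog (faces 1-2, 3-4, 5-6 map to the three groups).

    If the indicated group is already full (n_dogs / 3 dogs), roll again.
    This forces equal group sizes but is NOT a completely randomized design.
    """
    if n_dogs % len(GROUPS) != 0:
        raise ValueError("n_dogs must be a multiple of 3")
    cap = n_dogs // len(GROUPS)
    rng = random.Random(seed)
    counts = Counter()
    assignment = []
    for _ in range(n_dogs):
        while True:
            face = rng.randint(1, 6)
            group = GROUPS[(face - 1) // 2]
            if counts[group] < cap:
                break  # group still has room; otherwise re-roll
        counts[group] += 1
        assignment.append(group)
    return assignment


balanced = die_assignment_with_stopping_rule(300, seed=7)
print(Counter(balanced))  # always exactly 100 per group
```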
A follow-up paper by Taylor & Breazel (2008) showed that the balanced die procedure does not produce a completely randomized design, because the last two dogs to be assigned have a higher probability of being placed in the same group than they would under a completely randomized design (with 6 dogs in three groups of two, that probability is 1/5 = 0.2). Under the die procedure, the probability is approximately 0.39 with 6 dogs, 0.69 with 18 dogs, and 0.94 with 30 dogs, and it approaches 1 as the number of dogs increases.
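These probabilities can be checked with an exact calculation. This sketch assumes the stopping rule works by re-rolling, so each dog is equally likely to go to any group that is not yet full, and it tracks the distribution of group counts with exact fractions. Under that assumption it gives 7/18 ≈ 0.389 for 6 dogs, in line with the reported 0.39, compared with (k - 1)/(n - 1) = 1/5 for a completely randomized design with groups of size k. The function names are hypothetical.

```python
from fractions import Fraction


def stopping_rule_prob_last_two_together(n_dogs, n_groups=3):
    """Exact P(last two dogs share a group) when each dog is assigned uniformly
    at random among groups that are not yet full (re-roll on a full group)."""
    if n_dogs % n_groups != 0:
        raise ValueError("n_dogs must be a multiple of n_groups")
    cap = n_dogs // n_groups
    # Map sorted tuples of group counts to probabilities; sorting is safe
    # because only the multiset of counts affects later assignments.
    dist = {(0,) * n_groups: Fraction(1)}
    for _ in range(n_dogs - 2):
        new_dist = {}
        for counts, p in dist.items():
            open_groups = [i for i, c in enumerate(counts) if c < cap]
            share = p / len(open_groups)
            for i in open_groups:
                nxt = list(counts)
                nxt[i] += 1
                key = tuple(sorted(nxt))
                new_dist[key] = new_dist.get(key, Fraction(0)) + share
        dist = new_dist
    # Two open slots remain; the last two dogs share a group exactly when
    # both slots belong to one group (smallest count equals cap - 2).
    return sum((p for counts, p in dist.items() if counts[0] == cap - 2), Fraction(0))


def crd_prob_last_two_together(n_dogs, n_groups=3):
    """Same probability under a completely randomized design with equal groups."""
    cap = n_dogs // n_groups
    return Fraction(cap - 1, n_dogs - 1)


for n in (6, 18, 30):
    print(n, float(stopping_rule_prob_last_two_together(n)),
          float(crd_prob_last_two_together(n)))
# For 6 dogs: 7/18 (about 0.389) versus 1/5 = 0.2.
```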
Blocking? Part (c) was scored as essentially correct (E) if the student argued that the variable with the stronger relationship to joint and hip health should be used as the blocking variable, or stated that the variable associated with the larger anticipated variability in the response measure should be used as the blocking variable, so that units within blocks are as homogeneous as possible. A rationale is required, but a variable does not have to be selected.
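Once a blocking variable is chosen, the assignment is randomized separately within each block. A minimal sketch of a randomized block design, assuming (hypothetically) blocking on clinic with 30 dogs at each of the ten clinics; the question does not say how the 300 dogs are spread across clinics, and the function name `randomized_block_assignment` is hypothetical.

```python
import random

TREATMENTS = ["glucosamine", "control", "chondroitin"]


def randomized_block_assignment(blocks, seed=None):
    """blocks maps a block label (e.g., a clinic) to a list of dog IDs.

    Within each block the dogs are shuffled and split evenly among the
    treatments; returns {dog_id: (block, treatment)}.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, dogs in blocks.items():
        if len(dogs) % len(TREATMENTS) != 0:
            raise ValueError(f"block {block!r} size is not a multiple of {len(TREATMENTS)}")
        order = list(dogs)
        rng.shuffle(order)
        size = len(order) // len(TREATMENTS)
        for i, treatment in enumerate(TREATMENTS):
            for dog in order[i * size:(i + 1) * size]:
                assignment[dog] = (block, treatment)
    return assignment


# Hypothetical layout: 10 clinics with 30 dogs each (dog IDs 1-300).
clinics = {f"clinic_{c}": list(range(30 * c + 1, 30 * c + 31)) for c in range(10)}
assignment = randomized_block_assignment(clinics, seed=3)
```

Blocking on breed would use the same function with breeds as the block labels; the choice between them is the judgment part (c) asks for.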
What about the Multiple Choice Items?
Importance of statistical literacy in K-12
The GAISE PreK-12 Report presents a coherent statistics curriculum in the context of the statistical problem-solving process:
1. Formulate a question that can be answered by data.
2. Design and implement a plan to collect data.
3. Analyze the data with graphical and numerical methods.
4. Interpret the analysis in the context of the original question.
It defines three developmental levels (A, B, and C) and is a resource for guiding statistics standards, assessments, and the preparation of K-12 teachers: www.amstat.org/education/gaise
Guidelines for Assessment and Instruction in Statistics Education (GAISE) Undergraduate Report
- Emphasize statistical literacy and develop statistical thinking.
- Use real data.
- Stress conceptual understanding rather than mere knowledge of procedures.
- Foster active learning in the classroom.
- Use technology for developing concepts and analyzing data.
- Use assessments to improve and evaluate student learning.
Writing team included: Martha Aliaga, George Cobb, Carolyn Cuff, Joan Garfield (chair), Rob Gould, Robin Lock, Tom Moore, Allan Rossman, Bob Stephenson, Jessica Utts, Paul Velleman, Jeff Witmer
In the news today: Common Core State Standards (corestandards.org)
For the first time, most U.S. states and some territories will have common K-12 mathematics standards. Specific standards give statistics a more prominent role and place more emphasis on conceptual understanding and reasoning. The standards incorporate the statistical problem-solving process described in the GAISE Report, along with other GAISE recommendations. ASA members (prominent statisticians and statistics educators) were involved in writing and reviewing the statistics standards in the Common Core.
Statistics and Probability: Grades 6-8 Overview
Grade 6: Develop understanding of statistical variability. Summarize and describe distributions.
Grade 7: Use random sampling to draw inferences about a population. Draw informal comparative inferences about two populations. Investigate chance processes and develop, use, and evaluate probability models.
Grade 8: Investigate patterns of association in bivariate data.
Common Core Standards – Grade 6
Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers. For example, “How old am I?” is not a statistical question, but “How old are the students in my school?” is a statistical question because one anticipates variability in students’ ages. A well-written statistical question anticipates answers that will vary and specifies both a population of interest and a measurement of interest.
LOCUS is an NSF project focused on the development and assessment of statistical literacy (K-12 and undergraduate).
Statistics and Probability: High School Overview
Interpreting Categorical and Quantitative Data (S-ID)
- Summarize, represent, and interpret data on a single count or measurement variable
- Summarize, represent, and interpret data on two categorical and quantitative variables
- Interpret linear models
Making Inferences and Justifying Conclusions (S-IC)
- Understand and evaluate random processes underlying statistical experiments
- Make inferences and justify conclusions from sample surveys, experiments, and observational studies
Statistics and Probability: High School Overview
Conditional Probability and the Rules of Probability (S-CP)
- Understand independence and conditional probability and use them to interpret data
- Use the rules of probability to compute probabilities of compound events in a uniform probability model
Using Probability to Make Decisions (S-MD)
- Calculate expected values and use them to solve problems
- Use probability to evaluate outcomes of decisions
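The S-MD standards pair an expected-value calculation with a decision. A minimal sketch using exact fractions; the game and the function name `expected_value` are hypothetical illustrations, not examples taken from the standards.

```python
from fractions import Fraction


def expected_value(distribution):
    """Expected value of a discrete distribution given as (value, probability) pairs."""
    pairs = list(distribution)
    if sum(p for _, p in pairs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in pairs)


# A fair six-sided die.
die = [(face, Fraction(1, 6)) for face in range(1, 7)]
print(expected_value(die))  # 7/2

# Hypothetical game: pay $1 to play, win $5 if the die shows a six.
# Net winnings are +$4 with probability 1/6 and -$1 with probability 5/6.
game = [(4, Fraction(1, 6)), (-1, Fraction(5, 6))]
ev = expected_value(game)
print(ev)  # -1/6
print("play" if ev > 0 else "decline")  # decline
```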
Some Specific CCSS High School Concepts
- Fit a function to the data; use functions fitted to data to solve problems in the context of the data. Use given functions or choose a function suggested by the context. Emphasize linear, quadratic, and exponential models.
- Informally assess the fit of a function by plotting and analyzing residuals.
- Distinguish between correlation and causation.
Source: corestandards.org
Understand statistics as a process for making inferences about population parameters based on a random sample from that population. Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question the model? Evaluate reports based on data. Source: corestandards.org
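The spinning-coin example can be answered both exactly and by simulation. A minimal sketch; the function names are hypothetical. Under the model, 5 tails in a row has probability (1/2)^5 = 1/32 ≈ 0.031, and whether that is rare enough to doubt the model is exactly the judgment the standard asks students to make.

```python
import random


def prob_run_of_tails_exact(p_heads=0.5, run=5):
    """Probability of `run` tails in a row under the model P(heads) = p_heads."""
    return (1 - p_heads) ** run


def simulate_run_of_tails(n_trials=100_000, p_heads=0.5, run=5, seed=None):
    """Fraction of simulated sequences of `run` spins that are all tails."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if all(rng.random() >= p_heads for _ in range(run)):  # each spin is tails
            hits += 1
    return hits / n_trials


print(prob_run_of_tails_exact())          # 0.03125
print(simulate_run_of_tails(seed=42))     # close to 0.031
```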
Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each. Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling. Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant. Source: corestandards.org
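Both simulation ideas in this standard can be sketched briefly. The margin-of-error function uses one common classroom convention (twice the standard deviation of simulated sample proportions), and the comparison uses a two-sided randomization test for a difference in means; the function names and the treatment data are hypothetical.

```python
import random
import statistics


def simulated_margin_of_error(p_hat, n, reps=10_000, seed=None):
    """Simulate `reps` random samples of size n from a population with
    proportion p_hat; return twice the SD of the simulated sample proportions."""
    rng = random.Random(seed)
    props = [sum(rng.random() < p_hat for _ in range(n)) / n for _ in range(reps)]
    return 2 * statistics.stdev(props)


def randomization_test(group_a, group_b, reps=10_000, seed=None):
    """Two-sided randomization test for a difference in means. The pooled
    responses are repeatedly re-split at random into groups of the original
    sizes; the p-value is the fraction of re-splits at least as extreme as
    the observed difference."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return observed, extreme / reps


# Hypothetical responses (e.g., a joint-health score) for two treatment groups.
treatment_a = [12, 15, 14, 16, 18, 13]
treatment_b = [10, 11, 13, 9, 12, 11]

print(round(simulated_margin_of_error(0.5, 100, seed=1), 3))  # close to 0.1
diff, p_value = randomization_test(treatment_a, treatment_b, seed=1)
print(round(diff, 3), p_value)
```

A small p-value means a difference as large as the observed one rarely arises from the random assignment alone.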
The Next Big Thing? Simulation- and Randomization-Based Inference “Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)
Who is doing research in this area?
NSF-funded curriculum projects; full implementation:
- Rossman, Chance, Holcomb, Cobb (CSI)
- West and Woodard (NC State)
- Gould et al. (UCLA)
- Garfield, delMas, Zieffler, et al. (CATALST)
- Tintle et al. (Hope College): March 2011 JSE article; textbook project
- Hamrick et al. (Rhodes College): 2011 JSM panel discussion
- Lock 5 textbook project
- Tabor and Franklin, Statistical Reasoning in Sports
So, what will we do with these future students in our undergraduate introductory statistics classes?
- Mean vs. median?
- Risk analysis (e.g., Utts, 2010)
- Multivariate modeling (e.g., Kaplan, 2009)
- Large, complex data sets, data mining (e.g., Gould plenary talk)
- Bayesian methods, decision theory (e.g., Stewart plenary talk)
- Computing, visualization tools (e.g., Nolan and Lang, 2010)
- Data dialogues (e.g., Pfannkuch et al., 2010)
What is this doing to the undergraduate statistics curriculum? Nick Horton, as chair of the ASA undergraduate guidelines working group, will release revised guidelines for undergraduate statistics later this year. The guidelines stress real applications, problem solving, and the increasing importance of data science. They also help to define our relationship with mathematics.
What could (or should) we do for future teachers? Recommendations from the ASA-NCTM Joint Statement on K-12 Teacher Preparation: 1) Professional development courses and workshops for future and current teachers need to model effective pedagogies for teaching statistics, in addition to focusing on developing understanding of statistical concepts, mastery of statistical content, and knowledge of the essential ideas of statistical thinking and problem solving. Providing such courses and workshops may require universities to expand (or initiate) pre-service and outreach offerings in statistics.
2) Faculty who teach statistics need to work together with education faculty to provide coursework that emphasizes stronger conceptual knowledge of statistics and the essential ideas of statistical thinking and problem solving.
3) State departments of education need to work together with national professional organizations such as ASA, NCTM, the Association of Mathematics Teacher Educators (AMTE), the National Council of Supervisors of Mathematics (NCSM), and the Mathematical Association of America (MAA) to ensure the development of uniform resources, assessments, and delivery models of professional development in statistics.
What could this mean for future statisticians?
Will curriculum and pedagogy decisions be grounded in educational research?
Research opportunities:
- Journal of Statistics Education: founded at N.C. State in 1993
- Statistics Education Research Journal: nearing its 10th anniversary (launched 2002); publishes exclusively research articles in statistics education
- New PhD programs in statistics education are being developed: University of Georgia, NC State (?), University of Minnesota