# Bob delMas, Laura Le, Nicola Parker and Laura Ziegler

How to Implement a Randomization-Based Introductory Statistics Course: The CATALST Curriculum
Bob delMas, Laura Le, Nicola Parker and Laura Ziegler
Funded by NSF DUE

(Presenter: Bob)

Overview of Workshop
- Overview of CATALST Course
- TinkerPlots Introduction
- UNIT 1: Chance Models and Simulation
- UNIT 2: Models for Comparing Groups – Randomization Methods
- UNIT 3: Estimating Models Using Data – Bootstrap Methods
- Assessment Results

University of Minnesota Project Team
- FACULTY: Joan Garfield, Andy Zieffler, Bob delMas
- GRADUATE STUDENTS: Rebekah Isaak, Laura Le, Laura Ziegler

Our Collaborators
- Allan Rossman, Cal Poly State U
- Beth Chance, Cal Poly State U
- John Holcomb, Cleveland State U
- George Cobb, Mt. Holyoke Coll.
- Herle McGowan, NCSU
- Instructors at different institutions who have implemented CATALST

CATALST Course: Teaching Students to Really Cook
Metaphor from Alan Schoenfeld (1998):
- Many intro stats classes teach how to follow “recipes” but not how to really “cook.”
- Students who learn only recipes are able to perform routine procedures and tests, but they don't have the big picture that allows them to solve unfamiliar problems and to articulate and apply their understanding.
- Someone who knows how to cook knows the essential things to look for and focus on, and how to make adjustments on the fly.

CATALST: The Inference Model
The core logic of inference as the foundation (Cobb, 2007):
- Model: Specify a model to reasonably approximate the variation in outcomes attributable to the random process
- Randomize & Repeat: Use the model to generate simulated data and collect a summary measure
- Evaluate: Examine the distribution of the resulting summary measures
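The three steps of the inference model can be sketched outside TinkerPlots as well. The following is a minimal Python illustration, not part of the CATALST materials: the fair-coin model, the function names, and the observed value of 16 heads are all hypothetical choices made for the example.

```python
import random

def simulate(model, statistic, trials, seed=0):
    """Randomize & Repeat: run the chance model many times and
    collect one summary measure per simulated trial."""
    rng = random.Random(seed)
    return [statistic(model(rng)) for _ in range(trials)]

# Model: 20 flips of a fair coin (a hypothetical chance model).
def fair_coin_model(rng, flips=20):
    return [rng.choice("HT") for _ in range(flips)]

# Summary measure: number of heads in one simulated trial.
def count_heads(outcome):
    return outcome.count("H")

results = simulate(fair_coin_model, count_heads, trials=1000)

# Evaluate: how often does the model produce a result at least as
# extreme as a (hypothetical) observed count of 16 heads?
observed = 16
proportion_as_extreme = sum(r >= observed for r in results) / len(results)
print(f"Proportion of trials with >= {observed} heads: {proportion_as_extreme}")
```

Keeping the model and the summary measure as separate functions mirrors the Model / Randomize & Repeat / Evaluate split: either piece can be swapped without touching the simulation loop.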

Radical Content
- No t-tests; use of probability for simulation and modeling (TinkerPlots)
- Coherent curriculum that builds ideas of models, chance, simulated data, and inference from the first day
- Immersion in statistical thinking
- Activities based on real problems, real data
- Course materials contain all classroom activities and homework assignments
- Lesson plans posted at CATALST website

Radical Pedagogy
- Student-centered approach based on research in cognition & learning, instructional design principles
- Minimal lectures, just-in-time as needed
- Cooperative groups to solve problems
- “Invention to learn” and “test and conjecture” activities [develop reasoning & promote transfer]
- Writing & whole class discussion (wrap-up)

Why TinkerPlots? Fathom® is a viable option for building models and simulation, but also challenging to students (Maxara & Biehler, 2006, 2007; Biehler & Prommel , 2010) TinkerPlots™ was chosen because of unique visual (graphical interface) capabilities Allows students to see the devices they select (e.g., sampler, spinner) Easily use these models to simulate and collect data Allows students to visually examine and evaluate distributions of statistics

3 CATALST Units (14 Week Semester)
- Chance Models and Simulation
  - Learning to use the core logic of inference (George Cobb)
  - Specify a chance model (sampling or random assignment)
  - Generate a trial, collect a measure, repeat many times
  - Evaluate fit of chance model to the observed data
- Models for Comparing Groups
  - Randomization Tests
  - Random Assignment Studies
  - Observed Data Studies
  - Design: Random Assignment and Random Sampling
  - Drawing valid conclusions using logic of inference
- Estimating Models Using Data
  - Bootstrap Method
  - Standard Error of a Sample Statistic
  - Confidence Intervals

Workshop Overview Summary
- You will work through shortened versions of three activities from the CATALST course.
- You will do one activity from:
  - Unit 1 - Chance Models and Simulation
  - Unit 2 - Models for Comparing Groups
  - Unit 3 - Estimating Models Using Data
- Preliminary Assessment Results

Agenda for Session 1
- An introduction to TinkerPlots
- One activity from Unit 1 on Chance Models and Simulation
  - A modified version of the Matching Dogs to Owners activity

(Turn over to Nicola)

An introduction to TinkerPlots: Modeling Dice
NICOLA

Matching Dogs to Owners: Learning Goals
- Develop the reasoning of statistical significance
- Understand how an observed result can be judged unlikely under a particular model
- Begin the process of statistical thinking

This is the first informal introduction to p-values and statistical inference (what does it mean to have evidence against the model?). Our aim is to develop the students’ statistical thinking, that is, getting them to think about models (in particular, the null model), consider variation that occurs due to chance, think about what statistic is of interest and how to display that statistic (dot plots), and communicate the results effectively and in the context of the problem.

Matching Dogs to Owners: Student Preparation
At this point students have…
- Modeled random behavior in TinkerPlots
  - Coins, dice
  - Colors of M&M candies
  - Effects of “One Son” or “One of Each” child policies (using a stopping criterion)

…and introductory readings on hypothesis testing have been assigned.
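To illustrate what modeling with a stopping criterion looks like, here is a minimal Python sketch of a “One Son” policy simulation. It is not the TinkerPlots model used in the course; the function name, the 50% chance of a boy, and the number of simulated families are assumptions made for the example.

```python
import random

def one_son_family(rng, p_boy=0.5):
    """Simulate one family under a 'One Son' policy: keep having
    children until the first boy is born (the stopping criterion).
    p_boy = 0.5 is an assumed value for illustration."""
    children = []
    while True:
        child = "B" if rng.random() < p_boy else "G"
        children.append(child)
        if child == "B":
            return children

rng = random.Random(1)
families = [one_son_family(rng) for _ in range(5000)]

# Summary measures: average family size and overall proportion of boys.
sizes = [len(family) for family in families]
avg_size = sum(sizes) / len(sizes)
prop_boys = sum(family.count("B") for family in families) / sum(sizes)

print(f"Average number of children per family: {avg_size:.2f}")
print(f"Proportion of boys among all children: {prop_boys:.3f}")
```

A “One of Each” policy could be sketched the same way by changing the stopping condition to require at least one boy and one girl.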

Matching Dogs to Owners: Research Question
Are humans able to match dogs to owners better than blind luck?

Matching Dogs to Owners: Simulation
For each numbered person in the following photos, write down the letter for the dog that you think is owned by that person.

Dogs: A, B, C, D, E, F
Owners: 1. _____ 2. _____ 3. _____ 4. _____ 5. _____ 6. _____

Dogs: A, B, C, D, E, F
Owners: 1. A 2. C 3. F 4. E 5. B 6. D

Matching Dogs to Owners: Directions
Let’s do the modified activity. Again, put on your “student hat”! Afterward, you will get to put your “teacher hat” back on.

Matching Dogs to Owners: Wrap-up
- What does the “blind guessing” model mean?
- Why do we use the “blind guessing” model to simulate our data?
- What does it mean to have evidence that supports or does not support the “blind guessing” model?
- TinkerPlots steps
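For readers without TinkerPlots at hand, the “blind guessing” model can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it treats a blind guess as a random one-to-one assignment of the six dogs to the six owners, and the observed value of 4 correct matches is a hypothetical number, not a result from the activity.

```python
import random

N_PAIRS = 6  # six owners and six dogs, as in the activity

def blind_guess_matches(rng):
    """One trial of the 'blind guessing' model: assign the dogs to
    the owners at random (each dog used once) and count how many
    assignments happen to be correct."""
    dogs = list(range(N_PAIRS))
    rng.shuffle(dogs)
    return sum(guess == owner for owner, guess in enumerate(dogs))

rng = random.Random(2013)
trials = 10000
match_counts = [blind_guess_matches(rng) for _ in range(trials)]

# Evaluate: how unusual would a (hypothetical) observed result of
# 4 correct matches be if the blind guessing model were true?
observed_matches = 4
p_estimate = sum(m >= observed_matches for m in match_counts) / trials

print(f"Average matches under blind guessing: {sum(match_counts) / trials:.2f}")
print(f"Proportion of trials with >= {observed_matches} matches: {p_estimate:.4f}")
```

A small proportion would count as evidence against the blind guessing model; a large one would mean the observed result is consistent with guessing.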

Matching Dogs to Owners: Reflection
- What about this activity (content, format…) do you think might help maximize student learning in your classroom?
- What are your hesitations / what do you think might hinder student learning in your classroom?
- What questions do you still have about the implementation of such a course?
- Presuming that you wanted to implement these activities in your courses, how comfortable would you feel doing so? Why or why not?

(Presenter: Nicola)

Matching Dogs to Owners: Building the Model & Simulation
-show TP demo…

Agenda for Session 2
- Two activities from Unit 2 on Comparing Groups
  - A modified version of the Sleep Deprivation Activity
  - A modified version of the Contagious Yawns Study Homework

(Presenter: Laura Z.)

Sleep Deprivation: Learning Goals
- Develop the need for a summary measure to compare two groups with quantitative data (e.g., different summary measures provide different information regarding characteristics of the data; some are more relevant than others)
- Learn how to determine if a single observed difference is real and important or just due to chance (the need to consider the observed result in the distribution of results that are possible under the null model)
- Find the approximate p-value from simulated data and draw a conclusion
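The randomization test behind these goals can be sketched in Python as follows. This is an illustration only: the scores below are invented placeholder values, not the data from the sleep study, and the group sizes (10 unrestricted, 11 deprived) assume the 21 subjects were split that way.

```python
import random
from statistics import mean

# HYPOTHETICAL improvement scores, invented for illustration only.
unrestricted = [22, 18, 5, 30, 14, 27, 9, 33, 16, 21]
deprived = [3, -8, 12, 6, -2, 15, 0, 9, -5, 7, 10]

observed_diff = mean(unrestricted) - mean(deprived)

def shuffled_diff(rng, pooled, n_first):
    """One trial under the null model of no difference: re-assign the
    pooled scores to the two groups at random and recompute the
    difference in means."""
    rng.shuffle(pooled)
    return mean(pooled[:n_first]) - mean(pooled[n_first:])

rng = random.Random(3)
pooled = unrestricted + deprived
sim_diffs = [shuffled_diff(rng, pooled, len(unrestricted)) for _ in range(5000)]

# Approximate p-value: proportion of simulated differences at least
# as large as the observed difference.
p_value = sum(d >= observed_diff for d in sim_diffs) / len(sim_diffs)

print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Approximate p-value: {p_value:.4f}")
```

Because the shuffling ignores group labels, the simulated differences center near zero, which is why the observed difference is judged against that distribution rather than on its own.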

Sleep Deprivation: Prior Knowledge
- Informal idea of p-value, but the term is introduced in this activity
- Basic idea of comparing groups
- Randomization test (by hand)

Type of study
- Experiment with random assignment
- Quantitative data
- Comparing groups based on difference in means using the randomization test

Sleep Deprivation: Student Preparation
Students read the abstract from Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3(12). The abstract introduces the context of the Sleep Deprivation activity and provides motivation for it.

Sleep Deprivation: Preliminary Discussion
Begin with a discussion on how to measure whether or not the amount of sleep affects test performance. Have students come up with different methods. Describe how that relates to the activity.

Sleep Deprivation: Research Question
Does the effect of sleep deprivation last, or can a person “make up” for sleep deprivation by getting a full night’s sleep in subsequent nights?

Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3(12).

Sleep Deprivation: Study Design
21 Human Subjects, split into two groups: Deprived; Unrestricted Sleep (10)

Sleep Deprivation: Directions
Let’s do the modified activity. Again, put on your “student hat”! Afterward, you will get to put your “teacher hat” back on.

Sleep Deprivation: Sample Wrap-Up Questions
- What was our null model?
- Why do we need to conduct a test? Why can't we just look at the observed difference?
- What is the purpose of random assignment?
- Where was the plot centered? Why does that make sense?
- What is a p-value?
- What conclusion did you come to for the sleep study?

Sleep Deprivation: Building the Model & Simulation

Contagious Yawns Study: Homework Assignment
Research Question: Are yawns contagious?

Contagious Yawns Study: Study Design
50 Human Subjects: Yawn Seed Planted (34); No Yawn Seed Planted (16)

MythBusters [Photograph]. Retrieved April 11, 2013.

Contagious Yawns Study: Data

| | Subject Yawned | Subject Did Not Yawn | Total |
|---|---|---|---|
| Yawn Seed Planted | 10 | 24 | 34 |
| Yawn Seed Not Planted | 4 | 12 | 16 |
| Total | 14 | 36 | 50 |

Contagious Yawns Study: Questions
Describe how you would create a model to answer the research question. Describe the simulation process you would use to answer the research question.

Contagious Yawns Study: Building the Model & Simulation

Contagious Yawns Study: Homework Questions
- Quantify the strength of evidence (p-value) for the observed result. Note the observed difference between the two groups in the proportion that yawned.
- In light of your previous answer, would you say that the results that the researchers obtained provide strong evidence that yawning is contagious? Explain your reasoning based on your simulation results.
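One way to approach the homework outside TinkerPlots is sketched below in Python. It uses the counts from the study's data table (10 of 34 yawned with the seed planted, 4 of 16 without); the function names, the seed, and the number of trials are choices made for this illustration, not part of the assignment.

```python
import random

# Counts from the study's data table.
N_SEEDED, N_NOT_SEEDED = 34, 16
TOTAL_YAWNERS = 14          # 10 seeded + 4 not seeded
OBSERVED_SEEDED_YAWNERS = 10

def diff_in_proportions(seeded_yawners):
    """Difference in proportion that yawned (seeded minus not seeded),
    given how many of the 14 yawners fall in the seeded group."""
    return (seeded_yawners / N_SEEDED
            - (TOTAL_YAWNERS - seeded_yawners) / N_NOT_SEEDED)

observed_diff = diff_in_proportions(OBSERVED_SEEDED_YAWNERS)

# Null model: yawning is unrelated to the seed, so the 14 yawners are
# re-assigned at random among the 50 subjects.
rng = random.Random(11)
subjects = [1] * TOTAL_YAWNERS + [0] * (N_SEEDED + N_NOT_SEEDED - TOTAL_YAWNERS)

def one_trial():
    rng.shuffle(subjects)
    return diff_in_proportions(sum(subjects[:N_SEEDED]))

sim_diffs = [one_trial() for _ in range(5000)]
p_value = sum(d >= observed_diff for d in sim_diffs) / len(sim_diffs)

print(f"Observed difference in proportions: {observed_diff:.3f}")
print(f"Approximate p-value: {p_value:.3f}")
```

The structure matches the Sleep Deprivation randomization test; only the summary measure changes, from a difference in means to a difference in proportions.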

Contagious Yawns Study: Reflection
What was the purpose of this homework assignment? How did it differ from the Sleep Deprivation Activity?

Agenda for Session 3
- Overview of Unit 3 on Estimating Models Using Data
- A demonstration of the Kissing the Right Way Activity

(Presenter: Laura Le)

In this session, I’m going to start with an overview of Unit 3, followed by a demonstration of an activity named Kissing the Right Way that introduces the idea of confidence intervals.

Unit 3 Overview: Estimating Models Using Data
- Sampling: Bias, precision, and size of population
- Comparing Hand Spans: Formalizes summary measure of variation (standard deviation)
- Kissing the Right Way: Intro to confidence intervals
- Memorizing Letters Part II: Confidence interval for effect size
- Why 2 Standard Errors?

In the final unit of the CATALST course, the focus is on estimating models using sample data. This unit begins with an activity that explores how different sampling methods may affect the parameter estimates that are made. Students explore the ideas of bias, precision, and size of the population. The second activity in this unit formalizes a summary measure of variation (the standard deviation). In the third activity, which will be demonstrated shortly, students are introduced to the idea of confidence intervals through bootstrapping. Students also learn about effect size, which is introduced via the two-group comparison and creating a confidence interval for the effect size. The last activity in Unit 3 tries to answer many students’ questions about “why 2 SEs.” It covers why two standard errors are used to calculate interval estimates as well as what it means to be 95% confident in our interval estimates of the population parameter.

Kissing the Right Way
As I mentioned on the previous slide, Kissing the Right Way is an activity that introduces the concept of confidence intervals. In your handout of materials, you are given the lesson plan for this activity. As you can see on pg. 20, prior to class, students are asked to read the article that this activity’s context came from, as well as some supplemental readings about standard deviation, standard error, and margin of error. To summarize the article: A German bio-psychologist, Onur Güntürkün, was curious whether the human tendency for right-sidedness (e.g., right-handed, right-footed, right-eyed) manifested itself in other situations as well. In trying to understand why human brains function asymmetrically, with each side controlling different abilities, he investigated whether kissing couples were more likely to lean their heads to the right than to the left. Interesting, right?

Kissing the Right Way: Learning Goals
- Understand what standard error is
- Understand the difference between standard deviation and standard error
- Understand what margin of error is
- Understand what an interval estimate is
- Understand the purpose of an interval estimate

The learning goals for this activity are…

Kissing Study Activity
What percentage of couples lean their heads to the right when kissing? How can we find out? Collect data!

The research question for this study was “What percentage of couples lean their heads to the right when kissing?” So how can we answer this question? We collect data!

(Talk about the preceding activities and what leads students up to this point: finding the average, finding a single probability, etc. Students are guided through the process of obtaining a bootstrap interval for categorical data.)

Kissing Study Activity
124 Couples: 80 Couples Lean Right, 44 Couples Lean Left

The researchers went out to public places like airports, train stations, beaches, and parks in the United States, Germany, and Turkey and observed 124 couples (estimated ages 13 to 70 years, not holding any other objects like luggage that might influence their behavior). What they found was that 80 leaned their heads to the right when kissing and 44 leaned their heads to the left.

Kissing Activity
- Sixty-five percent of the couples observed leaned to the right.
- What percentage of all couples lean to the right?
- How much variation is there in the estimate from sample-to-sample?
- Use sample as a substitute for the population

80 out of 124 turns out to be about 65%. After computing this point estimate, students are then presented with questions that ask about sampling variability. For example: Consider another study carried out using the same methodology, but using a different sample of 124 couples. Would you necessarily obtain the same answer to the research question (i.e., would the percentage of couples who lean their heads to the right when kissing be the same)? Explain why or why not.

After students grapple with this idea of sampling variability and the need for a good estimate for the population parameter, the next logical step is to ask how to get a numerical value for how much variation there is in the estimate from sample-to-sample. We do this by using our sample as a substitute for the population and simulating a lot of samples to come up with an estimate for sampling variability.

Kissing Study: Building the Model & Simulation
So, students build a model using the sample result as a “stand-in” for the population. Students will use a spinner, put 64.5% as kissing to the right, and set repeat to 124.

Kissing Study: Building the Model & Simulation
Then, they click run and obtain a single trial from our “stand-in” population.

Kissing Study: Building the Model & Simulation
They plot this to see the results from our single trial better and…

Kissing Study: Building the Model & Simulation
Collect the outcome of interest…the percent of simulated couples that turned their head to the right. The students then collect many many trials (they are good at doing this by now), and

Kissing Study: Building the Model & Simulation
Come up with a distribution of multiple trials. From there, they compute the standard error of this distribution, multiply this standard error by 2, and add and subtract the result from the point estimate to get the upper and lower bounds of the 95% CI. By this point in the class, students are really good at working through the steps in TinkerPlots and creating the distribution of results. A couple of points of confusion for students are:
- the calculation of the standard error
- using the point estimate, not the midpoint of the distribution, as the midpoint of the CI
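The bootstrap steps described in these slides can be mirrored in Python as a rough sketch. The 80-of-124 result comes from the study; the number of trials, the seed, and the variable names are assumptions made for this illustration, and the random draws stand in for the TinkerPlots spinner.

```python
import random
import statistics

# Observed result from the study: 80 of 124 couples leaned right.
n_couples = 124
p_hat = 80 / n_couples  # about 0.645, the spinner setting

rng = random.Random(7)

def one_trial():
    """Spin the 'stand-in population' spinner 124 times and return
    the percent of simulated couples that leaned right."""
    leaned_right = sum(rng.random() < p_hat for _ in range(n_couples))
    return 100 * leaned_right / n_couples

# Collect many trials (5000 is an arbitrary choice).
trial_percents = [one_trial() for _ in range(5000)]

# Standard error: the standard deviation of the simulated percentages.
standard_error = statistics.stdev(trial_percents)
margin_of_error = 2 * standard_error

# Center the interval on the point estimate, not on the mean of the
# simulated distribution.
point_estimate = 100 * p_hat
lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"Standard error: {standard_error:.2f} percentage points")
print(f"Interval estimate: {lower:.1f}% to {upper:.1f}%")
```

Note that the interval is built around the observed 64.5%, which is the second point of confusion mentioned above.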

Kissing the Right Way: Sample Wrap-Up Questions
- What is standard error?
- What is margin of error? When would we need to use margin of error? How would you interpret the margin of error?
- What is an interval estimate? What is the purpose of an interval estimate? How would you interpret your interval estimate?
- What do you think the purpose was of this activity?

Some sample wrap-up questions for this activity are…

Agenda for Session 4
- Summary of Assessment Results
  - Overview of Assessment Instruments
  - Comparison of CATALST and Non-CATALST students

(Presenter: Bob)

CATALST: Assessment
- GOALS: 27 forced-choice items
  - Study design
  - Reasoning about variability
  - Sampling and sampling variability
  - Interpreting confidence intervals and p-values
  - Statistical inference
  - Modeling and simulation
- MOST: Measure of statistical thinking
  - 4 real-world contexts
  - Open-ended and forced-choice items
- AFFECT: Attitudes and perceptions

Performance of the CATALST (n = 289) & non-CATALST (n = 440) groups on the GOALS test

Bootstrapped Confidence Interval Limits for Each Item (CATALST: n = 289; non-CATALST: n = 440)

GOALS Item 1: Design & Conclusions
A recent research study randomly assigned participants into groups that were given different levels of Vitamin E to take daily. One group received only a placebo pill. The research study followed the participants for eight years to see which ones developed a particular type of cancer during that time period. What is the primary purpose of the use of random assignment for making inferences based on this study?

| CATALST | Non-CATALST | Response Options |
|---|---|---|
| 17.0% | 38.2% | a. To ensure that a person doesn’t know whether or not they are getting the placebo. |
| 66.7% | 29.1% | b. To ensure that the groups are similar in all respects except for the level of Vitamin E. |
| 16.3% | 32.7% | c. To ensure that the study participants are representative of the larger population. |
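As a rough illustration of how a bootstrapped interval for a group difference on one item could be obtained, the Python sketch below resamples the Item 1 percentages for option b. Several things here are assumptions rather than the authors' procedure: the per-item sample sizes are taken to equal the overall group sizes (289 and 440), the resampling is done from the reported percentages rather than from the raw responses, and a percentile interval is used.

```python
import random

def resample_proportion(rng, p, n):
    """Draw n simulated responses with success chance p and return
    the proportion of successes in the resample."""
    return sum(rng.random() < p for _ in range(n)) / n

# Reported percentages choosing option b on GOALS Item 1.
# ASSUMPTION: per-item n equals the overall group sizes.
p_catalst, n_catalst = 0.667, 289
p_non, n_non = 0.291, 440

rng = random.Random(42)
n_resamples = 2000  # arbitrary choice for the sketch

# Bootstrap distribution of the difference in proportions.
diffs = sorted(
    resample_proportion(rng, p_catalst, n_catalst)
    - resample_proportion(rng, p_non, n_non)
    for _ in range(n_resamples)
)

# Percentile interval: middle 95% of the resampled differences.
lower = diffs[int(0.025 * n_resamples)]
upper = diffs[int(0.975 * n_resamples) - 1]

print(f"Observed difference: {p_catalst - p_non:.3f}")
print(f"Approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

An interval that excludes zero would suggest the two groups differ on the item by more than resampling variability alone would explain.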

GOALS Item 3: Design & Conclusions
A local television station for a city with a population of 500,000 recently conducted a poll where they invited viewers to call in and voice their support or opposition to a controversial referendum that was to be voted on in an upcoming election. Over 5,000 people responded, with 67% opposed to the referendum. The TV station announced that the referendum would most likely be defeated in the election. Select the best answer below for why you think the TV station's announcement is valid or invalid.

| CATALST | Non-CATALST | Response Options |
|---|---|---|
| 5.2% | 22.3% | a. Valid, because the sample size is large enough to represent the population. |
| 4.8% | 17.5% | b. Valid, because 67% is far enough above 50% to predict a majority vote. |
| 7.2% | 16.8% | c. Invalid, because the sample is too small given the size of the city. |
| 82.7% | 43.4% | d. Invalid, because the sample is not likely to be representative of the population. |

GOALS Items 23, 24 & 25: Understanding p-Values
For questions 23-25, indicate whether the interpretation of the p-value provided is valid or invalid.

| Statement | Correct Response | CATALST | Non-CATALST |
|---|---|---|---|
| The p-value is the probability that the \$5 incentive group would have the same or lower success rate than the “do your best” rate. | INVALID | 82.3% | 58.0% |
| The p-value is the probability that the \$5 incentive group would have a higher success rate than the “do your best” group. | INVALID | 58.4% | 48.9% |
| The p-value is the probability of obtaining a result as extreme as was actually found, if the \$5 incentive is really not helpful. | VALID | 70.0% | 51.4% |
| Answered all three items correctly | | 39.5% | 19.3% |

MOST Test: Statistical Thinking
Example of an Open-Ended Item: Context

Consider an experiment where a researcher wants to study the effects of two different exam preparation strategies on exam scores. Twenty students volunteered to be in the study, and were randomly assigned to one of two different exam preparation strategies, 10 students per strategy. After the preparation, all students were given the same exam (which is scored from 0 to 100). The researcher calculated the mean exam score for each group of students. The mean exam score for the students assigned to preparation strategy A was 5 points higher than the mean exam score for the students assigned to preparation strategy B.

MOST Test: Statistical Thinking
Open-Ended Question

Explain how the researcher could determine whether the difference in means of 5 points is large enough to claim that one preparation strategy is better than the other. (Be sure to give enough detail that someone else could easily follow your explanation in order to implement your proposed analysis and draw an appropriate inference (conclusion).)

MOST Test: Sample Scoring Rubric

| Component | Randomization-Based | Parametric Procedures |
|---|---|---|
| Modeling: Hypothesis | Describes the null model of no difference | Describes the null and alternative hypotheses |
| Context: Hypothesis | Identifies the context in definition of null model | Identifies the context in definition of null and alternative hypotheses |
| Modeling: Test | Describes the simulation: statistic to collect & repetition | States a z-test & describes checking assumptions |
| Conclusions | Description identifies how conclusion can be drawn from the test results | Description identifies how conclusion can be drawn from the test results |
| Context: Conclusion | Conclusion presented in the context of the problem: refers to the percentage of breakups for Monday or percentage of breakups for each day of the week | Conclusion presented in the context of the problem: refers to the percentage of breakups for Monday or percentage of breakups for each day of the week |

MOST Test: Preliminary Findings
- CATALST students include a higher percentage of procedural components
- For all students, responses are weakest on making conclusions and including context
- CATALST students are more likely to describe the use of technology in their responses
- CATALST students tend to describe steps in more detail
- CATALST students are more likely to map a problem to a previously solved problem
- Non-CATALST students are more likely to present a non-statistical explanation for outcomes

Affect Survey: Percent Agree or Strongly Agree

| Statement | Non-CATALST (n = 453) | CATALST (n = 283) |
|---|---|---|
| Learning to use (software/TinkerPlots™) was an important part of learning statistics. | 68.7% | 81.5% |
| I would be comfortable using (software/TinkerPlots™) to test for a difference between groups after completing this class. | 66.0% | 91.0% |
| I would be comfortable using (software/TinkerPlots™) to compute an interval estimate for a population parameter after completing this class. | 61.7% | 83.7% |
| Learning to (use software/create models in TinkerPlots™) helped me learn to think statistically. | 61.6% | 84.5% |

CATALST: Interview Study
- Examined statistical reasoning of 5 tertiary students after the Chance Models and Simulation unit
- Each of two novel problems asked students to reason about the likelihood of a surprising result
- After five weeks, students demonstrated consideration of sampling variability when drawing inferences
- Students’ solutions typically:
  - Considered the value most likely to occur under the chance model (expected value)
  - Demonstrated an awareness of sampling variability
  - Quantified the degree of “unusualness”

CATALST publications
- Garfield, J., delMas, R., & Zieffler, A. (2012). Developing statistical modelers and thinkers in an introductory, tertiary-level statistics course. ZDM: The International Journal on Mathematics Education.
- Ziegler, L., & Garfield, J. (2013). Exploring students' intuitive ideas of randomness using an iPod shuffle activity. Teaching Statistics, 35(1), 2-7.
- Isaak, R., Garfield, J., & Zieffler, A. (in press). The Course as Textbook. Technology Innovations in Statistics Education.
- Garfield, J., Zieffler, A., delMas, R., & Ziegler, L. (under review). A New Role for Probability in the Introductory College Statistics Course. Journal of Statistics Education.
- delMas, R., Zieffler, A., & Garfield, J. (under review). Tertiary Students' Reasoning about Samples and Sampling Variation in the Context of a Modeling and Simulation Approach to Inference. Educational Studies in Mathematics.