Presentation on theme: "Bob delMas, Laura Le, Nicola Parker and Laura Ziegler"— Presentation transcript:
1How to Implement a Randomization-Based Introductory Statistics Course: The CATALST Curriculum
Bob delMas, Laura Le, Nicola Parker and Laura Ziegler
Funded by NSF DUE
2Overview of Workshop
- Overview of CATALST Course
- TinkerPlots Introduction
- UNIT 1: Chance Models and Simulation
- UNIT 2: Models for Comparing Groups – Randomization Methods
- UNIT 3: Estimating Models Using Data – Bootstrap Methods
- Assessment Results
3University of Minnesota Project Team
FACULTY
- Joan Garfield
- Andy Zieffler
- Bob delMas
GRADUATE STUDENTS
- Rebekah Isaak
- Laura Le
- Laura Ziegler
4Our Collaborators
- Allan Rossman, Cal Poly State U
- Beth Chance, Cal Poly State U
- John Holcomb, Cleveland State U
- George Cobb, Mt. Holyoke Coll.
- Herle McGowan, NCSU
- Instructors at different institutions who have implemented CATALST
5CATALST Course: Teaching Students to Really Cook
- Metaphor from Alan Schoenfeld (1998)
- Many intro stats classes teach how to follow “recipes” but not how to really “cook.”
  - Students are able to perform routine procedures and tests.
  - They don't have the big picture that allows them to solve unfamiliar problems and to articulate and apply their understanding.
- Someone who knows how to cook knows the essential things to look for and focus on, and how to make adjustments on the fly.
6CATALST: The Inference Model
The core logic of inference as the foundation (Cobb, 2007)
- Model: Specify a model to reasonably approximate the variation in outcomes attributable to the random process
- Randomize & Repeat: Use the model to generate simulated data and collect a summary measure
- Evaluate: Examine the distribution of the resulting summary measures
7Radical Content
- No t-tests; use of probability for simulation and modeling (TinkerPlots)
- Coherent curriculum that builds ideas of models, chance, simulated data, and inference from the first day
- Immersion in statistical thinking
- Activities based on real problems, real data
- Course materials contain all classroom activities and homework assignments
- Lesson plans posted at CATALST website
8Radical Pedagogy
- Student-centered approach based on research in cognition & learning, instructional design principles
- Minimal lectures, just-in-time as needed
- Cooperative groups to solve problems
- “Invention to learn” and “test and conjecture” activities [develop reasoning & promote transfer]
- Writing & whole class discussion (wrap-up)
9Why TinkerPlots?
- Fathom® is a viable option for building models and simulation, but also challenging to students (Maxara & Biehler, 2006, 2007; Biehler & Prommel, 2010)
- TinkerPlots™ was chosen because of unique visual (graphical interface) capabilities
  - Allows students to see the devices they select (e.g., sampler, spinner)
  - Easily use these models to simulate and collect data
  - Allows students to visually examine and evaluate distributions of statistics
10Three CATALST Units (14 Week Semester)
Chance Models and Simulation
- Learning to use the core logic of inference (George Cobb)
  - Specify a chance model (sampling or random assignment)
  - Generate a trial, collect a measure, repeat many times
  - Evaluate fit of chance model to the observed data
Models for Comparing Groups
- Randomization tests
  - Random assignment studies
  - Observed data studies
- Design: random assignment and random sampling
- Drawing valid conclusions using logic of inference
Estimating Models Using Data
- Bootstrap method
- Standard error of a sample statistic
- Confidence intervals
11Workshop Overview Summary
- You will work through shortened versions of three activities from the CATALST course.
- You will do one activity from:
  - Unit 1 - Chance Models and Simulation
  - Unit 2 - Models for Comparing Groups
  - Unit 3 - Estimating Models Using Data
- Preliminary Assessment Results
12Agenda for Session 1
- An introduction to TinkerPlots
- One activity from Unit 1 on Chance Models and Simulation
  - A modified version of the Matching Dogs to Owners activity
13An introduction to TinkerPlots: Modeling Dice NICOLA
14Matching Dogs to Owners: Learning Goals
- Develop the reasoning of statistical significance
- Understand how an observed result can be judged unlikely under a particular model
- Begin the process of statistical thinking
- This is the first informal introduction to p-values and statistical inference (what does it mean to have evidence against the model?)
- Our aim is to develop the students’ statistical thinking: getting them to think about models (in particular, the null model), to consider variation that occurs due to chance, to think about what statistic is of interest and how to display that statistic (dot plots), and to communicate the results effectively and in the context of the problem
15Matching Dogs to Owners: Student Preparation
At this point students have…
- Modeled random behavior in TinkerPlots
  - Coins, dice
  - Colors of M&M candies
  - Effects of “One Son” or “One of Each” child policies (using a stopping criterion)
…and introductory readings on hypothesis testing have been assigned
17Matching Dogs to Owners: Research Question Are humans able to match dogs to owners better than blind luck?
18Matching Dogs to Owners: Simulation
For each numbered person in the following photos, write down the letter for the dog that you think is owned by that person.
21Matching Dogs to Owners: Research Question
Are humans able to match dogs to owners better than blind luck?
22Matching Dogs to Owners: Directions
- Let’s do the modified activity
- Again, put on your “student hat”!
- Afterward, you will get to put your “teacher hat” back on
23Matching Dogs to Owners: Wrap-up What does the “blind guessing” model mean?
24Matching Dogs to Owners: Wrap-up
Why do we use the “blind guessing” model to simulate our data?
25Matching Dogs to Owners: Wrap-up What does it mean to have evidence that supports or does not support the “blind guessing” model?
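The wrap-up questions above can be made concrete with a small simulation outside TinkerPlots. The following is only a sketch of the “blind guessing” model, not the workshop's TinkerPlots sampler: the number of dog/owner pairs (6) and the observed number of correct matches (4) are hypothetical values chosen for illustration, since the transcript does not state them.

```python
import random

def count_matches(n_pairs, rng):
    """One trial of blind guessing: shuffle the dogs and count correct matches."""
    dogs = list(range(n_pairs))
    rng.shuffle(dogs)  # a random assignment of dogs to owners
    return sum(1 for owner, dog in enumerate(dogs) if owner == dog)

def simulate_blind_guessing(n_pairs, n_trials, seed=0):
    """Repeat the trial many times and collect the number of matches each time."""
    rng = random.Random(seed)
    return [count_matches(n_pairs, rng) for _ in range(n_trials)]

def proportion_at_least(results, observed):
    """Share of simulated trials with at least as many matches as observed."""
    return sum(1 for r in results if r >= observed) / len(results)

# Hypothetical setup: 6 dog/owner pairs, 4 correct matches observed.
results = simulate_blind_guessing(n_pairs=6, n_trials=1000)
print("Proportion of trials with 4 or more matches:",
      proportion_at_least(results, observed=4))
```

A small proportion means the observed result would be unusual if people were only guessing, which is evidence against the “blind guessing” model; a large proportion means the result is compatible with it.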
26Matching Dogs to Owners: Wrap-up Tinkerplots steps
27Matching Dogs to Owners: Reflection
- What about this activity (content, format…) do you think might help maximize student learning in your classroom?
- What are your hesitations / what do you think might hinder student learning in your classroom?
- What questions do you still have about the implementation of such a course?
- Presuming that you wanted to implement these activities in your courses, how comfortable would you feel doing so? Why or why not?
28Matching Dogs to Owners: Reflection What about this activity (content, format…) do you think might help maximize student learning in your classroom?
29Matching Dogs to Owners: Reflection What are your hesitations about this activity / what do you think might hinder student learning if you were to use it in your classroom?
30Matching Dogs to Owners: Reflection What questions do you still have about the implementation of this type of activity?
31Matching Dogs to Owners: Reflection Presuming that you wanted to implement these activities in your courses, how comfortable would you feel doing so? Why or why not?
32Matching Dogs to Owners: Building the Model & Simulation -show TP demo…
33Matching Dogs to Owners: Building the Model & Simulation -show TP demo… (this is just for backup)
34Matching Dogs to Owners: Building the Model & Simulation -show TP demo…
35Matching Dogs to Owners: Building the Model & Simulation -show TP demo…
36Matching Dogs to Owners: Building the Model & Simulation -show TP demo…
37Agenda for Session 2
Two activities from Unit 2 on Comparing Groups
- A modified version of the Sleep Deprivation Activity
- A modified version of the Contagious Yawns Study Homework
38Sleep Deprivation: Learning Goals Develop the need for a summary measure to compare two groups with quantitative data (e.g., different summary measures provide different information regarding characteristics of the data; some more relevant than others)
39Sleep Deprivation: Learning Goals
- Learn how to determine if a single observed difference is real and important or just due to chance (the need to consider the observed result in the distribution of results that are possible under the null model)
- Find the approximate p-value from simulated data and draw a conclusion
40Sleep Deprivation: Prior Knowledge
- Informal idea of p-value, but the term is introduced in this activity
- Basic idea of comparing groups
- Randomization test (by hand)
Type of study
- Experiment with random assignment
- Quantitative data
- Comparing groups based on difference in means using the randomization test
41Sleep Deprivation: Student Preparation
Students read the abstract from Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3(12),
The reading is a scientific article abstract that introduces the context of and provides motivation for the Sleep Deprivation activity.
42Sleep Deprivation: Preliminary Discussion
- Begin with a discussion on how to measure whether or not the amount of sleep affects test performance.
- Have students come up with different methods.
- Describe how that relates to the activity.
43Sleep Deprivation: Research Question
Does the effect of sleep deprivation last, or can a person “make up” for sleep deprivation by getting a full night’s sleep in subsequent nights?
Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3(12),
44Sleep Deprivation: Study Design
[Diagram: 21 human subjects split into two groups: Sleep Deprived (the remaining 11) and Unrestricted Sleep (10)]
45Sleep Deprivation: Directions
- Let’s do the modified activity
- Again, put on your “student hat”!
- Afterward, you will get to put your “teacher hat” back on
46Sleep Deprivation: Sample Wrap-Up Questions
- What was our null model?
- Why do we need to conduct a test? Why can't we just look at the observed difference?
- What is the purpose of random assignment?
- Where was the plot centered? Why does that make sense?
- What is a p-value?
- What conclusion did you come to for the sleep study?
47Sleep Deprivation: Building the Model & Simulation
48Sleep Deprivation: Building the Model & Simulation
49Sleep Deprivation: Building the Model & Simulation
50Sleep Deprivation: Building the Model & Simulation
51Sleep Deprivation: Building the Model & Simulation
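For readers without TinkerPlots, the randomization test built in these slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workshop's model: the improvement scores below are invented placeholder values (the transcript does not include the study data), and only the group sizes, 11 deprived and 10 unrestricted, follow the study design slide.

```python
import random
from statistics import mean

# HYPOTHETICAL improvement scores, invented for illustration only.
unrestricted = [22, 15, -5, 13, 30, 41, 12, 19, 11, 28]   # 10 subjects
deprived = [-9, 5, 2, 20, -13, -8, 10, 3, 21, 7, 9]       # 11 subjects

def randomization_test(group_a, group_b, n_trials=5000, seed=1):
    """Re-shuffle the pooled scores into two groups of the original sizes
    and record the difference in means for each shuffle (the null model)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_trials):
        rng.shuffle(pooled)  # mimic a new random assignment
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    # Approximate one-sided p-value: shuffles at least as extreme as observed
    p_value = sum(1 for d in diffs if d >= observed) / n_trials
    return observed, diffs, p_value

observed, diffs, p_value = randomization_test(unrestricted, deprived)
print(f"Observed difference in means: {observed:.2f}")
print(f"Mean of simulated differences: {mean(diffs):.2f}")
print(f"Approximate p-value: {p_value:.4f}")
```

The simulated differences should center near 0, because under the null model the group labels carry no information; the p-value is the share of shuffles that produce a difference at least as large as the one observed.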
52Contagious Yawns Study: Homework Assignment Research Question:Are yawns contagious?
53Contagious Yawns Study: Study Design
[Diagram: 50 human subjects split into two groups: Yawn Seed Planted (34) and No Yawn Seed Planted (16)]
MythBusters [Photograph]. Retrieved April 11, 2013, from:
54Contagious Yawns Study: Data
                         Subject Yawned   Subject Did Not Yawn   Total
Yawn Seed Planted        10               24                     34
Yawn Seed Not Planted    4                12                     16
Total                    14               36                     50
55Contagious Yawns Study: Questions
- Describe how you would create a model to answer the research question.
- Describe the simulation process you would use to answer the research question.
56Contagious Yawns Study: Building the Model & Simulation
57Contagious Yawns Study: Homework Questions
- Quantify the strength of evidence/p-value for the observed result. Note that the difference in the proportion that yawned was
- In light of your previous answer, would you say that the results that the researchers obtained provide strong evidence that yawning is contagious? Explain your reasoning based on your simulation results.
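One possible way to carry out the simulation asked for in this homework, sketched in Python rather than TinkerPlots. The counts come from the data table in the Contagious Yawns slides (14 of 50 subjects yawned; 34 had the yawn seed planted and 16 did not). The function name, number of trials, and seed are illustrative choices, and the observed difference is computed here from the table rather than taken from the slide.

```python
import random

def simulate_yawn_differences(n_trials=5000, seed=2):
    """Null model: yawning is unrelated to the yawn seed, so the 14 yawners
    are re-assigned at random to groups of size 34 and 16."""
    rng = random.Random(seed)
    outcomes = [1] * 14 + [0] * 36   # 1 = yawned, 0 = did not yawn
    diffs = []
    for _ in range(n_trials):
        rng.shuffle(outcomes)
        planted = outcomes[:34]       # simulated "seed planted" group
        not_planted = outcomes[34:]   # simulated "seed not planted" group
        diffs.append(sum(planted) / 34 - sum(not_planted) / 16)
    return diffs

# Observed difference in proportions, computed from the table: 10/34 - 4/16
observed = 10 / 34 - 4 / 16
diffs = simulate_yawn_differences()
p_value = sum(1 for d in diffs if d >= observed - 1e-12) / len(diffs)
print(f"Observed difference: {observed:.3f}")
print(f"Approximate p-value: {p_value:.3f}")
```

Under these assumptions roughly half of the shuffles give a difference at least as large as the observed one, so the data offer little evidence against the "no effect" model.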
59Contagious Yawns Study: Reflection
- What was the purpose of this homework assignment?
- How did it differ from the Sleep Deprivation Activity?
60Agenda for Session 3
- Overview of Unit 3 on Estimating Models Using Data
- A demonstration of the Kissing the Right Way Activity
In this session, I’m going to start with an overview of Unit 3 followed by a demonstration of an activity that introduces the idea of confidence intervals, named Kissing the Right Way.
61Unit 3 Overview: Estimating Models Using Data
- Sampling
  - Bias, precision, and size of population
- Comparing Hand Spans
  - Formalizes summary measure of variation (standard deviation)
- Kissing the Right Way
  - Intro to confidence intervals
- Memorizing Letters Part II
  - Confidence interval for effect size
- Why 2 Standard Errors?
In the final unit of the CATALST course, the focus is on estimating models using sample data.
This unit begins with an activity that explores how different sampling methods may affect the parameter estimates that are made. Students explore the ideas of bias, precision, and size of the population.
The second activity in this unit is one that formalizes a summary measure of variation (the standard deviation).
In the third activity, which will be demonstrated shortly, students are introduced to the idea of confidence intervals through bootstrapping.
Students also learn about effect size. This is introduced via the two-group comparison and creating a confidence interval for the effect size.
The last activity in Unit 3 tries to answer many students’ questions about “why 2 SEs.” It covers why two standard errors are used to calculate interval estimates as well as what it means to be 95% confident in our interval estimates of the population parameter.
62Kissing the Right Way
As I mentioned on the previous slide, Kissing the Right Way is an activity that introduces the concept of confidence intervals.
In your handout of materials, you are given the lesson plan for this activity, on pgs. As you can see on pg. 20, prior to class, students are asked to read the article that this activity’s context came from, as well as some supplemental readings about standard deviation, standard error, and margin of error.
To summarize the article: A German bio-psychologist, Onur Güntürkün, was curious whether the human tendency for right-sidedness (e.g., right-handed, right-footed, right-eyed) manifested itself in other situations as well. In trying to understand why human brains function asymmetrically, with each side controlling different abilities, he investigated whether kissing couples were more likely to lean their heads to the right than to the left.
Interesting, right?
63Kissing the Right Way: Learning Goals
The learning goals for this activity are…
- Understand what standard error is
- Understand the difference between standard deviation and standard error
- Understand what margin of error is
- Understand what an interval estimate is
- Understand the purpose of an interval estimate
64Kissing Study Activity
- What percentage of couples lean their heads to the right when kissing?
- How can we find out?
- Collect data!
The research question for this study was “what percentage of couples lean their heads to the right when kissing?” So how can we answer this question… we collect data!!
- Talk about preceding activities… what leads them up to this point… (finding the average, finding a single probability, etc.).
- Students are guided through the process of obtaining a bootstrap interval for categorical data
65Kissing Study Activity
[Diagram: 124 couples split into 80 couples who lean right and 44 couples who lean left]
The researchers went out to public places like airports, train stations, beaches, and parks in the United States, Germany, and Turkey and observed 124 couples (estimated ages 13 to 70 years, not holding any other objects like luggage that might influence their behavior). What they found was that 80 leaned their heads to the right when kissing and 44 leaned their heads to the left.
66Kissing Activity
- Sixty-five percent of the couples observed leaned to the right. What percentage of all couples lean to the right?
- How much variation is there in the estimate from sample-to-sample?
- Use sample as a substitute for the population
80 out of 124 turns out to be 65%.
After computing this point estimate, students are then presented with questions that ask about sampling variability. For example: Consider another study carried out using the same methodology, but using a different sample of 124 couples. Would you necessarily obtain the same answer to the research question (i.e., would the percentage of couples who lean their heads to the right when kissing be the same)? Explain why or why not.
After students grapple with this idea of sampling variability and the need for a good estimate for the population parameter, the next logical step is to say, well, how can I get a numerical value for how much variation there is in the estimate from sample-to-sample?
We do this by using our sample as a substitute for the population and simulating a lot of samples to come up with an estimate for sampling variability.
67Kissing Study: Building the Model & Simulation So, students build a model using the sample result as a “stand-in” for the population. Students will use a spinner, put 64.5% as kissing to the right, and set repeat to 124.
68Kissing Study: Building the Model & Simulation Then, they click run and obtain a single trial from our “stand-in” population.
69Kissing Study: Building the Model & Simulation They plot this to see the results from our single trial better and…
70Kissing Study: Building the Model & Simulation Collect the outcome of interest…the percent of simulated couples that turned their head to the right. The students then collect many many trials (they are good at doing this by now), and
71Kissing Study: Building the Model & Simulation Come up with a distribution of multiple trials. From there, they compute the standard error of this distribution, multiply this standard error by 2, and add and subtract the result from the point estimate to get the upper and lower bounds of the 95% CI.
By this point in the class, students are really good at working through the steps in TP and creating the distribution of results.
A couple of points of confusion for students are:
- the calculation of the standard error.
- using the point estimate, and not the midpoint of the distribution, as the midpoint of the CI.
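The bootstrap steps described in the last few slides can also be sketched in Python. This is a rough stand-in for the TinkerPlots spinner, not the course's own implementation: the counts (80 of 124 couples leaning right) come from the slides, while the function name, number of trials, and seed are illustrative assumptions.

```python
import random
from statistics import stdev

def bootstrap_percentages(n_right=80, n_total=124, n_trials=5000, seed=3):
    """Use the sample as a stand-in for the population: each simulated couple
    leans right with probability 80/124 (like the spinner set to 64.5%)."""
    rng = random.Random(seed)
    p_hat = n_right / n_total
    results = []
    for _trial in range(n_trials):
        rights = sum(1 for _couple in range(n_total) if rng.random() < p_hat)
        results.append(100 * rights / n_total)  # percent leaning right in this trial
    return results

point_estimate = 100 * 80 / 124          # about 64.5%
boot = bootstrap_percentages()
se = stdev(boot)                          # standard error = SD of the simulated percentages
lower = point_estimate - 2 * se           # interval is centered on the point estimate,
upper = point_estimate + 2 * se           # not on the midpoint of the simulated distribution
print(f"Point estimate: {point_estimate:.1f}%")
print(f"Standard error: {se:.2f} percentage points")
print(f"Approximate 95% interval: {lower:.1f}% to {upper:.1f}%")
```

Note that the interval is built around the point estimate from the original sample, which addresses the second point of confusion listed above.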
72Kissing the Right Way: Sample Wrap-Up Questions
Some sample wrap-up questions for this activity are…
- What is standard error?
- What is margin of error?
- When would we need to use margin of error?
- How would you interpret the margin of error?
- What is an interval estimate?
- What is the purpose of an interval estimate?
- How would you interpret your interval estimate?
- What do you think the purpose was of this activity?
73Agenda for Session 4
Summary of Assessment Results
- Overview of Assessment Instruments
- Comparison of CATALST and Non-CATALST students
74CATALST: Assessment
GOALS: 27 forced-choice items
- Study design
- Reasoning about variability
- Sampling and sampling variability
- Interpreting confidence intervals and p-values
- Statistical inference
- Modeling and simulation
MOST: Measure of statistical thinking
- 4 real-world contexts
- Open-ended and forced-choice items
AFFECT: attitudes and perceptions
75Performance of the CATALST (n = 289) & non-CATALST (n = 440) groups on the GOALS test
76Bootstrapped Confidence Interval Limits for Each Item (CATALST: n = 289; non-CATALST: n = 440)
77GOALS Item 1 Design & Conclusions
A recent research study randomly assigned participants into groups that were given different levels of Vitamin E to take daily. One group received only a placebo pill. The research study followed the participants for eight years to see which ones developed a particular type of cancer during that time period.
What is the primary purpose of the use of random assignment for making inferences based on this study?
CATALST   NON-CATALST   Response Options
17.0%     38.2%         a. To ensure that a person doesn’t know whether or not they are getting the placebo.
66.7%     29.1%         b. To ensure that the groups are similar in all respects except for the level of Vitamin E.
16.3%     32.7%         c. To ensure that the study participants are representative of the larger population.
78GOALS Item 3 Design & Conclusions
A local television station for a city with a population of 500,000 recently conducted a poll where they invited viewers to call in and voice their support or opposition to a controversial referendum that was to be voted on in an upcoming election. Over 5,000 people responded, with 67% opposed to the referendum. The TV station announced that the referendum would most likely be defeated in the election.
Select the best answer below for why you think the TV station's announcement is valid or invalid.
CATALST   NON-CATALST   Response Options
5.2%      22.3%         a. Valid, because the sample size is large enough to represent the population.
4.8%      17.5%         b. Valid, because 67% is far enough above 50% to predict a majority vote.
7.2%      16.8%         c. Invalid, because the sample is too small given the size of the city.
82.7%     43.4%         d. Invalid, because the sample is not likely to be representative of the population.
79GOALS Items 23, 24 & 25 Understanding p-Values
For questions 23-25, indicate whether the interpretation of the p-value provided is valid or invalid.
- Statement: The p-value is the probability that the $5 incentive group would have the same or lower success rate than the “do your best” rate.
  Correct response: INVALID. CATALST: 82.3%. Non-CATALST: 58.0%.
- Statement: The p-value is the probability that the $5 incentive group would have a higher success rate than the “do your best” group.
  Correct response: INVALID. CATALST: 58.4%. Non-CATALST: 48.9%.
- Statement: The p-value is the probability of obtaining a result as extreme as was actually found, if the $5 incentive is really not helpful.
  Correct response: VALID. CATALST: 70.0%. Non-CATALST: 51.4%.
- Answered all three items correctly. CATALST: 39.5%. Non-CATALST: 19.3%.
80MOST Test: Statistical Thinking Example of an Open-Ended Item: ContextConsider an experiment where a researcher wants to study the effects of two different exam preparation strategies on exam scores. Twenty students volunteered to be in the study, and were randomly assigned to one of two different exam preparation strategies, 10 students per strategy. After the preparation, all students were given the same exam (which is scored from 0 to 100). The researcher calculated the mean exam score for each group of students. The mean exam score for the students assigned to preparation strategy A was 5 points higher than the mean exam score for the students assigned to preparation strategy B.
81MOST Test: Statistical Thinking Open-ended QuestionExplain how the researcher could determine whether the difference in means of 5 points is large enough to claim that one preparation strategy is better than the other. (Be sure to give enough detail that someone else could easily follow your explanation in order to implement your proposed analysis and draw an appropriate inference (conclusion).)
82MOST Test: Sample Scoring Rubric
- Component: Modeling: Hypothesis
  Randomization-Based: Describes the null model of no difference
  Parametric-Procedures: Describes the null and alternative hypotheses
- Component: Context: Hypothesis
  Randomization-Based: Identifies the context in definition of null model
  Parametric-Procedures: Identifies the context in definition of null and alternative hypotheses
- Component: Modeling: Test
  Randomization-Based: Describes the simulation: statistic to collect & repetition
  Parametric-Procedures: States a z-test & describes checking assumptions
- Component: Conclusions
  Both approaches: Description identifies how conclusion can be drawn from the test results
- Component: Context: Conclusion
  Both approaches: Conclusion presented in the context of the problem: refers to the percentage of breakups for Monday or percentage of breakups for each day of the week
83MOST Test: Preliminary Findings
- CATALST students include a higher percentage of procedural components
- For all students, responses are weakest on making conclusions and including context
- CATALST students more likely to describe the use of technology in their responses
- CATALST students tend to describe steps in more detail
- CATALST students more likely to map a problem to a previously solved problem
- Non-CATALST students more likely to present a non-statistical explanation for outcomes
84Affect Survey: Percent Agree or Strongly Agree
Non-CATALST (n = 453)   CATALST (n = 283)   Statement
68.7%                   81.5%               Learning to use (software/TinkerPlots™) was an important part of learning statistics.
66.0%                   91.0%               I would be comfortable using (software/TinkerPlots™) to test for a difference between groups after completing this class.
61.7%                   83.7%               I would be comfortable using (software/TinkerPlots™) to compute an interval estimate for a population parameter after completing this class.
61.6%                   84.5%               Learning to (use software/create models in TinkerPlots™) helped me learn to think statistically.
85CATALST: Interview Study
- Examined statistical reasoning of 5 tertiary students after the Chance Models and Simulation unit
- Each of two novel problems asked students to reason about the likelihood of a surprising result
- After five weeks, students demonstrated consideration of sampling variability when drawing inferences
- Students’ solutions typically:
  - Considered the value most likely to occur under the chance model (expected value)
  - Demonstrated an awareness of sampling variability
  - Quantified the degree of “unusualness”
86CATALST publications
- Garfield, J., delMas, R. & Zieffler, A. (2012). Developing statistical modelers and thinkers in an introductory, tertiary-level statistics course. ZDM: The International Journal on Mathematics Education.
- Ziegler, L. & Garfield, J. (2013). Exploring students' intuitive ideas of randomness using an iPod shuffle activity. Teaching Statistics, 35(1), 2-7.
- Isaak, R., Garfield, J. & Zieffler, A. (in press). The Course as Textbook. Technology Innovations in Statistics Education.
- Garfield, J., Zieffler, A., delMas, R. & Ziegler, L. (under review). A New Role for Probability in the Introductory College Statistics Course. Journal of Statistics Education.
- delMas, R., Zieffler, A. & Garfield, J. (under review). Tertiary Students' Reasoning about Samples and Sampling Variation in the Context of a Modeling and Simulation Approach to Inference. Educational Studies in Mathematics.
87Thank You for your Participation
Catalysts for Change (2012). Statistical thinking: A simulation approach to modeling uncertainty. Minneapolis, MN: CATALST Press. (Purchase at amazon.com)
If you have any questions about the CATALST course, please contact any one of the PIs: Joan Garfield: Robert delMas: Andy Zieffler:
We appreciate your feedback – please fill out the workshop evaluation