# Stat 155, Section 2, Last Time Producing Data How to Sample? –History of Presidential Election Polls Random Sampling Designed Experiments –Treatments &


Stat 155, Section 2, Last Time Producing Data How to Sample? –History of Presidential Election Polls Random Sampling Designed Experiments –Treatments & Levels –Controls –Randomization

Reading In Textbook Approximate Reading for Today’s Material: Pages 231-240, 256-257 Approximate Reading for Next Class: Pages 259-271, 277-286

Chapter 3: Producing Data (how this is done is critical to conclusions) Section 3.1: Statistical Settings 2 Main Types: I. Observational Study II. Designed Experiment

Random Sampling Key Idea: “random error” is smaller than “unintentional bias”, for large enough sample sizes How large? Current sample sizes: ~1,000 - 3,000 Note: now << 50,000 used in 1948. So surveys are much cheaper (thus many more done now….)

Random Sampling How Accurate? Can (& will) calculate using “probability” Justifies term “scientific sampling” 2nd improvement over quota sampling
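A minimal sketch of the kind of accuracy calculation that probability makes possible. It assumes a simple random sample and uses the standard formula for the standard deviation of a sample proportion, sqrt(p(1 - p)/n), which the lecture has not yet introduced. The function name and the choice p = 0.5 (the worst case) are illustrative assumptions.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard deviation of a sample proportion for a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.5 gives the largest possible spread; it is an assumption for illustration.
for n in (1_000, 3_000, 50_000):
    se = standard_error(0.5, n)
    print(f"n = {n:>6}: typical random error about {100 * se:.2f} percentage points")
```

Under these assumptions the random error at n = 1,000 is already under 2 percentage points, which suggests why much smaller random samples can replace the very large samples used earlier.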

Controlled Experiments Common Type: compare “treatment” with “placebo”, a “sham treatment” that controls for psychological effects (think you are better, just because you are treated, so you are better…) Called a “blind” experiment

Controlled Experiments Further Refinement: “Double Blind” experiment means neither patient nor doctor knows whether the treatment is real or not Eliminates possible doctor bias

Design of Experiments 2. Randomization Useful method for choosing groups above (e.g. Treatment and Control) Recall: Different from “just choose some”, instead means “make each equally likely”

Design of Experiments 2. Randomization Big Plus: Eliminates biases, i.e. effects of “lurking variables” (same as random choice of samples, again pay price of added variability, but well worth it)
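Randomization of this kind can be sketched in a few lines. The example below is only an illustration: the patient labels, the function name, and the seed are hypothetical, and it assumes an even split into two groups.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into (treatment, control).

    Every subject is equally likely to land in either group,
    which is what distinguishes this from "just choose some".
    """
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

patients = [f"patient_{i}" for i in range(1, 11)]   # hypothetical labels
treatment, control = randomize(patients, seed=155)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because the shuffle ignores everything about the patients, no lurking variable can systematically favor one group, although any single split may still be unbalanced by chance.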

Example of an Experiment (to tie above ideas together) Gastric Freezing: Treatment for stomach ulcers –Anesthetize patient –Put balloon in stomach –Fill with freezing coolant

Gastric Freezing Initial Experiment, 1958 24 patients, all cured Became popular, and better than surgery But there were some skeptics…. Was it a Placebo Effect??? I.e. was fact of “some type of treatment” enough for “cure”

Gastric Freezing Approach, 1963: (i) Controlled Experiment (some treated others not, shows who gets better with no treatment) (ii) Randomize: Eliminates other sources of bias, i.e. lurking variables (randomly choose: treated or not)

Gastric Freezing (iii) “Blind”: Patient doesn’t know if treated (Got a balloon in stomach or not? Both groups got that, but only Treatment group got freezing coolant) (iv) “Double Blind”: Doctor doesn’t know if treated or not. (somebody else controls freezing coolant) Important: since doctor decides if “cured”

Gastric Freezing Results: Treatment Group: 82, Control Group: 78

| | Treatment | Control |
|---|---|---|
| Initially: No Symptoms | 29% | 29% |
| Initially: Improved | 47% | 39% |
| After 24 Months: Relapse | 45% | 39% |

Gastric Freezing Results: No strong effect of treatment over control is apparent. All placebo effect? Analysis: Will build tools to show: “Difference within natural variation, assuming there is no difference”
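As a rough preview of such tools, the sketch below compares the “Improved” percentages (47% vs. 39%) with the size of their natural variation. It uses the standard formula for the standard error of a difference of two proportions, which this lecture has not yet derived, and it assumes that 82 and 78 are the group sizes. All names are illustrative.

```python
import math

# Assumed inputs: group sizes and rounded "Improved" proportions from the results slide.
n_treat, n_ctrl = 82, 78
p_treat, p_ctrl = 0.47, 0.39

def se_of_difference(p1, n1, p2, n2):
    """Standard error of (p1 - p2) for two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p_treat - p_ctrl
se_diff = se_of_difference(p_treat, n_treat, p_ctrl, n_ctrl)
ratio = diff / se_diff   # observed difference measured in standard errors

print(f"Observed difference: {diff:.2f}")
print(f"Standard error:      {se_diff:.3f}")
print(f"Difference is about {ratio:.1f} standard errors")
```

Under these assumptions the 8 point difference is roughly one standard error, which is the sort of gap that chance alone produces routinely.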

Gastric Freezing Historical Notes: Famous case for eliminating “ineffective treatments” Showed importance of double blind controlled experiments, which are commonly used today Stomach ulcers currently very effectively treated with drugs

And Now for Something Completely Different Pepsi Challenge: Try a blind taste test of Pepsi vs. Coca-Cola Successful ad campaign Most thought Pepsi tasted better Many were surprised by blind test result

Class Experiment Pepsi Challenge Do a careful “double blind approach” –Taste test Pepsi vs. Coke –Where “taster” doesn’t know which is which (“blind” part of experiment) –And same for “giver” (“double blind” part)

Pepsi Challenge Approach: Groups of 3 Each does each job once: Pourer (puts own name and others’ names on slip, pours cups outside room, and stays out) Giver (also puts names on slip, goes to get cups after pouring) Taster (always inside room) Giver: records results on their slip

Pepsi Challenge Ideas: Create “double blind”, i.e. “no knowledge of doctor”, by pourer filling cups outside room, so that giver does not see Avoid “color association” by randomizing Pourer does not watch tasting (no telegraphing with big grin….) After tasting: compare notes, check forms Will report, and analyze results later

Sec. 3.4: Basics of “Inference” Idea: Build foundation for statistical inference, i.e. quantitative analysis (of uncertainty and variability) Fundamental Concepts: Population described by parameters e.g. mean, SD. Unknown, but can get information from…

Fundamental Concepts Last page: Population, described by parameters. Here: Sample (usually random), described by corresponding “statistics” e.g. mean, SD. (Will become important to keep these apart)

Population vs. Sample E.g. 1: Political Polls Population is “all voters” Parameter of interest is: $p$ = % in population for Candidate A (bigger than 50% or not?) Sample is “voters asked by pollsters” Statistic is $\hat{p}$ = % in sample for A (careful to keep these straight!)

Population vs. Sample E.g. 1: Political Polls Notes –$\hat{p}$ is an “estimate” of $p$ –Variability is critical –Will construct models of variability –Possible when sample is random –Recall random sampling also reduces bias

Population vs. Sample E.g. 2: Measurement Error (seemingly quite different…) Population is “all possible measurements” (a thought experiment only) Parameters of interest are: $\mu$ = population mean, $\sigma$ = population SD

Population vs. Sample E.g. 2: Measurement Error Sample is “measurements actually made” Statistics are: $\bar{x}$ = mean of measurements, $s$ = SD of measurements

Population vs. Sample E.g. 2: Measurement Error Notes: –$\bar{x}$ estimates $\mu$ –Again will model variability –“Randomness” is just a model for measurement error

Population vs. Sample HW: 3.63 3.65

Basic Mathematical Model Sampling Distribution Idea: Model for “possible values” of statistic E.g. 1: Distribution of $\hat{p}$ in “repeated samplings” (thought experiment only) E.g. 2: Distribution of $\bar{x}$ in “repeated samplings” (again thought experiment)

Basic Mathematical Model Sampling Distribution Tools Can study these with: Histograms → “shape”: often Normal Mean → Gives measure of “bias” SD → Gives measure of “variation”
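The “repeated samplings” thought experiment can be imitated by simulation. The sketch below is an illustration only: the true proportion 0.4, the sample size 100, the 2,000 repetitions, and the seed are all hypothetical choices, and each sample is drawn as independent yes/no responses.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Draw one random sample of size n and return the sample proportion."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(2007)   # fixed seed so the run is repeatable
true_p = 0.4                # hypothetical population parameter
n = 100                     # hypothetical sample size

# Repeat the sampling many times to approximate the sampling distribution.
p_hats = [sample_proportion(true_p, n, rng) for _ in range(2000)]

center = statistics.mean(p_hats)    # compare with true_p to measure bias
spread = statistics.stdev(p_hats)   # measures variation
bias = center - true_p

print(f"Mean of sample proportions: {center:.3f} (bias {bias:+.3f})")
print(f"SD of sample proportions:   {spread:.3f}")
```

A histogram of `p_hats` would show the shape, while the mean and SD computed here give the bias and variation summaries named on the slide.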

Bias and Variation Graphical Illustration Scanned from text: Fig. 3.12

Bias and Variation Class Example: Results from previous class on “Estimate % of males at UNC” http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg17.xls Recall several approaches to estimation (3 bad, one sensible)

E.g. % Males at UNC At top: –Counts –Corresponding proportions (on [0,1] scale) –Bin Grid (for histograms on [0,1] numbers) Next Part: –Summarize mean of each –Summarize SD (spread) of each Histograms (appear next)

E.g. % Males at UNC Recall 4 ways to collect data: Q1:Sample from class Q2:Stand at door and tally –Q1 “less spread and to left”? Q3:Make up names in head –Q3 “more to right”? Q4:Random Sample –Supposed to be best, can we see it?

E.g. % Males at UNC Better comparison: Q4 vs. each other one Use “interleaved histograms” Q1 & Q4: –Q1 has smaller center: –i.e. “biased”, since Class ≠ Population –And less spread: –since “drawn from smaller pool”

E.g. % Males at UNC Q2 & Q4: –Centers have Q2 bigger: –Reflects bias in door choice –And Q2 is “more spread” : –Reflects “spread in doors chosen” + “sampling spread”

E.g. % Males at UNC Q3 & Q4: –Center for Q3 is bigger: –Reflects “more people think of males”? –And Q3 is “more spread” : –Reflects “more variation in human choice”

E.g. % Males at UNC A look under the hood: Highlight an interleaved Chart Click Chart Wizard Note Bar (and interleaved subtype) Different colors are in “series” Computed earlier on left Using Tools → Data Analysis → Histogram

E.g. % Males at UNC Interesting question: What is “natural variation”? Will model this soon. This is “binomial” part of this example, which we will study later.

Bias and Variation HW: 3.66 (Hi bias – hi var, lo bias – lo var, lo bias – hi var, hi bias – lo var) 3.69

And Now for Something Completely Different Cool movie suggested by Sander Buitelaar http://www.youtube.com/watch?v=G5QlDkgmtw8 Street scene in Amsterdam Photography conveys situation “Plein” = Plaza “football dribble” = …

Chapter 4: Probability Goal: quantify (get numerical) uncertainty Key to answering questions above (e.g. what is “natural variation” in a random sample?) (e.g. which effects are “significant”) Idea: Represent “how likely” something is by a number

Simple Probability E.g. (will use for a while, since simplicity gives easy insights) Roll a die (6 sided cube, faces 1,2,…,6) 1 of 6 faces is a “4” So say “chances of a 4” are: “1 out of 6”. What does that number mean? How do we find such for harder problems?

Simple Probability A way to make this precise: “Frequentist Approach” In many replications (repeats of the die roll), expect about 1/6 of the total will be 4s Terminology (attach buzzwords to ideas): Think about “outcomes” from an “experiment” e.g. #s on die e.g. roll die, observe #
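The frequentist idea can be checked by simulation: roll a fair die many times and see what fraction of the rolls are 4s. This is a minimal sketch, with the function name, roll counts, and seed chosen only for illustration.

```python
import random

def fraction_of_fours(num_rolls, rng):
    """Roll a fair six-sided die num_rolls times; return the fraction of 4s."""
    fours = sum(rng.randint(1, 6) == 4 for _ in range(num_rolls))
    return fours / num_rolls

rng = random.Random(4)   # fixed seed so the run is repeatable
for num_rolls in (60, 6_000, 60_000):
    frac = fraction_of_fours(num_rolls, rng)
    print(f"{num_rolls:>6} rolls: fraction of 4s = {frac:.4f} (1/6 is about 0.1667)")
```

With few rolls the fraction can sit noticeably away from 1/6; with many rolls it settles close to it, which is the sense in which “1 out of 6” describes the long run.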

Simple Probability Quantify “how likely” outcomes are by assigning “probabilities” I.e. a number between 0 and 1, to each outcome, reflecting “how likely”: Intuition: 0 means “can’t happen” ½ means “happens half the time” 1 means “must happen”

Simple Probability HW: C9: Match one of the probabilities: 0, 0.01, 0.3, 0.6, 0.99, 1 with each statement about an event: a. Impossible, can’t occur. b. Certain, will happen on every trial. c. Very unlikely, but will occur once in a long while. d. Event will occur more often than not.
