 David Salsburg AP Statistics Reading Daytona Beach, Florida June 16, 2011.

1  David Salsburg AP Statistics Reading Daytona Beach, Florida June 16, 2011

2  William Harvey, circulation of the blood, 1628  Bishop of Chichester: o Harvey was wrong because he used experimentation and, “It is well known that Nature abhors experimentation and will purposely do things wrong if you attempt to experiment.”

3  The Lanarkshire Milk Study (1929) o Question: Does Pasteurization take the “good” out of the milk?  How do you measure the “good” in milk? o Measure weight gain in children as a surrogate.  Yule: “In our lust for measurement, we frequently measure that which we can, rather than that which we wish to measure, and forget that there is a difference.” o Measures of “intelligence”

4  If children were to be used, which children? o Children in school.  Where? o Easily available in London or Manchester, but too heterogeneous a population, with too much variability in socioeconomic factors. o Lanarkshire County, Scotland, population 300,000, evenly divided into small factory towns and rural communities.  How many children? o The Neyman-Pearson concept of power had not yet been published.

5  20,000 children, 200-400 per school, several grades o 5,000 randomly assigned an extra daily ration of raw milk o 5,000 randomly assigned an extra daily ration of Pasteurized milk o 10,000 randomly assigned to no extra milk—controls  Study ran from Feb-June, 1930, the children weighed at the beginning of the study and at the end.

6 1) Average weight gain for children on raw milk almost exactly the same as average weight gain for children on Pasteurized milk. 2) Average weight gain for children kept as controls (no extra milk) three times the average weight gain of the two other groups.  No loss of “good” in milk (as measured by weight gain) when pasteurized  Best not to give children any extra milk, raw or pasteurized!

7  Royal Commission sent to investigate o William Sealy Gosset (“Student”) chairman  Conclusion: The teachers had been told to “randomly assign” but many of them took pity on the sickly and poor students and assigned them the extra milk.

8  Can you do it with haphazard choice by humans? o Problem of digit preference  Can you let “nature” do it? o Toxicological studies of mice.

9  Last two digits of populations of English towns in the 1921 census. o A table of 7500 two digit numbers arranged in blocks of 25. First block of 25: 03 47 43 73 86 36 96 47 36 61 46 98 63 71 62 33 26 16 80 45 60 11 14 10 95

10  Martin Gardner (Scientific American): “This is the quintessential book of the Twentieth Century. Not only was no book produced like this in previous centuries, no one would have ever conceived of a book like this in previous centuries.”

11 1) You do not start at the beginning. Otherwise, all randomizations would be the same. 2) You do not begin haphazardly (at random?). Books tend to have broken binding so haphazard openings often are at the same page.

12 1. You open the book haphazardly and pick a point to start haphazardly. 2. You pick out three digits, two digits, two more digits, and one digit.  You go to the page indicated by the three digits, the line indicated by the first of the two digits, the column indicated by the second of the two digits. Then you proceed up and to the left (at the top of the page) if the final single digit is odd—or down and to the right if it is even.

13  I open it haphazardly (to page 2) and pick a spot haphazardly, yielding the following sequence 2, 12, 23, 6  I go to page 2, line 12, column 23, and go left and up from there.  This yields the sequence: 67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96,…

14  Suppose I have three treatments, A, B, and C, to be applied to blocks of three A, B, C / A, B, C / A,…  I append the sequence of numbers to this sequence of symbols A-67, B-96, C-57/ A-88,B-30,C-22/ A-23, B-51, C-14/…  I reorder the symbols A, B, C within each block following the order of the random numbers CAB/CBA/CAB/BAC…
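
The within-block reordering on this slide can be sketched in a few lines of Python. This is only an illustration of the ranking step, not code from the talk: the function name `randomize_blocks` is hypothetical, and ties between random numbers (which do not occur in the slide's example) are simply broken by treatment letter.

```python
def randomize_blocks(treatments, random_numbers):
    """Order the treatments within each block by their attached random numbers."""
    k = len(treatments)
    orders = []
    # Walk through the random numbers one complete block (k numbers) at a time.
    for start in range(0, len(random_numbers) - k + 1, k):
        block = random_numbers[start:start + k]
        # Attach each number to its treatment, then sort by the number.
        ranked = sorted(zip(block, treatments))
        orders.append("".join(t for _, t in ranked))
    return orders


# The sequence read from the table on the previous slide.
numbers = [67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96]
print(randomize_blocks("ABC", numbers))  # ['CAB', 'CBA', 'CAB', 'BAC']
```

The printed orders match the CAB/CBA/CAB/BAC sequence given on the slide.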

15  Use a computer algorithm to generate a pseudo-random sequence.  Most popular method, congruence generator: X(i+1) = (A·X(i) + B) mod C, that is, the remainder of A·X(i) + B on division by C. o A, B, C are mutually prime. o The congruence generator cycles after K values, but K is a function of X(1), A, B, and C and can be calculated.
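
A minimal sketch of a congruence generator, assuming the recurrence means "multiply by A, add B, keep the remainder on division by C". The constants below are toy values chosen for illustration, not ones from the talk, and the cycle length is found by brute force rather than by a closed-form calculation.

```python
def lcg(a, b, c, seed):
    """Yield X(i+1) = (a * X(i) + b) mod c indefinitely, starting after the seed."""
    x = seed
    while True:
        x = (a * x + b) % c
        yield x


def cycle_length(a, b, c, seed):
    """Return the number of values in the repeating cycle (brute-force search)."""
    seen = {}  # value -> step at which it first appeared
    x = seed
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + b) % c
        step += 1
    return step - seen[x]


# Toy constants (an assumption for illustration); both triples are pairwise coprime.
print(cycle_length(5, 3, 16, seed=7))  # 16: every residue mod 16 is visited
print(cycle_length(3, 5, 16, seed=7))  # 8: same modulus, shorter cycle
```

The two calls show that the cycle length depends on the particular constants, not just on the modulus.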

16  Can a pseudo-random number generator produce truly “random” numbers?  Fisher: Foolish question. All that is needed is that all possible treatment assignments be equally probable.
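
Fisher's criterion can be checked empirically. The sketch below shuffles three treatments many times with Python's built-in pseudo-random generator and tabulates how often each ordering appears. The function name, seed, and trial count are arbitrary choices for illustration, and a simulation like this can only suggest, not prove, that the assignments are equally probable.

```python
import random
from collections import Counter


def assignment_frequencies(treatments, n_trials, seed=2011):
    """Estimate how often each treatment ordering is produced by shuffling."""
    rng = random.Random(seed)  # seeded pseudo-random generator, reproducible
    counts = Counter()
    for _ in range(n_trials):
        order = list(treatments)
        rng.shuffle(order)
        counts["".join(order)] += 1
    return {order: count / n_trials for order, count in sorted(counts.items())}


for order, freq in assignment_frequencies("ABC", 60_000).items():
    print(order, round(freq, 3))  # each of the 6 orderings should be near 1/6
```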

17  NAACP and jury lists in Texas counties (1960s)  Knut-Vik designs o “Student” (1932) showed that Knut-Vik designs produce biased (downwards) estimates of the residual variance. o Fisher (1935) showed that random assignment produces the least variance of all unbiased designs.

18  Women’s Health Study of aspirin vs. placebo to prevent heart attacks or cardiovascular death in women. (March 2005, New England Journal of Medicine)  Question: Does low-dose aspirin prevent cardiovascular problems for women as it does for men?  All but one of the prior studies had used only men.  Consistent finding: 81 mg of aspirin a day reduces the incidence of non-fatal heart attacks by approx. 30% and the incidence of cardiovascular-related death by approx. 20%.  One study that did use women as well as men enrolled 214 women and reduced the incidence of cardiovascular-related death by 9% (not statistically significant).

19 1) Large number of women (39,876) because incidence of cardiovascular events lower in women than in men. 2) Higher daily dose of aspirin (100 mg) 3) Longer follow-up (10 years vs. 5 in men’s studies) 4) Single predefined end-point: Stroke, MI, or cardiovascular-related death.  Problems with the end-point: 1. Equivocal symptoms when patients arrive in emergency rooms 2. Death certificates unreliable 3. What happens if a patient has multiple events over the 10-year period?  Solution: Set up elaborate check-list to “define” the events of interest. Choose only the first such event in a patient’s record to count.

20  477 women on aspirin had a cardiovascular event  522 women on placebo had a cardiovascular event  p-value of the comparison—0.13
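
As a rough check on these counts, a two-sided test comparing the two proportions can be sketched as follows. The slides do not give the size of each arm, so the code ASSUMES the 39,876 women were split evenly; it also uses a plain pooled z-test, which is not necessarily the analysis the investigators ran. Under those assumptions the p-value comes out near 0.15, the same order as the reported 0.13 but not identical to it.

```python
from math import erfc, sqrt


def two_proportion_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))


n_per_arm = 39_876 // 2  # ASSUMPTION: equal allocation; not stated in the slides
z, p = two_proportion_test(477, n_per_arm, 522, n_per_arm)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```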

21  Neyman’s original definition (1934) o “On the two different aspects of the representative method,” J. Royal Statistical Society, vol. 97, pp. 558-625. 1. The paper establishes the fundamental ideas of survey sampling. It was used by the statisticians in the U.S. Bureau of Labor Statistics to establish the first surveys of unemployment. 2. An appendix establishes the fundamental ideas of confidence intervals. A Confidence interval on a parameter θ is a set of hypotheses about the value of θ that cannot be rejected by the data.

22  Bayesian o The expected coverage of the computed confidence interval is 0.95 regardless of the prior distribution on θ.  Frequentist (derived by Neyman to meet Harold Hotelling’s criticism of the Bayesian definition) o 95% of all confidence intervals computed this way will contain the true value of θ. o Anscombe: “What has the statistician’s long run probability of error to do with whether this patient should be given this treatment?”

23  They computed 95% confidence bounds on the ratio Prob{event|aspirin}/Prob{event|placebo} 95% C.I. = [0.80, 1.03]  Interpretation: Use of low dose aspirin in women might reduce incidence of cardiovascular events by as much as 20% (or increase it by as much as 3%)
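
The interval on the ratio can be approximated from the event counts alone. The sketch below uses the standard large-sample interval on the log of the risk ratio and ASSUMES an even split of the 39,876 women between arms, which the slides do not state. The investigators' own method may differ, so agreement with [0.80, 1.03] is only approximate.

```python
from math import exp, log, sqrt


def risk_ratio_ci(events_a, n_a, events_b, n_b, z_crit=1.96):
    """Return (ratio, lower, upper) using the normal approximation on log(ratio)."""
    ratio = (events_a / n_a) / (events_b / n_b)
    # Large-sample standard error of log(ratio).
    se_log = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = exp(log(ratio) - z_crit * se_log)
    upper = exp(log(ratio) + z_crit * se_log)
    return ratio, lower, upper


n_per_arm = 39_876 // 2  # ASSUMPTION: equal allocation; not stated in the slides
ratio, lower, upper = risk_ratio_ci(477, n_per_arm, 522, n_per_arm)
print(f"ratio = {ratio:.2f}, 95% C.I. = [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the sketch gives a ratio of about 0.91 with bounds of roughly 0.81 and 1.03, close to the interval quoted on the slide.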

24  Modern clinical studies cost more than $10,000 per patient.  100,000 subject study would cost > $1 billion.

25  L.J. Cohen, philosopher at Oxford University o Critic of the use of statistical models in science. o One can never come to a certain conclusion with statistical models alone. o To reach a scientific conclusion, it is necessary to bring in information external to the experimental study. o (Cohen’s solution is to replace hypothesis testing with modal valued logic, a system of symbolic logic that denies the law of the excluded middle.)

26 1. The pharmacological mechanism of low dose aspirin is firmly established and is not gender related in experimental animals. 2. The cost of a false positive is small. Aspirin is cheap. Low doses of aspirin are very safe for most people. 3. The cost of a false negative, if the use of low dose aspirin decreases CV events by 20%, is immense. Conclusion: Women should be given daily low doses of aspirin to prevent cardiovascular events.

27  Side note: All the male studies and this women’s study of low dose aspirin have shown a consistent 8-fold increase in the incidence of hemorrhagic stroke for patients on aspirin—the comparison sometimes reaching statistical significance.
