Descriptive Statistics


1 Descriptive Statistics

2 Bar Chart for SI categories
[Bar chart: Number of Patients by Shock Index Category.] Much easier to extract information from a bar chart than from a table!

3 Box plot and histograms: for continuous variables
To show the distribution (shape, center, range, variation) of continuous variables. Does everybody know what I mean when I say percentiles? What is the median? Anyone?

4 Box Plot: Shock Index
[Box plot of SI, in Shock Index Units (0.0 to 2.0):] outliers above the upper "whisker," which ends at Q3 + 1.5(IQR) = .8 + 1.5(.25) = 1.175; maximum (1.7); 75th percentile (0.8); interquartile range (IQR) = .8 - .55 = .25; median (.66); 25th percentile (0.55); minimum (or Q1 - 1.5 IQR)

5 Histogram of SI
[Histogram of SI: Percent vs. SI (0.0 to 2.0), bins of size 0.1.] Note the "right skew." Discussion: 1. Bin sizes may be altered. 2. How many observations do you think are in each bin? 3. Where do you think the center of the data is (what's your best guess at the average value)? 4. On average, how far do you think a given observation is from the center/mean?

6 100 bins (too much detail)

7 2 bins (too little detail)

8 Box Plot: Shock Index
[Box plot of SI, in Shock Index Units (0.0 to 2.0).] Also shows the "right skew"

9 Box Plot: Age
[Box plot of AGE, in Years (0.0 to 100.0), labeled with maximum, 75th percentile, interquartile range, median, 25th percentile, and minimum.] More symmetric

10 Histogram: Age
[Histogram of AGE (Years): Percent vs. AGE.] Not skewed, but not bell-shaped either…

11 Some histograms from your class (n=24)
Starting with politics…

12

13

14 Feelings about math and writing…

15 Optimism…

16 Diet…

17 Habits…

18 Measures of central tendency
Mean Median Mode

19 Central Tendency Mean – the average; the balancing point
calculation: the sum of values divided by the sample size. Balance the bell curve on a point: the point of balance has equal average mass on each side. In math shorthand: X̄ = (Σ xᵢ)/n

20 Mean: example Some data: Age of participants: 17 19 21 22 23 23 23 38. Mean = (17+19+21+22+23+23+23+38)/8 = 186/8 = 23.25

21 Mean of age in Kline’s data
[Software output: Means Section of AGE, reporting Mean, Median, Geometric Mean, Harmonic Mean, Sum, and Mode, shown over the AGE histogram.]

22 Mean of age in Kline’s data
[Histogram of AGE: Percent vs. AGE.] The mean is the balancing point.

23 Mean of Pulmonary Embolism? (Binary variable?)
No PE: 80.56% (750); PE: 19.44% (181). The mean of a binary (0/1) variable is just the proportion of 1's: 181/931 = .1944

24 Mean The mean is affected by extreme values (outliers)
For example, the data 1 2 3 4 5 have Mean = 3, but changing the 5 to a 10 (1 2 3 4 10) gives Mean = 4. Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

25 Central Tendency Median – the exact middle value Calculation:
If there are an odd number of observations, find the middle value If there are an even number of observations, find the middle two values and average them.

26 Median: example Some data:
Age of participants: 17 19 21 22 23 23 23 38. With an even number of observations (n=8), average the middle two values: Median = (22+23)/2 = 22.5

27 Median of age in Kline’s data
[Software output: Means Section of AGE, reporting Mean, Median, Geometric Mean, Harmonic Mean, Sum, and Mode, shown over the AGE (Years) histogram.]

28 Median of age in Kline’s data
[Histogram of AGE: Percent vs. AGE.] The median splits the distribution into 50% of the mass on each side.

29 Does PE have a median? Yes, if you line up the 0’s and 1’s, the middle number is 0.

30 Median The median is not affected by extreme values (outliers).
For example, the data 1 2 3 4 5 have Median = 3, and 1 2 3 4 10 still have Median = 3. Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

31 Central Tendency Mode – the value that occurs most frequently

32 Mode: example Some data: Age of participants: 17 19 21 22 23 23 23 38
Mode = 23 (occurs 3 times)
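As a quick check of the hand calculations on these slides, here is a minimal Python sketch (an illustration, not part of the original deck) using only the standard library:

    from statistics import mean, median, mode

    ages = [17, 19, 21, 22, 23, 23, 23, 38]  # the example data above
    print(mean(ages))    # 23.25
    print(median(ages))  # 22.5 (average of the middle two values, 22 and 23)
    print(mode(ages))    # 23 (occurs 3 times)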

33 Mode of age in Kline’s data
[Software output: Means Section of AGE; the Mode column gives the most frequently occurring AGE.]

34 Mode of PE? 0 appears more often than 1, so 0 is the mode.

35 Measures of Variation/Dispersion
Range Percentiles/quartiles Interquartile range Standard deviation/Variance

36 Range Difference between the largest and the smallest observations.

37 Range of age: 94 years-15 years = 79 years
[Histogram of AGE (Years): Percent vs. AGE (0.0 to 100.0).]

38 Range of PE? 1-0 = 1

39 Quartiles Q1, Q2, and Q3 split the data into four groups of 25% each. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. Q2 is the same as the median (50% are smaller, 50% are larger). Only 25% of the observations are greater than the third quartile.

40 Interquartile Range Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1

41 Interquartile Range: age
[Box plot of AGE annotated with minimum, Q1, median (Q2), Q3, and maximum; each segment holds 25% of the data.] Interquartile range = 65 - 35 = 30

42 Variance Average (roughly) of squared deviations of values from the mean

43 Why squared deviations?
Adding deviations will yield a sum of 0. Absolute values are tricky! Squares eliminate the negatives. Result: Increasing contribution to the variance as you go farther from the mean.

44 Standard Deviation Most commonly used measure of variation
Shows variation about the mean Has the same units as the original data

45 Calculation Example: Sample Standard Deviation
Age data (n=8): 17 19 21 22 23 23 23 38; Mean = X̄ = 23.25. s = √[Σ(xᵢ - X̄)²/(n-1)] = √(281.5/7) ≈ 6.34
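A quick Python check of this calculation (illustrative only); note that statistics.stdev uses the same n-1 denominator:

    from statistics import stdev, variance

    ages = [17, 19, 21, 22, 23, 23, 23, 38]
    print(variance(ages))  # 40.21... (= 281.5/7)
    print(stdev(ages))     # 6.34...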

46 Std. dev is a measure of the “average” scatter around the mean.
[Histogram of AGE (Years): Percent vs. AGE.] Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here a rough guess for the SD is 79/6 ≈ 13.

47 Std. Deviation age
[Software output: Variation Section of AGE, reporting Variance and Standard Deviation.]

48 Std Dev of Shock Index
[Histogram of SI: Count vs. SI (0.0 to 2.0).] Std. dev is a measure of the "average" scatter around the mean. Estimation method: if the distribution is bell shaped, the range is around 6 SD, so here a rough guess for the SD is 1.4/6 ≈ .23. Discussion: 1. Bin sizes may be altered. 2. How many observations do you think are in each bin? 3. Where do you think the center of the data is? 4. On average, how far is a given observation from the center/mean?

49 Std. Deviation SI
[Software output: Variation Section of SI, reporting Variance, Standard Deviation, Std Error of Mean, Interquartile Range, and Range.]

50 Std. Dev of binary variable, PE
Std. dev is a measure of the "average" scatter around the mean. Here 80.56% of the values are 0 and 19.44% are 1; for a binary variable the variance is approximately p(1-p), so the SD is roughly √(.1944 × .8056) ≈ .40.

51 Std. Deviation PE
[Software output: Variation Section of PE, reporting Variance and Standard Deviation.]

52 Comparing Standard Deviations
Data A: Mean = 15.5, S = 3.338. Data B: Mean = 15.5, S = 0.926. Data C: Mean = 15.5, S = 4.570. Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

53 Bienaymé-Chebyshev Rule
Regardless of how the data are distributed, a certain percentage of values must fall within k standard deviations of the mean. Note the use of σ (sigma) to represent "standard deviation" and µ (mu) to represent "mean." Within k=1 (µ ± 1σ): at least (1 - 1/1²) = 0%. Within k=2 (µ ± 2σ): at least (1 - 1/2²) = 75%. Within k=3 (µ ± 3σ): at least (1 - 1/3²) ≈ 89%.

54 Symbol Clarification S = sample standard deviation (example of a "sample statistic"); σ = standard deviation of the entire population (example of a "population parameter") or from a theoretical probability distribution; X̄ = sample mean; µ = population or theoretical mean

55 **The beauty of the normal curve:
No matter what µ and σ are, the area between µ-σ and µ+σ is about 68%; the area between µ-2σ and µ+2σ is about 95%; and the area between µ-3σ and µ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.

56 68-95-99.7 Rule
68% of the data fall within 1 standard deviation either way of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. This works for all normal curves, no matter how skinny or fat!
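A small simulation (an illustrative sketch, not from the slides; assumes numpy is installed) confirms the rule on normally distributed draws:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0, scale=1, size=100_000)
    for k in (1, 2, 3):
        frac = np.mean(np.abs(x) <= k)  # fraction within k SDs of the mean
        print(k, round(frac, 3))        # ~0.68, ~0.95, ~0.997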

57 Summary of Symbols S² = sample variance; S = sample standard dev.
σ² = population (true or theoretical) variance; σ = population standard dev.; X̄ = sample mean; µ = population mean; IQR = interquartile range (middle 50%)

58 Examples of bad graphics

59 What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.69

60 Notice the X-axis From: Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot Wainer, H. 1997, p.29.

61 Correctly scaled X-axis…

62 Report of the Presidential Commission on the Space Shuttle Challenger Accident, 1986 (vol 1, p. 145)
The graph excludes the observations where no O-rings failed.

63 Smooth curve at least shows the trend toward failure at high and low temperatures…

64 Even better: graph all the data (including non- failures) using a logistic regression model
Tappin, L. (1994). "Analyzing data relating to the Challenger disaster". Mathematics Teacher, 87,

65 What’s wrong with this graph?
from: ER Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983, p.74

66

67 What’s the message here?
Diagraphics II, 1994

68 Diagraphics II, 1994

69 From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.

70 From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.

71 From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.

72 From: Johnson R. Just the Essentials of Statistics. Duxbury Press, 1995.

73 For more examples…

74 “Lying” with statistics
More accurately, misleading with statistics…

75 Example 1: projected statistics
Lifetime risk of melanoma: 1935: 1/1500 1960: 1/600 1985: 1/150 2000: 1/74 2006: 1/60

76 Example 1: projected statistics
How do you think these statistics are calculated? How do we know what the lifetime risk of a person born in 2006 will be?

77 Example 1: projected statistics
Interestingly, a clever clinical researcher recently went back and calculated (using SEER data) the actual lifetime risk (or risk up to 70 years) of melanoma for a person born in 1935. The answer? Closer to 1/150 (one order of magnitude off) (Martin Weinstock of Brown University, AAD conference 2006)

78 Example 2: propagation of statistics
In many papers and reviews of eating disorders in women athletes, authors cite the statistic that 15 to 62% of female athletes have disordered eating. I’ve found that this statistic is attributed to about 50 different sources in the literature and cited all over the place with or without citations...

79 For example… In a recent review (Hobart and Smucker, The Female Athlete Triad, American Family Physician, 2000): “Although the exact prevalence of the female athlete triad is unknown, studies have reported disordered eating behavior in 15 to 62 percent of female college athletes.” No citations given.

80 And… Fact Sheet on eating disorders:
“Among female athletes, the prevalence of eating disorders is reported to be between 15% and 62%.” Citation given: Costin, Carolyn. (1999) The Eating Disorder Source Book: A comprehensive guide to the causes, treatment, and prevention of eating disorders. 2nd edition. Lowell House: Los Angeles.

81 And… From a Fact Sheet on disordered eating from a college website:
“Eating disorders are significantly higher (15 to 62 percent) in the athletic population than the general population.” No citation given.

82 And… “Studies report between 15% and 62% of college women engage in problematic weight control behaviors (Berry & Howe, 2000).” (in The Sport Journal, 2004) Citation: Berry, T.R. & Howe, B.L. (2000, Sept). Risk factors for disordered eating in female university athletes. Journal of Sport Behavior, 23(3),

83 And… 1999 NY Times article “But informal surveys suggest that 15 percent to 62 percent of female athletes are affected by disordered behavior that ranges from a preoccupation with losing weight to anorexia or bulimia.”

84 And "It has been estimated that the prevalence of disordered eating in female athletes ranges from 15% to 62%." (in Journal of General Internal Medicine, 15(8)) Citations: Steen SN. The competitive athlete. In: Rickert VI, ed. Adolescent Nutrition: Assessment and Management. New York, NY: Chapman and Hall; 1996. Tofler IR, Stryer BK, Micheli LJ. Physical and emotional problems of elite female gymnasts. N Engl J Med. 1996;335:281-3.

85 Where did the statistics come from?
The 15%: Dummer GM, Rosen LW, Heusner WW, Roberts PJ, and Counsilman JE. Pathogenic weight-control behaviors of young competitive swimmers. Physician Sportsmed 1987; 15: The “to”: Rosen LW, McKeag DB, O’Hough D, Curley VC. Pathogenic weight-control behaviors in female athletes. Physician Sportsmed. 1986; 14: The 62%:Rosen LW, Hough DO. Pathogenic weight-control behaviors of female college gymnasts. Physician Sportsmed 1988; 16:

86 Where did the statistics come from?
Study design? Control group? Cross-sectional survey (all) No non-athlete control groups Population/sample size? Convenience samples Rosen et al. 1986: 182 varsity athletes from two midwestern universities (basketball, field hockey, golf, running, swimming, gymnastics, volleyball, etc.) Dummer et al. 1987: year old swimmers at a swim camp Rosen et al. 1988: 42 college gymnasts from 5 teams at an athletic conference

87 Where did the statistics come from?
Measurement? Instrument: Michigan State University Weight Control Survey Disordered eating = at least one pathogenic weight control behavior: Self-induced vomiting fasting Laxatives Diet pills Diuretics In the 1986 survey, they required use 1/month; in the 1988 survey, they required use twice-weekly In the 1988 survey, they added fluid restriction

88 Where did the statistics come from?
Findings? Rosen et al. 1986: 32% used at least one “pathogenic weight-control behavior” (ranges: 8% of 13 basketball players to 73.7% of 19 gymnasts) Dummer et al. 1987: 15.4% of swimmers used at least one of these behaviors Rosen et al. 1988: 62% of gymnasts used at least one of these behaviors

89 References
http://www.math.yorku.ca/SCS/Gallery/
Kline et al. Annals of Emergency Medicine 2002; 39.
Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall.
Tappin, L. (1994). "Analyzing data relating to the Challenger disaster." Mathematics Teacher, 87.
Tufte, ER. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, 1983.
Wainer, H. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. 1997.

90 Gambling, Probability, and Risk (Basic Probability and Counting Methods)

91 A gambling experiment Everyone in the room takes 2 cards from the deck (keep face down) Rules, most to least valuable: Pair of the same color (both red or both black) Mixed-color pair (1 red, 1 black) Any two cards of the same suit Any two cards of the same color In the event of a tie, highest card wins (ace is top)

92 What do you want to bet? Look at your two cards. Will you fold or bet?
What is the most rational strategy given your hand?

93 Rational strategy There are N people in the room
What are the chances that someone in the room has a better hand than you? Need to know the probabilities of different scenarios We’ll return to this later in the lecture…

94 Probability Probability – the chance that an uncertain event will occur (always between 0 and 1) Symbols: P(event A) = "the probability that event A will occur" P(red card) = "the probability of a red card" P(~event A) = "the probability of NOT getting event A" [complement] P(~red card) = "the probability of NOT getting a red card" P(A & B) = "the probability that both A and B happen" [joint probability] P(red card & ace) = "the probability of getting a red ace"

95 Assessing Probability
1. Theoretical/Classical probability—based on theory (a priori understanding of a phenomenon), e.g.: the theoretical probability of rolling a 2 on a standard die is 1/6; the theoretical probability of choosing an ace from a standard deck is 4/52; the theoretical probability of getting heads on a regular coin is 1/2. 2. Empirical probability—based on empirical data, e.g.: you toss an irregular die (probabilities unknown) 100 times and find that you get a 2 twenty-five times; the empirical probability of rolling a 2 is 1/4. The empirical probability of an earthquake in the Bay Area by [year] is .62 (based on historical data); the empirical probability of a lifetime smoker developing lung cancer is 15 percent (based on empirical data).

96 Recent headlines on earthquake probabilities…
taly-quake-experts-manslaughter-charge

97 Computing theoretical probabilities:counting methods
Great for gambling! Fun to compute! If outcomes are equally likely to occur, then P(A) = (# of ways A can occur) / (total # of possible outcomes). Note: these are called "counting methods" because we have to count the number of ways A can occur and the number of total possible outcomes.

98 Counting methods: Example 1
Example 1: You draw one card from a deck of cards. What's the probability that you draw an ace? P(ace) = 4/52 = 1/13 ≈ .077

99 Counting methods: Example 2
Example 2. What’s the probability that you draw 2 aces when you draw two cards from the deck? This is a “joint probability”—we’ll get back to this on Wednesday

100 Counting methods: Example 2
Two counting-method ways to calculate this: 1. Consider order: Numerator: the ordered pairs of aces (ace of spades then ace of hearts, ace of hearts then ace of spades, etc.) = 4 x 3 = 12. Denominator = 52 x 51 = 2,652 (why? 52 cards are available for the first draw and 51 for the second). So P = 12/2,652 ≈ .0045

101 Counting methods: Example 2
2. Ignore order: Numerator: the unordered pairs of aces = 6. Denominator = (52 x 51)/2 = 1,326 (divide out order!). Either way, P = 6/1,326 = 12/2,652 ≈ .0045

102 Summary of Counting Methods
Counting methods for computing probabilities Permutations— order matters! Combinations— Order doesn’t matter With replacement Without replacement Without replacement

103 Summary of Counting Methods
Counting methods for computing probabilities Permutations— order matters! With replacement Without replacement

104 Permutations—Order matters!
A permutation is an ordered arrangement of objects. With replacement=once an event occurs, it can occur again (after you roll a 6, you can roll a 6 again on the same die). Without replacement=an event cannot repeat (after you draw an ace of spades out of a deck, there is 0 probability of getting it again).

105 Summary of Counting Methods
Counting methods for computing probabilities Permutations— order matters! With replacement

106 Permutations—with replacement
With Replacement – Think coin tosses, dice, and DNA. "Memoryless" – after you get heads, you have an equally likely chance of getting heads on the next toss (unlike in the cards example, where you can't draw the same card twice from a single deck). What's the probability of getting two heads in a row ("HH") when tossing a coin? Toss 1: 2 outcomes (H or T). Toss 2: 2 outcomes. 2^2 = 4 total possible outcomes: {HH, HT, TH, TT}, so P(HH) = 1/4.

107 Permutations—with replacement
What's the probability of 3 heads in a row? Toss 1: 2 outcomes. Toss 2: 2 outcomes. Toss 3: 2 outcomes. 2^3 = 8 total possible outcomes: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, so P(HHH) = 1/8.

108 Permutations—with replacement
When you roll a pair of dice (or 1 die twice), what's the probability of rolling 2 sixes? (1/6)(1/6) = 1/36. What's the probability of rolling a 5 and a 6? 2 x (1/6)(1/6) = 2/36 = 1/18 (the 5 can come first or second).
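These small sample spaces are easy to verify by brute-force enumeration; a Python sketch (an illustration, not from the slides):

    from itertools import product

    coin = list(product("HT", repeat=3))         # 2^3 = 8 equally likely sequences
    print(sum(t == ("H", "H", "H") for t in coin) / len(coin))  # 1/8 = 0.125

    dice = list(product(range(1, 7), repeat=2))  # 6^2 = 36 equally likely rolls
    print(sum(r == (6, 6) for r in dice) / len(dice))        # 1/36
    print(sum(set(r) == {5, 6} for r in dice) / len(dice))   # 2/36 = 1/18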

109 Summary: order matters, with replacement
Formally, "order matters" and "with replacement" use powers: the number of ways to fill r ordered slots from n options is n^r.

110 Summary of Counting Methods
Counting methods for computing probabilities Permutations— order matters! Without replacement

111 Permutations—without replacement
Without replacement—Think cards (w/o reshuffling) and seating arrangements.   Example: You are moderating a debate of gubernatorial candidates. How many different ways can you seat the panelists in a row? Call them Arianna, Buster, Camejo, Donald, and Eve.

112 Permutation—without replacement
"Trial and error" method: systematically write out all arrangements: A B C D E, A B C E D, A B D C E, A B D E C, A B E C D, A B E D C … Quickly becomes a pain! Easier to figure out patterns using the probability tree!

113 Permutation—without replacement
B A C D ……. Seat One: 5 possible Seat Two: only 4 possible Etc…. # of permutations = 5 x 4 x 3 x 2 x 1 = 5! There are 5! ways to order 5 people in 5 chairs (since a person cannot repeat)

114 Permutation—without replacement
What if you had to arrange 5 people in only 3 chairs (meaning 2 are out)? Seat One: 5 possible. Seat Two: only 4 possible. Seat Three: only 3 possible. So 5 x 4 x 3 = 60 arrangements.

115 Permutation—without replacement
Note this also works for 5 people and 5 chairs: 5!/(5-5)! = 5!/0! = 5! = 120 (recall that 0! = 1).

116 Permutation—without replacement
How many two-card hands can I draw from a deck when order matters (e.g., ace of spades followed by ten of clubs is different than ten of clubs followed by ace of spades)? 52 cards for the first draw x 51 cards for the second = 52 x 51 = 2,652

117 Summary: order matters, without replacement
Formally, "order matters" and "without replacement" use factorials: n!/(n-r)! = n(n-1)(n-2)…(n-r+1)

118 Practice problems: A wine taster claims that she can distinguish four vintages of a particular Cabernet. What is the probability that she can do this by merely guessing (she is confronted with 4 unlabeled glasses)? (hint: without replacement) In some states, license plates have six characters: three letters followed by three numbers. How many distinct such plates are possible? (hint: with replacement)

119 Answer 1 A wine taster claims that she can distinguish four vintages of a particular Cabernet. What is the probability that she can do this by merely guessing (she is confronted with 4 unlabeled glasses)? (hint: without replacement) P(success) = 1 (there's only one way to get it right!) / total # of guesses she could make. Total # of guesses one could make randomly: glass one: 4 choices; glass two: 3 vintages left; glass three: 2 left; glass four: no "degrees of freedom" left. Total = 4 x 3 x 2 x 1 = 4! P(success) = 1/4! = 1/24 ≈ .042

120 Answer 2 In some states, license plates have six characters: three letters followed by three numbers. How many distinct such plates are possible? (hint: with replacement) 26^3 different ways to choose the letters and 10^3 different ways to choose the digits, so the total number = 26^3 x 10^3 = 17,576 x 1,000 = 17,576,000
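Python's math module has these counting tools built in; an illustrative check of the answers above:

    import math

    print(1 / math.factorial(4))  # wine taster: 1/4! = 0.041666...
    print(26**3 * 10**3)          # license plates: 17576000
    print(math.perm(5, 3))        # 5 people in 3 chairs: 60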

121 Counting methods for computing probabilities
Summary of Counting Methods Counting methods for computing probabilities Combinations— Order doesn’t matter Without replacement

122 2. Combinations—Order doesn’t matter
Introduction to the combination function, or "choosing": written as nCr = n!/(r!(n-r)!). Spoken: "n choose r"

123 Combinations How many two-card hands can I draw from a deck when order does not matter (e.g., ace of spades followed by ten of clubs is the same as ten of clubs followed by ace of spades)? (52 x 51)/2 = 1,326

124 Combinations How many five-card hands can I draw from a deck when order does not matter? 52 x 51 x 50 x 49 x 48 ordered draws; but this counts each 5-card hand many times…

125 Combinations 1. 2. 3. …. How many repeats total??

126 Combinations 1. 2. 3. …. i.e., how many different ways can you arrange 5 cards…?

127 Combinations
How many times is each 5-card hand repeated in that count? That's a permutation without replacement: 5! = 120. So divide: (52 x 51 x 50 x 49 x 48)/5! = 2,598,960 distinct hands.

128 Combinations How many unique 2-card sets out of 52 cards? 5-card sets? r-card sets? r-card sets out of n cards?
2-card sets: (52 x 51)/2! = 1,326. 5-card sets: 52!/(5!47!) = 2,598,960. r-card sets out of n cards: n!/(r!(n-r)!)

129 Summary: combinations
If r objects are taken from a set of n objects without replacement and disregarding order, how many different samples are possible? Formally, "order doesn't matter" and "without replacement" use choosing: nCr = n!/(r!(n-r)!)

130 Examples—Combinations
A lottery works by picking 6 numbers from 1 to 49. How many combinations of 6 numbers could you choose? 49C6 = 13,983,816. Which of course means that your probability of winning is 1/13,983,816!

131 Examples How many ways can you get 3 heads in 5 coin tosses? 5C3 = 5!/(3!2!) = 10
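All of these combinations can be checked with math.comb (illustrative):

    import math

    print(math.comb(5, 3))    # 10 ways to choose which 3 of 5 tosses are heads
    print(math.comb(52, 2))   # 1326 two-card hands
    print(math.comb(52, 5))   # 2598960 five-card hands
    print(math.comb(49, 6))   # 13983816 lottery tickets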

132 Summary of Counting Methods
Counting methods for computing probabilities. Permutations—order matters! With replacement: n^r. Without replacement: n(n-1)(n-2)…(n-r+1) = n!/(n-r)!. Combinations—order doesn't matter. Without replacement: nCr = n!/(r!(n-r)!)

133 Gambling, revisited What are the probabilities of the following hands?
Pair of the same color Pair of different colors Any two cards of the same suit Any two cards of the same color

134 Pair of the same color? P(pair of the same color) =
Numerator = red aces, black aces; red kings, black kings; etc.… = 2x13 = 26. Denominator = 1,326. P(pair of the same color) = 26/1,326 ≈ 2%

135 Any old pair? P(any pair) = [13 ranks x 4C2 = 13 x 6 = 78] / 1,326 ≈ 5.9%

136 Two cards of same suit? P = [4 suits x 13C2 = 4 x 78 = 312] / 1,326 ≈ 23.5%

137 Two cards of same color? Numerator: 26C2 x 2 colors = 26!/(24!2!) x 2 = 325 x 2 = 650. Denominator = 1,326. So, P(two cards of the same color) = 650/1,326 = 49% chance. A little non-intuitive? Here's another way to look at it: start from any first card (52 cards: 26 red branches, 26 black branches). From a red branch: 26 black left, 25 red left; from a black branch: 26 red left, 25 black left. The ordered counts are RR 26x25, RB 26x26, BR 26x26, BB 26x25, so P(same color) = 25/51 = 50/102, not quite 50/100.
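All four hand probabilities can be verified by enumerating the 1,326 unordered two-card hands; a brute-force Python sketch (illustrative only):

    from itertools import combinations

    ranks = "23456789TJQKA"
    suits = "shdc"            # spades, hearts, diamonds, clubs
    red = {"h", "d"}
    deck = [r + s for r in ranks for s in suits]

    hands = list(combinations(deck, 2))   # all 1326 equally likely hands
    n = len(hands)
    print(sum(a[0] == b[0] for a, b in hands) / n)                    # any pair: 78/1326
    print(sum(a[1] == b[1] for a, b in hands) / n)                    # same suit: 312/1326
    print(sum((a[1] in red) == (b[1] in red) for a, b in hands) / n)  # same color: 650/1326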

138 Rational strategy? To bet or fold?
It would be really complicated to take into account the dependence between hands in the class (since we all drew from the same deck), so we’re going to fudge this and pretend that everyone had equal probabilities of each type of hand (pretend we have “independence”)…  Just to get a rough idea...

139 Rational strategy? P(at least one same-color pair in the class)=
**Trick! P(at least 1) = 1 - P(0). P(at least one same-color pair in the class) = 1 - P(no same-color pairs in the whole class) = 1 - (1 - 26/1,326)^40 ≈ 1 - (.98)^40 ≈ 55%

140 Rational strategy? P(at least one pair)= 1-P(no pairs)=
1-(.94)^40 = 1 - 8% = 92% chance. P(≥1 same suit) = 1 - P(all different suits) = 1-(.765)^40 ≈ 100%. P(≥1 same color) = 1 - P(all different colors) = 1-(.51)^40 ≈ 100%

141 Rational strategy… Fold unless you have a same-color pair or a numerically high pair (e.g., Queen, King, Ace). How does this compare to class? -anyone with a same-color pair? -any pair? -same suit? -same color?

142 Practice problem: A classic problem: “The Birthday Problem.” What’s the probability that two people in a class of 25 have the same birthday? (disregard leap years) What would you guess is the probability?

143 Birthday Problem Answer
1. A classic problem: "The Birthday Problem." What's the probability that two people in a class of 25 have the same birthday? (disregard leap years) **Trick! P(at least one) = 1 - P(none). Use the complement: it's easier to calculate 1 - P(no matches), which equals the probability that at least one pair of people have the same birthday. What's the probability of no matches? Denominator: how many sets of 25 birthdays are there? With replacement (order matters): 365^25. Numerator: how many different ways can you distribute 365 birthdays to 25 people without replacement? Order matters, without replacement: 365!/(365-25)! = 365 x 364 x 363 x … x (365-24) = 365 x 364 x … x 341. So P(no matches) = [365 x 364 x … x 341] / 365^25

144 Use SAS as a calculator
Use SAS as a calculator… (my calculator won't do factorials as high as 365, so I had to improvise by using a loop, which you'll learn later in HRP 223):
%LET num = 25; *set number in the class;
data _null_;
top=1; *initialize numerator;
do j=0 to (&num-1) by 1;
top=(365-j)*top;
end;
BDayProb=1-(top/365**&num);
put BDayProb;
run;
From the SAS log: 0.568699704, so 57% chance!
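For readers without SAS, an equivalent Python sketch (an illustration, not part of the original deck):

    def birthday_match_prob(n):
        """P(at least two of n people share a birthday), ignoring leap years."""
        p_no_match = 1.0
        for j in range(n):
            p_no_match *= (365 - j) / 365
        return 1 - p_no_match

    print(birthday_match_prob(25))  # 0.5687..., so 57% chance
    print(birthday_match_prob(40))  # 0.891..., so 89% chance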

145 For class of 40 (our class)?
%LET num = 40; *set number in the class;
data _null_;
top=1; *initialize numerator;
do j=0 to (&num-1) by 1;
top=(365-j)*top;
end;
BDayProb=1-(top/365**&num);
put BDayProb;
run;
From the SAS log: ≈ 0.891, i.e. 89% chance of a match!

146 In this class? --Jan? --Feb? --March? --April? --May? --June? --July?
--August? --September? ….

147 And the odds ratio and risk ratio as conditional probability

148 Today’s lecture Probability trees Statistical independence
Joint probability Conditional probability Marginal probability Bayes’ Rule Risk ratio Odds ratio

149 Probability example Sample space: the set of all possible outcomes.
For example, in genetics, if both the mother and father carry one copy of a recessive disease-causing mutation (d), there are three possible outcomes (the sample space): child is not a carrier (DD), child is a carrier (Dd), child has the disease (dd). Probabilities: the likelihood of each of the possible outcomes (always 0 ≤ P ≤ 1.0). P(genotype=DD)=.25, P(genotype=Dd)=.50, P(genotype=dd)=.25. Note: mutually exclusive, exhaustive probabilities sum to 1.

150 Using a probability tree
Mendel example: What's the chance of having a heterozygote child (Dd) if both parents are heterozygote (Dd)? Tree: Mother's allele: P(♀D)=.5, P(♀d)=.5. Father's allele: P(♂D)=.5, P(♂d)=.5. Child's outcomes: P(DD)=.5*.5=.25, P(Dd)=.5*.5=.25, P(dD)=.5*.5=.25, P(dd)=.5*.5=.25 (total = 1.0). So P(heterozygote) = P(Dd) + P(dD) = .50. Rule of thumb: in probability, "and" means multiply, "or" means add.

151 Independence Formal definition: A and B are independent if and only if P(A&B)=P(A)*P(B). The mother's and father's alleles are segregating independently. Conditional probability: P(♂D/♀D)=.5 and P(♂D/♀d)=.5, read as "the probability that the father passes a D allele given that the mother passes a D (or d) allele." What the father's gamete looks like is not dependent on the mother's: it doesn't depend which branch you start on! Joint probability: the probability of two events happening simultaneously; formally, P(DD)=.25=P(D♂)*P(D♀). Marginal probability: this is the probability that an event happens at all, ignoring all other outcomes.

152 On the tree Conditional probability Marginal probability: mother
[The same probability tree, annotated:] the branch probabilities P(♂D/♀D)=.5, P(♂d)=.5, etc. are conditional probabilities; P(♀D=.5) and P(♀d=.5) are the mother's marginal probabilities and P(♂D=.5), P(♂d=.5) the father's; the end-of-branch products P(DD)=.5*.5=.25, P(Dd)=.25, P(dD)=.25, P(dd)=.25 (summing to 1.0) are joint probabilities.

153 Conditional, marginal, joint
The marginal probability that player 1 gets two aces is 12/2652. The marginal probability that player 5 gets two aces is 12/2652. The marginal probability that player 9 gets two aces is 12/2652. The joint probability that all three players get pairs of aces is 0. The conditional probability that player 5 gets two aces given that player 1 got 2 aces is (2/50*1/49).

154 Test of independence event A=player 1 gets pair of aces
event B = player 2 gets pair of aces; event C = player 3 gets pair of aces. P(A&B&C) = 0, but P(A)*P(B)*P(C) = (12/2652)^3 ≠ 0. Not independent.

155 Independent ≠ mutually exclusive
Events A and ~A are mutually exclusive, but they are NOT independent. P(A&~A) = 0, but P(A)*P(~A) ≠ 0. Conceptually, once A has happened, ~A is impossible; thus, they are completely dependent.

156 Practice problem If HIV has a prevalence of 3% in San Francisco, and a particular HIV test has a false positive rate of .001 and a false negative rate of .01, what is the probability that a random person selected off the street will test positive?

157 Answer P(test +)=.0297+.00097=.03067 P(+&test+)P(+)*P(test+)
Conditional probability: the probability of testing + given that a person is + Joint probability of being + and testing + Marginal probability of carrying the virus. P(test +)=.99 P(test - )= .01 P (+, test +)=.0297 P(+)=.03 P(-)=.97 P(+, test -)=.003 P(test +) = .001 P(test -) = .999 P(-, test +)=.00097 ______________ 1.0 P(-, test -) = Marginal probability of testing positive P(test +)= =.03067 P(+&test+)P(+)*P(test+) .0297 .03* (=.00092)  Dependent!

158 Law of total probability
One of these has to be true (mutually exclusive, collectively exhaustive). They sum to 1.0.

159 Law of total probability
Formal Rule: Marginal probability for event A = P(A) = Σ P(A/Bᵢ)*P(Bᵢ), where the Bᵢ (B1, B2, B3, …) are mutually exclusive and collectively exhaustive.

160 Example 2 A 54-year-old woman has an abnormal mammogram; what is the chance that she has breast cancer?

161 Example: Mammography
Tree: marginal probabilities of breast cancer (prevalence among all 54-year-olds): P(BC+)=.003, P(BC-)=.997. Sensitivity: P(test +/BC+)=.90 and P(test -/BC+)=.10, so P(+, test +)=.0027 and P(+, test -)=.0003. Specificity: P(test -/BC-)=.89, so P(test +/BC-)=.11, giving P(-, test +)=.10967 and P(-, test -)=.88733. (Total = 1.0.) P(BC/test+)=.0027/(.0027+.10967)=2.4%

162 Bayes’ rule

163 Bayes’ Rule: derivation
Definition: Let A and B be two events with P(B) ≠ 0. The conditional probability of A given B is: P(A/B) = P(A&B)/P(B). The idea: if we are given that the event B occurred, the relevant sample space is reduced to B {P(B)=1 because we know B is true} and conditional probability becomes a probability measure on B.

164 Bayes’ Rule: derivation
P(A/B) = P(A&B)/P(B) can be re-arranged to: P(A&B) = P(A/B)*P(B), and, since A&B = B&A, also: P(A&B) = P(B/A)*P(A). Setting the two equal and dividing by P(B) gives Bayes' Rule: P(A/B) = P(B/A)*P(A)/P(B).

165 Bayes' Rule: OR From the "Law of Total Probability": P(A/B) = P(B/A)*P(A) / [P(B/A)*P(A) + P(B/~A)*P(~A)]

166 Bayes’ Rule: Why do we care?? Why is Bayes’ Rule useful??
It turns out that sometimes it is very useful to be able to “flip” conditional probabilities. That is, we may know the probability of A given B, but the probability of B given A may not be obvious. An example will help…

167 In-Class Exercise If HIV has a prevalence of 3% in San Francisco, and a particular HIV test has a false positive rate of .001 and a false negative rate of .01, what is the probability that a random person who tests positive is actually infected (also known as “positive predictive value”)?

168 Answer: using probability tree
Tree: P(+)=.03 with P(test +/+)=.99 and P(test -/+)=.01, giving P(+, test +)=.0297 and P(+, test -)=.0003; P(-)=.97 with P(test +/-)=.001 and P(test -/-)=.999, giving P(-, test +)=.00097 and P(-, test -)=.96903. (Total = 1.0.) A positive test places one on either of the two "test +" branches. But only the top branch also fulfills the event "true infection." Therefore, the probability of being infected is the probability of being on the top branch given that you are on one of the two "test +" branches: .0297/(.0297+.00097) = .0297/.03067 ≈ .97.

169 Answer: using Bayes' rule P(+/test+) = P(test+/+)*P(+) / P(test+) = (.99)(.03)/.03067 ≈ .97
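The same arithmetic as a small Python function (an illustrative sketch; the prevalence and error rates are the slide's values):

    def positive_predictive_value(prevalence, sensitivity, false_pos_rate):
        # P(disease | test +) via Bayes' rule / law of total probability
        p_test_pos = sensitivity * prevalence + false_pos_rate * (1 - prevalence)
        return sensitivity * prevalence / p_test_pos

    print(positive_predictive_value(0.03, 0.99, 0.001))  # ~0.97 (HIV example)
    print(positive_predictive_value(0.003, 0.90, 0.11))  # ~0.024 (mammography example)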

170 Practice problem An insurance company believes that drivers can be divided into two classes—those that are of high risk and those that are of low risk. Their statistics show that a high-risk driver will have an accident at some time within a year with probability .4, but this probability is only .1 for low risk drivers. Assuming that 20% of the drivers are high-risk, what is the probability that a new policy holder will have an accident within a year of purchasing a policy? If a new policy holder has an accident within a year of purchasing a policy, what is the probability that he is a high-risk type driver?

171 Answer to (a) Use law of total probability: P(accident)=
Assuming that 20% of the drivers are high-risk, what is the probability that a new policy holder will have an accident within a year of purchasing a policy? Use the law of total probability: P(accident) = P(accident/high risk)*P(high risk) + P(accident/low risk)*P(low risk) = .40(.20) + .10(.80) = .08 + .08 = .16

172 Answer to (b) P(high risk/accident)=.08/.16=50%
If a new policy holder has an accident within a year of purchasing a policy, what is the probability that he is a high-risk type driver? P(high-risk/accident) = P(accident/high risk)*P(high risk)/P(accident) = .40(.20)/.16 = 50%. Or use the tree: P(high risk)=.20 with P(accident/HR)=.4 and P(no acc/HR)=.6, giving P(accident, high risk)=.08 and P(no accident, high risk)=.12; P(low risk)=.80 with P(accident/LR)=.1 and P(no accident/LR)=.9, giving P(accident, low risk)=.08 and P(no accident, low risk)=.72. (Total = 1.0.) P(high risk/accident)=.08/.16=50%
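The same two steps in code (illustrative):

    p_hr, p_lr = 0.20, 0.80          # P(high risk), P(low risk)
    p_acc_hr, p_acc_lr = 0.40, 0.10  # P(accident | risk class)

    p_accident = p_acc_hr * p_hr + p_acc_lr * p_lr  # law of total probability
    print(p_accident)                               # 0.16
    print(p_acc_hr * p_hr / p_accident)             # Bayes' rule: 0.5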

173 Fun example/bad investment

174 Conditional Probability for Epidemiology: The odds ratio and risk ratio as conditional probability

175 The Risk Ratio and the Odds Ratio as conditional probability
In epidemiology, the association between a risk factor or protective factor (exposure) and a disease may be evaluated by the “risk ratio” (RR) or the “odds ratio” (OR). Both are measures of “relative risk”— the general concept of comparing disease risks in exposed vs. unexposed individuals.

176 Odds and Risk (probability)
Definitions: Risk = P(A) = cumulative probability (you specify the time period!) For example, what’s the probability that a person with a high sugar intake develops diabetes in 1 year, 5 years, or over a lifetime? Odds = P(A)/P(~A) For example, “the odds are 3 to 1 against a horse” means that the horse has a 25% probability of winning. Note: An odds is always higher than its corresponding probability, unless the probability is 100%.

177 Odds vs. Risk=probability
If the risk is ½ (50%), the odds are 1:1; risk ¾ (75%), odds 3:1; risk 1/10 (10%), odds 1:9; risk 1/100 (1%), odds 1:99. Note: An odds is always higher than its corresponding probability, unless the probability is 100%.

178 Cohort Studies (risk ratio)
[Diagram: from the target population, a disease-free cohort is split into Exposed and Not Exposed groups, each followed over TIME into Disease vs. Disease-free.]

179 The Risk Ratio
2x2 table with columns Exposure (E) and No Exposure (~E), rows Disease (D): a, b and No Disease (~D): c, d, and column totals a+c and b+d. Risk ratio = risk to the exposed / risk to the unexposed = [a/(a+c)] / [b/(b+d)]

180 Hypothetical Data
Congestive Heart Failure / No CHF / Total: High Systolic BP: 400 / 1100 / 1500; Normal BP: 400 / 2600 / 3000. RR = (400/1500)/(400/3000) = .267/.133 = 2.0

181 Case-Control Studies (odds ratio)
[Diagram: from the target population, sample Disease (Cases) and No Disease (Controls), then look back in time to classify each as Exposed in past or Not exposed.]

182 Case-control study example:
You sample 50 stroke patients and 50 controls without stroke and ask about their smoking in the past.

183 Hypothetical results:
Stroke (D): 15 smokers (E), 35 non-smokers (~E), total 50. No Stroke (~D): 8 smokers, 42 non-smokers, total 50.

184 What’s the risk ratio here?
Stroke (D): 15 smokers (E), 35 non-smokers (~E). No Stroke (~D): 8 smokers, 42 non-smokers. (50 cases, 50 controls.) Tricky: There is no risk ratio, because we cannot calculate the risk of disease!! The 50/50 split was fixed by the sampling design.

185 The odds ratio… We cannot calculate a risk ratio from a case-control study. BUT, we can calculate a measure called the odds ratio…

186 The Odds Ratio (OR) Smoker (E) Stroke (D) No Stroke (~D)
Stroke (D): 15 smokers (E), 35 non-smokers (~E), total 50. No Stroke (~D): 8 smokers, 42 non-smokers, total 50. These data give: P(E/D) and P(E/~D). Luckily, you can flip the conditional probabilities using Bayes' Rule. Unfortunately, our sampling scheme precludes calculation of the marginals P(E) and P(D), but it turns out we don't need these if we use an odds ratio, because the marginals cancel out!

187 The Odds Ratio (OR) Odds of exposure in the cases
Using the same 2x2 table (Exposure (E) / No Exposure (~E); Disease (D): a, b; No Disease (~D): c, d): OR = odds of exposure in the cases / odds of exposure in the controls = (a/b) / (c/d) = ad/bc

188 The Odds Ratio (OR) Odds of disease in the exposed
Odds of exposure in the cases / odds of exposure in the controls is backward from what we want… But this expression is mathematically equivalent to: odds of disease in the exposed / odds of disease in the unexposed, which is the direction of interest! (a/b)/(c/d) = (a/c)/(b/d) = ad/bc.

189 Proof via Bayes’ Rule What we want! = Odds of exposure in the cases
Odds of exposure in the cases / odds of exposure in the controls: apply Bayes' Rule to each conditional probability, and the marginals P(D) and P(E) cancel, leaving odds of disease in the exposed / odds of disease in the unexposed. What we want!

190 The odds ratio here:
Stroke (D): 15 smokers (E), 35 non-smokers (~E); No Stroke (~D): 8 smokers, 42 non-smokers (total 50 each). OR = (15 x 42)/(8 x 35) = 630/280 = 2.25. Interpretation: there is a 2.25-fold higher odds of stroke in smokers vs. non-smokers.

191 Interpretation of the odds ratio:
The odds ratio will always be bigger than the corresponding risk ratio if RR >1 and smaller if RR <1 (the harmful or protective effect always appears larger) The magnitude of the inflation depends on the prevalence of the disease.

192 The rare disease assumption
When a disease is rare: P(~D) = 1 - P(D) ≈ 1, so the odds of disease ≈ the risk of disease, and OR ≈ RR.

193 The odds ratio vs. the risk ratio
[Graph: for a rare outcome, the odds ratio and risk ratio lie close together around 1.0 (the null); for a common outcome, the odds ratio falls farther from the null than the risk ratio.]

194 Odds ratios in cross-sectional and cohort studies…
Many cohort and cross-sectional studies report ORs rather than RRs even though the data necessary to calculate RRs are available. Why? If you have a binary outcome and want to adjust for confounders, you have to use logistic regression. Logistic regression gives adjusted odds ratios, not risk ratios (more on this in HRP 261). These odds ratios must be interpreted cautiously (as increased odds, not risk) when the outcome is common. When the outcome is common, authors should also report unadjusted risk ratios and/or use a simple formula to convert adjusted odds ratios back to adjusted risk ratios.

195 Example, wrinkle study…
A cross-sectional study on risk factors for wrinkles found that heavy smoking significantly increases the risk of prominent wrinkles. Adjusted OR=3.92 (heavy smokers vs. nonsmokers) calculated from logistic regression. Interpretation: heavy smoking increases risk of prominent wrinkles nearly 4-fold?? The prevalence of prominent wrinkles in non-smokers is roughly 45%. So, it’s not possible to have a 4-fold increase in risk (=180%)! Raduan et al. J Eur Acad Dermatol Venereol Jul 3.

196 Interpreting ORs when the outcome is common…
If the outcome has a 10% prevalence in the unexposed/reference group*, the maximum possible RR=10.0. For 20% prevalence, the maximum possible RR=5.0 For 30% prevalence, the maximum possible RR=3.3. For 40% prevalence, maximum possible RR=2.5. For 50% prevalence, maximum possible RR=2.0. *Authors should report the prevalence/risk of the outcome in the unexposed/reference group, but they often don’t. If this number is not given, you can usually estimate it from other data in the paper (or, if it’s important enough, the authors).

197 Interpreting ORs when the outcome is common…
If data are from a cross-sectional or cohort study, then you can convert ORs (from logistic regression) back to RRs with a simple formula: RR = OR / [(1 - P0) + (P0 x OR)], where: OR = odds ratio from logistic regression (e.g., 3.92), and P0 = P(D/~E) = probability/prevalence of the outcome in the unexposed/reference group (e.g. ~45%). Formula from: Zhang J. What's the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. JAMA. 1998;280.

198 For wrinkle study… RR = 3.92 / [(1 - .45) + (.45 x 3.92)] = 3.92/2.31 ≈ 1.69. So, the risk (prevalence) of wrinkles is increased by 69%, not 292%. Zhang J. What's the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. JAMA. 1998;280.
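A small helper implementing the Zhang-style correction described above (an illustrative sketch):

    def or_to_rr(odds_ratio, p0):
        # Approximate RR from an OR, given outcome prevalence p0 in the
        # unexposed/reference group (Zhang, JAMA 1998)
        return odds_ratio / ((1 - p0) + p0 * odds_ratio)

    print(or_to_rr(3.92, 0.45))  # ~1.69: the wrinkle example
    print(or_to_rr(5.12, 0.25))  # ~2.5: sleep/hypertension example (next slide)
    print(or_to_rr(3.53, 0.25))  # ~2.2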

199 Sleep and hypertension study…
OR(hypertension) = 5.12 for chronic insomniacs who sleep ≤5 hours per night vs. the reference (good sleep) group. OR(hypertension) = 3.53 for chronic insomniacs who sleep 5-6 hours per night vs. the reference group. Interpretation: risk of hypertension is increased 500% and 350% in these groups? No: ~25% of the reference group has hypertension. Use the formula to find the corresponding RRs = 2.5 and 2.2. Correct interpretation: hypertension risk is increased 150% and 120% in these groups. -Sainani KL, Schmajuk G, Liu V. A Caution on Interpreting Odds Ratios. SLEEP, Vol. 32, No. 8. -Vgontzas AN, Liao D, Bixler EO, Chrousos GP, Vela-Bueno A. Insomnia with objective short sleep duration is associated with a high risk for hypertension. Sleep 2009;32:491-7.

200 Practice problem: 1. Suppose the following data were collected on a random sample of subjects (the researchers did not sample on exposure or disease status). Own a cell phone: neck pain 143, no neck pain 209. Don't own a cell phone: neck pain 22, no neck pain 69. Calculate the odds ratio and risk ratio for the association between cell phone usage and neck pain (common outcome).

201 Answer OR = (69*143)/(22*209) = 2.15 RR = (143/352)/(22/91) = 1.68
Own a cell phone: neck pain 143, no neck pain 209 (total 352). Don't own a cell phone: neck pain 22, no neck pain 69 (total 91). OR = (69*143)/(22*209) = 2.15. RR = (143/352)/(22/91) = 1.68
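A generic 2x2 helper (an illustrative sketch) reproduces both practice answers from the cell counts:

    def two_by_two(a, b, c, d):
        # a = exposed with outcome, b = exposed without,
        # c = unexposed with outcome, d = unexposed without
        odds_ratio = (a * d) / (b * c)
        risk_ratio = (a / (a + b)) / (c / (c + d))
        return odds_ratio, risk_ratio

    print(two_by_two(143, 209, 22, 69))  # neck pain (common): OR 2.15, RR 1.68
    print(two_by_two(5, 347, 3, 88))     # brain tumor (rare): OR 0.42, RR 0.43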

202 Practice problem: 2. Suppose the following data were collected on a random sample of subjects (the researchers did not sample on exposure or disease status). Own a cell phone: brain tumor 5, no brain tumor 347. Don't own a cell phone: brain tumor 3, no brain tumor 88. Calculate the odds ratio and risk ratio for the association between cell phone usage and brain tumor (rare outcome).

203 Answer OR = (5*88)/(3*347) = .42267 RR = (5/352)/(3/91) = .43087
Own a cell phone: brain tumor 5, no brain tumor 347 (total 352). Don't own a cell phone: brain tumor 3, no brain tumor 88 (total 91). OR = (5*88)/(3*347) = .42267. RR = (5/352)/(3/91) = .43087. With a rare outcome, the OR and RR are nearly identical.

204 Thought problem… Another classic first-year statistics problem. You are on the Monty Hall show. You are presented with 3 doors (A, B, C), only one of which has something valuable to you behind it (the others are bogus). You do not know what is behind any of the doors. You choose door A; Monty Hall opens door B and shows you that there is nothing behind it. Then he gives you the option of sticking with A or switching to C. Do you stay or switch? Does it matter?

205 Some Monty Hall links… html?res=9D0CEFDD1E3FF932A15754C 0A &sec=&spon=&pagewant ed=all /science/08tier.html?_r=1&em&ex= &en=81bdecc33f60033e&ei= 5087%0A&oref=slogin /science/08monty.html#

206 Probability Distributions

207 Random Variable A random variable x takes on a defined set of values with different probabilities. For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth. For example, if you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 100" is also a random variable (the percentage will be slightly different every time you poll). Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over ("frequentist" view).

208 Random variables can be discrete or continuous
Discrete random variables have a countable number of outcomes Examples: Dead/alive, treatment/placebo, dice, counts, etc. Continuous random variables have an infinite continuum of possible values. Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.

209 Probability functions
A probability function maps the possible values of x against their respective probabilities of occurrence, p(x). p(x) is a number from 0 to 1.0. The area under a probability function is always 1. [Speaker notes: It turns out that if you were to go out and sample many, many times, most sample statistics you could calculate would follow a normal distribution. Recall the 2 parameters that define any normal distribution: a mean and a variability (SD). The standard deviation is the natural variability of the population; the standard error is the standard deviation of any sample statistic, e.g., the standard error of the mean, of the odds ratio, or of the difference of 2 means.]

210 Discrete example: roll of a die
[Plot: p(x) = 1/6 for each face x = 1, 2, 3, 4, 5, 6.]

211 Probability mass function (pmf)
x: 1, 2, 3, 4, 5, 6; p(x): p(x=1)=1/6, p(x=2)=1/6, p(x=3)=1/6, p(x=4)=1/6, p(x=5)=1/6, p(x=6)=1/6. Total = 1.0.

212 Cumulative distribution function (CDF)
[Step plot of the CDF P(x): 1/6, 1/3, 1/2, 2/3, 5/6, 1.0 at x = 1 through 6.]

213 Cumulative distribution function
x: 1, 2, 3, 4, 5, 6; P(x≤A): P(x≤1)=1/6, P(x≤2)=2/6, P(x≤3)=3/6, P(x≤4)=4/6, P(x≤5)=5/6, P(x≤6)=6/6.

214 Examples 1. What’s the probability that you roll a 3 or less?
P(x≤3)=1/2 2. What’s the probability that you roll a 5 or higher? P(x≥5) = 1 – P(x≤4) = 1-2/3 = 1/3
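The die pmf and CDF in a few lines of Python (illustrative; Fraction keeps exact sixths):

    from fractions import Fraction

    pmf = {x: Fraction(1, 6) for x in range(1, 7)}
    cdf = {x: sum(pmf[k] for k in range(1, x + 1)) for x in pmf}

    print(cdf[3])      # 1/2 -> P(roll a 3 or less)
    print(1 - cdf[4])  # 1/3 -> P(roll a 5 or higher)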

215 Practice Problem Which of the following are probability functions?
a. f(x)=.25 for x=9,10,11,12
b. f(x)=(3-x)/2 for x=1,2,3,4
c. f(x)=(x²+x+1)/25 for x=0,1,2,3

216 Answer (a)
a. f(x)=.25 for x=9,10,11,12: f(9)=.25, f(10)=.25, f(11)=.25, f(12)=.25; total = 1.0. Yes, probability function!

217 Answer (b)
b. f(x)=(3-x)/2 for x=1,2,3,4: f(1)=(3-1)/2=1.0, f(2)=(3-2)/2=.5, f(3)=(3-3)/2=0, f(4)=(3-4)/2=-.5. Though this sums to 1, you can't have a negative probability; therefore, it's not a probability function.

218 Answer (c)
c. f(x)=(x²+x+1)/25 for x=0,1,2,3: f(0)=1/25, f(1)=3/25, f(2)=7/25, f(3)=13/25. The sum is 24/25, which doesn't sum to 1. Thus, it's not a probability function.

219 Practice Problem: Find the probability that on a given day:
The number of ships to arrive at a harbor on any given day is a random variable represented by x. The probability distribution for x is: x: 10, 11, 12, 13, 14; P(x): .4, .2, .2, .1, .1. Find the probability that on a given day: a. exactly 14 ships arrive: p(x=14) = .1. b. at least 12 ships arrive: p(x≥12) = (.2+.1+.1) = .4. c. at most 11 ships arrive: p(x≤11) = (.4+.2) = .6

220 Practice Problem: You are lecturing to a group of 1000 students. You ask them to each randomly pick an integer between 1 and 10. Assuming their picks are truly random: What's your best guess for how many students picked the number 9? Since p(x=9) = 1/10, we'd expect about 1/10th of the students, or about 100 students, to pick 9. What percentage of the students would you expect picked a number less than or equal to 6? Since p(x≤6) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = .6, about 60%.

221 Important discrete distributions in epidemiology…
Binomial Yes/no outcomes (dead/alive, treated/untreated, smoker/non-smoker, sick/well, etc.) Poisson Counts (e.g., how many cases of disease in a given area)

222 Continuous case The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1. The probabilities associated with continuous functions are just areas under the curve (integrals!). Probabilities are given for a range of values, rather than a particular value (e.g., the probability of getting a math SAT score between 700 and 800 is 2%).

223 Continuous case For example, recall the negative exponential function (in probability, this is called an "exponential distribution"): f(x) = e^(-x) for x ≥ 0. This function integrates to 1: the integral of e^(-x) from 0 to ∞ equals 1.

224 Continuous case: “probability density function” (pdf)
[Plot: p(x)=e^(-x).] The probability that x is any one exact particular value is 0; we can only assign probabilities to possible ranges of x.

225 For example, the probability of x falling within 1 to 2:
[Shaded area under p(x)=e^(-x) from 1 to 2.] P(1 ≤ x ≤ 2) = e^(-1) - e^(-2) ≈ .368 - .135 = .23

226 Cumulative distribution function
As in the discrete case, we can specify the "cumulative distribution function" (CDF): the CDF here = P(x≤A) = the integral of e^(-x) from 0 to A = 1 - e^(-A)

227 Example 2 [Plot: p(x) = 1 on 0 ≤ x ≤ 1.]

228 Example 2: Uniform distribution
The uniform distribution: all values are equally likely. f(x) = 1 for 0 ≤ x ≤ 1. We can see it's a probability distribution because it integrates to 1 (the area under the curve is 1): base x height = 1 x 1 = 1.

229 Example: Uniform distribution
What's the probability that x is between ¼ and ½? P(¼ ≤ x ≤ ½) = width x height = (½ - ¼)(1) = ¼

230 Practice Problem 4. Suppose that survival drops off rapidly in the year following diagnosis of a certain type of advanced cancer. Suppose that the length of survival (or time-to-death) is a random variable that approximately follows an exponential distribution with parameter 2 (which makes it a steeper drop-off): f(x) = 2e^(-2x) for x ≥ 0. What's the probability that a person who is diagnosed with this illness survives a year?

231 Answer The probability of dying within 1 year can be calculated using the cumulative distribution function: P(x≤A) = 1 - e^(-2A). The chance of surviving past 1 year is: P(x≥1) = 1 - P(x≤1) = 1 - (1 - e^(-2)) = e^(-2) ≈ .135, about a 14% chance.
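Checking the survival calculation numerically (illustrative; standard library only):

    import math

    def expo_cdf(a, rate=2):
        # P(x <= a) for an exponential distribution: 1 - e^(-rate*a)
        return 1 - math.exp(-rate * a)

    print(1 - expo_cdf(1))  # survival past 1 year: e^(-2) = 0.135...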

232 Expected Value and Variance
All probability distributions are characterized by an expected value and a variance (standard deviation squared).

233 For example, bell-curve (normal) distribution:
Mean () One standard deviation from the mean ()

234 Expected value, or mean If we understand the underlying probability function of a certain phenomenon, then we can make informed decisions based on how we expect x to behave on-average over the long-run…(so called “frequentist” theory of probability). Expected value is just the weighted average or mean (µ) of random variable x. Imagine placing the masses p(x) at the points X on a beam; the balance point of the beam is the expected value of x.

235 Example: expected value
Recall the following probability distribution of ship arrivals: x: 10, 11, 12, 13, 14; P(x): .4, .2, .2, .1, .1. E(x) = 10(.4) + 11(.2) + 12(.2) + 13(.1) + 14(.1) = 11.3

236 Expected value, formally
Discrete case: E(X) = Σ xᵢ p(xᵢ), summed over all possible values. Continuous case: E(X) = ∫ x p(x) dx over the range of x.

237 Empirical Mean is a special case of Expected Value…
Sample mean, for a sample of n subjects: X̄ = Σ xᵢ (1/n). The probability (frequency) of each person in the sample is 1/n.

238 Expected value, formally
Discrete case: E(X) = Σ xᵢ p(xᵢ). Continuous case: E(X) = ∫ x p(x) dx.

239 Extension to continuous case: uniform distribution
[Plot: p(x) = 1 on 0 ≤ x ≤ 1.] E(X) = ∫ x(1) dx from 0 to 1 = x²/2 evaluated from 0 to 1 = ½

240 Symbol Interlude E(X) = µ; these symbols are used interchangeably

241 Expected Value Expected value is an extremely useful concept for good decision- making!

242 Example: the lottery The Lottery (also known as a tax on people who are bad at math…) A certain lottery works by picking 6 numbers from 1 to 49. It costs $1.00 to play the lottery, and if you win, you win $2 million after taxes. If you play the lottery once, what are your expected winnings or losses?

243 Lottery Calculate the probability of winning in 1 try:
"49 choose 6": out of 49 numbers, this is the number of distinct combinations of 6 = 13,983,816, so P(win) = 1/13,983,816 ≈ 7.2 x 10^-8. The probability function (note, it sums to 1.0): x$ = -1 with p(x) ≈ 1; x$ = +2 million with p(x) = 7.2 x 10^-8

244 Expected Value The probability function Expected Value
The probability function: x$ = -1 with p(x) ≈ 1; x$ = +2 million with p(x) = 7.2 x 10^-8. Expected Value: E(X) = P(win)*$2,000,000 + P(lose)*(-$1.00) = 2.0 x 10^6 * 7.2 x 10^-8 - 1 ≈ .144 - 1 = -$.86. Negative expected value is never good! You shouldn't play if you expect to lose money!

245 Expected Value If you play the lottery every week for 10 years, what are your expected winnings or losses? 520 x (-.86) = -$447.20

246 Gambling (or how casinos can afford to give so many free drinks…)
A roulette wheel has the numbers 1 through 36, as well as 0 and 00. If you bet $1 that an odd number comes up, you win or lose $1 according to whether or not that event occurs. If random variable X denotes your net gain, X=1 with probability 18/38 and X= -1 with probability 20/38. E(X) = 1(18/38) – 1 (20/38) = -$.053 On average, the casino wins (and the player loses) 5 cents per game. The casino rakes in even more if the stakes are higher: E(X) = 10(18/38) – 10 (20/38) = -$.53 If the cost is $10 per game, the casino wins an average of 53 cents per game. If 10,000 games are played in a night, that’s a cool $5300.
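Expected value as a tiny Python helper (illustrative), applied to the lottery and roulette examples:

    def expected_value(outcomes):
        # outcomes: list of (payoff, probability) pairs
        return sum(payoff * p for payoff, p in outcomes)

    p_win = 1 / 13_983_816
    print(expected_value([(2_000_000, p_win), (-1, 1 - p_win)]))  # ~ -$0.86
    print(expected_value([(1, 18/38), (-1, 20/38)]))              # ~ -$0.053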

247 **A few notes about Expected Value as a mathematical operator:
If c= a constant number (i.e., not a variable) and X and Y are any random variables… E(c) = c E(cX)=cE(X) E(c + X)=c + E(X) E(X+Y)= E(X) + E(Y)

248 E(c) = c Example: If you cash in soda cans in CA, you always get 5 cents per can. Therefore, there's no randomness. You always expect to (and do) get 5 cents.

249 E(cX)=cE(X)
Example: If the casino charges $10 per game instead of $1, then the casino expects to make 10 times as much on average from the game (See roulette example above!)

250 E(c + X)=c + E(X)
Example, if the casino throws in a free drink worth exactly $5.00 every time you play a game, you always expect to (and do) gain an extra $5.00 regardless of the outcome of the game.

251 E(X+Y)= E(X) + E(Y)
Example: If you play the lottery twice, you expect to lose: -$.86 + -$.86 = -$1.72. NOTE: This works even if X and Y are dependent!! Does not require independence!! Proof left for later…

252 Practice Problem If a disease is fairly rare and the antibody test is fairly expensive, in a resource-poor region, one strategy is to take half of the serum from each sample and pool it with n other halved samples, and test the pooled lot. If the pooled lot is negative, this saves n-1 tests. If it’s positive, then you go back and test each sample individually, requiring n+1 tests total. Suppose a particular disease has a prevalence of 10% in a third-world population and you have 500 blood samples to screen. If you pool 20 samples at a time (25 lots), how many tests do you expect to have to run (assuming the test is perfect!)?  What if you pool only 10 samples at a time? 5 samples at a time?

253 Answer (a) a. Suppose a particular disease has a prevalence of 10% in a third-world population and you have 500 blood samples to screen. If you pool 20 samples at a time (25 lots), how many tests do you expect to have to run (assuming the test is perfect!)? Let X = a random variable that is the number of tests you have to run per lot: E(X) = P(pooled lot is negative)(1) + P(pooled lot is positive)(21) = (.90)^20 (1) + [1-(.90)^20] (21) = 12.2% (1) + 87.8% (21) = 18.56 tests on average per lot. E(total number of tests) = 25*18.56 = 464

254 Answer (b) b. What if you pool only 10 samples at a time?
E(X) = (.90)^10 (1) + [1-(.90)^10] (11) = 35% (1) + 65% (11) = 7.5 tests on average per lot. 50 lots * 7.5 = 375

255 Answer (c) c. 5 samples at a time?
E(X) = (.90)^5 (1) + [1-(.90)^5] (6) = 59% (1) + 41% (6) = 3.05 tests on average per lot. 100 lots * 3.05 = 305
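The pattern generalizes to any pool size; an illustrative Python sketch:

    def expected_tests(total=500, pool=20, prevalence=0.10):
        # expected number of tests to screen `total` samples in pools of `pool`
        p_neg = (1 - prevalence) ** pool  # pooled lot tests negative
        per_lot = p_neg * 1 + (1 - p_neg) * (pool + 1)
        return (total / pool) * per_lot

    for pool in (20, 10, 5):
        print(pool, expected_tests(pool=pool))  # ~464, ~376, ~305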

256 Practice Problem If X is a random integer between 1 and 10, what’s the expected value of X?

257 Answer If X is a random integer between 1 and 10, what's the expected value of X? E(X) = Σ x(1/10) = (1+2+…+10)/10 = 55/10 = 5.5

258 Expected value isn’t everything though…
Take the show “Deal or No Deal” Everyone know the rules? Let’s say you are down to two cases left. $1 and $400,000. The banker offers you $200,000. So, Deal or No Deal?

259 Deal or No Deal… This could really be represented as a probability distribution and a non-random variable: No Deal: x$ = +1 with p(x)=.50, x$ = +$400,000 with p(x)=.50. Deal: x$ = +$200,000 with p(x)=1.0

260 Expected value doesn’t help…
No Deal: E(X) = 1(.50) + 400,000(.50) = $200,000.50. Deal: E(X) = 200,000(1.0) = $200,000. The expected values are essentially identical, so expected value alone can't choose between them.

261 How to decide? Variance! If you take the deal, the variance/standard deviation is 0. If you don’t take the deal, what is average deviation from the mean? What’s your gut guess?

262 Variance/standard deviation
“The average (expected) squared distance (or deviation) from the mean” **We square because squaring has better properties than absolute value. Take square root to get back linear average distance from the mean (=”standard deviation”).

263 Variance, formally Discrete case: σ² = Var(X) = E[(X − μ)²] = Σ (xᵢ − μ)² p(xᵢ) Continuous case: σ² = Var(X) = ∫ (x − μ)² f(x) dx

264 Similarity to empirical variance
The variance of a sample: s² = Σ (xᵢ − x̄)² / (n − 1) Division by n−1 reflects the fact that we have lost a "degree of freedom" (piece of information) because we had to estimate the sample mean before we could estimate the sample variance.

265 Symbol Interlude Var(X) = σ²; these symbols are used interchangeably

266 Variance: Deal or No Deal
Now you examine your personal risk tolerance…
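For intuition, here is a minimal SAS sketch of the "no deal" gamble's spread (our own check, not from the slides):
data _null_;
  mean = .5*1 + .5*400000;                       * = $200,000.50;
  var  = .5*(1 - mean)**2 + .5*(400000 - mean)**2;
  sd   = sqrt(var);                              * about $200,000;
  put mean= sd=;
run;
So the no-deal gamble deviates from its mean by about $200,000 on average, versus $0 for the sure thing.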

267 Practice Problem A roulette wheel has the numbers 1 through 36, as well as 0 and 00. If you bet $1.00 that an odd number comes up, you win or lose $1.00 according to whether or not that event occurs. If X denotes your net gain, X=1 with probability 18/38 and X= -1 with probability 20/38. We already calculated the mean to be −$.053. What's the variance of X?

268 Answer Standard deviation is $.99. Interpretation: On average, you’re either 1 dollar above or 1 dollar below the mean, which is just under zero. Makes sense!
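A minimal SAS check of this answer (variable names are ours):
data _null_;
  mean = 1*(18/38) + (-1)*(20/38);        * expected net gain, about -$.053;
  ex2  = 1**2*(18/38) + (-1)**2*(20/38);  * E(X squared) = 1;
  var  = ex2 - mean**2;                   * about .997;
  sd   = sqrt(var);                       * about $.99;
  put mean= var= sd=;
run;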

269 Handy calculation formula!
Handy calculation formula (if you ever need to calculate by hand!): Var(X) = E[(X − μ)²] = (intervening algebra!) = E(X²) − [E(X)]²

270 Var(X) = E(X − μ)² = E(X²) – [E(X)]² (your calculation formula!)
Proofs (optional!): E(X − μ)² = E(X² – 2μX + μ²) [remember "FOIL"?!] = E(X²) – E(2μX) + E(μ²) [use rules of expected value: E(X+Y) = E(X) + E(Y)] = E(X²) – 2μE(X) + μ² [E(cX) = cE(X); E(c) = c] = E(X²) – 2μ² + μ² [since E(X) = μ] = E(X²) – μ² = E(X²) – [E(X)]²

271 For example, what’s the variance and standard deviation of the roll of a die?
x p(x): 1 1/6; 2 1/6; 3 1/6; 4 1/6; 5 1/6; 6 1/6 (total = 1.0) The mean is 3.5; what's the average (squared) distance from the mean?
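Here is a minimal SAS sketch of the die calculation (our own illustration of the formula above):
data _null_;
  ex = 0; ex2 = 0;
  do x = 1 to 6;
    ex  = ex  + x*(1/6);      * builds E(X) = 3.5;
    ex2 = ex2 + x**2*(1/6);   * builds E(X squared) = 91/6;
  end;
  var = ex2 - ex**2;          * = 35/12, about 2.92;
  sd  = sqrt(var);            * about 1.71;
  put ex= var= sd=;
run;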

272 **A few notes about Variance as a mathematical operator:
If c is a constant (i.e., not a random variable) and X and Y are random variables, then Var(c) = 0 Var(c + X) = Var(X) Var(cX) = c²Var(X) Var(X+Y) = Var(X) + Var(Y) ONLY IF X and Y are independent!!!! {Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y) IF X and Y are not independent}

273 Var(c) = 0 Constants don't vary!

274 Var(c + X) = Var(X)
Adding a constant to every instance of a random variable doesn't change the variability. It just shifts the whole distribution by c. If everybody grew 5 inches suddenly, the variability in the population would still be the same.

276 Var(cX) = c²Var(X)
Multiplying each instance of the random variable by c makes the distribution c times as wide, which corresponds to c² times as much variance (deviation squared). For example, if everyone suddenly became twice as tall, there'd be twice the deviation and 4 times the variance in heights in the population.

277 Var(X+Y)= Var(X) + Var(Y)
Var(X+Y) = Var(X) + Var(Y) ONLY IF X and Y are independent! With two random variables, you have more opportunity for variation, unless they vary together (are dependent, or have covariance): Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y)

278 Example of Var(X+Y)= Var(X) + Var(Y): TPMT
TPMT metabolizes the drugs mercaptopurine, azathioprine, and 6-thioguanine (chemotherapy drugs) People with TPMT-/TPMT+ have reduced levels of activity (10% prevalence) People with TPMT-/TPMT- have no TPMT activity (prevalence 0.3%). They cannot metabolize mercaptopurine, azathioprine, and 6-thioguanine, and risk bone marrow toxicity if given these drugs.

279 TPMT activity by genotype
Weinshilboum R. Drug Metab Dispos 2001 Apr;29(4 Pt 2):601-5

280 TPMT activity by genotype
The variability in TPMT activity is much higher in wild-types than heterozygotes. Weinshilboum R. Drug Metab Dispos 2001 Apr;29(4 Pt 2):601-5

281 TPMT activity by genotype
There is variability in expression from each wild-type allele. With two copies of the good gene present, there's "twice as much" variability. No variability in expression here, since there's no working gene. Weinshilboum R. Drug Metab Dispos 2001 Apr;29(4 Pt 2):601-5

282 Practice Problem Find the variance and standard deviation for the number of ships to arrive at the harbor (recall that the mean is 11.3). x: 10 11 12 13 14 P(x): .4 .2 .2 .1 .1

283 Answer: variance and std dev
x²: 100 121 144 169 196 P(x): .4 .2 .2 .1 .1 Var(x) = E(x²) − [E(x)]² = 129.5 − (11.3)² = 1.81, so SD = 1.35 Interpretation: On an average day, we expect 11.3 ships to arrive in the harbor, plus or minus 1.35. This gives you a feel for what would be considered a usual day!
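A minimal SAS sketch of this calculation; we assume the probabilities .4, .2, .2, .1, .1, which are the values consistent with the stated mean of 11.3:
data _null_;
  array xs[5] _temporary_ (10 11 12 13 14);
  array ps[5] _temporary_ (.4 .2 .2 .1 .1);  * assumed probabilities;
  ex = 0; ex2 = 0;
  do i = 1 to 5;
    ex  = ex  + xs[i]*ps[i];
    ex2 = ex2 + xs[i]**2*ps[i];
  end;
  var = ex2 - ex**2;   * = 1.81;
  sd  = sqrt(var);     * about 1.35;
  put ex= var= sd=;
run;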

284 Practice Problem You toss a coin 100 times. What’s the expected number of heads? What’s the variance of the number of heads?

285 Answer: expected value
Intuitively, we'd probably all agree that we expect around 50 heads, right? Another way to show this: think of tossing 1 coin. E(X = number of heads on 1 toss) = (1)P(heads) + (0)P(tails) = 1(.5) + 0 = .5 If we do this 100 times, we're looking for the sum of 100 tosses, where we assign 1 for a heads and 0 for a tails (these are 100 "independent, identically distributed (i.i.d.)" events). E(X1 + X2 + X3 + … + X100) = E(X1) + E(X2) + E(X3) + … + E(X100) = 100E(X1) = 50

286 Answer: variance What's the variability, though? More tricky. But, again, we could do this for 1 coin and then use our rules of variance. Think of tossing 1 coin. E(X² = number of heads squared) = 1²P(heads) + 0²P(tails) = 1(.5) + 0 = .5 Var(X) = E(X²) − [E(X)]² = .5 − .25 = .25 Then, using our rule: Var(X+Y) = Var(X) + Var(Y) (coin tosses are independent!) Var(X1 + X2 + X3 + … + X100) = Var(X1) + Var(X2) + … + Var(X100) = 100Var(X1) = 100(.25) = 25 SD(X) = 5 Interpretation: When we toss a coin 100 times, we expect to get 50 heads plus or minus 5.

287 Or use computer simulation…
Flip coins virtually! Flip a virtual coin 100 times; count the number of heads. Repeat this over and over again a large number of times (we’ll try 30,000 repeats!) Plot the 30,000 results.
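A minimal SAS sketch of this simulation (the seed and names are ours):
data cointoss;
  call streaminit(1234);                 * arbitrary seed, for reproducibility;
  do rep = 1 to 30000;
    heads = rand('binomial', 0.5, 100);  * heads in 100 fair tosses;
    output;
  end;
run;
proc means data=cointoss mean std;       * mean should be near 50, std near 5;
  var heads;
run;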

288 Coin tosses… Mean = 50 Std. dev = 5 Follows a normal distribution
95% of the time, we get between 40 and 60 heads…

289 Covariance: joint probability
The covariance measures the strength of the linear relationship between two variables The covariance: cov(X,Y) = E[(X − μX)(Y − μY)]

290 The Sample Covariance The sample covariance: cov(x,y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

291 Interpreting Covariance
Covariance between two random variables: cov(X,Y) > 0: X and Y are positively correlated cov(X,Y) < 0: X and Y are inversely correlated cov(X,Y) = 0: X and Y are uncorrelated (independent variables have zero covariance, but zero covariance alone does not guarantee independence)

292 The binomial and Poisson distributions
Examples of discrete probability distributions: The binomial and Poisson distributions

293 Binomial Probability Distribution
A fixed number of observations (trials), n e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed A binary random variable e.g., head or tail in each toss of a coin; defective or not defective light bulb Generally called “success” and “failure” Probability of success is p, probability of failure is 1 – p Constant probability for each observation e.g., Probability of getting a tail is the same each time we toss the coin

294 Binomial example Take the example of 5 coin tosses. What’s the probability that you flip exactly 3 heads in 5 coin tosses?

295 Binomial distribution
Solution: One way to get exactly 3 heads: HHHTT What's the probability of this exact arrangement? P(heads) × P(heads) × P(heads) × P(tails) × P(tails) = (1/2)^3 × (1/2)^2 Another way to get exactly 3 heads: THHHT Probability of this exact outcome = (1/2)^1 × (1/2)^3 × (1/2)^1 = (1/2)^3 × (1/2)^2

296 Binomial distribution
In fact, (1/2)^3 × (1/2)^2 is the probability of each unique outcome that has exactly 3 heads and 2 tails. So, the overall probability of 3 heads and 2 tails is: (1/2)^3 × (1/2)^2 + (1/2)^3 × (1/2)^2 + (1/2)^3 × (1/2)^2 + ….. for as many unique arrangements as there are—but how many are there??

297 5C3 = 5!/(3!2!) = 10
Outcome / Probability: HHHTT, HHTHT, HHTTH, HTHHT, HTHTH, HTTHH, THHHT, THHTH, THTHH, TTHHH; each unique outcome has the same probability, (1/2)^3 × (1/2)^2. There are 5C3 = 5!/(3!2!) = 10 ways to arrange 3 heads in 5 trials, so: 10 arrangements × (1/2)^3 × (1/2)^2

298 P(3 heads and 2 tails) = 5C3 × P(heads)^3 × P(tails)^2 = 10 × (1/2)^5 = 10/32 = .3125

299 Binomial distribution function: X= the number of heads tossed in 5 coin tosses
[Histogram: p(x) plotted against x = number of heads, 0 through 5]

300 Example 2 As voters exit the polls, you ask a representative random sample of 6 voters if they voted for proposition 100. If the true percentage of voters who vote for the proposition is 55.1%, what is the probability that, in your sample, exactly 2 voted for the proposition and 4 did not?

301 Solution:
Outcome / Probability: YYNNNN = (.551)^2 × (.449)^4 NYYNNN = (.449)^1 × (.551)^2 × (.449)^3 = (.551)^2 × (.449)^4 NNYYNN = (.449)^2 × (.551)^2 × (.449)^2 = (.551)^2 × (.449)^4 NNNYYN = (.449)^3 × (.551)^2 × (.449)^1 = (.551)^2 × (.449)^4 NNNNYY = (.449)^4 × (.551)^2 = (.551)^2 × (.449)^4 … There are 6C2 = 15 ways to arrange 2 yes votes among 6 voters: 15 arrangements × (.551)^2 × (.449)^4 P(2 yes votes exactly) = 15 × (.551)^2 × (.449)^4 = 18.5%

302 Binomial distribution, generally
Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly X "successes" is P(X) = nCX p^X (1 − p)^(n−X), where n = number of trials, X = # successes out of n trials, p = probability of success, and 1 − p = probability of failure

303 Definitions: Binomial
Binomial: Suppose that n independent experiments, or trials, are performed, where n is a fixed number, and that each experiment results in a "success" with probability p and a "failure" with probability 1−p. The total number of successes, X, is a binomial random variable with parameters n and p. We write: X ~ Bin (n, p) {reads: "X is distributed binomially with parameters n and p"} And the probability that X=r (i.e., that there are exactly r successes) is: P(X=r) = nCr p^r (1−p)^(n−r)

304 Definitions: Bernoulli
Bernoulli trial: If there is only 1 trial with probability of success p and probability of failure 1−p, this is called a Bernoulli distribution (a special case of the binomial with n=1). Probability of success: P(X=1) = p Probability of failure: P(X=0) = 1−p

305 Binomial distribution: example
If I toss a coin 20 times, what's the probability of getting exactly 10 heads? P(X=10) = 20C10 (1/2)^10 (1/2)^10 = 184,756 × (1/2)^20 = .176

306 Binomial distribution: example
If I toss a coin 20 times, what's the probability of getting 2 or fewer heads? P(X≤2) = [20C0 + 20C1 + 20C2] × (1/2)^20 = (1 + 20 + 190)/1,048,576 = .0002
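Both answers can be checked with SAS's built-in binomial functions (a minimal sketch):
data _null_;
  exactly10 = pdf('binomial', 10, 0.5, 20);  * P(X = 10), about .176;
  atmost2   = cdf('binomial',  2, 0.5, 20);  * P(X <= 2), about .0002;
  put exactly10= atmost2=;
run;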

307 **All probability distributions are characterized by an expected value and a variance:
If X follows a binomial distribution with parameters n and p: X ~ Bin (n, p) Then: μx = E(X) = np σx² = Var(X) = np(1−p) σx = SD(X) = √(np(1−p)) Note: the variance always lies between 0 and .25n, since p(1−p) reaches its maximum of .25 at p = .5

308 Characteristics of the Bernoulli distribution
For a Bernoulli trial (n=1): E(X) = p Var(X) = p(1−p)

309 Variance Proof (optional!)
For Y ~ Bernoulli(p), with Y=1 if yes and Y=0 if no: Var(Y) = E(Y²) − [E(Y)]² = p − p² = p(1−p) For X ~ Bin(N, p): X is the sum of N independent Bernoulli trials, so Var(X) = Np(1−p)

310 Recall coin toss example
X= number of heads in 100 tosses of a coin X ~ Bin (100, .5) E(x) = 100*.5=50 Var(X) = 100*.5*.5 = 25 SD(X) = 5

311 Things that follow a binomial distribution…
Cohort study (or cross-sectional): The number of exposed individuals in your sample that develop the disease The number of unexposed individuals in your sample that develop the disease Case-control study: The number of cases that have had the exposure The number of controls that have had the exposure

312 Practice problems 1. You are performing a cohort study. If the probability of developing disease in the exposed group is .05 for the study duration, then if you sample (randomly) 500 exposed people, how many do you expect to develop the disease? Give a margin of error (+/- 1 standard deviation) for your estimate. 2. What’s the probability that at most 10 exposed people develop the disease?

313 Answer X ~ binomial (500, .05) E(X) = 500 (.05) = 25
1. You are performing a cohort study. If the probability of developing disease in the exposed group is .05 for the study duration, then if you sample (randomly) 500 exposed people, how many do you expect to develop the disease? Give a margin of error (+/- 1 standard deviation) for your estimate. X ~ binomial (500, .05) E(X) = 500(.05) = 25 Var(X) = 500(.05)(.95) = 23.75 StdDev(X) = square root of 23.75 = 4.87 Answer: 25 ± 4.87

314 Answer 2. What's the probability that at most 10 exposed subjects develop the disease? This is asking for a CUMULATIVE PROBABILITY: the probability of 0 getting the disease or 1 or 2 or 3 or 4 or up to 10. P(X≤10) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + … + P(X=10) (we'll learn how to approximate this long sum next week)

315 A brief distraction: Pascal’s Triangle Trick
You'll rarely calculate the binomial by hand. However, it is good to know how to… Pascal's Triangle Trick for calculating binomial coefficients Recall from math in your past that Pascal's Triangle is used to get the coefficients for binomial expansion… For example, to expand (p + q)^5, the powers follow a set pattern: p^5 + p^4q^1 + p^3q^2 + p^2q^3 + p^1q^4 + q^5 But what are the coefficients? Use Pascal's Magic Triangle…

316 Pascal's Triangle Edges are all 1's
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
Add the two numbers in the row above to get the number below, e.g.: 3+1=4; 5+10=15. To get the coefficients for expanding to the 5th power, use the row that starts 1 5: (p + q)^5 = 1p^5 + 5p^4q^1 + 10p^3q^2 + 10p^2q^3 + 5p^1q^4 + 1q^5

317 Same coefficients for X~Bin(5,p)
For example, X = # heads in 5 coin tosses: P(X=0) = 1q^5, P(X=1) = 5p^1q^4, P(X=2) = 10p^2q^3, P(X=3) = 10p^3q^2, P(X=4) = 5p^4q^1, P(X=5) = 1p^5 The coefficients come from line 5 of Pascal's triangle!

318 Relationship between binomial probability distribution and binomial expansion
(q + p)^5 = 1q^5 + 5q^4p^1 + 10q^3p^2 + 10q^2p^3 + 5q^1p^4 + 1p^5, and term by term these are P(X=0), P(X=1), P(X=2), P(X=3), P(X=4), P(X=5); the whole expansion sums to 1

319 Practice problems If the probability of being a smoker among a group of cases with lung cancer is .6, what’s the probability that in a group of 8 cases you have less than 2 smokers? More than 5? What are the expected value and variance of the number of smokers?

320 Answer [Pascal's triangle, extended down to the row beginning 1 8, supplies the 8Cx coefficients]

321 Answer, continued [Histogram of the Bin(8, .6) probabilities for x = 0 through 8]

322 Answer, continued
P(>5) = P(6) + P(7) + P(8) = .21 + .09 + .0168 = .3168 P(<2) = P(0) + P(1) = .00066 + .0079 = .0085 E(X) = 8(.6) = 4.8 Var(X) = 8(.6)(.4) = 1.92 StdDev(X) = 1.39
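These tail probabilities are quick to verify in SAS (a minimal sketch):
data _null_;
  less2 = cdf('binomial', 1, 0.6, 8);      * P(X < 2) = P(X <= 1), about .0085;
  more5 = 1 - cdf('binomial', 5, 0.6, 8);  * P(X > 5), about .32;
  put less2= more5=;
run;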

323 Practice problem If Stanford tickets cars in the medical center 'A' lot approximately twice a week (2 of 5 weekdays), and you want to park in the 'A' lot twice a week for the year, are you financially better off buying a parking sticker (which costs $726 for the year) or parking illegally (tickets are $35 each)?

324 Answer If Stanford tickets in the medical center ‘A’ lot approximately twice a week (2/5 weekdays), if you want to park in the ‘A’ lot twice a week for the year, are you financially better off buying a parking sticker (which costs $726 for the year) or parking illegally (tickets are $35 each)? Use Binomial Let X be a random variable that is the number of tickets you receive in a year. Assuming 2 weeks vacation, there are 50x2 days (twice a week for 50 weeks) you’ll be parking illegally. p=.40 is the chance of receiving a ticket on a given day: X~bin (100, .40) E(X) = 100x.40 = 40 tickets expected (with std dev of about 5) 40 x $35 = $1400 in tickets (+/- $200); better to buy the sticker!

325 Multinomial distribution (beyond the scope of this course)
The multinomial is a generalization of the binomial. It is used when there are more than 2 possible outcomes (for ordinal or nominal, rather than binary, random variables). Instead of partitioning n trials into 2 outcomes (yes with probability p / no with probability 1−p), you are partitioning n trials into 3 or more outcomes (with probabilities p1, p2, p3, …) General formula for 3 outcomes: P(X1=x1, X2=x2, X3=x3) = n!/(x1! x2! x3!) × p1^x1 × p2^x2 × p3^x3

326 Multinomial example Specific Example: if you are randomly choosing 8 people from an audience that contains 50% democrats, 30% republicans, and 20% green party, what's the probability of choosing exactly 4 democrats, 3 republicans, and 1 green party member? 8!/(4! 3! 1!) × (.5)^4 × (.3)^3 × (.2)^1 = 280 × .0625 × .027 × .2 ≈ .0945 You can see that it gets hard to calculate very fast! The multinomial has many uses in genetics, where a person may have 1 of many possible alleles (that occur with certain probabilities in a given population) at a gene locus.

327 Introduction to the Poisson Distribution
Poisson distribution is for counts—if events happen at a constant rate over time, the Poisson distribution gives the probability of X number of events occurring in time T.

328 Poisson Mean and Variance
For a Poisson random variable, the variance and mean are the same! Mean: μ = λ Variance: σ² = λ, so the standard deviation is σ = √λ where λ = expected number of hits in a given time period

329 Poisson Distribution, example
The Poisson distribution models counts, such as the number of new cases of SARS that occur in women in New England next month. The distribution tells you the probability of all possible numbers of new cases, from 0 to infinity. If X = # of new cases next month and X ~ Poisson (λ), then the probability that X=k (a particular count) is: P(X=k) = (λ^k)(e^−λ)/k!

330 Example For example, if new cases of West Nile Virus in New England are occurring at a rate of about 2 per month, then these are the probabilities that: 0,1, 2, 3, 4, 5, 6, to 1000 to 1 million to… cases will occur in New England in the next month:

331 Poisson Probability table
X P(X)
0 (2^0)(e^−2)/0! = .135
1 (2^1)(e^−2)/1! = .27
2 (2^2)(e^−2)/2! = .27
3 (2^3)(e^−2)/3! = .18
4 (2^4)(e^−2)/4! = .09
5 (2^5)(e^−2)/5! = .036

332 Example: Poisson distribution
Suppose that a rare disease has an incidence of 1 in 1000 person-years. Assuming that members of the population are affected independently, find the probability of k cases in a population of 10,000 (followed over 1 year) for k = 0, 1, 2. The expected value (mean) λ = .001 × 10,000 = 10; 10 new cases are expected in this population per year

333 more on Poisson… “Poisson Process” (rates)
Note that the Poisson parameter λ can be given as the mean number of events that occur in a defined time period OR, equivalently, λ can be given as a rate, such as λ = 2/month (2 events per 1 month), that must be multiplied by t = time (called a "Poisson Process"): X ~ Poisson (λt) E(X) = λt Var(X) = λt

334 Example For example, if new cases of West Nile in New England are occurring at a rate of about 2 per month, then what's the probability that exactly 4 cases will occur in the next 3 months? X ~ Poisson (λt = 2/month × 3 months = 6) P(X=4) = (6^4)(e^−6)/4! = .134 Exactly 6 cases? P(X=6) = (6^6)(e^−6)/6! = .161
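The same numbers fall out of SAS's Poisson function (a minimal sketch; names are ours):
data _null_;
  lambda_t = 2*3;                    * rate of 2 per month over t = 3 months;
  p4 = pdf('poisson', 4, lambda_t);  * P(X = 4), about .134;
  p6 = pdf('poisson', 6, lambda_t);  * P(X = 6), about .161;
  put p4= p6=;
run;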

335 Practice problems 1a. If calls to your cell phone are a Poisson process with a constant rate λ = 2 calls per hour, what's the probability that, if you forget to turn your phone off in a 1.5 hour movie, your phone rings during that time? 1b. How many phone calls do you expect to get during the movie?

336 Answer P(X≥1)=1 – .05 = 95% chance
1a. If calls to your cell phone are a Poisson process with a constant rate =2 calls per hour, what’s the probability that, if you forget to turn your phone off in a 1.5 hour movie, your phone rings during that time? X ~ Poisson (=2 calls/hour) P(X≥1)=1 – P(X=0) P(X≥1)=1 – .05 = 95% chance 1b. How many phone calls do you expect to get during the movie? E(X) = t = 2(1.5) = 3

337 Calculating probabilities in SAS
For the binomial probability distribution function: P(X=C) = pdf('binomial', C, p, N) For the binomial cumulative distribution function: P(X≤C) = cdf('binomial', C, p, N) For the Poisson probability distribution function: P(X=C) = pdf('poisson', C, λ) For the Poisson cumulative distribution function: P(X≤C) = cdf('poisson', C, λ)

338 SAS examples
data _null_;
  TwoSixes = pdf('binomial', 8, .0278, 100);
  put TwoSixes;
run;
For the other three probabilities, swap in: TwoSixes = cdf('binomial', 8, .0278, 100); or TwoSixes = pdf('poisson', 8, 2.78); or TwoSixes = cdf('poisson', 8, 2.78);

339 The normal and standard normal
Examples of continuous probability distributions: The normal and standard normal

340 The Normal Distribution
[Normal curves: f(X) plotted against X] Changing μ shifts the distribution left or right. Changing σ increases or decreases the spread.

341 The Normal Distribution: as mathematical function (pdf)
This is a bell-shaped curve with different centers and spreads depending on μ and σ Note the constants: π = 3.14159… and e = 2.71828…

342 The Normal PDF It's a probability function, so no matter what the values of μ and σ, it must integrate to 1!

343 Normal distribution is defined by its mean and standard dev.
E(X)= = Var(X)=2 = Standard Deviation(X)=

344 **The beauty of the normal curve:
No matter what μ and σ are, the area between μ−σ and μ+σ is about 68%; the area between μ−2σ and μ+2σ is about 95%; and the area between μ−3σ and μ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.

345 68-95-99.7 Rule 68% of the data 95% of the data 99.7% of the data 345
SAY: within 1 standard deviation either way of the mean; within 2 standard deviations either way of the mean; within 3 standard deviations either way of the mean. WORKS FOR ALL NORMAL CURVES, NO MATTER HOW SKINNY OR FAT.

346 Rule in Math terms…

347 How good is rule for real data?
Check some example data: The mean of the weight of the women = 127.8 The standard deviation (SD) = 15.5

348 68% of 120 = .68 × 120 ≈ 82 runners In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean (112.3 to 143.3).

349 95% of 120 = .95 × 120 ≈ 114 runners In fact, 115 runners fall within 2 SDs of the mean (96.8 to 158.8).

350 99.7% of 120 = .997 × 120 = 119.6, i.e., essentially all 120 runners In fact, all 120 runners fall within 3 SDs of the mean (81.3 to 174.3).

351 Example Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50; then: 68% of students will have scores between 450 and 550 95% will be between 400 and 600 99.7% will be between 350 and 650

352 Example
What if you wanted to know the math SAT score corresponding to the 90th percentile (= 90% of students are lower)? P(X≤Q) = .90; solve for Q? … Yikes! BUT…

353 The Standard Normal (Z): “Universal Currency”
The formula for the standardized normal probability density function is p(Z) = (1/√(2π)) e^(−Z²/2)

354 The Standard Normal Distribution (Z)
All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation: Z = (X − μ)/σ Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate! Even better, computers now do all the integration.

355 Comparing X and Z units
[X scale: 100 and 200, for a normal with μ = 100, σ = 50, correspond to Z = 0 and 2.0 on the standard normal scale (μ = 0, σ = 1)]

356 Example For example: What's the probability of getting a math SAT score of 575 or less, if μ = 500 and σ = 50? Z = (575 − 500)/50 = 1.5, i.e., a score of 575 is 1.5 standard deviations above the mean. Yikes, an integral! But looking up Z = 1.5 in the standard normal chart (or entering it into SAS) is no problem! The answer is .9332.

357 Practice problem If birth weights in a population are normally distributed with a mean of 109 oz and a standard deviation of 13 oz: a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random? b. What is the chance of obtaining a birth weight of 120 or lighter?

358 Answer a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random? Z = (141 − 109)/13 = 2.46 From the chart or SAS, a Z of 2.46 corresponds to a right tail (greater than) area of: P(Z≥2.46) = 1 − .9931 = .0069, or .69%

359 Answer b. What is the chance of obtaining a birth weight of 120 or lighter? Z = (120 − 109)/13 = .85 From the chart or SAS, a Z of .85 corresponds to a left tail area of: P(Z≤.85) = .8023 = 80.23%

360 Looking up probabilities in the standard normal table
What is the area to the left of Z = 1.51 in a standard normal curve? Area = .9345, or 93.45%.

361 Normal probabilities in SAS
data _null_; theArea=probnorm(1.5); put theArea; run; And if you wanted to go the other direction (i.e., from the area to the Z score (called the so-called “Probit” function  data _null_; theZValue=probit(.93); put theZValue;   The “probnorm(Z)” function gives you the probability from negative infinity to Z (here 1.5) in a standard normal curve. The “probit(p)” function gives you the Z-value that corresponds to a left-tail area of p (here .93) from a standard normal curve. The probit function is also known as the inverse standard normal function.

362 Probit function: the inverse
Probit(area) = Z: gives the Z-value that goes with the probability you want. For example, recall the SAT math scores example. What's the score that corresponds to the 90th percentile? In the table, find the Z-value that corresponds to an area of .90: Z = 1.28. Or use SAS:
data _null_;
  theZValue = probit(.90);
  put theZValue;
run;
If Z = 1.28, convert back to the raw SAT score: 1.28 = (X − 500)/50, so X = 500 + 1.28(50) = 564 (1.28 standard deviations above the mean!)
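The whole conversion can also be done in one step (a minimal sketch):
data _null_;
  score = 500 + probit(0.90)*50;  * 90th-percentile math SAT score, about 564;
  put score=;
run;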

363 Are my data “normal”? Not all continuous random variables are normally distributed!! It is important to evaluate how well the data are approximated by a normal distribution 363

364 Are my data normally distributed?
Look at the histogram! Does it appear bell shaped? Compute descriptive summary measures—are mean, median, and mode similar? Do 2/3 of observations lie within 1 std dev of the mean? Do 95% of observations lie within 2 std dev of the mean? Look at a normal probability plot—is it approximately linear? Run tests of normality (such as Kolmogorov-Smirnov). But, be cautious, highly influenced by sample size! 364

365 Data from our class… Median = 6 Mean = 7.1 Mode = 0 SD = 6.8
Range = 0 to 24 (= 3.5 σ)

366 Data from our class… Median = 5 Mean = 5.4 Mode = none SD = 1.8
Range = 2 to 9 (~ 4 σ)

367 Data from our class… Median = 3 Mean = 3.4 Mode = 3 SD = 2.5
Range = 0 to 12 (~ 5 σ)

368 Data from our class… Median = 7:00 Mean = 7:04 Mode = 7:00 SD = :55
Range = 5:30 to 9:00 (~ 4 σ)

369 Data from our class… 7.1 +/- 6.8 = 0.3 – 13.9

370 Data from our class… 7.1 +/- 2*6.8 = 0 – 20.7

371 Data from our class… 7.1 +/- 3*6.8 = 0 – 27.5

372 Data from our class… 5.4 +/- 1.8 = 3.6 – 7.2

373 Data from our class… 5.4 +/- 2*1.8 = 1.8 – 9.0 1.8 9.0

374 Data from our class… 5.4 +/- 3*1.8 = 0 – 10.8

375 Data from our class… 3.4 +/- 2.5 = 0.9 – 5.9

376 Data from our class… 3.4 +/- 2*2.5 = 0 – 8.4

377 Data from our class… 3.4 +/- 3*2.5 = 0 – 10.9

378 Data from our class… 7:04 +/- 0:55 = 6:09 – 7:59

379 Data from our class… 7:04 +/- 2*0:55 = 5:14 – 8:54

380 Data from our class… 7:04 +/- 3*0:55 = 4:19 – 9:49

381 The Normal Probability Plot
Order the data. Find the corresponding standardized normal quantile values. Plot the observed data values against the normal quantile values. Evaluate the plot for evidence of linearity.

382 Normal probability plot coffee…
Right-Skewed! (concave up)

383 Normal probability plot love of writing…
Neither right-skewed nor left-skewed, but big gap at 6.

384 Norm prob. plot Exercise…
Right-Skewed! (concave up)

385 Norm prob. plot Wake up time
Closest to a straight line…

386 Formal tests for normality
Results: Coffee: Strong evidence of non- normality (p<.01) Writing love: Moderate evidence of non- normality (p=.01) Exercise: Weak to no evidence of non- normality (p>.10) Wakeup time: No evidence of non- normality (p>.25)

387 Normal approximation to the binomial
When you have a binomial distribution where n is large and p is middle-of-the road (not too small, not too big, closer to .5), then the binomial starts to look like a normal distribution in fact, this doesn’t even take a particularly large n Recall: What is the probability of being a smoker among a group of cases with lung cancer is .6, what’s the probability that in a group of 8 cases you have less than 2 smokers?

388 Normal approximation to the binomial
When you have a binomial distribution where n is large and p isn’t too small (rule of thumb: mean>5), then the binomial starts to look like a normal distribution   Recall: smoking example… 1 4 5 2 3 6 7 8 .27 Starting to have a normal shape even with fairly small n. You can imagine that if n got larger, the bars would get thinner and thinner and this would look more and more like a continuous function, with a bell curve shape. Here np=4.8.

389 Normal approximation to binomial
What is the probability of fewer than 2 smokers? Exact binomial probability (from before) = .00066 + .0079 = .0085 Normal approximation probability: μ = 4.8, σ = 1.39 Z = (2 − 4.8)/1.39 = −2.0, so P(Z < −2.0) = .022

390 A little off, but in the right ballpark… we could also use the value to the left of 1.5 (as we really wanted to know "less than but not including 2"; this is called the "continuity correction"): Z = (1.5 − 4.8)/1.39 = −2.37, and P(Z ≤ −2.37) = .0089, a fairly good approximation of the exact probability

391 Practice problem 1. You are performing a cohort study. If the probability of developing disease in the exposed group is .25 for the study duration, then if you sample (randomly) 500 exposed people, What’s the probability that at most 120 people develop the disease?

392 Answer By hand (yikes!):
By hand (yikes!): P(X≤120) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + … + P(X=120) OR use SAS:
data _null_;
  Cohort = cdf('binomial', 120, .25, 500);
  put Cohort;
run;
OR use the normal approximation: μ = np = 500(.25) = 125 and σ² = np(1−p) = 93.75, so σ = 9.68 Z = (120 − 125)/9.68 = −.52, and P(Z < −.52) = .3015
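A minimal SAS sketch putting the exact answer and the normal approximation side by side:
data _null_;
  exact  = cdf('binomial', 120, .25, 500);  * exact cumulative binomial;
  approx = probnorm((120 - 125)/9.68);      * normal approximation, about .30;
  put exact= approx=;
run;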

393 Proportions… The binomial distribution forms the basis of statistics for proportions. A proportion is just a binomial count divided by n. For example, if we sample 200 cases and find 60 smokers, X=60 but the observed proportion=.30. Statistics for proportions are similar to binomial counts, but differ by a factor of n.

394 Stats for proportions
For a binomial count X: μ = np and σ² = np(1−p). For a proportion p̂ = X/n: μ = p and σ² = p(1−p)/n; each differs by a factor of n. P-hat (p̂) stands for "sample proportion."

395 It all comes back to Z… Statistics for proportions are based on a normal distribution, because the binomial can be approximated as normal if np>5

396 Statistical inference: CLT, confidence intervals, p-values

397 Statistical Inference The process of making guesses about the truth from a sample.
Sample statistics *hat notation ^ is often used to indicate "estimate" Truth (not observable) Sample (observation) Population parameters Make guesses about the whole population

398 Statistics vs. Parameters
Sample Statistic – any summary measure calculated from data; e.g., could be a mean, a difference in means or proportions, an odds ratio, or a correlation coefficient E.g., the mean vitamin D level in a sample of 100 men is 63 nmol/L E.g., the correlation coefficient between vitamin D and cognitive function in the sample of 100 men is 0.15 Population parameter – the true value/true effect in the entire population of interest E.g., the true mean vitamin D in all middle-aged and older European men is 62 nmol/L E.g., the true correlation between vitamin D and cognitive function in all middle-aged and older European men is 0.15 398

399 Examples of Sample Statistics:
Single population mean Single population proportion Difference in means (ttest) Difference in proportions (Z-test) Odds ratio/risk ratio Correlation coefficient Regression coefficient It turns out that if you were to go out and sample many, many times, most sample statistics that you could calculate would follow a normal distribution. What are the 2 parameters (from last time) that define any normal distribution? Remember that a normal curve is characterized by two parameters, a mean and a variability (SD) What do you think the mean value of a sample statistic would be? The standard deviation? Remember standard deviation is natural variability of the population Standard error can be standard error of the mean or standard error of the odds ratio or standard error of the difference of 2 means, etc. The standard error of any sample statistic. 399

400 Example 1: cognitive function and vitamin D
Hypothetical data loosely based on [1]; cross- sectional study of 100 middle-aged and older European men. Estimation: What is the average serum vitamin D in middle-aged and older European men? Sample statistic: mean vitamin D levels Hypothesis testing: Are vitamin D levels and cognitive function correlated? Sample statistic: correlation coefficient between vitamin D and cognitive function, measured by the Digit Symbol Substitution Test (DSST). 1. Lee DM, Tajar A, Ulubaev A, et al. Association between 25-hydroxyvitamin D levels and cognitive performance in middle-aged and older European men. J Neurol Neurosurg Psychiatry Jul;80(7):722-9.

401 Distribution of a trait: vitamin D
Right-skewed! Mean = 63 nmol/L Standard deviation = 33 nmol/L

402 Distribution of a trait: DSST
Normally distributed Mean = 28 points Standard deviation = 10 points

403 Distribution of a statistic…
Statistics follow distributions too… But the distribution of a statistic is a theoretical construct. Statisticians ask a thought experiment: how much would the value of the statistic fluctuate if one could repeat a particular study over and over again with different samples of the same size? By answering this question, statisticians are able to pinpoint exactly how much uncertainty is associated with a given statistic. 403

404 Distribution of a statistic
Two approaches to determine the distribution of a statistic: 1. Computer simulation Repeat the experiment over and over again virtually! More intuitive; can directly observe the behavior of statistics. 2. Mathematical theory Proofs and formulas! More practical; use formulas to solve problems.

405 Example of computer simulation…
How many heads come up in 100 coin tosses? Flip coins virtually Flip a coin 100 times; count the number of heads. Repeat this over and over again a large number of times (we’ll try 30,000 repeats!) Plot the 30,000 results.

406 Coin tosses… Conclusions:
We usually get between 40 and 60 heads when we flip a coin 100 times. It’s extremely unlikely that we will get 30 heads or 70 heads (didn’t happen in 30,000 experiments!).

407 Distribution of the sample mean, computer simulation…
1. Specify the underlying distribution of vitamin D in all European men aged 40 to 79. Right-skewed Standard deviation = 33 nmol/L True mean = 62 nmol/L (this is arbitrary; does not affect the distribution) 2. Select a random sample of 100 virtual men from the population. 3. Calculate the mean vitamin D for the sample. 4. Repeat steps (2) and (3) a large number of times (say 1000 times). 5. Explore the distribution of the means. 407

408 Distribution of mean vitamin D (a sample statistic)
Normally distributed! Surprise! Mean= 62 nmol/L (the true mean) Standard deviation = 3.3 nmol/L

409 Distribution of mean vitamin D (a sample statistic)
Normally distributed (even though the trait is right-skewed!) Mean = true mean Standard deviation = 3.3 nmol/L The standard deviation of a statistic is called a standard error The standard error of a mean = σ/√n

410 If I increase the sample size to n=400…
Standard error = 1.7 nmol/L

411 If I increase the variability of vitamin D (the trait) to SD=40…
Standard error = 4.0 nmol/L
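All three standard errors follow directly from σ/√n; a minimal SAS check:
data _null_;
  se1 = 33/sqrt(100);  * = 3.3, the original simulation;
  se2 = 33/sqrt(400);  * about 1.7, with n raised to 400;
  se3 = 40/sqrt(100);  * = 4.0, with the more variable trait;
  put se1= se2= se3=;
run;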

412 Mathematical Theory… The Central Limit Theorem!
If all possible random samples, each of size n, are taken from any population with a mean μ and a standard deviation σ, the sampling distribution of the sample means (averages) will: 1. have mean μx̄ = μ 2. have standard deviation σx̄ = σ/√n 3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n). It all comes back to Z!

413 Symbol Check
μx̄: the mean of the sample means. σx̄: the standard deviation of the sample means; also called "the standard error of the mean."

414 Mathematical Proof (optional!)
If X is a random variable from any distribution with known mean, E(X), and variance, Var(X), then the expected value and variance of the average of n observations of X are: E(X̄) = E(X) and Var(X̄) = Var(X)/n

415 Computer simulation of the CLT: (this is what we will do in lab next Wednesday!)
1. Pick any probability distribution and specify a mean and standard deviation. 2. Tell the computer to randomly generate observations from that probability distribution E.g., the computer is more likely to spit out values with high probabilities 3. Plot the "observed" values in a histogram. 4. Next, tell the computer to randomly generate averages-of-2 (randomly pick 2 and take their average) from that probability distribution. Plot "observed" averages in histograms. 5. Repeat for averages-of-10, and averages-of-100.
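For instance, here is a minimal SAS sketch of the uniform case (names and seed are ours):
data clt;
  call streaminit(42);                  * arbitrary seed;
  do rep = 1 to 1000;
    total = 0;
    do i = 1 to 100;
      total = total + rand('uniform');  * one draw from Uniform[0,1];
    end;
    avg100 = total/100;                 * one average-of-100;
    output;
  end;
run;
proc univariate data=clt;
  * the histogram should look bell-shaped, centered near .5;
  var avg100;
  histogram;
run;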

416 Uniform on [0,1]: average of 1 (original distribution)

417 Uniform: 1000 averages of 2

418 Uniform: 1000 averages of 5

419 Uniform: 1000 averages of 100

420 ~Exp(1): average of 1 (original distribution)

421 ~Exp(1): 1000 averages of 2

422 ~Exp(1): 1000 averages of 5

423 ~Exp(1): 1000 averages of 100

424 ~Bin(40, .05): average of 1 (original distribution)

425 ~Bin(40, .05): 1000 averages of 2

426 ~Bin(40, .05): 1000 averages of 5

427 ~Bin(40, .05): 1000 averages of 100

428 The Central Limit Theorem:
If all possible random samples, each of size n, are taken from any population with a mean  and a standard deviation , the sampling distribution of the sample means (averages) will: 1. have mean: 2. have standard deviation: It turns out that if you were to go out and sample many, many times, most sample statistics that you could calculate would follow a normal distribution. What are the 2 parameters (from last time) that define any normal distribution? Remember that a normal curve is characterized by two parameters, a mean and a variability (SD) What do you think the mean value of a sample statistic would be? The standard deviation? Remember standard deviation is natural variability of the population Standard error can be standard error of the mean or standard error of the odds ratio or standard error of the difference of 2 means, etc. The standard error of any sample statistic. 3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n) 428

429 Central Limit Theorem caveats for small samples:
The sample standard deviation is an imprecise estimate of the true standard deviation (σ); this imprecision changes the distribution to a T-distribution. A t-distribution approaches a normal distribution for large n (≥100), but has fatter tails for small n (<100). If the underlying distribution is non-normal, the distribution of the means may be non-normal. More on T-distributions next week!!

430 Summary: Single population mean (large n)
Hypothesis test: Z = (x̄ − μ0)/(s/√n) Confidence interval: x̄ ± Z × (s/√n)

431 Single population mean (small n, normally distributed trait)
Hypothesis test: t(n−1 df) = (x̄ − μ0)/(s/√n) Confidence interval: x̄ ± t(n−1 df) × (s/√n)

432 Examples of Sample Statistics:
Single population mean Single population proportion Difference in means (t-test) Difference in proportions (Z-test) Odds ratio/risk ratio Correlation coefficient Regression coefficient

433 Distribution of a correlation coefficient?? Computer simulation…
1. Specify the true correlation coefficient Correlation coefficient = 0.15 2. Select a random sample of 100 virtual men from the population. 3. Calculate the correlation coefficient for the sample. 4. Repeat steps (2) and (3) 15,000 times 5. Explore the distribution of the 15,000 correlation coefficients.

434 Distribution of a correlation coefficient…
Normally distributed! Mean = 0.15 (true correlation) Standard error = 0.10 434

435 Distribution of a correlation coefficient in general…
1. Shape of the distribution Normally distributed for large samples T-distribution for small samples (n<100) 2. Mean = true correlation coefficient (r) 3. Standard error ≈ √((1 − r²)/(n − 2)) (≈ 0.10 here, with r = 0.15 and n = 100)

436 Many statistics follow normal (or t-distributions)…
Means/difference in means T-distribution for small samples Proportions/difference in proportions Regression coefficients Natural log of the odds ratio 436

437 Estimation (confidence intervals)…
What is a good estimate for the true mean vitamin D in the population (the population parameter)? 63 nmol/L +/- margin of error

438 95% confidence interval Goal: capture the true effect (e.g., the true mean) most of the time. A 95% confidence interval should include the true effect about 95% of the time. A 99% confidence interval should include the true effect about 99% of the time.

439 Recall: 68-95-99. 7 rule for normal distributions
Recall: the 68-95-99.7 rule for normal distributions! There is a 95% chance that the sample mean will fall within two standard errors of the true mean: 62 +/- 2*3.3 = 55.4 nmol/L to 68.6 nmol/L Mean + 2 Std errors = 68.6 Mean − 2 Std errors = 55.4 To be precise, 95% of observations fall between Z = −1.96 and Z = +1.96 (so the "2" is a rounded number)…

440 95% confidence interval There is a 95% chance that the sample mean is between 55.4 nmol/L and 68.6 nmol/L
For every sample mean in this range, sample mean +/- 2 standard errors will include the true mean. For example, if the sample mean is 68.6 nmol/L: 95% CI = 68.6 +/- 6.6 = 62.0 to 75.2 This interval just hits the true mean, 62.0.
441 95% confidence interval Thus, for normally distributed statistics, the formula for the 95% confidence interval is: sample statistic  2 x (standard error) Examples: 95% CI for mean vitamin D: 63 nmol/L  2 x (3.3) = 56.4 – 69.6 nmol/L 95% CI for the correlation coefficient: 0.15  2 x (0.1) = -.05 – .35

442 Simulation of 20 studies of 100 men…
Vertical line indicates the true mean (62) 95% confidence intervals for the mean vitamin D for each of the simulated studies. Only 1 confidence interval missed the true mean.

443 Confidence Intervals give:
*A plausible range of values for a population parameter. *The precision of an estimate. (When sampling variability is high, the confidence interval will be wide to reflect the uncertainty of the observation.) *Statistical significance (if the 95% CI does not cross the null value, it is significant at .05)

444 Confidence Intervals The value of the statistic in my sample (eg., mean, odds ratio, etc.) point estimate  (measure of how confident we want to be)  (standard error) From a Z table or a T table, depending on the sampling distribution of the statistic. Standard error of the statistic.

445 Common “Z” levels of confidence
Commonly used confidence levels are 90%, 95%, and 99%
Confidence Level → Z value: 80% → 1.28; 90% → 1.645; 95% → 1.96; 98% → 2.33; 99% → 2.58; 99.8% → 3.08; 99.9% → 3.27

446 99% confidence intervals…
99% CI for mean vitamin D: 63 nmol/L ± 2.6 × (3.3) = 54.4 – 71.6 nmol/L 99% CI for the correlation coefficient: 0.15 ± 2.6 × (0.1) = −.11 – .41

447 Testing Hypotheses 1. Is the mean vitamin D in middle- aged and older European men lower than 100 nmol/L (the “desirable” level)? 2. Is cognitive function correlated with vitamin D?

448 Is the mean vitamin D different than 100?
Start by assuming that the mean = 100 This is the “null hypothesis” This is usually the “straw man” that we want to shoot down Determine the distribution of statistics assuming that the null is true…

449 Computer simulation (10,000 repeats)…
This is called the null distribution! Normally distributed Std error = 3.3 Mean = 100

450 Compare the null distribution to the observed value…
What’s the probability of seeing a sample mean of 63 nmol/L if the true mean is 100 nmol/L? It didn’t happen in 10,000 simulated studies. So the probability is less than 1/10,000

451 Compare the null distribution to the observed value…
This is the p- value! P-value < 1/10,000

452 Calculating the p-value with a formula…
Because we know how normal curves work, we can exactly calculate the probability of seeing an average of 63 nmol/L if the true average were 100 nmol/L (i.e., if our null hypothesis is true): Z = (63 − 100)/3.3 = −11.2, P-value << .0001
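A minimal SAS sketch of this Z and p-value (our own check):
data _null_;
  z = (63 - 100)/3.3;        * = -11.2;
  p = 2*probnorm(-abs(z));   * two-sided p-value, essentially 0;
  put z= p=;
run;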

453 The P-value P-value is the probability that we would have seen our data (or something more unexpected) just by chance if the null hypothesis (null value) is true. Small p-values mean the null value is unlikely given our data. Our data are so unlikely given the null hypothesis (<<1/10,000) that I’m going to reject the null hypothesis! (Don’t want to reject our data!)

454 P-value<.0001 means: The probability of seeing what you saw or something more extreme if the null hypothesis is true (due to chance)<.0001 P(empirical data/null hypothesis) <.0001

455 The P-value By convention, p-values of <.05 are often accepted as “statistically significant” in the medical literature; but this is an arbitrary cut-off. A cut-off of p<.05 means that in about 5 of 100 experiments, a result would appear significant just by chance (“Type I error”).

456 Summary: Hypothesis Testing
The Steps: 1.     Define your hypotheses (null, alternative) 2.     Specify your null distribution 3.     Do an experiment 4.     Calculate the p-value of what you observed 5.     Reject or fail to reject (~accept) the null hypothesis

457 Hypothesis Testing The Steps:
Define your hypotheses (null, alternative) The null hypothesis is the “straw man” that we are trying to shoot down. Null here: “mean vitamin D level = 100 nmol/L” Alternative here: “mean vit D < 100 nmol/L” (one-sided) Specify your sampling distribution (under the null) If we repeated this experiment many, many times, the mean vitamin D would be normally distributed around 100 nmol/L with a standard error of 3.3 3. Do a single experiment (observed sample mean = 63 nmol/L) 4. Calculate the p-value of what you observed (p<.0001) 5. Reject or fail to reject the null hypothesis (reject)

458 Confidence intervals give the same information as (and more than) hypothesis tests…

459 Duality with hypothesis tests.
Null value 95% confidence interval Null hypothesis: Average vitamin D is 100 nmol/L Alternative hypothesis: Average vitamin D is not 100 nmol/L (two-sided) P-value < .05 459

460 Duality with hypothesis tests.
Null value 99% confidence interval Null hypothesis: Average vitamin D is 100 nmol/L Alternative hypothesis: Average vitamin D is not 100 nmol/L (two-sided) P-value < .01 460

461 2. Is cognitive function correlated with vitamin D?
Null hypothesis: r = 0 Alternative hypothesis: r ≠ 0 Two-sided hypothesis Doesn't assume that the correlation will be positive or negative.

462 Computer simulation (15,000 repeats)…
Null distribution: Normally distributed Std error = 0.1 Mean = 0

463 What’s the probability of our data?
Even when the true correlation is 0, we get correlations as big as 0.15 or bigger 7% of the time.

464 What’s the probability of our data?
This is a two-sided hypothesis test, so “more extreme” includes as big or bigger negative correlations (<-0.15). P-value = 7% + 7% = 14%

465 What’s the probability of our data?
Our results could have happened purely due to a fluke of chance!

466 Formal hypothesis test
1. Null hypothesis: r = 0; alternative: r ≠ 0 (two-sided)
2. Determine the null distribution: normally distributed, standard error = 0.1
3. Collect data: r = 0.15
4. Calculate the p-value for the data: Z = (0.15 − 0)/0.1 = 1.5; a Z of 1.5 corresponds to a two-sided p-value of 14%
5. Reject or fail to reject the null (fail to reject)
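A quick check of that two-sided p-value in SAS (a minimal sketch using the standard normal CDF function probnorm):

data _null_;
  z = (0.15 - 0) / 0.1;               * observed r minus null value, over standard error;
  pval = 2 * (1 - probnorm(abs(z)));  * two-sided p-value, approx 0.13-0.14;
  put z= pval=;
run;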

467 Or use confidence interval to gauge statistical significance…
95% CI = -0.05 to 0.35 (0.15 ± 1.96 × 0.1). Thus, 0 (the null value) is a plausible value! P>.05

468 Examples of Sample Statistics:
Single population mean; Single population proportion; Difference in means (ttest); Difference in proportions (Z-test); Odds ratio/risk ratio; Correlation coefficient; Regression coefficient. It turns out that if you were to go out and sample many, many times, most sample statistics that you could calculate would follow a normal distribution. What are the 2 parameters (from last time) that define any normal distribution? Remember that a normal curve is characterized by two parameters: a mean and a variability (SD). What do you think the mean value of a sample statistic would be? The standard deviation? Remember, standard deviation is the natural variability of the population. Standard error can be the standard error of the mean, or the standard error of the odds ratio, or the standard error of the difference of 2 means, etc.: the standard error of any sample statistic.

469 Example 2: HIV vaccine trial
Thai HIV vaccine trial (2009): 8197 randomized to vaccine; 8198 randomized to placebo. Generated a lot of public discussion about p-values!

470 51/8197 vs. 74/8198 = 23 excess infections in the placebo group = 2.8 fewer infections per 1000 people vaccinated. Source: BBC news

471 Null hypothesis Null hypothesis: infection rate is the same in the two groups Alternative hypothesis: infection rates differ

472 Computer simulation assuming the null (15,000 repeats)…
Normally distributed, standard error = 11.1

473 Computer simulation assuming the null (15,000 repeats)…
If the vaccine is completely ineffective, we could still get 23 excess infections just by chance. Probability of 23 or more excess infections = 0.04

474 How to interpret p=.04… P(data | null) = .04, but P(null | data) ≠ .04
*estimated using Bayes’ Rule (and prior data on the vaccine) *Gilbert PB, Berger JO, Stablein D, Becker S, Essex M, Hammer SM, Kim JH, DeGruttola VG. Statistical interpretation of the RV144 HIV vaccine efficacy trial in Thailand: a case study for statistical issues in efficacy trials. J Infect Dis 2011; 203:

475 Alternative analysis of the data (“intention to treat”)…
56/8202 (6.8 per 1000) infections in the vaccine group versus 76/8200 (9.3 per 1000)

476 Computer simulation assuming the null (15,000 repeats)…
Probability of 20 or more excess infections = 0.08. P=.08 is only slightly different from p=.04!

477 Confidence intervals…
95% CI (analysis 1): to 95% CI (analysis 2): to The plausible ranges are nearly identical!

478 One sample statistical tests, continued…

479 Recall: Single population mean (large n)
Hypothesis test: Z = (x̄ − μ0)/(s/√n)
Confidence interval: x̄ ± Z(α/2) × (s/√n)

480 Single population mean (small n, normally distributed trait)
Hypothesis test: t(n−1) = (x̄ − μ0)/(s/√n)
Confidence interval: x̄ ± t(n−1, α/2) × (s/√n)

481 What is a T-distribution?
A t-distribution is like a Z distribution, except it has slightly fatter tails to reflect the uncertainty added by estimating σ. The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z. If n>100, t approaches Z.

482 T-distribution with only 1 degree of freedom.

483 T-distribution with 4 degrees of freedom.

484 T-distribution with 9 degrees of freedom.

485 T-distribution with 29 degrees of freedom.

486 T-distribution with 99 degrees of freedom. Looks a lot like Z!!
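To see this convergence numerically, a minimal SAS sketch (tinv returns quantiles of the t-distribution) printing the upper 2.5% critical value at each of these degrees of freedom:

data _null_;
  do df = 1, 4, 9, 29, 99;
    tcrit = tinv(0.975, df);  * two-sided 5% cutoff for this df;
    put df= tcrit=;
  end;
run;

This prints roughly 12.71, 2.78, 2.26, 2.05, and 1.98, closing in on the Z cutoff of 1.96.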

487 Student’s t Distribution
Note: t → Z as n increases. Standard Normal (t with df = ∞); t (df = 13); t (df = 5). t-distributions are bell-shaped and symmetric, but have 'fatter' tails than the normal. from "Statistics for Managers" Using Microsoft® Excel 4th Edition, Prentice-Hall 2004

488 Student’s t Table .05 2 t /2 = .05 2.920 Upper Tail Area df .25 .10 1
Let: n = df = n - 1 =  = /2 =.05 df .25 .10 .05 1 1.000 3.078 6.314 2 0.817 1.886 2.920 /2 = .05 3 0.765 1.638 2.353 The body of the table contains t values, not probabilities t 2.920 from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004 488

489 t distribution values, with comparison to the Z value
Confidence    t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
Level
.80           1.372         1.325         1.310         1.282
.90           1.812         1.725         1.697         1.645
.95           2.228         2.086         2.042         1.960
.99           3.169         2.845         2.750         2.576
Note: t → Z as n increases. from "Statistics for Managers" Using Microsoft® Excel 4th Edition, Prentice-Hall 2004

490 The T probability density function
What does t look like mathematically? (You may at least recognize some resemblance to the normal distribution function…)

\[ f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \]

Where: ν is the degrees of freedom; Γ (gamma) is the Gamma function; π is the constant Pi (3.14159…)

491 The t-distribution in SAS
Yikes! The t-distribution looks like a mess! Don’t want to integrate! Luckily, there are charts and SAS! MUST SPECIFY DEGREES OF FREEDOM! The t-function in SAS is: probt(t-statistic, df)
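For example, the two-sided p-value for a hypothetical observed t of 2.0 with 24 degrees of freedom (made-up numbers, just to show the call):

data _null_;
  pval = 2 * (1 - probt(2.0, 24));  * probt gives P(T <= t) for the stated df;
  put pval=;                        * prints about .06;
run;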

492 The normality assumption…
Ttests (and all linear models, in fact) have a "normality assumption": if the outcome variable is not normally distributed and the sample size is small, a ttest is inappropriate. It takes longer for the CLT to kick in, and the sample means do not immediately follow a t-distribution… This is the source of the "normality assumption" of the ttest…

493 Computer simulation of the distribution of the sample mean (non-normal, small n):
1. Pick any probability distribution and specify a mean and standard deviation (e.g., a true mean of 128 lbs with a variability of 15 lbs).
2. Tell the computer to randomly generate 1000 observations from that probability distribution (e.g., the computer is more likely to spit out values with high probabilities).
3. Calculate 1000 T-statistics: T = (x̄ − μ)/(s/√n)
4. Plot the T-statistics in histograms.
5. Repeat for different sample sizes (n's).
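A minimal SAS sketch of this procedure for the exponential case shown on the next slides (assuming n=5 per sample and 1000 repeats; rand('EXPONENTIAL') draws from an exponential with mean = SD = 1):

data tsim (keep = t);
  array x{5};
  do rep = 1 to 1000;
    do i = 1 to dim(x);
      x{i} = rand('EXPONENTIAL');  * one observation; mean=1, SD=1;
    end;
    xbar = mean(of x{*});
    s = std(of x{*});
    t = (xbar - 1) / (s / sqrt(dim(x)));  * T-statistic, using the true mean of 1;
    output;
  end;
run;

proc sgplot data=tsim;
  histogram t;  * compare the shape against a t-distribution;
run;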

494 n=2, underlying distribution is exponential (mean=1, SD=1)
This is NOT a t-distribution!

495 n=5, underlying distribution is exponential (mean=1, SD=1)
This is NOT a t-distribution!

496 n=10, underlying distribution is exponential (mean=1, SD=1)
This doesn’t yet follow a t- distribution!

497 n=30, underlying distribution is exponential (mean=1, SD=1)
Still not quite a t-distribution! Note the left skew.

498 n=100, underlying distribution is exponential (mean=1, SD=1)
Now, pretty close to a T-distribution!

499 Conclusions If the underlying data are not normally distributed AND n is small**, the means do not follow a t-distribution (so using a ttest will result in erroneous inferences). Data transformation or non-parametric tests should be used instead. **How small is too small? No hard and fast rule; it depends on the true shape of the underlying distribution. Here N>30 (closer to 100) is needed.

500 Practice Problem: A manufacturer of light bulbs claims that its light bulbs have a mean life of 1520 hours with an unknown standard deviation. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of hours and a sample standard deviation of 86, is there sufficient evidence to claim that the mean life is significantly less than the manufacturer claimed? Assume that light bulb lifetimes are roughly normally distributed.

501 Answer 1. What is your null hypothesis?
Null hypothesis: mean life = 1520 hours. Alternative hypothesis: mean life < 1520 hours.
2. What is your null distribution? Since we have to estimate the standard deviation, we need to make inferences from a T-curve with 39 degrees of freedom.
3. Empirical evidence: 1 random sample of 40 has a mean of hours.
5. Probably not sufficient evidence to reject the null. We cannot sue the light bulb manufacturer for false advertising! Notice that using the t-distribution to calculate the p-value didn't change much! With n>30, might as well use the Z table.

502 Practice problem You want to estimate the average ages of kids that ride a particular kids' ride at Disneyland. You take a random sample of 8 kids exiting the ride, and find that their ages are: 2,3,4,5,6,6,7,7. Assume that ages are roughly normally distributed. a. Calculate the sample mean. b. Calculate the sample standard deviation. c. Calculate the standard error of the mean. d. Calculate the 99% confidence interval.

503 Answer (a,b) a. Calculate the sample mean.
Sample mean = (2+3+4+5+6+6+7+7)/8 = 40/8 = 5 years. b. Calculate the sample standard deviation: s = √(Σ(x − x̄)²/(n−1)) = √(24/7) ≈ 1.85 years.

504 Answer (c) c. Calculate the standard error of the mean: SE = s/√n = 1.85/√8 ≈ 0.65

505 Answer (d) d. Calculate the 99% confidence interval: t(7, .005) = 3.5, so the 99% CI is 5 ± 3.5 × 0.65 = (2.7, 7.3)
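These hand calculations can be checked in SAS (a minimal sketch; clm with alpha=0.01 requests the 99% confidence limits):

data ages;
  input age @@;
  datalines;
2 3 4 5 6 6 7 7
;
run;

proc means data=ages n mean std stderr clm alpha=0.01;
  var age;
run;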

506 Example problem, class data:
A two-tailed hypothesis test: A researcher claims that Stanford affiliates eat fewer than the recommended intake of 5 servings of fruits and vegetables per day. We have data to address this claim: 24 people in the class provided data on their daily fruit and vegetable intake. Do we have evidence to dispute her claim?

507 Histogram fruit and veggie intake (n=24)…
Mean=3.7 servings Median=3 servings Mode=3 servings Std Dev=1.7 servings

508 Answer 1. Define your hypotheses (null, alternative)
H0: average servings = 5.0; Ha: average servings ≠ 5.0 (two-sided). 2. Specify your null distribution: a T-curve with 23 df, centered at 5.0, with standard error = 1.7/√24 ≈ 0.35

509 Answer, continued 3. Do an experiment
Observed mean in our experiment = 3.7 servings. T23 critical value for p<.05, two-tailed = 2.07. 4. Calculate the p-value of what you observed: p-value < .05. 5. Reject or fail to reject (~accept) the null hypothesis: Reject! Stanford affiliates eat significantly fewer than the recommended servings of fruits and veggies.
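Filling in the arithmetic behind step 4, using the class summary statistics above:

\[ t_{23} = \frac{3.7 - 5.0}{1.7/\sqrt{24}} \approx \frac{-1.3}{0.35} \approx -3.7 \]

Since |−3.7| exceeds the critical value of 2.07, p < .05.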

510 95% Confidence Interval H0: average servings = 5.0
95% CI: 3.7 ± 2.07 × 0.35 = (2.98, 4.42). The 95% CI excludes 5, so p-value < .05

511 Paired data (repeated measures)
Patient   BP Before (diastolic)   BP After
1         100                     92
2         89                      84
3         83                      80
4         98                      93
5         108                     98
6         95                      90
What about these data? How do you analyze these?

512 Example problem: paired ttest
Patient   Diastolic BP Before   D. BP After   Change
1         100                   92            -8
2         89                    84            -5
3         83                    80            -3
4         98                    93            -5
5         108                   98            -10
6         95                    90            -5
Null Hypothesis: Average Change = 0

513 Example problem: paired ttest
Change: -8, -5, -3, -5, -10, -5. Null Hypothesis: Average Change = 0. Mean change = -6; SD of the changes ≈ 2.5; SE = 2.5/√6 ≈ 1.03; T = -6/1.03 ≈ -5.8. With 5 df, |T| > 2.571 corresponds to p<.05 (two-sided test), so reject the null.

514 Example problem: paired ttest
Change: -8, -5, -3, -5, -10, -5. 95% CI: -6 ± 2.571 × 1.03 = (-8.7, -3.3). Note: does not include 0.
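A minimal SAS sketch of this paired analysis (patient 5's follow-up value of 98 is inferred from the -10 change above):

data bp;
  input before after @@;
  datalines;
100 92  89 84  83 80  98 93  108 98  95 90
;
run;

proc ttest data=bp;
  paired before*after;  * tests H0: mean(before - after) = 0 and reports the 95% CI;
run;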

515 Summary: Single population mean (small n, normality)
Hypothesis test: t(n−1) = (x̄ − μ0)/(s/√n)
Confidence interval: x̄ ± t(n−1, α/2) × (s/√n)

516 Summary: paired ttest Hypothesis test: Confidence Interval
Where d=change over time or difference within a pair. 516

517 Summary: Single population mean (large n)
Hypothesis test: Z = (x̄ − μ0)/(s/√n)
Confidence interval: x̄ ± Z(α/2) × (s/√n)

518 Examples of Sample Statistics:
Single population mean (known σ); Single population mean (unknown σ); Single population proportion; Difference in means (ttest); Difference in proportions (Z-test); Odds ratio/risk ratio; Correlation coefficient; Regression coefficient. It turns out that if you were to go out and sample many, many times, most sample statistics that you could calculate would follow a normal distribution. What are the 2 parameters (from last time) that define any normal distribution? Remember that a normal curve is characterized by two parameters: a mean and a variability (SD). What do you think the mean value of a sample statistic would be? The standard deviation? Remember, standard deviation is the natural variability of the population. Standard error can be the standard error of the mean, or the standard error of the odds ratio, or the standard error of the difference of 2 means, etc.: the standard error of any sample statistic.

519 Recall: normal approximation to the binomial…
Statistics for proportions are based on a normal distribution, because the binomial can be approximated as normal if np>5

520 Recall: stats for proportions
For the binomial count X: mean = np, variance = np(1−p). For the proportion p̂ = X/n: mean = p, variance = p(1−p)/n. The two differ by a factor of n. P-hat (p̂) stands for "sample proportion."

521 Sampling distribution of a sample proportion
p̂ is (approximately) normally distributed with mean p and standard error √(p(1−p)/n), where p = the true population proportion. BUT… if you knew p you wouldn't be doing the experiment!

522 Practice Problem A fellow researcher claims that at least 15% of smokers fail to eat any fruits and vegetables at least 3 days a week. You find this hard to believe and decide to check the validity of this statistic by taking a random (representative) sample of smokers. Do you have sufficient evidence to reject your colleague’s claim if you discover that 17 of the 200 smokers in your sample eat no fruits and vegetables at least 3 days a week?

523 Answer 1. What is your null hypothesis?
Null hypothesis: p = proportion of smokers who skip fruits and veggies frequently ≥ .15. Alternative hypothesis: p < .15.
2. What is your null distribution? Var(p̂) = .15 × .85/200 = .00064; SD(p̂) = .025; p̂ ~ N(.15, .025).
3. Empirical evidence: 1 random sample: p̂ = 17/200 = .085.
4. Z = (.085 − .15)/.025 = -2.6; p-value = P(Z < -2.6) = .0047.
5. Sufficient evidence to reject the claim.

524 OR, use computer simulation…
1. Have SAS randomly pick 200 observations from a binomial distribution with p=.15 (the null). 2. Divide the resulting count by 200 to get the observed sample proportion. 3. Repeat this 1000 times (or some arbitrarily large number of times). 4. Plot the resulting distribution of sample proportions in a histogram:
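A minimal SAS sketch of those four steps (rand('BINOMIAL', p, n) draws the count of successes out of n):

data psim (keep = phat);
  do rep = 1 to 1000;
    count = rand('BINOMIAL', 0.15, 200);  * number of veggie-skippers out of 200, under the null;
    phat = count / 200;
    output;
  end;
run;

proc sgplot data=psim;
  histogram phat;
run;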

525 How often did we get observed values of 0.085 or lower when the true p = .15?
Only 4/1000 times! Empirical p-value = .004

526 Practice Problem In Saturday’s newspaper, in a story about poll results from Ohio, the article said that 625 people in Ohio were sampled and claimed that the margin of error in the results was 4%. Can you explain where that 4% margin of error came from?

527 Answer
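Presumably the 4% is the half-width of a 95% confidence interval for a proportion, computed at the worst case p = 0.5:

\[ \text{margin of error} = 1.96\sqrt{\frac{p(1-p)}{n}} \le 1.96\sqrt{\frac{0.5 \times 0.5}{625}} = 1.96 \times 0.02 \approx 0.04 = 4\% \]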

528 Paired data proportions test…
Analogous to paired ttest… Also takes on a slightly different form known as McNemar’s test (we’ll see lots more on this next term…)

529 Paired data proportions test…
1000 subjects were treated with antidepressants for 6 months and with placebo for 6 months (order of tx was randomly assigned) Question: do suicide attempts (yes/no) differ depending on whether a subject is on antidepressants or on placebo?

530 Paired data proportions test…
15 subjects attempted suicide in both conditions (non-informative) 10 subjects attempted suicide in the antidepressant condition but not the placebo condition 5 subjects attempted suicide in the placebo condition but not the antidepressant condition 970 did not attempt suicide in either condition (non-informative) Data boils down to 15 observations… In 10/15 cases (66.6%), antidepressant>placebo.

531 Paired proportions test…
Single proportions test: under the null hypothesis, antidepressants and placebo work equally well. So, Ho: among discordant cases, p(antidepressant > placebo) = 0.5. Observed p̂ = 10/15 ≈ .67. Not enough evidence to reject the null!
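A minimal SAS sketch of the exact version of this test (an exact binomial on the 15 discordant pairs; probbnml(p, n, m) returns P(X ≤ m)):

data _null_;
  p_upper = 1 - probbnml(0.5, 15, 9);  * P(10 or more of 15 pairs favor antidepressant | p=0.5);
  pval = 2 * p_upper;                  * two-sided exact p-value, roughly 0.30;
  put p_upper= pval=;
run;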

532 Key one-sample Hypothesis Tests…
Test for Ho: μ = μ0: t(n−1) = (x̄ − μ0)/(s/√n)
Test for Ho: p = p0: Z = (p̂ − p0)/√(p0(1−p0)/n)
T(n−1) approaches Z for large n. ** If np (expected value) < 5, use the exact binomial rather than the Z approximation.

533 Corresponding confidence intervals…
For a mean: x̄ ± t(n−1, α/2) × (s/√n)
For a proportion: p̂ ± Z(α/2) × √(p̂(1−p̂)/n)
T(n−1) approaches Z for large n. ** If np (expected value) < 5, use the exact binomial rather than the Z approximation…

534 Symbol overload! n: Sample size Z: Z-statistic (standard normal)
tdf: T-statistic (t-distribution with df degrees of freedom)
p̂ ("p-hat"): sample proportion
X̄ ("X-bar"): sample mean
s: sample standard deviation
p0: null hypothesis proportion
μ0: null hypothesis mean

535 Pitfalls of Hypothesis Testing

536 Hypothesis Testing The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis
Follows the logic: If A then B; not B; therefore, not A.

537 Summary: The Underlying Logic of hypothesis tests…
Follows this logic: Assume A. If A, then B. Not B. Therefore, Not A. But throw in a bit of uncertainty…If A, then probably B…

538 Error and Power
Type-I Error (also known as "α"): rejecting the null when the effect isn't real. Type-II Error (also known as "β"): failing to reject the null when the effect is real. POWER (the flip side of type-II error: 1 − β): the probability of seeing a true effect if one exists. Note the sneaky conditionals…

539 Think of… Pascal’s Wager
The TRUTH:
Your Decision   | God Exists            | God Doesn't Exist
Reject God      | BIG MISTAKE           | Correct
Accept God      | Correct (Big Pay Off) | MINOR MISTAKE

540 Type I and Type II Error in a box
Your Statistical Decision | H0 True (example: the drug doesn't work) | H0 False (example: the drug works)
Reject H0 (ex: you conclude that the drug works) | Type I error (α) | Correct
Do not reject H0 (ex: you conclude that there is insufficient evidence that the drug works) | Correct | Type II Error (β)

541 Error and Power Type I error rate (or significance level): the probability of finding an effect that isn’t real (false positive). If we require p-value<.05 for statistical significance, this means that 1/20 times we will find a positive result just by chance. Type II error rate: the probability of missing an effect (false negative). Statistical power: the probability of finding an effect if it is there (the probability of not making a type II error). When we design studies, we typically aim for a power of 80% (allowing a false negative rate, or type II error rate, of 20%).

542 Pitfall 1: over-emphasis on p-values
Clinically unimportant effects may be statistically significant if a study is large (and therefore has a small standard error and extreme precision). Pay attention to effect size and confidence intervals.

543 Example: effect size A prospective cohort study of 34,079 women found that women who exercised >21 MET hours per week gained significantly less weight than women who exercised <7.5 MET hours (p<.001) Headlines: “To Stay Trim, Women Need an Hour of Exercise Daily.” Physical Activity and Weight Gain Prevention. JAMA 2010;303:

544 Mean (SD) Differences in Weight Over Any 3-Year Period by Physical Activity Level, Women's Health Study. Lee, I. M. et al. JAMA 2010;303:

545 What was the effect size?
Those who exercised the least gained 0.15 kg (.33 pounds) more than those who exercised the most over 3 years. Extrapolated over the 13 years of the study, the high exercisers gained 1.4 pounds less than the low exercisers! Classic example of a statistically significant effect that is not clinically significant.

546 A picture is worth…

547 Authors explain: "Figure 2 shows the trajectory of weight gain over time by baseline physical activity levels. When classified by this single measure of physical activity, all 3 groups showed similar weight gain patterns over time." But baseline physical activity should predict weight gain in the first three years…do those slopes look different to you?

548 Another recent headline
Drinkers May Exercise More Than Teetotalers Activity levels rise along with alcohol use, survey shows “MONDAY, Aug. 31 (HealthDay News) -- Here's something to toast: Drinkers are often exercisers”… “In reaching their conclusions, the researchers examined data from participants in the 2005 Behavioral Risk Factor Surveillance System, a yearly telephone survey of about 230,000 Americans.”… For women, those who imbibed exercised 7.2 minutes more per week than teetotalers. The results applied equally to men…

549 Pitfall 2: association does not equal causation
Statistical significance does not imply a cause-effect relationship. Interpret results in the context of the study design.

550 Pitfall 3: data dredging/multiple comparisons
In 1980, researchers at Duke randomized 1073 heart disease patients into two groups, but treated the groups equally. Not surprisingly, there was no difference in survival. Then they divided the patients into 18 subgroups based on prognostic factors. In a subgroup of 397 patients (with three-vessel disease and an abnormal left ventricular contraction) survival of those in "group 1" was significantly different from survival of those in "group 2" (p<.025). How could this be since there was no treatment? (Lee et al. "Clinical judgment and statistics: lessons from a simulated randomized trial in coronary artery disease," Circulation, 61: , 1980.)

551 Pitfall 3: multiple comparisons
The difference resulted from the combined effect of small imbalances in the subgroups.

552 Multiple comparisons By using a p-value of 0.05 as the criterion for significance, we’re accepting a 5% chance of a false positive (of calling a difference significant when it really isn’t). If we compare survival of “treatment” and “control” within each of 18 subgroups, that’s 18 comparisons. If these comparisons were independent, the chance of at least one false positive would be…

553 Multiple comparisons With 18 independent comparisons, we have a 60% chance of at least 1 false positive.

554 Multiple comparisons With 18 independent comparisons, we expect about 1 false positive.
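The arithmetic behind these two slides:

\[ P(\text{at least one false positive}) = 1 - (1 - 0.05)^{18} = 1 - 0.95^{18} \approx 0.60 \]
\[ E(\text{false positives}) = 18 \times 0.05 = 0.9 \approx 1 \]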

555 Pitfall 3: multiple comparisons
A significance level of 0.05 means that your false positive rate for one test is 5%. If you run more than one test, your false positive rate will be higher than 5%. Control study-wide type I error by planning a limited number of tests. Distinguish between planned and exploratory tests in the results. Correct for multiple comparisons.

556 Results from Class survey…
My research question was actually to test whether or not being born on odd or even days predicted anything about your future. In fact, I discovered that people who were born on even days: Had significantly better English SATs (p=.04) Tended to enjoy manuscript writing more (p=.09) Tended to be more pessimistic (p=.09)

557 Results from Class survey…
The differences were clinically meaningful. Compared with those born on odd days (n=11), those born on even days (n=13): Scored 65 points higher on the English SAT (720 vs. 655) Enjoyed manuscript writing by 1.5 units more (6.2 vs. 4.8) Were less optimistic by 1.5 units (6.7 vs. 8.2)

558 Results from Class survey…
I can see the NEJM article title now… “Being born on even days makes you a better writer, but may predispose to depression.”

559 Results from Class survey…
Assuming that this difference can’t be explained by astrology, it’s obviously an artifact! What’s going on?…

560 Results from Class survey…
After the odd/even day question, I asked you 25 other questions… I ran 25 statistical tests (comparing the outcome variable between odd-day born people and even-day born people). So, there was a high chance of finding at least one false positive!

561 P-value distribution for the 25 tests…
Under the null hypothesis of no associations (which we'll assume is true here!), p-values follow a uniform distribution… My "significant" and near-significant p-values!

562 Compare with… Next, I generated 25 "p-values" from a random number generator (uniform distribution). These were the results from three runs…

563 In the medical literature…
Researchers examined the relationship between intakes of caffeine/coffee/tea and breast cancer overall and in multiple subgroups (50 tests). Overall, there was no association. Risk ratios were close to 1.0 (ranging from 0.67 to 1.79), indicated protection (<1.0) about as often as harm (>1.0), and showed no consistent dose-response pattern. But they found 4 "significant" p-values in subgroups: coffee intake was linked to increased risk in those with benign breast disease (p=.08); caffeine intake was linked to increased risk of estrogen/progesterone negative tumors and tumors larger than 2 cm (p=.02); decaf coffee was linked to reduced risk of BC in postmenopausal hormone users (p=.02). Ishitani K, Lin J, Manson JE, Buring JE, Zhang SM. Caffeine consumption and the risk of breast cancer in a large prospective cohort of women. Arch Intern Med. 2008;168:

564 Distribution of the p-values from the 50 tests
Likely chance findings! Also, effect sizes showed no consistent pattern. The risk ratios: were close to 1.0 (ranging from 0.67 to 1.79); indicated protection (<1.0) about as often as harm (>1.0); showed no consistent dose-response pattern.

565 Hallmarks of a chance finding:
Analyses are exploratory Many tests have been performed but only a few are significant The significant p-values are modest in size (between p=0.01 and p=0.05) The pattern of effect sizes is inconsistent The p-values are not adjusted for multiple comparisons

566 Pitfall 4: high type II error (low statistical power)
Results that are not statistically significant should not be interpreted as "evidence of no effect," but as "no evidence of effect." Studies may miss effects if they are insufficiently powered (lack precision). Example: A study of 36 postmenopausal women failed to find a significant relationship between hormone replacement therapy and prevention of vertebral fracture. The odds ratio and 95% CI were: 0.38 (0.12, 1.19), indicating a potentially meaningful clinical effect. Failure to find an effect may have been due to insufficient statistical power for this endpoint. Design adequately powered studies and interpret in the context of study power if results are null. Ref: Wimalawansa et al. Am J Med 1998, 104:

567 Pitfall 5: the fallacy of comparing statistical significance
“the effect was significant in the treatment group, but not significant in the control group” does not imply that the groups differ significantly

568 Example In a placebo-controlled randomized trial of DHA oil for eczema, researchers found a statistically significant improvement in the DHA group but not the placebo group. The abstract reports: “DHA, but not the control treatment, resulted in a significant clinical improvement of atopic eczema.” However, the improvement in the treatment group was not significantly better than the improvement in the placebo group, so this is actually a null result.

569 Misleading “significance comparisons”
The improvement in the DHA group (18%) is not significantly greater than the improvement in the control group (11%). Koch C, Dölle S, Metzger M, et al. Docosahexaenoic acid (DHA) supplementation in atopic eczema: a randomized, double-blind, controlled trial. Br J Dermatol 2008;158:

570 Within-group vs. between-group tests
Examples of statistical tests used to evaluate within-group effects versus statistical tests used to evaluate between-group effects:
Statistical tests for within-group effects | Statistical tests for between-group effects
Paired ttest | Two-sample ttest
Wilcoxon sign-rank test | Wilcoxon sum-rank test (equivalently, Mann-Whitney U test)
Repeated-measures ANOVA, time effect | ANOVA; repeated-measures ANOVA, group*time effect
McNemar's test | Difference in proportions, chi-square test, or relative risk

571 Also applies to interactions…
Similarly, “we found a significant effect in subgroup 1 but not subgroup 2” does not constitute prove of interaction For example, if the effect of a drug is significant in men, but not in women, this is not proof of a drug-gender interaction.

572 Overview of statistical tests

573 Which test should I use? Outcome Variable
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups

574 Which test should I use? 1. What is the dependent variable?
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups

575 Which test should I use? 2. Are the observations correlated?
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups

576 Which test should I use? 3. Are key model assumptions met?
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups

577 Are the observations correlated?
What is the unit of observation? person (most common); limb; half a face; physician; clinical center.
Are the observations independent or correlated? Independent: observations are unrelated (usually different, unrelated people). Correlated: some observations are related to one another, for example: the same person over time (repeated measures), legs within a person, half a face.

578 Example: correlated data
Split-face trial: Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during mid-day. The outcome is sunburn (yes/no). Unit of observation = side of a face Are the observations correlated? Yes. Russak JE et al. JAAD 2010; 62:

579 Results ignoring correlation:
Table I -- Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03; Fisher's exact test)
Sun protection factor | Sunburned | Not sunburned
85 | 1 | 55
50 | 8 | 48
Fisher's exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice!

580 Correct analysis of data:
Table 1. Correct presentation of the data from: Russak JE et al. JAAD 2010; 62: (P = .016; McNemar's exact test).
                           | SPF-85 side: Sunburned | SPF-85 side: Not sunburned
SPF-50 side: Sunburned     | 1                      | 7
SPF-50 side: Not sunburned | 0                      | 48
McNemar's exact test evaluates the probability of the following: in all 7 out of 7 cases where the sides of the face were discordant (i.e., one side burnt and the other side did not), the SPF 50 side sustained the burn.

581 Correlations Ignoring correlations will:
overestimate p-values for within-person or within-cluster comparisons; underestimate p-values for between-person or between-cluster comparisons

582 Common statistics for various types of outcome data
Are key model assumptions met?
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups

583 Key assumptions of linear models
Assumptions for linear models (ttest, ANOVA, linear correlation, linear regression, paired ttest, repeated-measures ANOVA, mixed models): Normally distributed outcome variable. Most important for small samples; large samples are quite robust against this assumption. Predictors have a linear relationship with the outcome. Graphical displays can help evaluate this.

584 Common statistics for various types of outcome data
Outcome variable | Independent observations | Correlated observations | Assumptions
Continuous (e.g. pain scale, cognitive function) | Ttest; ANOVA; Linear correlation; Linear regression | Paired ttest; Repeated-measures ANOVA; Mixed models/GEE modeling | Outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical (e.g. fracture yes/no) | Relative risks; Chi-square test; Logistic regression | McNemar's test; Conditional logistic regression; GEE modeling | Sufficient numbers in each cell (>=5)
Time-to-event (e.g. time to fracture) | Kaplan-Meier statistics; Cox regression | n/a | Cox regression assumes proportional hazards between groups
Are key model assumptions met?

585 Key assumptions for categorical tests
Assumptions for categorical tests (relative risks, chi-square, logistic regression, McNemar’s test): Sufficient numbers in each cell (np>=5) In the sunscreen trial, “exact” tests (Fisher’s exact, McNemar’s exact) were used because of the sparse data.

586 Continuous outcome (means); HRP 259/HRP 262
Outcome variable: Continuous (e.g. pain scale, cognitive function)
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation; shows linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes)
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time)
Alternatives if the normality assumption is violated (and small sample size), non-parametric statistics: Wilcoxon sign-rank test (non-parametric alternative to the paired ttest); Wilcoxon sum-rank test (= Mann-Whitney U test; non-parametric alternative to the ttest); Kruskal-Wallis test (non-parametric alternative to ANOVA); Spearman rank correlation coefficient (non-parametric alternative to Pearson's correlation coefficient)

587 Binary or categorical outcomes (proportions); HRP 259/HRP 261
Outcome variable: Binary or categorical (e.g. fracture, yes/no)
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios)
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures)
Alternative to the chi-square test if sparse cells: Fisher's exact test (compares proportions between independent groups when there are sparse data, some cells <5); McNemar's exact test (compares proportions between correlated groups when there are sparse data, some cells <5)

588 Time-to-event outcome (survival data); HRP 262
Outcome variable: Time-to-event (e.g., time to fracture)
Independent observations: Kaplan-Meier statistics (estimates survival functions for each group, usually displayed graphically; compares survival functions with the log-rank test); Cox regression (multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios)
Correlated observations: n/a (already over time)
Modifications to Cox regression if proportional hazards is violated: time-dependent predictors or time-dependent hazard ratios (tricky!)

589 Two-sample tests

590 Binary or categorical outcomes (proportions)
Outcome variable: Binary or categorical (e.g. fracture, yes/no)
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios)
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures)
Alternative to the chi-square test if sparse cells: Fisher's exact test (compares proportions between independent groups when there are sparse data, some cells <5); McNemar's exact test (compares proportions between correlated groups when there are sparse data, some cells <5)

591 Recall: The odds ratio (two samples=cases and controls)
               Smoker (E) | Non-smoker (~E)
Stroke (D)     15         | 35
No Stroke (~D) 8          | 42
(50 cases, 50 controls.) OR = (15 × 42)/(35 × 8) = 2.25. Interpretation: there is a 2.25-fold higher odds of stroke in smokers vs. non-smokers.

592 Inferences about the odds ratio…
Does the sampling distribution follow a normal distribution? What is the standard error?

593 Simulation… 1. In SAS, assume infinite population of cases and controls with equal proportion of smokers (exposure), p=.23 (UNDER THE NULL!) 2. Use the random binomial function to randomly select n=50 cases and n=50 controls each with p=.23 chance of being a smoker. 3. Calculate the observed odds ratio for the resulting 2x2 table. 4. Repeat this 1000 times (or some large number of times). 5. Observe the distribution of odds ratios under the null hypothesis.
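A minimal SAS sketch of this simulation (1000 repeats here; tables with a zero cell, which would make the OR undefined, are simply skipped):

data orsim (keep = or lnor);
  do rep = 1 to 1000;
    expcase = rand('BINOMIAL', 0.23, 50);  * smokers among the 50 cases, under the null;
    expctrl = rand('BINOMIAL', 0.23, 50);  * smokers among the 50 controls;
    if expcase > 0 and expcase < 50 and expctrl > 0 and expctrl < 50 then do;
      or = (expcase * (50 - expctrl)) / (expctrl * (50 - expcase));
      lnor = log(or);
      output;
    end;
  end;
run;

proc sgplot data=orsim;
  histogram or;  * right-skewed; plot lnor instead for the symmetric version;
run;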

594 Properties of the OR (simulation)
(50 cases/50 controls/23% exposed) Under the null, this is the expected variability of the sample OR. Note the right skew.

595 Properties of the lnOR Normal!

596 Properties of the lnOR From the simulation, can get the empirical standard error (~0.5) and p- value (~.10)

597 Properties of the lnOR Or, in general, standard error of lnOR = √(1/a + 1/b + 1/c + 1/d), where a, b, c, and d are the four cells of the 2×2 table.

598 Inferences about the ln(OR)
               Smoker (E) | Non-smoker (~E)
Stroke (D)     15         | 35
No Stroke (~D) 8          | 42
lnOR = ln(2.25) ≈ 0.81; SE(lnOR) = √(1/15 + 1/35 + 1/8 + 1/42) ≈ 0.49; Z ≈ 1.6; p=.10

599 Confidence interval…
               Smoker (E) | Non-smoker (~E)
Stroke (D)     15         | 35
No Stroke (~D) 8          | 42
Final answer: 2.25 (0.85, 5.92)
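The interval comes from working on the log scale and exponentiating back, using the standard error from the previous slide:

\[ \ln(2.25) \pm 1.96 \times 0.49 = 0.81 \pm 0.97 = (-0.16,\ 1.78) \]
\[ (e^{-0.16},\ e^{1.78}) = (0.85,\ 5.92) \]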

600 Practice problem: Suppose the following data were collected in a case-control study of brain tumor and cell phone usage:
                       Brain tumor | No brain tumor
Own a cell phone       20          | 60
Don't own a cell phone 10          | 40
Is there sufficient evidence for an association between cell phones and brain tumor?

601 Answer 1. What is your null hypothesis? Null hypothesis: OR = 1.0; lnOR = 0. Alternative hypothesis: OR ≠ 1.0; lnOR ≠ 0 (TWO-SIDED TEST). 2. What is your null distribution? lnOR ~ N(0, .44²); SD(lnOR) = √(1/20 + 1/60 + 1/10 + 1/40) ≈ .44. 3. Empirical evidence: OR = (20 × 40)/(60 × 10) = 800/600 = 1.33; lnOR = .288. 4. Z = (.288 − 0)/.44 = .65; p-value = P(Z > .65 or Z < -.65) = .26 × 2 = .52 (two-sided: it would be just as extreme if the sample lnOR were .65 standard deviations or more below the null mean). 5. Not enough evidence to reject the null hypothesis of no association.
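The same arithmetic in SAS (a minimal sketch):

data _null_;
  or   = (20 * 40) / (60 * 10);
  lnor = log(or);
  se   = sqrt(1/20 + 1/60 + 1/10 + 1/40);  * standard error of lnOR;
  z    = lnor / se;
  pval = 2 * (1 - probnorm(abs(z)));       * two-sided p-value, approx .52;
  put or= lnor= se= z= pval=;
run;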

602 Key measures of relative risk: 95% CIs OR and RR:
For an odds ratio, 95% confidence limits: exp( lnOR ± 1.96 × √(1/a + 1/b + 1/c + 1/d) )
For a risk ratio, 95% confidence limits: exp( lnRR ± 1.96 × √( (1−p1)/(n1p1) + (1−p2)/(n2p2) ) )

603 Continuous outcome (means)
Outcome variable: Continuous (e.g. pain scale, cognitive function)
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation; shows linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes)
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time)
Alternatives if the normality assumption is violated (and small sample size), non-parametric statistics: Wilcoxon sign-rank test (non-parametric alternative to the paired ttest); Wilcoxon sum-rank test (= Mann-Whitney U test; non-parametric alternative to the ttest); Kruskal-Wallis test (non-parametric alternative to ANOVA); Spearman rank correlation coefficient (non-parametric alternative to Pearson's correlation coefficient)

604 The two-sample t-test

605 The two-sample T-test Is the difference in means that we observe between two groups more than we'd expect to see based on chance alone?

606 The standard error of the difference of two means
SE(x̄ − ȳ) = √(σx²/n + σy²/m). **First add the variances and then take the square root of the sum to get the standard error. Recall, Var(A − B) = Var(A) + Var(B) if A and B are independent!

607 Shown by simulation: One sample of 30 (with SD=5).
Difference of the two samples.

608 Distribution of differences
If X̄ and Ȳ are the averages of n and m subjects, respectively: X̄ − Ȳ ~ N( μx − μy, σx²/n + σy²/m )

609 But… As before, you usually have to use the sample SD, since you won’t know the true SD ahead of time… So, again becomes a T- distribution... 609

610 Estimated standard error of the difference….
Just plug in the sample standard deviations for each group: estimated SE = √(s_x²/n + s_y²/m)

611 Case 1: un-pooled variance
Question: What are your degrees of freedom here? Answer: Not obvious!

612 Case 1: ttest, unpooled variances
It is complicated to figure out the degrees of freedom here! A good approximation is given as df ≈ harmonic mean (or SAS will tell you!):
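The usual approximation here is the Satterthwaite formula (this is what SAS's proc ttest reports for the unequal-variance case):

\[ df \approx \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{m}\right)^2}{\dfrac{(s_x^2/n)^2}{n-1} + \dfrac{(s_y^2/m)^2}{m-1}} \]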

613 Case 2: pooled variance If you assume that the standard deviation of the characteristic (e.g., IQ) is the same in both groups, you can pool all the data to estimate a common standard deviation. This maximizes your degrees of freedom (and thus your power).

614 Estimated standard error (using pooled variance estimate)
The degrees of freedom are n + m − 2
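Written out, the pooled estimate combines the two sample variances, weighted by their degrees of freedom:

\[ s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}, \qquad SE(\bar{x}-\bar{y}) = s_p\sqrt{\frac{1}{n} + \frac{1}{m}}, \qquad t_{n+m-2} = \frac{\bar{x}-\bar{y}}{SE} \]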

615 Case 2: ttest, pooled variances

616 Alternate calculation formula: ttest, pooled variance

617 Pooled vs. unpooled variance
Rule of Thumb: Use pooled unless you have a reason not to. Pooled gives you more degrees of freedom. Pooled has an extra assumption: variances are equal between the two groups. SAS automatically tests this assumption for you ("Equality of Variances" test). If p<.05, this suggests unequal variances, and it is better to use the unpooled ttest.

618 Example: two-sample t-test
In 1980, some researchers reported that "men have more mathematical ability than women" as evidenced by the 1979 SAT's, where a sample of 30 random male adolescents had a mean score ± 1 standard deviation of 436±77 and 30 random female adolescents scored lower: 416±81 (genders were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors' conclusions?

619 Data Summary
Group          | n  | Sample Mean | Sample Standard Deviation
Group 1: women | 30 | 416         | 81
Group 2: men   | 30 | 436         | 77

620 Two-sample t-test 1. Define your hypotheses (null, alternative)
H0: ♂-♀ math SAT = 0 Ha: ♂-♀ math SAT ≠ 0 [two-sided]

621 Two-sample t-test 2. Specify your null distribution:
F and M have similar standard deviations/variances, so make a "pooled" estimate of variance: s_p² = (29 × 81² + 29 × 77²)/58 = 6245, so s_p ≈ 79 and SE = 79 × √(1/30 + 1/30) ≈ 20.4.

622 Two-sample t-test 3. Observed difference in our experiment = 20 points

623 Two-sample t-test 4. Calculate the p-value of what you observed
data _null_;
  pval = (1 - probt(.98, 58)) * 2;
  put pval;
run;
5. Do not reject null! No evidence that men are better in math ;)
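The t-statistic fed into that call comes from the pooled standard error above:

\[ t_{58} = \frac{436 - 416}{20.4} \approx 0.98, \qquad \text{two-sided } p \approx 0.33 \]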

624 Example 2: Difference in means
Example: Rosenthal, R. and Jacobson, L. (1966) Teachers' expectancies: Determinants of pupils' I.Q. gains. Psychological Reports, 19,

625 The Experiment (note: exact numbers have been altered)
Grade 3 students at Oak School were given an IQ test at the beginning of the academic year (n=90). Classroom teachers were given a list of names of students in their classes who had supposedly scored in the top 20 percent; these students were identified as "academic bloomers" (n=18). BUT: the children on the teachers' lists had actually been randomly assigned to the list. At the end of the year, the same I.Q. test was re-administered.

626 Example 2 Statistical question: Do students in the treatment group have more improvement in IQ than students in the control group? What will we actually compare? One-year change in IQ score in the treatment group vs. one-year change in IQ score in the control group.

627 “Academic bloomers” (n=18)
The standard deviation of change scores was 2.0 in both groups. This affects statistical significance… Results: “Academic bloomers” (n=18) Controls (n=72) Change in IQ score: 12.2 (2.0)  8.2 (2.0) 12.2 points 8.2 points Difference=4 points 627

628 What does a 4-point difference mean?
Before we perform any formal statistical analysis on these data, we already have a lot of information. Look at the basic numbers first; THEN consider statistical significance as a secondary guide.

629 Is the association statistically significant?
This 4-point difference could reflect a true effect or it could be a fluke. The question: is a 4-point difference bigger or smaller than the expected sampling variability?

630 Hypothesis testing Step 1: Assume the null hypothesis. Null hypothesis: There is no difference between "academic bloomers" and normal students (= the difference is 0)

631 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true These predictions can be made by mathematical theory or by computer simulation.

632 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true (math theory): SE(difference) = 2.0 × √(1/18 + 1/72) ≈ 0.52

633 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true (computer simulation): In computer simulation, you simulate taking repeated samples of the same size from the same population and observe the sampling variability. I used computer simulation to take samples of 18 treated and 72 controls.

634 Computer Simulation Results
Standard error is about 0.52

635 3. Empirical data Observed difference in our experiment = 12.2 − 8.2 = 4.0

636 4. P-value t = 4.0/0.52 ≈ 7.7. A t-curve with 88 df's has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96); either way, p-value < .0001

637 Visually… If we ran this study times, we wouldn't expect to get 1 result as big as a difference of 4 (under the null hypothesis).

638 5. Reject null! Conclusion: I.Q. scores can bias expectancies in the teachers' minds and cause them to unintentionally treat "bright" students differently from those seen as less bright.

639 Confidence interval (more information!!)
95% CI for the difference: 4.0 ± 1.99(.52) = (3.0, 5.0). A t-curve with 88 df's has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96)

640 What if our standard deviation had been higher?
The standard deviation for change scores in treatment and control was 2.0 in each group. What if change scores had been much more variable, say a standard deviation of 10.0 (for both)?

641 Std. dev in change scores = 2.0
Std. dev in change scores = 2.0: standard error is 0.52. Std. dev in change scores = 10.0: standard error is 2.58.

642 With a std. dev. of 10.0… LESS STATISTICAL POWER!
Standard error is 2.58. If we ran this study times, we would expect to get +4.0 or -4.0 12% of the time. P-value = .12

643 Don’t forget: The paired T-test
Did the control group in the previous experiment improve at all during the year? Do not apply a two-sample ttest to answer this question! After − Before yields a single sample of differences… a "within-group" rather than "between-group" comparison…

644 Continuous outcome (means);
Outcome variable: Continuous (e.g. pain scale, cognitive function)
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation; shows linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes)
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time)
Alternatives if the normality assumption is violated (and small sample size), non-parametric statistics: Wilcoxon sign-rank test (non-parametric alternative to the paired ttest); Wilcoxon sum-rank test (= Mann-Whitney U test; non-parametric alternative to the ttest); Kruskal-Wallis test (non-parametric alternative to ANOVA); Spearman rank correlation coefficient (non-parametric alternative to Pearson's correlation coefficient)

645 Sample Standard Deviation
Data Summary
Group           | n  | Sample Mean | Sample Standard Deviation
Group 1: Change | 72 | +8.2        | 2.0

646 Did the control group in the previous experiment improve at all during the year?
t(71) = 8.2/(2.0/√72) ≈ 35; p-value <.0001

647 Normality assumption of ttest
If the distribution of the trait is normal, fine to use a t-test. But if the underlying distribution is not normal and the sample size is small (rule of thumb: n>30 per group if not too skewed; n>100 if distribution is really skewed), the Central Limit Theorem takes some time to kick in. Cannot use ttest. Note: ttest is very robust against the normality assumption!

648 Alternative tests when normality is violated: Non-parametric tests

649 Continuous outcome (means);
Outcome variable: Continuous (e.g. pain scale, cognitive function)
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation; shows linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes)
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time)
Alternatives if the normality assumption is violated (and small sample size), non-parametric statistics: Wilcoxon sign-rank test (non-parametric alternative to the paired ttest); Wilcoxon sum-rank test (= Mann-Whitney U test; non-parametric alternative to the ttest); Kruskal-Wallis test (non-parametric alternative to ANOVA); Spearman rank correlation coefficient (non-parametric alternative to Pearson's correlation coefficient)

650 Non-parametric tests t-tests require your outcome variable to be normally distributed (or close enough) for small samples. Non-parametric tests are based on RANKS instead of means and standard deviations (= "population parameters").

651 Example: non-parametric tests
10 dieters following Atkin's diet vs. 10 dieters following Jenny Craig. Hypothetical RESULTS: Atkin's group loses an average of 34.5 lbs. J. Craig group loses an average of 18.5 lbs. Conclusion: Atkin's is better?

652 Example: non-parametric tests
BUT, take a closer look at the individual data… Atkin’s, change in weight (lbs): +4, +3, 0, -3, -4, -5, -11, -14, -15, -300 J. Craig, change in weight (lbs) -8, -10, -12, -16, -18, -20, -21, -24, -26, -30

653 Histogram: Jenny Craig weight change (Percent vs. Weight Change, lbs)

654 Atkin’s Weight Change 30 25 20 P e r c 15 e n t 10 5 -300 -280 -260
-300 -280 -260 -240 -220 -200 -180 -160 -140 -120 -100 -80 -60 -40 -20 20 Weight Change

655 t-test inappropriate…
Comparing the mean weight loss of the two groups is not appropriate here. The distributions do not appear to be normally distributed. Moreover, there is an extreme outlier (this outlier influences the mean a great deal).

656 Wilcoxon rank-sum test
RANK the values, 1 being the least weight loss and 20 being the most weight loss.
Atkin's: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300 → ranks 1, 2, 3, 4, 5, 6, 9, 11, 12, 20
J. Craig: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30 → ranks 7, 8, 10, 13, 14, 15, 16, 17, 18, 19

657 Wilcoxon rank-sum test
Sum of Atkin’s ranks:   =73 Sum of Jenny Craig’s ranks: =137 Jenny Craig clearly ranked higher! P-value *(from computer) = .018 *For details of the statistical test, see appendix of these slides…

658 Binary or categorical outcomes (proportions)
Outcome Variable Are the observations correlated? Alternative to the chi-square test if sparse cells: independent correlated Binary or categorical (e.g. fracture, yes/no) Chi-square test: compares proportions between two or more groups Relative risks: odds ratios or risk ratios Logistic regression: multivariate technique used when outcome is binary; gives multivariate- adjusted odds ratios McNemar’s chi-square test: compares binary outcome between two correlated groups (e.g., before and after) Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data) GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures) Fisher’s exact test: compares proportions between independent groups when there are sparse data (some cells <5). McNemar’s exact test: compares proportions between correlated groups when there are sparse data (some cells <5).

659 Difference in proportions (special case of chi-square test)

660 Null distribution of a difference in proportions
Standard error of a proportion = √(p(1−p)/n), which can be estimated by √(p̂(1−p̂)/n) (still normally distributed). Standard error of the difference of two proportions = √( p(1−p)/n1 + p(1−p)/n2 ). The variance of a difference is the sum of the variances (as with the difference in means). Analogous to pooled variance in the ttest.

661 Null distribution of a difference in proportions
Difference of proportions

662 Difference in proportions test
Difference in proportions test. Null hypothesis: The difference in proportions is 0. Z = (p̂1 − p̂2)/√( p̄(1−p̄)(1/n1 + 1/n2) ) follows a normal distribution, because the binomial can be approximated with the normal. Recall, the variance of a proportion is p(1−p)/n. Use the average (or pooled) proportion p̄ in the standard error formula, because under the null hypothesis, the groups have equal proportions.

663 Recall case-control example:
               Smoker (E) | Non-smoker (~E)
Stroke (D)     15         | 35
No Stroke (~D) 8          | 42
(50 cases, 50 controls)

664 Absolute risk: Difference in proportions exposed
                 Smoker (E)   Non-smoker (~E)   Total
Stroke (D)           15             35            50
No Stroke (~D)        8             42            50

665 Difference in proportions exposed
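Filling in the arithmetic the slide implies, from the table above: proportion exposed among cases = 15/50 = .30; proportion exposed among controls = 8/50 = .16; difference in proportions exposed = .30 - .16 = .14. (With pooled proportion 23/100 = .23, the standard error is sqrt(.23*.77*(1/50 + 1/50)) ≈ .084, giving Z ≈ .14/.084 ≈ 1.7.)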

666 Example 2: Difference in proportions
Research Question: Are antidepressants a risk factor for suicide attempts in children and adolescents? Example modified from: Olfson M, et al. Antidepressant drug therapy and suicide in severely depressed children and adults. Arch Gen Psychiatry. 2006;63.

667 Example 2: Difference in Proportions
Design: Case-control study Methods: Researchers used Medicaid records to compare prescription histories between 263 children and teenagers (6-18 years) who had attempted suicide and 1241 controls who had never attempted suicide (all subjects suffered from depression). Statistical question: Is a history of use of antidepressants more common among cases than controls?

668 Example 2 Statistical question: Is a history of use of antidepressants more common among the suicide-attempt cases than among controls? What will we actually compare? The proportion of cases who used antidepressants in the past vs. the proportion of controls who did.

669 Results
                              No. (%) of cases (n=263)   No. (%) of controls (n=1241)
Any antidepressant drug ever        120 (46%)                   448 (36%)
46% vs. 36%: Difference = 10%

670 Is the association statistically significant?
This 10% difference could reflect a true association or it could be a fluke in this particular sample. The question: is 10% bigger or smaller than the expected sampling variability?

671 Hypothesis testing Step 1: Assume the null hypothesis. Null hypothesis: There is no association between antidepressant use and suicide attempts in the target population (= the difference is 0%)

672 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true

673 Also: Computer Simulation Results
Standard error is about 3.3%
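The simulated value can be checked analytically: under the null, the pooled proportion is (120 + 448)/1504 ≈ .378, so the standard error of the difference is sqrt(.378*.622*(1/263 + 1/1241)) ≈ .033, i.e., about 3.3%.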

674 Hypothesis Testing Step 3: Do an experiment. We observed a difference of 10% between cases and controls.

675 Hypothesis Testing Step 4: Calculate a p-value

676 P-value from our simulation…
When we ran this study 1000 times, we got 1 result as big or bigger than 10%. We also got 3 results as small or smaller than -10%.

677 P-value From our simulation, we estimate the p-value to be:
4/1000 or .004

678 Hypothesis Testing
Step 5: Reject or do not reject the null hypothesis. Here we reject the null. Alternative hypothesis: There is an association between antidepressant use and suicide in the target population.

679 What would a lack of statistical significance mean?
If this study had sampled only 50 cases and 50 controls, the sampling variability would have been much higher—as shown in this computer simulation…

680 263 cases and 1241 controls. 50 cases and 50 controls.
Standard error is about 3.3% with 263 cases and 1241 controls; standard error is about 10% with 50 cases and 50 controls.

681 With only 50 cases and 50 controls…
If we ran this study 1000 times, we would expect to get values of 10% or higher 170 times (or 17% of the time). Standard error is about 10%

682 Two-tailed p-value = 17% * 2 = 34%

683 Practice problem… An August 2003 research article in Developmental and Behavioral Pediatrics reported the following about a sample of UK kids: when given a choice of a non-branded chocolate cereal vs. CoCo Pops, 97% (36) of 37 girls and 71% (27) of 38 boys preferred the CoCo Pops. Is this evidence that girls are more likely to choose brand-named products?

684 Answer. Null says the p's are equal, so estimate the standard error using the overall observed p.
1. Hypotheses: H0: p♂ - p♀ = 0; Ha: p♂ - p♀ ≠ 0 [two-sided]
2. Null distribution of the difference of two proportions: overall p = (36+27)/75 = .84, so SE = sqrt(.84*.16*(1/37 + 1/38)) ≈ .085
3. Observed difference in our experiment = .97 - .71 = .26, so Z ≈ .26/.085 ≈ 3.06
4. Calculate the p-value of what you observed: data _null_; pval=(1-probnorm(3.06))*2; put pval; run; (gives p ≈ .002)
5. The p-value is sufficiently low for us to reject the null; there does appear to be a difference in gender preferences here.

685 Key two-sample Hypothesis Tests…
Test for Ho: μx - μy = 0 (σ² unknown, but roughly equal): t(nx+ny-2) = (xbar - ybar) / sqrt(sp²(1/nx + 1/ny)), where sp² is the pooled variance. Test for Ho: p1 - p2 = 0: Z = (phat1 - phat2) / sqrt(pbar(1-pbar)(1/n1 + 1/n2))

686 Corresponding confidence intervals…
For a difference in means, 2 independent samples (σ²'s unknown but roughly equal): (xbar - ybar) ± t(nx+ny-2, .975) * sqrt(sp²(1/nx + 1/ny)). For a difference in proportions, 2 independent samples: (phat1 - phat2) ± 1.96 * sqrt(phat1(1-phat1)/n1 + phat2(1-phat2)/n2)

687 Appendix: details of rank-sum test…

688 Wilcoxon Rank-sum test

689 Example For example, if team 1 and team 2 (two gymnastic teams) are competing, and the judges rank all the individuals in the competition, how can you tell if team 1 has done significantly better than team 2 or vice versa?

690 Answer Intuition: under the null hypothesis of no difference between the two groups… If n1 = n2, the rank sums T1 and T2 should be equal. But if n1 ≠ n2, then T2 (for the bigger group, n2) should automatically be bigger. But how much bigger under the null? For example, if team 1 has 3 people and team 2 has 10, we could rank all 13 participants from 1 to 13 on individual performance. If team 1 (X) and team 2 don't differ in talent, the ranks ought to be spread evenly among the two groups, e.g., an exactly even distribution if team 1 ranks 3rd, 7th, and 11th.

691 Remember this?
n1(n1+1)/2 = sum of within-group ranks for the smaller group; n2(n2+1)/2 = sum of within-group ranks for the larger group. Take-home point: under the null, the groups' rank sums differ only because the group sizes differ.

692 It turns out that, if the null hypothesis is true, the expected difference between the larger-group rank sum and the smaller-group rank sum (T2 - T1) is exactly equal to the difference between the within-group rank sums: n2(n2+1)/2 - n1(n1+1)/2. (In the gymnastics example: expected T2 - T1 = 70 - 21 = 49 = 55 - 6.)

693 From slides 23 and 24, define new statistics: U1 = n1n2 + n1(n1+1)/2 - T1 and U2 = n1n2 + n2(n2+1)/2 - T2. Here, under the null: U1 = 30 + 6 - 21 = 15, U2 = 30 + 55 - 70 = 15, and U2 + U1 = 30

694 Under the null hypothesis, U1 should equal U2:
The U's should be equal to each other and will equal n1n2/2: U1 + U2 = n1n2. Under the null hypothesis, U1 = U2 = U0, so E(U1 + U2) = 2E(U0) = n1n2 and E(U0) = n1n2/2. So, the test statistic here is not quite the difference in the sum-of-ranks of the 2 groups; it's the smaller observed U value, U0. For small n's, take U0 and get the p-value directly from a U table.

695 For large enough n's (>10 per group), use the normal approximation: Z = (U0 - n1n2/2) / sqrt(n1n2(n1+n2+1)/12)

696 Add observed data to the example…
Example: If the girls on the two gymnastics teams were ranked as follows:
Team 1: 1, 5, 7 → Observed T1 = 13
Team 2: 2, 3, 4, 6, 8, 9, 10, 11, 12, 13 → Observed T2 = 78
Are the teams significantly different? Total sum of ranks = 13*14/2 = 91; n1n2 = 3*10 = 30. Under the null hypothesis: expect U1 - U2 = 0 and U1 + U2 = 30 (each should equal about 15 under the null), so U0 = 15. Observed: U1 = 30 + 6 - 13 = 23, U2 = 30 + 55 - 78 = 7, so U0 = 7. Not quite statistically significant in the U table… p = .1084 * 2 for a two-tailed test (see attached)

697 Example problem 2. A study was done to compare the Atkins Diet (low-carb) vs. Jenny Craig (low-cal, low-fat). The following weight changes were obtained; note they are very skewed because someone lost 100 pounds; the mean loss for Atkins is going to look higher because of the bozo, but does that mean the diet is better overall? Conduct a Mann-Whitney U test to compare ranks.
Atkins: -100, -8, -4, +5, +8, +2
Jenny Craig: -11, -15, -5, +6, -20

698 Answer
Corresponding ranks (lower rank = more weight loss!):
Atkins: 1, 5, 7, 9, 11, 8
Jenny Craig: 4, 3, 6, 10, 2
Sum of ranks for JC = 25 (n=5); sum of ranks for Atkins = 41 (n=6); n1n2 = 5*6 = 30. Under the null hypothesis: expect U1 - U2 = 0 and U1 + U2 = 30, so U0 = 15. Observed: U1 = 30 + 15 - 25 = 20, U2 = 30 + 21 - 41 = 10, so U0 = 10; n1=5, n2=6. Go to the Mann-Whitney chart… p = .2143 * 2 ≈ .43

699 Introduction to sample size and power calculations
How much chance do we have to reject the null hypothesis when the alternative is in fact true? (what’s the probability of detecting a real effect?)

700 Can we quantify how much power we have for given sample sizes?

701 For 5% significance level, one-tail area=2.5%
study 1: 263 cases, 1241 controls. Rejection region: any value >= 6.5 (0 + 3.3*1.96). Null distribution: difference = 0. For a 5% significance level, one-tail area = 2.5% (Zα/2 = 1.96). Power = chance of being in the rejection region if the alternative is true = area to the right of this line (in yellow). Clinically relevant alternative: difference = 10%.

702 study 1: 263 cases, 1241 controls. Rejection region: any value >= 6.5 (0 + 3.3*1.96). Power = chance of being in the rejection region if the alternative is true = area to the right of this line (in yellow). Power here: P(Z > (6.5 - 10)/3.3) = P(Z > -1.06) ≈ 85%.

703 study 1: 50 cases, 50 controls. Critical value = 0 + 10*1.96 = 20; 2.5% one-tail area (Zα/2 = 1.96).
Power closer to 15% now: P(Z > (20 - 10)/10) = P(Z > 1) ≈ .16.

704 Study 2: 18 treated, 72 controls, STD DEV = 2
Critical value = 0 + .52*1.96 ≈ 1 (SE = 2*sqrt(1/18 + 1/72) = .52). Clinically relevant alternative: difference = 4 points. Power is nearly 100%!

705 Study 2: 18 treated, 72 controls, STD DEV = 10
Critical value = 0 + 2.6*1.96 ≈ 5 (SE = 10*sqrt(1/18 + 1/72) = 2.6). Power is about 40%

706 Study 2: 18 treated, 72 controls, effect size = 1.0
Critical value = 0 + .52*1.96 ≈ 1. Power is about 50%. Clinically relevant alternative: difference = 1 point

707 Factors Affecting Power
1. Size of the effect. 2. Standard deviation of the characteristic. 3. Bigger sample size. 4. Significance level desired. It turns out that if you were to go out and sample many, many times, most sample statistics that you could calculate would follow a normal distribution. What are the 2 parameters (from last time) that define any normal distribution? Remember that a normal curve is characterized by two parameters: a mean and a variability (SD). What do you think the mean value of a sample statistic would be? The standard deviation? Remember, standard deviation is the natural variability of the population. Standard error can be the standard error of the mean, or the standard error of the odds ratio, or the standard error of the difference of 2 means, etc.: the standard error of any sample statistic.

708 1. Bigger difference from the null mean
average weight from samples of 100 Null Clinically relevant alternative

709 2. Bigger standard deviation
average weight from samples of 100

710 3. Bigger Sample Size average weight from samples of 100

711 4. Higher significance level
Rejection region. average weight from samples of 100

712 Sample size calculations
Based on these elements, you can write a formal mathematical equation that relates power, sample size, effect size, standard deviation, and significance level… **WE WILL DERIVE THESE FORMULAS FORMALLY SHORTLY**

713 Simple formula for difference in means
n/group = 2σ²(Zpower + Zα/2)² / (difference)². Here Zpower represents the desired power (typically .84 for 80% power); n is the sample size in each group (assumes equal sized groups); σ is the standard deviation of the outcome variable; Zα/2 represents the desired level of statistical significance (typically 1.96); and the effect size is the difference in means.

714 Simple formula for difference in proportions
n/group = 2*pbar(1-pbar)*(Zpower + Zα/2)² / (p1 - p2)². Here Zpower represents the desired power (typically .84 for 80% power); n is the sample size in each group (assumes equal sized groups); pbar(1-pbar) is a measure of variability (similar to standard deviation); Zα/2 represents the desired level of statistical significance (typically 1.96); and the effect size is the difference in proportions.

715 Derivation of sample size formula….

716 Study 2: 18 treated, 72 controls, effect size=1.0
Critical value= 0+.52*1.96=1 Power close to 50%

717 SAMPLE SIZE AND POWER FORMULAS
Critical value = 0 + standard error(difference)*Zα/2. Power = area to the right of Z, where Z = Zα/2 - (clinically relevant difference)/SE(difference)

718 Power = area to the right of Z
Power is the area to the right of Z, or equivalently the area to the left of -Z. Since normal charts give us the area to the left by convention, we need to use -Z to get the correct value. Most textbooks just call this "Zβ"; I'll use the term Zpower to avoid confusion.

719 All-purpose power formula: Power = P(Z ≤ Zpower), where Zpower = (difference)/SE(difference) - Zα/2

720 Derivation of a sample size formula…
Sample size is embedded in the standard error…. For example, for a difference in means with equal groups, SE(difference) = sqrt(2σ²/n).

721 Algebra…

722

723 Sample size formula for difference in means: n/group = 2σ²(Zpower + Zα/2)² / (difference)²

724 Examples. Example 1: You want to calculate how much power you will have to see a difference of 3.0 IQ points between two groups: 30 male doctors and 30 female doctors. If you expect the standard deviation to be about 10 on an IQ test for both groups, then the standard error for the difference will be about sqrt(10²/30 + 10²/30) ≈ 2.57

725 Power formula… Zpower = 3/2.57 - 1.96 = -.79; P(Z ≤ -.79) = .21; only 21% power to see a difference of 3 IQ points.
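The same calculation as a SAS data step (probnorm() is the standard normal CDF, as in the earlier data _null_ example; probit(0.975) returns 1.96):

  data _null_;
    zpower = 3/2.57 - probit(0.975);  * = -0.79;
    power  = probnorm(zpower);        * = 0.21;
    put power=;
  run;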

726 Example 2: How many people would you need to sample in each group to achieve power of 80% (corresponds to Zpower = .84)? n/group = 2(10²)(.84 + 1.96)²/3² = 174/group; 348 altogether
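As a sketch, the same answer can be requested from SAS's power procedure; this is our suggested check, not the slides' code, and PROC POWER uses the t distribution, so it returns a value slightly above the normal-approximation answer of 174:

  proc power;
    twosamplemeans test=diff
      meandiff  = 3     /* difference in IQ points to detect */
      stddev    = 10    /* common standard deviation */
      power     = 0.80
      npergroup = .;    /* solve for n per group */
  run;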

727 Sample Size needed for comparing two proportions:
Example: I am going to run a case-control study to determine if pancreatic cancer is linked to drinking coffee. If I want 80% power to detect a 10% difference in the proportion of coffee drinkers among cases vs. controls (if coffee drinking and pancreatic cancer are linked, we would expect that a higher proportion of cases would be coffee drinkers than controls), how many cases and controls should I sample? About half the population drinks coffee.

728 Derivation of a sample size formula:
The standard error of the difference of two proportions is: sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)

729 Derivation of a sample size formula:
Here, if we assume equal sample sizes and that, under the null hypothesis, the proportion of coffee drinkers is .5 in both cases and controls, then s.e.(diff) = sqrt(.5*.5/n + .5*.5/n) = sqrt(.5/n)

730

731 For 80% power… There is 80% area to the left of a Z-score of .84 on a standard normal curve; therefore, there is 80% area to the right of -.84. n/group = 2(.5)(.5)(.84 + 1.96)²/(.10)² = 392: it would take 392 cases and 392 controls to have 80% power! Total = 784
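Or as one line of SAS arithmetic (plugging into the formula from slide 714):

  data _null_;
    n = 2*(0.5*0.5)*((0.84 + 1.96)**2)/(0.10**2);  * = 392 per group;
    put n=;
  run;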

732 Question 2: How many total cases and controls would I have to sample to get 80% power for the same study, if I sample 2 controls for every case? Ask yourself, what changes here?

733 Different size groups…
Need: 294 cases and 2*294 = 588 controls; 882 total. Note: you get the best power for the lowest total sample size if you keep both groups equal (882 > 784). You would only want to make groups unequal if there was an obvious difference in the cost or ease of collecting data on one group, e.g., cases of pancreatic cancer are rare and take time to find.

734 General sample size formula: n/group = 2*(variability of the outcome)*(Zpower + Zα/2)² / (difference)², where the variability term is σ² for a continuous outcome and pbar(1-pbar) for a binary one

735 General sample size needs when outcome is binary: n/group = 2*pbar(1-pbar)*(Zpower + Zα/2)² / (p1 - p2)²

736 Compare with when the outcome is continuous: n/group = 2σ²(Zpower + Zα/2)² / (difference)²

737 Question How many subjects would we need to sample to have 80% power to detect an average increase in MCAT biology score of 1 point, if the average change without instruction (just due to chance) is plus or minus 3 points (=standard deviation of change)?

738 Standard error here = SD(change)/sqrt(n) = 3/sqrt(n)

739 Where D = change from test 1 to test 2 (the difference).
Therefore, need: n = (9)(.84 + 1.96)²/1² ≈ 70 people total

740 Sample size for paired data: n = σD²(Zpower + Zα/2)² / (difference)², where σD is the standard deviation of the change scores

741 Paired data difference in proportion: sample size:

742 More than two groups: ANOVA and Chi-square

743 First, recent news… RESEARCHERS FOUND A NINE-FOLD INCREASE IN THE RISK OF DEVELOPING PARKINSON'S IN INDIVIDUALS EXPOSED IN THE WORKPLACE TO CERTAIN SOLVENTS…

744 The data… Table 3. Solvent Exposure Frequencies and Adjusted Pairwise Odds Ratios in PD-Discordant Twins, n = 99 pairs.

745 Which statistical test?
Outcome: binary or categorical (e.g., fracture, yes/no).
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios).
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures).
Alternatives to the chi-square test if sparse cells: Fisher's exact test (independent groups, some cells <5); McNemar's exact test (correlated groups, some cells <5).

746 Comparing more than two groups…

747 Continuous outcome (means)
Outcome: continuous (e.g., pain scale, cognitive function).
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes).
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time).
Non-parametric alternatives if the normality assumption is violated (and small sample size): Wilcoxon signed-rank test (alternative to the paired ttest); Wilcoxon rank-sum test (= Mann-Whitney U test; alternative to the ttest); Kruskal-Wallis test (alternative to ANOVA); Spearman rank correlation coefficient (alternative to Pearson's correlation coefficient).

748 ANOVA example. Mean micronutrient intake from the school lunch by school, mean (SD):
Calcium (mg): S1 117.8 (62.4); S2 158.7 (70.5); S3 206.5 (86.2); P = 0.000
Iron (mg): 2.0 (0.6) …; P = 0.854
Folate (μg): S1 26.6 (13.1); S2 38.7 (14.5); S3 42.6 (15.1); P = …
Zinc (mg): S1 1.9 (1.0); S2 1.5 (1.2); S3 1.3 (0.4); P = 0.055
S1 (n=28): most deprived (40% subsidized lunches). S2 (n=25): medium deprived (<10% subsidized). S3 (n=21): least deprived (no subsidization, private school). P-values from ANOVA; significant differences highlighted in bold (P<0.05). FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England: are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.

749 ANOVA (ANalysis Of VAriance)
Idea: For two or more groups, test difference between means, for quantitative normally distributed variables. Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t- test).

750 One-Way Analysis of Variance
Assumptions (same as the ttest): normally distributed outcome; equal variances between the groups; groups are independent.

751 Hypotheses of One-Way ANOVA: H0: μ1 = μ2 = … = μk; Ha: the means are not all equal

752 ANOVA It’s like this: If I have three groups to compare:
I could do three pair-wise ttests, but this would increase my type I error. So, instead I want to look at the pairwise differences "all at once." To do this, I can recognize that variance is a statistic that lets me look at more than one difference at a time…

753 The "F-test"
Is the difference in the means of the groups more than background noise (= variability within groups)? F = variability between groups / variability within groups. Summarizes the mean differences between all groups at once. Analogous to pooled variance from a ttest. Recall, we have already used an "F-test" to check for equality of variances: if F >> 1 (indicating unequal variances), use the unpooled variance in a t-test.

754 The F-distribution The F-distribution is a continuous probability distribution that depends on two parameters n and m (numerator and denominator degrees of freedom, respectively):

755 The F-distribution
A ratio of variances follows an F-distribution: s1²/s2² ~ F(n1-1, n2-1) under H0: σ1² = σ2². The F-test tests the hypothesis that two variances are equal; F will be close to 1 if the sample variances are equal.

756 How to calculate ANOVA’s by hand…
Observations yij (i = group, j = observation within group), with n = 10 obs./group and k = 4 groups: Treatment 1: y11…y1,10; Treatment 2: y21…y2,10; Treatment 3: y31…y3,10; Treatment 4: y41…y4,10. From these, compute the group means ybar_i = (1/n)Σj yij and the (within) group variances s_i² = Σj (yij - ybar_i)²/(n-1).

757 Sum of Squares Within (SSW), or Sum of Squares Error (SSE)
Add up the (within) group variability: Sum of Squares Within (SSW) = Σi Σj (yij - ybar_i)² = (n-1)(s1² + s2² + … + sk²) (or SSE, for chance error)

758 Sum of Squares Between (SSB), or Sum of Squares Regression (SSR)
Overall mean of all 40 observations ("grand mean"): ybar. Sum of Squares Between (SSB) = n*Σi (ybar_i - ybar)²: the variability of the group means compared to the grand mean (the variability due to the treatment).

759 Total Sum of Squares (SST)
Total sum of squares (TSS) = Σi Σj (yij - ybar)²: the squared difference of every observation from the overall mean (the numerator of the variance of Y!)

760 Partitioning of Variance
SSW + SSB = TSS

761 ANOVA Table (n individuals per group)
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between (k groups) | k-1 | SSB (sum of squared deviations of group means from grand mean) | SSB/(k-1) | [SSB/(k-1)] / [SSW/(nk-k)] | Go to F(k-1, nk-k) chart
Within | nk-k | SSW (sum of squared deviations of observations from their group mean) | s² = SSW/(nk-k) | |
Total variation | nk-1 | TSS (sum of squared deviations of observations from grand mean); TSS = SSB + SSW | | |

762 ANOVA = t-test (two groups)
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between (2 groups) | 1 | SSB (squared difference in means multiplied by n) | squared difference in means times n | SSB / pooled variance | Go to F(1, 2n-2) chart; notice the values are just (t with 2n-2 df)²
Within | 2n-2 | SSW (equivalent to the numerator of the pooled variance) | pooled variance | |
Total variation | 2n-1 | TSS | | |

763 Example. Heights (inches) in four treatment groups, 10 observations per group: 60, 50, 48, 47, 67, 52, 49, 42, 43, 54, 55, 56, 68, 62, 59, 61, 65, 64, 60, 72, 63, 71, … (Treatment 1's column, used in Step 2 below: 60, 67, 42, 67, 56, 62, 64, 59, 72, 71.)

764 Example. Step 1) Calculate the sum of squares between groups:
Mean for group 1 = 62.0; group 2 = 59.7; group 3 = 56.3; group 4 = 61.4. Grand mean = 59.85.
SSB = [(62.0-59.85)² + (59.7-59.85)² + (56.3-59.85)² + (61.4-59.85)²] * n per group = 19.65 * 10 = 196.5

765 Example. Step 2) Calculate the sum of squares within groups:
(60-62)² + (67-62)² + (42-62)² + (67-62)² + (56-62)² + (62-62)² + (64-62)² + (59-62)² + (72-62)² + (71-62)² + … (continuing with groups 2-4, a sum of 40 squared deviations) = 2060.6

766 Step 3) Fill in the ANOVA table:
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between | 3 | 196.5 | 65.5 | 1.14 | .344
Within | 36 | 2060.6 | 57.2 | |
Total | 39 | 2257.1 | | |

767 Step 3) Fill in the ANOVA table:
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between | 3 | 196.5 | 65.5 | 1.14 | .344
Within | 36 | 2060.6 | 57.2 | |
Total | 39 | 2257.1 | | |
INTERPRETATION of ANOVA: How much of the variance in height is explained by treatment group? R² = "Coefficient of Determination" = SSB/TSS = 196.5/2257.1 = 9%
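A sketch of the same one-way ANOVA in SAS (dataset and variable names invented for illustration):

  proc glm data=heights;
    class treatment;           * treatment = group indicator, 1-4;
    model height = treatment;  * produces the ANOVA table above;
  run;
  quit;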

768 Coefficient of Determination
The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).

769 Beyond one-way ANOVA Often, you may want to test more than 1 treatment. ANOVA can accommodate more than 1 treatment or factor, so long as they are independent. Again, the variation partitions beautifully! TSS = SSB1 + SSB2 + SSW

770 ANOVA example. Table 6. Mean micronutrient intake from the school lunch by school, mean (SD), n=25 per school:
Calcium (mg): S1 117.8 (62.4); S2 158.7 (70.5); S3 206.5 (86.2); P = 0.000
Iron (mg): 2.0 (0.6) …; P = 0.854
Folate (μg): S1 26.6 (13.1); S2 38.7 (14.5); S3 42.6 (15.1); P = …
Zinc (mg): S1 1.9 (1.0); S2 1.5 (1.2); S3 1.3 (0.4); P = 0.055
S1: most deprived (40% subsidized lunches). S2: medium deprived (<10% subsidized). S3: least deprived (no subsidization, private school). P-values from ANOVA; significant differences highlighted in bold (P<0.05). FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England: are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.

771 Answer Step 1) calculate the sum of squares between groups:
Mean for School 1 = 117.8; School 2 = 158.7; School 3 = 206.5. Grand mean: 161. SSB = [(117.8-161)² + (158.7-161)² + (206.5-161)²] * 25 per group = 98,113

772 Answer Step 2) calculate the sum of squares within groups:
S.D. for S1 = 62.4; S2 = 70.5; S3 = 86.2. Therefore, the sum of squares within is: (24)[62.4² + 70.5² + 86.2²] = 391,066

773 Answer. Step 3) Fill in your ANOVA table:
Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between | 2 | 98,113 | 49,056 | 9 | <.05
Within | 72 | 391,066 | 5,431 | |
Total | 74 | 489,179 | | |
**R² = 98,113/489,179 = 20%: school explains 20% of the variance in lunchtime calcium intake in these kids.

774 ANOVA summary A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ. Determining which groups differ (when it’s unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons…

775 Question: Why not just do 3 pairwise ttests?
Answer: because, at an error rate of 5% per test, you have an overall chance of up to 1-(.95)^3 = 14% of making a type-I error (if all 3 comparisons were independent). → If you wanted to compare 6 groups, you'd have to do 6C2 = 15 pairwise ttests, which would give you a high chance of finding something significant just by chance (if all tests were independent with a type-I error rate of 5% each); probability of at least one type-I error = 1-(.95)^15 = 54%.

776 Recall: Multiple comparisons

777 Correction for multiple comparisons
How to correct for multiple comparisons post-hoc: Bonferroni correction (adjusts by the most conservative amount; assuming all tests are independent, divide the α cut-off by the number of tests); Tukey (adjusts p); Scheffé (adjusts p); Holm/Hochberg (give a p-cutoff beyond which results are not significant).

778 Procedures for Post Hoc Comparisons
If your ANOVA test identifies a difference between group means, then you must identify which of your k groups differ. If you did not specify the comparisons of interest ("contrasts") ahead of time, then you have to pay a price for making all kC2 pairwise comparisons to keep the overall type-I error rate to α. Alternately, run a limited number of planned comparisons (making only those comparisons that are most important to your research question); this limits the number of tests you make.

779 1. Bonferroni
For example, to make a Bonferroni correction, divide your desired alpha cut-off level (usually .05) by the number of comparisons you are making. Assumes complete independence between comparisons, which is way too conservative.
Obtained P-value | Original Alpha | # tests | New Alpha | Significant?
.001 | .05 | 5 | .010 | Yes
.011 | .05 | 4 | .013 | Yes
.019 | .05 | 3 | .017 | No
.032 | .05 | 2 | .025 | No
.048 | .05 | 1 | .050 | Yes

780 2/3. Tukey and Scheffé. Both methods increase your p-values to account for the fact that you've done multiple comparisons, but are less conservative than Bonferroni (let the computer calculate for you!). SAS options in PROC GLM: adjust=tukey, adjust=scheffe.
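For example, extending the hypothetical PROC GLM sketch from the ANOVA example, the adjusted pairwise comparisons can be requested on an LSMEANS statement:

  proc glm data=heights;
    class treatment;
    model height = treatment;
    lsmeans treatment / pdiff adjust=tukey;  * Tukey-adjusted pairwise p-values;
  run;
  quit;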

781 4/5. Holm and Hochberg. Arrange all the resulting p-values (from the T = kC2 pairwise comparisons) in order from smallest (most significant) to largest: p1 to pT.

782 Holm Start with p1, and compare to Bonferroni p (=α/T).
1. If p1 < α/T, then p1 is significant; continue to step 2. If not, then we have no significant p-values and stop here.
2. If p2 < α/(T-1), then p2 is significant; continue to step 3. If not, then p2 thru pT are not significant and stop here.
3. If p3 < α/(T-2), then p3 is significant; continue to step 4. If not, then p3 thru pT are not significant and stop here.
Repeat the pattern…

783 Hochberg
1. Start with the largest (least significant) p-value, pT, and compare to α. If it's significant, so are all the remaining p-values; stop here. If it's not significant, go to step 2.
2. If pT-1 < α/2, then pT-1 is significant, as are all remaining smaller p-values; stop here. If not, then pT-1 is not significant; go to step 3.
Repeat the pattern…
Note: Holm and Hochberg should give you the same results. Use Holm if you anticipate few significant comparisons; use Hochberg if you anticipate many significant comparisons.

784 Practice Problem
A large randomized trial compared an experimental drug and 9 other standard drugs for treating motion sickness. An ANOVA test revealed significant differences between the groups. The investigators wanted to know if the experimental drug ("drug 1") beat any of the standard drugs in reducing total minutes of nausea, and, if so, which ones. The p-values from the pairwise ttests (comparing drug 1 with drugs 2-10) are below.
a. Which differences would be considered statistically significant using a Bonferroni correction? A Holm correction? A Hochberg correction?
Drug 1 vs. drug: 2 (.05), 3 (.3), 4 (.25), 5 (.04), 6 (.001), 7 (.006), 8 (.08), 9 (.002), 10 (.01)

785 Answer
Bonferroni makes the new α value = α/9 = .05/9 = .0056; therefore, using Bonferroni, the new drug is only significantly different from standard drugs 6 and 9.
Arrange the p-values in order: drug 6 (.001), 9 (.002), 7 (.006), 10 (.01), 5 (.04), 2 (.05), 8 (.08), 4 (.25), 3 (.3).
Holm: .001 < .0056; .002 < .05/8 = .00625; .006 < .05/7 = .007; .01 > .05/6 = .0083; therefore, the new drug is only significantly different from standard drugs 6, 9, and 7.
Hochberg: .3 > .05; .25 > .05/2; .08 > .05/3; .05 > .05/4; .04 > .05/5; .01 > .05/6; .006 < .05/7; therefore, drugs 7, 9, and 6 are significantly different.

786 Practice problem b. Your patient is taking one of the standard drugs that was shown to be statistically less effective in minimizing motion sickness (i.e., significant p-value for the comparison with the experimental drug). Assuming that none of these drugs have side effects but that the experimental drug is slightly more costly than your patient’s current drug-of-choice, what (if any) other information would you want to know before you start recommending that patients switch to the new drug?

787 Answer The magnitude of the reduction in minutes of nausea.
If large enough sample size, a 1-minute difference could be statistically significant, but it’s obviously not clinically meaningful and you probably wouldn’t recommend a switch.

788 Continuous outcome (means)
Outcome: continuous (e.g., pain scale, cognitive function).
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes).
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time).
Non-parametric alternatives if the normality assumption is violated (and small sample size): Wilcoxon signed-rank test (alternative to the paired ttest); Wilcoxon rank-sum test (= Mann-Whitney U test; alternative to the ttest); Kruskal-Wallis test (alternative to ANOVA); Spearman rank correlation coefficient (alternative to Pearson's correlation coefficient).

789 Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA (just an extension of the Wilcoxon rank-sum (Mann-Whitney U) test to more than 2 groups; based on ranks): Proc NPAR1WAY in SAS.

790 Binary or categorical outcomes (proportions)
Outcome: binary or categorical (e.g., fracture, yes/no).
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios).
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures).
Alternatives to the chi-square test if sparse cells: Fisher's exact test (independent groups, some cells <5); McNemar's exact test (correlated groups, some cells <5).

791 Chi-square test for comparing proportions (of a categorical variable) between >2 groups
I. Chi-Square Test of Independence When both your predictor and outcome variables are categorical, they may be cross-classified in a contingency table and compared using a chi-square test of independence. A contingency table with R rows and C columns is an R x C contingency table.

792 Example: Asch, S.E. (1955). Opinions and social pressure. Scientific American, 193.

793 The Experiment A Subject volunteers to participate in a “visual perception study.” Everyone else in the room is actually a conspirator in the study (unbeknownst to the Subject). The “experimenter” reveals a pair of cards…

794 The Task Cards Standard line Comparison lines A, B, and C

795 The Experiment Everyone goes around the room and says which comparison line (A, B, or C) is correct; the true Subject always answers last – after hearing all the others’ answers. The first few times, the 7 “conspirators” give the correct answer. Then, they start purposely giving the (obviously) wrong answer. 75% of Subjects tested went along with the group’s consensus at least once.

796 Further Results In a further experiment, group size (number of conspirators) was altered from 2-10. Does the group size alter the proportion of subjects who conform?

797 Number of group members?
The Chi-Square test
Conformed? | Group size 2 | 4 | 6 | 8 | 10
Yes | 20 | 50 | 75 | 60 | 30
No | 80 | 50 | 25 | 40 | 70
Apparently, conformity is less likely with fewer or with more group members…

798 20 + 50 + 75 + 60 + 30 = 235 conformed out of 500 experiments. Overall likelihood of conforming = 235/500 = .47

799 Calculating the expected, in general
Null hypothesis: variables are independent Recall that under independence: P(A)*P(B)=P(A&B) Therefore, calculate the marginal probability of B and the marginal probability of A. Multiply P(A)*P(B)*N to get the expected cell count.
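Worked out for the conformity data: expected cell count = P(row)*P(column)*N = (row total * column total)/N, so the expected "Yes" count in each group-size column is (235*100)/500 = 47.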

800 Number of group members?
Expected frequencies if no association between group size and conformity:
Conformed? | Group size 2 | 4 | 6 | 8 | 10
Yes | 47 | 47 | 47 | 47 | 47
No | 53 | 53 | 53 | 53 | 53

801 Do observed and expected differ more than expected due to chance?

802 Chi-Square test: χ² = Σ (observed - expected)²/expected, summed over all cells. Degrees of freedom = (rows-1)*(columns-1) = (2-1)*(5-1) = 4

803 The Chi-Square distribution: the sum of squared normal deviates
The expected value and variance of a chi-square: E(x)=df Var(x)=2(df)

804 Chi-Square test. Degrees of freedom = (rows-1)*(columns-1) = (2-1)*(5-1) = 4. Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here 85 >> 4.
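If you want an exact p-value rather than the rule of thumb, a one-line SAS check (probchi() is the chi-square CDF; 85 is the slide's statistic):

  data _null_;
    pval = 1 - probchi(85, 4);  * chi-square statistic 85 with 4 df;
    put pval=;
  run;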

805 Chi-square example: recall data…
                        Brain tumor   No brain tumor   Total
Own a cell phone              5             347          352
Don't own a cell phone        3              88           91
Total                         8             435          453

806 Same data, but use Chi-square test
            Brain tumor   No brain tumor   Total
Own               5             347          352
Don't own         3              88           91
Total             8             435          453
Expected value in cell c = 1.7, so technically we should use a Fisher's exact test here! Next term…
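A sketch of this analysis in SAS, entering the 2x2 table as cell counts (dataset and variable names invented; the EXACT statement requests Fisher's exact test, as the next slide's caveat recommends for sparse cells):

  data phones;
    input own $ tumor $ count;
    datalines;
  yes yes   5
  yes no  347
  no  yes   3
  no  no   88
  ;
  proc freq data=phones;
    weight count;                       * each row carries a cell count;
    tables own*tumor / chisq expected;  * chi-square test plus expected counts;
    exact fisher;                       * Fisher's exact test;
  run;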

807 Caveat **When the sample size is very small in any cell (expected value<5), Fisher’s exact test is used as an alternative to the chi-square test.

808 Binary or categorical outcomes (proportions)
Outcome: binary or categorical (e.g., fracture, yes/no).
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios).
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures).
Alternatives to the chi-square test if sparse cells: Fisher's exact test (independent groups, some cells <5); McNemar's exact test (correlated groups, some cells <5).

809 Linear correlation and linear regression

810 Continuous outcome (means)
Outcome: continuous (e.g., pain scale, cognitive function).
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes).
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time).
Non-parametric alternatives if the normality assumption is violated (and small sample size): Wilcoxon signed-rank test (alternative to the paired ttest); Wilcoxon rank-sum test (= Mann-Whitney U test; alternative to the ttest); Kruskal-Wallis test (alternative to ANOVA); Spearman rank correlation coefficient (alternative to Pearson's correlation coefficient).

811 Recall: Covariance: cov(X,Y) = Σ (xi - xbar)(yi - ybar) / (n-1)

812 Interpreting Covariance
cov(X,Y) > 0: X and Y are positively correlated. cov(X,Y) < 0: X and Y are inversely correlated. cov(X,Y) = 0: X and Y are uncorrelated (independence implies zero covariance, though zero covariance alone does not guarantee independence).

813 Correlation coefficient
Pearson’s Correlation Coefficient is standardized covariance (unitless):

814 Correlation: measures the relative strength of the linear relationship between two variables. Unit-less. Ranges between -1 and 1. The closer to -1, the stronger the negative linear relationship; the closer to 1, the stronger the positive linear relationship; the closer to 0, the weaker any linear relationship.

815 Scatter Plots of Data with Various Correlation Coefficients
[Six scatter plots, top row: r = -1, r = -.6, r = 0; bottom row: r = +1, r = +.3, r = 0] Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

816 Linear Correlation: linear relationships vs. curvilinear relationships [four scatter plots]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

817 Linear Correlation: strong relationships vs. weak relationships [four scatter plots]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

818 Linear Correlation: no relationship [two scatter plots]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

819 Calculating by hand: r = Σ(xi - xbar)(yi - ybar) / sqrt(Σ(xi - xbar)² * Σ(yi - ybar)²)

820 Simpler calculation formula…
r = SSxy / sqrt(SSxx * SSyy), where SSxy is the numerator of the covariance and SSxx, SSyy are the numerators of the variances.

821 Distribution of the correlation coefficient:
The sample correlation coefficient follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error): t(n-2) = r * sqrt((n-2)/(1-r²)). *Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself; substitute in the estimated r.

822 Continuous outcome (means)
Outcome: continuous (e.g., pain scale, cognitive function).
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes).
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time).
Non-parametric alternatives if the normality assumption is violated (and small sample size): Wilcoxon signed-rank test (alternative to the paired ttest); Wilcoxon rank-sum test (= Mann-Whitney U test; alternative to the ttest); Kruskal-Wallis test (alternative to ANOVA); Spearman rank correlation coefficient (alternative to Pearson's correlation coefficient).

823 Linear regression In correlation, the two variables are treated as equals. In regression, one variable is considered independent (=predictor) variable (X) and the other the dependent (=outcome) variable Y.

824 What is "Linear"? Remember this: Y = mX + B? (m = slope; B = intercept)

825 What’s Slope? A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

826 Prediction If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?…sound like conditional probabilities?)

827 Regression equation… Expected value of y at a given level of x: E(y|x) = α + βx

828 Predicted value for an individual…
yi = α + β*xi + random errori. The error follows a normal distribution; α + β*xi is fixed: exactly on the line.

829 Assumptions (or the fine print)
Linear regression assumes that… 1. The relationship between X and Y is linear 2. Y is distributed normally at each value of X 3. The variance of Y at every value of X is the same (homogeneity of variances) 4. The observations are independent

830 The standard error of Y given X (Sy|x) is the average variability around the regression line at any given value of X. It is assumed to be equal at all values of X.

831 Regression Picture
A² = SStotal: total squared distance of observations from the naïve mean of y (total variation). B² = SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression). C² = SSresidual: variance around the regression line (additional variability not explained by x; what the least squares method aims to minimize). *Least squares estimation gave us the line (β) that minimized C². R² = SSreg/SStotal.

832 Recall example: cognitive function and vitamin D
Hypothetical data loosely based on [1]; cross-sectional study of 100 middle-aged and older European men. Cognitive function is measured by the Digit Symbol Substitution Test (DSST). 1. Lee DM, Tajar A, Ulubaev A, et al. Association between 25-hydroxyvitamin D levels and cognitive performance in middle-aged and older European men. J Neurol Neurosurg Psychiatry. 2009 Jul;80(7):722-9.

833 Distribution of vitamin D
Mean = 63 nmol/L; standard deviation = 33 nmol/L

834 Distribution of DSST: normally distributed. Mean = 28 points; standard deviation = 10 points.

835 Four hypothetical datasets
I generated four hypothetical datasets, with increasing TRUE slopes (between vit D and DSST): 0 points per 10 nmol/L; 0.5 points per 10 nmol/L; 1.0 points per 10 nmol/L; 1.5 points per 10 nmol/L

836 Dataset 1: no relationship

837 Dataset 2: weak relationship

838 Dataset 3: weak to moderate relationship

839 Dataset 4: moderate relationship

840 The “Best fit” line Regression equation:
E(Yi) = 28 + 0*vit Di (in 10 nmol/L)

841 The "Best fit" line. Note how the line is a little deceptive; it draws your eye, making the relationship appear stronger than it really is! Regression equation: E(Yi) = 24.9 + 0.5*vit Di (in 10 nmol/L)

842 The “Best fit” line Regression equation:
E(Yi) = 21.7 + 1.0*vit Di (in 10 nmol/L)

843 The “Best fit” line Regression equation:
E(Yi) = 18.6 + 1.5*vit Di (in 10 nmol/L). Note: all the lines go through the point (63, 28)!

844 Estimating the intercept and slope: least squares estimation
A little calculus…. What are we trying to estimate? β, the slope, from y = α + βx. What's the constraint? We are trying to minimize the squared distance (hence the "least squares") between the observations themselves and the predicted values (also called the "residuals", or left-over unexplained variability): Differencei = yi - (βxi + α); Differencei² = (yi - (βxi + α))². Find the β that gives the minimum sum of the squared differences. How do you find the minimum of a function? Take the derivative, set it equal to zero, and solve. A typical max/min problem from calculus…. From here it takes a little math trickery to solve for β…

845 Resulting formulas…
Slope (beta coefficient): betahat = cov(x,y)/var(x) = SSxy/SSxx. Intercept: alphahat = ybar - betahat*xbar.
The regression line always goes through the point (xbar, ybar).

846 Relationship with correlation
In correlation, the two variables are treated as equals; in regression, one variable is considered the independent (=predictor) variable (X) and the other the dependent (=outcome) variable (Y). The two are linked: rhat = betahat * (SDx/SDy).

847 Example: dataset 4 SDx = 33 nmol/L SDy= 10 points
Cov(X,Y) = 163 points*nmol/L. Beta = 163/33² = 0.15 points per nmol/L = 1.5 points per 10 nmol/L. r = 163/(10*33) = 0.49. Or r = 0.15 * (33/10) = 0.49

848 Significance testing…
Slope. Distribution of the slope: betahat ~ T(n-2)(β, s.e.(betahat)). H0: β1 = 0 (no linear relationship); H1: β1 ≠ 0 (linear relationship does exist). T(n-2) = betahat / s.e.(betahat)

849 Formula for the standard error of beta (you will not have to calculate this by hand!): s.e.(betahat) = sqrt(s²y|x / SSxx), where s²y|x = SSresidual/(n-2)

850 Example: dataset 4 Standard error (beta) = 0.03
T98 = 0.15/0.03 = 5, p<.0001 95% Confidence interval = 0.09 to 0.21

851 Residual Analysis: check assumptions
The residual for observation i, ei = yi - yhati, is the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals: examine for the linearity assumption; examine for constant variance at all levels of X (homoscedasticity); evaluate the normal distribution assumption; evaluate the independence assumption. Graphical analysis of residuals: can plot residuals vs. X.

852 Predicted values… For vitamin D = 95 nmol/L (or 9.5 in 10 nmol/L): yhat = 18.6 + 1.5*9.5 ≈ 33 points

853 Residual = observed - predicted
At X = 95 nmol/L, an observed DSST of 34 gives residual = observed - predicted ≈ 34 - 33 = 1.

854 Residual Analysis for Linearity [panels: Y vs. x and residual plots; left pair: Not Linear; right pair: Linear]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

855 Residual Analysis for Homoscedasticity [panels: Y vs. x and residual plots; left pair: Constant variance; right pair: Non-constant variance]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

856 Residual Analysis for Independence
[Residual-vs-X panels: Not Independent vs. Independent]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

857 Residual plot, dataset 4

858 Multiple linear regression…
What if age is a confounder here? Older men have lower vitamin D; older men have poorer cognition. "Adjust" for age by putting age in the model: DSST score = intercept + slope1*vitamin D + slope2*age

859 2 predictors: age and vit D…

860 Different 3D view…

861 Fit a plane rather than a line…
On the plane, the slope for vitamin D is the same at every age; thus, the slope for vitamin D represents the effect of vitamin D when age is held constant.

862 Equation of the “Best fit” plane…
DSST score = intercept + β1*vitamin D (in 10 nmol/L) + β2*age (in years). P-value for vitamin D >> .05; P-value for age < .0001. Thus, the relationship with vitamin D was due to confounding by age!
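A sketch of fitting this plane in SAS (dataset and variable names invented for illustration):

  proc reg data=cogstudy;
    model dsst = vitd age;  * DSST score on vitamin D and age;
  run;
  quit;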

863 Multiple Linear Regression
More than one predictor… E(y) = α + β1*X + β2*W + β3*Z… Each regression coefficient is the amount of change in the outcome variable that would be expected per one-unit change of the predictor, if all other variables in the model were held constant.

864 Functions of multivariate analysis:
Control for confounders Test for interactions between predictors (effect modification) Improve predictions

865 A ttest is linear regression!
Divide vitamin D into two groups: Insufficient vitamin D (<50 nmol/L) Sufficient vitamin D (>=50 nmol/L), reference group We can evaluate these data with a ttest or a linear regression…

866 As a linear regression…
Intercept represents the mean value in the sufficient group. Slope represents the difference in means between the groups. Difference is significant.
Variable | Parameter Estimate | Standard Error | t Value | Pr > |t|
Intercept | … | … | … | <.0001
insuff | … | … | … | …

867 ANOVA is linear regression!
Divide vitamin D into three groups: Deficient (<25 nmol/L); Insufficient (>=25 and <50 nmol/L); Sufficient (>=50 nmol/L), the reference group. DSST = α (= value for sufficient) + β1*(1 if insufficient) + β2*(1 if deficient). This is called "dummy coding", where multiple binary variables are created to represent being in each category (or not) of a categorical variable.

868 The picture… Sufficient vs. Insufficient Sufficient vs. Deficient

869 Results… Interpretation:
Parameter Estimates
Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t|
Intercept | … | … | … | … | <.0001
deficient | … | … | … | … | …
insufficient | … | … | … | … | …
Interpretation: The deficient group has a mean DSST … points lower than the reference (sufficient) group. The insufficient group has a mean DSST … points lower than the reference (sufficient) group.

870 Other types of multivariate regression
Multiple linear regression is for normally distributed outcomes Logistic regression is for binary outcomes Cox proportional hazards regression is used when time-to-event is the outcome

871 Common multivariate regression models.
Continuous outcome (e.g., blood pressure): linear regression. Example equation: blood pressure (mmHg) = α + βsalt*salt consumption (tsp/day) + βage*age (years) + βsmoker*ever smoker (yes=1/no=0). Coefficients give slopes: how much the outcome variable increases for every 1-unit increase in each predictor.
Binary outcome (e.g., high blood pressure, yes/no): logistic regression. Example equation: ln(odds of high blood pressure) = α + βsalt*salt consumption + βage*age + βsmoker*ever smoker. Coefficients give odds ratios: how much the odds of the outcome increase for every 1-unit increase in each predictor.
Time-to-event outcome (e.g., time-to-death): Cox regression. Example equation: ln(rate of death) = α + βsalt*salt consumption + βage*age + βsmoker*ever smoker. Coefficients give hazard ratios: how much the rate of the outcome increases for every 1-unit increase in each predictor.

872 Multivariate regression pitfalls
Multi-collinearity Residual confounding Overfitting

873 Multicollinearity Multicollinearity arises when two variables that measure the same thing or similar things (e.g., weight and BMI) are both included in a multiple regression model; they will, in effect, cancel each other out and generally destroy your model.   Model building and diagnostics are tricky business!

874 Residual confounding You cannot completely wipe out confounding simply by adjusting for variables in multiple regression unless variables are measured with zero error (which is usually impossible). Example: meat eating and mortality

875 Men who eat a lot of meat are unhealthier for many reasons!
Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169:562-71

876 Mortality risks… Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169:562-71

877 Overfitting In multivariate modeling, you can get highly significant but meaningless results if you put too many predictors in the model. The model is fit perfectly to the quirks of your particular sample, but has no predictive ability in a new sample.

878 Overfitting: class data example
I asked SAS to automatically find predictors of optimism in our class dataset. Here's the resulting linear regression model:
Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F
Intercept | … | … | … | … | …
exercise | … | … | … | … | …
sleep | … | … | … | … | …
obama | … | … | … | … | <.0001
Clinton | … | … | … | … | …
mathLove | … | … | … | … | …
Exercise, sleep, and high ratings for Clinton are negatively related to optimism (highly significant!), and high ratings for Obama and high love of math are positively related to optimism (highly significant!).

879 If something seems too good to be true…
Clinton, univariate (Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t|): Intercept …; Clinton …
Sleep, univariate: Intercept …; sleep …
Exercise, univariate: Intercept … (<.0001); exercise …

880 More univariate models…
Obama, univariate: Intercept …; obama … Compare with the multivariate result: p < .0001.
Love of math, univariate: Intercept …; mathLove … Compare with the multivariate result: p = .0011.

881 Overfitting. Rule of thumb: you need at least 10 subjects for each additional predictor variable in the multivariate regression model. Pure noise variables still produce good R² values if the model is overfitted. [Figure: the distribution of R² values from a series of simulated regression models containing only noise variables.] (Figure 1 from: Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine. 2004;66.)

882 Review of statistical tests
The following table gives the appropriate choice of a statistical test or measure of association for various types of data (outcome variables and predictor variables) by study design. E.g., blood pressure = pounds + age + treatment (1/0): a continuous outcome with continuous predictors (pounds, age) and a binary predictor (treatment).

883 Statistical procedure or measure of association, by types of variables to be analyzed (predictor | outcome | procedure)
Cross-sectional/case-control studies:
Binary (two groups) | Continuous | T-test
Binary | Ranks/ordinal | Wilcoxon rank-sum test
Categorical (>2 groups) | Continuous | ANOVA
Continuous | Continuous | Simple linear regression
Multivariate (categorical and continuous) | Continuous | Multiple linear regression
Categorical | Categorical | Chi-square test (or Fisher's exact)
Binary | Binary | Odds ratio, risk ratio
Cohort studies/clinical trials:
Multivariate | Binary | Logistic regression
Binary | Binary | Risk ratio
Categorical | Time-to-event | Kaplan-Meier / log-rank test
Multivariate | Time-to-event | Cox proportional hazards regression, hazard ratio
Categorical | Continuous | Repeated-measures ANOVA
Multivariate | Continuous | Mixed models; GEE modeling

884 Alternative summary: statistics for various types of outcome data
Outcome: continuous (e.g., pain scale, cognitive function). Independent: Ttest, ANOVA, Linear correlation, Linear regression. Correlated: Paired ttest, Repeated-measures ANOVA, Mixed models/GEE modeling. Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship.
Outcome: binary or categorical (e.g., fracture yes/no). Independent: Difference in proportions, Relative risks, Chi-square test, Logistic regression. Correlated: McNemar's test, Conditional logistic regression, GEE modeling. Assumption: the chi-square test assumes sufficient numbers in each cell (>=5).
Outcome: time-to-event (e.g., time to fracture). Independent: Kaplan-Meier statistics, Cox regression. Correlated: n/a. Assumption: Cox regression assumes proportional hazards between groups.

885 Continuous outcome (means); HRP 259/HRP 262
Outcome: continuous (e.g., pain scale, cognitive function).
Independent observations: Ttest (compares means between two independent groups); ANOVA (compares means between more than two independent groups); Pearson's correlation coefficient (linear correlation between two continuous variables); Linear regression (multivariate regression technique used when the outcome is continuous; gives slopes).
Correlated observations: Paired ttest (compares means between two related groups, e.g., the same subjects before and after); Repeated-measures ANOVA (compares changes over time in the means of two or more groups, repeated measurements); Mixed models/GEE modeling (multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time).
Non-parametric alternatives if the normality assumption is violated (and small sample size): Wilcoxon signed-rank test (alternative to the paired ttest); Wilcoxon rank-sum test (= Mann-Whitney U test; alternative to the ttest); Kruskal-Wallis test (alternative to ANOVA); Spearman rank correlation coefficient (alternative to Pearson's correlation coefficient).

886 Binary or categorical outcomes (proportions); HRP 259/HRP 261
Outcome: binary or categorical (e.g., fracture, yes/no).
Independent observations: Chi-square test (compares proportions between two or more groups); Relative risks (odds ratios or risk ratios); Logistic regression (multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios).
Correlated observations: McNemar's chi-square test (compares a binary outcome between correlated groups, e.g., before and after); Conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are correlated, e.g., repeated measures).
Alternatives to the chi-square test if sparse cells: Fisher's exact test (independent groups, some cells <5); McNemar's exact test (correlated groups, some cells <5).

887 Time-to-event outcome (survival data); HRP 262
Outcome: time-to-event (e.g., time to fracture).
Independent observation groups: Kaplan-Meier statistics (estimates survival functions for each group, usually displayed graphically; compares survival functions with the log-rank test); Cox regression (multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios).
Correlated: n/a (already over time).
Modifications to Cox regression if proportional hazards is violated: time-dependent predictors or time-dependent hazard ratios (tricky!).

