Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham,

Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham, tim.massingham@ebi.ac.uk

Introduction Basic statistics What is a statistical test Count data and Simpson’s paradox A Lady drinking tea and statistical power Thailand HIV vaccine trial and missing data Correlation Nonparametric statistics Robustness and efficiency Paired data Grouped data Multiple testing adjustments Family Wise Error Rate and simple corrections More powerful corrections False Discovery Rate

What is a statistic Anything that can be measured Calculated from measurements Branches of statistics Frequentist Neo-Fisherian Bayesian Lies Damn lies (and each of these can be split further) “If your experiment needs statistics, you ought to have done a better experiment.” Ernest Rutherford “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G.Wells “He uses statistics as a drunken man uses lamp-posts, for support rather than illumination.” Andrew Lang (+others) Classical statistics

Repeating the experiment 1,000 times 10,000 times100 times Initial experiment Imagine repeating experiment, some variation in statistic

Frequentist inference Repeated experiments at heart of frequentist thinking Have a “null hypothesis” Null distribution What distribution of statistic would look like if we could repeatedly sample LikelyUnlikely Compare actual statistic to null distribution Classical statistics Finding statistics for which the null distribution is known

Anatomy of a Statistical Test Value of statistic Density of null distribution P-value of test Probability of observing statistic or something more extreme Equal to area under the curve Density measures relative probability Total area under curve equals exactly one

Some antiquated jargon Critical values Dates back to when we used tables Old way Calculate statistic Look up critical values for test Report “significant at 99% level” Or “rejected null hypothesis at ….” Size0.050.010.0010.0001 Critical value1.642.333.093.72 Pre-calculated values (standard normal distribution)

Power Probability of correctly rejecting null hypothesis (for a given size) Null distributionAn alternative distribution Power changes with size Reject nullAccept null

Confidence intervals Use same construction to generate confidence intervals Confidence interval = region which excludes unlikely values For the null distribution, the “confidence interval” is the region which we accept the null hypothesis. The tails where we reject the null hypothesis are the critical region

Count data

(Almost) the simplest form of data we can work with Each experiment gives us a discrete outcome Have some “null” hypothesis about what to expect YellowBlackRedGreenOrange Observed1927152217 Expected20 Example: Are all jelly babies equally liked by PhD students?

Chi-squared goodness of fit test YellowBlackRedGreenOrange Observed1927152217 Expected20 Summarize results and expectations in a table jelly_baby <- c( 19, 27, 15, 22, 17) expected_jelly <- c(20,20,20,20,20) chisq.test( jelly_baby, p=expected_jelly, rescale.p=TRUE ) Chi-squared test for given probabilities data: jelly_baby X-squared = 4.4, df = 4, p-value = 0.3546 pchisq(4.4,4,lower.tail=FALSE) [1] 0.3546

Chi-squared goodness of fit test jelly_baby <- c( 19, 27, 15, 22, 17) expected_jelly <- c(20,20,20,20,20) chisq.test( jelly_baby, p=expected_jelly, rescale.p=TRUE ) Chi-squared test for given probabilities data: jelly_baby X-squared = 4.4, df = 4, p-value = 0.3546 What’s this? YellowBlackRedGreenOrange Observed1927152217 19271522? How much do we need to know to reconstruct table? Number of samples Any four of the observations or equivalent, ratios for example

More complex models Specifying the null hypothesis entirely in advance is very restrictive YellowBlackRedGreenOrange Observed1927152217 Expected20 4 df 0 df Allowed expected models that have some features from data e.g. Red : Green ratio Each feature is one degree of freedom YellowBlackRedGreenOrange Observed1927152217 Expected20 16.223.820 16.2:23.8 = 15:2216.2+23.8 = 40 4 df 1 df

Example: Chargaff’s parity rule Chargaff’s 2 nd Parity Rule In a single strand of dsDNA, %A≈%T and %C≈%G ACGT 2021445130820412850292056142 Helicobacter pylori From data, %AT = 61% %CG=39% Apply Chargaff to get %{A,C,G,T} ACGT Proportion0.3050.194 0.305 Number20387941296616 2038794 Null hypothesis has one degree of freedom Alt. hypothesis has three degrees of freedom Difference: two degrees of freedom

Contingency tables PieFruit Custardac Ice-creambd Observe two variables in pairs - is there a relationship between them? Silly example: is there a relationship desserts and toppings? A test of row / column independence Real example McDonald-Kreitman test - Drosophila ADH locus mutations BetweenWithin Nonsynonymous72 Synonymous1742

Contingency tables A contingency table is a chi-squared test in disguise PieFruit Custard p Ice-cream (1-p) q(1-q)1 Null hypothesis: rows and columns are independent Multiply probabilities Pie & Custard Pie & Ice- cream Fruit & Custard Fruit & Ice- cream Observedabcd Expectedn p qn (1-p) qn p (1-q)n (1-p)(1-q) p q p (1-q) (1-p) q(1-p)(1-q) × n

Contingency tables Pie & Custard Pie & Ice- cream Fruit & Custard Fruit & Ice- cream Observedabcn - a - b - c Expectedn p qn (1-p) qn p (1-q)n (1-p)(1-q) Observed:three degrees of freedom(a, b & c) Expected:two degrees of freedom(p & q) In general, for a table with r rows and c columns Observed:r c - 1 degrees of freedom Expected:(r-1) + (c-1) degrees of freedom Difference:(r-1)(c-1)degrees of freedom Chi-squared test with one degree of freedom

Bisphenol A Bisphenol A is an environmental estrogen monomer Used to manufacture polycarbonate plastics lining for food cans dental sealants food packaging Many in vivo studies on whether safe: could polymer break down? Is the result of the study independent of who performed it? F vom Saal and C Hughes (2005) An Extensive New Literature Concerning Low-Dose Effects of Bisphenol A Shows the Need for a New Risk Assessment. Environmental Health Perspectives 113(8):928 HarmfulNon-harmful Government9410 Industry011

Bisphenol A HarmfulNon-harmful Government941090.4% Industry0119.6% 81.7%18.2%115 Observed table HarmfulNon-harmful Government85.019.090.4% Industry 9.0 2.09.6% 81.7%18.2% Expected table E.g. 0.817 × 0.904 ×115 = 85.0 Chi-squared statistic = 48.6. Test with 1 d.f.p-value = 3.205e-12

Bisphenol A Association measure Discovered that we have dependence How strong is it? Coefficient of association for 2×2 table Chi-squared statistic Number of observations Number between 0 = independent to 1 = complete dependence For the Bisphenol A study data Should test really be one degree of freedom? Reasonable to assume that government / industry randomly assigned? Perhaps null model only has one degree of freedom pchisq(48.5589, df=1, lower.tail=FALSE) [1] 3.205164e-12 pchisq(48.5589, df=2, lower.tail=FALSE) [1] 2.854755e-11

Simpson’s paradox Famous example of Simpson’s paradox C R Charig, D R Webb, S R Payne, and J E Wickham (1986) Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy. BMJ 292: 879–882 Compare two treatments for kidney stones open surgery percutaneous nephrolithotomy (surgery through a small puncture) SuccessFail open27377350 perc. neph.28961350 562138700 78% success 83% success Percutaneous nephrolithotomy appears better (but not significantly so, p-value 0.15)

Simpson’s paradox SuccessFail open81687 perc. neph.23436270 31542357 SuccessFail open19271263 perc. neph.552580 24796343 Small kidney stones Large kidney stones Missed a clinically relevant factor: the size of the stones The order of treatments is reversed 93% success 87% success p-value 0.15 73% success 69% success p-value 0.55 Combined Open78% Perc. neph.83%

Simpson’s paradox SuccessFail open81687 prec. neph.23436270 31542357 SuccessFail open19271263 prec. neph.552580 24796343 Small kidney stones Large kidney stones What’s happened? SmallLarge Failure of randomisation (actually an observational study) Small and large stones have a different prognosis (p-value<1.2e-7) 88% success total 72% success total Open Prec. neph 93% success 87% success 73% success 69% success

A Lady Tasting Tea

The Lady Tasting Tea “A LADY declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup … Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject for judgment in a random order. The subject has been told in advance of what the test will consist, namely that she will be asked to taste eight cups, that these shall be four of each kind, and that they shall be presented to her in a random order.” Fisher, R. A. (1956) Mathematics of a Lady Tasting Tea Eight cups of tea Exactly four one way and four the other The subject knows there are four of each The order is randomised Guess teaGuess milk Tea first??4 Milk first??4 448

Fisher’s exact test Guess teaGuess milk Tea first??4 Milk first??4 448 Looks like a Chi-squared test But the experiment design fixes the marginal totals Eight cups of tea Exactly four one way and four the other The subject knows there are four of each The order is randomised Fisher’s exact test gives exact p-values with fixed marginal totals Often incorrectly used when marginal totals not known

Sided-ness Not interested if she can’t tell the difference Guess teaGuess milk Tea first404 Milk first044 448 Guess teaGuess milk Tea first044 Milk first404 448 Two possible ways of being significant, only interested in one Exactly right Exactly wrong

Sided-ness tea <- rbind( c(0,4), c(4,0) ) tea [,1] [,2] [1,] 0 4 [2,] 4 0 fisher.test(tea)$p.value [1] 0.02857143 fisher.test(tea,alternative="greater")$p.value [1] 1 fisher.test(tea,alternative="less")$p.value [1] 0.01428571 More correctMore wrong Only interested in significantly greater Just use area in one tail

Statistical power Are eight cups of tea enough? Guess teaGuess milk Tea first404 Milk first044 448 A perfect score, p-value 0.01429 Guess teaGuess milk Tea first314 Milk first134 448 Better than chance, p-value 0.2429

Statistical power Guess teaGuess milk Tea firstp n(1-p) nn Milk first(1-p) np nn nn2n Assume the lady correctly guesses proportion p of the time 468101214161820 70%0.390.310.250.200.170.140.120.110.09 80%0.250.160.110.070.050.040.030.020.01 90%0.120.060.020.010.0070.0030.0018e-54e-5 To investigate simulate 10,000 experiments calculate p-value for experiment take mean

Thailand HIV vaccine trials

Thailand HIV vaccine trial News story from end of September 2009 “Phase III HIV trial in Thailand shows positive outcome” 16,402 heterosexual volunteers tested every six months for three years Sero +veSero -ve Control518197 Vaccine748198 fisher.test(hiv) Fisher's Exact Test for Count Data data: hiv p-value = 0.04784 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.4721409 0.9994766 sample estimates: odds ratio 0.689292 16,395 Rerks-Ngarm, Pitisuttithum et al. (2009) New England Journal of Medicine 10.1056/NEJMoa0908492

Thailand HIV vaccine trial "Oh my God, it's amazing. " … "The significance that has been established in this trial is that there is a 5% chance that this is a fluke. So we are 95% certain that what we are seeing is real and not down to pure chance. And that's great." Significant? Would you publish? Should you publish? Study was randomized (double-blinded) Deals with many possible complications male / female balance between arms of trial high / low risk life styles volunteers who weren’t honest about their sexuality (or changed their mind mid-trial) genetic variability in population (e.g. CCL3L1) incorrect HIV test / samples mixed-up vaccine improperly administered

Thailand HIV vaccine trial "Oh my God, it's amazing. " … "The significance that has been established in this trial is that there is a 5% chance that this is a fluke. So we are 95% certain that what we are seeing is real and not down to pure chance. And that's great." Significant? Would you publish? Should you publish? Initially published data based on modified Intent To Treat Intent To Treat People count as soon as they are enrolled in trial modified Intent To Treat Excluded people found to sero +ve at beginning of trial Per-Protocol Only count people who completed course of vaccines A multiple testing issue? How many unsuccessful HIV vaccine trials have there been? One or more and these results are toast.

Missing data Some people go missing during / drop out of trials This could be informative E.g. someone finds out they have HIV from another source, stops attending check-ups Double-blinded trials help a lot Extensive follow-up, hospital records of death etc Missing Completely At Random Missing data completely unrelated to trial Missing At Random Missing data can be imputed Missing Not At Random Missing data informative about effects in trial 123456 Volunteer 1----?? Volunteer 2--++++

Correlation

Correlation? Wikimedia, public domain http://commons.wikimedia.org/wiki/File:Correlation_examples.png Various types of data and their correlation coefficients

Correlation does not imply causation If A and B are correlated then one or more of the following are true A causes B B causes A A and B have a common cause (might be surprising) Do pirates cause global warming? Pirates:http://commons.wikimedia.org/wiki/File:PiratesVsTemp_English.jpg R. Matthews (2001) Storks delivery Babies (p=0.008) Teaching Statistics 22(2):36-38

Pearson correlation coefficient Standard measure of correlation “Correlation coefficient” Measure of linear correlation, statistic belongs to [-1,+1] 0independent 1 perfect positive correlation -1 perfect negative correlation cor.test( gene1, gene2, method="pearson") Pearson's product-moment correlation data: gene1 and gene2 t = 851.4713, df = 22808, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9842311 0.9850229 sample estimates: cor 0.984632 Measure of correlation Roughly proportion of variance of gene1 explained by gene2 (or vice versa)

Pearson: when things go wrong Pearson’s correlation test Correlation = - 0.05076632(p-value = 0.02318) Correlation = -0.06499109 (p-value = 0.003632) Correlation = -0.1011426 (p-value = 5.81e-06) Correlation = 0.1204287 (p-value = 6.539e-08) A single observation can change the outcome of many tests Pearson correlation is sensitive to outliers 200 observations from normal distribution x ~ normal(0,1)y ~ normal(1,3)

Spearman’s test Nonparametric test for correlation Doesn’t assume data is normal Insensitive to outliers Coefficient has roughly same meaning Replace observations by their ranks Ranks Raw expression cor.test( gene1, gene2, method="spearman") Spearman's rank correlation rho data: gene1 and gene2 S = 30397631415, p-value < 2.2e-16 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.984632

Comparison Look at gene expression data cor.test(lge1,lge2,method="pearson") Pearson's product-moment correlation data: lge1 and lge2 t = 573.0363, df = 22808, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9661278 0.9678138 sample estimates: cor 0.9669814 cor.test(lge1,lge2,method="spearman") Spearman's rank correlation rho data: lge1 and lge2 S = 30397631415, p-value < 2.2e-16 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.984632 cor.test(lge1,lge2,method="kendall") Kendall's rank correlation tau data: lge1 and lge2 z = 203.0586, p-value < 2.2e-16 alternative hypothesis: true tau is not equal to 0 sample estimates: tau 0.8974576

Comparison Spearman and Kendall are scale invariant Log - Log scale Pearson0.97 Spearman0.98 Kendall0.90 Normal - Log scale Pearson0.56 Spearman0.98 Kendall0.90 Normal - Normal scale Pearson0.99 Spearman0.98 Kendall0.90

Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham,

Similar presentations

Presentation on theme: "Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham,

Similar presentations

Presentation on theme: "Basic statistics European Molecular Biology Laboratory Predoc Bioinformatics Course 17 th Nov 2009 Tim Massingham,"— Presentation transcript:

Similar presentations

About project

Feedback