# Intro to Statistics, Part 2. Arier Lee, University of Auckland

## Presentation transcript


Standard error

- Standard error: the standard deviation of the sampling distribution of a statistic
- The standard deviation of the sample means is called the standard error of the mean; it measures how precisely the population mean is estimated by the sample mean
- The standard error is a measure of the precision of the estimated mean, whereas the standard deviation summarises the variability or spread of the observations
- Standard error <= standard deviation
- The larger the sample size, the smaller the standard error
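The relationship between the standard deviation and the standard error can be sketched in a few lines of Python. This is only an illustration: it assumes the usual estimate SE = SD / sqrt(n) for the standard error of the mean, and the population values (mean 5.6, SD 1.0) and sample sizes are made up for the demonstration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population (illustrative values only): mean 5.6, SD 1.0
small = [random.gauss(5.6, 1.0) for _ in range(25)]
large = [random.gauss(5.6, 1.0) for _ in range(2500)]

# The SD stays near 1.0 for both samples; the SE shrinks as n grows.
for label, sample in (("n=25", small), ("n=2500", large)):
    sd = statistics.stdev(sample)
    se = standard_error(sample)
    print(f"{label:>7}: SD={sd:.3f}  SE={se:.3f}")
```

The standard deviation describes the spread of the observations and does not shrink with more data, while the standard error does.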

Confidence intervals

- A 95% confidence interval for a mean is calculated by (mean - 1.96*SE, mean + 1.96*SE)
- An example: in a sample of 2000 pregnant women, serum cholesterol was measured, and the sample mean was found to be 5.62 with SE = 0.15. 95% confidence interval: (5.33, 5.91)
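The arithmetic in the serum cholesterol example can be checked with a short Python sketch. The function name `confidence_interval_95` is hypothetical; the mean, SE and the 1.96 multiplier are taken from the slide.

```python
def confidence_interval_95(mean, se, z=1.96):
    """Return (lower, upper) for a normal-approximation 95% CI of a mean."""
    return (mean - z * se, mean + z * se)

# Values from the example: sample mean 5.62, SE 0.15
lower, upper = confidence_interval_95(5.62, 0.15)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints "95% CI: (5.33, 5.91)"
```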

Confidence intervals

- A 95% CI does not mean that there is a 95% chance that the true mean lies between 5.33 and 5.91
- If we repeat the study over and over again, calculating a 95% confidence interval each time, about 95 of 100 such intervals would include the true mean
- Whether the one we have obtained from our study is one of them we will never know, but we have some confidence
- It is a measure of the precision of our estimate
- Bigger confidence interval -> less precision
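The "repeat the study over and over" interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the true mean, SD, sample size and number of repeated studies are all hypothetical, and the data are drawn from a normal distribution.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical "truth" and study design (illustrative values only)
TRUE_MEAN, TRUE_SD, N, STUDIES = 5.6, 1.0, 50, 1000

def interval_covers_truth():
    """Simulate one study and report whether its 95% CI contains the true mean."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    return mean - 1.96 * se <= TRUE_MEAN <= mean + 1.96 * se

covered = sum(interval_covers_truth() for _ in range(STUDIES))
coverage = covered / STUDIES
print(f"{covered} of {STUDIES} intervals contain the true mean ({coverage:.1%})")
```

The proportion should come out close to 95%. It tends to sit slightly below because 1.96 is used with an estimated SD. Any single interval either contains the true mean or it does not; the 95% describes the procedure, not one interval.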

Graphical presentation of the data

- Exploratory data analysis
- Presentation of results
- Examples: bar charts, line graphs, scatter plots, box plots, Kaplan-Meier plots, etc.
- Graphs can only be as good as the data they display
- No amount of creativity can produce a good graph from dubious data

Bar chart 2005 maternity report

Line graph

Box plot (figure). Annotations: median; Q1; Q3; whisker length of 1.5 x (Q3-Q1); smallest observation marks the end of the whisker; observations beyond the end of a whisker.
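The quantities annotated on the box plot can be computed directly. The sketch below is one possible reading of the figure, assuming whiskers that stop at the most extreme observation within 1.5 x (Q3-Q1) of the box. The function name and the scores are made up, and different software packages use slightly different quartile definitions.

```python
import statistics

def box_plot_summary(values):
    """Compute the pieces of a box plot: quartiles, whisker ends, points beyond."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Observations beyond the whiskers are plotted individually
        "beyond": sorted(v for v in values if v < low_fence or v > high_fence),
    }

scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 35]  # made-up data
summary = box_plot_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the value 35 lies more than 1.5 x (Q3-Q1) above Q3, so it is reported separately and the upper whisker stops at 20.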

Data to chart ratio: mental health score by treatment groups (figure with a good and a bad example)

Inadequate chart type

Figure: Effect of ethnicity on road traffic injury deaths and hospitalisations, 2000-8, Auckland region, by age group, adjusted for gender and deprivation (using National Minimum Data Set and Mortality Collection data)

Graphs of risk or rate ratios should be presented with:

- Points with error bars
- Log scale

Odds ratio presented with logarithmic scale Outcome: Blindness
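One way to see why a log scale suits ratios: a halving (0.5) and a doubling (2.0) are effects of the same size in opposite directions, yet they sit at unequal distances from 1 on a linear axis. The short sketch below uses illustrative ratios only, not values from the figures.

```python
import math

ratios = [0.25, 0.5, 1.0, 2.0, 4.0]  # illustrative ratios, not data from the slides

for r in ratios:
    linear_distance = abs(r - 1)       # distance from "no effect" on a linear axis
    log_distance = abs(math.log(r))    # distance from "no effect" on a log axis
    print(f"ratio {r:>4}: linear={linear_distance:.2f}  log={log_distance:.2f}")
```

On the log axis, 0.5 and 2.0 are equally far from 1, as are 0.25 and 4.0, so protective and harmful effects of the same magnitude look comparable.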

Graphical presentation of the data

- Use appropriate graph types for the appropriate purpose, e.g. a line chart for a trend
- All axes, tick marks and titles should be labelled
- Use an appropriate scale
- Keep an adequate data to chart ratio
- Avoid unnecessary complexity such as
  - Irrelevant decoration
  - Too many colours
  - 3D effects
- Keep it simple!

Research process

1. Research question
2. Primary and secondary endpoints
3. Study design
4. Sampling and/or randomisation scheme
5. Power and sample size calculation
6. Pre-define analysis methods
7. Analyse data
8. Interpret results
9. Disseminate

Sample size and power of a study

- Sample size is one of the statistical, economical and ethical issues in the design of medical studies
  - Statistical: ensure the study is large enough to detect an effect if it exists
  - Economical: ensure the study does not enlist more patients than are needed
  - Ethical: it is unethical to engage more people in a trial than are needed
- Larger samples -> more precise estimates
- How large?

Sample size and power of a study

- The power of a test is the probability of detecting a true difference
- The size of the sample needed depends on
  - required power
  - detectable difference
  - variability in the population
  - level of significance (the probability of falsely rejecting the null hypothesis)
  - statistical test being used
- Information is needed to calculate a meaningful sample size: literature search
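How power depends on these ingredients can be sketched with a normal approximation for comparing two group means. This is an assumption made for illustration, not a method stated in the slides: the function `approx_power` is hypothetical, it ignores the negligible chance of rejecting in the wrong direction, and it borrows the 10 mmHg difference and 15 mmHg SD from the hypertension example later in the transcript.

```python
import math
from statistics import NormalDist

def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation with equal group sizes and a common SD.
    """
    se_diff = sd * math.sqrt(2 / n_per_group)      # SE of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return NormalDist().cdf(difference / se_diff - z_crit)

# Power rises with the sample size for a fixed difference and SD
for n in (10, 20, 48, 100):
    power = approx_power(n, difference=10, sd=15)
    print(f"n per group = {n:>3}: power = {power:.2f}")
```

Raising the sample size, enlarging the detectable difference, reducing the variability or relaxing the significance level each increase the power.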

Sample size and power of a study - an example

- A double-blind randomised controlled study on treatment for chronic hypertension during pregnancy
- Comparing two treatments:
  - Standard treatment
  - New treatment

Sample size and power of a study - an example

- Based on current evidence, assume
  - Detectable difference: 10 mmHg
  - Standard deviation: 15 mmHg
  - 90% power
  - 5% significance level
  - Two-sided test
  - 1:1 ratio
- Using PS (a power and sample size calculation software): 48 subjects per group
- After considering the drop-out rate, say 10%, round to, say, 60 subjects per group
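The calculation can be approximated by hand with the standard normal-approximation formula for two equal groups, n = 2 (z_(1-alpha/2) + z_(power))^2 (SD / difference)^2. This is a sketch, not the algorithm PS uses (PS may rely on a t-based method); the function name `n_per_group` is hypothetical. With the assumptions listed above it gives the same 48 per group.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, power=0.90, alpha=0.05):
    """Normal-approximation sample size per group (two-sided test, 1:1 ratio)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90% power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole subjects

n = n_per_group(difference=10, sd=15)
# Inflate for an assumed 10% drop-out so that about n subjects complete the study
n_with_dropout = math.ceil(n / (1 - 0.10))

print(f"Subjects per group: {n}")                           # prints 48
print(f"Allowing for 10% drop-out: {n_with_dropout}")       # prints 54
```

Allowing for 10% drop-out gives a minimum of 54 per group; the slide rounds this up further to 60.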

Sample size and power of a study: chronic hypertension during pregnancy example

Figure: to detect a difference of 10 mmHg, with the SD varying from 5 to 30 mmHg
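The sensitivity of the sample size to the assumed SD can be tabulated with the same kind of normal-approximation formula. This is an illustrative sketch rather than a reproduction of the figure: 90% power and a 5% two-sided significance level are carried over from the example's assumptions, and the approximation is rough when the resulting group sizes are very small.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, power=0.90, alpha=0.05):
    """Normal-approximation sample size per group (two-sided test, 1:1 ratio)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Difference fixed at 10 mmHg; SD varied from 5 to 30 mmHg
sizes = {sd: n_per_group(difference=10, sd=sd) for sd in range(5, 35, 5)}

print("SD (mmHg)  n per group")
for sd, n in sizes.items():
    print(f"{sd:>9}  {n:>11}")
```

Because the required sample size grows with the square of the SD, doubling the assumed variability roughly quadruples the number of subjects needed.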

Sample size and power of a study

- Sample size calculation is an evidence-based best guess
- Relies on assumptions
- Not a precise number
- No guarantee of a significant effect at the end of a study

Any Questions?