Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.


Overview: Types of data; Summarising data; The Normal distribution; Confidence intervals; Hypothesis testing; P-values

Types of Data. Numerical (quantitative): counted or measured; either discrete or continuous. Categorical (qualitative): characterises a quality; either nominal or ordered.

Numerical data. Discrete: integers (whole numbers), e.g. number of people, number of teeth. Continuous: any value on a scale, e.g. height, weight.

Categorical data. Nominal: no natural order, e.g. gender, ethnic group. Ordered: has a natural order, e.g. socio-economic group, cancer staging (I–IV).

Which types of data are the following?
Screening test result – categorical nominal
Parity (no. of children) – numerical discrete
Pain scale – categorical ordered
Age at last birthday – numerical discrete
Exact age – numerical continuous
Alive at 6 months? – categorical nominal
Number of bed days in hospital – numerical discrete

Summarising numerical data. 1. Location (central tendency): mean, median, mode. 2. Spread (variation): range, percentiles, standard deviation.

Location. Mean – sum of all observations / number of observations. Median – the value that divides the distribution in two: for an odd number of observations, the middle observation; for an even number, the mean of the central pair. Mode – the value that occurs most frequently.

Mean and median?
a) 3, 4, 5, 6, 7: mean 25/5 = 5, median 5
b) 9, 10, 20, 21: mean 60/4 = 15, median (10 + 20)/2 = 15
c) 1, 2, 3, 4, 990: mean 1000/5 = 200, median 3
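The three data sets on this slide can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics` module; the dictionary name `datasets` and the variable `summary` are illustrative choices, not part of the original session materials.

```python
from statistics import mean, median

# The three data sets from the slide
datasets = {
    "a": [3, 4, 5, 6, 7],
    "b": [9, 10, 20, 21],
    "c": [1, 2, 3, 4, 990],
}

# mean   = sum of observations / number of observations
# median = middle observation (odd n) or mean of the central pair (even n)
summary = {name: (mean(values), median(values)) for name, values in datasets.items()}

for name, (m, med) in summary.items():
    print(f"{name}) mean = {m}, median = {med}")
```

Data set (c) is the instructive one: a single extreme value (990) pulls the mean up to 200 while the median stays at 3.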

Variation. Range = highest value – lowest value. Interquartile range = upper quartile – lower quartile (i.e. 3rd quartile – 1st quartile). Percentile – value below which a given proportion of the data lies.
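As a rough illustration of these measures of spread, the sketch below uses Python's `statistics.quantiles`. Quartile and percentile conventions differ between software packages; the `method="inclusive"` setting and the helper name `percentile` are assumptions made for this example, not something the slides specify.

```python
from statistics import quantiles

data = [3, 4, 5, 6, 7]

# Range = highest value - lowest value
value_range = max(data) - min(data)

# Quartiles: the three cut points that split the data into four parts.
# "inclusive" is one common convention (assumed here).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = upper quartile - lower quartile


def percentile(values, p):
    """Value below which roughly p percent of the data lies (p from 1 to 99)."""
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[p - 1]


print(value_range, q1, q3, iqr, percentile(data, 50))
```

The 50th percentile is the median and the 25th and 75th percentiles are the lower and upper quartiles.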

Variance. Step 1: Calculate the 'Deviations' = the difference between each observation and the mean of the data. Step 2: Square these Deviations. Step 3: Average the Squared Deviations; this is the Variance. (Strictly, divide by n-1, not n.)

Standard Deviation. Step 4: Take the square root of the Variance (this returns the statistic to the same units as the data); this is the Standard Deviation. SD measures the amount of variability in the population.
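Steps 1 to 4 can be followed literally in code. The sketch below uses the first data set from the mean and median slide and divides by n-1, as the Variance slide recommends; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

data = [3, 4, 5, 6, 7]
n = len(data)
data_mean = sum(data) / n

deviations = [x - data_mean for x in data]   # Step 1: observation minus mean
squared = [d ** 2 for d in deviations]       # Step 2: square the deviations
sample_variance = sum(squared) / (n - 1)     # Step 3: average, dividing by n-1
sample_sd = sqrt(sample_variance)            # Step 4: square root of the variance

# The standard library functions use the same n-1 definition.
print(sample_variance, variance(data))
print(sample_sd, stdev(data))
```

In Excel, `VAR` and `STDEV` also divide by n-1, so they should agree with this calculation.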

Summarising categorical data Percentages and rates Covered in Day 3 – Introduction to Analysis session

Normal distribution. Symmetric and bell shaped. Standard Normal Distribution: mean = 0, SD = 1. Represents the distribution of values that would be observed if the whole population was studied.

Normal distribution Mean, Median, Mode

Normal Distribution, changes in mean

Normal Distribution, changes in SD

Normal distribution. Defined by a complex mathematical formula. Published tables list the area under the Standard Normal curve. Standard Normal scores (Z scores) are used to calculate the area between two points. About 95% of the distribution lies within +/- 2 SD of the mean (more precisely, 1.96 SD); this is known as the 'reference range'.
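Instead of printed tables, the area under the Standard Normal curve can be computed directly. The following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `area_between` and `z_score` are made up for this illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)


def area_between(z_low, z_high, dist=standard_normal):
    """Area under the curve between two Z scores."""
    return dist.cdf(z_high) - dist.cdf(z_low)


def z_score(x, mean, sd):
    """Convert a value to a Z score: number of SDs away from the mean."""
    return (x - mean) / sd


print(round(area_between(-1.96, 1.96), 4))  # close to 0.95
print(round(area_between(-2, 2), 4))        # slightly more than 0.95
```

This shows why "+/- 2 SD" is a convenient rounding of the 1.96 that appears in the confidence interval slides.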

Normal distribution

Importance of the Normal distribution. Many biological variables are Normally distributed, or can be made Normally distributed by transformation. Many statistical tests require the data to be Normally distributed. If the data are skewed, they need to be transformed, e.g. 1/X, Log (X), sqrt (X).
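To see what a transformation does, the sketch below applies Log (X) to a small right-skewed data set. The values are hypothetical, chosen only so that the effect is easy to see, and the gap between mean and median is used as a crude indicator of skewness.

```python
from math import log
from statistics import mean, median

# Hypothetical right-skewed values (each one double the previous)
skewed = [1, 2, 4, 8, 16, 32, 64]
logged = [log(x) for x in skewed]

# In skewed data the mean is pulled towards the long tail,
# so mean and median differ; in symmetric data they coincide.
raw_gap = mean(skewed) - median(skewed)
log_gap = mean(logged) - median(logged)

print(f"raw data:    mean - median = {raw_gap:.2f}")
print(f"logged data: mean - median = {log_gap:.2f}")
```

After the log transformation the mean and median agree, which is what a symmetric distribution would show. Real data will rarely be this tidy.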

Symmetric and Skewed Data

Population Samples. Bias: deviation from the true result; minimised by random sampling. Random error: in any random sample there will be sampling variation; reduced by increasing the sample size.

Sampling Variability: Hypothesis Tests, Confidence Intervals

Standard Error. The standard error measures the amount of variability in the sample estimate. It indicates how close the population mean or proportion is likely to be to the sample estimate.

Standard Error. Mean: SE = SD / sqrt (n). Proportion: SE = sqrt (p (1 - p) / n).

Confidence Intervals. Based on the Normal distribution, 95% of sample estimates will be within 1.96 SEs of the true value. For 95% of samples, the interval (estimate +/- 1.96 SE) will contain the true population value. For any one sample there is a 95% chance that the interval contains the true value.

Confidence Intervals. There is a 5% risk (or 1 in 20 chance) that the true value lies outside the 95% interval. The interval tells us how imprecise our estimate is, and provides a range of values within which the true (population) value is likely to lie. Narrow 95% CI: precise estimate. Wide 95% CI: imprecise estimate.
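A 95% confidence interval can be sketched as the estimate plus or minus 1.96 standard errors. The code below assumes the usual Normal-approximation formulas for the standard error of a mean (SD / sqrt(n)) and of a proportion (sqrt(p(1-p)/n)). All input numbers are hypothetical: the mean of 3250 g and SD of 435 g were picked so that the result resembles the birth weight interval in the quiz later in the session, and are not taken from any real data.

```python
from math import sqrt

Z_95 = 1.96  # Standard Normal value enclosing the central 95%


def se_mean(sd, n):
    """Standard error of a sample mean."""
    return sd / sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion (p given as a fraction, not %)."""
    return sqrt(p * (1 - p) / n)


def ci_95(estimate, se):
    """95% confidence interval: estimate +/- 1.96 standard errors."""
    return (estimate - Z_95 * se, estimate + Z_95 * se)


# Hypothetical birth weight summary: mean 3250 g, SD 435 g, n = 153
mean_ci = ci_95(3250, se_mean(435, 153))

# Hypothetical proportion of 25%, estimated from 400 and from 100 people
prop_ci_400 = ci_95(0.25, se_proportion(0.25, 400))
prop_ci_100 = ci_95(0.25, se_proportion(0.25, 100))

print(mean_ci, prop_ci_400, prop_ci_100)
```

Comparing the two proportion intervals shows the point made on the slides: the smaller sample gives the wider, less precise interval.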

Self-reported smoking status in women (%), by ethnic group, with 95% confidence intervals (England, 2004)

Interpretation of confidence intervals. What can we say about the true smoking prevalence for the general population? For which ethnic groups is the prevalence of smoking significantly different from 25%? Is the prevalence of smoking significantly different between the Black Caribbean and Black African populations? Is the prevalence of smoking significantly different between the Pakistani and Bangladeshi populations?

Interpretation of confidence intervals. We are 95% confident that the true smoking prevalence for the general population is between 22.5% and 24.5%. For the Black African, Indian, Pakistani, Bangladeshi and Chinese groups, the prevalence of smoking is significantly different from 25%. The prevalence of smoking is significantly different between the Black Caribbean and Black African groups. We cannot be sure that the prevalence of smoking is significantly different between the Pakistani and Bangladeshi populations.

Non-overlapping intervals are indicative of real differences. Overlapping intervals need to be considered with caution. We need to be careful about using confidence intervals as a means of testing. The smaller the sample size, the wider the confidence interval.

Hypothesis Tests. Assess the strength of evidence for an association. The test statistic is calculated using the population value, the sample estimate and the standard error. Null hypothesis: no true difference between groups in the population from which the samples arose.

Hypothesis Tests. If the null hypothesis is true, what are the chances of getting a difference as big as (or bigger than) that observed? Uses the population value, the sample estimate and the standard error. Null hypothesis: no true difference between groups in the population from which the samples arose.

Illustration of acceptance regions

P-values. The probability of obtaining a difference as large as (or larger than) that observed, if there is really no difference in the population from which the samples came, i.e. if the null hypothesis is true.

P-values. Small p-value (p<0.05): unlikely that the sample arose from a population where the null hypothesis is true; evidence for a real difference in the population. Large p-value (p>0.05): likely that the sample arose from a population where the null hypothesis is true; no evidence to reject the null hypothesis.
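The slides say the test statistic is built from the population value, the sample estimate and the standard error, but do not give its exact form. A minimal sketch, assuming the common Z test (difference divided by standard error, compared with the Standard Normal distribution, two-sided), is shown below. The function name `z_test` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def z_test(sample_estimate, population_value, standard_error):
    """Return the Z statistic and the two-sided p-value.

    The p-value is the probability of a difference as large as (or larger
    than) the one observed, in either direction, if the null hypothesis
    of no true difference holds.
    """
    z = (sample_estimate - population_value) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: sample prevalence 23.5%, null value 25%, SE 0.5%
z, p = z_test(0.235, 0.25, 0.005)
print(f"z = {z:.2f}, p = {p:.4f}")

# A difference of exactly 1.96 SEs sits right at the p = 0.05 boundary
z_edge, p_edge = z_test(1.96, 0, 1)
```

The second call links the two halves of the session: an estimate 1.96 standard errors from the null value is exactly where a 95% confidence interval just touches that value.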

Interpretation of P-values. Source: Essential Medical Statistics, by Betty R. Kirkwood and Jonathan A. C. Sterne

Quiz. A person was defined as hypertensive if their diastolic blood pressure was > 90 mmHg and their systolic blood pressure was > 140 mmHg. The variable 'hypertensive' is:
a) Paired continuous
b) Nominal categorical
c) Skewed
d) Continuous

What conclusion can be drawn from this figure?
a) The mean is less than the standard deviation
b) The mean is higher than the median
c) There are fewer observations below the mean than above it
d) The mean is approximately equal to the median

Based on a sample of 153 newborns, the 95% CI for the population mean birth weight was between 3181 and 3319 grams:
a) 95% of the individual birth weights are between 3181 and 3319 grams
b) The true mean for the 153 newborns is probably between 3181 and 3319 grams
c) The mean of the population from which the 153 newborns came is between 3181 and 3319 grams
d) There is a 95% chance that the true mean of the population from which the 153 newborns came is included in the range 3181 to 3319 grams

Useful Resource

Finding out more

Conclusions. Cover some basic statistical concepts. Gain insight into what they mean. Gain confidence in understanding basic statistics.

Basic Statistics Exercise. Exercise 1 – Calculate some summary statistics for the class size data in the spreadsheet. Exercise 2 – Using the CI template provided, calculate the 95% CI for the mean class size from Exercise 1.

Basic Statistics Exercise To download file go to and search on “intelligence training” then Day 2 or go to
Useful Excel Functions: AVERAGE, MEDIAN, MODE, QUARTILE, PERCENTILE, VAR, STDEV