Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine

Presentation transcript:

SMME I 2017/2018 Final exam. Georgi Iskrov, MBA, MPH, PhD, Department of Social Medicine

Layout. SMME I Final Exam: entry test in Bioethics; entry test in Biostatistics; 1 case for statistical analysis and interpretation; 1 bioethical case for comment and discussion; 1 theory question from the bioethics questionnaire; oral discussion. Time slots shown on the slide: 30 min and 40 min.

Resources

Do not forget: list of formulas, calculator, student book.

Population vs Sample. Population parameters: μ, σ, σ². Sample statistics: x̄, s, s².

Descriptive vs Inferential statistics. Population (parameters) → Sample (statistics): sampling, from population to sample. Sample (statistics) → Population (parameters): inferential statistics, from sample to population.

Sampling. Stages of sampling: defining the target population; determining the sample size; selecting a sampling method. Properties of a good sample: random selection; representativeness by structure; representativeness by number of cases.

Sample size calculation. Generally, the sample size for any study depends on: the acceptable level of confidence; the expected effect size and absolute error of precision; the underlying scatter in the population; the power of the study. High power goes with a large sample size, a large effect and little scatter. Low power goes with a small sample size, a small effect and lots of scatter.
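To see how confidence, precision and scatter interact, here is a minimal sketch. It assumes the textbook formula for estimating a mean, n = (z·σ / E)², which the slide does not state; the function name and the example values are hypothetical. It covers confidence, scatter and precision only, not power.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so that the CI half-width for a mean is at most `margin`.

    Assumes a known population SD `sigma` and the normal approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)

n_base = sample_size_for_mean(sigma=10, margin=2)                      # 97
n_more_scatter = sample_size_for_mean(sigma=20, margin=2)              # 385
n_more_conf = sample_size_for_mean(sigma=10, margin=2, confidence=0.99)  # 166
```

Doubling the scatter roughly quadruples the required sample, and asking for more confidence also raises it.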

Levels of measurement

Graphical summaries. Graphs and statistics by type of variable:
One qualitative. Graph: bar chart, pie chart. Statistics: frequency table, relative frequency table, proportion.
Two qualitative. Graph: side-by-side bar chart, segmented bar chart. Statistics: two-way table, difference in proportions.
One quantitative. Graph: dotplot, histogram, boxplot. Statistics: measures of central tendency, measures of spread, other (five number summary, percentiles, distribution shape).
One quantitative by one qualitative. Graph: side-by-side boxplots, stacked dotplots. Statistics: statistics broken down by group, difference in means.
Two quantitative. Graph: scatterplot. Statistics: correlation.

Central tendency and spread. Central tendency: mean, mode and median. Spread: range, interquartile range, standard deviation. Mistakes: focusing only on the mean and ignoring the variability; confusing the standard deviation with the standard error of the mean; confusing variation with variance. What is best to use in different scenarios? Symmetrical data: mean and standard deviation. Skewed data: median and interquartile range.
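A small sketch of these measures on a hypothetical right-skewed sample, using only Python's standard library. The data are invented for illustration, and quartile conventions differ between tools, so the interquartile range may not match other software exactly.

```python
import statistics as st

data = [2, 3, 3, 4, 5, 6, 40]  # hypothetical right-skewed sample

mean = st.mean(data)                 # pulled upwards by the value 40
median = st.median(data)             # robust to the extreme value
mode = st.mode(data)
data_range = max(data) - min(data)
q1, _, q3 = st.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
sd = st.stdev(data)                  # sample SD (n - 1 denominator)
sem = sd / len(data) ** 0.5          # standard error of the mean, always below SD for n > 1

print(mean, median, mode, data_range, iqr)
```

Here the mean (9) sits well above the median (4), which is why the median and interquartile range describe skewed data better.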

Rule of 3-sigma. When data are approximately normally distributed: approximately 68% of the data lie within one SD of the mean; approximately 95% of the data lie within two SDs of the mean; approximately 99.7% of the data lie within three SDs of the mean.
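The percentages in this rule can be checked directly from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the standard library; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def coverage(k):
    """Probability that a normal value falls within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within = {k: round(100 * coverage(k), 1) for k in (1, 2, 3)}
print(within)  # {1: 68.3, 2: 95.4, 3: 99.7}
```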

Normal (Gaussian) distribution. Central limit theorem, illustrated: create a population with a known distribution that is not normal; randomly select many samples of equal size from that population; tabulate the means of these samples and graph their frequency distribution. The central limit theorem states that if the samples are large enough, the distribution of the means will approximate a normal distribution even if the population is not Gaussian. Mistakes: confusing "normal" in the statistical sense with "common" (or "disease free"); assuming biological data are exactly normal, when few biological distributions are.
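The three steps above can be simulated. The sketch below assumes an exponential population with mean 1, a sample size of 50 and 2000 repetitions; all of these are illustrative choices rather than anything from the slide.

```python
import random
import statistics as st

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, strongly right-skewed)."""
    return st.mean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(50) for _ in range(2000)]

center = st.mean(means)   # should be close to the population mean, 1
spread = st.stdev(means)  # should be close to 1 / sqrt(50), about 0.14
# Share of sample means within 2 SDs of their centre; near 95% if roughly normal.
share = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(round(center, 2), round(spread, 2), round(share, 2))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is far from normal.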

Outliers Values that lie very far away from the other values in the data set.
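The slide gives no numerical rule for "very far away". One common convention, assumed here purely for illustration, is Tukey's fences: flag values more than 1.5 interquartile ranges below the first quartile or above the third. The function name and data are hypothetical.

```python
import statistics as st

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences, an assumed rule)."""
    q1, _, q3 = st.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

flagged = iqr_outliers([2, 3, 3, 4, 5, 6, 40])
print(flagged)  # [40]
```

A flagged value is a candidate for checking, not automatically an error to delete.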

Confidence interval for the population mean. Population mean: point estimate vs interval estimate. Standard error of the mean: how close the sample mean is likely to be to the population mean. Assumptions: a random representative sample, independent observations, and a population that is normally distributed (at least approximately). The confidence interval depends on the sample mean, standard deviation, sample size and degree of confidence. Common misinterpretations: that 95% of the values lie within the 95% CI; that a 95% CI covers the mean ± 2 SD (it is roughly the mean ± 2 standard errors of the mean).
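A minimal sketch of an interval estimate, and of why the first misinterpretation is wrong. It assumes the normal critical value (about 1.96 for 95%), which is reasonable for a sample of this size; for small samples a t critical value would give a wider interval. The data are simulated and hypothetical.

```python
import random
import statistics as st
from statistics import NormalDist

def mean_ci(sample, confidence=0.95):
    """Approximate CI for the population mean: mean ± z * SEM (normal approximation)."""
    n = len(sample)
    centre = st.mean(sample)
    sem = st.stdev(sample) / n ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return centre - z * sem, centre + z * sem

random.seed(42)
sample = [random.gauss(120, 15) for _ in range(100)]  # hypothetical readings

low, high = mean_ci(sample)
# Share of individual values that fall inside the CI: far below 95%.
share_inside = sum(low <= x <= high for x in sample) / len(sample)
print(round(low, 1), round(high, 1), share_inside)
```

The interval describes uncertainty about the mean, so it is much narrower than the spread of the individual values.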

Hypothesis testing. The general idea of hypothesis testing involves: making an initial assumption; collecting evidence (data); based on the available evidence (data), deciding whether to reject or not reject the initial assumption. Every hypothesis test, regardless of the population parameter involved, requires the above three steps.

Hypothesis testing. Possible outcomes of the decision:
Null hypothesis is true. Reject null hypothesis: Type I error. Do not reject null hypothesis: no error.
Null hypothesis is false. Reject null hypothesis: no error. Do not reject null hypothesis: Type II error.

Level of significance. The level of significance (α) is the threshold for declaring whether a result is significant. If the null hypothesis is true, α is the probability of rejecting it (a Type I error). α is set as part of the research design, while the P-value is computed from the data. α = 0.05 is most commonly used. A smaller α reduces the chance of a Type I error but, for a fixed sample size, increases the chance of a Type II error. The trade-off should be based on the consequences of Type I (false-positive) and Type II (false-negative) errors.
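The statement that α is the probability of rejecting a true null hypothesis can be checked by simulation. The sketch below assumes a two-sided one-sample z-test with known σ, chosen only because it needs nothing beyond the standard library; the function name and the simulation settings are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(1)
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

def z_test_rejects(sample, mu0, sigma):
    """Two-sided one-sample z-test with known sigma; True if H0 is rejected."""
    mean = sum(sample) / len(sample)
    z = (mean - mu0) / (sigma / len(sample) ** 0.5)
    return abs(z) > Z_CRIT

# Every sample is drawn with H0 true (population mean really is 0).
trials = 4000
rejections = sum(
    z_test_rejects([random.gauss(0, 1) for _ in range(25)], mu0=0, sigma=1)
    for _ in range(trials)
)
type_i_rate = rejections / trials  # should land close to ALPHA
print(type_i_rate)
```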

Power Power – the probability of rejecting a false null hypothesis. Statistical power is inversely related to β or the probability of making a Type II error (power is equal to 1 – β). Power depends on the sample size, variability, significance level and hypothetical effect size. You need a larger sample when you are looking for a small effect and when the standard deviation is large.
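How power responds to sample size, variability and effect size can be sketched with a closed-form calculation. This assumes a two-sided one-sample z-test with known σ, a simplification chosen for illustration; the function name and the example numbers are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true effect in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)

power_n32 = z_test_power(effect=5, sigma=10, n=32)       # just above 0.80
power_n16 = z_test_power(effect=5, sigma=10, n=16)       # smaller sample, lower power
power_noisy = z_test_power(effect=5, sigma=20, n=32)     # more scatter, lower power
power_no_effect = z_test_power(effect=0, sigma=10, n=32)  # equals alpha
```

With no true effect, the "power" collapses to α, which ties this calculation back to the Type I error rate.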

Choosing a statistical test Choice of a statistical test depends on: Level of measurement for the dependent and independent variables; Number of groups or dependent measures; Number of units of observation; Type of distribution; The population parameter of interest (mean, variance, differences between means and/or variances).

Parametric and non-parametric tests. Parametric test: assumes that the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings. Non-parametric test: distribution free, with no assumption about the distribution of the variable in the population.

Parametric and non-parametric tests. Type of test by scale and design (non-parametric: nominal and ordinal scales; parametric: ordinal, interval, ratio scales):
1 group. Nominal: χ² goodness of fit test. Ordinal: Wilcoxon signed rank test. Parametric: 1-sample t-test.
2 unrelated groups. Nominal: χ² test. Ordinal: Mann–Whitney U test. Parametric: 2-sample t-test.
2 related groups. Nominal: McNemar test. Parametric: paired t-test.
K unrelated groups. Ordinal: Kruskal–Wallis H test. Parametric: ANOVA.
K related groups. Ordinal: Friedman matched samples test. Parametric: ANOVA with repeated measurements.

Normality test. Normality tests are used to determine whether a data set is well modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. In descriptive statistics terms, a normality test measures the goodness of fit of a normal model to the data: if the fit is poor, the data are not well modeled in that respect by a normal distribution, without any judgment being made on the underlying variable. In frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.

Chi-square test limitations. No expected count should be less than 1. No more than 1/5 of the expected counts should be less than 5. To correct for this, collect a larger sample or combine the categories with small expected counts until their combined value is 5 or more. Yates correction: when there is only 1 degree of freedom, the regular chi-square test should not be used. Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O − E term, then continue as usual with the corrected values.
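The Yates correction described above can be sketched for a 2 × 2 table as follows. The counts are hypothetical, and clamping the corrected difference at zero is an added safeguard that the slide does not mention. The p-value uses the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # subtract 0.5 from |O - E|; clamp is an assumption
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail probability, 1 df
    return stat, p_value

observed = [[12, 5], [6, 14]]  # hypothetical counts
plain_stat, plain_p = chi_square_2x2(observed)
yates_stat, yates_p = chi_square_2x2(observed, yates=True)
```

The corrected statistic is always the smaller of the two, so the correction makes the test more conservative.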

Association is not causation. Beware! The observed association between two variables might be due to the action of a third, unobserved variable.

Fisher exact test. In its usual form, this test is applied to 2 x 2 tables. For small n, the probability can be computed exactly by counting all possible tables that can be constructed based on the marginal frequencies. Thus, the Fisher exact test computes the exact probability, under the null hypothesis, of obtaining the current distribution of frequencies across cells, or one that is more uneven.
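The counting argument above can be written out directly. The sketch below enumerates every 2 × 2 table with the observed margins and sums the probabilities of those that are no more likely than the observed one. That two-sided rule is one common convention and is an assumption here, as are the function name and the example counts.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

p_mild = fisher_exact_2x2([[3, 1], [1, 3]])     # 34/70, about 0.486
p_extreme = fisher_exact_2x2([[4, 0], [0, 4]])  # 2/70, about 0.029
```

Because every table is enumerated, this approach is practical only for small counts, which is exactly where the chi-square approximation is weakest.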