Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution (5.1) Central limit theorem.

Slides:



Advertisements
Similar presentations
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 10/23/12 Sections , Single Proportion, p Distribution (6.1)
Advertisements

Sampling: Final and Initial Sample Size Determination
Inference Sampling distributions Hypothesis testing.
Our goal is to assess the evidence provided by the data in favor of some claim about the population. Section 6.2Tests of Significance.
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 Randomization distribution p-value.
Hypothesis Testing: Hypotheses
Statistics: Unlocking the Power of Data Lock 5 Inference Using Formulas STAT 101 Dr. Kari Lock Morgan Chapter 6 t-distribution Formulas for standard errors.
Section 3.4 Bootstrap Confidence Intervals using Percentiles.
Inferences About Means of Single Samples Chapter 10 Homework: 1-6.
STAT 101 Dr. Kari Lock Morgan Exam 2 Review.
Connecting Simulation- Based Inference with Traditional Methods Kari Lock Morgan, Penn State Robin Lock, St. Lawrence University Patti Frazer Lock, St.
Chapter 9 Hypothesis Testing.
Inference for Categorical Variables 2/29/12 Single Proportion, p Distribution Intervals and tests Difference in proportions, p 1 – p 2 One proportion or.
Introducing Inference with Simulation Methods; Implementation at Duke University Kari Lock Morgan Department of Statistical Science, Duke University
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 101 Dr. Kari Lock Morgan 9/25/12 SECTION 4.2 Randomization distribution.
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 101 Dr. Kari Lock Morgan SECTION 4.1 Statistical test Null and alternative.
4.1 Introducing Hypothesis Tests 4.2 Measuring significance with P-values Visit the Maths Study Centre 11am-5pm This presentation.
Synthesis and Review 3/26/12 Multiple Comparisons Review of Concepts Review of Methods - Prezi Essential Synthesis 3 Professor Kari Lock Morgan Duke University.
Hypothesis Testing – Examples and Case Studies
More Randomization Distributions, Connections
1/2555 สมศักดิ์ ศิวดำรงพงศ์
Section #4 October 30 th Old: Review the Midterm & old concepts 1.New: Case II t-Tests (Chapter 11)
Section 5.2 Confidence Intervals and P-values using Normal Distributions.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal.
Normal Distribution Chapter 5 Normal distribution
+ Chapter 9 Summary. + Section 9.1 Significance Tests: The Basics After this section, you should be able to… STATE correct hypotheses for a significance.
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Using Lock5 Statistics: Unlocking the Power of Data
Statistics: Unlocking the Power of Data Lock 5 Afternoon Session Using Lock5 Statistics: Unlocking the Power of Data Patti Frazer Lock University of Kentucky.
Estimation: Sampling Distribution
6.1 - One Sample One Sample  Mean μ, Variance σ 2, Proportion π Two Samples Two Samples  Means, Variances, Proportions μ 1 vs. μ 2.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/18/12 Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap.
Introducing Inference with Simulation Methods; Implementation at Duke University Kari Lock Morgan Department of Statistical Science, Duke University
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 101 Dr. Kari Lock Morgan 10/18/12 Chapter 5 Normal distribution Central limit theorem.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Confidence Intervals: Bootstrap Distribution
Introducing Inference with Bootstrapping and Randomization Kari Lock Morgan Department of Statistical Science, Duke University with.
The z test statistic & two-sided tests Section
Introduction to the Practice of Statistics Fifth Edition Chapter 6: Introduction to Inference Copyright © 2005 by W. H. Freeman and Company David S. Moore.
Statistics: Unlocking the Power of Data Lock 5 Bootstrap Intervals Dr. Kari Lock Morgan PSU /12/14.
PSY 307 – Statistics for the Behavioral Sciences Chapter 9 – Sampling Distribution of the Mean.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 12/6/12 Synthesis Big Picture Essential Synthesis Bayesian Inference (continued)
1 URBDP 591 A Lecture 12: Statistical Inference Objectives Sampling Distribution Principles of Hypothesis Testing Statistical Significance.
Hypothesis Testing Errors. Hypothesis Testing Suppose we believe the average systolic blood pressure of healthy adults is normally distributed with mean.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 250 Dr. Kari Lock Morgan SECTION 4.1 Hypothesis test Null and alternative.
Inference with Proportions Review Mr. Hardin AP STATS 2015.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Statistics: Unlocking the Power of Data Lock 5 Section 4.5 Confidence Intervals and Hypothesis Tests.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 2.
Inference for Proportions Section Starter Do dogs who are house pets have higher cholesterol than dogs who live in a research clinic? A.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Please hand in homework on Law of Large Numbers Dan Gilbert “Stumbling on Happiness”
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
Bootstraps and Scrambles: Letting a Dataset Speak for Itself Robin H. Lock Patti Frazer Lock ‘75 Burry Professor of Statistics Cummings Professor of MathematicsSt.
Inference about proportions Example: One Proportion Population of students Sample of 175 students CI: What proportion (percentage) of students abstain.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 p-value.
Statistics: Unlocking the Power of Data Lock 5 Section 4.1 Introducing Hypothesis Tests.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 1.
Inference for Proportions
Normal Distribution Chapter 5 Normal distribution
STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample
Hypothesis Tests for 1-Sample Proportion
When we free ourselves of desire,
Connecting Intuitive Simulation-Based Inference to Traditional Methods
CHAPTER 22: Inference about a Population Proportion
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution (5.1) Central limit theorem (5.2) Normal distribution for p-values (5.2) Normal distribution for confidence intervals (5.2) Standard normal (5.2)

Statistics: Unlocking the Power of Data Lock 5 Question #1 of the Day How do Malaria parasites impact mosquito behavior?

Statistics: Unlocking the Power of Data Lock 5 Malaria Parasites and Mosquitoes Mice were randomized to either eat from a malaria infected mouse or a healthy mouse After infection, the parasites go through two stages: 1) Oocyst (not yet infectious), Days 1-8 2) Sporozoite (infectious), Days 9 – 28 Response variable: whether the mosquito approached a human (in a cage with them) Does this behavior differ by infected vs control? Does it differ by infection stage? Dr. Andrew Read, Professor of Biology and Entomology and Penn State, is a co-author Cator LJ, George J, Blanford S, Murdock CC, Baker TC, Read AF, Thomas MB. (2013). ‘Manipulation’ without the parasite: altered feeding behaviour of mosquitoes is not dependent on infection with malaria parasites. Proc R Soc B 280: ‘Manipulation’ without the parasite: altered feeding behaviour of mosquitoes is not dependent on infection with malaria parasites

Statistics: Unlocking the Power of Data Lock 5 Malaria Parasites and Mosquitoes Malaria parasites would benefit if:  Mosquitos sought fewer blood meals after getting infected, but before becoming infectious (oocyst stage), because blood meals are risky  Mosquitoes sought more blood meals after becoming infectious (sporozoite stage), to pass on the infection Does infecting mosquitoes with Malaria actually impact their behavior in this way?

Statistics: Unlocking the Power of Data Lock 5 Oocyt Stage We’ll first look at the Oocyst stage, after the infected group has been infected, but before they are infectious. p C : proportion of controls to approach human p I : proportion of infecteds to approach human What are the relevant hypotheses? a) H 0 : p I = p C, H a : p I < p C b) H 0 : p I = p C, H a : p I > p C c) H 0 : p I < p C, H a : p I = p C d) H 0 : p I > p C, H a : p I = p C

Statistics: Unlocking the Power of Data Lock 5 Data: Oocyst Stage

Statistics: Unlocking the Power of Data Lock 5 Randomization Test p-value observed statistic

Statistics: Unlocking the Power of Data Lock 5 Mate choice and offspring fitness Difference in proportions Randomization and Bootstrap Distributions Tea and immunity Difference in means Mercury in fish Single mean Sleep versus caffeine Difference in proportions What do you notice? All bell-shaped distributions!

Statistics: Unlocking the Power of Data Lock 5 Normal Distribution The symmetric bell-shaped curve we have seen for almost all of our distribution of statistics is called a normal distribution The normal distribution is fully described by it’s mean and standard deviation: N(mean, standard deviation)

Statistics: Unlocking the Power of Data Lock 5 Randomization Distributions If a randomization distribution is normally distributed, we can write it as a)N(null value, se) b)N(statistic, se) c)N(parameter, se)

Statistics: Unlocking the Power of Data Lock 5 Malaria and Mosquitoes Which normal distribution should we use to approximate this? a) N(0, ) b) N(0, 0.056) c) N(-0.131, 0.056) d) N(0.056, 0)

Statistics: Unlocking the Power of Data Lock 5 Normal Distribution We can compare the original statistic to this Normal distribution to find the p-value!

Statistics: Unlocking the Power of Data Lock 5 p-value from N(null, SE) p-value observed statistic Exact same idea as randomization test, just using a smooth curve!

Statistics: Unlocking the Power of Data Lock 5 Standardized Data Often, we standardize the statistic to have mean 0 and standard deviation 1 How? z-scores! What is the equivalent for the null distribution? statistic null value SE

Statistics: Unlocking the Power of Data Lock 5 Standardized Statistic The standardized test statistic (also known as a z-statistic) is Calculating the number of standard errors a statistic is from the null lets us assess extremity on a common scale

Statistics: Unlocking the Power of Data Lock 5 Standardized Statistic Malaria and Mosquitoes: From original data: statistic = From null hypothesis: null value = 0 From randomization distribution: SE = Compare to N(0,1) to find p-value…

Statistics: Unlocking the Power of Data Lock 5 Standard Normal The standard normal distribution is the normal distribution with mean 0 and standard deviation 1 Standardized statistics are compared to the standard normal distribution

Statistics: Unlocking the Power of Data Lock 5 p-value from N(0,1) If a statistic is normally distributed under H 0, the p-value can be calculated as the proportion of a N(0,1) beyond

Statistics: Unlocking the Power of Data Lock 5 p-value from N(0,1) p-value standardized statistic Exact same idea as before, just standardized!

Statistics: Unlocking the Power of Data Lock 5 Randomization test: N(null, SE) Replace with smooth curve Standardize N(0,1) The p-value is always the proportion in the tail(s) beyond the relevant statistic! We have evidence that mosquitoes exposed to malaria parasites are less likely to approach a human before they become infectious than mosquitoes not exposed to malaria parasites.

Statistics: Unlocking the Power of Data Lock 5 Sporozoite Stage For the data from the Sporozoite stage, after infectious, what are the relevant hypotheses? p C : proportion of controls to approach human p I : proportion of infecteds to approach human What are the relevant hypotheses? a) H 0 : p I = p C, H a : p I < p C b) H 0 : p I = p C, H a : p I > p C c) H 0 : p I < p C, H a : p I = p C d) H 0 : p I > p C, H a : p I = p C

Statistics: Unlocking the Power of Data Lock 5 Oocyst StageSporozoite Stage Data

Statistics: Unlocking the Power of Data Lock 5 Sporozoite Stage The difference in proportions is 0.15 and the standard error is Is this significant? a) Yes b) No Standard normal

Statistics: Unlocking the Power of Data Lock 5 Proportion of Infected All mosquitoes in the infected group were exposed to the malaria parasites, but not all mosquitoes were actually infected Of the 201 mosquitoes in the infected group that we actually have data on, only 90 were actually infected (90/201 = 0.448) What proportion of mosquitoes eating from a malaria infected mouse become infected? We want a confidence interval!

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Interval 95% Confidence Interval

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Distributions If a bootstrap distribution is normally distributed, we can write it as a)N(parameter, sd) b)N(statistic, sd) c)N(parameter, se) d)N(statistic, se) sd = standard deviation of data values se = standard error = standard deviation of statistic

Statistics: Unlocking the Power of Data Lock 5 Normal Distribution We can find the middle P% of this Normal distribution to get the confidence interval!

Statistics: Unlocking the Power of Data Lock 5 CI from N(statistic, SE) 95% CI Same idea as the bootstrap, just using a smooth curve!

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Interval: N(statistic, SE) Replace with smooth curve 95% CI

Statistics: Unlocking the Power of Data Lock 5 (Un)-standardization Standardized scale: To un-standardize:

Statistics: Unlocking the Power of Data Lock 5 (Un)-standardization In testing, we go to a standardized statistic In intervals, we find (-z*, z*) for a standardized distribution, and return to the original scale Un-standardization (reverse of z-scores): What’s the equivalent for the distribution of the statistic? (bootstrap distribution) statistic ± z * SE

Statistics: Unlocking the Power of Data Lock 5 z*z* -z * P% P% Confidence Interval 2. Return to original scale with statistic ± z*× SE 1. Find values (–z * and z * ) that capture the middle P% of N(0,1)

Statistics: Unlocking the Power of Data Lock 5 Confidence Interval using N(0,1) If a statistic is normally distributed, we find a confidence interval for the parameter using statistic ± z*× SE where the proportion between –z* and +z* in the standard normal distribution is the desired level of confidence.

Statistics: Unlocking the Power of Data Lock 5 Confidence Intervals Find z * for a 99% confidence interval. z * = 2.575

Statistics: Unlocking the Power of Data Lock 5 Proportion of Infected Proportion of infected mosquitoes:  Sample statistic (from data): 90/201 =  z* (from standard normal):  SE (from bootstrap distribution): Give a 99% confidence interval for the proportion of mosquitoes who get infected.

Statistics: Unlocking the Power of Data Lock 5 z* Why use the standard normal? z * is always the same, regardless of the data! Common confidence levels:  95%: z * = 1.96 (but 2 is close enough)  90%: z * =  99%: z * = 2.576

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Interval: N(statistic, SE) Replace with smooth curve middle 95% N(0, 1) middle 95% Un- standardize statistic ± z* × SE ± 1.96 × (0.375, 0.521) We are 95% confident that only between and of the mosquitoes exposed to infection actually get infected.

Statistics: Unlocking the Power of Data Lock 5 Malaria and Mosquitoes Should we limit our analysis to only those mosquitoes that actually got infected? Why or why not? In favor of yes:  We care about whether mosquitoes behave differently after being infected, not just after being exposed to an infection  Including mosquitoes that didn’t actually get infected may weaken results In favor of no:  Mosquitoes were not randomized to be infected or not, they were randomized to the possibility of becoming infected.  We could have confounding variables and could no longer make conclusions about causality Methods for this, but beyond the scope of this course

Statistics: Unlocking the Power of Data Lock 5 Confidence Interval Formula From original data From bootstrap distribution From N(0,1) IF SAMPLE SIZES ARE LARGE…

Statistics: Unlocking the Power of Data Lock 5 Formula for p-values From randomization distribution From H 0 From original data Compare z to N(0,1) for p-value IF SAMPLE SIZES ARE LARGE…

Statistics: Unlocking the Power of Data Lock 5 Standard Error Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations? We can!!! Or at least we’ll be able to next class…

Statistics: Unlocking the Power of Data Lock 5 To Do Read Chapter 5 Do HW 5.2 (due Friday, 10/30)