Scot Exec Course Nov/Dec 04 Ambitious title? Confidence intervals, design effects and significance tests for surveys. How to calculate sample numbers when.


Scot Exec Course Nov/Dec 04 Ambitious title? Confidence intervals, design effects and significance tests for surveys. How to calculate sample numbers when planning a survey.

Summary
- Statistical inference
  - Design based
  - Model based
- Confidence intervals and hypothesis tests: general
- Their modification for survey designs
  - Design effects and design factors
- Calculation of sample numbers for studies
  - Their modification for complex surveys

Statistical inference
- Making inferences about some aspect of the population: using observations to draw conclusions about the population as it is now, or about how it will evolve in future
- Data are what we are given
- Inference allows us to turn them into information

Elements needed for statistical inference: design based
- You want to learn something about a population
- You have
  - A model of how the sample was selected from the population
  - Some data obtained from the sample
  - Knowledge of how to estimate!
- E.g. you obtain data on the income of 10,000 people from a population of 5 million. You need inference to estimate the income distribution of the whole 5 million and to know how close the estimate is to the population value.

Elements needed for statistical inference: model based
- You have
  - A model that could have generated the data for your population, along with ideas about what current and future populations this might generalise to
  - Some data that can be assumed to be generated by this model
  - Knowledge of how to carry out the inference!
- E.g. you obtain data on the income of 10,000 people from a population of 5 million and can assume that the income distribution follows some mathematical distribution. You need inference about the assumed model for the income distribution of the whole 5 million and about how close your estimate will be to the true value.

How do design and model based inferences differ?
- Conceptually they are poles apart
- In practice they give the same answers
  - Except when numbers are small
  - Or when a large proportion of the population has been sampled
- But it's good to think about what you are doing and decide which type fits your problem

Next set of results
- Apply to a simple unstructured sample
  - No clustering
  - No stratification
  - No weighting
- Taken from a population with replacement (not a problem in model based inference)
- Exactly the same large-sample results apply for model-based and design-based inferences

Mean of 9
[Figure not reproduced; surviving labels: x, s]

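Standard error of the mean
- $\bar{x} - \mu$ is approximately normally distributed with s.d. $s/\sqrt{n}$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation and $n$ the sample size
- The data are fixed, so this tells us where $\mu$ is likely to be
- $s/\sqrt{n}$ is called the standard error of the sample mean, sometimes written s.e.mean. It measures the expected distance of the "true" mean from the mean of the observed sample.
- A $100(1-\alpha)\%$ confidence interval for $\mu$ from the normal distribution is $\bar{x} \pm Z_{\alpha/2}\, s/\sqrt{n}$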
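The standard error and the normal-theory interval can be checked numerically. This is a minimal sketch for a simple unstructured sample; the function name and the income figures are made up for illustration and do not come from the course.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Return (mean, standard error, lower, upper) for a simple sample."""
    n = len(data)
    xbar = mean(data)
    # Standard error of the mean: s / sqrt(n), with s using the n - 1 divisor.
    se = stdev(data) / math.sqrt(n)
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xbar, se, xbar - z * se, xbar + z * se


# Hypothetical weekly incomes for 9 respondents (illustrative values only).
incomes = [210, 250, 190, 305, 275, 230, 260, 240, 290]
xbar, se, lower, upper = mean_confidence_interval(incomes)
print(f"mean = {xbar:.1f}, s.e. = {se:.1f}, 95% c.i. = ({lower:.1f}, {upper:.1f})")
```

With only 9 observations a t quantile would normally be preferred to the normal one; the sketch keeps the normal quantile to match the large-sample result on the slide.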

Values of Z for confidence intervals
- 95% c.i. gives Z = 1.96
- [level missing]% gives Z = [value missing]
- 68% gives Z = 1
- 90% gives Z = 1.64
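The Z values are quantiles of the standard normal distribution, so they can be recomputed for any confidence level. A small sketch using Python's standard library; the helper name `z_for_level` is an illustrative choice.

```python
from statistics import NormalDist


def z_for_level(level):
    """Two-sided standard normal quantile for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95):
    print(f"{level:.0%} c.i. -> Z = {z_for_level(level):.2f}")

# Going the other way: the coverage of a +/- 1 standard error interval.
coverage = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"Z = 1 covers about {coverage:.0%}")
```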

We can use it for proportions too
- We want to estimate a proportion $\pi$, e.g. the proportion of 20 year olds who use the internet
  - Then $r/n$ estimates $\pi$, where $r$ is the number in the sample with the characteristic and $n$ is the sample size
  - with standard error $\sqrt{\pi(1-\pi)/n}$
  - to use this formula we replace $\pi$ with $r/n$
- A rule of thumb is that this approximation is OK if the smaller of $r$ and $(n-r)$ is greater than 5
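The same calculation for a proportion, including the rule of thumb, might look like this. It is a sketch for a simple sample only; the function name and the counts (160 internet users among 200 respondents) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(r, n, level=0.95):
    """Normal-approximation interval for a proportion estimated by r / n."""
    # Rule of thumb from the slide: the smaller of r and n - r should exceed 5.
    if min(r, n - r) <= 5:
        raise ValueError("normal approximation unreliable: min(r, n - r) must be > 5")
    p = r / n
    # Standard error with the unknown proportion replaced by its estimate r / n.
    se = math.sqrt(p * (1 - p) / n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p, se, p - z * se, p + z * se


# Hypothetical data: 160 of 200 sampled 20 year olds use the internet.
p, se, lower, upper = proportion_confidence_interval(r=160, n=200)
print(f"estimate = {p:.3f}, s.e. = {se:.4f}, 95% c.i. = ({lower:.3f}, {upper:.3f})")
```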

Are these formulae good enough?
- Yes, unless your survey is too small to be any use
- They extend easily to differences in means and proportions
- Similar approximate results apply to regression models and logistic regressions
- BUT they only apply to simple samples
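As an illustration of the extension to differences, the standard large-sample results for two independent simple samples are given below. The subscripts 1 and 2 label the two samples and are notation introduced here, not taken from the slides; $\bar{x}_i$, $s_i$ and $n_i$ are the mean, standard deviation and size of sample $i$, and $p_i = r_i/n_i$ is its estimated proportion.

```latex
\mathrm{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\mathrm{s.e.}(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
```

An approximate confidence interval is then the observed difference plus or minus Z times the corresponding standard error.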

But my data are more complicated than this
- And nobody will let me put standard errors or confidence intervals in my report
- A goal of a good statistical report is that it should not include any tables or graphs where what seems to be information is just the result of chance variation (noise)
  - Set out your task in terms of an outcome predicted from other factors
  - Carry out a set of regression predictions
  - Base the tables that go in the report on the regression models that are found to be more than chance effects

Inferences for complex surveys
- The usual formulae and regression models don't hold
- Most surveys use weighting
- And allowances for clustering and stratification have to be made
- Software that modifies the results we have just discussed and calculates them correctly for complex surveys is now available

Two main methods are used
- Taylor linearisation: the theory of this was all worked out in the 1940s and 50s
- Replication methods, jackknives and bootstraps: 1960s and 1970s
- Only now is software readily available to do things properly
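To give a flavour of a replication method, here is a delete-one jackknife for a simple sample. This is only a sketch of the idea: survey software typically forms replicates by dropping whole PSUs within strata and reweighting, which is not attempted here. The function name and data are illustrative.

```python
import math
from statistics import mean, stdev


def jackknife_se(data, estimator=mean):
    """Delete-one jackknife standard error of an estimator on a simple sample."""
    n = len(data)
    # Recompute the estimate n times, leaving out one observation each time.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


sample = [4.0, 7.0, 5.5, 6.0, 8.5, 3.0]  # made-up values
print(f"jackknife s.e. = {jackknife_se(sample):.4f}")
print(f"s / sqrt(n)    = {stdev(sample) / math.sqrt(len(sample)):.4f}")
```

For the sample mean the jackknife reproduces s/√n exactly, which makes it a convenient check; its value lies in estimators where no simple formula exists.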

Getting by without the correct software
- Carry out an analysis using an ordinary computer package (e.g. the simple procedures in SAS or SPSS)
- But use a weight in the analysis to correct the bias in the estimates
- Your weighted analysis will give you the wrong standard errors and the wrong tests, but the estimates will be about right
- Use design effect tables to get some idea of the standard errors

Using the correct software
- Is not difficult: the PEAS web site explains how
- Routines are available in SAS, SPSS, STATA and R
- But it does mean that you need to get details of the survey design
- E.g. PSU and stratification variables need to be available
- Easier for you than for me

Getting by without the correct software
- Use a table of design effects (DE)
- These are often published with the surveys
- To get a s.e. from a complex survey
  - Calculate the design factor (DF) as the square root of the DE
  - Multiply the s.e. from a simple analysis by the DF
- For most household surveys DEs vary from about 0.8 to 2 or 3
- This is a rough and ready method and will only work if the weights are not too far from 1.0
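The design-factor adjustment is a one-line calculation. A minimal sketch, with an illustrative function name and made-up numbers:

```python
import math


def design_adjusted_se(simple_se, design_effect):
    """Approximate complex-survey s.e.: simple s.e. times DF = sqrt(DE)."""
    design_factor = math.sqrt(design_effect)
    return simple_se * design_factor


# Hypothetical example: s.e. of 0.020 from a simple analysis, published DE of 1.5.
adjusted = design_adjusted_se(simple_se=0.020, design_effect=1.5)
print(f"design factor = {math.sqrt(1.5):.3f}, adjusted s.e. = {adjusted:.4f}")
```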

Disadvantages of this
- DEs are not constant for a survey
- They are also different (usually lower) when subgroups of a survey are selected
- They may also be lower in complicated models, like regressions, where it is also very hard to know how to apply them
- The methods are approximate

Uses of design effects (DEs)
- They tell you how well your survey design has worked
- Most survey software produces estimates of design effects with its output
- A design effect of 2 means your effective sample size is halved
- It is good to have such estimates when planning sample numbers for surveys
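The link between design effect and effective sample size can be written down directly: the effective sample size is the achieved sample size divided by the DE. A small sketch with a hypothetical survey of 3,000 respondents:

```python
def effective_sample_size(n, design_effect):
    """Size of the simple sample that would give the same precision."""
    return n / design_effect


# Hypothetical survey: 3,000 achieved interviews, design effect of 2.
n_eff = effective_sample_size(n=3000, design_effect=2.0)
print(f"effective sample size = {n_eff:.0f}")
```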

Sample numbers for planning studies
- Think ahead about the sort of comparisons you might want to make
- Are you interested in time trends?
- Or in comparisons between certain groups?
  - If so, what proportions are in each?
- Do you want to estimate something (e.g. % of children in poverty)?

Use the spreadsheet sample numbers.xls

To modify these for surveys
- Simply multiply your answer by an estimate of the design effect
- Or try to do the next survey better by getting a smaller design effect
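As one possible illustration, the sketch below sizes a survey to estimate a proportion within a chosen margin of error and then inflates the answer by the design effect. The simple-sample formula n = Z² p(1 − p) / margin² is the usual normal-approximation result and is an assumption here; it is not taken from the course spreadsheet, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, level=0.95, design_effect=1.0):
    """Sample size to estimate a proportion near p to within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Simple-sample requirement from the normal approximation.
    n_simple = z ** 2 * p * (1 - p) / margin ** 2
    # Inflate by the design effect for a complex survey, then round up.
    return math.ceil(n_simple * design_effect)


# Planning example: proportion near 0.5, margin of 5 percentage points.
print(sample_size_for_proportion(p=0.5, margin=0.05))                     # simple sample
print(sample_size_for_proportion(p=0.5, margin=0.05, design_effect=1.5))  # complex survey
```

Using p = 0.5 gives the largest requirement, so it is a cautious default when the proportion is not known in advance.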