
Scot Exec Course Nov/Dec 04 Ambitious title? Confidence intervals, design effects and significance tests for surveys. How to calculate sample numbers when planning a survey.


1 Scot Exec Course Nov/Dec 04
Ambitious title? Confidence intervals, design effects and significance tests for surveys. How to calculate sample numbers when planning a survey.

2 Summary
Statistical inference
– Design based
– Model based
Confidence intervals and hypothesis tests: general
Their modification for survey designs
– Design effects and design factors
Calculation of sample numbers for studies
– Their modification for complex surveys

3 Statistical inference
Making inferences about some aspect of a population: using observations to draw conclusions about the population as it is now, or as it will evolve in future.
Data are what we are given.
Inference allows us to turn them into information.

4 Elements needed for statistical inference: design based
You want to learn something about a population. You have:
– a model of how the sample was selected from the population;
– some data obtained from the sample;
– knowledge of how to estimate!
E.g. obtain data on the income of 10,000 people from a population of 5 million. We need inference to estimate the income distribution of the whole 5 million, and to know how close this estimate is likely to be to the population value.
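In design-based inference the population is fixed, and the only randomness comes from which units are selected. The sketch below illustrates this with simulated data. It assumes a synthetic "population" of 50,000 lognormal incomes, which is invented for illustration and much smaller than the 5 million in the example. It also assumes simple random sampling with replacement. Over repeated samples of 1,000, the spread of the sample means should come out close to the population s.d. divided by the square root of the sample size.

```python
import math
import random
import statistics

def repeated_sample_means(population, n, reps, seed=1):
    """Draw `reps` simple random samples of size n (with replacement)
    from a fixed population and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical fixed population: 50,000 lognormally distributed incomes.
rng = random.Random(0)
population = [rng.lognormvariate(10.0, 0.6) for _ in range(50_000)]

n = 1_000
means = repeated_sample_means(population, n, reps=500)

# Spread of the sample means over repeated sampling...
observed_sd = statistics.stdev(means)
# ...compared with sigma / sqrt(n), where sigma is the population s.d.
# (divisor N, because this list is the whole population).
theoretical_sd = statistics.pstdev(population) / math.sqrt(n)
print(round(observed_sd), round(theoretical_sd))
```

The two printed values should agree to within a few per cent. That agreement is what lets a single sample say how far its mean is likely to be from the population value.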

5 Elements needed for statistical inference: model based
You have:
– a model that could have generated the data for your population, along with ideas about which current and future populations it might generalise to;
– some data that can be assumed to have been generated by this model;
– knowledge of how to carry out the inference!
E.g. obtain data on the income of 10,000 people from a population, and assume that the income distribution follows some mathematical distribution. We need inference about the assumed model for the income distribution of the whole 5 million, and about how close our estimate is likely to be to the true value.

6 How do design-based and model-based inferences differ?
Conceptually they are poles apart.
In practice they give the same answers,
– except when numbers are small,
– or when a large proportion of the population has been sampled.
But it's good to think about what you are doing and decide which type fits your problem.

7 Next set of results
These apply to a simple unstructured sample:
– no clustering
– no stratification
– no weighting
taken from a population with replacement (not a problem in model-based inference).
Exactly the same large-sample results apply for model-based and design-based inferences.

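8 Mean of 9 x's
Nine observations $x_1, \dots, x_n$ ($n = 9$) have sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$. They come from a population with unknown mean $\mu$.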

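9 Standard error of the mean
$\bar{x}$ has approximately a normal distribution about $\mu$, with s.d. $\sigma/\sqrt{n}$, where $\sigma$ is the population s.d. We estimate this s.d. by $s/\sqrt{n}$. The data are fixed, so this tells us where $\mu$ is likely to be.
$s/\sqrt{n}$ is called the standard error of the sample mean, sometimes written s.e.mean. It measures the typical distance of the "true" mean $\mu$ from the mean of the observed sample.
A $100(1-\alpha)\%$ confidence interval for $\mu$ from the normal distribution is
$\bar{x} \pm Z\, s/\sqrt{n}$,
where $Z$ is the standard normal value with probability $\alpha/2$ in each tail.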
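Here is the interval $\bar{x} \pm Z\, s/\sqrt{n}$ as a short Python sketch. The nine values are made up for illustration, and the sketch assumes a simple random sample.

```python
import math
import statistics

def mean_confidence_interval(xs, z=1.96):
    """Normal-approximation confidence interval for the population mean,
    for a simple random sample (no clustering, stratification or weights)."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)          # sample s.d., divisor n - 1
    se = s / math.sqrt(n)             # standard error of the mean
    return xbar, se, (xbar - z * se, xbar + z * se)

# Nine hypothetical observations (invented for illustration).
xs = [12, 15, 9, 14, 11, 13, 10, 16, 8]
xbar, se, (lo, hi) = mean_confidence_interval(xs)
print(f"mean = {xbar:.2f}, s.e. = {se:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# mean = 12.00, s.e. = 0.91, 95% CI = (10.21, 13.79)
```

With only nine values this is the small-numbers case where approximations matter. A t multiplier (about 2.31 for 8 degrees of freedom) would give a somewhat wider and more honest interval than Z = 1.96.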

10 Values of Z for confidence intervals
68% c.i.: Z = 1
90% c.i.: Z = 1.64
95% c.i.: Z = 1.96
99% c.i.: Z = 2.58
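The Z values in the table come from the inverse of the standard normal distribution. Here is a sketch using Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided standard normal multiplier for a confidence level (e.g. 0.95):
    the value Z with probability (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_for_confidence(level):.2f}")
```

The 68% row prints 0.99. The table's Z = 1 is the usual rounding, since Z = 1 corresponds to about 68.3%.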

11 We can use it for proportions too
We want to estimate a proportion $\pi$, e.g. the proportion of 20-year-olds who use the internet.
– If $r$ of the $n$ sampled use it, then $r/n$ estimates $\pi$
– with standard error $\sqrt{\pi(1-\pi)/n}$
– to use this formula we replace $\pi$ with $r/n$.
A rule of thumb is that this approximation is OK if the smaller of $r$ and $(n-r)$ is greater than 5.
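This sketch applies the proportion formula, plugging in $r/n$ for $\pi$ and checking the rule of thumb first. The survey counts are hypothetical.

```python
import math

def proportion_ci(r, n, z=1.96):
    """Estimate a proportion pi by p = r/n with a normal-approximation CI.
    The standard error sqrt(pi(1 - pi)/n) is estimated by plugging in p."""
    if min(r, n - r) <= 5:
        # Rule of thumb: the approximation is OK only if min(r, n - r) > 5.
        raise ValueError("normal approximation doubtful: min(r, n - r) <= 5")
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical survey: 120 of 400 twenty-year-olds say they use the internet.
p, se, (lo, hi) = proportion_ci(120, 400)
print(f"p = {p:.3f}, s.e. = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# p = 0.300, s.e. = 0.0229, 95% CI = (0.255, 0.345)
```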

12 Are these formulae good enough?
Yes, unless your survey is too small to be any use.
They extend easily to differences in means and proportions.
Similar approximate results apply to regression models and logistic regressions.
BUT they only apply to simple samples.
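For two independent simple random samples, the extension to differences is that the standard errors combine as $\sqrt{se_1^2 + se_2^2}$. This sketch compares two proportions. The group figures are hypothetical, and it uses an unpooled s.e.; a pooled s.e. is also common when testing proportions.

```python
import math
from statistics import NormalDist

def difference_test(est1, se1, est2, se2, z=1.96):
    """Compare two estimates from independent simple random samples.
    The s.e. of the difference is sqrt(se1**2 + se2**2)."""
    diff = est1 - est2
    se_diff = math.sqrt(se1**2 + se2**2)
    z_stat = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-sided test
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff), z_stat, p_value

# Hypothetical: 30% of 400 in one group vs 22% of 500 in another.
se_a = math.sqrt(0.30 * 0.70 / 400)
se_b = math.sqrt(0.22 * 0.78 / 500)
diff, se_diff, ci, z_stat, p_value = difference_test(0.30, se_a, 0.22, se_b)
print(f"difference = {diff:.2f}, s.e. = {se_diff:.4f}, z = {z_stat:.2f}, p = {p_value:.4f}")
```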

13 But my data are more complicated than this
And nobody will let me put standard errors or confidence intervals in my report.
A goal of a good statistical report is that it should not include any tables or graphs where what seems to be information is just the result of chance variation (noise). To achieve this:
– set out your task in terms of an outcome predicted from other factors;
– carry out a set of regression predictions;
– base the tables in the report on the regression models that are found to be more than chance effects.

14 Inferences for complex surveys
The usual formulae and regression models don't hold.
Most surveys use weighting,
and allowances for clustering and stratification have to be made.
Software that modifies the results we have just discussed, and calculates them correctly for complex surveys, is now available.

15 Two main methods are used
Taylor linearisation: the theory was all worked out in the 1940s and 50s.
Replication methods, jackknives and bootstraps: 1960s and 1970s.
Only now is software readily available to do things properly.
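Replication methods estimate a standard error by recomputing the estimate on many subsets of the sample. Here is a minimal sketch of one of the simplest, the delete-one-cluster (JK1) jackknife. It estimates a mean from an unstratified, unweighted cluster sample, and the six clusters of four values are invented for illustration. Real survey software also handles strata and weights, which this sketch does not.

```python
import math
import statistics

def jackknife_cluster_mean(clusters):
    """Delete-one-cluster (JK1) jackknife s.e. for a mean, for an
    unstratified, unweighted cluster sample.
    `clusters` is a list of lists: the observations in each PSU."""
    G = len(clusters)
    total = sum(sum(c) for c in clusters)
    count = sum(len(c) for c in clusters)
    estimate = total / count
    # Re-estimate the mean G times, each time leaving out one whole cluster.
    replicates = [(total - sum(c)) / (count - len(c)) for c in clusters]
    rep_mean = sum(replicates) / G
    variance = (G - 1) / G * sum((r - rep_mean) ** 2 for r in replicates)
    return estimate, math.sqrt(variance)

# Hypothetical cluster sample: values are similar within each cluster.
clusters = [
    [3, 4, 5, 4], [8, 9, 7, 8], [2, 3, 2, 3],
    [6, 5, 7, 6], [9, 10, 8, 9], [4, 3, 4, 5],
]
estimate, jk_se = jackknife_cluster_mean(clusters)

# The naive s.e. wrongly treats all 24 values as a simple random sample.
all_values = [y for c in clusters for y in c]
naive_se = statistics.stdev(all_values) / math.sqrt(len(all_values))
print(f"mean = {estimate:.3f}, jackknife s.e. = {jk_se:.3f}, naive s.e. = {naive_se:.3f}")
print(f"implied design effect = {(jk_se / naive_se) ** 2:.1f}")
```

Because values within a cluster are alike, the jackknife s.e. here is about twice the naive one. The squared ratio is a design effect.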

16 Getting by without the correct software
Carry out an analysis using an ordinary computer package (e.g. simple procedures in SAS or SPSS),
but use a weight in the analysis so that the estimates are corrected for bias.
Your weighted analysis will give you the wrong standard errors and wrong tests, but the estimates will be about right.
Use design effect tables to get some idea of the standard errors.
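The weighted estimate itself is simple: each response counts in proportion to its weight. This sketch uses hypothetical respondents and weights. Only the point estimate is meant to be trusted; it does not attempt a standard error.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean: sum(w * y) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: the first three respondents come from an over-sampled
# area (weight 1), the last two from an under-sampled area (weight 3).
incomes = [10, 12, 11, 20, 22]
weights = [1, 1, 1, 3, 3]
print(sum(incomes) / len(incomes))                 # unweighted: 15.0
print(round(weighted_mean(incomes, weights), 2))   # weighted: 17.67
```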

17 Using the correct software
It is not difficult: the PEAS web site explains how.
Routines are available in SAS, SPSS, STATA and R.
But it does mean that you need to get details of the survey design,
e.g. the PSU (primary sampling unit) and stratification variables need to be available.
Easier for you than for me.

18 Getting by without the correct software
Use a table of design effects (DE), often published with the surveys.
To get a s.e. from a complex survey:
– calculate the design factor (DF) as the square root of the DE;
– multiply the s.e. from a simple analysis by the DF.
For most household surveys DEs vary from about 0.8 to 2 or 3.
This is a rough-and-ready method, and will only work if the weights are not too far from 1.0.
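The two steps translate directly into code. In this sketch the s.e. of 0.0229 and the DE of 1.5 are hypothetical values; in practice the DE would come from a published design effect table.

```python
import math

def complex_survey_se(simple_se, design_effect):
    """Approximate s.e. for a complex survey from a simple-analysis s.e.:
    design factor DF = sqrt(DE), adjusted s.e. = DF * simple s.e."""
    design_factor = math.sqrt(design_effect)
    return design_factor * simple_se

# Hypothetical: a simple analysis gives s.e. 0.0229 for a proportion, and the
# survey's design effect table gives DE = 1.5 for this variable.
se = complex_survey_se(0.0229, 1.5)
print(f"DF = {math.sqrt(1.5):.3f}, adjusted s.e. = {se:.4f}")
# DF = 1.225, adjusted s.e. = 0.0280
```

Any confidence interval should then be built from the adjusted s.e., so it widens by the same factor DF.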

19 Disadvantages of this
DEs are not constant for a survey.
They are also different (usually lower) when subgroups of a survey are selected.
They may also be lower in complicated models, like regressions, where it is also very hard to know how to apply them.
The methods are approximate.

20 Uses of design effects (DEs)
They tell you how well your survey design has worked.
Most survey software produces estimates of design effects with its output.
A design effect of 2 means your effective sample size is halved.
It is good to have such estimates when planning sample numbers for surveys.
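The effective sample size is the achieved sample size divided by the design effect. This sketch uses a hypothetical survey of 2,000 respondents.

```python
def effective_sample_size(n, design_effect):
    """Size of a simple random sample giving the same precision
    as a complex survey of n respondents: n / DE."""
    return n / design_effect

print(effective_sample_size(2000, 2.0))   # 1000.0
print(effective_sample_size(2000, 0.8))   # a DE below 1 gives more than n
```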

21 Sample numbers for planning studies
Think ahead about the sort of comparisons you might want to make.
Are you interested in time trends?
Or in comparisons between certain groups?
– If so, what proportion of the sample will be in each?
Do you want to estimate something (e.g. the % of children in poverty)?

22 Use the spreadsheet sample numbers.xls
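The contents of sample numbers.xls are not reproduced in these slides. The sketch below therefore uses two standard textbook formulas for simple random samples, which may not match the spreadsheet exactly:
- the sample size needed to estimate a proportion to a given margin of error;
- the sample size per group needed to detect a difference between two proportions with a given power.

All planning values are hypothetical.

```python
import math
from statistics import NormalDist

def n_to_estimate_proportion(p_guess, margin, level=0.95):
    """Sample size so that the CI for a proportion has half-width `margin`,
    for a simple random sample: n = Z^2 p(1 - p) / margin^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

def n_per_group_to_compare(p1, p2, level=0.95, power=0.80):
    """Sample size per group to detect a difference between two proportions
    with a two-sided test (unpooled normal approximation), rounded up."""
    z_a = NormalDist().inv_cdf(1 - (1 - level) / 2)
    z_b = NormalDist().inv_cdf(power)
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var_sum / (p1 - p2) ** 2)

# Hypothetical: about 25% of children in poverty, to be estimated to
# within +/- 2 percentage points.
print(n_to_estimate_proportion(0.25, 0.02))
# Hypothetical: detecting a change from 25% to 20% between two survey years.
print(n_per_group_to_compare(0.25, 0.20))
```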

23 To modify these for surveys
Simply multiply your answer by an estimate of the design effect.
Or try to do the next survey better by getting a smaller design effect.
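The adjustment is one multiplication, rounded up to whole respondents. In this sketch both the simple-random-sample answer and the design effect of 1.5 are hypothetical. The DE might be taken from a previous round of the same survey.

```python
import math

def survey_sample_number(n_simple, design_effect):
    """Inflate a simple-random-sample size by the expected design effect,
    rounding up to a whole number of respondents."""
    return math.ceil(n_simple * design_effect)

# Hypothetical: 1,801 needed under simple random sampling, expected DE = 1.5.
print(survey_sample_number(1801, 1.5))  # 2702
```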

