# SJS SDI_151 Design of Statistical Investigations Stephen Senn Introduction to Sampling.

## Representative Inference

- So far in the course we have been interested in comparisons
  - with some sort of causal investigation
- We now look at the case where we are interested in collecting representative material
  - samples to describe populations
- First we consider some possible applications

## Applications of Sampling Methods

- Quality control of manufacturing processes
- Financial audit
- Opinion polls
- Clinical audit
- Anthropology
- Social surveys
- Ecological surveys
  - capture/recapture

## An Important Practical Distinction

All of these application areas require sampling theory and careful consideration of how samples are drawn. However, some of them have a further difficulty: the opinions of human beings have to be ascertained. In what follows we shall often take opinion polls and social surveys as typical examples of sampling problems (although our first example is not of this sort). This will also enable us to discuss the further problems that arise in these contexts. First, however, we shall review some very elementary statistical concepts.

## Standard Deviation/Standard Error

- Standard deviation and standard error are commonly confused
- The standard deviation describes the spread of the original values
- The standard error is a measure of the reliability of some statistic calculated from the original values

## An Illustration of This Difference

- This will now be illustrated using a simple example
- This example is again a medical one
  - My apologies!
  - I need a large data set
  - This one will have to do

## Example Surv_2

- Cross-over trial in asthma
- 790 baseline FEV₁ readings
  - Since baselines are unaffected by treatment, regard them as a homogeneous sample
  - Ignore the fact that they are repeated measures
- The following slide shows the distribution of readings

[Figure: distribution of the 790 baseline FEV₁ readings]

## Distribution of Individual Readings

- Curve skewed to the right
  - Clearly not Normal
- Statistics
  - Mean 1.965
  - Median 1.820
  - Variance 0.462

## Sampling

- Suppose that we take simple random samples of size 10
  - Take these at random from the original distribution, with replacement
- Calculate the mean of each sample
- Study the distribution of these means
  - This is what is called a sampling distribution
- Illustrated on next slide

[Figure: sampling distribution of the means of samples of size 10]

## Distribution of Sample Means

- Curve less obviously skewed to the right
  - Approximation to Normal is closer
- Distribution is narrower
- Statistics
  - Mean 1.961 (very similar to the previous value)
  - Median 1.948 (now much closer to the mean)
  - Variance 0.043 (approximately 1/10 of the previous value)
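
The resampling experiment can be sketched in a few lines of Python. The 790 trial readings are not reproduced here, so this sketch assumes a hypothetical stand-in: 790 right-skewed lognormal values of roughly the same size. Its output will not match the slide figures. It should show the same pattern: the mean barely moves, the median moves towards the mean, and the variance falls to about one tenth.

```python
import random
import statistics

# HYPOTHETICAL data: the 790 trial readings are not reproduced here, so we
# use 790 right-skewed (lognormal) values of roughly the same size instead.
rng = random.Random(2024)
readings = [rng.lognormvariate(0.6, 0.34) for _ in range(790)]

def sample_means(data, n, reps, rng):
    """Means of `reps` simple random samples of size n, drawn with replacement."""
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]

means = sample_means(readings, n=10, reps=5000, rng=rng)

for label, values in (("readings", readings), ("means of 10", means)):
    print(f"{label:12s} mean {statistics.fmean(values):.3f}  "
          f"median {statistics.median(values):.3f}  "
          f"variance {statistics.pvariance(values):.4f}")

# Theory for draws with replacement: variance of means = pvariance(readings) / 10.
print(f"predicted variance of means {statistics.pvariance(readings) / 10:.4f}")
```

Changing `n` shows the variance of the means falling roughly in proportion to 1/n.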


## The Different Variances

- Case 1
  - Variance of the original values
  - Its square root is the standard deviation
- Case 2
  - Variance of the means
  - Its square root is the standard error of the mean (SEM)
- In general
  - The square root of the variance of a statistic (e.g. a mean) is a standard error
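
The factor of about 1/10 follows from a short derivation. It assumes the $n$ draws are independent with common variance $\sigma^2$, which holds for sampling with replacement. The second line applies this to the FEV₁ figures quoted earlier, treating the 790 readings as the population, so $\sigma^2 \approx 0.462$. The gap between the predicted 0.0462 and the observed 0.043 presumably reflects the finite number of samples drawn.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x})
  &= \operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
   = \frac{\sigma^{2}}{n},
  \qquad \mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \\
\mathrm{SD} &= \sqrt{0.462} \approx 0.680,
  \qquad \mathrm{SEM} = \sqrt{0.462/10} = \sqrt{0.0462} \approx 0.215
  \quad (\text{observed: } \sqrt{0.043} \approx 0.207)
\end{aligned}
```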

## Standard Deviation v Standard Error

- Standard deviation
  - Used to describe variation of the original values
  - Can be population
  - Can be sample
- Standard error
  - Used to describe the reliability of a statistic, for example
    - SE of a mean
    - SE of a treatment difference

## Estimating the Standard Error

The standard error of the mean of a simple random sample of size $n$ drawn from a population with variance $\sigma^2$ is $\sigma/\sqrt{n}$. In practice $\sigma^2$, being a population parameter, is unknown, so we estimate it using the sample variance $s^2$. Hence we estimate the standard error of the mean by $s/\sqrt{n}$.
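
A minimal sketch of the estimate $s/\sqrt{n}$ using Python's standard library. `statistics.stdev` uses the $n-1$ divisor, so it gives the usual sample standard deviation $s$. The function name `estimated_sem` and the ten readings are hypothetical, invented for illustration.

```python
import math
import statistics

def estimated_sem(sample):
    """Estimate the standard error of the mean as s / sqrt(n), where s is
    the sample standard deviation (divisor n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    return statistics.stdev(sample) / math.sqrt(n)

# HYPOTHETICAL sample of 10 FEV1-like readings (illustrative values only).
sample = [1.52, 2.10, 1.75, 3.05, 1.88, 1.64, 2.41, 1.97, 1.70, 2.23]
print(f"mean {statistics.fmean(sample):.3f}, estimated SEM {estimated_sem(sample):.3f}")
```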

## Transformations

- Can be very valuable
  - Improve accuracy of analysis
- Under-utilised
- Previous FEV₁ example follows
  - log-transformation
  - data more nearly Normal
- But will not deal with all problems
  - Outliers (in particular bad values)
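
The effect of a log-transformation on right skew can be checked numerically with the moment-based skewness coefficient. This sketch uses hypothetical lognormal data in place of the FEV₁ readings. Because such data become exactly Normal after logging by construction, it flatters the transformation compared with real data.

```python
import math
import random
import statistics

def sample_skewness(xs):
    """Moment-based skewness: mean cubed deviation over (population SD) cubed."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3

# HYPOTHETICAL right-skewed readings (lognormal), standing in for the FEV1 data.
rng = random.Random(7)
readings = [rng.lognormvariate(0.6, 0.34) for _ in range(790)]
logged = [math.log(x) for x in readings]

print(f"skewness raw: {sample_skewness(readings):.2f}   "
      f"skewness log: {sample_skewness(logged):.2f}")
```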

[Figure: distribution of the log-transformed baseline FEV₁ readings]

## Normal Distribution

- Ideal mathematical representation
- Rarely applies in practice to original data
- However, many sampling distributions have approximately Normal form
  - This increases its utility considerably
- A combination of transformation of the original data plus averaging can frequently make it applicable

## Technical Terms (Scheaffer, Mendenhall and Ott)

- Element: an object on which a measurement is taken
- Population: a collection of elements about which we wish to make an inference
- Sampling units: nonoverlapping collections of elements from the population that cover the entire population
- Sampling frame: a list of sampling units
- Sample: a collection of sampling units drawn from a frame
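
The requirement that sampling units be nonoverlapping and cover the whole population can be written as a simple check. In this sketch the people (elements) and households (sampling units) are hypothetical, as is the function name. The sampling frame is simply the list of households.

```python
def are_valid_sampling_units(population, units):
    """True if the units are nonoverlapping and together cover the population."""
    seen = set()
    for unit in units:
        for element in unit:
            if element in seen:      # element already assigned: units overlap
                return False
            seen.add(element)
    return seen == set(population)   # every element covered, nothing extra

# HYPOTHETICAL survey: people are the elements, households the sampling units.
people = ["Ann", "Bob", "Cat", "Dan", "Eve"]
frame = [["Ann", "Bob"], ["Cat"], ["Dan", "Eve"]]   # sampling frame: list of households

print(are_valid_sampling_units(people, frame))                             # True
print(are_valid_sampling_units(people, [["Ann", "Bob"], ["Bob", "Cat"]]))  # False: overlap
print(are_valid_sampling_units(people, [["Ann", "Bob"], ["Cat"]]))         # False: Dan, Eve missing
```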

## Probability Sampling

- Well-defined sampling frame
- Probabilistic rule for drawing the sample
- Knowledge of the rule and the sampling frame enables probabilistic statements about the population
- There are various types of such sample
  - simple, cluster, stratified
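
Two of these designs, stratified and cluster, are sketched below on a hypothetical frame of 48 people in 12 households spread over 3 regions. Because both the rule and the frame are known, inclusion probabilities follow directly. In the stratified design each person has chance 2/16 = 1/8 of selection. In the cluster design each household, and hence each person in it, has chance 2/12 = 1/6.

```python
import random
from collections import defaultdict

# HYPOTHETICAL sampling frame: 48 people in 12 households (the clusters),
# with households spread evenly over 3 regions (the strata).
frame = [
    {"person": f"P{4 * h + k}", "household": h, "region": "ABC"[h % 3]}
    for h in range(12)
    for k in range(4)
]

def stratified_sample(frame, per_stratum, rng):
    """Draw a simple random sample of `per_stratum` people within each region."""
    strata = defaultdict(list)
    for unit in frame:
        strata[unit["region"]].append(unit)
    return [u for units in strata.values() for u in rng.sample(units, per_stratum)]

def cluster_sample(frame, n_clusters, rng):
    """Draw `n_clusters` whole households at random and keep everyone in them."""
    households = sorted({u["household"] for u in frame})
    chosen = set(rng.sample(households, n_clusters))
    return [u for u in frame if u["household"] in chosen]

rng = random.Random(1)
strat = stratified_sample(frame, 2, rng)
clus = cluster_sample(frame, 2, rng)
print(len(strat), len(clus))  # prints: 6 8
```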

## Simple Random Sample

We shall encounter this in more detail in the next lecture. For the moment we note a definition:

> Sampling in which every member of the population has an equal chance of being chosen and successive drawings are independent. (Marriott, *A Dictionary of Statistical Terms*)

Only for simple random sampling is the standard error of the mean equal to $\sigma/\sqrt{n}$.
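
The quoted definition describes independent draws, each giving every member probability $1/N$, which is sampling with replacement. A minimal sketch checks empirically that every unit is chosen about equally often; the population, seed and function name are hypothetical. For sampling without replacement, where every subset of size $n$ is equally likely, Python's `random.sample` would be used instead.

```python
import random
from collections import Counter

def srs_with_replacement(population, n, rng):
    """n independent draws, each choosing any member with probability 1/N.
    Because the draws are independent, the same member can appear twice."""
    return [rng.choice(population) for _ in range(n)]

# HYPOTHETICAL population of N = 5 labelled units.
population = ["a", "b", "c", "d", "e"]
rng = random.Random(42)

counts = Counter()
for _ in range(20_000):                      # 20,000 samples of size 3
    counts.update(srs_with_replacement(population, 3, rng))

total = sum(counts.values())                 # 60,000 draws in all
print({unit: round(counts[unit] / total, 3) for unit in population})  # each near 0.2
```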

## Quota Sampling

- Sampling frame not used
- May have a rough idea of population composition
- Sampling carries on until various quotas are fulfilled
  - e.g. 100 males, 100 females
- Difficult to make probabilistic statements about the population
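
A minimal sketch of the quota procedure. It assumes respondents are taken in the order they happen to be encountered; the stream of passers-by, the quota values and the function name are hypothetical. Nothing in the code gives anyone a known selection probability, which is why probabilistic statements about the population are hard to justify.

```python
import random

def quota_sample(stream, quotas, key):
    """Accept people in arrival order until every quota is filled.
    There is no frame and no selection probability: who is included
    depends only on who happens to turn up first."""
    remaining = dict(quotas)
    sample = []
    for person in stream:
        group = key(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
            if not any(remaining.values()):   # all quotas met: stop
                break
    return sample

# HYPOTHETICAL stream of passers-by as (sex, age) pairs, in arrival order.
rng = random.Random(3)
stream = [(rng.choice(["male", "female"]), rng.randint(18, 90)) for _ in range(1000)]
sample = quota_sample(stream, {"male": 100, "female": 100}, key=lambda p: p[0])
print(len(sample))
```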
