# SADC Course in Statistics Introduction to Statistical Inference (Session 03)


## Learning Objectives

By the end of this session, you will be able to:

- explain what is meant by statistical inference
- explain what is meant by an estimate of a population parameter
- explain what is meant by the sampling distribution of an estimate
- calculate and interpret the standard error of a sample mean from data of a simple random sample

## What is statistical inference?

Inference is about drawing conclusions about population characteristics using information gathered from a sample. It will be assumed for the remainder of this module that the sample is representative of the population. We shall further assume that the sample has been drawn as a simple random sample from an infinite population.

## Estimating population parameters

| | Population | Sample |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Std. deviation | $\sigma$ | $s$ |

Population characteristics (parameters) are unknown, so Greek letters are used to denote the population mean and standard deviation. Sample characteristics are measurable and known, so Latin letters are used. They form estimates of the population values.
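As a minimal sketch of how the sample quantities in the table are computed, the Python snippet below uses the standard library's `statistics` module. The data values are invented purely for illustration and are not from the course, and the use of the $n - 1$ divisor for the sample variance is an assumption, since the slides do not state which divisor is intended.

```python
import statistics

# Hypothetical sample of land holding sizes in acres (invented for illustration).
sample = [2.5, 7.0, 4.5, 12.0, 6.0, 9.5, 3.0, 8.5]

x_bar = statistics.mean(sample)    # sample mean, an estimate of the population mean mu
s2 = statistics.variance(sample)   # sample variance (divisor n - 1), an estimate of sigma^2
s = statistics.stdev(sample)       # sample standard deviation, an estimate of sigma

print(f"mean = {x_bar:.3f}, variance = {s2:.3f}, std. deviation = {s:.3f}")
```

The population values $\mu$, $\sigma^2$ and $\sigma$ never appear in the code: only their sample-based estimates can be calculated.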

## An example of statistical inference

What is the mean land holding size owned by rural households in Kilindi district in the Tanga region of Tanzania? Data from 404 households surveyed in this district gave a mean land holding size of 7.62 acres with a standard deviation of 6.81 acres. Our best estimate of the mean land holding size in Kilindi district is therefore 7.62 acres. What results would be likely if we sampled again with a different set of households?

## A brief return to Practical 2…

In Practical 2, you sampled 5 Uganda districts twice. Look back at the mean and standard deviation of each sample. You will notice the answers are different each time you sample, i.e. there is variability in the sample means. If we took many more samples, we could produce a histogram of the means of these samples. An example follows…

## The distribution of means

Suppose 10 university students were given a standard meal and the time taken to consume the meal was recorded for each. Suppose the 10 values gave a mean of 11.24 with a standard deviation of 0.864. Let's assume this exercise was repeated 50 times with different samples of students. A histogram of the resulting 500 observations is shown first, followed by a histogram of the 50 means, one from each sample.

## Histogram of raw data

The data appear to follow a normal distribution.

## Histogram of the 50 sample means

The distribution of the sample means is called its sampling distribution. Notice that the variability of this distribution is smaller than the variability of the raw data.
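The repeated-sampling exercise can be imitated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal, and the sample values 11.24 and 0.864 are borrowed as if they were the true population mean and standard deviation, which the slides do not claim. The names `MU`, `SIGMA`, `samples`, `raw` and `means` are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

# Assumed population values, borrowed from the single sample on the slide.
MU, SIGMA = 11.24, 0.864
N_SAMPLES, N = 50, 10  # 50 samples of 10 students each

# Draw 50 independent samples of size 10 from the assumed normal population.
samples = [[random.gauss(MU, SIGMA) for _ in range(N)] for _ in range(N_SAMPLES)]

raw = [x for sample in samples for x in sample]          # all 500 observations
means = [statistics.mean(sample) for sample in samples]  # the 50 sample means

sd_raw = statistics.stdev(raw)
sd_means = statistics.stdev(means)

print(f"std. deviation of the {len(raw)} raw observations: {sd_raw:.3f}")
print(f"std. deviation of the {len(means)} sample means:   {sd_means:.3f}")
```

Under these assumptions, the spread of the sample means should come out well below the spread of the raw data, and roughly near $0.864/\sqrt{10} \approx 0.27$, although the exact figures depend on the random draw.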

## Back to estimation…

The estimate of the mean land holding size in Kilindi district is 7.62 acres. Is this sufficient for reporting purposes, given that this answer is based on one particular sample? What we have is an estimate based on a sample of size 404. But how good is this estimate? We need a measure of the precision, i.e. variability, of this estimate…

## Sampling Variability

The accuracy of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on: (i) the sample size ($n$), since the more data we collect, the more we know about the population; and (ii) the inherent variability in the data, $\sigma^2$. These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. low standard error!

## Standard error of the mean

The precision of $\bar{x}$ as an estimate of $\mu$ is given by the standard error of the mean, $\sigma/\sqrt{n}$, also written as s.e.m., or sometimes s.e. It is estimated from sample data as $s/\sqrt{n}$. For the land holding size example, s.e. $= 6.81/\sqrt{404} = 6.81/20.1 = 0.339$.
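The land holding calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is a hypothetical helper introduced here for illustration, not part of the course material.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Estimate the standard error of the mean as s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return s / math.sqrt(n)


# Kilindi district example: s = 6.81 acres from n = 404 households.
se = standard_error(6.81, 404)
print(round(se, 3))  # 0.339
```

Because $n$ enters through its square root, halving the standard error requires roughly four times as many households.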

## Summary

If we had repeated samples (of the same size) taken from the same population:

- the sample means would vary
- the standard error of the mean is a measure of the variability of sample means over (hypothetically drawn) repeated samples
- the distribution of sample means over repeated samples is called the sampling distribution of the mean, $\bar{x} \sim N(\mu, \sigma^2/n)$

The lower the value of the standard error, the greater the precision of the estimate.
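A short derivation suggests where the variance $\sigma^2/n$ of the sample mean comes from. It assumes the observations, written here as $X_1, \dots, X_n$ (notation introduced for this sketch), are independent with common mean $\mu$ and variance $\sigma^2$, as under simple random sampling from an infinite population:

$$
\operatorname{Var}(\bar{x})
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n},
\qquad\text{so}\qquad
\text{s.e.}(\bar{x}) = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}.
$$

Replacing the unknown $\sigma$ by the sample standard deviation $s$ gives the estimate $s/\sqrt{n}$.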


Practical work follows to ensure learning objectives are achieved…