Sampling Design and Analysis MTH 494 Ossam Chohan Assistant Professor CIIT Abbottabad.


2 Important Statistical Concepts (continued)

3 The Variance In statistics, the variance of a random variable or distribution is the expected (mean) value of the square of the deviation of that variable from its expected value or mean. Thus the variance measures the amount of variation in the values of that variable, taking into account all possible values and their probabilities. If a random variable X has the expected (mean) value E[X] = μ, then the variance of X is given by: $\operatorname{Var}(X) = E\left[(X - \mu)^2\right]$

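4 The Variance The above definition of variance covers both discrete and continuous random variables. It can be expanded as follows: $\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right] = E\left[X^2\right] - 2\mu E[X] + \mu^2 = E\left[X^2\right] - \mu^2$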
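As a quick check that the definition and its expanded form agree, here is a minimal Python sketch for a fair six-sided die. The die is an illustrative distribution chosen for this example, not one from the slides; exact fractions are used so that both forms produce the identical value 35/12.

```python
from fractions import Fraction

# Illustrative distribution (an assumption, not from the slides): a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# Expected value: mu = E[X]
mu = sum(p * x for x, p in zip(values, probs))

# Definition: Var(X) = E[(X - mu)^2]
var_definition = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

# Expanded form: Var(X) = E[X^2] - mu^2
var_expanded = sum(p * x ** 2 for x, p in zip(values, probs)) - mu ** 2

print(mu, var_definition, var_expanded)  # 7/2 35/12 35/12
```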

5 The Variance: Properties Variance is non-negative because squares are positive or zero. The variance of a constant a is zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value. Variance is invariant with respect to changes in a location parameter: if a constant is added to all values of the variable, the variance is unchanged, $\operatorname{Var}(X + a) = \operatorname{Var}(X)$. If all values are scaled by a constant, the variance is scaled by the square of that constant, $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$.
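The properties above can be checked numerically. This sketch uses a small made-up data set (an assumption, not from the slides) and Python's `statistics.pvariance`, which divides by n, to show that shifting leaves the variance unchanged, scaling by c multiplies it by c², and a constant has zero variance.

```python
import statistics

# Illustrative data and constants (assumptions, not from the slides).
data = [2.0, 4.0, 4.0, 5.0, 7.0]
a = 10.0  # location shift
c = 3.0   # scale factor

var_x = statistics.pvariance(data)
var_shifted = statistics.pvariance([x + a for x in data])  # Var(X + a) = Var(X)
var_scaled = statistics.pvariance([c * x for x in data])   # Var(cX) = c^2 Var(X)
var_const = statistics.pvariance([a] * len(data))          # variance of a constant is 0

print(var_x, var_shifted, var_scaled, c ** 2 * var_x, var_const)
```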

6 The Sample Variance If we have a series of n measurements $x_1, x_2, \ldots, x_n$ of a random variable X, then the sample variance $s^2$ can be used to estimate the population variance of X. The sample variance is calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean.

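7 The Sample Variance The denominator, n − 1, is known as the degrees of freedom in calculating $s^2$: intuitively, once $\bar{x}$ is known, only n − 1 observation values are free to vary, because the last one is predetermined by $\bar{x}$. Dividing by n instead would underestimate the population variance on average; for example, with n = 1 the variance of a single observation computed that way is always zero, regardless of the true variance. This bias matters most when n is small, and dividing by n − 1 corrects it.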
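The bias described above can be seen in a small simulation. This is a sketch under stated assumptions: samples of size n = 3 drawn from a standard normal population (true variance 1) with a fixed seed. Averaged over many samples, dividing by n gives roughly (n − 1)/n ≈ 0.667 of the true variance, while dividing by n − 1 gives roughly 1.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

# Illustrative setup (an assumption): small samples from a standard normal
# population, whose true variance is 1.
n = 3
trials = 10000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))        # sum of squares / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

mean_biased = statistics.fmean(divide_by_n)
mean_unbiased = statistics.fmean(divide_by_n_minus_1)
print(round(mean_biased, 3), round(mean_unbiased, 3))
```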

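8 The Sample Variance For the hypothetical price data for the Dec CME Live Cattle futures contract, 67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10, the sample variance can be calculated as follows. The sample mean is $\bar{x} = 475.72/7 = 67.96$, the squared deviations from the mean sum to $\sum_{i=1}^{7}(x_i - \bar{x})^2 = 7.4426$, and so $s^2 = 7.4426/6 \approx 1.2404$.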
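The same calculation can be reproduced step by step in Python using the slide's hypothetical prices: compute the mean, sum the squared deviations, and divide by n − 1. `statistics.variance` from the standard library uses the same n − 1 denominator and can serve as a cross-check.

```python
import statistics

# Hypothetical Dec CME Live Cattle futures prices from the slide.
prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]

n = len(prices)
mean = sum(prices) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in prices)

# Sample variance: divide by the n - 1 degrees of freedom.
s2 = ss / (n - 1)

print(round(mean, 2), round(ss, 4), round(s2, 4))  # 67.96 7.4426 1.2404
print(statistics.variance(prices))                   # same value via the stdlib
```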

9 The Standard Deviation In statistics, the standard deviation of a random variable or distribution is the square root of its variance. If a random variable X has the expected value (mean) E[X] = μ, then the standard deviation of X is given by: $\sigma = \sqrt{E\left[(X - \mu)^2\right]}$. That is, the standard deviation σ (sigma) is the square root of the average value of $(X - \mu)^2$.
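For a random variable, the formula can be applied directly to its probability distribution. The sketch below assumes an illustrative variable (not from the slides) that equals 1 with probability 0.3 and 0 with probability 0.7, computes E[(X − μ)²], and takes the square root.

```python
import math

# Illustrative random variable (an assumption): X = 1 with probability 0.3,
# X = 0 with probability 0.7.
dist = {0: 0.7, 1: 0.3}

mu = sum(x * p for x, p in dist.items())                     # E[X]
variance = sum(p * (x - mu) ** 2 for x, p in dist.items())   # E[(X - mu)^2]
sigma = math.sqrt(variance)                                  # sigma = sqrt(Var(X))

print(round(mu, 4), round(variance, 4), round(sigma, 4))  # 0.3 0.21 0.4583
```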

10 The Covariance The covariance between two real-valued random variables X and Y, with means (expected values) $E[X] = \mu_X$ and $E[Y] = \mu_Y$, is $\operatorname{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E[XY] - \mu_X \mu_Y$. Cov(X, Y) can be negative, zero, or positive. Random variables whose covariance is zero are called uncorrelated.
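For data rather than known distributions, covariance is usually estimated with the sample analogue $\frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})$. The sketch below implements that estimator (the function name `sample_covariance` and the data are hypothetical, chosen for illustration) and shows a positive and a negative case.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # tends to rise with x   -> positive covariance
y_down = [9, 7, 6, 4, 2]  # tends to fall with x   -> negative covariance

print(sample_covariance(x, y_up), sample_covariance(x, y_down))
```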

11 Covariance If X and Y are independent, then their covariance is zero. This follows because under independence $E[XY] = E[X]\,E[Y] = \mu_X \mu_Y$. Recalling the final form of the covariance derivation, $\operatorname{Cov}(X, Y) = E[XY] - \mu_X \mu_Y$, and substituting, we get $\operatorname{Cov}(X, Y) = \mu_X \mu_Y - \mu_X \mu_Y = 0$. The converse, however, is generally not true: some pairs of random variables have zero covariance although they are not independent.
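A standard example of zero covariance without independence: let X take the values −1, 0, 1 with equal probability and set Y = X². Y is completely determined by X, yet the covariance is exactly 0. The sketch below (an illustrative example, not from the slides) computes the covariance with exact fractions and checks one joint probability that independence would require.

```python
from fractions import Fraction

# Illustrative example (an assumption): X in {-1, 0, 1} with equal
# probability and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in support)          # E[X] = 0
e_y = sum(p * x ** 2 for x in support)     # E[Y] = 2/3
e_xy = sum(p * x * x ** 2 for x in support)  # E[XY] = E[X^3] = 0
cov = e_xy - e_x * e_y                     # Cov(X, Y) = E[XY] - E[X]E[Y]

# Independence would need P(X=0, Y=0) = P(X=0) * P(Y=0).
# Y = 0 exactly when X = 0, so both P(X=0) and P(Y=0) equal 1/3.
p_joint = p          # P(X=0, Y=0)
p_product = p * p    # P(X=0) * P(Y=0)
print(cov, p_joint, p_product)  # 0 1/3 1/9
```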

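12 The Covariance: Properties If X and Y are real-valued random variables and a and b are constants ("constant" in this context means non-random), then the following facts are a consequence of the definition of covariance: $\operatorname{Cov}(X, a) = 0$; $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$; $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$; $\operatorname{Cov}(aX, bY) = ab\,\operatorname{Cov}(X, Y)$; $\operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y)$.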
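These properties also hold for the sample covariance, so they can be checked on data. This sketch uses hypothetical data and constants (a = 3, b = −2) and a small helper `cov` (an illustrative name, not a library function) with an n − 1 denominator.

```python
import statistics

def cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data and constants (assumptions, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 3.0, -2.0

base = cov(x, y)
cov_with_constant = cov(x, [a] * len(x))                   # Cov(X, a) = 0
cov_with_itself = cov(x, x)                                # Cov(X, X) = Var(X)
cov_swapped = cov(y, x)                                    # Cov(Y, X) = Cov(X, Y)
cov_scaled = cov([a * v for v in x], [b * v for v in y])   # Cov(aX, bY) = ab Cov(X, Y)
cov_shifted = cov([v + a for v in x], [v + b for v in y])  # Cov(X + a, Y + b) = Cov(X, Y)

print(base, cov_with_constant, cov_with_itself, statistics.variance(x))
print(cov_swapped, cov_scaled, a * b * base, cov_shifted)
```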

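13 Correlation Coefficient – A disadvantage of the covariance statistic is that its magnitude cannot be easily interpreted, since it depends on the units in which we measure X and Y. The related and more widely used correlation coefficient remedies this disadvantage by standardizing the deviations from the mean: $r_{x,y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$, which for sample data is computed as $r_{x,y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$. The correlation coefficient is symmetric, that is, $r_{x,y} = r_{y,x}$.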
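The unit problem and its remedy can be seen directly. In this sketch the heights and weights are hypothetical; converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation coefficient unchanged, and swapping the arguments gives the same r. The helper names are illustrative, not from a library.

```python
import math

def centered(values):
    """Deviations of each value from the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    return sum(dx * dy for dx, dy in zip(centered(xs), centered(ys))) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance standardized by both standard deviations (the n - 1 factors cancel)."""
    dx, dy = centered(xs), centered(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Illustrative data (an assumption): heights in metres and weights in kg.
height_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weight_kg = [55.0, 61.0, 63.0, 70.0, 72.0]
height_cm = [100 * h for h in height_m]  # same heights, different units

cov_m, cov_cm = sample_cov(height_m, weight_kg), sample_cov(height_cm, weight_kg)
r_m, r_cm = pearson_r(height_m, weight_kg), pearson_r(height_cm, weight_kg)
r_swapped = pearson_r(weight_kg, height_m)
print(cov_m, cov_cm)                               # covariance changes with units
print(round(r_m, 4), round(r_cm, 4), round(r_swapped, 4))  # r does not
```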

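14 Correlation Coefficient The value of the correlation coefficient always falls between −1 and 1: $r_{x,y} = 0$ means X and Y are uncorrelated; $r_{x,y} = 1$ means X and Y are perfectly positively correlated (the points lie exactly on an increasing straight line); $r_{x,y} = -1$ means X and Y are perfectly negatively correlated (the points lie exactly on a decreasing straight line).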
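The extreme values can be illustrated with hypothetical data: an exact increasing line gives r = 1, an exact decreasing line gives r = −1, and a symmetric U shape gives r = 0 even though y is completely determined by x. The last case shows that r measures linear association only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of products of deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2 * v + 1 for v in x])    # exact increasing line
r_neg = pearson_r(x, [-3 * v + 10 for v in x])  # exact decreasing line
r_zero = pearson_r(x, [(v - 3) ** 2 for v in x])  # U shape: 4, 1, 0, 1, 4

print(round(r_pos, 10), round(r_neg, 10), round(r_zero, 10))  # 1.0 -1.0 0.0
```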

15 Inferential Statistics Sampling, Probability, Hypothesis Testing and Estimation

16 Review of Sampling Population – the group of people, communities, or organizations studied; it includes all possible objects of study. Sampling frame – the list of people, organizations, etc. in the population who can be chosen for participation in the study. Most sampling frames do not include everyone in the population (for example, a phone book lists only people who have a phone). Sample – the part of the population that is actually studied, reduced to a manageable size. Ideally we want to draw a sample that is representative of the population in terms of certain key characteristics (for example, gender and age).

17 Important information about samples For qualitative research, we are looking at specific situations, so it may not be important to have a representative sample. We often use nonprobability sampling with qualitative methods (snowball, purposive, or convenience samples). For most types of quantitative research we do want a sample that is representative of the population, because we want to generalize our findings from the sample to the population.

18 To generalize means that we would expect to obtain the same findings from studying everyone in the population as we obtained from the sample (within a certain degree of probability).

19 In studies in which we will generalize from the sample to the population: We must have a sample that is similar to the population on specific dimensions. We will want to use inferential statistics to analyze our data so that we can infer that the findings from the sample are the same as those we would get from the population. In theory, many inferential statistics assume a normal distribution. We will use sampling methods in which every respondent has a known probability of selection (probability sampling). The best type of sampling method to use with inferential statistics is one in which each participant has an equal probability of selection (random sampling).

20 Exceptions to this Rule The population under study is small enough that everyone can be selected for participation (this still allows you to use inferential statistics). Certain types of applied research using quantitative methods, such as community needs assessments and some types of surveys, in which it is simply important to have as many people respond as possible; in this second case, however, we will not be able to generalize our findings to the population.

21 We can choose random samples by assigning a code number to each respondent and then: pulling numbers out of a hat; using a table of random numbers from a statistics book; or generating random numbers on a computer.
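The computer-based option can be as simple as the following sketch, which is the software equivalent of pulling numbers out of a hat: draw distinct code numbers without replacement. The frame size of 500 and sample size of 20 are hypothetical; seeding the generator makes the draw reproducible so it can be documented and audited.

```python
import random

# Hypothetical sampling frame: respondents identified by code numbers 1..500.
frame_size = 500
sample_size = 20
codes = list(range(1, frame_size + 1))

rng = random.Random(2024)  # seeded so the same draw can be reproduced
selected = sorted(rng.sample(codes, sample_size))  # draw without replacement
print(selected)
```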

22 Important Definitions Probability – the mathematical likelihood that a certain event will occur. Probabilities range from 0 to 1.00. Parameters describe the characteristics of a population (variables such as age, gender, income, etc.). Statistics describe the characteristics of a sample on the same types of variables.

23 We apply some of the ideas of the central limit theorem to determining the probability that an event in research will occur. The normal curve can be viewed as a theoretical frequency distribution as well as a probability distribution for normally distributed ratio and interval data. The total area under the curve is equal to 100%, or a total probability of 1.00. Probability looks at how many chances out of 100 something will occur. Odds compare the chances against an event occurring with the chances for it.
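Areas under the normal curve are probabilities, and Python's `statistics.NormalDist` can compute them from the cumulative distribution function. This sketch uses the standard normal curve (mean 0, standard deviation 1) to show that the total area is 1, about 68% lies within one standard deviation of the mean, and about 95% lies within ±1.96.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal curve

total_area = z.cdf(10) - z.cdf(-10)       # essentially the whole curve
within_1sd = z.cdf(1) - z.cdf(-1)         # within one standard deviation
within_196 = z.cdf(1.96) - z.cdf(-1.96)   # within +/- 1.96 standard deviations

print(round(total_area, 6), round(within_1sd, 4), round(within_196, 4))  # 1.0 0.6827 0.95
```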

24 Concepts related to Sampling Error Sampling Error: the degree to which a sample differs from the population on a key variable. Confidence Level: the number of times out of 100 that the confidence interval would contain the true value if the sampling were repeated. Confidence Interval: a range calculated from the sample that is expected to contain the true value; its width depends on the sample size, the variability of the data, and the confidence level (and, when the sample is a large share of the population, on the relative sizes of the sample and the population). Why is Confidence Level Important? Confidence levels indicate how much error we are willing to accept (a 95% level accepts a 5% chance of error) and are based on the concept of the normal curve and probabilities. Generally, we set the confidence level at 90%, 95% or 99%. At a 95% confidence level, if we drew 100 samples and computed an interval from each, about 95 of those intervals would contain the true value.
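The "95 times out of 100" interpretation can be checked by simulation. This is a sketch under simplifying assumptions: a normal population with known mean 50 and known standard deviation 10, samples of size 25, and the known-σ interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. In practice σ is usually estimated from the sample and a t-based interval is used instead.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Illustrative population (an assumption): normal with known mean and SD.
true_mean, sigma = 50.0, 10.0
n = 25        # sample size
reps = 2000   # number of repeated samples
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

covered = 0
for _ in range(reps):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(n)  # known-sigma interval half-width
    if x_bar - half_width <= true_mean <= x_bar + half_width:
        covered += 1

coverage = covered / reps
print(coverage)  # should be close to 0.95
```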

25 The term used to describe the difference between sample statistics and population parameters is sampling error.
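A minimal sketch of sampling error, assuming a hypothetical population of 1,000 randomly generated incomes (illustrative values only): the population mean is the parameter, the mean of one random sample of 50 is the statistic, and their difference is the sampling error. Drawing a different sample would give a different error.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical population of 1,000 incomes (illustrative values only).
population = [rng.randint(20_000, 80_000) for _ in range(1000)]
parameter = statistics.fmean(population)  # population mean: a parameter

sample = rng.sample(population, 50)
statistic = statistics.fmean(sample)      # sample mean: a statistic

sampling_error = statistic - parameter
print(round(parameter, 1), round(statistic, 1), round(sampling_error, 1))
```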

26 Sampling How the sample was taken is more important than sample size. Example: – Imagine a survey of 10,000 people about their attitudes toward the Skytrain and how often they ride it, taken from people as they entered or exited the Skytrain at different stations. – Imagine the same survey taken of 10,000 people living in Bang Na. – Imagine the same survey taken of 2,000 people from various randomly chosen locations throughout Bangkok. From these examples it is clear that how the data are collected will have a great impact on the findings. Which survey results would you trust to represent people living in Bangkok?

27 Various sampling designs - Simple Random Sampling (SRS) – A simple random sample is a sample in which all units in the sampling frame have an equal probability of selection. – Many statistical tests rely on certain assumptions, and these assumptions are often met when a simple random sample is taken. – If the researcher wanted to collect a simple random sample of people in Bangkok, the researcher would need a list of all people in Bangkok. Where would this list come from? A telephone list covers only the people in Bangkok who have a telephone.
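The equal-probability property can be checked empirically. This sketch assumes a hypothetical frame of N = 10 units and repeatedly draws simple random samples of size n = 3 with `random.sample`; each unit should appear in close to n/N = 30% of the samples.

```python
import random
from collections import Counter

rng = random.Random(11)  # fixed seed for reproducibility

# Hypothetical sampling frame of N = 10 units; SRS of size n = 3.
frame = list("ABCDEFGHIJ")
n = 3
reps = 30000

counts = Counter()
for _ in range(reps):
    counts.update(rng.sample(frame, n))  # one SRS, drawn without replacement

# Under SRS every unit's inclusion probability is n / N = 0.3.
inclusion = {unit: counts[unit] / reps for unit in frame}
print(inclusion)
```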

28 Various sampling designs - Stratified Sampling – The population is separated into groups, or strata, and an SRS is taken from within each stratum. – Again, where would the list needed to perform an SRS within each stratum come from?
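Once a frame exists for each stratum, the procedure is an SRS per stratum. This sketch uses three hypothetical districts and assumes proportional allocation (each stratum receives a share of the sample equal to its share of the population); other allocation rules exist, and with less convenient sizes the rounded stratum sizes may not add up exactly to the target total.

```python
import random

rng = random.Random(5)  # fixed seed for reproducibility

# Hypothetical strata; each list is that stratum's sampling frame.
strata = {
    "district_1": [f"d1_{i}" for i in range(600)],
    "district_2": [f"d2_{i}" for i in range(300)],
    "district_3": [f"d3_{i}" for i in range(100)],
}
total_sample = 50
population_size = sum(len(units) for units in strata.values())

# Proportional allocation (an assumption): stratum share of sample = share of population.
sample = {}
for name, units in strata.items():
    n_h = round(total_sample * len(units) / population_size)
    sample[name] = rng.sample(units, n_h)  # SRS within the stratum

print({name: len(chosen) for name, chosen in sample.items()})
# {'district_1': 30, 'district_2': 15, 'district_3': 5}
```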

29 Various sampling designs - Convenience Sampling – A sample collected from whoever is convenient to reach, for example, collecting surveys at a shopping mall, which yields a lot of data at a low price. – Note: inferential statistical tests are inappropriate when performed on a convenience sample, because each unit's probability of selection is unknown.

30 The role of sampling in quantitative research Statistics is at the heart of quantitative research, and sampling is a very important part of statistics. There is an old saying: “Garbage in, garbage out (G.I.G.O.).” To understand G.I.G.O. in reference to statistics and sampling, think of how a “garbage” sample would yield “garbage” statistical results.

31 The role of sampling in quantitative research For many research projects, collecting data takes a large portion of the overall time of the project. – After the data are collected and entered into a statistical software package, such as SPSS or Minitab, the statistics can be calculated within minutes. – A very important fact, though, is that getting an answer and getting the right answer are not the same thing; this is most evident when thinking of exams. Think about G.I.G.O. before deciding how to collect the data.

