Normal Distribution And Sampling Dr. Burton Graduate school approach to problem solving.


2 Normal Distribution And Sampling Dr. Burton

3 Graduate school approach to problem solving.

4-10 Progression of a histogram into a continuous distribution [animation across seven slides: a histogram is progressively smoothed into the standard normal curve; horizontal axis z from -4 to +4, vertical axis density from 0.0 to 0.4]

11 Area under the curve [figure: standard normal curve, z from -4 to +4; 50% of the area lies below the mean and 50% above]

12 Areas under the curve relating to z scores [figure: 34.1% of the area lies between z = 0 and z = -1, and 34.1% between z = 0 and z = +1]

13 Areas under the curve relating to z scores [figure: 68.2% of the area lies between z = -1 and z = +1; 13.6% lies in each of the intervals z = -2 to -1 and z = +1 to +2]
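The 68.2% figure on this slide can be checked with Python's standard library; `statistics.NormalDist` models the standard normal curve used throughout these slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between z = -1 and z = +1 (the slide's 68.2%, truncated from 68.27%)
within_one = z.cdf(1) - z.cdf(-1)
print(round(within_one * 100, 1))  # 68.3
```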

14 Measures of Central Tendency
Mean = Σxᵢ/n = the sum of the observed values divided by the number of observations
Median = the middle value of all observations collected = the 50th percentile
Mode = the most frequently occurring value among the observations
In a normal (Gaussian) distribution all three measures are the same.
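A quick sketch of the three measures with Python's statistics module; the data values are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical observations for illustration
data = [2, 3, 3, 4, 5, 5, 5, 9]

print(mean(data))    # 4.5 (sum 36 divided by 8 observations)
print(median(data))  # 4.5 (midpoint of the sorted values)
print(mode(data))    # 5   (most frequent value)
```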

15 Central limit theorem
In reasonably large samples (25 or more), the distribution of the means of many samples is normal even though the data in the individual samples may show skewness, kurtosis, or unevenness. Therefore a t-test may be computed on almost any set of continuous data, provided the observations can be considered random and the sample size is reasonably large.
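The theorem can be illustrated by simulation. The sketch below uses a hypothetical, strongly skewed exponential population and the slide's sample size of 25; the means of many such samples still cluster symmetrically around the population mean.

```python
import random
from statistics import mean

random.seed(1)

# A strongly right-skewed population (exponential, population mean = 1)
population = [random.expovariate(1.0) for _ in range(10_000)]

# Draw many samples of size 25 and record each sample's mean
sample_means = [mean(random.sample(population, 25)) for _ in range(1_000)]

# The sample means are approximately normal around the population mean
print(round(mean(sample_means), 2))  # close to 1.0
```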

16 Areas under the curve relating to z scores [figure: 68.2% between z = -1 and +1, plus 13.6% in each adjacent interval, gives 95.4% between z = -2 and +2]

17 Areas under the curve relating to z scores [figure: 95.4% between z = -2 and +2; 2.1% lies in each of the intervals z = -3 to -2 and z = +2 to +3]

18 Areas under the curve relating to z scores [figure: 99.6% of the area lies between z = -3 and +3]

19 Areas under the curve relating to +z scores (one-tailed tests) [figure: acceptance area 84.1% below z = +1; critical area 15.9% above]

20 Areas under the curve relating to +z scores (one-tailed tests) [figure: acceptance area 97.7% below z = +2; critical area 2.3% above]

21 Areas under the curve relating to +z scores (one-tailed tests) [figure: acceptance area 99.8% below z = +3; critical area 0.2% above]

22 Asymmetric Distributions [figure: a positively skewed curve has its long tail to the right; a negatively skewed curve has its long tail to the left]

23 Distributions (Kurtosis) [figure: a flat curve indicates a higher level of deviation from the mean; a high, peaked curve indicates smaller deviation from the mean]

24 Distributions (Bimodal Curve) [figure: a distribution with two peaks]

25 Theoretical normal distribution with standard deviations
The horizontal axis runs from -3σ to +3σ (z scores -3 to +3).
Probability [% of area in the tail(s)]:
z:           1       2       3
Upper tail:  .1587   .0228   .0013
Two-tailed:  .3173   .0455   .0027

26 What is the z score for 0.05 probability (one-tailed test)? 1.645
What is the z score for 0.05 probability (two-tailed test)? 1.96
What is the z score for 0.01 probability (one-tailed test)? 2.326
What is the z score for 0.01 probability (two-tailed test)? 2.576
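These critical values can be reproduced with the standard library's inverse normal CDF:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

print(round(z.inv_cdf(0.95), 3))   # 1.645 (0.05, one-tailed)
print(round(z.inv_cdf(0.975), 2))  # 1.96  (0.05, two-tailed)
print(round(z.inv_cdf(0.99), 3))   # 2.326 (0.01, one-tailed)
print(round(z.inv_cdf(0.995), 3))  # 2.576 (0.01, two-tailed)
```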

27 The Relationship Between Z and X
z = (X - μ) / σ, where μ is the population mean and σ is the standard deviation.
With μ = 100, σ = 15, and X = 130: z = (130 - 100) / 15 = 2
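The slide's calculation in code:

```python
mu, sigma = 100, 15  # population mean and standard deviation
x = 130              # observed value

z = (x - mu) / sigma
print(z)  # 2.0 (130 is two standard deviations above the mean)
```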


29 MEASURES AND VARIATION
C. Standard Deviation
1. by far the most widely used measure of variation
2. the square root of the variance of the observations
3. computed by: squaring each deviation from the mean, adding them up, and dividing their sum by one less than the sample size

30 MEASURES AND VARIATION
s = √[ Σ(xᵢ - x̄)² / (n - 1) ]

31
xᵢ      (xᵢ - x̄)   (xᵢ - x̄)²
1         -5          25
2         -4          16
4         -2           4
7         +1           1
10        +4          16
12        +6          36
Σxᵢ = 36    0    Σ(xᵢ - x̄)² = 98
n = 6, Mean = 6
S² = Σ(xᵢ - x̄)² / (n - 1) = 98 / 5 = 19.6
Variance = 19.6, Standard Deviation = 4.43
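Python's statistics module reproduces this worked example; `variance` and `stdev` use the n - 1 divisor, matching the slide.

```python
from statistics import mean, variance, stdev

x = [1, 2, 4, 7, 10, 12]

print(mean(x))             # 6
print(variance(x))         # 19.6 (sample variance, divisor n - 1 = 5)
print(round(stdev(x), 2))  # 4.43 (square root of the variance)
```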

32 Variance and Standard Deviation
Variance = (Standard Deviation)²
Standard Deviation = √Variance

33 Standard deviation: s = √[ Σ(x - x̄)² / (n - 1) ]
Student's t distribution: t = (x̄ - μ) / (s / √n)
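The t statistic can be computed directly from a sample; the data below are hypothetical, chosen only to illustrate the formula.

```python
from math import sqrt
from statistics import mean, stdev

mu = 100                           # hypothesized population mean
sample = [102, 98, 110, 95, 105]   # hypothetical sample for illustration

# t = (sample mean - mu) / (standard error of the mean)
t = (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))
print(round(t, 2))  # 0.76
```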

34 Confidence Intervals The sample mean is a point estimate of the population mean. With the additional information provided by the standard error of the mean, we can estimate the limits (interval) within which the true population mean probably lies. Source: Osborn

35 Confidence Intervals
This is called the confidence interval, which gives a range of values that might reasonably contain the true population mean. The confidence interval is represented as a ≤ μ ≤ b, with a certain degree of confidence, usually 95% or 99%. Source: Osborn

36 Confidence Intervals Before calculating the range of the interval, one must specify the desired probability that the interval will include the unknown population parameter - usually 95% or 99%. After determining the values for a and b, probability becomes confidence. The process has generated an interval that either does or does not contain the unknown population parameter; this is a confidence interval. Source: Osborn

37 Confidence Intervals
To calculate the confidence interval (CI): CI = X̄ ± z(s/√n) Source: Osborn

38 Confidence Intervals
In the formula, z is equal to 1.96 or 2.58 (from the standard normal distribution) depending on the level of confidence required:
CI95, z = 1.96
CI99, z = 2.58
Source: Osborn

39 Population Data:
68 72 76 85 87 90 93 94 95 97 98 103 105 107 114 117 118 119 123 124 127 151 159 217
Sample 1: 76 85 87 93 98 103 105 117 118 119 123 127 151 217
X̄ = 114.9, Standard deviation = 34.1, Standard error of the mean = 8.8

40 Confidence Intervals
Given a mean of 114.9 and a standard error of 8.8, the CI95 is calculated:
CI95 = X̄ ± 1.96(s/√n) = 114.9 ± 1.96(8.8) = 114.9 ± 17.248 = 97.7, 132.1
Based on this sample, we can be 95% confident that the population mean lies between 97.7 and 132.1. Source: Osborn
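The interval arithmetic can be checked in a few lines, using the mean and standard error from slide 39:

```python
xbar, se = 114.9, 8.8  # sample mean and standard error of the mean

margin = 1.96 * se     # 17.248
lo, hi = xbar - margin, xbar + margin
print(round(lo, 1), round(hi, 1))  # 97.7 132.1
```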

41 OUTLINE
2.1 Selecting Appropriate Samples: explains why the selection of an appropriate sample has an important bearing on the reliability of inferences about a population
2.2 Why Sample?: gives a number of reasons sampling is often preferable to census taking
2.3 How Samples Are Selected: explains how samples are selected
2.4 How to Select a Random Sample: illustrates with a specific example the method of selecting a random sample using a computer statistical package
2.5 Effectiveness of a Random Sample: demonstrates the credibility of the random sampling process
2.6 Missing and Incomplete Data: explains the problem of missing or incomplete data and offers suggestions on how to minimize this problem

42 LEARNING OBJECTIVES
1. Distinguish between
 a. populations and samples
 b. parameters and statistics
 c. various methods of sampling
2. Explain why the method of sampling is important
3. State why samples are used
4. Define random sample
5. Explain why it is important to use random sampling
6. Select a random sample using a computer statistical program
7. Suggest methods for dealing with missing data

43 SELECTING APPROPRIATE SAMPLES
A. Population: a set of persons (or objects) having a common observable characteristic
B. Sample: a subset of a population
C. The WAY a sample is selected is more important than the size of the sample
D. An appropriate sample should be representative of the population
E. A set of population observations may be summarized by a descriptive measure called a parameter

44 SELECTING APPROPRIATE SAMPLES
F. Random sample
1. Every subject has an equal opportunity of being selected
2. The technique most likely to yield a representative sample
3. Obstacles
 a. Response rate: how many will respond
 b. Sampling bias: some segment of the population may be over- or underrepresented
 c. May be too costly

45 WHY SAMPLE?
A. Random sampling: each subject in the population has an equal chance of being selected
1. Avoids known and unknown biases on average
2. Helps convince others that the trial was conducted properly
3. Basis for the statistical theory that underlies hypothesis tests and confidence intervals
B. Convenience samples
1. Selected at will or in a particular program
2. Seldom representative of the underlying population
3. Used when random samples are virtually impossible to select

46 WHY SAMPLE?
C. Systematic sampling
1. Used when a sampling frame (a complete, nonoverlapping list of the persons or objects constituting the population) is available
2. Randomly select a first case, then proceed by selecting every kth case
D. Stratified sampling: used when we wish the sample to represent the various strata (subgroups) of the population proportionately, or to increase the precision of the estimate
E. Cluster sampling
1. Select a simple random sample of clusters (e.g., a number of city blocks)
2. More economical than random selection of persons throughout the city
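Systematic sampling (item C above) can be sketched in a few lines; the numbered sampling frame below is hypothetical.

```python
import random

random.seed(7)

frame = list(range(1, 101))  # hypothetical sampling frame of 100 subjects
n = 10                       # desired sample size
k = len(frame) // n          # sampling interval: every kth case

start = random.randrange(k)  # random first case within the first interval
sample = frame[start::k]     # then every kth case after it
print(len(sample))  # 10
```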

47 HOW TO SELECT A RANDOM SAMPLE Random Numbers Table: http://en.wikipedia.org/wiki/Random_number_table Computer statistical package: SPSS or Excel
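In place of a random numbers table, any language with a random module can draw a simple random sample; a sketch with a hypothetical numbered population:

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

population = list(range(1, 101))        # 100 numbered subjects
sample = random.sample(population, 10)  # each subject equally likely
print(sorted(sample))
```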

48 EFFECTIVENESS OF A RANDOM SAMPLE
A. Reliability is usually demonstrated by
1. defining a fairly small population
2. selecting from it all conceivable samples of a particular size
3. computing the mean of each sample
4. observing the variation for the population
5. comparing these sample means (statistics) with the population mean (parameter), which neatly demonstrates the credibility of the sampling scheme
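The steps above can be demonstrated exactly for a tiny hypothetical population by enumerating all conceivable samples:

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # small hypothetical population, mean = 6

# All conceivable samples of size 2, and each sample's mean
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

# The average of all sample means equals the population mean exactly
print(mean(sample_means) == mean(population))  # True
```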

49 MISSING AND INCOMPLETE DATA
A. Bias may be introduced because of possible differences between respondents and nonrespondents
B. Limits the ability to accurately draw inferences about the population
C. Subjects may drop out of the study
D. Ways to deal with missing data
1. Last observation carried forward: take the last observed value prior to dropout and treat it as the final data point

50 Understanding and Reducing Errors
Goals of Data Collection and Analysis
- Promoting accuracy and precision
- Reducing differential and nondifferential errors
- Reducing intraobserver and interobserver variability
Accuracy and Usefulness
- False-positive and false-negative results
- Sensitivity and specificity
- Predictive values
- Likelihood ratios, odds ratios, and cutoff points
- Receiver operating characteristic (ROC) curves
Measuring Agreement
- Overall percentage agreement
- Kappa test ratio

51 Promoting Precision and Accuracy
Accuracy: the ability of a measurement to be correct on average.
Precision: the ability of a measurement to give the same or a very similar result with repetition of the test (reproducibility, reliability).

52 Accurate and precise [figure: sample means (X) clustered tightly around the true value μ]

53 Precise only [figure: sample means clustered tightly together but away from the true value μ]

54 Accurate only [figure: sample means centered on the true value μ but widely scattered]

55 Neither Accurate nor Precise [figure: sample means widely scattered and off-center from the true value μ]

56 Differential and nondifferential error
Bias is a differential error: a nonrandom, systematic, or consistent error in which the values tend to be inaccurate in a particular direction.
Nondifferential errors are random errors.

57 Bias
Three most problematic forms of bias in medicine:
1. Selection (Sampling) Bias: biases that distort results because of the selection process
 Admission rate (Berkson's) bias: distortions in risk ratios occur as a result of different hospital admission rates among cases with the risk factor, cases without the risk factor, and controls with the risk factor, causing greatly different risk-factor probabilities to interfere with the outcome of interest
 Nonresponse bias: e.g., noncompliance of people who have scheduled interviews in their homes
 Lead time bias: a time differential between diagnosis and treatment among sample subjects may result in erroneous attribution of higher survival rates to superior treatment rather than early detection

58 Bias
Three most problematic forms of bias in medicine:
1. Selection (Sampling) Bias: admission rate (Berkson's) bias, nonresponse bias, lead time bias
2. Information (Misclassification) Bias
 Recall bias: differentials in the memory capabilities of sample subjects
 Interview bias: blinding of interviewers to diseased and control subjects is often difficult
 Unacceptability bias: patients reply with "desirable" answers

59 Bias
Three most problematic forms of bias in medicine:
1. Selection (Sampling) Bias: admission rate (Berkson's) bias, nonresponse bias, lead time bias
2. Information (Misclassification) Bias: recall bias, interview bias, unacceptability bias
3. Confounding: a confounding variable has a relationship with both the dependent and independent variables that masks or potentiates the effect of the independent variable on the study outcome

60 Types of Variation
Discrete variables
- Nominal variables
- Dichotomous (Binary) variables
Ordinal (Ranked) variables
Continuous (Dimensional) variables
Ratio variables
Risks and proportions as variables

61 Types of Variation
Nominal variables

62 Nominal variables
Blood type: A, O, B, AB
Social Security Number: 123 45 6789, 312 65 8432, 555 44 7777

63 Types of Variation
Nominal variables
Dichotomous (Binary) variables

64 Dichotomous (Binary) variables
WNL / Not WNL; Accept / Reject; Normal / Abnormal

65 Types of Variation
Nominal variables
Dichotomous (Binary) variables
Ordinal (Ranked) variables

66 Ordinal (Ranked) variables Strongly agree, agree, neutral, disagree, strongly disagree 1 2 3 4 5 6 7 8 The difference in value between each rank is ignored.

67 Types of Variation
Nominal variables
Dichotomous (Binary) variables
Discrete variables
Ordinal (Ranked) variables
Continuous (Dimensional) variables

68 Continuous (Dimensional) variables
Height, Blood Pressure, Weight, Temperature (e.g., 32°F)

69 Types of Variation
Nominal variables
Dichotomous (Binary) variables
Discrete variables
Ordinal (Ranked) variables
Continuous (Dimensional) variables
Ratio variables

70 Ratio variables A continuous scale that has a true zero point

71 Types of Variation
Nominal variables
Dichotomous (Binary) variables
Discrete variables
Ordinal (Ranked) variables
Continuous (Dimensional) variables
Ratio variables
Risks and Proportions as variables

72 Risks and Proportions as variables Variables created by the ratio of discrete counts in the numerator to counts in the denominator.

73 Table Shell
Title: What are the data? Who? Where are the data? When?
Box Head: captions or column headings
Stub: row captions
Cell: the intersection of a column and a row
Note: explanation
Source: references

74 Charts
Bar: one or more variables
Grouped bar: from tables with two or three variables
Stacked bar: a total category with frequencies within
Pie: percentages
Histograms: continuous data
Frequency polygons: continuous data
Line graphs: time trends/survival curves
Scatter diagrams: two continuous variables

75 Bar Chart

76 Grouped Bar

77 Stacked Bar

78 Pie Chart

79 Histogram [figure: frequency histogram; x axis 5.0 to 9.0, y axis (frequency) 0 to 5]

80 Frequency Polygon [figure: frequency polygon over the same axes; x axis 5.0 to 9.0, y axis 0 to 5]

81 Line Graphs [figure: line graph of a time trend, 1997 to 2002, y axis 6000 to 11000]

82 Scatter Diagrams [figure: scatter plot of two continuous variables, weight (x, 100 to 210) against height (y, 60 to 72)]

