Published by Jayda Collyer. Modified over 2 years ago.

1
INSTRUCTIONS
CLICK LEFT SIDE OF MOUSE TO GO FORWARD (or PageDown key)
CLICK RIGHT SIDE OF MOUSE TO GO BACK (or PageUp key)
Press ESC key (top left of keyboard) to QUIT at any time
A CAL COMPANION TO EXCEL PRINT-OUTS
MP110 SKILLS FOR LIFE SCIENCES
Lecture 1 : Descriptive Statistics
These notes are the intellectual property of Dr M E Jakobson in support of his lectures and are solely for bona fide use for study by students registered on courses at UeL. NO other use is permitted without prior permission.

2
MP110 TWO REVISION LECTURES ON STATISTICS
Dr Mike Jakobson
Lecture 1 : INITIAL EXCEL DATA DESCRIPTION before COMPARING SAMPLES
Lecture 2 : EXCEL t TESTS INTERPRETATION when COMPARING SAMPLES

3
Look back at statistics covered so far on MP110
Look forward by using examples of data analysis done by Level 2 students in practicals in
PP249 Physiological Function & Dysfunction
PP250 Physiological Regulation

4
HANDOUT in YOUR MP110 WORKBOOK
Also files on INTRANET DEPARTMENT MENU
Directory : Skills for Life Sciences MPA100 Semester B data handling
MPA110_MJSTATS FILE ONE_QUESTIONS.doc
MPA110_MJSTATS FILE TWO_ANSWERS.doc
MPA110_MJSTATS LECTURE 1.pps
MPA110_MJSTATS LECTURE 2.pps

5
Also see : MP110 WebCT, 10 SELF-TEST MCQ on your EXCEL print-outs

6
Lecture 1 : INITIAL DATA DESCRIPTION before COMPARING SAMPLES OUTCOMES : you should be better able to distinguish between correct and incorrect statements about EXCEL Descriptive Statistics

7
mean, median, sum, count etc.
S.D., variance & S.E.M.
95% reference ranges & 95% C.I.
normal curve, kurtosis and skew
STANDARD DEVIATION
STANDARD ERROR OF THE MEAN
95% CONFIDENCE INTERVAL around the MEAN
SHAPE OF DATA DISTRIBUTION around the MEAN

8
TABLE 1 on the handout in your workbook
Breath-holding after a NORMAL inhalation of air (to be compared later in Table 3 with air inhaled from a Douglas Bag)
Questions Q1 to Q5 in this session on Descriptive Statistics

9
Q1 Which of the following statements about the Mean in Table 1 is APPROPRIATE ?
A The mean and standard error should be reported as follows : 31.2 +/- 9.8
B The mean is calculated by multiplying the Sum of data values by the number in the sample.
C Statisticians use the symbol $\mu$ as shorthand for the sample mean
D The mean in Table 1 is the true mean of the population
E As the mean is not equal to the mode and median, the distribution is slightly asymmetrical

10
A The mean and standard error should be reported as follows : 31.2 +/- 9.8
The value 9.8 is the standard deviation; the standard error is 2.1
27% of students failed to read the values correctly

11
A The mean and standard error should be reported as follows : 31.2 +/- 9.8
Note : 31.19048 rounds up to 31.2
Do NOT truncate it : 31.1 would be WRONG

12
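B The mean is calculated by multiplying the Sum of data values by the number in the sample
Correction : the Sum is divided by the number in the sample, not multiplied
$\bar{X} = \sum x / N$
$\bar{X}$ (“X-bar”) = sample mean
$\sum x$ = Sum of all x
N = No. in sample (No. = ‘numero’)
655 / 21 = 31.19047619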

13
B The mean is calculated by multiplying the Sum of data values by the number in the sample
Correction : dividing, not multiplying
Arithmetic mean : $\bar{X} = \sum x / N$
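As a quick check of the corrected statement, here is a minimal Python sketch that divides the Sum by the Count. The values 655 and 21 are the ones quoted on the slide; the variable names are only illustrative.

```python
# Values quoted on the slide for Table 1.
total = 655   # Sum of all x
n = 21        # Count (number in sample)

# The mean is the Sum DIVIDED by the number in the sample.
mean = total / n
print(mean)             # about 31.19047619
print(round(mean, 1))   # rounded to one decimal place, as the mean is reported

# Multiplying instead gives a value far outside the range of the data.
print(total * n)
```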

14
THE ARITHMETIC MEAN is sensitive to OUTLIERS (unusually large or small data values) The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.
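Both properties can be seen in a short Python sketch. The five values are hypothetical (they are not the Table 1 data) and were chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean, median

# Hypothetical values, not the Table 1 data.
data = [22, 25, 28, 28, 32]
with_outlier = [22, 25, 28, 28, 57]   # largest value replaced by an outlier

# Deviations from the mean always sum to zero.
m = mean(data)
sum_of_deviations = sum(x - m for x in data)
print(m, sum_of_deviations)

# The outlier pulls the mean upwards but leaves the median where it was.
print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```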

15
C Statisticians use the symbol $\mu$ as shorthand for the sample mean
“X-bar” = sample mean : $\bar{X} = \sum x / N$
“$\mu$” = true population mean = the parameter being estimated by the sample mean (“true population”)

16
C Statisticians use the symbol $\mu$ as shorthand for the sample mean
“X-bar” = sample mean : $\bar{X} = \sum x / N$
“$\mu$” = true population mean
$\mu = \bar{X}$ +/- an error (“an uncertainty”)

17
D The Mean in Table 1 is the true mean of the population
Correction : it is the mean of the sample, not of the population
Sample mean is a “statistic” estimating a fixed parameter (the population mean, $\mu$)
Parametric Statistics

18
E As the mean is not equal to the mode and median, the distribution is slightly asymmetrical
“Mean” = sample mean : $\bar{X} = \sum x / N$

19
E As the mean is not equal to the mode and median, the distribution is slightly asymmetrical
Median = middle value (middle quartile) (50% of sample have higher values)
Lower quartile : 75% of sample have higher values
Upper quartile : 25% of sample have higher values

20
MEDIAN of 28
N = 21 : middle = 11th data value (ten above, ten below)
Median and quartiles are not affected by OUTLIERS
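The "11th data value" rule can be sketched in Python as follows. The 21 values are hypothetical stand-ins (the Table 1 data are not reproduced here); only the sample size N = 21 comes from the slide.

```python
from statistics import median

# Hypothetical sample of N = 21 values, not the Table 1 data.
sample = list(range(10, 31))   # 10, 11, ..., 30
n = len(sample)

ordered = sorted(sample)
middle_position = (n + 1) // 2                  # 11th value when N = 21
middle_value = ordered[middle_position - 1]     # Python lists are 0-indexed

# Count how many values lie below and above the middle value.
below = sum(1 for x in ordered if x < middle_value)
above = sum(1 for x in ordered if x > middle_value)
print(middle_position, middle_value, below, above)

# Replacing the largest value with an extreme outlier leaves the median unchanged.
ordered[-1] = 1000
print(median(ordered) == middle_value)
```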

21
3 Measures of CENTRAL TENDENCY
MODE also at 28
MEDIAN of 28
MEAN = 31.2
Sample mean pulled over by outlier
‘normal’ curve : bell-shaped, symmetrical

22
Q2 The Standard Deviation is a measure of the
A sum of the deviations of outlier values from the mean
B reliability of the sample mean as an estimate of the true mean
C difference between the highest data value and the lowest data value
D probability of the sample mean being different from the Null Hypothesis
E the variability of the sample data around the estimated mean

23
Q2 The Standard Deviation is a measure of the
A sum of the deviations of outlier values from the mean
Correction : the S.D. is the square root of the sum of the deviations of ALL values from the mean, all squared, divided by (N-1)
S.D. = $\sqrt{\sum (x - \bar{X})^2 / (N-1)}$

24
S.D. = $\sqrt{\sum (x - \bar{X})^2 / (N-1)}$
NOTE : the mean squared deviation is obtained by dividing by (N-1)
The S.D. ESTIMATES the ‘true’ standard deviation $\sigma = \sqrt{\sum (x - \mu)^2 / N}$
“Sigma” is another PARAMETER of the population being sampled

25
Q2 The Standard Deviation is a measure of the
B reliability of the sample mean as an estimate of the true mean
This is a description of what is given by the STANDARD ERROR of the MEAN

26
Q2 The Standard Deviation is a measure of the
C difference between the highest data value and the lowest data value
The difference between the highest and the lowest data values is the RANGE

27
Q2 The Standard Deviation is a measure of the
D probability of the sample mean being different from the Null Hypothesis
No NULL HYPOTHESIS has been set up : this usually requires TWO sets of sample readings.
Sample Descriptive statistics ‘describe what happened’
They do not ‘test hypotheses about what might be true’

28
Q2 The Standard Deviation is a measure of the
E the variability of the sample data around the estimated mean
It takes all N data values into account
S.D. = $\sqrt{\sum (x - \bar{X})^2 / (N-1)}$
VARIANCE = (S.D.)$^2$ = $\sum (x - \bar{X})^2 / (N-1)$
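The S.D. and variance formulas can be worked through step by step in Python. This is only a sketch: the eight data values are hypothetical (not the Table 1 data), and the result is compared with `statistics.stdev`, which also divides by (N-1).

```python
import math
import statistics

# Hypothetical values, not the Table 1 data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

x_bar = sum(data) / n                               # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)    # sum of squared deviations of ALL values
variance = sum_sq_dev / (n - 1)                     # divide by (N-1), not N
sd = math.sqrt(variance)                            # S.D. is the square root of the variance

print(x_bar, sum_sq_dev)
print(variance, sd)

# The standard library's sample S.D. uses the same (N-1) divisor.
print(math.isclose(sd, statistics.stdev(data)))
```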

29
THE STANDARD DEVIATION
What does it do ?
It tells you how variable your data items are within your sample
The smaller the S.D., the more bunched are your data points around the sample mean $\bar{X}$

30
THE STANDARD DEVIATION
What does it do ?
It tells you how variable your data items are within your sample
The larger the S.D., the more spread out are your data points around the sample mean $\bar{X}$

31
Q3 The number of students used in the sample in Table 1 was
A 28 (MODE & MEDIAN)
B 14 (MINIMUM)
C 43 (RANGE)
D 21 (COUNT = N, number in sample)
E 655 (SUM = $\sum x$)

32
Q4 Only ONE of the following general statements about statistics of the type presented in Table 1 is TRUE. Which is it ?
A The larger the sample size, the smaller will be the ‘Standard Error’
B By definition, the mean lies at the mid point of the range
C The 95% Reference Range can be calculated by dividing the ‘Standard Deviation’ by approximately 2 (1.96 if sample size is >30)
D As the Median and Mode are the same, the distribution must be a statistically ‘Normal’ bell-shaped symmetrical distribution
E The larger the ‘Skewness’ value, the more symmetrical the distribution

33
A The larger the sample size, the smaller will be the ‘Standard Error’
Standard Error of the Mean : S.E.M. = S.D. / $\sqrt{N}$
The larger N is, the smaller will be the result of this calculation

34
A The larger the sample size, the smaller will be the ‘Standard Error’
Standard Error of the Mean : S.E.M. = S.D. / $\sqrt{N}$
The more observations you take, the more likely that your sample mean is close to the population ‘true’ mean
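A minimal Python sketch of the S.E.M. calculation, using the Standard Deviation (9.841845) and Count (21) reported for Table 1. The larger sample sizes in the loop are hypothetical and assume, purely for illustration, that the S.D. stays the same.

```python
import math

sd = 9.841845   # Standard Deviation reported for Table 1
n = 21          # Count (number in sample)

# S.E.M. = S.D. divided by the square root of N
sem = sd / math.sqrt(n)
print(round(sem, 2))

# Hypothetical larger samples with the same S.D.: quadrupling N halves the S.E.M.
for size in (21, 84, 336):
    print(size, round(sd / math.sqrt(size), 2))
```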

35
B By definition, the mean lies at the mid point of the range
“X-bar” = sample mean : $\bar{X} = \sum x / N$
RANGE = MAXIMUM VALUE - MINIMUM VALUE
MEAN is NOT estimated by the two most unreliable values, which may be outliers
Mean is the midpoint of the distribution if and only if the distribution is symmetrical (e.g. Normal bell-shaped), where mean = mode = median

36
C The 95% Reference Range can be calculated by dividing the ‘Standard Deviation’ by approximately 2 (1.96 if sample size is >30)
Correction : multiplying, not dividing
95% Reference Range = mean +/- ($t_{(N-1)}$ * S.D.)

37
DESCRIPTIVE STATISTICS IN EXCEL
The spread of the data : THE STANDARD DEVIATION
Mean : 31.19048
Standard Deviation : 9.841845
Sample Variance = (S.D.)$^2$ : 96.8619
If the data are a RANDOM sample from a population which fits the ‘expected’ ‘Normal’ distribution, then the 95% Reference Range can be calculated

38
[Figure : REFERENCE RANGES, the spread of data in an ideal normal distribution. Expected distribution of data points in a ‘normal’ curve, with the mean marked. x-axis : NORM breath-holding (s), 0 to 60. y-axis : Frequency, probability function f(x).]

39
[Figure : 68% REFERENCE RANGE, the spread of data in an ideal normal distribution. Mean - 1 S.D. to mean + 1 S.D. (31.19 +/- 9.84) contains 68% of all data points. x-axis : NORM breath-holding (s), 0 to 60. y-axis : Frequency, probability function f(x).]

40
[Figure : 95% REFERENCE RANGE, the spread of data in an ideal normal distribution. Mean - 2 S.D. to mean + 2 S.D. (31.19 +/- 19.68) contains 95% of all data points. x-axis : NORM breath-holding (s), 0 to 60. y-axis : Frequency, probability function f(x).]

41
REFERENCE RANGE : the spread of individual data items
If the distribution is ‘NORMAL’ bell-shaped, then 95% of sample data lie between MEAN +/- about (2 x S.D.)
More about this rough factor ‘2’
It can be calculated exactly. For a normal distribution (Gaussian curve), the theoretical value is known to be 1.96
MEAN +/- 1.96 x S.D. includes 95% of the area under the curve (MEAN +/- 1 S.D. includes 68% of the area)
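The factor 1.96 and the 68% and 95% areas can be checked against an ideal normal curve with Python's standard library. This is a sketch using `statistics.NormalDist` (available from Python 3.8); it describes the theoretical curve only, not the Table 1 sample.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Multiplier that leaves 2.5% in each tail, i.e. 95% in the middle.
z_95 = standard_normal.inv_cdf(0.975)
print(round(z_95, 2))

def area_within(k):
    """Area under the normal curve between mean - k S.D. and mean + k S.D."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(area_within(1), 3))      # about 68%
print(round(area_within(1.96), 3))   # about 95%
print(round(area_within(2), 3))      # the rough factor '2' covers slightly more than 95%
```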

42
95% REFERENCE RANGE of data in samples
FOR THE NORM DATA (s) : N = 21, Mean = 31.2, S.D. = 9.8
The exact calculation for the Reference Range is mean +/- ($t_{(N-1)}$ * S.D.)
The t value depends on sample size (N). For 20 d.f., t = 2.09
so Lower Limit = 31.2 - (2.09 * 9.8) = 10.7 s
Upper Limit = 31.2 + (2.09 * 9.8) = 51.7 s
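The reference-range arithmetic can be reproduced in a few lines of Python. The mean, S.D. and t value are the rounded figures quoted on the slide; the lower limit is the mean minus t x S.D. and the upper limit is the mean plus t x S.D.

```python
# Rounded values quoted on the slide for the NORM data.
mean = 31.2
sd = 9.8
t_20df = 2.09   # t for N - 1 = 20 degrees of freedom, as given on the slide

half_width = t_20df * sd
lower_limit = mean - half_width
upper_limit = mean + half_width

print(round(lower_limit, 1), round(upper_limit, 1))
```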

43
95% REFERENCE RANGE : MEAN +/- t x S.D.
N          N-1        t for (N-1) degrees of freedom
infinite   infinite   1.96 (ideal)
228        227        1.97 (i.e. almost ideal)
21         20         2.09 (small sample)
3          2          4.30 (really too small!)
The ‘true’ distribution stays the same; what increases is the ‘wobble’ factor of uncertainty due to small sample size.

44
The 95% Reference Range is not yet commonly reported in research literature.
Usually the basic statistics are given:
N      Mean +/- S.D.
228    3.63 +/- 0.97
leaving the reader to do the calculations if they wish
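The calculation left to the reader might look like the following Python sketch. It uses the example values from this slide (N = 228, mean 3.63, S.D. 0.97) together with the t value of 1.97 that the lecture quotes for 227 degrees of freedom; the function name is illustrative.

```python
def reference_range_95(mean, sd, t_value):
    """Return (lower, upper) limits of mean +/- t x S.D."""
    half_width = t_value * sd
    return mean - half_width, mean + half_width

# Published summary statistics from the slide; t = 1.97 for 227 d.f. as quoted in the lecture.
lower, upper = reference_range_95(mean=3.63, sd=0.97, t_value=1.97)
print(round(lower, 2), round(upper, 2))
```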

45
D As the Median and Mode are the same, the distribution must be a statistically ‘Normal’ bell-shaped symmetrical distribution
In a ‘Normal’ distribution, median = mode = mean

46
MODE also at 28 MEDIAN of 28 MEAN = 31.2

47
E The larger the ‘Skewness’ value, the more symmetrical the distribution
No, the larger the skew, the LESS symmetrical

48
SKEW is POSITIVE or NEGATIVE
POSITIVE VALUE : right tail longer, median < mean
[Figure : distribution curve with a longer right tail and the mean marked; f(x) axis from 0 to 0.5.]

49
SKEW is POSITIVE or NEGATIVE
NEGATIVE VALUE : left tail longer, median > mean
[Figure : distribution curve with a longer left tail and the mean marked; f(x) axis from 0 to 0.5.]

50
MODE also at 28 MEDIAN of 28 MEAN = 31.2 Sample mean pulled over by skewed outlier
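The sign of the skew can be illustrated with a short Python sketch. The function below implements the bias-corrected sample skewness, n / ((n-1)(n-2)) x the sum of ((x - mean) / S.D.) cubed, which is the formula documented for Excel's SKEW function; the three small data sets are hypothetical.

```python
from statistics import mean, median, stdev

def sample_skewness(values):
    """Bias-corrected sample skewness (the formula documented for Excel's SKEW)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)   # sample S.D., divisor (N-1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

# Hypothetical data sets, not the Table 1 data.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 10]   # one large outlier : longer right tail
left_tail = [1, 8, 9, 9, 10]    # one small outlier : longer left tail

for name, values in [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]:
    print(name, round(sample_skewness(values), 2), "median", median(values), "mean", mean(values))
```

With these values the right-tailed set has median < mean and a positive skew, and the left-tailed set has median > mean and a negative skew.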

51
Kurtosis SIZE
A positive value = more peaked, more data close to mean than expected
[Figure : a more peaked curve compared with the ‘expected’ normal curve; f(x) axis from 0.1 to 0.5.]

52
Kurtosis SIZE
A negative value = less peaked, more data away from mean than expected
[Figure : a flatter curve compared with the ‘expected’ normal curve; f(x) axis from 0.1 to 0.5.]
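A sketch of how the sign of the kurtosis value arises, in Python. The function implements the sample excess-kurtosis formula documented for Excel's KURT function, in which a normal curve scores about 0; both data sets are hypothetical and chosen to be extreme.

```python
from statistics import mean, stdev

def sample_excess_kurtosis(values):
    """Sample excess kurtosis (the formula documented for Excel's KURT)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)   # sample S.D., divisor (N-1)
    fourth_powers = sum(((x - m) / s) ** 4 for x in values)
    first_term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth_powers
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first_term - correction

# Hypothetical data sets, not the Table 1 data.
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]   # most values sit on the mean
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # values spread evenly away from the mean

print(round(sample_excess_kurtosis(peaked), 2))   # positive : more peaked than expected
print(round(sample_excess_kurtosis(flat), 2))     # negative : less peaked than expected
```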

53
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
A 14 and 57 seconds
B 27.0 and 35.4 seconds
C 26.9 and 35.3 seconds
D 4.21 and 31.19 seconds
E 29.0 and 33.3 seconds
Past students : A 16%, B 23%, C 18%, D 23%, E 20%

54
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
A 14 and 57 seconds (Past students : A 16%)
Mistakenly used MINIMUM & MAXIMUM

55
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
B 27.0 and 35.4 seconds (Past students : B 23%)
Mean +/- 95% Confidence Level
31.19048 - 4.209343 = 26.981137
31.19048 + 4.209343 = 35.399823

56
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
C 26.9 and 35.3 seconds (Past students : C 18%)
Mean +/- 95% Confidence Level
31.19048 - 4.209343 = 26.981137
31.19048 + 4.209343 = 35.399823
WRONGLY TRUNCATED rather than ROUNDING UP & DOWN
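The difference between options B and C is only rounding versus truncation, which a small Python sketch makes explicit. The mean and the 95% Confidence Level value are the Table 1 figures quoted on the slide; `truncate_1dp` is an illustrative helper, not part of any library.

```python
import math

mean = 31.19048
confidence_level_95 = 4.209343   # 'Confidence Level (95%)' value quoted from Table 1

lower = mean - confidence_level_95
upper = mean + confidence_level_95

def truncate_1dp(value):
    """Chop off everything after the first decimal place (the mistake behind option C)."""
    return math.floor(value * 10) / 10

print(round(lower, 1), round(upper, 1))            # correct rounding : option B
print(truncate_1dp(lower), truncate_1dp(upper))    # truncation : option C
```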

57
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
D 4.21 and 31.19 seconds (Past students : D 23%)
Mistakenly failed to do the calculation mean +/- 95% Confidence Level : 4.21 is the Confidence Level value itself and 31.19 is the mean

58
Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
E 29.0 and 33.3 seconds (Past students : E 20%)
Mistakenly calculated MEAN +/- S.E.M.
This is not a 95% C.I. … but what % is it ? A 68% C.I.
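Option E can be reproduced by computing mean +/- 1 S.E.M. in Python. This sketch uses the Mean, Standard Deviation and Count reported for Table 1 and derives the S.E.M. as S.D. / sqrt(N).

```python
import math

mean = 31.19048
sd = 9.841845
n = 21

sem = sd / math.sqrt(n)   # Standard Error of the Mean

# Mean +/- 1 S.E.M. gives roughly a 68% interval, not a 95% one.
lower_68 = mean - sem
upper_68 = mean + sem
print(round(sem, 2), round(lower_68, 1), round(upper_68, 1))
```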

59
[Figure : 68% CONFIDENCE INTERVAL, the spread of data in an ideal normal distribution. Mean - 1 S.E.M. to mean + 1 S.E.M. : 31.19 +/- 2.15 (S.E.M.), i.e. 29.0 to 33.3, has a 68% chance of including the true population mean. x-axis : NORM breath-holding (s). y-axis : Frequency, probability function f(x), 0 to 0.04.]

60
[Figure : 95% CONFIDENCE INTERVAL, the spread of data in an ideal normal distribution. Mean - 2.1 S.E.M. to mean + 2.1 S.E.M. : 31.19 +/- 4.21, i.e. 27.0 to 35.4, has a 95% chance of including the true population mean. x-axis : NORM breath-holding (s). y-axis : Frequency, probability function f(x), 0 to 0.04.]

61
Lecture 1 : INITIAL DATA DESCRIPTION before COMPARING SAMPLES OUTCOMES : you should NOW be better able to distinguish between correct and incorrect statements about EXCEL Descriptive Statistics

62
mean, median, sum, count etc.
S.D., variance & S.E.M.
95% reference ranges & 95% C.I.
normal curve, kurtosis and skew

63
MPA110 TWO REVISION LECTURES ON STATISTICS Dr Mike Jakobson Lecture 1 : INITIAL EXCEL DATA DESCRIPTION before COMPARING SAMPLES Lecture 2 : EXCEL t TESTS INTERPRETATION when COMPARING SAMPLES
