# A CAL COMPANION TO EXCEL PRINT-OUTS MP110 SKILLS FOR LIFE SCIENCES

Lecture 1 : Descriptive Statistics
These notes are the intellectual property of Dr M E Jakobson in support of his lectures and are solely for bona fide use for study by students registered on courses at UeL. NO other use is permitted without prior permission.

MP110 TWO REVISION LECTURES ON STATISTICS
Dr Mike Jakobson
Lecture 1 : INITIAL EXCEL DATA DESCRIPTION before COMPARING SAMPLES
Lecture 2 : EXCEL t TESTS INTERPRETATION when COMPARING SAMPLES

Look back at statistics covered so far on MP110
Look forward by using examples of data analysis done by Level 2 students in practicals in
PP249 Physiological Function & Dysfunction
PP250 Physiological Regulation

HANDOUT in YOUR MP110 WORKBOOK
Also files on INTRANET DEPARTMENT MENU
Directory : Skills for Life Sciences MPA100 Semester B data handling
MPA110_MJSTATS FILE ONE_QUESTIONS.doc
MPA110_MJSTATS FILE TWO_ANSWERS.doc
MPA110_MJSTATS LECTURE 1.pps
MPA110_MJSTATS LECTURE 2.pps

Also see : MP110 WebCT, 10 SELF-TEST MCQs on your EXCEL print-outs

Lecture 1 : INITIAL DATA DESCRIPTION before COMPARING SAMPLES
OUTCOMES : you should be better able to distinguish between correct and incorrect statements about EXCEL Descriptive Statistics

EXCEL Descriptive Statistics
mean, median, sum, count etc.
S.D., variance & S.E.M. : STANDARD DEVIATION and STANDARD ERROR OF THE MEAN
95% reference ranges & 95% C.I. : 95% CONFIDENCE INTERVAL around the MEAN
normal curve, kurtosis and skew : SHAPE OF DATA DISTRIBUTION around the MEAN

Descriptive Statistics
TABLE 1 on the handout in your workbook : Breath-holding after a NORMAL inhalation of air (to be compared later in Table 3 with air inhaled from a Douglas Bag)
Questions Q1 to Q5 in this session are on Descriptive Statistics

Q1 Which of the following statements about the Mean in Table 1 is APPROPRIATE ?
A The mean and standard error should be reported as follows : 31.2 +/- 9.8
B The mean is calculated by multiplying the Sum of data values by the number in the sample.
C Statisticians use the symbol μ as shorthand for the sample mean
D The mean in Table 1 is the true mean of the population
E As the mean is not equal to the mode and median, the distribution is slightly asymmetrical

A The mean and standard error should be reported as follows : 31.2 +/- 9.8
The standard error is 2.1, not 9.8. 27% of students failed to read the values correctly.
Note : 31.19048 rounds up to 31.2. Do NOT truncate it : 31.1 would be WRONG.

B The mean is calculated by multiplying the Sum of data values by the number in the sample
Correction : not multiplying by, but dividing by. The arithmetic mean is the Sum of all x divided by the No. (‘numero’) in the sample :
$\bar{X} = \dfrac{\sum x}{N} = \dfrac{655}{21} = 31.19047619$
where $\bar{X}$ (“X-bar”) = sample mean, $\sum x$ = Sum of all x, and $N$ = No. in sample.

THE ARITHMETIC MEAN is sensitive to OUTLIERS (unusually large or small data values).
The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.
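Both properties of the arithmetic mean can be checked numerically. The following is a minimal sketch in Python; the list `times` and the outlier value 90 are made-up illustrations, not the Table 1 data.

```python
from statistics import mean

# Hypothetical breath-holding times in seconds (NOT the Table 1 data).
times = [22, 25, 28, 28, 30, 33, 37]

x_bar = mean(times)                        # Sum of all x divided by N
deviations = [x - x_bar for x in times]    # each value minus the mean
deviation_sum = sum(deviations)            # zero, apart from rounding error

# Add one unusually large value (an outlier) and recompute the mean.
with_outlier = times + [90]
x_bar_outlier = mean(with_outlier)

print("mean:", x_bar)
print("sum of deviations:", deviation_sum)
print("mean with outlier:", x_bar_outlier)
```

A single extreme value is enough to pull the mean well away from the bulk of the data, which is the sensitivity the slide warns about.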

C Statisticians use the symbol μ as shorthand for the sample mean
Correction : μ is shorthand for the “true population” mean, not the sample mean.
“X-bar” = sample mean : $\bar{X} = \sum x / N$
“μ” = true population mean = the parameter being estimated by the sample mean
$\mu = \bar{X} \pm$ an error (“an uncertainty”)

D The Mean in Table 1 is the true mean of the population
Correction : it is the mean of the sample. The sample mean is a “statistic” estimating a fixed parameter (the population mean, μ). This is the basis of Parametric Statistics.

E As the mean is not equal to the mode and median, the distribution is slightly asymmetrical
“Mean” = sample mean : $\bar{X} = \sum x / N$
Median = middle value (middle quartile) : 50% of sample have higher values
Lower quartile : 75% of sample have higher values
Upper quartile : 25% of sample have higher values

N = 21 : the middle is the 11th data value (ten above, ten below), giving a MEDIAN of 28
Median and quartiles are not affected by OUTLIERS

3 Measures of CENTRAL TENDENCY
MEDIAN of 28, MODE also at 28, MEAN = 31.2
The sample mean is pulled over by an outlier; a ‘normal’ curve is bell-shaped and symmetrical.
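The contrast between the three measures can be sketched in Python. The sample below is hypothetical (21 invented values with one large outlier, not the Table 1 data); it was chosen only so that the median and mode coincide while the mean sits higher.

```python
from statistics import mean, median, mode

# Hypothetical sample of N = 21 values (NOT the Table 1 data).
sample = sorted([20, 22, 23, 24, 25, 26, 27, 28, 28, 28, 28,
                 29, 30, 31, 32, 33, 34, 35, 36, 38, 60])

# With N = 21 the middle value is the 11th: ten below, ten above.
middle = sample[len(sample) // 2]

# Make the largest value far more extreme and compare again.
extreme = sample[:-1] + [200]

print("median:", median(sample), "mode:", mode(sample), "mean:", mean(sample))
print("median:", median(extreme), "mode:", mode(extreme), "mean:", mean(extreme))
```

The median and mode do not move when the outlier grows, while the mean does.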

Q2 The Standard Deviation is a measure of the
A sum of the deviations of outlier values from the mean
B reliability of the sample mean as an estimate of the true mean
C difference between the highest data value and the lowest data value
D probability of the sample mean being different from the Null Hypothesis
E the variability of the sample data around the estimated mean

S.D. = all (x -X)2 all (N-1)
Q2 The Standard Deviation is a measure of the A sum of the deviations of outlier values from the mean all S.D. = (x -X)2 (N-1) Square root of A sum of the deviations of outlier values from the mean all squared

s = S.D. = N NOTE : mean squared deviation is got by dividing by (N-1)
(x -X)2 (N-1) s is another PARAMETER of the population being sampled ESTIMATES the ‘true’ standard deviation s = (x -m)2 N “Sigma”

Q2 The Standard Deviation is a measure of the
B reliability of the sample mean as an estimate of the true mean
This is a description of what is given by the STANDARD ERROR of the MEAN

Q2 The Standard Deviation is a measure of the
C difference between the highest data value and the lowest data value
The difference between the highest and the lowest data values is the RANGE

Q2 The Standard Deviation is a measure of the
D probability of the sample mean being different from the Null Hypothesis
No NULL HYPOTHESIS has been set up : that usually requires TWO sets of sample readings.
Sample Descriptive statistics ‘describe what happened’. They do not ‘test hypotheses about what might be true’.

Q2 The Standard Deviation is a measure of the
E the variability of the sample data around the estimated mean
It takes all N data values into account :
$\text{S.D.} = \sqrt{\dfrac{\sum (x - \bar{X})^2}{N-1}}$
$\text{VARIANCE} = (\text{S.D.})^2 = \dfrac{\sum (x - \bar{X})^2}{N-1}$

THE STANDARD DEVIATION
What does it do ? It tells you how variable your data items are within your sample.
The smaller the S.D., the more bunched are your data points around the sample mean $\bar{X}$
The larger the S.D., the more spread out are your data points around the sample mean $\bar{X}$
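The S.D. and variance formulas can be followed step by step in a short Python sketch. The data list is hypothetical (not Table 1); the result is cross-checked against Python's `statistics.stdev` and `statistics.variance`, which also divide by (N-1).

```python
import math
from statistics import stdev, variance

# Hypothetical sample (NOT the Table 1 data).
data = [22, 25, 28, 28, 30, 33, 37]

n = len(data)
x_bar = sum(data) / n                                # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
sample_variance = sum_sq_dev / (n - 1)               # divide by (N-1), not N
sample_sd = math.sqrt(sample_variance)               # S.D. = square root of variance

# A tightly bunched sample with the same mean has a smaller S.D.
bunched = [28, 28, 29, 29, 29, 30, 30]

print("variance:", sample_variance, "S.D.:", sample_sd)
print("S.D. of bunched sample:", stdev(bunched))
```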

Q3 The number of students used in the sample in Table 1 was
A 28 : this is the MODE & MEDIAN
B 14 : this is the MINIMUM
C 43 : this is the RANGE
D 21 : COUNT = N (number in sample)
E 655 : this is the SUM = $\sum x$

Q4 Only ONE of the following general statements about statistics of the type presented in Table 1 is TRUE. Which is it ?
A The larger the sample size, the smaller will be the ‘Standard Error’
B By definition, the mean lies at the mid point of the range
C The 95% Reference Range can be calculated by dividing the ‘Standard Deviation’ by approximately 2 (1.96 if sample size is >30)
D As the Median and Mode are the same, the distribution must be a statistically ‘Normal’ bell-shaped symmetrical distribution
E The larger the ‘Skewness’ value, the more symmetrical the distribution

A The larger the sample size, the smaller will be the ‘Standard Error’
Standard Error of the Mean :
$\text{S.E.M.} = \dfrac{\text{S.D.}}{\sqrt{N}}$
The larger N is, the smaller will be the result of this calculation.
The more observations you take, the more likely that your sample mean is close to the population ‘true’ mean μ
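The effect of N on the standard error can be seen with a few lines of Python. This sketch uses the S.D. of 9.84 and N = 21 quoted elsewhere in these notes; the larger sample size of 84 is a hypothetical value, and the sketch assumes the S.D. stays the same as N grows.

```python
import math

def standard_error(sd, n):
    """S.E.M. = S.D. divided by the square root of N."""
    return sd / math.sqrt(n)

sd = 9.84                            # S.D. quoted in the notes for the NORM data
sem_21 = standard_error(sd, 21)      # the actual sample size
sem_84 = standard_error(sd, 84)      # hypothetical: four times as many students

print("S.E.M. with N = 21:", round(sem_21, 2))
print("S.E.M. with N = 84:", round(sem_84, 2))
```

Because N sits under a square root, four times as many observations are needed to halve the S.E.M.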

B By definition, the mean lies at the mid point of the range
“X-bar” = sample mean : $\bar{X} = \sum x / N$
The mean is the midpoint of the distribution if and only if the distribution is symmetrical (e.g. Normal bell-shaped), where mean = mode = median.
RANGE = MAXIMUM VALUE - MINIMUM VALUE
The MEAN is NOT estimated by the two most unreliable values, which may be outliers.

C The 95% Reference Range can be calculated by dividing the ‘Standard Deviation’ by approximately 2 (1.96 if sample size is >30)
Correction : not dividing but multiplying :
mean +/- (t(N-1) * S.D.)

DESCRIPTIVE STATISTICS IN EXCEL
The spread of the data : THE STANDARD DEVIATION
Mean, Standard Deviation, Sample Variance = (S.D.)²
If data is a RANDOM sample from a population which fits the ‘expected’ ‘Normal’ distribution, then the 95% Reference Range can be calculated.

REFERENCE RANGES : the spread of data in an ideal normal distribution
[Figure : expected distribution of data points in a ‘normal’ curve, frequency probability function f(x) against NORM breath-holding (s), axis from 10 to 60, centred on the mean]
68% REFERENCE RANGE : mean -1 S.D. to mean +1 S.D., 31.12 +/- 9.84, contains 68% of all data points
95% REFERENCE RANGE : mean -2 S.D. to mean +2 S.D., 31.12 +/- 19.68, contains 95% of all data points

REFERENCE RANGE : the spread of individual data items
If the distribution is ‘NORMAL’ bell-shaped, then 95% of sample data lie between MEAN +/- about (2 x S.D.)
More about this rough factor ‘2’ : it can be calculated exactly. For a normal distribution (Gaussian curve), the theoretical value is known to be 1.96.
1.96 x S.D. includes 95% of the area under the curve (1 S.D. includes 68% of the area).
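The factor 1.96 and the 68% figure can be recovered from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, S.D. 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# 95% in the middle leaves 2.5% in each tail, so look up the 97.5% point.
z_95 = std_normal.inv_cdf(0.975)

# Area under the curve within 1 S.D. and within 2 S.D. of the mean.
area_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
area_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

print("multiplier for 95%:", round(z_95, 2))
print("area within 1 S.D.:", round(area_1sd, 3))
print("area within 2 S.D.:", round(area_2sd, 3))
```

Using exactly 2 S.D. covers slightly more than 95% of the area, which is why 2 is only a rough factor.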

The spread of data in samples
FOR THE NORM DATA (s) : N = 21, Mean = 31.2, S.D. = 9.8
The exact calculation for the Reference Range is mean +/- (t(N-1) * S.D.)
The t value depends on sample size (N). For 20 d.f., t = 2.09, so
Lower Limit = 31.2 - (2.09 * 9.8) = 10.7 s
Upper Limit = 31.2 + (2.09 * 9.8) = 51.7 s
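The reference range arithmetic for the NORM data can be reproduced as a short Python sketch. The mean, S.D. and t value are the ones quoted in the notes; the function name `reference_range` is only an illustrative helper.

```python
def reference_range(mean, sd, t_value):
    """Return (lower, upper) limits as mean -/+ (t * S.D.)."""
    half_width = t_value * sd
    return mean - half_width, mean + half_width

# Values quoted in the notes: N = 21, so 20 degrees of freedom and t = 2.09.
lower, upper = reference_range(mean=31.2, sd=9.8, t_value=2.09)

print("Lower Limit:", round(lower, 1), "s")
print("Upper Limit:", round(upper, 1), "s")
```

Note that the range is built from the S.D., not the S.E.M., because it describes the spread of individual data items rather than the uncertainty of the mean.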

95% REFERENCE RANGE = MEAN +/- t x S.D.
[Table : t for (N-1) degrees of freedom against sample size N, running from infinite (ideal) and almost ideal down to a small sample and one that is really too small!]
As N falls, the ‘true’ distribution stays the same; rather, the ‘wobble’ factor of uncertainty due to small sample size increases.

The 95% Reference Range is not yet commonly reported in research literature.
Usually only the basic statistics are given, in the form N, Mean +/- S.D. (for example an S.D. of +/- 0.97), leaving the reader to do the calculations if they wish.

D As the Median and Mode are the same, the distribution must be a statistically ‘Normal’ bell-shaped symmetrical distribution
In a ‘Normal’ distribution, median = mode = mean
Here the MEDIAN is 28 and the MODE is also at 28, but the MEAN = 31.2

E The larger the ‘Skewness’ value, the more symmetrical the distribution
No, the larger the skew, the LESS symmetrical

The SKEW VALUE is POSITIVE or NEGATIVE
POSITIVE : right tail longer, Median < mean
NEGATIVE : left tail longer, Median > mean
MEDIAN at 28, MODE also at 28, MEAN = 31.2 : the sample mean is pulled over by a skewed outlier

Kurtosis
A positive value = more peaked, more data close to mean than expected
A negative value = less peaked, more data away from mean than expected
[Figures : f(x) against SIZE, comparing the observed curve with the expected curve]
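The signs of skew and kurtosis can be explored with a small Python sketch. It assumes the sample-adjusted formulas that Excel documents for its SKEW and KURT functions; that correspondence is an assumption here, not something stated in these notes, and all three data sets are invented.

```python
import math

def sample_sd(values):
    """Sample S.D., dividing the squared deviations by (N-1)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def skewness(values):
    """Sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    x_bar = sum(values) / n
    s = sample_sd(values)
    cubes = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubes

def kurtosis(values):
    """Sample excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    x_bar = sum(values) / n
    s = sample_sd(values)
    fourths = sum(((x - x_bar) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourths - correction

# Hypothetical data sets chosen to show each sign.
symmetric = [1, 2, 3, 4, 5]          # evenly spread, no tail
right_tailed = [1, 2, 3, 4, 20]      # one large value: longer right tail
left_tailed = [1, 17, 18, 19, 20]    # one small value: longer left tail

print("skew:", skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
print("kurtosis of flat, evenly spread data:", kurtosis(symmetric))
```

The evenly spread sample has zero skew and a negative kurtosis, in line with the "less peaked" description above.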

Q5 The true mean of the population described by the sample in Table 1 can be reported with 95% confidence as being between:
A 14 and 57 seconds
B 27.0 and 35.4 seconds
C 26.9 and 35.3 seconds
D 4.21 and 31.19 seconds
E 29.0 and 33.3 seconds
Past students : A 16%, B 23%, C 18%, D 23%, E 20%

Q5 Past students :
A 16% chose 14 and 57 seconds : mistakenly used MINIMUM & MAXIMUM
B 23% chose 27.0 and 35.4 seconds = Mean +/- 95% Confidence Level = 31.19 +/- 4.21
C 18% chose 26.9 and 35.3 seconds : Mean +/- 95% Confidence Level, but WRONGLY TRUNCATED rather than ROUNDING UP & DOWN
D 23% chose 4.21 and 31.19 seconds : mistakenly failed to do the calculation mean +/- 95% Confidence Level
E 20% chose 29.0 and 33.3 seconds : mistakenly calculated MEAN +/- S.E.M. This is not a 95% C.I. ...but what % is it ? A 68% C.I.

68% CONFIDENCE INTERVAL : mean -1 S.E.M. to mean +1 S.E.M., 31.12 +/- 2.15 (S.E.M.), i.e. 29.0 to 33.3 s, with a 68% chance of including μ
95% CONFIDENCE INTERVAL : mean -2.1 S.E.M. to mean +2.1 S.E.M., 31.12 +/- 4.21, i.e. 27.0 to 35.4 s, with a 95% chance of including μ
[Figures : frequency probability function f(x), from 0.01 to 0.04, against NORM breath-holding (s)]
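The two confidence intervals for Q5 can be rebuilt from the print-out values. This is a minimal Python sketch that takes the mean (31.19), S.E.M. (2.15) and 95% Confidence Level (4.21) as quoted in the answers above; the helper name `interval` is hypothetical.

```python
def interval(mean, half_width):
    """Return (lower, upper) as mean -/+ half_width, rounded to 1 decimal place."""
    return round(mean - half_width, 1), round(mean + half_width, 1)

mean = 31.19          # sample mean from the print-out, to 2 decimal places
sem = 2.15            # Standard Error of the Mean
conf_level_95 = 4.21  # the 'Confidence Level (95%)' value

ci_68 = interval(mean, sem)             # mean +/- 1 S.E.M.
ci_95 = interval(mean, conf_level_95)   # mean +/- 95% Confidence Level

print("68% C.I.:", ci_68)
print("95% C.I.:", ci_95)
```

The limits are rounded, not truncated, in keeping with the warning attached to option C.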

Lecture 1 : INITIAL DATA DESCRIPTION before COMPARING SAMPLES
OUTCOMES : you should NOW be better able to distinguish between correct and incorrect statements about EXCEL Descriptive Statistics :
mean, median, sum, count etc.
S.D., variance & S.E.M.
95% reference ranges & 95% C.I.
normal curve, kurtosis and skew

