1
Introduction to Biostatistics
Benjamin Kamala (MD, MPH)
2
Literature
Lecture notes in Biostatistics: Kazura, Makwaya, Masanja, Mpembeni
Medical Statistics: Betty Kirkwood
Scientific calculator
Statistical tables: normal distribution tables, chi-square tables, t-test tables
3
Definitions
Statistics: the science of collecting, organizing, presenting, analyzing and interpreting numerical data to assist decision making
Statistics: the term also refers to the data items themselves
Biostatistics: statistics applied to the biological sciences and medicine
Medical statistics: the branch of biostatistics dealing with medical data
4
Terms
Data: the raw material of statistics. Data generally consist of numbers obtained by measuring or counting in a population or sample, e.g. a record of patient temperatures
Population: the largest collection of entities in which we have an interest, e.g. MD1 students at HKMU
Sample: part of a population
Parameter: a descriptive measure computed from the data of a population
5
Commonly Used Symbols in Statistics
μ: population mean
σ: population standard deviation
x̄: sample mean
s: sample standard deviation
x̃: median
Σ: sum of
6
Need for Biostatistics
Consider, for example, the following general questions:
What is the normal blood pressure in the human body?
What is the amount of hemoglobin in blood?
Some specific questions to medical specialists could be as follows:
Mr. Physician, what are the limits of error in your blood pressure measurements?
Mr. Radiologist, what is the probability that your colleague's reports on these X-ray films would agree with yours?
Mr. Pathologist, what proportion of your diagnoses are correct at post mortem?
7
Application of Biostatistics Methods
Official health statistics, e.g. cases of a disease over time
Epidemiology, e.g. association between exposure and outcome
Clinical studies, e.g. comparison of treatments in clinical trials
Microbiology, e.g. growth patterns
Laboratory studies, e.g. dose-response studies
Health service administration, e.g. with limited resources, there may be a need to prioritize target groups for necessary interventions
9
Types of statistics
Descriptive: methods of organizing and presenting data in informative ways
Inferential: making decisions, estimates, predictions or generalizations about a population on the basis of sample data
10
Descriptive statistics
Variable: a characteristic that differs between members of a population or sample, such as height
The measurement is not constant but varies from one member to another, hence the name "variable"
11
Example
Variable: Possible values
Height (cm): 158, 169.3, 170, etc.
Weight (kg): 10.2, 50, 69.4, 84, etc.
Parity: 0, 1, 6, 8, 10, etc.
Outcome of disease: recovery, chronic illness, death
Marital status: single, married, widowed, separated, cohabiting
Age (years): 1, 5, 30, 36, etc.
Hemoglobin (g/dl): 8.9, 14.2, 12.7, etc.
Number of AIDS cases: 278, 301, 313, 350, etc.
12
Types of Variables
Qualitative variables: do not take numerical values
Sometimes referred to as categorical variables
Outcome of disease: recovery, chronic illness, death
Hair color: black, blonde
Area of residence: Kinondoni, Ilala, Temeke
13
Quantitative variables
Take numerical values
Age (years): 10, 19, 45, 60
Height (cm): 140, 50.6, 200
Number of years in school: 0, 1, 2, 3, 4, 5, 6
Hemoglobin (g/dl): 16.3, 8.9, 12.7
14
Types of quantitative variables
Continuous variables take any value within meaningful extremes, for example:
Height (cm): 159, 25
Weight (kg): 71.12, 80.56
Discrete variables take only fixed values, in most cases whole numbers, for example:
Parity: 0, 1, 2, 3, 4, 5, 6, 10
Age: 5, 19, 45, 90
Hospital admissions: 1, 2, 3, 4, 5, 9
Number of infant malaria cases: 100, 10000, 34278
15
Levels of measurement: Nominal Measurement
The nominal scale classifies persons or things based on a qualitative assessment of the characteristic being assessed.
It neither includes information on quantity or amount nor indicates 'more than' or 'less than'.
E.g. gender, religion, setting
Categories are mutually exclusive and exhaustive
16
Ordinal Measurement
The ordinal scale also classifies persons or things based on the characteristic being assessed, but it does indicate 'more than' or 'less than'.
E.g. performance rated as poor, average, good, or excellent
However, it does not indicate how much better an excellent performance is compared to a good one.
17
Interval Measurement
Same as ordinal, but the interval scale also indicates how much more than or less than
Does not have a true zero
E.g. temperature
18
Ratio Measurement
The ratio scale includes all the characteristics of the interval scale and also has a true zero point
E.g. height
19
4. A condition or characteristic that can take on different values or categories is called ___.
a. a constant
b. a variable
c. a cause-and-effect relationship
d. a descriptive relationship
5. A variable that is presumed to cause a change in another variable is called a(n):
a. categorical variable
b. dependent variable
c. independent variable
d. intervening variable
20
20. Which of the following can best be described as a categorical variable?
a. age
b. annual income
c. grade point average
d. religion
21. In research, something that does not "vary" is called a ___________.
a. variable
b. method
c. constant
d. control group
21
Presentation of quantitative data: Frequency distributions are also used to summarize quantitative data.
Figure 1: Distribution of Number of Counts of Trypanosome in the Blood of a Rat's Tail
Count    Frequency    Relative Frequency (%)    Cumulative Frequency (%)
0        4            3.1                       3.1
1        27           21.1                      24.2
2        27           21.1                      45.3
3        20           15.6                      60.9
4        16           12.5                      73.4
5        17           13.3                      86.7
6        12           9.4                       96.1
7        2            1.6                       97.7
8        1            0.8                       98.5
9        2            1.6                       100
Total    128          100.0
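A minimal Python sketch of how the relative and cumulative columns of such a table can be derived from the raw frequencies. The frequencies below are those of Figure 1 as reconstructed so that they add up to the stated total of 128; where the extracted slide text is incomplete, treat them as an assumption. The function name `frequency_table` is illustrative only.

```python
# Frequencies for counts 0..9; assumed reconstruction of Figure 1
# (chosen so that they sum to the stated total of 128).
counts = list(range(10))
freqs = [4, 27, 27, 20, 16, 17, 12, 2, 1, 2]


def frequency_table(values, frequencies):
    """Return rows of (value, frequency, relative %, cumulative %)."""
    total = sum(frequencies)
    rows = []
    running = 0  # running sum of frequencies seen so far
    for value, f in zip(values, frequencies):
        running += f
        rows.append((value, f, 100 * f / total, 100 * running / total))
    return rows


table = frequency_table(counts, freqs)
for value, f, rel, cum in table:
    print(f"{value:>5} {f:>9} {rel:>8.1f} {cum:>10.1f}")
```

Because the sketch cumulates the unrounded frequencies, a cumulative percentage can differ in the last digit from a column obtained by adding already-rounded percentages (for count 8 it prints 98.4 rather than 98.5).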
22
Grouped data: Frequency Distribution of No. of Lesions caused by Smallpox Virus in an Egg Membrane
No. of Lesions    Frequency of No. of Membranes (f)    Class Midpoint (x)    (fx)
0 -               1                                    5                     5
10 -              6                                    15                    90
20 -              14                                   25                    350
30 -              14                                   35                    490
40 -              17                                   45                    765
50 -              8                                    55                    440
60 -              9                                    65                    585
70 -              3                                    75                    225
80 -              6                                    85                    510
90 -              1                                    95                    95
100 -             0                                    105                   0
110 -             1                                    115                   115
Total             80                                                         3670
Note that the dash symbol (-) means 'up to but not including' the next tabulated value. (That is, according to the table in Figure 2, 10- means 10 is the lower limit while 19 is the upper limit. The value 15 is therefore the midpoint for the class interval 10-.)
24
Line Graphs
25
[Figure: line graph of COUNT for the series DUME, FLEXIP and LADY PEPETA across the categories ONE to FIVE]
26
Measures of Location and Spread
Central tendency: Mean, Median, Mode
Spread: Range, Variance, Standard deviation
27
Mean
The conventional average
The sum of the observations divided by the number of observations
28
Worked Example
The mean height is calculated by adding the heights for the ten men and dividing the sum by 10.
Arithmetic mean: x̄ = 1710 / 10 = 171
29
Frequency distributions
The arithmetic mean can also be calculated from frequency distributions:
x̄ = ∑(xi fi) / ∑ fi
Example:
30
Grouped data The class midpoint should be used when calculating the mean Example:
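As one possible illustration, the sketch below applies the midpoint rule to the smallpox lesion table shown earlier. It assumes classes of width 10 written as "0 -", "10 -", and so on, and uses frequencies reconstructed to match the stated totals (80 membranes, ∑fx = 3670); the variable names are hypothetical.

```python
# Lower class limits 0, 10, ..., 110 (the "10 -" notation means 10 up to,
# but not including, 20). A class width of 10 is assumed throughout.
lower_limits = list(range(0, 120, 10))
width = 10

# Frequencies per class; assumed reconstruction consistent with the
# stated totals of the table (sum f = 80, sum fx = 3670).
freqs = [1, 6, 14, 14, 17, 8, 9, 3, 6, 1, 0, 1]

# The class midpoint stands in for every observation in the class.
midpoints = [lo + width / 2 for lo in lower_limits]

sum_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = sum_fx / sum_f

print(f"sum f = {sum_f}, sum fx = {sum_fx:.0f}, mean = {grouped_mean:.3f}")
```

The result is only an approximation to the mean of the raw data, because each observation is replaced by the midpoint of its class.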
31
The following table shows the hourly wage rates of eight sampled construction workers.
Worker:                  1    2    3    4    5    6    7    8
Hourly wage rate ($):    35   38   46   60   65   69   72   78
Calculate the mean, median and mode.
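One way to check an answer to this exercise is with Python's standard `statistics` module. This is a sketch rather than the lecturer's own working; the variable names are illustrative.

```python
import statistics

# Hourly wage rates ($) of the eight sampled workers
wages = [35, 38, 46, 60, 65, 69, 72, 78]

mean_wage = statistics.mean(wages)      # sum of the values divided by n
median_wage = statistics.median(wages)  # n is even: mean of the two middle values

# multimode() returns every value that shares the highest frequency.
# If that is every distinct value, no value repeats and there is no mode.
most_common = statistics.multimode(wages)
has_mode = len(most_common) < len(set(wages))

print(mean_wage, median_wage, has_mode)
```

With eight distinct wages the data set has no mode, so only the mean and the median are informative here.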
32
The following table shows the daily wages of a random sample of health workers. Calculate its mean, median and mode.
Daily Wages ($)    Number of Workers
                   5
                   15
                   25
                   30
                   18
                   7
Total              100
33
Daily Wages ($)    Number of Workers (f)    Class Mark (x)    fx
                   5                        299.5             1,497.5
                   15                       499.5             7,492.5
                   25                       699.5             17,487.5
                   30                       899.5             26,985.0
                   18                       1,099.5           19,791.0
                   7                        1,299.5           9,096.5
Total              100                                        82,350.0
Mean = 82,350.0 / 100 = 823.50
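The products in the table can be checked mechanically. The following sketch recomputes each f × x from the frequencies and class marks above and then the grouped mean; names such as `class_marks` are illustrative.

```python
# Number of workers (f) and class mark (x) for each wage class
freqs = [5, 15, 25, 30, 18, 7]
class_marks = [299.5, 499.5, 699.5, 899.5, 1099.5, 1299.5]

# f * x for each class, then the two column totals
products = [f * x for f, x in zip(freqs, class_marks)]
total_f = sum(freqs)
total_fx = sum(products)

grouped_mean = total_fx / total_f

for f, x, fx in zip(freqs, class_marks, products):
    print(f"{f:>3} {x:>8.1f} {fx:>10.1f}")
print(f"Total f = {total_f}, total fx = {total_fx:.1f}, mean = {grouped_mean:.2f}")
```

Recomputing the column this way is a quick guard against transcription slips in individual rows, since the rows must add up to the stated total of 82,350.0.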
34
Mean
Advantages
All values in the distribution are used in its calculation, so it can be regarded as more representative than the other two measures.
Its method of calculation is simple and most people understand the meaning of its result.
Its result can easily be used in further analysis.
Disadvantages
Its result can be easily distorted by extreme values. As such, it may be much lower or higher than the bulk of the values and become unrepresentative. E.g. duration of stay in hospital (in days): 5, 5, 5, 7, 10, 20, 102 (mean = 22). This does not reflect the typical duration of stay.
In the case of open-ended classes, the mean can be calculated only if their class marks are determined. If such classes contain a large proportion of the values, the mean may be subject to substantial error.
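The effect of the extreme value in the hospital-stay example can be seen with a few lines of Python. This is a small sketch using the standard `statistics` module; the comparison without the longest stay is added here purely for illustration.

```python
import statistics

# Duration of stay in hospital (days), as in the example above
stays = [5, 5, 5, 7, 10, 20, 102]

mean_all = statistics.mean(stays)      # pulled upwards by the 102-day stay
median_all = statistics.median(stays)  # middle value of the ordered data

# Same data without the single extreme value (illustrative comparison only)
typical = [d for d in stays if d != max(stays)]
mean_typical = statistics.mean(typical)

print(mean_all, median_all, round(mean_typical, 1))
```

Six of the seven patients stayed 20 days or fewer, yet the mean is 22 days; the median of 7 days is closer to the bulk of the values.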
35
Median
The median is the 50th percentile, i.e. the middle value when the observations are arranged in order
Generally, when 'n' (the number of observations) is odd, the median is the ½(n+1)th observation. When 'n' is even, there is no middle observation, and the median is the mean of the two middle observations
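The odd/even rule can be written out directly. The function below is a minimal sketch (the name `median_by_position` is illustrative); it sorts the data first, since the rule refers to positions in the ordered observations.

```python
def median_by_position(values):
    """Median using the (n + 1) / 2 rule on the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th observation (index mid when counting from 0)
        return ordered[mid]
    # Even n: mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([5, 5, 5, 7, 10, 20, 102]))         # odd n
print(median_by_position([35, 38, 46, 60, 65, 69, 72, 78]))  # even n
```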
36
Less efficient than the mean because it takes no account of the magnitude of most of the observations.
The median is much less amenable than the mean to mathematical treatment, so it is less used in more elaborate statistical techniques.
A good measure if the data are distributed asymmetrically
37
Mode
The number that is found most frequently in a set of numbers
8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7: two modes, 6 and 8
5, 4, 9, 7, 6, 3, 8: no mode
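Both cases can be reproduced with a short helper. This sketch follows the convention used on the slide, where a data set in which no value repeats has no mode; the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: treated here as "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7]))  # two modes
print(modes([5, 4, 9, 7, 6, 3, 8]))              # no mode
```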
38
Mode
Advantages
Its result will not be affected by extreme values and open-ended classes.
If data are not grouped, it can be determined easily.
Disadvantages
It has to be supplemented by other statistics.
It is difficult to obtain an accurate estimate of the mode if the values are classified into a frequency distribution.
39
How to select a suitable measure
Always select the mean whenever there is no special reason for choosing the other two measures.
Select the median if the distribution contains a substantial number of extremely large or small values.
Select the mode if an integral result is preferred, as when the data are on an ordinal scale.
40
Measures of variability: The Range
The difference between the maximum value and the minimum value
Seldom used in statistical analysis because it depends only on the two extreme values, which may be faulty
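A short sketch of why the range is fragile, reusing the hospital-stay values from the earlier slide on the mean. The helper name `value_range` is illustrative.

```python
def value_range(values):
    """Range: maximum value minus minimum value."""
    return max(values) - min(values)


# Duration of stay in hospital (days)
stays = [5, 5, 5, 7, 10, 20, 102]

range_all = value_range(stays)
# Dropping the single longest stay changes the range drastically
range_without_longest = value_range(sorted(stays)[:-1])

print(range_all, range_without_longest)
```

One observation moves the range from 15 to 97 days, which is why it says little about the spread of the remaining values.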
41
Variance
The squared amount of spread or variability
The symbol s² is used when we are referring to the variance of a sample and the symbol σ² when we are referring to the variance of a population.
Example:
42
Similar formula
43
∑(xi – x̄) is always zero, which is why the deviations are squared
For a population:
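The sketch below checks numerically that the deviations from the mean sum to zero and then computes both variances. It assumes the standard textbook definitions, which are not spelled out in the extracted slide text: the sample variance s² divides the sum of squared deviations by n − 1, and the population variance σ² divides it by N. The data are the hospital-stay values from the earlier slide.

```python
import math
import statistics

# Duration of stay in hospital (days)
data = [5, 5, 5, 7, 10, 20, 102]
n = len(data)
mean = sum(data) / n

# Deviations from the mean cancel out, so their sum carries no information
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the deviations can no longer cancel
sum_of_squares = sum(d ** 2 for d in deviations)

sample_variance = sum_of_squares / (n - 1)  # s^2 (assumed definition)
population_variance = sum_of_squares / n    # sigma^2 (assumed definition)

print(sum_of_deviations, sum_of_squares)
print(round(sample_variance, 2), round(population_variance, 2))
```

The standard library functions `statistics.variance` and `statistics.pvariance` implement the same two definitions and can be used to cross-check the hand calculation.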
44
Standard deviation Calculate the standard deviation.
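As a hedged illustration, the sample standard deviation can be computed as the square root of the sample variance (sum of squared deviations divided by n − 1, the usual textbook definition). The data set of the original exercise is not available in the extracted text, so the sketch reuses the eight hourly wage rates from the earlier exercise; the function name `sample_sd` is illustrative.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))


# Hourly wage rates ($) of the eight sampled construction workers
wages = [35, 38, 46, 60, 65, 69, 72, 78]

sd = sample_sd(wages)
print(round(sd, 2))
```

Unlike the variance, the standard deviation is expressed in the same units as the data (here, dollars per hour).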