Introduction to Biostatistics


1 Introduction to Biostatistics
Benjamin Kamala (MD, MPH)

2 Literature
Lecture notes in Biostatistics: Kazura, Makwaya, Masanja, Mpembeni
Medical statistics: Betty Kirkwood
Scientific calculator
Statistical tables: Normal distribution tables, Chi-square tables, t-test tables

3 Definitions
Statistics (as a discipline): the science of collecting, organizing, presenting, analyzing and interpreting numerical data to assist decision making
Statistics (as data): the data items themselves
Biostatistics: statistics applied to the biological sciences and medicine
Medical statistics: the branch of biostatistics dealing with medical data

4 Terms
Data: the raw material of statistics. Data generally consist of numbers obtained by measuring or counting members of a population or sample, e.g. a record of patient temperatures
Population: the largest collection of entities in which we have an interest, e.g. MD1 students at HKMU
Sample: part of a population
Parameter: a descriptive measure computed from the data of a population

5 Commonly Used Symbols in Statistics
μ: Population mean
σ: Population standard deviation
x̄: Sample mean
s: Sample standard deviation
x̃: Median
Σ: Sum of

6 Need for Biostatistics
Consider, for example, the following general questions:
What is the normal blood pressure in the human body?
What is the amount of hemoglobin in blood?
Some specific questions to medical specialists could be as follows:
Mr. Physician, what are the limits of error in your blood pressure measurements?
Mr. Radiologist, what is the probability that your colleague's reports on these X-ray films would agree with yours?
Mr. Pathologist, what proportion of your diagnoses are correct at post mortem?

8 Application of Biostatistics Methods
Official health statistics: cases of a disease over time
Epidemiology, e.g. association between exposure and outcome
Clinical studies, e.g. comparison of treatments in clinical trials
Microbiology, e.g. growth patterns
Laboratory studies, e.g. dose-response studies
Health service administration, e.g. with limited resources, there may be a need to prioritize target groups for necessary interventions

9 Types of statistics
Descriptive: methods of organizing and presenting data in informative ways
Inferential: methods for decision making, estimation, prediction or generalization about a population on the basis of sample data

10 Descriptive statistics
Variable: a characteristic that differs between members of a population or sample, such as height
The measurement is not constant but varies from one member to another, hence the name: variable

11 Example
Variable               Possible Values
Height (cm)            158, 169.3, 170, etc.
Weight (kg)            10.2, 50, 69.4, 84, etc.
Parity                 0, 1, 6, 8, 10, etc.
Outcome of disease     Recovery, chronic illness, death
Marital status         Single, married, widowed, separated, cohabiting
Age (years)            1, 5, 30, 36, etc.
Hemoglobin (g/dl)      8.9, 14.2, 12.7, etc.
Number of AIDS cases   278, 301, 313, 350, etc.

12 Types of Variables
Qualitative variables: do not take numerical values
Sometimes referred to as categorical variables
Outcome of disease (recovery, chronic illness, death)
Hair color (black, blonde)
Area of residence (Kinondoni, Ilala, Temeke)

13 Quantitative variables
Take numerical values
Age (years): 10, 19, 45, 60
Height (cm): 140, 50.6, 200
Number of years in school: 0, 1, 2, 3, 4, 5, 6
Hemoglobin (g/dl): 16.3, 8.9, 12.7

14 Types of quantitative variables
Continuous variables take any value within meaningful extremes, for example:
Height (cm): 159, 25
Weight (kg): 71.12, 80.56
Discrete variables take only fixed values, in most cases whole numbers, for example:
Parity: 0, 1, 2, 3, 4, 5, 6, 10
Age: 5, 19, 45, 90
Hospital admissions: 1, 2, 3, 4, 5, 9
Number of infant malaria cases: 100, 10000, 34278

15 Levels of measurement: Nominal Measurement
The nominal scale classifies persons or things based on a qualitative assessment of the characteristic being assessed.
It carries no information on quantity or amount, and it does not indicate 'more than' or 'less than'.
E.g. gender, religion, setting
Categories are mutually exclusive and exhaustive

16 Ordinal Measurement
The ordinal scale also classifies persons or things based on the characteristic being assessed, but it does indicate 'more than' or 'less than'.
E.g. a performance rated as poor, average, good, or excellent
However, it does not indicate how much better an excellent performance is compared to a good one.

17 Interval
Same as ordinal, but the interval scale also indicates how much more than or less than
Does not have a true zero
E.g. temperature

18 Ratio Measurement
The ratio scale includes all the characteristics of the interval scale and, in addition, has a true zero point
E.g. height

19 4. A condition or characteristic that can take on different values or categories is called ___.
a. a constant
b. a variable
c. a cause-and-effect relationship
d. a descriptive relationship
5. A variable that is presumed to cause a change in another variable is called a(n):
a. categorical variable
b. dependent variable
c. independent variable
d. intervening variable

20 20. Which of the following can best be described as a categorical variable?
a. age
b. annual income
c. grade point average
d. religion
21. In research, something that does not "vary" is called a ___________.
a. variable
b. method
c. constant
d. control group

21 Presentation of quantitative data: Frequency distributions are also used to summarize quantitative data.
Figure 1: Distribution of Number of Counts of Trypanosome in the Blood of a Rat's Tail
Count   Frequency   Relative Frequency (%)   Cumulative Frequency (%)
0       4           3.1                      3.1
1       27          21.1                     24.2
2       27          21.1                     45.3
3       20          15.6                     60.9
4       16          12.5                     73.4
5       17          13.3                     86.7
6       12          9.4                      96.1
7       2           1.6                      97.7
8       1           0.8                      98.5
9       2           1.6                      100
Total   128         100.0
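The relative and cumulative columns of a frequency distribution follow mechanically from the frequency column. The sketch below is only illustrative: the frequencies for counts 0, 2, 7, 8 and 9 are assumptions inferred from the relative frequencies and the stated total of 128, and the variable names are not from the slides.

```python
# Frequencies for Trypanosome counts 0..9.
# ASSUMPTION: the entries for counts 0, 2, 7, 8 and 9 are inferred from the
# relative frequencies and the total of 128, not read directly from the slide.
counts = list(range(10))
frequencies = [4, 27, 27, 20, 16, 17, 12, 2, 1, 2]

total = sum(frequencies)

# Relative frequency: each frequency as a percentage of the total.
relative = [100 * f / total for f in frequencies]

# Cumulative frequency: running total of frequencies, as a percentage.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(100 * running / total)

print(f"{'Count':>5} {'Freq':>5} {'Rel %':>7} {'Cum %':>7}")
for c, f, r, cum in zip(counts, frequencies, relative, cumulative):
    print(f"{c:>5} {f:>5} {r:>7.1f} {cum:>7.1f}")
print(f"{'Total':>5} {total:>5}")
```

Because the cumulative percentages here are computed from running counts rather than by adding already-rounded percentages, the last digit can differ slightly from a hand-built table.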

22 Grouped data: Frequency Distribution of No. of Lesions caused by Smallpox Virus in an Egg Membrane
No. of Lesions   Frequency of No. of Membranes (f)   Class Mid-Point (x)   (fx)
0 -              1                                   5                     5
10 -             6                                   15                    90
20 -             14                                  25                    350
30 -             14                                  35                    490
40 -             17                                  45                    765
50 -             8                                   55                    440
60 -             9                                   65                    585
70 -             3                                   75                    225
80 -             6                                   85                    510
90 -                                                 95
100 -                                                105
110 -                                                115
Total            80                                                        3670
Note that the dash symbol (-) means 'up to but not including' the next tabulated value. (That is, according to the table in Figure 2, 10- means 10 is the lower limit while 19 is the upper limit. The value 15 is therefore the midpoint for the class interval 10-.)

23

24 Line Graphs

25 [Table: COUNT for DUME, FLEXIP and LADY PEPETA, rows ONE to FIVE. Surviving values: ONE 12 9 3; TWO 14 4; THREE 2; FOUR; FIVE 1]

26 Measures
Location (central tendency): Mean, Median, Mode
Spread: Range, Variance, Standard deviation

27 Mean
The conventional average
The sum of the observations divided by the number of observations

28 Worked Example
The mean height is calculated by adding the heights of the ten men and dividing the sum by 10.
Arithmetic mean: x̄ = 1710 / 10 = 171

29 Frequency distributions
The arithmetic mean can also be calculated from frequency distributions:
x̄ = ∑ xᵢfᵢ / ∑ fᵢ
Example:

30 Grouped data
The class midpoint should be used when calculating the mean.
Example:

31 The following table shows the hourly wage rates of eight sampled construction workers.
Worker                 1    2    3    4    5    6    7    8
Hourly wage rate ($)   35   38   46   60   65   69   72   78
Calculate the mean, median and mode.
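One way to check a hand calculation for this exercise is with Python's standard `statistics` module. This is a minimal sketch, not the lecturer's method; the variable names are illustrative, and treating "every value occurs once" as "no mode" is a stated convention.

```python
import statistics
from collections import Counter

# Hourly wage rates ($) of the eight sampled construction workers.
hourly_wages = [35, 38, 46, 60, 65, 69, 72, 78]

# Mean: sum of observations divided by the number of observations.
mean_wage = statistics.mean(hourly_wages)

# Median: n is even, so this is the mean of the two middle observations.
median_wage = statistics.median(hourly_wages)

# Mode: CONVENTION assumed here is that data in which every value
# occurs exactly once have no mode.
highest_frequency = max(Counter(hourly_wages).values())
modes = statistics.multimode(hourly_wages) if highest_frequency > 1 else []

print("mean:", mean_wage)
print("median:", median_wage)
print("modes:", modes if modes else "no mode")
```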

32 The following table shows the daily wages of a random sample of health workers. Calculate the mean, median and mode.
Daily Wages ($), Number of Workers per class: 5, 15, 25, 30, 18, 7
Total: 100

33 Daily Wages ($)
Number of Workers (f)   Class Mark (x)   f × x
5                       299.5            1,497.5
15                      499.5            7,492.5
25                      699.5            17,487.5
30                      899.5            26,985.0
18                      1,099.5          19,791.0
7                       1,299.5          9,096.5
Total 100                                82,350.0
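The grouped mean follows from the frequencies and class marks by x̄ = ∑ xᵢfᵢ / ∑ fᵢ. The sketch below recomputes each product f × x as a check on the column; the variable names are illustrative only.

```python
# Number of workers (f) and class marks (x) for the daily wage classes.
workers = [5, 15, 25, 30, 18, 7]
class_marks = [299.5, 499.5, 699.5, 899.5, 1099.5, 1299.5]

# Product f * x for each class.
products = [f * x for f, x in zip(workers, class_marks)]

total_workers = sum(workers)    # sum of f
total_products = sum(products)  # sum of f * x

# Grouped mean: sum of f * x divided by sum of f.
grouped_mean = total_products / total_workers

for f, x, fx in zip(workers, class_marks, products):
    print(f"{f:>3} {x:>8.1f} {fx:>10.1f}")
print("total f:", total_workers, "total fx:", total_products)
print("mean daily wage:", grouped_mean)
```

Recomputing the products gives 17,487.5 and 26,985.0 for the third and fourth classes, a total of 82,350.0, and a mean of 823.5.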

34 Mean
Advantages
All values in the distribution are used in its calculation, so it can be regarded as more representative than the other two measures.
Its method of calculation is simple and most people understand the meaning of its result.
Its result can easily be used in further analysis.
Disadvantages
Its result can be easily distorted by extreme values. As such, it may be much lower or higher than the bulk of the values and become unrepresentative. E.g. duration of stay in hospital (in days): 5, 5, 5, 7, 10, 20, 102 (mean = 22). This does not reflect the typical duration of stay.
In the case of open-ended classes, the mean can be calculated only if their class marks are determined. If such classes contain a large proportion of the values, the mean may be subject to substantial error.
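The hospital-stay example can be reproduced in a few lines to show how one extreme value pulls the mean away from the bulk of the data while the median stays put. This is an illustrative sketch using the seven durations from the slide; the variable names are assumptions.

```python
import statistics

# Duration of stay in hospital (days), from the example above.
stay_days = [5, 5, 5, 7, 10, 20, 102]

mean_stay = statistics.mean(stay_days)      # 154 / 7
median_stay = statistics.median(stay_days)  # 4th of 7 ordered values

# Dropping the single extreme value (102) shows how strongly it affects the mean.
mean_without_extreme = statistics.mean(stay_days[:-1])

print("mean:", mean_stay)
print("median:", median_stay)
print("mean without the 102-day stay:", round(mean_without_extreme, 2))
```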

35 Median
The median is the 50th percentile, i.e. the middle value of the ordered observations
Generally, when 'n' (the number of observations) is odd, the median is the ½(n+1)th observation in the ordered data. When 'n' is even, there is no middle observation, and the median is the mean of the two middle observations
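The odd/even rule can be written out directly. The sketch below is a hypothetical helper (`median_by_rank` is not a name from the slides) applied to two data sets that appear elsewhere in this presentation.

```python
def median_by_rank(values):
    """Median using the (n + 1) / 2 rule on the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th observation (1-based) is index n // 2 (0-based).
        return ordered[middle]
    # Even n: mean of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Odd n: hospital stays (days); the 4th of 7 ordered values.
print(median_by_rank([5, 5, 5, 7, 10, 20, 102]))
# Even n: hourly wages ($); the mean of the 4th and 5th ordered values.
print(median_by_rank([35, 38, 46, 60, 65, 69, 72, 78]))
```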

36 The median is less efficient than the mean because it takes no account of the magnitude of most of the observations.
The median is much less amenable than the mean to mathematical treatment, so it is less used in more elaborate statistical techniques.
It is a good measure if the data are distributed asymmetrically.

37 Mode
The number that is found most frequently in a set of numbers
8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7: two modes, 6 and 8
5, 4, 9, 7, 6, 3, 8: no mode
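Both examples can be checked by counting how often each value occurs. This is a minimal sketch; the function name `find_modes` is illustrative, and returning an empty list when every value occurs once is the "no mode" convention used on this slide.

```python
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    if not values:
        raise ValueError("mode of empty data is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: treated as "no mode".
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(find_modes([8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7]))  # 6 and 8 each occur three times
print(find_modes([5, 4, 9, 7, 6, 3, 8]))              # no value repeats
```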

38 Mode
Advantages
Its result is not affected by extreme values or open-ended classes.
If data are not grouped, it can be determined easily.
Disadvantages
It has to be supplemented by other statistics.
It is difficult to obtain an accurate estimate of the mode if the values are classified into a frequency distribution.

39 How to select a suitable measure
Always select the mean unless there is a special reason for choosing one of the other two measures.
Select the median if the distribution contains a substantial number of extremely large or small values.
Select the mode if an integral (whole-number) result is preferred, as in cases where the data are on an ordinal scale.

40 Variability measure: The Range
The difference between the maximum value and the minimum value
Seldom used in statistical analysis because it depends only on the two extreme values, which may be faulty

41 Variance
The squared amount of spread or variability
The symbol s² is used when we are referring to the variance of a sample and the symbol σ² when we are referring to the variance of a population.
Example:

42 Similar formula

43 ∑(xᵢ − x̄) is always zero, which is why each deviation is squared before summing.
For a population: σ² = ∑(xᵢ − μ)² / N
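The point about deviations summing to zero, and the step from variance to standard deviation, can be illustrated numerically. The sketch below reuses the hospital-stay durations from slide 34. The n − 1 divisor for the sample variance s² is the usual convention and is an assumption here, since the slide's own formula did not survive; all variable names are illustrative.

```python
import math
import statistics

# Duration of stay in hospital (days), reused from slide 34.
data = [5, 5, 5, 7, 10, 20, 102]
n = len(data)
mean = sum(data) / n

# Deviations from the mean always sum to zero, so they cannot measure spread.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, giving a usable sum of squared deviations.
sum_of_squares = sum(d * d for d in deviations)

# Population variance divides by N.
# ASSUMPTION: sample variance divides by n - 1 (the usual convention).
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum_of_squares)
print("population variance:", round(population_variance, 2))
print("sample variance:", round(sample_variance, 2))
print("sample standard deviation:", round(sample_sd, 2))

# Cross-check against the standard library.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
```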

44 Standard deviation Calculate the standard deviation.

