Descriptive Statistics Dr.Ladish Krishnan Sr.Lecturer of Community Medicine AIMST.


1 Descriptive Statistics Dr.Ladish Krishnan Sr.Lecturer of Community Medicine AIMST

2 Introduction Types of data Sampling Data collection Descriptive statistics(Numerical & Graphical) Tests of statistical significance Correlation and Regression

3 Descriptive statistics Used to summarize data in a form that permits the clearest presentation of the information and facilitates useful comparisons between study groups or populations Various descriptive statistics are available Frequently used methods for presenting and summarizing data – numerical & graphical

4 Data Presentation Numerical Tables (Simple & Frequency distribution) Measures of Central Tendency Measures of Spread or Variability(Dispersion) Graphical

5 Tables Devices for presenting data drawn from masses of statistical data Can be simple or complex depending upon the number of measurements

6 General principles in designing Tables Tables should be numbered Title must be given (brief & self explanatory) Headings of columns or rows should be clear & concise Data must be presented according to importance (chronologically, alphabetically or geographically)

7 General principles in designing Tables If percentages or averages are to be compared, they should be placed as close together as possible Tables should not be too large Most people find it easier to scan the data from top to bottom (vertical) Footnotes may be given for providing explanatory notes or additional information

8 Simple table
TABLE 1 Population of some states of Malaysia

States      Population 2010
Johor       3,305,000
Kedah       1,966,000
Kelantan    1,670,000
Melaka        771,000

*Source: Department of Statistics Malaysia

9 Frequency distribution Table A frequency distribution lists, for each value (or small range of values) of a variable, the number or proportion of times that observation occurs in the study population.

10 Frequency distribution table
TABLE 2 Age distribution of polio patients

Age (Years)   Number of patients
0 - 4         35
5 - 9         18
10 - 14       11
15 - 19        8
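As a small illustration of the "number or proportion" wording in the definition of a frequency distribution, the counts in Table 2 can be converted to proportions. This is only a sketch: the variable names are illustrative, and the counts are copied from the table.

```python
# Counts copied from Table 2 (age distribution of polio patients).
age_groups = ["0 - 4", "5 - 9", "10 - 14", "15 - 19"]
counts = [35, 18, 11, 8]

total = sum(counts)  # total number of patients in the table

# Proportion of patients falling in each age group.
proportions = [count / total for count in counts]

for group, count, proportion in zip(age_groups, counts, proportions):
    print(f"{group:>8}  {count:>3}  {proportion:.1%}")
```

The proportions sum to 1, which is a quick check that no group was dropped.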

11 Two-by-two table (r-by-c contingency table)
Two-by-two table summarizing data from a case-control study of oral contraceptive (OC) use and breast cancer

                Breast cancer
OC use      Cases    Controls    Total
Ever          273       2,641    2,914
Never         716       7,260    7,976
Total         989       9,901   10,890

12 Summary Statistics Tables & Graphs Discrete variable: proportion of individuals falling within each category Continuous variable: - Measures of Central Tendency - Measure of Variability(Dispersion) or Spread

13 Data Presentation Numerical Tables (Simple & Frequency distribution) Measures of Central Tendency Measures of Spread or Variability(Dispersion) Graphical

14 Measures of Central Tendency Mean Median Mode

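15 Mean (Arithmetic Mean) Commonly used measure of central tendency Calculated simply by adding all the observed values and dividing by the total sample size of the group Mean (“X bar”): $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $X_i$ are the observed values and $n$ is the sample size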
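A minimal sketch of the calculation described on this slide: add the observed values and divide by the sample size. The data are the eight values from the range example on slide 31, and the variable names are illustrative.

```python
# Example distribution taken from slide 31 (the range example).
values = [15, 15, 15, 20, 20, 21, 25, 36]

n = len(values)          # sample size
mean = sum(values) / n   # sum of observations divided by sample size

print(mean)  # 20.875
```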

16 Advantages It is familiar to most people It reflects the inclusion of every item in the data Utilize all values It is easily used with other statistical measurements The mean is the center of gravity of the data and, easy to understand and to calculate Important for statistical analyses and its applications

17 Disadvantages It can be affected by extreme values in the data set, called outliers, and therefore be biased Loss of accuracy when the distribution is skewed Including or excluding a single data value will change the mean Manually, more tedious to calculate

18 Median Is the middle observation point (50th percentile) It is the point at which half of the observations are smaller and half are larger Calculate - Arrange the observations from smallest to largest - Find the middle value e.g. 9, 7, 6, 5, 3, 1, 1 arranged in order is 1, 1, 3, 5, 6, 7, 9, so the median is 5

19 Calculation Odd Number of Measurements (n=odd value) The median is the value of the middle observation when the observations are arranged in ascending order. x = [ 1 2 3 4 5 6 7 ] n =7 Median = 4 (4th observation)

20 Calculation Even Number of Measurements (n=even value) The median is the average value of the two middle- most observations in ascending order. x = [ 1 2 3 4 5 6 7 8 ] n=8 Median = (4+5)/2= 4.5

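21 Formula… If odd number of observations, the median is the ((n+1)/2)th observation in the ordered data OR If even number of observations, the median is the average of the (n/2)th and ((n/2)+1)th observations in the ordered data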
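The odd and even cases can be sketched as a single function. This is an illustrative implementation, not part of the original slides; the function name `median` is an assumption, and note that Python lists are indexed from 0 while the slides count observations from 1.

```python
def median(observations):
    """Return the median using the odd/even rule from the slides."""
    ordered = sorted(observations)  # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, index `middle` from 0.
        return ordered[middle]
    # Even n: average of the two middle-most observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 4, 5, 6, 7]))     # 4
print(median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```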

22 Advantages Fairly easy to calculate Relatively easy to interpret - half of the sample (normally) lies above/below the median Is not affected by extreme data values Used when distribution of data is skewed Can be used with ordinal observations because calculation does not use actual values of the observations Do not need a complete data set to calculate the rank
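To illustrate the claim that the median is not affected by extreme values while the mean is, here is a short sketch using Python's standard `statistics` module. Both data sets are hypothetical and exist only for this comparison.

```python
from statistics import mean, median

# Hypothetical data: identical except for one extreme value.
values = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print(mean(values), median(values))              # mean and median agree
print(mean(with_outlier), median(with_outlier))  # only the mean shifts
```

The single extreme value pulls the mean upward, but the middle observation stays where it was.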

23 Disadvantages Manually tedious to find for a large sample which is not in order (Requires ordering) Does not utilize all data values

24 Mode The mode of a set of observations is the specific value that occurs with the greatest frequency. There may be more than one mode in a set of observations, if there are several values that all occur with the greatest frequency A mode may also not exist; this is true if all the observations occur with the same frequency

25 Mode Arrange the numbers in order by size Determine the number of instances of each numerical value The numerical value that has the most instances is the mode What is the mode for the following data? 2, 4, 5, 5, 5, 7, 8, 8, 9, 12 Mode = ?
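The counting procedure above can be sketched with `collections.Counter`. The function name `modes` is illustrative, and returning an empty list when every value occurs equally often is one possible way to encode the "a mode may not exist" case from slide 24.

```python
from collections import Counter


def modes(observations):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(observations)  # number of instances of each value
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all share the same frequency,
    # the slides treat the data set as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([2, 4, 5, 5, 5, 7, 8, 8, 9, 12]))  # [5]
```

A list is returned rather than a single number because a data set may have more than one mode.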

26 Mode Advantages Quick and easy to calculate Unaffected by extreme values Disadvantages May not be representative of the whole sample as they do not use all values Seldom gives statistical significance

27 Data Presentation Numerical Tables (Simple & Frequency distribution) Measures of Central Tendency Measures of Spread or Variability(Dispersion) Graphical

28 MEASURES OF DISPERSION (VARIATION)

29 Measures of Dispersion(Variation) Dispersion refers to the spread of the values around the central tendency These are characteristics that are used to describe the Variations and Scatter of a series of Values The series can consist of a sample of observations or a total population The values can be Grouped or Ungrouped

30 Measures of Dispersion Range Mean Deviation /Variance Standard Deviation Coefficient of Variation

31 Range The range is simply the highest value minus the lowest value. In our example distribution, 15,15,15,20,20,21,25,36 the high value is 36 and the low is 15, so the range is 36 - 15 = 21. The Range is used to measure Data Spread Not of practical importance, because it indicates only the extreme values and nothing about the dispersion of values between the two extreme values.
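A brief sketch of the range calculation, using the example distribution from this slide. The second list, `clustered`, is hypothetical and is included only to show why the range says nothing about the values between the two extremes.

```python
# Example distribution from the slide.
values = [15, 15, 15, 20, 20, 21, 25, 36]

# Hypothetical data with the same extremes but a different spread in between.
clustered = [15, 25, 25, 25, 25, 25, 25, 36]

data_range = max(values) - min(values)
clustered_range = max(clustered) - min(clustered)

print(data_range)       # 21
print(clustered_range)  # 21
```

Both lists have the same range even though their interior values are distributed quite differently.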

32 Variance/Mean Deviation The variance is a measure of how spread out a distribution is The average of squared deviations of the data points from the mean

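33 Variance The formula for the variance in a population is $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ where µ=mean and N=number of observations / scores The formula for the variance in a sample is $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X}$=sample mean and n=sample size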
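The following sketch computes both variances for the example distribution from slide 31. It assumes the usual conventions: the population variance divides the sum of squared deviations by N, and the sample variance divides by n - 1. The results are cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
from statistics import pvariance, variance

# Example distribution from slide 31.
values = [15, 15, 15, 20, 20, 21, 25, 36]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations of the data points from the mean.
squared_deviations = sum((x - mean) ** 2 for x in values)

population_variance = squared_deviations / n        # divide by N
sample_variance = squared_deviations / (n - 1)      # divide by n - 1

print(population_variance, pvariance(values))
print(sample_variance, variance(values))
```

The sample variance is always somewhat larger than the population variance computed from the same numbers, because its denominator is smaller.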

34 Standard deviation Simply the square root of the variance The SD is the most commonly used measure of dispersion with medical and health data Measure of the spread of data about their mean (very important in statistical inference) The standard deviation shows the relation that a set of scores has to the mean of the sample SD can only be fully appreciated when studied with reference to the normal curve

35 Normal Distribution (curve) Or ‘normal curve’ is an important concept in statistics Shape of the curve will depend upon the mean & standard deviation Limits on either side of the mean are called “confidence limits”
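As a hedged illustration of limits on either side of the mean, the sketch below uses `statistics.NormalDist` (Python 3.8+) to compute the proportion of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. These proportions are a standard property of the normal curve and are not stated on the slide; the helper name `proportion_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k):
    """Proportion of the distribution within k SDs either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```

Because the limits are expressed in SD units, the same proportions hold for any normal distribution, whatever its mean and standard deviation.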

36 Standard Normal Curve

37 Skewness A distribution is skewed if one of its tails is longer than the other The first distribution shown has a positive skew. This means that it has a long tail in the positive direction. The second distribution has a negative skew since it has a long tail in the negative direction. The third distribution is symmetric and has no skew. Distributions with positive skew are sometimes called "skewed to the right" whereas distributions with negative skew are called "skewed to the left."

38 Coefficient of variation SD is the variability around the mean of the distribution A direct comparison of standard deviations for samples with different means would not be informative One measure that takes this into account is the coefficient of variation (CV), calculated as CV = (SD/mean) × 100
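A short sketch of the CV formula, using the example distribution from slide 31 and the sample standard deviation from Python's `statistics` module. The rescaled list is hypothetical (the same measurements multiplied by 10, as if recorded in a smaller unit) and is there to show why the CV is easier to compare across samples with different means than the SD is.

```python
from statistics import mean, stdev


def coefficient_of_variation(observations):
    """CV = (SD / mean) x 100, using the sample standard deviation."""
    return stdev(observations) / mean(observations) * 100


# Example distribution from slide 31.
values = [15, 15, 15, 20, 20, 21, 25, 36]

# Hypothetical rescaling: the same data expressed in a unit 10 times smaller.
rescaled = [x * 10 for x in values]

print(stdev(values), stdev(rescaled))  # SD grows with the scale
print(coefficient_of_variation(values))
print(coefficient_of_variation(rescaled))  # CV stays essentially the same
```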

39 Thank you

