Descriptive Statistics Dr.Ladish Krishnan Sr.Lecturer of Community Medicine AIMST.

Introduction Types of data Sampling Data collection Descriptive statistics(Numerical & Graphical) Tests of statistical significance Correlation and Regression

Descriptive statistics Used to summarize data in a form that permits the clearest presentation of the information and facilitates useful comparisons between study groups or populations Various descriptive statistics are available Frequently used methods for presenting and summarizing data – numerical & graphical

Data Presentation Numerical Tables (Simple & Frequency distribution) Measures of Central Tendency Measures of Spread or Variability(Dispersion) Graphical

Tables Devices for presenting data drawn from masses of statistical data Can be simple or complex, depending upon the number of measurements

General principles in designing Tables Tables should be numbered A title must be given (brief & self-explanatory) Headings of columns or rows should be clear & concise Data must be presented in a meaningful order (by importance, chronologically, alphabetically or geographically)

General principles in designing Tables If percentages or averages are to be compared, they should be placed as close together as possible Tables should not be too large Most people find it easier to scan the data from top to bottom (vertically) Footnotes may be used for explanatory notes or additional information

Simple table
TABLE 1 Population of some states of Malaysia
States      Population 2010
Johor       3,305,000
Kedah       1,966,000
Kelantan    1,670,000
Melaka      771,000
*Source: Department of Statistics Malaysia

Frequency distribution Table A frequency distribution lists, for each value (or small range of values) of a variable, the number or proportion of times that observation occurs in the study population.
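
The definition above can be sketched in a few lines of Python. This is only an illustration: the `ages` list is hypothetical (the values of Table 2 are not preserved in this transcript), and `frequency_table` is a helper name invented for the example.

```python
from collections import Counter

# Hypothetical ages (years) of patients; NOT taken from the slides.
ages = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]


def frequency_table(values):
    """Return (value, count, proportion) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, count / n) for value, count in sorted(counts.items())]


# Print one row per distinct value: value, number, proportion.
for value, count, proportion in frequency_table(ages):
    print(f"{value:>5} {count:>6} {proportion:>10.2f}")
```

For a continuous variable, the same idea applies after grouping values into small ranges (class intervals) first.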

Frequency distribution table
TABLE 2 Age distribution of polio patients
Age (Years)    Number of patients
(table values not preserved in this transcript)

Two-by-two table (r-by-c contingency table)
Two-by-two table summarizing data from a case-control study of oral contraceptive (OC) use and breast cancer
Rows: OC use (Ever, Never, Total)
Columns: Breast cancer (Cases, Controls, Total)
(cell values not preserved in this transcript; only the fragment ",890" survives)

Summary Statistics Tables & Graphs Discrete variable: proportion of individuals falling within each category Continuous variable: - Measures of Central Tendency - Measure of Variability(Dispersion) or Spread

Measures of Central Tendency Mean Median Mode

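Mean (Arithmetic Mean) Commonly used measure of central tendency Calculated simply by adding all the observed values and dividing by the total sample size of the group Mean (“X bar”): $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ are the observed values and $n$ is the sample size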
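
As a minimal sketch of this calculation, the snippet below adds the observed values and divides by the sample size. The function name `arithmetic_mean` is an assumption made for the example, and the seven observations are borrowed from the median example later in the deck.

```python
def arithmetic_mean(values):
    """Sum of the observed values divided by the sample size."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Observations borrowed from the median example later in the deck.
observations = [9, 7, 6, 5, 3, 1, 1]
print(arithmetic_mean(observations))  # 32 / 7
```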

Advantages It is familiar to most people It uses every value in the data It is easily used with other statistical measurements It is the center of gravity of the data and is easy to understand and to calculate It is important for statistical analyses and their applications

Disadvantages It can be affected by extreme values in the data set, called outliers, and can therefore be misleading It is less representative of a typical value when the distribution is skewed Including or excluding a single data value will change the mean Manually, it is more tedious to calculate

Median Is the middle observation point (50th percentile) It is the point at which half of the observations are smaller and half are larger To calculate: - Arrange the observations from smallest to largest - Find the middle value e.g. for 9, 7, 6, 5, 3, 1, 1 the ordered values are 1, 1, 3, 5, 6, 7, 9 and the median is 5

Calculation Odd Number of Measurements (n = odd value) The median is the value of the middle observation once the data are in ascending order. x = [ ] n = 7 Median = 4 (4th observation)

Calculation Even Number of Measurements (n = even value) The median is the average of the two middle-most observations once the data are in ascending order. x = [ ] n = 8 Median = (4+5)/2 = 4.5

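Formula… If there is an odd number of observations, the median is the ((n+1)/2)th observation OR If there is an even number of observations, the median is the average of the (n/2)th and (n/2 + 1)th observations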
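
The odd and even cases can be sketched as one small function. The name `median` and the even-sized list are assumptions for illustration: the slide values for the n = 8 case are not preserved, so the list below was simply chosen to give the same result, 4.5.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)  # step 1: arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation, counting from 1.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [9, 7, 6, 5, 3, 1, 1]        # from the median slide
even_sample = [1, 2, 3, 4, 5, 6, 7, 8]    # hypothetical values
print(median(odd_sample))
print(median(even_sample))
```

Sorting first is what makes the median tedious by hand for large unordered samples, as the disadvantages slide notes.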

Advantages Fairly easy to calculate Relatively easy to interpret - half of the sample (normally) lies above/below the median Is not affected by extreme data values Used when distribution of data is skewed Can be used with ordinal observations because calculation does not use actual values of the observations Do not need a complete data set to calculate the rank

Disadvantages Manually tedious to find for a large sample which is not in order (Requires ordering) Does not utilize all data values

Mode The mode of a set of observations is the specific value that occurs with the greatest frequency. There may be more than one mode in a set of observations, if there are several values that all occur with the greatest frequency A mode may also not exist; this is true if all the observations occur with the same frequency

Mode Arrange the numbers in order by size Determine the number of instances of each numerical value The numerical value that has the most instances is the mode What is the mode for the following data? 2, 4, 5, 5, 5, 7, 8, 8, 9, 12 Mode = ?
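
The counting procedure above can be sketched as follows. The helper name `modes` is an assumption for the example; it returns a list so that it can report several modes, or none when every value occurs equally often.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode exists."""
    counts = Counter(values)  # number of instances of each value
    if not counts:
        return []
    top = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(count == top for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == top)


data = [2, 4, 5, 5, 5, 7, 8, 8, 9, 12]  # data from the slide
print(modes(data))
```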

Mode Advantages Quick and easy to calculate Unaffected by extreme values Disadvantages May not be representative of the whole sample, as it does not use all values Seldom used in further statistical analysis

MEASURES OF DISPERSION (VARIATION)

Measures of Dispersion(Variation) Dispersion refers to the spread of the values around the central tendency These are characteristics that are used to describe the Variations and Scatter of a series of Values The series can consist of a sample of observations or a total population The values can be Grouped or Ungrouped

Measures of Dispersion Range Mean Deviation /Variance Standard Deviation Coefficient of Variation

Range The range is simply the highest value minus the lowest value. In our example distribution, 15, 15, 15, 20, 20, 21, 25, 36, the high value is 36 and the low is 15, so the range is 36 − 15 = 21. The range is used to measure data spread It is of limited practical importance, because it reflects only the two extreme values and says nothing about the dispersion of the values between them.
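
A minimal sketch of the range calculation, using the example distribution from the slide (the function name `value_range` is an assumption for the example):

```python
def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


scores = [15, 15, 15, 20, 20, 21, 25, 36]  # example distribution from the slide
print(value_range(scores))  # 36 - 15
```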

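Variance/Mean Deviation The variance is a measure of how spread out a distribution is It is the average of the squared deviations of the data points from the mean (The mean deviation, by contrast, is the average of the absolute deviations from the mean)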

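Variance The formula for the variance in a population is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ where µ = mean and N = number of observations / scores The formula for the variance in a sample is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$ where $\bar{X}$ = sample mean and n = sample size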
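
As a hedged illustration, the sketch below computes both versions of the variance for the example distribution from the range slide. It assumes the usual convention that the population form divides by N and the sample form by n − 1; the function names are invented for the example, and the result is cross-checked against Python's standard `statistics` module.

```python
import statistics


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


scores = [15, 15, 15, 20, 20, 21, 25, 36]  # example distribution from the range slide
print(population_variance(scores), statistics.pvariance(scores))
print(sample_variance(scores), statistics.variance(scores))
```

The sample value is always the larger of the two, because the same sum of squares is divided by a smaller number.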

Standard deviation Simply the square root of the variance The SD is the most commonly used measure of dispersion with medical and health data It measures the spread of the data about their mean (very important in statistical inference) It shows how far a set of scores typically lies from the mean of the sample The SD is best appreciated with reference to the normal curve
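
A short sketch of "square root of the variance", again on the example distribution from the range slide. It assumes the sample (n − 1) form of the variance, which the slide does not specify; `statistics.variance` and `statistics.stdev` are standard-library functions.

```python
import math
import statistics

scores = [15, 15, 15, 20, 20, 21, 25, 36]  # example distribution from the range slide

# Sample variance (n - 1 in the denominator), then its square root.
sample_var = statistics.variance(scores)
sample_sd = math.sqrt(sample_var)

print(round(sample_sd, 2))
# statistics.stdev applies the same definition in one step.
print(round(statistics.stdev(scores), 2))
```

Unlike the variance, the SD is expressed in the same units as the original observations, which is one reason it is easier to report.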

Normal Distribution (curve) The normal distribution, or ‘normal curve’, is an important concept in statistics The shape of the curve depends upon the mean & standard deviation Limits set a given number of SDs on either side of the mean (e.g. mean ± 1.96 SD, enclosing about 95% of values) are called “confidence limits”
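
To make the link between the SD and the normal curve concrete, the sketch below computes the share of a normal distribution lying within k SDs of the mean. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `proportion_within` is an assumption for the example.

```python
from statistics import NormalDist


def proportion_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


for k in (1, 1.96, 2, 3):
    print(f"mean ± {k} SD: {proportion_within(k):.4f}")
```

The proportions do not depend on the particular mean and SD chosen, which is why a single standard normal curve can serve for all normal distributions.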

Standard Normal Curve

Skewness A distribution is skewed if one of its tails is longer than the other The first distribution shown has a positive skew. This means that it has a long tail in the positive direction. The second distribution has a negative skew since it has a long tail in the negative direction. The third distribution is symmetric and has no skew. Distributions with positive skew are sometimes called "skewed to the right" whereas distributions with negative skew are called "skewed to the left."
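
The direction of skew can also be checked numerically. The slides give no formula, so the sketch below assumes one common choice, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); the function name and the mirrored and symmetric data sets are illustrative only.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed, common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tailed = [15, 15, 15, 20, 20, 21, 25, 36]  # example distribution from the range slide
left_tailed = [-x for x in right_tailed]         # mirror image, hypothetical
symmetric = [1, 2, 3, 4, 5]                      # hypothetical

print(skewness(right_tailed) > 0)  # long tail in the positive direction
print(skewness(left_tailed) < 0)   # long tail in the negative direction
print(skewness(symmetric) == 0)    # no skew
```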

Coefficient of variation The SD describes the variability around the mean of a distribution A direct comparison of standard deviations for samples with different means would not be informative One measure that takes this into account is the coefficient of variation (CV), calculated as CV = (SD / mean) × 100
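
The sketch below illustrates why the CV is useful: two hypothetical samples with the same SD but very different means. It assumes the sample SD (n − 1 form), which the slide does not specify, and the function name is invented for the example.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (SD / mean) x 100, using the sample SD."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical samples: identical spread (SD = 2), different means.
sample_a = [8, 10, 12]
sample_b = [98, 100, 102]

print(round(coefficient_of_variation(sample_a), 1))  # spread is large relative to a mean of 10
print(round(coefficient_of_variation(sample_b), 1))  # same spread is small relative to a mean of 100
```

Because the CV is a ratio, it has no units, so it can also be used to compare the variability of measurements recorded on different scales.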

Thank you