BIOSTATISTICS II

RECAP
- ROLE OF BIOSTATISTICS IN PUBLIC HEALTH
- SOURCES AND FUNCTIONS OF VITAL STATISTICS
- RATES / RATIOS / PROPORTIONS
- TYPES OF DATA
- CATEGORICAL: NOMINAL / ORDINAL
- NUMERICAL: DISCRETE / CONTINUOUS / INTERVAL SCALE / RATIO SCALE
- REFERENCE TO SUMMARY STATISTICS

VARIABLES
- Dependent / independent
- Qualitative: nominal, ordinal, dichotomous
- Quantitative: discrete, continuous

NUMERICAL DATA EXAMINED THROUGH
- Frequency distribution
- Percentages, proportions, ratios, rates
- Figures
- Measures of central tendency
- Measures of dispersion

LEARNING OBJECTIVES
- From frequency tables to distributions
- Types of distributions: normal, skewed
- Central tendency: mode, median, mean
- Dispersion: variance, standard deviation

Descriptive statistics are concerned with describing the characteristics of frequency distributions:
- Where is the center?
- What is the range?
- What is the shape of the distribution?

Frequency Distributions
- Simple depiction of all the data
- Graphic, so easy to understand
- Problems: not always precisely measured; not summarized in one number or datum

Frequency Table: Test Scores (columns: Observation, Frequency)

Frequency Distribution: Frequency by Test Score

Voter Turnout in last election

Voter Turnout in election

Normally Distributed Curve

Skewed Distributions

Characteristics of the Normal Distribution
- It is symmetrical: half the cases are on one side of the center and half are on the other.
- The distribution is single peaked, not bimodal or multimodal.
- Most of the cases fall in the center portion of the curve; as values of the variable become more extreme they become less frequent, with few “outliers” in each of the “tails” of the distribution.
- It is only one of many frequency distributions, but it is the one we will focus on for most of this discussion.
- The mean, median, and mode are the same.
- The percentage of cases in any range of the curve can be calculated.

Family of Normal Curves

Summarizing Distributions
Two key characteristics of a frequency distribution are especially important when summarizing data or when making a prediction from one set of results to another:
- Central Tendency: What is in the “middle”? What is most common? What would we use to predict?
- Dispersion: How spread out is the distribution? What shape is it?

Measures of Central Tendency
The goal of measures of central tendency is to come up with the one single number that best describes a distribution of scores. That number lets us know whether the distribution tends to be composed of high scores or low scores.

Three measures of central tendency are commonly used in statistical analysis: the mode, the median, and the mean.
- Each measure is designed to represent a typical score.
- The choice of which measure to use depends on the shape of the distribution (whether normal or skewed) and the variable’s “level of measurement” (whether the data are nominal, ordinal, or interval).

Appropriate Measures of Central Tendency
- Nominal variables: Mode
- Ordinal variables: Median
- Interval-level variables: Mean, if the distribution is normal (the median is better with a skewed distribution)

Mode: Most Common Outcome (categories shown: Male, Female)

Measures of Central Tendency: Mode
- The most common observation in a group of scores.
- Distributions can be unimodal, bimodal, or multimodal.
- If the data are categorical (measured on the nominal scale), then only the mode can be calculated.
- In the table below, the most frequently occurring score (the mode) is Vanilla.

Flavor          f
Vanilla         28
Chocolate       22
Strawberry      15
Neapolitan      8
Butter Pecan    12
Rocky Road      9
Fudge Ripple    6
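As a rough illustration of how the mode is read off a frequency table, here is a minimal Python sketch. The counts are copied from the flavor table on the slide; the variable names are only illustrative.

```python
from collections import Counter

# Frequencies (f) copied from the flavor table on the slide.
flavor_counts = Counter({
    "Vanilla": 28,
    "Chocolate": 22,
    "Strawberry": 15,
    "Neapolitan": 8,
    "Butter Pecan": 12,
    "Rocky Road": 9,
    "Fudge Ripple": 6,
})

# most_common(1) returns a one-item list holding the (category, count)
# pair with the largest count, which is the mode of a nominal variable.
mode_flavor, mode_count = flavor_counts.most_common(1)[0]
print(mode_flavor, mode_count)
```

No arithmetic is performed on the categories themselves, which is why the mode is the only measure available for nominal data.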

Measures of Central Tendency: Mode
- The mode can also be calculated with ordinal and higher data, but it often is not appropriate.
- If other measures can be calculated, the mode would never be the first choice!
- For example, 7, 7, 7, 20, 23, 23, 24, 25, 26 has a mode of 7, but 7 is the lowest value and says little about where most of the scores fall.

Median: Middle-most Value
- 50% of observations are above the median and 50% are below it.
- The difference in magnitude between the observations does not matter; therefore, the median is not sensitive to outliers.
- Formula: position of the median = $(n + 1) / 2$ in the rank-ordered data.

To compute the median:
- First rank order the values of X from low to high: 85, 94, 94, 96, 96, 96, 96, 97, 97, 98.
- Then count the number of observations: 10.
- Add 1: 11.
- Divide by 2 to get the position of the middle score: 5 ½.
- The 5 ½th score lies halfway between the 5th and 6th scores. Both are 96, so the median is 96.
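The ranking steps above can be sketched in Python as follows. The function name median_by_rank is hypothetical, and the input order of the scores is shuffled on purpose to show that ranking comes first; the result is checked against the standard library's statistics.median.

```python
import statistics

def median_by_rank(values):
    """Median via the slide's steps: rank the values, then find position (n + 1) / 2."""
    ranked = sorted(values)           # step 1: rank order from low to high
    n = len(ranked)                   # step 2: count the observations
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, so one score sits in the middle.
        return ranked[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 ends in .5, so average the two scores around it.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

# The ten scores from the slide, in an arbitrary (hypothetical) input order.
scores = [96, 85, 97, 94, 96, 98, 94, 96, 97, 96]
print(median_by_rank(scores))
print(statistics.median(scores))      # library result for comparison
```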

Median: Find the Median

Measures of Central Tendency Median The number that divides a distribution of scores exactly in half. The median is the same as the 50th percentile. Better than mode because only one score can be median and the median will usually be around where most scores fall. If data are perfectly normal, the mode is the median. The median is computed when data are ordinal scale or when they are highly skewed.

Mean (Average)
- Most common measure of central tendency
- Best for making predictions
- Applicable under two conditions: 1. scores are measured at the interval level, and 2. the distribution is more or less normal (symmetrical).
- Symbolized as $\bar{X}$ for the mean of a sample and $\mu$ for the mean of a population

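Measures of Central Tendency: Mean
- The arithmetic average, computed simply by adding together all scores and dividing by the number of scores.
- It uses information from every single score.
- For a population: $\mu = \frac{\sum X}{N}$
- For a sample: $\bar{X} = \frac{\sum X}{N}$
(N is the number of scores in the population or in the sample, respectively.)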

Finding the Mean: $\bar{X} = (\sum X) / N$. If X = {3, 5, 10, 4, 3}, then $\bar{X} = (3 + 5 + 10 + 4 + 3) / 5 = 25 / 5 = 5$.

Find the Mean
- Q: 4, 5, 8, 7. A: mean = 24 / 4 = 6. Median: 6.
- Q: 4, 5, 8, 1000. A: mean = 1017 / 4 = 254.25. Median: 6.5.
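To see how a single extreme score moves the mean but barely moves the median, the two exercises can be checked with Python's standard statistics module. This is only a sketch; the variable names are illustrative.

```python
import statistics

without_outlier = [4, 5, 8, 7]
with_outlier = [4, 5, 8, 1000]   # same data with the last score replaced by 1000

mean_a = statistics.mean(without_outlier)
median_a = statistics.median(without_outlier)
mean_b = statistics.mean(with_outlier)
median_b = statistics.median(with_outlier)

print(mean_a, median_a)   # mean and median agree when there is no outlier
print(mean_b, median_b)   # the mean jumps, the median shifts only slightly
```

The mean uses the magnitude of every score, so the value 1000 pulls it far from the other three scores, while the median depends only on rank order.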

IF THE DISTRIBUTION IS NORMAL
- The mean is the best measure of central tendency.
- Most scores are “bunched up” in the middle.
- Extreme scores are less frequent, so they don’t move the mean around.

Measures of Central Tendency: Mean
If data are perfectly normal, then the mean, median and mode are exactly the same. I would prefer to use the mean whenever possible since it uses information from EVERY score.

Family of Normal Distribution Curves

Measures of Central Tendency The Shape of Distributions With perfectly bell shaped distributions, the mean, median, and mode are identical. With positively skewed data, the mode is lowest, followed by the median and mean. With negatively skewed data, the mean is lowest, followed by the median and mode.
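The ordering of the three measures under skew can be illustrated with a small Python sketch. The scores below are hypothetical, made up for this illustration and not taken from the slides; the second data set is simply the mirror image of the first.

```python
import statistics

# Hypothetical positively skewed scores: most values are low, with a long right tail.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image of the same scores (20 - x), which produces a long left tail.
negative_skew = [20 - x for x in positive_skew]

def center(values):
    """Return (mode, median, mean) for a list of scores."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )

pos_mode, pos_median, pos_mean = center(positive_skew)
neg_mode, neg_median, neg_mean = center(negative_skew)

print(pos_mode, pos_median, pos_mean)   # mode < median < mean
print(neg_mode, neg_median, neg_mean)   # mean < median < mode
```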

Means
Consider these means for weekly candy bar consumption.
- X = {7, 8, 6, 7, 7, 6, 8, 7}: $\bar{X} = (7 + 8 + 6 + 7 + 7 + 6 + 8 + 7) / 8 = 56 / 8 = 7$
- X = {12, 2, 0, 14, 10, 9, 5, 4}: $\bar{X} = (12 + 2 + 0 + 14 + 10 + 9 + 5 + 4) / 8 = 56 / 8 = 7$
What is the difference?
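One way to explore the question is to compute the mean of each candy bar data set alongside a simple measure of spread. The sketch below uses Python's statistics module; the names group_a and group_b are labels chosen here, not from the slides.

```python
import statistics

group_a = [7, 8, 6, 7, 7, 6, 8, 7]
group_b = [12, 2, 0, 14, 10, 9, 5, 4]

for name, data in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(data)
    spread = max(data) - min(data)        # range: largest minus smallest value
    sd = statistics.pstdev(data)          # population standard deviation
    print(f"group {name}: mean={mean}, range={spread}, sd={sd:.2f}")
```

Both groups share the same mean, but the scores in the second group lie much farther from it, so the mean alone does not distinguish the two distributions.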

Measures of Central Tendency: Using the Mean to Interpret Data
Describing the Population Mean: Remember, we usually want to know population parameters, but populations are too large to measure in full. So, we use the sample mean to estimate the population mean.

How well does the mean represent the scores in a distribution? The logic here is to determine how much spread is in the scores. How much do the scores "deviate" from the mean? Think of the mean as the true score or as your best guess. If every X were very close to the Mean, the mean would be a very good predictor. If the distribution is very sharply peaked then the mean is a good measure of central tendency and if you were to use the mean to make predictions you would be right or close much of the time.

Why can’t the mean tell us everything? Mean describes Central Tendency, what the average outcome is. We also want to know something about how accurate the mean is when making predictions. The question becomes how good a representation of the distribution is the mean? How good is the mean as a description of central tendency -- or how good is the mean as a predictor? Answer -- it depends on the shape of the distribution. Is the distribution normal or skewed?

What if scores are widely distributed? The mean is still your best measure and your best predictor, but your predictive power would be less. How do we describe this? With measures of variability:
- Mean Deviation
- Variance
- Standard Deviation

Measures of Variability
- Central tendency doesn’t tell us everything.
- Dispersion/deviation/spread tells us a lot about how a variable is distributed.
- We are most interested in the standard deviation ($\sigma$) and the variance ($\sigma^2$).

Dispersion
Once you determine that the variable of interest is normally distributed, ideally by producing a histogram of the scores, the next question to ask about the normally distributed curve (NDC) is its dispersion: how spread out are the scores around the mean? Dispersion is a key concept in statistical thinking. The basic question being asked is how much the scores deviate around the mean. The more “bunched up” the scores are around the mean, the better your ability to make accurate predictions.

Mean Deviation
The key concept for describing normal distributions and making predictions from them is called deviation from the mean. We could just calculate the average distance between each observation and the mean. We must take the absolute value of each distance; otherwise the deviations would just cancel out to zero!
Formula: Mean Deviation = $\frac{\sum |X_i - \bar{X}|}{N}$

Mean Deviation: An Example
Data: X = {6, 10, 5, 4, 9, 8}
1. Compute $\bar{X}$ (the average): $\bar{X} = 42 / 6 = 7$
2. Compute $\bar{X} - X_i$ and take the absolute value to get the absolute deviations
3. Sum the absolute deviations
4. Divide the sum of the absolute deviations by N

X-bar - X_i    Abs. Dev.
7 - 6          1
7 - 10         3
7 - 5          2
7 - 4          3
7 - 9          2
7 - 8          1
Total          12

Mean deviation: 12 / 6 = 2
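The four steps of the example translate directly into a short Python function. This is a minimal sketch; the function name mean_deviation is chosen here for illustration.

```python
def mean_deviation(values):
    """Average absolute distance between each observation and the mean."""
    n = len(values)
    mean = sum(values) / n                             # step 1: compute the mean
    abs_deviations = [abs(mean - x) for x in values]   # step 2: absolute deviations
    total = sum(abs_deviations)                        # step 3: sum them
    return total / n                                   # step 4: divide by N

data = [6, 10, 5, 4, 9, 8]
print(mean_deviation(data))
```

Without the abs() call the deviations would sum to zero, which is the cancellation problem the previous slide warns about.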

What Does it Mean? On average, each observation is two units away from the mean.
Is it Really that Easy? No!
- Absolute values are difficult to manipulate algebraically.
- Absolute values cause enormous problems for calculus (the derivative of the absolute value is discontinuous at zero).
- We need something else…

Variance and Standard Deviation
Instead of taking the absolute value, we square the deviations from the mean. This yields a non-negative value for every deviation. This will result in measures we call the Variance and the Standard Deviation.

                      Sample    Population
Standard Deviation    $s$       $\sigma$
Variance              $s^2$     $\sigma^2$

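Calculating the Variance and/or Standard Deviation
Formulae:
- Variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$ for a population; $s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$ for a sample
- Standard Deviation: $\sigma = \sqrt{\sigma^2}$ for a population; $s = \sqrt{s^2}$ for a sample
Examples follow...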

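Example:
Data: X = {6, 10, 5, 4, 9, 8}; N = 6
Total of X: 42. Total of squared deviations from the mean: 28
Mean: $\bar{X} = 42 / 6 = 7$
Variance: $\sigma^2 = 28 / 6 \approx 4.67$ (population form), or $s^2 = 28 / 5 = 5.6$ (sample form)
Standard Deviation: $\sigma \approx 2.16$ (population form), or $s \approx 2.37$ (sample form)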
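The example can be reproduced with Python's standard statistics module. Since the formula lines of the slide did not survive, this sketch computes both the population form (dividing by N) and the sample form (dividing by N - 1) rather than assuming which one the slide intended.

```python
import math
import statistics

data = [6, 10, 5, 4, 9, 8]
mean = sum(data) / len(data)

# Sum of squared deviations from the mean (the "Total: 28" on the slide).
sum_sq = sum((x - mean) ** 2 for x in data)

pop_variance = sum_sq / len(data)            # divide by N
sample_variance = sum_sq / (len(data) - 1)   # divide by N - 1

print(mean, sum_sq)
print(pop_variance, math.sqrt(pop_variance))
print(sample_variance, math.sqrt(sample_variance))

# The library functions give the same results.
print(statistics.pvariance(data), statistics.pstdev(data))
print(statistics.variance(data), statistics.stdev(data))
```

Squaring removes the sign of each deviation just as the absolute value did, and taking the square root returns the result to the original units of the data.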

IN A NORMAL CURVE, THE AREA WITHIN
- 1 SD OF THE MEAN COMPRISES ABOUT 68% OF THE TOTAL AREA
- 2 SD OF THE MEAN COMPRISES ABOUT 95% OF THE TOTAL AREA
- 3 SD OF THE MEAN COMPRISES ABOUT 99.7% OF THE TOTAL AREA
(THE 68-95-99.7 RULE)
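These percentages can be checked numerically with statistics.NormalDist from the Python standard library (available in Python 3.8 and later). The helper name area_within is chosen here for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Fraction of a normal curve lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
```

Because every normal curve can be rescaled to the standard one, the same fractions hold for any mean and standard deviation.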

COEFFICIENT OF VARIATION
- Measures the spread of a data set as a proportion of its mean
- Expressed as a percentage
- It is the ratio of the sample standard deviation to the sample mean
- The CV of a population is based on the expected value and SD of a random variable
- CV = (standard deviation / mean) x 100
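As a small sketch of the formula, the coefficient of variation can be computed in Python from the sample standard deviation and the sample mean. The data set is reused from the earlier variance example, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

data = [6, 10, 5, 4, 9, 8]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")
```

Because the CV is a ratio, it has no units, which makes it usable for comparing the spread of variables measured on different scales. It is only meaningful when the mean is not zero.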

PERCENTILES
- Give the variability of the distribution
- The pth percentile of a distribution is the value such that p% of observations fall at or below it
- The median is the 50th percentile
- Used in the calculation of growth charts for nutritional surveillance and monitoring

QUARTILES
- Values that divide the data into four groups containing equal numbers of observations
- The first and third quartiles are the 25th and 75th percentiles; the second quartile is the median
- The first quartile is the median of the observations below the median of the complete data set
- The third quartile is the median of the observations above the median of the complete data set

RANGE
- The range of a sample or data set is the difference between the largest and smallest observed values of some quantifiable characteristic.
- A simple but crude summary measure
- Like the mean, it is affected by extreme values
- Data: 2, 3, 4, 5, 6, 6, 6, 7, 7, 8, 9. Range = 9 - 2 = 7

INTERQUARTILE RANGE (IQR)
- Calculated by taking the difference between the upper and lower quartiles
- The IQR is the width of an interval which contains the middle 50% of the sample
- Smaller than the range and less affected by outliers
- Data: 2, 3, 4, 5, 6, 6, 6, 7, 7, 8, 9. Upper quartile = 7, lower quartile = 4, IQR = 7 - 4 = 3
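The range and IQR for this data set can be sketched in Python using the "median of each half" definition of quartiles given on the quartiles slide. The function name quartiles_by_halves is hypothetical, and other quartile conventions exist that can give slightly different values on other data sets.

```python
import statistics

def quartiles_by_halves(values):
    """Lower quartile, median, upper quartile using the median-of-halves rule."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower_half = ranked[:half]              # observations below the median
    upper_half = ranked[half + n % 2:]      # observations above it (skips the median when n is odd)
    return (
        statistics.median(lower_half),
        statistics.median(ranked),
        statistics.median(upper_half),
    )

data = [2, 3, 4, 5, 6, 6, 6, 7, 7, 8, 9]
q1, q2, q3 = quartiles_by_halves(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print(q1, q2, q3)
print(data_range, iqr)
```

Replacing the largest value with a much larger one would change the range but leave the IQR untouched, which is the sense in which the IQR is less affected by outliers.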

QUESTIONS ARE WELCOME. FEELING READY FOR RESEARCH AND APPROPRIATE DATA COLLECTION?

THERE WILL BE A CLASS TEST OF 50 MCQs ON THE SUBJECTS STUDIED SO FAR ON 15TH OCT 2012