© 2006 Biostatistics Basics: An introduction to an expansive and complex field



© 2006 Evidence-based Chiropractic
Common statistical terms
Data
–Measurements or observations of a variable
Variable
–A characteristic that is observed or manipulated
–Can take on different values

Statistical terms (cont.)
Independent variables
–Precede dependent variables in time
–Are often manipulated by the researcher
–The treatment or intervention that is used in a study
Dependent variables
–What is measured as an outcome in a study
–Values depend on the independent variable

Statistical terms (cont.)
Parameters
–Summary data from a population
Statistics
–Summary data from a sample

Populations
A population is the group from which a sample is drawn
–e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room
In research, it is not practical to include all members of a population
Thus, a sample (a subset of a population) is taken

Random samples
Subjects are selected from a population so that each individual has an equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
–May be biased regarding age, severity of the condition, socioeconomic status, etc.

Random samples (cont.)
Random samples are rarely utilized in health care research
Instead, patients are randomly assigned to treatment and control groups
–Each person has an equal chance of being assigned to either of the groups
Random assignment is also known as randomization
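Random assignment can be sketched in a few lines of Python. This is only an illustration, not a procedure from the slides: the function name `randomize`, the patient IDs 1 to 20, and the fixed seed are all assumptions made for the example. It shuffles the patients and splits them in half, so each person is equally likely to land in either group.

```python
import random

def randomize(patient_ids, seed=None):
    """Randomly split patients into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)      # the seed only makes the example reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical study with 20 patients numbered 1 to 20
groups = randomize(range(1, 21), seed=42)
print("Treatment:", sorted(groups["treatment"]))
print("Control:  ", sorted(groups["control"]))
```

Real trials often use more elaborate schemes (for example, blocked or stratified randomization); the simple shuffle above is just the most basic version.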

Descriptive statistics (DSs)
A way to summarize data from a sample or a population
DSs illustrate the shape, central tendency, and variability of a set of data
–The shape of data has to do with the frequencies of the values of observations

DSs (cont.)
–Central tendency describes the location of the middle of the data
–Variability is the extent to which values are spread above and below the middle values (a.k.a. dispersion)
DSs can be distinguished from inferential statistics
–DSs are not capable of testing hypotheses

Hypothetical study data (partial from book)
[Table: Case # and Visits]
Distribution provides a summary of:
–Frequencies of each of the values (value – frequency)
2 – 3
3 – 4
4 – 3
5 – 1
6 – 1
7 – 2
–Ranges of values
Lowest = 2
Highest = 7
etc.

Frequency distribution table
Columns: Frequency, Percent, Cumulative %
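The body of the frequency distribution table is not in the transcript, but such a table can be rebuilt from raw values. The sketch below assumes that the pairs listed on the hypothetical-data slide are "number of visits – number of cases" and that those 14 cases are the whole data set; both are assumptions, and the helper name `frequency_table` is made up for the example.

```python
from collections import Counter

# Visit counts rebuilt from the frequencies on the hypothetical-data slide
# (assumption: 2 visits x 3 cases, 3 visits x 4 cases, and so on; 14 cases total).
visits = [2] * 3 + [3] * 4 + [4] * 3 + [5] * 1 + [6] * 1 + [7] * 2

def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows

table = frequency_table(visits)
print(f"{'Visits':>6} {'Frequency':>9} {'Percent':>8} {'Cumulative %':>13}")
for value, freq, pct, cum in table:
    print(f"{value:>6} {freq:>9} {pct:>8.1f} {cum:>13.1f}")
```

The cumulative column always ends at 100%, which is a quick check that no value was dropped.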

Frequency distributions are often depicted by a histogram

Histograms (cont.)
A histogram is a type of bar chart, but there are no spaces between the bars
Histograms are used to visually depict frequency distributions of continuous data
Bar charts are used to depict categorical information
–e.g., Male–Female, Mild–Moderate–Severe, etc.

Measures of central tendency
Mean (a.k.a. average)
–The most commonly used DS
To calculate the mean
–Add all values of a series of numbers and then divide by the total number of elements

Formula to calculate the mean
Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
Mean of a population: $\mu = \frac{\sum X}{N}$
$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
n is the total number of values in the series of a sample and N is the same for a population

Measures of central tendency (cont.)
Mode
–The most frequently occurring value in a series
–The modal value is the highest bar in a histogram

Measures of central tendency (cont.)
Median
–The value that divides a series of values in half when they are all listed in order
–When there is an odd number of values
The median is the middle value
–When there is an even number of values
Count from each end of the series toward the middle and then average the 2 middle values
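The three measures can be checked quickly with Python's standard `statistics` module. The data below are an assumption: 14 visit counts chosen to match the frequencies on the hypothetical-data slide, not values confirmed by the source.

```python
import statistics

# Assumed visit counts (14 cases), already listed in order
visits = [2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 7, 7]

mean = statistics.mean(visits)      # sum of all values divided by n
median = statistics.median(visits)  # even n: average of the 2 middle values
mode = statistics.mode(visits)      # most frequently occurring value

print("Mean:  ", round(mean, 2))
print("Median:", median)
print("Mode:  ", mode)
```

With 14 values the median is the average of the 7th and 8th values, which illustrates the even-count rule described above.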

Measures of central tendency (cont.)
Each of the three methods of measuring central tendency has certain advantages and disadvantages
Which method should be used?
–It depends on the type of data that is being analyzed
–e.g., categorical, continuous, and the level of measurement that is involved

Levels of measurement
There are 4 levels of measurement
–Nominal, ordinal, interval, and ratio
1. Nominal
–Data are coded by a number, name, or letter that is assigned to a category or group
–Examples
Gender (e.g., male, female)
Treatment preference (e.g., manipulation, mobilization, massage)

Levels of measurement (cont.)
2. Ordinal
–Is similar to nominal because the measurements involve categories
–However, the categories are ordered by rank
–Examples
Pain level (e.g., mild, moderate, severe)
Military rank (e.g., lieutenant, captain, major, colonel, general)

Levels of measurement (cont.)
Ordinal values only describe order, not quantity
–Thus, severe pain is not the same as 2 times mild pain
The only mathematical operations allowed for nominal and ordinal data are counting of categories (plus, for ordinal data, comparing ranks)
–e.g., 25 males and 30 females

Levels of measurement (cont.)
3. Interval
–Measurements are ordered (like ordinal data)
–Have equal intervals
–Do not have a true zero
–Examples
The Fahrenheit scale, where 0° does not correspond to an absence of heat (no true zero)
In contrast to Kelvin, which does have a true zero

Levels of measurement (cont.)
4. Ratio
–Measurements have equal intervals
–There is a true zero
–Ratio is the most advanced level of measurement, which can handle most types of mathematical operations

Levels of measurement (cont.)
Ratio examples
–Range of motion
No movement corresponds to zero degrees
The interval between 10 and 20 degrees is the same as between 40 and 50 degrees
–Lifting capacity
A person who is unable to lift scores zero
A person who lifts 30 kg can lift twice as much as one who lifts 15 kg

Levels of measurement (cont.)
NOIR is a mnemonic to help remember the names and order of the levels of measurement
–Nominal Ordinal Interval Ratio

Levels of measurement (cont.)
Measurement scale | Permissible mathematical operations | Best measure of central tendency
Nominal | Counting | Mode
Ordinal | Greater or less than operations | Median
Interval | Addition and subtraction | Symmetrical: Mean; Skewed: Median
Ratio | Addition, subtraction, multiplication and division | Symmetrical: Mean; Skewed: Median

The shape of data
Histograms of frequency distributions have shape
Distributions are often symmetrical with most scores falling in the middle and fewer toward the extremes
Many biological measurements are approximately symmetrically distributed and approximate a normal curve (a.k.a. bell-shaped curve)

The shape of data (cont.)
[Figure: line depicting the shape of the data]

The normal distribution
Data whose frequency distribution follows a normal curve are said to have a normal distribution (a.k.a. Gaussian distribution)
Properties of a normal distribution
–It is symmetric about its mean
–The highest point is at its mean
–The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching zero

The normal distribution (cont.)
[Figure: normal curve with the mean marked]
A normal distribution is symmetric about its mean
As one moves away from the mean in either direction the height of the curve decreases, approaching, but never reaching zero
The highest point of the overlying normal curve is at the mean

The normal distribution (cont.)
Mean = Median = Mode

Skewed distributions
The data are not distributed symmetrically in skewed distributions
–Consequently, the mean, median, and mode are not equal and are in different positions
–Scores are clustered at one end of the distribution
–A small number of extreme values are located at the opposite end

Skewed distributions (cont.)
Skew is always toward the direction of the longer tail
–Positive if skewed to the right
–Negative if to the left
The mean is shifted the most (toward the longer tail)

Skewed distributions (cont.)
Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions
The median is a better estimate of the center of skewed distributions
–It will be the central point of any distribution
–50% of the values are above and 50% below the median
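A small numerical sketch shows how a single extreme value pulls the mean toward the tail while leaving the median where it was. The visit counts are hypothetical and are not taken from the slides.

```python
import statistics

# Hypothetical visit counts: a roughly symmetric group, then the same group
# with one extreme high value added (a positive, right-hand skew).
typical = [2, 2, 3, 3, 3, 4, 4, 5]
skewed = typical + [30]

for label, data in (("Without extreme value", typical), ("With extreme value", skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

The mean roughly doubles once the extreme value is added, whereas the median does not move, which is why the median is preferred for skewed data.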

More properties of normal curves
About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean
About 95.5% is within two SDs
About 99.7% is within three SDs
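These percentages can be reproduced from the normal cumulative distribution function. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of the area under a normal curve lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {100 * area_within(k):.2f}%")
```

Because the proportions depend only on the number of SDs from the mean, the same figures hold for any normal distribution, whatever its mean and SD.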

More properties of normal curves (cont.)

Standard deviation (SD)
SD is a measure of the variability of a set of data
The mean represents the average of a group of scores, with some of the scores being above the mean and some below
–This range of scores is referred to as variability or spread
Variance (S²) is another measure of spread

SD (cont.)
In effect, SD is the average amount of spread in a distribution of scores
The next slide shows a group of 10 patients whose mean age is 40 years
–Some are older than 40 and some younger

SD (cont.)
Ages are spread out along an X axis
The amount by which ages are spread out is known as dispersion or spread

Distances ages deviate above and below the mean
Adding the deviations always equals zero

Calculating S²
To find the average deviation, one would normally total the deviations above and below the mean and then divide by the number of values
However, the total always equals zero
–The deviations must first be squared, which cancels the negative signs

Calculating S² (cont.)
Symbol for the SD of a sample: S; for a population: σ
S² is not in the same units as the data (age), but SD is
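The calculation can be followed step by step in Python. The formula on the slide did not survive in the transcript, so this sketch assumes the usual sample convention, S² = Σ(X − X̄)² / (n − 1), with SD as its square root. The 10 ages are hypothetical, chosen only so that their mean is 40.

```python
import math
import statistics

# Hypothetical ages of 10 patients (mean = 40); not the values from the slides
ages = [30, 32, 35, 38, 40, 40, 42, 45, 48, 50]

n = len(ages)
mean_age = sum(ages) / n
deviations = [age - mean_age for age in ages]

print("Sum of deviations:", sum(deviations))            # the deviations cancel out

sum_of_squares = sum(d ** 2 for d in deviations)        # squaring removes the signs
sample_variance = sum_of_squares / (n - 1)              # S², in years squared
sample_sd = math.sqrt(sample_variance)                  # SD, back in years

print("Variance (S²):", round(sample_variance, 2))
print("SD:", round(sample_sd, 2))

# Cross-check against the standard library's sample formulas
print("Library SD:", round(statistics.stdev(ages), 2))
```

Dividing by N instead of n − 1 gives the population versions (`statistics.pvariance` and `statistics.pstdev` in Python).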

Calculating SD with Excel
1. Enter values in a column
2. Click Data Analysis on the Tools menu
3. Select Descriptive Statistics and click OK
4. Click the Input Range icon
5. Highlight all the values in the column
6. Check if labels are in the first row, check Summary Statistics, and click OK
7. SD is calculated precisely, plus several other DSs

Wide spread results in higher SDs; narrow spread results in lower SDs

Spread is important when comparing 2 or more group means
It is more difficult to see a clear distinction between groups in the upper example because the spread is wider, even though the means are the same

z-scores
The number of SDs that a specific score is above or below the mean in a distribution
Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD

z-scores (cont.)
Standardization
–The process of converting raw scores to z-scores
–The resulting distribution of z-scores will always have a mean of zero, an SD of one, and an area under the curve equal to one
The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
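Standardization is easy to verify numerically. The sketch below converts a set of hypothetical ages (not taken from the slides) to z-scores using the sample SD and then confirms that the z-scores have a mean of zero and an SD of one; the function name `z_scores` is an assumption for the example.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z-scores: (raw score - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD; use pstdev for a whole population
    return [(x - mean) / sd for x in values]

# Hypothetical ages of 10 patients
ages = [30, 32, 35, 38, 40, 40, 42, 45, 48, 50]
z = z_scores(ages)

for age, score in zip(ages, z):
    print(f"age {age}: z = {score:+.2f}")

print("Mean of z-scores:", round(statistics.mean(z), 10))
print("SD of z-scores:  ", round(statistics.stdev(z), 10))
```

A negative z-score marks a value below the mean and a positive one a value above it, so the sign carries the direction and the size carries the distance in SDs.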

z-scores (cont.)
Refer to a z-table to find proportion under the curve

z-scores (cont.)
Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z
[Figure: normal curve; each table entry corresponds to the area under the curve shown in black]
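The table values themselves are not in the transcript, but z-table entries can be computed rather than looked up. Printed z-tables differ in which area they tabulate, and the shaded area on this slide is unknown, so the sketch below offers two common conventions as assumptions: the area below z, and the area between the mean and z. Both function names are made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

def proportion_below(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

def proportion_between_mean_and(z):
    """Area under the curve between the mean (z = 0) and z."""
    return abs(standard_normal.cdf(z) - 0.5)

print(f"{'z':>4} {'below z':>8} {'mean to z':>10}")
for tenths in range(0, 16):      # z = 0.0, 0.1, ..., 1.5
    z = tenths / 10
    print(f"{z:>4.1f} {proportion_below(z):>8.4f} {proportion_between_mean_and(z):>10.4f}")
```

The proportion of scores higher than a given z-score is one minus the "below z" value, since the total area under the curve equals one.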