EHS 655 Lecture 4: Descriptive statistics, censored data

What we’ll cover today
Descriptive analysis and visualization: distribution, central tendency, dispersion
Censored data
Stata: basic commands

DESCRIPTIVE ANALYSIS
Before we can make inferences from data, we must thoroughly examine the variables in order to catch mistakes, look for patterns, find violations of statistical assumptions, generate hypotheses, and avoid headaches later.

Scope of dataset/analysis
Univariate: measurements on one variable per subject
Bivariate: measurements on two variables per subject
Multivariate: measurements on many variables per subject
Today’s focus: univariate and bivariate analyses

UNIVARIATE ANALYSES
Characteristics of a single variable, typically:
Distribution (frequency distribution)
Central tendency (mean, median, mode)
Dispersion (range, quartiles, absolute deviation, variance, standard deviation)

Distribution: categorical (table)
Stata: “tab varname”
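A one-way frequency table lists each category with its count, its percentage of all observations, and the running (cumulative) percentage. The Python sketch below computes those columns by hand to show what the table contains; the function name `frequency_table` and the job-title data are hypothetical, and the layout is only roughly similar to Stata's output.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent, cumulative percent) rows sorted by category."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        pct = 100.0 * counts[category] / n
        cumulative += pct
        rows.append((category, counts[category], pct, cumulative))
    return rows


# Hypothetical categorical variable: job title of each sampled worker.
jobs = ["welder", "painter", "welder", "carpenter", "welder", "painter"]
for category, count, pct, cum in frequency_table(jobs):
    print(f"{category:<10} {count:>3} {pct:6.2f} {cum:7.2f}")
```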

Distribution: categorical (ordinal)
Stata: “graph bar (percent), over(varname)”

Distribution: quantitative (ratio), histogram
Stata: “histogram varname, freq” (add the “normal” option to superimpose a normal curve)
http://www.inchem.org/documents/ehc/ehc/ehc214.htm

Distribution: cumulative distribution
Stata: “cumul varname, gen(newvar)” followed by “line newvar varname, sort”
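The empirical cumulative distribution gives, for each observed value x, the fraction of observations less than or equal to x. Plotting those fractions against the sorted values is conceptually what the `cumul` plus `line` commands produce. The Python sketch below computes that by hand. The function name `ecdf` and the noise data are hypothetical, and Stata's handling of tied values depends on its own options, so treat this as an illustration of the idea rather than a replica of `cumul`.

```python
def ecdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) = fraction of observations <= x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        # Emit a point only at the last occurrence of each value, so ties share F(x).
        if i == n or xs[i] != x:
            points.append((x, i / n))
    return points


# Hypothetical full-shift noise levels (dBA).
noise_dba = [85, 88, 82, 91, 85, 79]
for x, f in ecdf(noise_dba):
    print(x, round(f, 3))
```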

Distribution: exceedance fraction (the fraction of measurements that exceed an exposure limit)
http://depts.washington.edu/occnoise/content/generaltradesIDweb.pdf
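In its simplest empirical form, the exceedance fraction is the count of measurements above a limit divided by the total number of measurements. In exposure assessment it may instead be estimated from a fitted distribution (often lognormal); the Python sketch below only does the simple count. The function name, the data, and the 85 dBA limit are illustrative assumptions.

```python
def exceedance_fraction(values, limit):
    """Empirical exceedance fraction: share of measurements strictly above `limit`."""
    if not values:
        raise ValueError("need at least one measurement")
    return sum(1 for v in values if v > limit) / len(values)


# Hypothetical noise measurements (dBA) compared with an illustrative 85 dBA limit.
noise_dba = [85, 88, 82, 91, 85, 79]
print(exceedance_fraction(noise_dba, 85))   # 2 of 6 measurements exceed 85
```

Measurements exactly equal to the limit are not counted as exceedances here; whether "at the limit" counts is a definitional choice to check against the exposure standard in use.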

Central tendency: mean, median, mode
Use: identify the “center” around which data are distributed
Mean: best for symmetric, non-skewed distributions
Median: best for skewed distributions or data with outliers
Mode: a dataset may be bimodal, or may lack a mode
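A single high value pulls the mean toward it but barely moves the median, which is why the median is preferred for skewed data or data with outliers. The Python sketch below uses hypothetical air concentrations: the mean comes out near 0.352 while the median stays at 0.125.

```python
from statistics import mean, median

# Hypothetical air concentrations (mg/m3); the last value is an outlier.
conc = [0.10, 0.12, 0.15, 0.11, 0.13, 1.50]

print(f"mean   = {mean(conc):.3f}")    # pulled upward by the outlier
print(f"median = {median(conc):.3f}")  # middle of the sorted values
```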

Examples of central tendency: symmetrical, unimodal; symmetrical, bimodal; positively skewed, unimodal; negatively skewed, unimodal

When to use mean, median, mode
Stata: “tabstat varname, stat(mean median)”
Note: Stata does not have an easy way to identify the mode

Type of variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/ratio (not skewed)    Mean
Interval/ratio (skewed)        Median
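Because the mode is the natural summary for nominal data and a dataset can have more than one mode (or none), it helps to use a tool that reports every most-frequent value. Python's `statistics.multimode` (Python 3.8+) does this; the sketch below is a Python illustration only, not a Stata workaround, and the job-title data are hypothetical.

```python
from statistics import multimode

# Nominal variable with two equally common categories: bimodal.
print(multimode(["sprayer", "welder", "welder", "sprayer", "driver"]))  # ['sprayer', 'welder']

# No value repeats, so there is no single mode: every value is returned.
print(multimode([3, 1, 2]))  # [3, 1, 2]
```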

Dispersion
Measures that identify the spread of the data (i.e., how far measurements are from the “center”): range, quartiles, standard deviation (SD), variance, coefficient of variation
Stata: “sum varname” provides n, mean, SD, minimum, and maximum (hence the range)
Stata: “sum varname, detail” also provides percentiles (including the quartiles) and the variance
Stata: “tabstat varname, stat(mean sd median range iqr cv)”

Dispersion: range
Simplest measure of dispersion: range = maximum - minimum

Dispersion: quartiles
Three points that divide a dataset into four equal groups:
1st quartile (Q1) cuts off the lowest 25% of the data = 25th percentile
2nd quartile (Q2) splits the dataset in half = 50th percentile (the median)
3rd quartile (Q3) cuts off the highest 25% of the data = 75th percentile
Upper quartile - lower quartile (Q3 - Q1) is the interquartile range (IQR)
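There are several conventions for interpolating percentiles, so different software can report slightly different quartiles for the same small dataset. The Python sketch below computes the range, quartiles, and IQR with `statistics.quantiles` under its two built-in methods to show this. The data are hypothetical, and Stata's `summarize, detail` and `tabstat` use their own percentile definition, which may match neither method shown here.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical measurements, n = 8

print("range =", max(data) - min(data))   # 12 - 2 = 10

for method in ("exclusive", "inclusive"):
    q1, q2, q3 = quantiles(data, n=4, method=method)
    print(f"{method:>9}: Q1={q1}, median={q2}, Q3={q3}, IQR={q3 - q1}")
# exclusive: Q1=4.0, median=6.0, Q3=9.75, IQR=5.75
# inclusive: Q1=4.0, median=6.0, Q3=9.25, IQR=5.25
```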

Dispersion: boxplot
Stata: “graph box varname1, over(varname2)”
http://www.inchem.org/documents/ehc/ehc/ehc214.htm

Dispersion: standard deviation
Measures the variation of the data
Unlike the standard error of the mean, it does not systematically shrink as the number of measurements (n) increases
Expressed in the same units as the data
Commonly used in exposure analysis

Variance
Square of the standard deviation
Squaring the deviations from the mean eliminates negative values
Unit is the square of the measurement unit (!)
Values farther from the mean contribute more to the variance
Commonly used in exposure analysis
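The sample variance is the sum of squared deviations from the mean divided by n - 1, and the SD is its square root, which brings the result back to the original measurement units. The Python sketch below does this by hand on hypothetical data and checks it against the standard library. Here the mean is 5, the squared deviations sum to 32, so the variance is 32/7 (about 4.571) and the SD is about 2.138.

```python
from math import sqrt
from statistics import mean, stdev, variance

x = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical measurements

m = mean(x)
squared_devs = [(xi - m) ** 2 for xi in x]
s2 = sum(squared_devs) / (len(x) - 1)   # sample variance, n - 1 denominator
s = sqrt(s2)                            # SD, same units as the data

print(s2, variance(x))                  # hand calculation vs. library
print(s, stdev(x))
```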

Coefficient of variation (CV)
Normalized measure of dispersion: CV = SD / mean
Dimensionless, often expressed as a percentage
Allows comparison of datasets with different units or means
Unlike the SD (σ), it cannot be used to construct confidence intervals around the mean
When the mean is close to 0, the CV approaches infinity
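Because the CV divides the SD by the mean, it is unchanged when every value is multiplied by the same constant, for example when converting units. The Python sketch below shows that the same hypothetical concentrations expressed in ppm and in ppb give the same CV (about 16.7%), and it refuses to compute a CV when the mean is exactly 0. The function name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample SD / mean, expressed as a percentage (dimensionless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100.0 * stdev(values) / m


ppm = [10.0, 12.0, 14.0]          # hypothetical concentrations in ppm
ppb = [v * 1000 for v in ppm]     # the same concentrations in ppb
print(coefficient_of_variation(ppm), coefficient_of_variation(ppb))
```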

BIVARIATE ANALYSES
Allow us to begin exploring relationships between variables: scatter plot, correlation, cross-tabulation

Bivariate: scatter plot
Stata: “scatter varname1 varname2” (varname1 on the y-axis, varname2 on the x-axis)
http://www.inchem.org/documents/ehc/ehc/ehc214.htm

Bivariate: Pearson correlation
r (Pearson’s correlation coefficient) measures the strength and direction of the linear relationship between two variables; it ranges from -1 to +1 and is not the slope (the expected change in one variable per unit change in the other)
Assumptions: both variables are interval or ratio data; both variables are normally distributed; absence of outliers; linear relationship; homoscedasticity
Stata: “pwcorr varname1 varname2, sig”
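Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The Python sketch below computes it from that formula on hypothetical data. The first two cases show that r is +1 or -1 for any exact straight line, even though the slope there is 2 or -2. The third, a curved but always-increasing relationship, gives r of about 0.981 rather than 1. The function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their SDs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # exact linear increase, slope 2
print(pearson_r(x, [10, 8, 6, 4, 2]))   # exact linear decrease, slope -2
print(pearson_r(x, [1, 4, 9, 16, 25]))  # monotonic but curved
```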

Bivariate: correlation

Bivariate: Spearman correlation
Spearman’s rank correlation coefficient (rs or ρ) is the Pearson correlation coefficient between the ranked variables: raw scores Xi, Yi are converted to ranks xi, yi
Nonparametric (no distributional assumptions)
Assumptions: ordinal, interval, or ratio data; monotonic relationship
Stata: “spearman varname1 varname2, stats(rho p)”
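Following the definition on the slide, Spearman's ρ can be computed by converting each variable to ranks and then applying the Pearson formula to the ranks. The Python sketch below gives tied values the average of their ranks, a common convention. The function names are hypothetical, and the example reuses the curved but monotonic data for which Pearson's r is below 1: Spearman's ρ is exactly 1 because the ranks agree perfectly.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
print(average_ranks([10, 20, 20, 30]))                    # [1.0, 2.5, 2.5, 4.0]
```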

Bivariate: Spearman vs. Pearson correlation coefficients, examples

Bivariate: cross-tabulation
Stata: “tab varname1 varname2”
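A two-way cross-tabulation counts how many subjects fall into each combination of two categorical variables. The Python sketch below builds such a table as a nested dictionary. The function name `crosstab` and the job/hearing-protection data are hypothetical, and the sketch omits the row and column totals that Stata prints.

```python
from collections import Counter


def crosstab(rows, cols):
    """Two-way table of counts: {row_category: {col_category: count}}."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical data: job title and whether hearing protection was worn.
job = ["welder", "welder", "painter", "painter", "welder"]
hpd = ["yes", "no", "yes", "yes", "no"]

table = crosstab(job, hpd)
for r, row in table.items():
    print(r, row)
```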

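CENSORED DATA
Uncensored/complete: the value of each sample unit is observed/known (the default assumption)
Left censored: the value is known only to be below a threshold (e.g., below the limit of detection, LOD)
Interval censored: the value is known only to lie between a lower and an upper bound
Right censored: the value is known only to be above a threshold (e.g., above the upper limit of the measurement range)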
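One way to keep all four types in a single dataset is to store each result as the interval it is known to lie in. In that representation, an uncensored value has equal bounds and a censored value is missing one bound, or has two different bounds. The Python sketch below is a hypothetical data structure for illustration; the class name, field names, and values are assumptions, not a standard format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """A measurement known only to lie between lower and upper (None = no bound)."""
    lower: Optional[float]
    upper: Optional[float]

    def __post_init__(self):
        if self.lower is None and self.upper is None:
            raise ValueError("at least one bound is required")
        if self.lower is not None and self.upper is not None and self.lower > self.upper:
            raise ValueError("lower bound exceeds upper bound")

    def kind(self) -> str:
        if self.lower is not None and self.lower == self.upper:
            return "uncensored"
        if self.lower is None:
            return "left censored"
        if self.upper is None:
            return "right censored"
        return "interval censored"


samples = [
    Observation(0.42, 0.42),   # observed exactly
    Observation(None, 0.05),   # known only to be below 0.05
    Observation(0.05, 0.10),   # known only to lie between 0.05 and 0.10
    Observation(140.0, None),  # known only to be above 140
]
for s in samples:
    print(s.kind(), s)
```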

Exercise
Come up with one example of exposure data where you might find each type of censoring: right censoring, left censoring, interval censoring.

Common approach #1 to dealing with censored data
Assign all censored data ½ LOD
Assumes the data are uniformly distributed below the LOD (the mean of a uniform distribution from 0 to the LOD is LOD/2)
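Substitution simply replaces every non-detect with a fixed value before the usual descriptive statistics are computed. The Python sketch below applies the ½ LOD rule. It assumes non-detects are coded as `None`; the function name, the LOD, and the results are all hypothetical.

```python
def substitute_nondetects(values, lod, divisor=2.0):
    """Replace non-detects (None) with lod / divisor; detected values pass through."""
    fill = lod / divisor
    return [fill if v is None else v for v in values]


LOD = 0.10                                 # hypothetical limit of detection (ppm)
results = [0.35, None, 0.22, None, 0.18]   # None marks a result reported as < LOD

filled = substitute_nondetects(results, LOD)
print(filled)                              # [0.35, 0.05, 0.22, 0.05, 0.18]
print(sum(filled) / len(filled))           # mean after substitution
```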

Common approach #2 to dealing with censored data
Hornung and Reed (1990): assign all censored data LOD/√2
More accurate than LOD/2 when data are normally or lognormally distributed
Acceptable if fewer than ~50% of data are censored and variability is low to moderate
Less accurate than LOD/2 for highly skewed data
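The substitution value usually attributed to Hornung and Reed (1990) is LOD/√2 (about 0.71 × LOD), which is larger than LOD/2 and therefore raises the estimated mean when non-detects are present. The Python sketch below compares the two substitutions on the same hypothetical results, with non-detects coded as `None`. The function name and data are assumptions, and the sketch does not test whether the data meet the conditions listed on the slide.

```python
from math import sqrt
from statistics import mean


def substitute_nondetects(values, lod, divisor):
    """Replace non-detects (None) with lod / divisor; detected values pass through."""
    return [lod / divisor if v is None else v for v in values]


LOD = 0.10                                 # hypothetical limit of detection
results = [0.35, None, 0.22, None, 0.18]   # None = reported as < LOD

for label, divisor in (("LOD/2", 2.0), ("LOD/sqrt(2)", sqrt(2.0))):
    filled = substitute_nondetects(results, LOD, divisor)
    print(f"{label:>12}: mean = {mean(filled):.4f}")
```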

On to Stata
Basic data manipulation commands:
Define a label for a variable (“label variable”)
Create value labels (“label define”)
Assign value labels to a variable (“label values”)
Rename a variable (“rename”)
Generate a new variable (“generate”)
Replace the values of an existing variable (“replace”)

On to Stata
Anyone have to use the “Break” button?