# Basic Statistical Concepts

Donald E. Mercante, Ph.D., Biostatistics, School of Public Health, LSU-HSC


Two Broad Areas of Statistics

Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays

Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection

Descriptive Statistics

- When computed for a population of values, numerical descriptors are called parameters.
- When computed for a sample of values, numerical descriptors are called statistics.

Descriptive Statistics

Two important aspects of any population:
- Magnitude of the responses
- Spread among population members

Descriptive Statistics

Measures of Central Tendency (magnitude)

Mean
- most widely used
- uses all the data
- best statistical properties
- susceptible to outliers

Median
- does not use all the data
- resistant to outliers
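The contrast between the mean and the median can be seen in a small sketch. The readings below are made up for illustration and do not come from the presentation; the point is only that a single extreme value moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical readings, invented purely for illustration.
readings = [118, 122, 125, 127, 130]
with_outlier = readings + [210]  # same data plus one extreme value

mean_before, median_before = mean(readings), median(readings)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean shifts by about 14 units; the median shifts by 1.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before:.1f} -> {median_after:.1f}")
```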

Descriptive Statistics

Measures of Spread (variability)

Range
- simple to compute
- does not use all the data (only the smallest and largest values)

Variance
- uses all the data
- best statistical properties
- measures the average squared distance of values from a reference point (the mean)
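As a rough sketch of the two measures of spread, the following computes the range and the variance by hand for a small made-up data set (the values are hypothetical). It assumes the usual convention that the population variance divides by the number of values and the sample variance divides by one less.

```python
from statistics import mean

# Hypothetical values, invented for illustration.
values = [4, 8, 6, 5, 7]

# Range: uses only the two extreme values.
value_range = max(values) - min(values)

# Variance: average squared distance of every value from the mean.
center = mean(values)
squared_deviations = [(x - center) ** 2 for x in values]
population_variance = sum(squared_deviations) / len(values)
sample_variance = sum(squared_deviations) / (len(values) - 1)

print(value_range, population_variance, sample_variance)
```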

Properties of Statistics

- Unbiasedness: on target
- Minimum variance: most reliable

If an estimator possesses both properties, it is an MVUE (Minimum Variance Unbiased Estimator).

The sample mean and sample variance are UMVUE (Uniformly Minimum Variance Unbiased Estimators), for example when the data are normally distributed.

Inferential Statistics

- Hypothesis Testing
- Interval Estimation

Hypothesis Testing

Specifying hypotheses:
- $H_0$: "null" or no-effect hypothesis
- $H_1$: research or alternative hypothesis

Note: Only $H_0$ (the null) is tested.

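Errors in Hypothesis Testing

| Decision | Reality: $H_0$ true | Reality: $H_0$ false |
|---|---|---|
| Fail to reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |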

Hypothesis Testing

In parametric tests, actual parameter values are specified for $H_0$ and $H_1$:

$H_0: \mu \le 120$ versus $H_1: \mu > 120$

Hypothesis Testing

Another example of explicitly specifying $H_0$ and $H_1$:

$H_0$: parameter $= 0$ versus $H_1$: parameter $\neq 0$

Hypothesis Testing

General framework:
1. Specify null and alternative hypotheses
2. Specify test statistic
3. State rejection rule (RR)
4. Compute test statistic and compare to RR
5. State conclusion
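The five steps can be sketched for a one-sided, one-sample t-test. This is only an illustration: it pairs the hypotheses $H_0: \mu \le 120$ versus $H_1: \mu > 120$ with the standing SBP summary statistics quoted later in the presentation (mean 140.8, s.d. 9.5, N = 12), and that pairing is an assumption. The significance level of 0.05 and the critical value 1.796 (upper 5% point of the t distribution with 11 degrees of freedom, from a standard t table) are also assumptions of this sketch.

```python
import math

# Step 1: hypotheses. H0: mu <= 120 versus H1: mu > 120.
mu_0 = 120.0

# Summary statistics (standing SBP example; pairing with mu_0 is assumed).
sample_mean, sample_sd, n = 140.8, 9.5, 12

# Step 2: test statistic for a one-sample t-test.
def t_statistic(mean, sd, size, hypothesized_mean):
    standard_error = sd / math.sqrt(size)
    return (mean - hypothesized_mean) / standard_error

# Step 3: rejection rule. Reject H0 if t > t_{0.05, 11} = 1.796 (table value).
critical_value = 1.796

# Step 4: compute the test statistic and compare it to the rejection rule.
t = t_statistic(sample_mean, sample_sd, n, mu_0)
reject_null = t > critical_value

# Step 5: state the conclusion.
conclusion = "reject H0" if reject_null else "fail to reject H0"
print(f"t = {t:.2f}, critical value = {critical_value}: {conclusion}")
```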

Common Statistical Tests

Common Statistical Tests (cont.)

P-Values

p = probability of obtaining a result at least this extreme, given that the null hypothesis is true.

- P-values are probabilities
- $0 \le p \le 1$
- Computed from the distribution of the test statistic
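To make "at least this extreme" concrete, here is a minimal sketch using an exact binomial test. The scenario is hypothetical and not from the presentation: 9 successes in 10 trials, with a null hypothesis that the success probability is 0.5. The one-sided p-value is the probability of 9 or more successes under the null.

```python
from math import comb

def binomial_upper_tail_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 9 successes out of 10 trials.
p_value = binomial_upper_tail_p_value(9, 10)
print(f"one-sided p-value = {p_value:.4f}")
```

Here the distribution of the test statistic (the number of successes) is known exactly under the null, so the p-value is a direct sum of tail probabilities.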

Epidemiological Concepts

Rate: a proportion, specifically a fraction in which the numerator, c, is included in the denominator.
- Useful for comparing groups of unequal size

Epidemiological Concepts

Measures of morbidity:
- Incidence rate: number of new cases occurring during a given time interval, divided by the population at risk at the beginning of that period.
- Prevalence rate: total number of cases at a given time, divided by the population at risk at that time.

Epidemiological Concepts

Most people think of the probability (p) of an event as the natural way to quantify the chance that it will occur, with $0 \le p \le 1$:
- p = 0: the event will certainly not occur
- p = 1: the event is certain to occur

But there are other ways of quantifying the chances that an event will occur.

Epidemiological Concepts

Odds and Odds Ratio:
- The odds of an event are the expected number of occurrences per non-occurrence. For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of the event.
- In gambling, we say the odds are 5 to 2. This corresponds to the single number Odds = 5/2.

Epidemiological Concepts

The relationship between probability and odds:

$$O = \frac{p}{1-p}, \qquad p = \frac{O}{1+O}$$

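| Probability | Odds |
|---|---|
| 0.1 | 0.11 |
| 0.2 | 0.25 |
| 0.3 | 0.43 |
| 0.4 | 0.67 |
| 0.5 | 1.00 |
| 0.6 | 1.50 |
| 0.7 | 2.33 |
| 0.8 | 4.00 |
| 0.9 | 9.00 |

Odds < 1 correspond to probabilities < 0.5.

$0 < \text{Odds} < \infty$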
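The probability and odds values listed above can be reproduced with a short sketch, assuming the standard conversion odds = p / (1 - p) and its inverse p = odds / (1 + odds). The function names are illustrative only.

```python
def odds_from_probability(p):
    """Convert a probability in [0, 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Convert non-negative odds back to a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

probabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
odds_table = [f"{odds_from_probability(p):.2f}" for p in probabilities]

for p, o in zip(probabilities, odds_table):
    print(f"{p:.1f}  {o}")
```

The gambling example of odds 5 to 2 converts back with `probability_from_odds(5 / 2)`, which gives 5/7, or about 0.71.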

Example 1: Odds Ratio

Death sentence by race of defendant in 147 trials:

| Sentence | Blacks | Nonblacks | Total |
|---|---|---|---|
| Death | 28 | 22 | 50 |
| Life | 45 | 52 | 97 |
| Total | 73 | 74 | 147 |

Example 2: Odds Ratio

- Odds of a death sentence = 50/97 = 0.52
- For Blacks: O = 28/45 = 0.62
- For Nonblacks: O = 22/52 = 0.42
- Ratio of Black odds to Nonblack odds = (28/45)/(22/52) = 1.47

This is called the odds ratio.
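The arithmetic in this example can be checked with a few lines. The counts are those from the death sentence table (28 death and 45 life sentences for Black defendants, 22 and 52 for Nonblack defendants); the helper name `odds` is just for illustration.

```python
def odds(events, non_events):
    """Odds = number of occurrences per non-occurrence."""
    return events / non_events

overall_odds = odds(50, 97)    # all defendants: death vs. life
black_odds = odds(28, 45)      # Black defendants
nonblack_odds = odds(22, 52)   # Nonblack defendants

# Odds ratio: ratio of the two group-specific odds.
odds_ratio = black_odds / nonblack_odds

print(f"{overall_odds:.2f} {black_odds:.2f} {nonblack_odds:.2f} {odds_ratio:.2f}")
```

The same odds ratio can be written as the cross-product (28 × 52) / (45 × 22), which avoids rounding the intermediate odds.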

Logistic Regression

Odds ratios are directly related to the parameters of the logit (logistic regression) model.

Logistic regression is a statistical method that models binary data (e.g., Yes/No, T/F, Success/Failure) as a function of one or more explanatory variables.

We would like a model that predicts the probability of a success, i.e., P(Y=1), using a linear function.

Logistic Regression

Problem: Probabilities are bounded by 0 and 1, but linear functions are inherently unbounded.

Solution: Transform P(Y=1) = p to an odds, p/(1-p), which removes the upper bound. Taking the log of the odds removes the lower bound as well. Setting this result equal to a linear function of the explanatory variables gives the logit model.

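Logistic Regression

Logit or logistic regression model:

$$\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$

where $p_i$ is the probability that $y_i = 1$, the $x_{ij}$ are the explanatory variables, and the $\beta_j$ are their coefficients. The expression on the left is called the logit or log odds.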

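Logistic Regression

Probability of success:

$$p_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}$$

Odds ratio for each explanatory variable:

$$\mathrm{OR}_j = e^{\beta_j}, \quad j = 1, \ldots, k$$

where $\beta_j$ is the coefficient of the $j$-th explanatory variable $x_{ij}$ in the linear function of the logit model.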
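A minimal sketch of these two quantities, assuming the usual logit form in which the log odds equal an intercept plus a weighted sum of the explanatory variables. The coefficient values are hypothetical, not fitted to any data, and the function names are illustrative.

```python
import math

def success_probability(intercept, coefficients, x):
    """P(Y = 1) under a logit model with the given coefficients."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1 / (1 + math.exp(-linear_predictor))

def odds_ratios(coefficients):
    """Odds ratio per one-unit increase in each explanatory variable."""
    return [math.exp(b) for b in coefficients]

# Hypothetical coefficients, chosen only for illustration.
intercept, coefficients = -1.0, [0.5, -0.25]

p_base = success_probability(intercept, coefficients, [0, 0])
p_plus_one = success_probability(intercept, coefficients, [1, 0])

# Raising the first variable by one unit multiplies the odds by exp(0.5).
observed_ratio = (p_plus_one / (1 - p_plus_one)) / (p_base / (1 - p_base))
print(p_base, observed_ratio, odds_ratios(coefficients)[0])
```

Whatever the starting values of the explanatory variables, the predicted probability stays between 0 and 1, which is the point of the logit transformation.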

Screening Tests

Suppose a new screening test for herpes virus has been developed and the following summary for 1000 individuals has been compiled:

| | Has Herpes | Does Not Have Herpes |
|---|---|---|
| Screened Positive | 45 | 10 |
| Screened Negative | 5 | 940 |

Screening Tests

How do we evaluate the usefulness of such a test?

Diagnostics:
- sensitivity
- specificity
- false positive rate
- false negative rate
- predictive value positive
- predictive value negative

Generic Screening Test Table

| | With Disease | Without Disease | Total |
|---|---|---|---|
| Screened Positive | a | b | a+b |
| Screened Negative | c | d | c+d |
| Total | a+c | b+d | N |

Screening Tests
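The diagnostics can be sketched in terms of the generic table cells a, b, c, d and applied to the herpes screening counts (a = 45, b = 10, c = 5, d = 940). The formulas below are the commonly used definitions and are an assumption of this sketch; in particular, some texts define the false positive and false negative rates with different denominators.

```python
def screening_diagnostics(a, b, c, d):
    """Diagnostics for a 2x2 screening table.

    a: diseased, screened positive      b: not diseased, screened positive
    c: diseased, screened negative      d: not diseased, screened negative
    """
    return {
        "sensitivity": a / (a + c),                # diseased who screen positive
        "specificity": d / (b + d),                # non-diseased who screen negative
        "false_positive_rate": b / (b + d),        # 1 - specificity
        "false_negative_rate": c / (a + c),        # 1 - sensitivity
        "predictive_value_positive": a / (a + b),  # positives who are diseased
        "predictive_value_negative": d / (c + d),  # negatives who are not diseased
    }

# Counts from the herpes screening example (1000 individuals).
diagnostics = screening_diagnostics(a=45, b=10, c=5, d=940)
for name, value in diagnostics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the test is highly sensitive and specific (about 0.90 and 0.99), yet only about 82% of positive screens are true cases, because the disease is uncommon in this group.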

Interval Estimation

Statistics such as the sample mean, median, and variance are called point estimates. They
- vary from sample to sample
- do not incorporate precision

Interval Estimation

Take as an example the sample mean: $\bar{X}$ estimates $\mu$ (the population mean).

Or the sample variance: $S^2$ estimates $\sigma^2$ (the population variance).

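Interval Estimation

Recall Example 1, a one-sample t-test on the population mean. The test statistic was

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

where $\mu_0$ is the hypothesized mean, $s$ the sample standard deviation, and $n$ the sample size. This can be rewritten to yield:

$$P\left(-t_{\alpha/2,\,n-1} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$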

Interval Estimation

Which can be rearranged to give a $(1-\alpha)100\%$ confidence interval for $\mu$:

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Form: Estimate ± Multiple of Std Error of the Est.

Interval Estimation

Example 1: Standing SBP

Mean = 140.8, s.d. = 9.5, N = 12

95% CI for $\mu$:

140.8 ± 2.201 (9.5/sqrt(12))

= 140.8 ± 6.036

= (134.8, 146.8)
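The interval in this example can be reproduced with a short sketch of the form "estimate ± multiple of the standard error". The multiplier 2.201 is taken directly from the example (the 97.5th percentile of the t distribution with 11 degrees of freedom); the function name is illustrative. If SciPy is available, `scipy.stats.t.ppf(0.975, 11)` should return the same multiplier, but that dependency is not needed here.

```python
import math

def t_confidence_interval(mean, sd, n, t_multiplier):
    """Return (lower, upper, margin) for mean +/- t * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_multiplier * standard_error
    return mean - margin, mean + margin, margin

# Standing SBP example: mean 140.8, s.d. 9.5, N = 12, t multiplier 2.201.
lower, upper, margin = t_confidence_interval(140.8, 9.5, 12, 2.201)
print(f"140.8 ± {margin:.3f} -> ({lower:.1f}, {upper:.1f})")
```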
