Advanced Quantitative Techniques

Lab 2: Normality, Graphing Distributions, Confidence Intervals

Normal distribution

What are the Characteristics of a Normal Distribution?
– Unimodal
– Bell shaped
– Symmetric
– Mean = Mode = Median
– Skewness = 0
– Kurtosis = 3
– 68 – 95 – 99.7 rule

If a population has a Normal distribution:
– 68.3% of the dataset is within 1 standard deviation of the mean
– 95.4% of the dataset is within 2 standard deviations of the mean
– 99.7% of the dataset is within 3 standard deviations of the mean
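The three percentages can be checked numerically. The following is a minimal Python sketch (not part of the lab's Stata workflow) that uses the standard library's NormalDist; the helper name share_within is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Because the rule depends only on distance from the mean measured in standard deviations, the same shares hold for any normal distribution, whatever its mean and standard deviation.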

More about Normal distribution
– The probability of any event is the area under the density curve.
– Total area under the curve = 1 (collectively exhaustive).
– Normal distributions are an idealized description of data.
– The curve never touches the x-axis, but its tails shrink so quickly that the total area is still exactly 1.

Is population normally distributed?
use calls_311.dta
histogram POP2010, width(600) frequency normal

Is population normally distributed?
sum POP2010, detail

Variance vs. Standard Deviation
– Variance (σ²): average of squared differences from the mean
– Standard Deviation (σ): square root of the variance
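To make the two definitions concrete, here is a small Python sketch that computes both by hand and compares them with the standard library. The data values are made up for illustration. One assumption worth flagging: Stata's summarize reports the sample versions, which divide by n - 1 rather than n, so its numbers will differ slightly from the population formulas above.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, for illustration only

mean = sum(data) / len(data)

# Population variance: average of squared differences from the mean.
var_by_hand = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance.
sd_by_hand = sqrt(var_by_hand)

print("population variance:", var_by_hand, pvariance(data))
print("population SD:      ", sd_by_hand, pstdev(data))
# Sample versions divide by n - 1 instead of n.
print("sample variance:    ", variance(data))
print("sample SD:          ", stdev(data))
```

The standard deviation is usually easier to interpret because it is in the same units as the data, while the variance is in squared units.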

Skewness is a measure of symmetry
Where is the tail?
– Tail to the right: Mean > Median, STATA: Skewness > 0
– Symmetric: Mean = Median, STATA: Skewness = 0
– Tail to the left: Mean < Median, STATA: Skewness < 0

Skewness

Kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
– Peaked relative to normal: Kurtosis > 3
– Normal: Kurtosis = 3
– Flat relative to normal: Kurtosis < 3
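Both measures can be sketched from central moments. The Python below is an illustration only: the function names and data are made up, and it assumes the plain moment-ratio definitions (third and fourth central moments divided by powers of the second), under which a normal distribution has skewness 0 and kurtosis 3.

```python
def central_moment(values, r):
    """Average of (x - mean) ** r over the data."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

def skewness(values):
    # Third central moment scaled by the variance to the power 3/2.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth central moment scaled by the squared variance (not "excess" kurtosis).
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]          # made-up data, no tail on either side
right_tailed = [1, 1, 2, 2, 3, 10]   # made-up data, one large value on the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

In the right-tailed example the single large value pulls the mean above the median, which matches the Mean > Median, Skewness > 0 case.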

Example of Normal distribution
use Lab_2_Data.16.dta
histogram bwt, width(400) frequency normal

Example of Normal distribution
sum bwt, detail

Sampling
– Population: a group that includes all the cases (individuals, objects, or groups) in which the researcher is interested.
– Sample: a relatively small subset from a population.

Sampling
– Random sample: every case in the population has an equal chance of being chosen
– Stratified sample: divide the population into groups and draw a random sample from each group
– Cluster sample: group the population into small clusters, draw a simple random sample of clusters, and sample everything in the chosen clusters
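The three designs can be sketched in a few lines of Python. Everything here is hypothetical: the population of 100 cases, the "group" field, the cluster size, and the sample sizes are chosen only to show the mechanics.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 cases, each belonging to one of 4 groups.
population = [{"id": i, "group": i % 4} for i in range(100)]

# Random sample: every case has the same chance of being chosen.
simple = rng.sample(population, 10)

# Stratified sample: split into groups, then draw a random sample from each group.
groups = {}
for case in population:
    groups.setdefault(case["group"], []).append(case)
stratified = [c for members in groups.values() for c in rng.sample(members, 3)]

# Cluster sample: form small clusters (here, 10 consecutive ids each),
# draw a random sample of clusters, and keep every case in the chosen clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [c for cluster in rng.sample(clusters, 2) for c in cluster]

print(len(simple), len(stratified), len(cluster_sample))
```

Note the difference in what is randomized: stratified sampling randomizes cases within every group, while cluster sampling randomizes which clusters are included at all.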

Sampling
– Parameter: a measure used to describe a population distribution.
– Statistic: a measure used to describe a sample distribution.
– Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter.

Inference

Inferential Statistics
– We generally don’t know anything about the population distribution
– We have a sample of data from the population
– We assume that the average/mean is the most appropriate description of the population (no more median because we assume a normal distribution)
– The sample is to be random and representative (“large enough”)

Inferential Statistics
– What can we infer about the population based on a sample?
– From now on, we’re estimating the population mean (μ) with the sample mean (x̄).
– We are no longer talking about individual behavior; we’re talking about average behavior

Distribution of Means
– Take a random sample over, and over, and over again (random means each data point has an equal chance of being chosen).
– You get many sample means
– Plot the sampling distribution of these means: you get a distribution of averages (not raw data points!)

Distribution of Means
– Sampling Distribution of Means: frequency distribution (histogram) of the sample means, not of the data themselves.
– It is the distribution of all possible sample means. **This is not the distribution of x**
– If we repeatedly draw large enough random samples from a population, the distribution of the sample averages (not the population data!) is approximately a bell curve (normal distribution).
– This is the case regardless of what the population distribution looks like.
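A short simulation illustrates the idea. This Python sketch is an assumption-laden example, not the lab's data: it builds a deliberately skewed, made-up population (exponential), draws 2,000 random samples of 50 cases each, and looks at the distribution of the sample means.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up, strongly right-skewed population (not bell shaped at all).
population = [rng.expovariate(1.0) for _ in range(20_000)]

sample_size = 50
# Take a random sample over and over, keeping only the mean of each one.
sample_means = [mean(rng.sample(population, sample_size)) for _ in range(2_000)]

# The sample means center on the population mean ...
print("population mean:     ", mean(population))
print("mean of sample means:", mean(sample_means))
# ... and their spread is close to (population SD) / sqrt(sample size).
print("predicted spread:    ", pstdev(population) / sample_size ** 0.5)
print("observed spread:     ", pstdev(sample_means))
```

A histogram of sample_means would look roughly bell shaped even though a histogram of population would not, which is the distinction the slide draws between the distribution of means and the distribution of x.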

Confidence Intervals
The goal of calculating confidence intervals is to determine how sure we are that the true population mean, μ, is approximated by the sample mean, x̄.

Confidence Intervals
Confidence Level – The likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.
– 95% confidence level – there is a .95 probability that a specified interval DOES contain the population mean.
– 99% confidence level – there is 1 chance out of 100 that the interval DOES NOT contain the population mean.

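STATA: ci Command
Open Stata and calls_311.dta
. ci means calls_per_thousand, level(90)
The level() option sets the confidence level (here 90%). The output reports:
– Sample Size
– Sample Mean
– Standard Error (the sample standard deviation divided by the square root of the sample size)
– Lower Bound of the CI
– Upper Bound of the CI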
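The pieces of that output fit together as mean ± critical value × standard error. The Python sketch below shows the arithmetic under one loud assumption: it uses a normal (z) critical value from the standard library, whereas Stata's ci means uses the t distribution, so results will differ slightly, especially in small samples. The function name ci_mean and the data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_mean(values, level=95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(values)
    xbar = mean(values)                 # sample mean
    se = stdev(values) / sqrt(n)        # standard error of the mean
    # Two-sided critical value: level% in the middle, the rest split across both tails.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    return xbar - z * se, xbar + z * se

data = list(range(1, 101))  # made-up values, for illustration only

for level in (90, 95, 99):
    low, high = ci_mean(data, level)
    print(f"{level}% CI: [{low:.2f}, {high:.2f}]")
```

Running it shows the interval widening as the level rises: more confidence that the interval captures the population mean comes at the cost of a less precise interval.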

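Build a 95% CI for 311 calls per thousand people.
ci means calls_per_thousand
The default confidence level for the ci command in Stata is 95%.
Tradeoff: Precise vs. Confident. A narrower interval is more precise; a higher confidence level gives a wider interval.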

Build a CI for Bronx calls per 1,000 people that leaves a 10% chance of overestimation error.
A 10% chance in one tail means 20% across both tails, so the level is 80:
ci means calls_per_thousand if county=="005", level(80)
Build a CI for Manhattan calls per 1,000 people that leaves a 20% chance that the population mean is not captured by the interval.
ci means calls_per_thousand if county=="061", level(80)
Are they significantly different?
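Both exercises end at level(80) for different reasons, and the last question asks for a comparison of the two intervals. The Python sketch below spells out that arithmetic. The function names are made up, and the interval-overlap check is only a rough screen assumed here for illustration, not a formal significance test: intervals that do not overlap suggest a difference, but overlapping intervals do not rule one out.

```python
def level_from_one_tail(tail_pct):
    """Confidence level when tail_pct is the error chance in ONE tail only."""
    return 100 - 2 * tail_pct

def level_from_total_miss(miss_pct):
    """Confidence level when miss_pct is the total chance of missing the mean."""
    return 100 - miss_pct

def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

print(level_from_one_tail(10))    # Bronx: 10% chance of overestimation
print(level_from_total_miss(20))  # Manhattan: 20% chance the mean is not captured

# Hypothetical interval bounds, NOT results from calls_311.dta.
print(intervals_overlap((10.0, 14.0), (13.0, 18.0)))
print(intervals_overlap((10.0, 14.0), (15.0, 18.0)))
```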

Confidence intervals in a Normal distribution
