Advanced Quantitative Techniques


1 Advanced Quantitative Techniques
Lab 2: Normality, Graphing Distributions, Confidence Intervals

2 Normal distribution

3 What are the Characteristics of a Normal Distribution?
Unimodal
Bell shaped
Symmetric
Mean = Mode = Median
Skewness = 0
Kurtosis = 3
68 – 95 – 99.7 rule

4 If the population has a Normal distribution
68.2% of the data are within 1 standard deviation of the mean
95.4% of the data are within 2 standard deviations of the mean
99.7% of the data are within 3 standard deviations of the mean
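The rule can be checked numerically. The following is a minimal sketch in Python (an assumption of this write-up; the lab itself uses Stata) that computes the area under the standard normal curve within k standard deviations of the mean. The exact shares are roughly 68.27%, 95.45%, and 99.73%, so the figures quoted on the slide are rounded.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal population within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```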

5 More about Normal distribution
The probability of any event is the area under the density curve.
Total area under the curve = 1 (collectively exhaustive).
Normal distributions are an idealized description of data.
The curve approaches the x-axis in both tails but never touches it; the total area under it is still exactly 1.

6 Is population normally distributed?
use calls_311.dta
histogram POP2010, width(600) frequency normal

7 Is population normally distributed?
sum POP2010, detail

8 Variance vs. Standard Deviation
Variance (σ²): average of squared differences from the mean
Standard Deviation (σ): square root of the variance
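As a worked illustration of the two definitions, here is a small Python sketch (Python and the data values are assumptions for illustration, not part of the lab). It uses the population formulas, dividing by n. Sample statistics, such as the standard deviation Stata's summarize reports, divide by n − 1 instead, so the second function is included for comparison.

```python
from math import sqrt


def population_variance(values):
    """Average of squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squares, divided by n - 1 (the usual sample estimate)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

print("variance:", population_variance(data))
print("standard deviation:", sqrt(population_variance(data)))
print("sample variance:", sample_variance(data))
```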

9 Skewness is a measure of symmetry
Where is the tail?
Tail to the right: Mean > Median; STATA: Skewness > 0
No tail (symmetric): Mean = Median; STATA: Skewness = 0
Tail to the left: Mean < Median; STATA: Skewness < 0

10 Skewness

11 Kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
Peaked (Kurtosis > 3)
Normal (Kurtosis = 3)
Flat (Kurtosis < 3)
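To make the skewness and kurtosis numbers concrete, the sketch below computes both from central moments in Python (an assumption; the lab reads them from Stata's sum, detail output). It uses the moment-based definitions under which a normal distribution has skewness 0 and kurtosis 3. The two small data sets are made up for illustration.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Third central moment scaled by variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment scaled by variance ** 2 (normal = 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]          # no tail
right_tailed = [1, 1, 2, 2, 3, 10]   # one large value pulls the mean up

print("symmetric:", skewness(symmetric), kurtosis(symmetric))
print("right-tailed:", skewness(right_tailed), kurtosis(right_tailed))
```

In the right-tailed example the mean exceeds the median and the skewness is positive, matching the pattern on the skewness slide.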

12 Example of Normal distribution
use Lab_2_Data.16.dta
histogram bwt, width(400) frequency normal

13 Example of Normal distribution
sum bwt, detail

14 Sampling
Population – a group that includes all the cases (individuals, objects, or groups) in which the researcher is interested.
Sample – a relatively small subset from a population.

15 Sampling
Random sample
Stratified sample: divide the population into groups and draw a random sample from each group
Cluster sample: group the population into small clusters, draw a simple random sample of clusters, and sample everything in the chosen clusters
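The three sampling schemes can be sketched in a few lines of Python (an assumption; the lab does not show code for this step). The function names and the toy population grouped under labels "A", "B", and "C" are hypothetical and exist only to show the mechanics.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a random sample from each group (stratum)."""
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random, then keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for label in chosen for member in clusters[label]]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
groups = {"A": list(range(0, 10)),
          "B": list(range(10, 20)),
          "C": list(range(20, 30))}
everyone = [m for members in groups.values() for m in members]

print(simple_random_sample(everyone, 5, rng))
print(stratified_sample(groups, 3, rng))
print(cluster_sample(groups, 2, rng))
```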

16 Sampling
Parameter – A measure used to describe a population distribution.
Statistic – A measure used to describe a sample distribution.
Estimation – A process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter.

17 Inference

18 Inferential Statistics
We generally don’t know anything about the population distribution.
We have a sample of data from the population.
We assume that the average/mean is the most appropriate description of the population (we no longer use the median because we assume a normal distribution).
The sample must be random and representative (“large enough”).

19 Inferential Statistics
What can we infer about the population based on a sample?
From now on, we’re estimating the population mean (μ) with the sample mean (x̄).
We are no longer talking about individual behavior; we’re talking about average behavior.

20 Distribution of Means
Take a random sample over, and over, and over again (random means each data point has an equal chance of being chosen).
You get many sample means.
Plot the sampling distribution of these means: you get a distribution of averages (not raw data points!)

21 Distribution of Means
Sampling Distribution of Means: frequency distribution (histogram) of the sample means, not of the data themselves.
Distribution of all possible sample means
**This is not the distribution of x**
If we repeatedly draw large enough random samples from a population, the distribution of the sample averages (not the population data!) is approximately a bell curve (normal distribution). This is the case regardless of what the population distribution looks like.
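A short simulation shows the idea. This Python sketch (an assumption; the lab does not include it) builds a deliberately skewed, made-up population, draws many random samples, and collects their means. The sample size of 50 and the 2,000 repetitions are arbitrary choices. The means should center on the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# A strongly right-skewed population (exponential), clearly not normal.
population = [rng.expovariate(1.0) for _ in range(20_000)]


def sample_means(population, sample_size, n_samples, rng):
    """Mean of each of n_samples random samples of the given size."""
    return [mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]


means = sample_means(population, 50, 2_000, rng)

print("population mean:        ", mean(population))
print("mean of sample means:   ", mean(means))
print("population SD / sqrt(n):", stdev(population) / 50 ** 0.5)
print("SD of sample means:     ", stdev(means))
```

A histogram of `means` would look roughly bell shaped even though a histogram of `population` does not.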

22 Confidence Intervals
The goal of calculating confidence intervals is to determine how sure we are that the true population mean, μ, is approximated by the sample mean, x̄.

23 Confidence Intervals
Confidence Level – The likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.
– 95% confidence level – there is a .95 probability that a specified interval DOES contain the population mean.
– 99% confidence level – there is 1 chance out of 100 that the interval DOES NOT contain the population mean.

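24 STATA: ci Command
Open Stata and calls_311.dta
. ci means calls_per_thousand, level(90)
Labels on the output: Sample Size, Sample Mean, Standard Error = standard deviation / √(sample size), Lower Bound of the CI, Upper Bound of the CI. The level(90) option sets a 90% confidence level (a 10% significance level).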
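The pieces of the output fit together as mean ± critical value × standard error. The Python sketch below (an assumption; the function name and the data are hypothetical) shows that arithmetic. Note one deliberate simplification: Stata's ci means uses the t distribution, while this sketch uses the normal (z) critical value, which is close only for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean.

    Assumption: z critical value instead of the t value Stata uses,
    so the interval is slightly too narrow for small samples.
    """
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return m - z * se, m + z * se


calls = [10, 12, 14, 16, 18]  # made-up values, not from calls_311.dta

for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(calls, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Raising the level widens the interval, which is the trade-off between being precise and being confident.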

25 Build a 95% CI for 311 calls per thousand people.
The default confidence level for the ci command in Stata is 95%.
Precise vs. Confident: a higher confidence level gives a wider, less precise interval.

26 Build a CI for Bronx calls per 1,000 people that leaves a 10% chance of overestimation error.
ci means calls_per_thousand if county=="005", level(80)
Build a CI for Manhattan calls per 1,000 people that leaves a 20% chance that the population mean is not captured by the interval.
ci means calls_per_thousand if county=="061", level(80)
Are they significantly different?
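Both tasks lead to level(80) for different reasons: a 10% chance of error in one direction means 10% in each tail, and a 20% total chance of missing means 20% split across both tails. The Python sketch below (an assumption; all names and the two example intervals are hypothetical) spells out that arithmetic and adds a rough overlap check. Comparing intervals for overlap is only a quick heuristic for "significantly different", not a formal test.

```python
def level_for_one_tail_error(tail_error):
    """Two-sided level (percent) that leaves tail_error in each tail."""
    return 100 * (1 - 2 * tail_error)


def level_for_total_miss(miss_chance):
    """Two-sided level (percent) that misses with the given total chance."""
    return 100 * (1 - miss_chance)


def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


print(level_for_one_tail_error(0.10))  # 10% chance of overestimation
print(level_for_total_miss(0.20))      # 20% chance of missing either way

# Hypothetical intervals, not results from calls_311.dta.
print(intervals_overlap((40.0, 55.0), (60.0, 80.0)))
```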

27 Confidence intervals in a Normal distribution

