1
Advanced Quantitative Techniques
Lab 6, October 13th, 2016
2
Agenda today
DSSC presentation: available data & how they can help you. Use their resources now.
Final project ideas activity
Quick recap / midterm review
Intro to regression in STATA (tbc next week)
3
Final project peer brainstorming
Divide into groups of 4.
Turn to the person next to you & describe your ideas thus far in 2 minutes. (5 mins)
Regroup and describe your partner’s idea to the 4-person group. (15 mins)
1 person from each group reports back on sticking points: Ideas? Data? Tools?
4
Midterm review, cont.
Big concepts: get them right.
Problems: plug in carefully.
Explaining findings: don’t forget. Use precise language.
Logistics: Blue book, hand-written. Worth 20% of semester grade. Done in class.
5
Midterm review: A: Big concepts: get them right
e.g.
Error types: Type I, Type II
Confidence: we are 95% confident that this interval contains the true value. If we repeated the experiment 100 times, about 95 of the resulting intervals would contain the true value.
Hypotheses
Power
Dependent & independent variables
6
If something doesn’t make sense, look up other explanations
(I like graphics and cartoons.) But be careful of the source. Universities = more reliable. Wikipedia = the stats section is overrun by mathematicians and more complicated than we need.
7
B: Manipulating equations
Homework problems + in-class examples
Standard normal distribution, CI, t-test, z-stat
Knowing mean + SD, what % of observations fall below X value? Calculate the input that you’re missing (either the sample or the population or the SD). Plug into (observed value - mean)/SD. Look up the value on the z table. Remember to subtract if one-tailed. Use normal to approximate binomial if needed.
Calculate a mean, build a CI around it: mean +- SE * t-crit. Usually you’ll have to calculate SE from SD. Pretty straightforward.
Make sure you have the formulas, and are careful about plugging in. Don’t use SD when you need SE.
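The two problem types above can be sketched in a few lines of Python (standard library only), as a check on hand calculations rather than a replacement for the z table. All numbers are made up for illustration, and the t-crit value is assumed to have been looked up in a t table (df = 24, 95%, two-tailed).

```python
from math import sqrt
from statistics import NormalDist

def percent_below(x, mean, sd):
    """Share of a normal population below x: z = (x - mean) / SD, then the z-table area."""
    z = (x - mean) / sd
    return NormalDist().cdf(z)

def confidence_interval(mean, sd, n, crit):
    """mean +/- crit * SE, where SE = SD / sqrt(n). Note: SE, not SD."""
    se = sd / sqrt(n)
    return mean - crit * se, mean + crit * se

# Made-up example: mean 100, SD 15, X = 115, so z = 1.
share = percent_below(115, 100, 15)

# Made-up sample: mean 50, SD 10, n = 25; 2.064 is the table value for df = 24.
low, high = confidence_interval(50, 10, 25, 2.064)

print(round(share, 4), round(low, 3), round(high, 3))
```

Here SE = 10/5 = 2, so the interval is 50 +- 2.064 * 2. Using the SD (10) instead of the SE would make the interval five times too wide.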
8
C: Explaining findings: don’t forget
Make sure to write a concluding sentence. Hint: look back at the question. What puzzle are you trying to unravel?
9
Make a formula sheet in your own words, e.g.
standard error = SD / sq root of sample size
[sample] [pop estimate] [sample]
t statistic = (sample mean – pop mean) / standard error
Confidence interval: mean plus or minus the z (or t) stat multiplied by standard error.
Quick ref for the important Z scores.
10
Coyotes & poison
Hypothesis (H1): <28% of coyotes will survive the winter.
Null hypothesis (H0): ≥28% of coyotes will survive the winter.
We want to see where last year's actual survival (51/214 = 24%) falls relative to the overall survival percentage (~population p/mean).
s or σ = √(p*(1-p)) = √(.28*(1-.28)) = √.2016 = 0.45
s.e. = s/√n = .45/√214 = .031
t = (sample proportion - .28)/.031 = 1.33 (+-)
p = … so yes to 90% but no to 95% significance.
‘Not effective’ is not enough: tell what level of significance. Less than 95%? Less than 90%?
Note about class: yes, weights don’t overlap. (According to that data…didn’t include me, I guess!)
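A minimal Python sketch of the coyote calculation, using the figures from the slide (51 survivors out of 214, p = .28). It assumes a normal approximation for the one-tailed p-value (with n = 214 the t and z distributions are nearly identical), and it keeps full precision, so the statistic may differ slightly in the second decimal from the slide's rounded 1.33.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.28               # survival rate under the null hypothesis
survived, n = 51, 214   # last year's observed survival

p_hat = survived / n              # sample proportion, about 24%
sd = sqrt(p0 * (1 - p0))          # s = sqrt(p * (1 - p))
se = sd / sqrt(n)                 # s.e. = s / sqrt(n)
z = (p_hat - p0) / se             # negative: observed survival is below 28%

# One-tailed p-value: area below z (normal approximation, an assumption here).
p_value = NormalDist().cdf(z)

print(round(se, 3), round(z, 2), round(p_value, 3))
print("significant at 90%:", p_value < 0.10)
print("significant at 95%:", p_value < 0.05)
```

The sign of the statistic only says which side of 28% the sample falls on; the conclusion depends on comparing the p-value to the chosen alpha.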
11
Intro to regression in STATA - tbc
Open the 311 data
12
Command: Scatterplot
Relation between 311 calls & vacancy rate?
Variables: calls_per_thousand & vacant
generate vacant_rate = vacant / HSE_UNIT * 100
twoway (scatter calls_per_thousand vacant_rate) (lfit calls_per_thousand vacant_rate)
13
Command: correlate (corr)
corr calls_per_thousand vacant_rate
Get a basic idea first with correlation… or check to see if your variables might be confounding.
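For intuition about what corr reports for two variables, here is a rough Python sketch of the Pearson correlation coefficient. The function name and the five data points are made up for illustration; they are not the 311 data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up toy values standing in for vacant_rate and calls_per_thousand.
toy_vacant_rate = [1, 2, 3, 4, 5]
toy_calls = [2, 4, 5, 4, 5]

r = pearson_r(toy_vacant_rate, toy_calls)
print(round(r, 4))
```

r runs from -1 to 1. In a one-variable regression, r squared equals the R-squared reported by regress.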
14
Linear Regression
Describes a relationship between an explained variable (y) and an explanatory variable (x). You “regress y on x.”
Attempts to explain this relationship with a straight line fit.
Simple linear regression has one input (x) and one output (y).
The ideal formula to approximate the regression: y = β0 + β1*x + ε
Intercept: β0. Slope: β1. Error term: ε.
15
What are ‘residuals’ (error terms)?
Residuals (or error terms) are the difference between an observed value of the response variable and the value predicted by the regression line. Residual = observed y – predicted y Residuals represent the ‘leftover’ or ‘unexplained’ variation in the response variable after fitting the regression line.
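To make "observed y minus predicted y" concrete, the sketch below fits a least-squares line by hand in Python and lists the residuals. The helper name fit_line and the toy data are assumptions for illustration, not the 311 dataset or Stata's internals.

```python
def fit_line(xs, ys):
    """Least-squares intercept (b0) and slope (b1) for y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]

# Residual = observed y - predicted y, one per observation.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(round(b0, 2), round(b1, 2))
print([round(r, 2) for r in residuals])
```

With an intercept in the model, least-squares residuals always sum to zero (up to rounding), which is a handy check on hand calculations.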
16
Command: regress (reg)
reg calls_per_thousand vacant_rate
17
Interpreting the Output
1. Slope: The coefficient of the independent variable (β1) is the slope of the regression line. The slope is the expected change in the dependent variable for every one-unit increase in the independent variable.
2. Y-Intercept: The constant (β0), the predicted value of the dependent variable when the independent variable is zero.
18
Interpreting the Output
3. The p-value and CI of the coefficients: The p-value corresponds to the coefficient of the independent variable. If the p-value is less than alpha, you can conclude there is a statistically significant relationship between the independent variable and the dependent variable.
Or, you can examine whether zero is in the confidence interval of the coefficient. If zero is in the 95% interval, then the coefficient is not statistically different from zero at 95% confidence.
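The "is zero in the interval?" check can be sketched in Python as follows. The data are made up, and T_CRIT is assumed to come from a t table (95%, two-tailed, df = n - 2 = 3); Stata reports the interval directly, so this is only to show where the numbers come from.

```python
from math import sqrt

# Made-up toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # t-table value for 95%, two-tailed, df = 3 (looked up, not computed)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

# Standard error of the slope: sqrt(residual variance / spread of x).
ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = sqrt(ss_resid / (n - 2) / sxx)

t_stat = b1 / se_b1
ci_low, ci_high = b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1
zero_in_ci = ci_low <= 0 <= ci_high

print(round(t_stat, 3), round(ci_low, 3), round(ci_high, 3), zero_in_ci)
```

In this toy case the interval straddles zero, so the slope is not statistically different from zero at 95% confidence, even though the fitted slope is positive.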
19
How to Read Stata Output?
SS: Sum of Squares associated with three sources of variance: Model, Residual, and Total.
MS: Mean of Squares, the SS divided by the respective degrees of freedom. MS represents the model, error (residual), and total (sample) variance respectively.
F-statistic: the MS Model divided by the MS Residual; the numbers in brackets are the respective df.
Prob>F: the p-value associated with the F-statistic. It tests the hypothesis that all the model coefficients (other than the constant) are 0.
R-squared: the proportion of variance in y explained by the independent variables.
Adjusted R-squared: R-squared with a penalty for adding extraneous variables to the model. It is never greater than R-squared and increases only if the addition of one more explanatory variable improves the model more than what would be expected by chance.
Root MSE: the root of the MS Residual. This is the standard deviation of the residuals.
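As a rough illustration of how these output fields relate to each other, the Python sketch below rebuilds them by hand for a one-variable regression on made-up data. Variable names are hypothetical, and Prob>F is left out because it needs an F distribution table.

```python
from math import sqrt

# Made-up toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1  # k = number of explanatory variables

mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x

# SS: Total = Model + Residual.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_model = ss_total - ss_resid

# df and MS = SS / df.
df_model, df_resid, df_total = k, n - k - 1, n - 1
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

f_stat = ms_model / ms_resid                  # F(df_model, df_resid)
r2 = ss_model / ss_total                      # R-squared
adj_r2 = 1 - (1 - r2) * df_total / df_resid   # Adjusted R-squared
root_mse = sqrt(ms_resid)                     # Root MSE

print(round(f_stat, 2), round(r2, 3), round(adj_r2, 3), round(root_mse, 3))
```

With a single explanatory variable, the F-statistic equals the square of the slope's t-statistic, so Prob>F and the slope's p-value agree.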