Presentation on theme: "STAT 101 Dr. Kari Lock Morgan"— Presentation transcript:
1 STAT 101, Dr. Kari Lock Morgan. Synthesis: Big Picture. Essential Synthesis, Review, Speed Dating
2 Final: Monday, April 28th, 2 – 5pm. No make-ups, no excuses. 30% of your course grade. Cumulative from the entire course. Open only to a calculator and 3 double-sided pages of notes prepared only by you.
3 Help Before Final
Wednesday, 4/23: 3 – 4pm, Prof Morgan, Old Chem 216; 4 – 9pm, Stat Ed Help, Old Chem 211A
Thursday, 4/24: 5 – 7pm, Yating, Old Chem 211A
Friday, 4/25: 1 – 3pm, Prof Morgan, Old Chem 216; 3 – 4pm, REVIEW SESSION, room tbd
Sunday, 4/27: 4 – 6pm, Tori, Old Chem 211A; 6 – 7pm, Stat Ed Help, Old Chem 211A; 7 – 9pm, David, Old Chem 211A
Monday, 4/28: 12:30 – 1:30, Prof Morgan, Old Chem 216
4 Review: What is Bayes Rule? a) A way of getting from P(A if B) to P(B if A) b) A way of calculating P(A and B) c) A way of calculating P(A or B)
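As a hedged illustration of Bayes Rule, the sketch below reverses a conditional probability using the reciprocity counts that appear on slide 23 of this transcript (63 dates where both said yes, 146 male yeses, 127 female yeses, 276 dates). The function name `bayes_rule` and the variable names are assumptions made for this example.

```python
from fractions import Fraction

def bayes_rule(p_a_given_b, p_b, p_a):
    """Return P(B | A) = P(A | B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Counts taken from the reciprocity table on slide 23.
n_dates = 276
both_yes = 63
male_yes = 146
female_yes = 127

p_f_given_m = Fraction(both_yes, male_yes)  # P(female yes | male yes)
p_m = Fraction(male_yes, n_dates)           # P(male yes)
p_f = Fraction(female_yes, n_dates)         # P(female yes)

# Bayes Rule turns P(female yes | male yes) into P(male yes | female yes).
p_m_given_f = bayes_rule(p_f_given_m, p_m, p_f)
print(p_m_given_f, float(p_m_given_f))
```

Exact fractions are used so the result can be checked directly against the table: it equals the 63 mutual yeses divided by the 127 female yeses.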
5 Data Collection: The way the data are/were collected determines the scope of inference. For generalizing to the population: was it a random sample? Was there sampling bias? For assessing causality: was it a randomized experiment? Collecting good data is crucial to making good inferences based on the data.
6 Exploratory Data Analysis: Before doing inference, always explore your data with descriptive statistics. Always visualize your data! Visualize your variables and relationships between variables. Calculate summary statistics for variables and relationships between variables – these will be key for later inference. The type of visualization and summary statistics depends on whether the variable(s) are categorical or quantitative.
7 Estimation: For good estimation, provide not just a point estimate, but an interval estimate which takes into account the uncertainty of the statistic. Confidence intervals are designed to capture the true parameter for a specified proportion of all samples. A P% confidence interval can be created by bootstrapping (sampling with replacement from the sample) and using the middle P% of bootstrap statistics.
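The bootstrap recipe on the Estimation slide can be sketched as follows. This is a minimal illustration, not the course's own software: the function name `bootstrap_percentile_ci`, the number of resamples, and the seed are assumptions, and the sample is built from the 63 successful dates out of 276 reported on slide 19.

```python
import random

def bootstrap_percentile_ci(sample, stat, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap: resample with replacement, keep the middle `level`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap statistic comes from a resample of the same size as the sample.
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    lo = boot_stats[int(tail * n_boot)]
    hi = boot_stats[int((1 - tail) * n_boot) - 1]
    return lo, hi

def proportion(xs):
    return sum(xs) / len(xs)

# 1 = successful speed date, 0 = not (63 of 276, from slide 19).
sample = [1] * 63 + [0] * 213
lo, hi = bootstrap_percentile_ci(sample, proportion)
print(round(lo, 3), round(hi, 3))
```

The exact endpoints depend on the random seed and the number of resamples, so they will differ slightly from run to run if the seed is changed.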
8 Hypothesis Testing: A p-value is the probability of getting a statistic as extreme as observed, if H0 is true. The p-value measures the strength of the evidence the data provide against H0. “If the p-value is low, the H0 must go.” If the p-value is not low, then you cannot reject H0 and have an inconclusive test.
9 p-value: A p-value can be calculated by (1) a randomization test: simulate statistics assuming H0 is true, and see what proportion of simulated statistics are as extreme as that observed; or (2) calculating a test statistic and comparing that to a theoretical reference distribution (normal, t, $\chi^2$, F).
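A randomization test of the kind described on the p-value slide might look like the sketch below. It is an illustration under stated assumptions: the function name `randomization_p_value`, the two-sided definition of "as extreme", the number of simulations, and the seed are all choices made here, and the data are the yes/no counts by gender from slide 21.

```python
import random

def randomization_p_value(group_a, group_b, n_sim=2000, seed=0):
    """Two-sided randomization test for a difference in proportions."""
    rng = random.Random(seed)

    def diff(a, b):
        return sum(a) / len(a) - sum(b) / len(b)

    observed = diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sim):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        simulated = diff(pooled[:n_a], pooled[n_a:])
        if abs(simulated) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_sim

# 1 = said yes, 0 = said no (counts from slide 21).
males = [1] * 146 + [0] * 130
females = [1] * 127 + [0] * 149
observed, p_value = randomization_p_value(males, females)
print(round(observed, 3), p_value)
```

The p-value is the proportion of shuffled differences at least as large in absolute value as the observed one, so it varies a little with the seed.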
10 Hypothesis Tests
Variables: Appropriate Test
One Quantitative: Single mean (t)
One Categorical: Single proportion (normal); Chi-square Goodness of Fit
Two Categorical: Difference in proportions (normal); Chi-square Test for Association
One Quantitative, One Categorical: Difference in means (t); Matched pairs (t); ANOVA (F)
Two Quantitative: Correlation (t); Slope in Simple Linear Regression (t)
More than two: Multiple Regression (t, F)
11 Regression: Regression is a way to predict one response variable with multiple explanatory variables. Regression fits the coefficients of the model. The model can be used to: analyze relationships between the explanatory variables and the response; predict Y based on the explanatory variables; adjust for confounding variables.
13 Romance: What variables help to predict romantic interest? Do these variables differ for males and females? All we need to figure this out is DATA! (For all of you, being almost done with STAT 101, this is the case for many interesting questions!)
14 Speed Dating: We will use data from speed dating conducted at Columbia University, with 276 males and 276 females from Columbia’s various graduate and professional schools. Each person met with people of the opposite sex for 4 minutes each. After each encounter each person said either “yes” (they would like to be put in touch with that partner) or “no”.
15 Speed Dating Data: What are the cases? a) Students participating in speed dating b) Speed dates c) Ratings of each student
16 Speed Dating: What is the population? Ideal population? More realistic population?
17 Speed Dating: It is randomly determined who the students will be paired with for the speed dates. We find that people are significantly more likely to say “yes” to people they think are more intelligent. Can we infer causality between perceived intelligence and wanting a second date? a) Yes b) No
18 Successful Speed Date? What is the probability that a speed date is successful (results in both people wanting a second date)? To best answer this question, we should use a) Descriptive statistics b) Confidence Interval c) Hypothesis Test d) Regression e) Bayes Rule
19 Successful Speed Date? 63 of the 276 speed dates were deemed successful (both male and female said yes). A 95% confidence interval for the true proportion of successful speed dates is a) (0.2, 0.3) b) (0.18, 0.28) c) (0.21, 0.25) d) (0.13, 0.33)
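One way to work out the interval asked about on this slide is the normal-based formula, sample proportion plus or minus z* times the standard error. The sketch below assumes that formula is appropriate here (the sample is large); the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Normal-based confidence interval for a single proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error of p_hat
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = proportion_ci(63, 276)
print(round(lo, 2), round(hi, 2))  # prints 0.18 0.28
```

A bootstrap percentile interval on the same data should give a similar answer, since the sample proportion is not close to 0 or 1.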
20 Pickiness and Gender: Are males or females more picky when it comes to saying yes? Guesses? a) Males b) Females
21 Pickiness and Gender
          Yes   No
Males     146   130
Females   127   149
Are males or females more picky when it comes to saying yes? How could you answer this? a) Test for a single proportion b) Test for a difference in proportions c) Chi-square test for association d) ANOVA e) Either (b) or (c)
22 Pickiness and Gender: Do males and females differ in their pickiness? Using α = 0.05, how would you answer this? a) Yes b) No c) Not enough information
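The pickiness question can be checked with a test for a difference in proportions using the counts from slide 21. The sketch below is one possible version, assuming a pooled standard error, a two-sided alternative, and a normal reference distribution; it treats the male and female responses as two independent groups. The function name `two_proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Males: 146 yes of 276; females: 127 yes of 276 (slide 21).
z, p_value = two_proportion_z_test(146, 276, 127, 276)
alpha = 0.05
print(round(z, 2), round(p_value, 3), p_value < alpha)
```

Comparing `p_value` with `alpha` gives the decision at the 5% level; a chi-square test for association on the same 2x2 table is an equivalent route.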
23 Reciprocity
                   Male says Yes   Male says No
Female says Yes    63              64
Female says No     83              66
Are people more likely to say yes to someone who says yes back? How would you best answer this? a) Descriptive statistics b) Confidence Interval c) Hypothesis Test d) Regression e) Bayes Rule
24 Reciprocity
                   Male says Yes   Male says No
Female says Yes    63              64
Female says No     83              66
Are people more likely to say yes to someone who says yes back? How could you answer this? a) Test for a single proportion b) Test for a difference in proportions c) Chi-square test for association d) ANOVA e) Either (b) or (c)
25 Reciprocity: Are people more likely to say yes to someone who says yes back?
p-value =
Based on these data, we cannot determine whether people are more likely to say yes to someone who says yes back.
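One way to carry out a chi-square test for association on the reciprocity table is sketched below. It is an illustration rather than the calculation behind the slide: it assumes no continuity correction, the function name `chi_square_2x2` is hypothetical, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def chi_square_2x2(table):
    """Chi-square test for association on a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: female says yes / no; columns: male says yes / no (slide 23).
table = [[63, 64], [83, 66]]
stat, p_value = chi_square_2x2(table)
print(round(stat, 2), round(p_value, 2))
```

The `erfc` shortcut only applies to 2x2 tables; larger tables need a general chi-square distribution function.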
26 Race and Response: Females. Does the chance of females saying yes to males differ by race? How could you answer this question? a) Test for a single proportion b) Test for a difference in proportions c) Chi-square goodness of fit d) Chi-square test for association e) ANOVA
Asian   Black   Caucasian   Latino   Other
0.50    0.57    0.42        0.48     0.53
27 Race and Response: Males. Each person rated their date on a scale based on how much they liked them overall. Does how much males like females differ by race? How would you test this? a) Chi-square test b) t-test for a difference in means c) Matched pairs test d) ANOVA e) Either (b) or (d)
28 Physical Attractiveness: Each person also rated their date from 1-10 on physical attractiveness. Do males rate females higher, or do females rate males higher? Which tool would you use to answer this question? a) Two-sample difference in means b) Matched pair difference in means c) Chi-Square d) ANOVA e) Correlation
29 Physical Attractiveness: The histogram shown is of the a) data b) bootstrap distribution c) randomization distribution d) sampling distribution. $\bar{x}_M - \bar{x}_F = 0.406$; 95% CI: (0.10, 0.71); p-value = 0.01
30 Other Ratings: Each person also rated their date from 1-10 on the following attributes: Attractiveness, Sincerity, Intelligence, How fun the person seems, Ambition, Shared interests. Which of these best predict how much someone will like their date?
32 Ambition and Liking: Do people prefer their dates to be less ambitious??? How does the perceived ambition of a date relate to how much the date is liked? How would you answer this question? a) Inference for difference in means b) ANOVA c) Inference for correlation d) Inference for simple linear regression e) Either (b), (c) or (d)
33 Simple Linear Regression: MALES RATING FEMALES: FEMALES RATING MALES:
34 Ambition and Liking: r = 0.44, SE = 0.05. Find a 95% CI for $\rho$. Test whether $\beta_1$ differs from 0.
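A rough way to approach both tasks on this slide is the generic "statistic plus or minus 2 standard errors" interval and a "statistic divided by standard error" test statistic. The sketch below assumes SE = 0.05 is the standard error of r and uses 2 as an approximate 95% multiplier. The function name `interval_and_test` is hypothetical. In simple linear regression the test for the slope and the test for the correlation are equivalent, which is why r is used for both tasks here.

```python
def interval_and_test(statistic, se, multiplier=2.0):
    """Approximate 95% interval and standardized test statistic."""
    ci = (statistic - multiplier * se, statistic + multiplier * se)
    test_stat = statistic / se  # how many standard errors from the null value 0
    return ci, test_stat

ci, t = interval_and_test(0.44, 0.05)
print(tuple(round(x, 2) for x in ci), round(t, 1))
```

An interval that excludes 0, together with a test statistic many standard errors from 0, would point to a significant association between perceived ambition and liking.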
35 After taking STAT 101: If you have a question that needs answering… ALL YOU NEED IS DATA!!! Thank You!!!