Exploring Data
Variables can be categorical or quantitative; quantitative variables are either discrete or continuous.
For categorical data, use bar charts.
Numerical data can be displayed using a dotplot, stemplot, box-and-whisker plot, histogram, or cumulative frequency plot.
Histogram bars touch, with no gaps between them (unless a class has a frequency of zero).
Always include a key with a stemplot.
Always label axes, and read the axes carefully when interpreting a graph.
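A stemplot splits each value into a stem (all but the last digit) and a leaf (the last digit). The sketch below assumes non-negative whole-number data such as test scores; `stemplot` is a hypothetical helper name. It prints empty stems so gaps in the data stay visible, and it ends with the key.

```python
from collections import defaultdict

def stemplot(data):
    """Return a stemplot as text; assumes non-negative integers (leaf = last digit)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    lines = []
    # Include every stem in the range, even empty ones, so gaps are visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    lines.append("Key: 7 | 3 means 73")  # a stemplot always needs a key
    return "\n".join(lines)

scores = [62, 67, 71, 73, 73, 78, 85, 90, 94]
plot = stemplot(scores)
print(plot)
```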
Commenting on a graph
Shape: symmetric or skewed; unimodal or uniform
Center: mean and median
Spread: range, standard deviation, IQR, gaps, and outliers (a value is an outlier if it lies below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$)
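The center, spread, and 1.5 × IQR outlier rule can be computed with Python's standard `statistics` module. This is a sketch: `summarize` is a hypothetical helper, and quartile conventions vary, so `statistics.quantiles(..., method="inclusive")` may give slightly different quartiles than a graphing calculator (which takes medians of the lower and upper halves).

```python
import statistics

def summarize(data):
    """Center, spread, and 1.5 x IQR outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": median,
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "iqr": iqr,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

stats = summarize([2, 4, 4, 5, 6, 7, 8, 30])
print(stats)
```

Note how the single outlier (30) pulls the mean (8.25) well above the median (5.5).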
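Effect of changing units
Multiplying every value by a constant $b$ (as in a change of units) multiplies measures of center by $b$ and measures of spread such as the standard deviation, IQR, and range by $|b|$ (the variance is multiplied by $b^2$).
Adding or subtracting the same constant to every value shifts measures of center by that constant but does not change measures of spread.
Trial Run 1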
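Converting Celsius to Fahrenheit ($F = 1.8C + 32$) combines both operations, so it is a convenient check of the rules above. The temperatures are made-up illustration values.

```python
import statistics

temps_c = [10.0, 15.0, 20.0, 25.0]
temps_f = [1.8 * c + 32 for c in temps_c]  # multiply by 1.8, then add 32

mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)
mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)

# Center: multiplied by 1.8 AND shifted by 32.  Spread: only multiplied by 1.8.
print(mean_f, 1.8 * mean_c + 32)
print(sd_f, 1.8 * sd_c)
```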
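Scatterplots
Bivariate data: an explanatory (x) variable and a response (y) variable
Correlation coefficient $r$: ranges from -1 to 1
$r$ does not change when you switch x and y, when you add a constant to either variable, or when you multiply either variable by a positive constant (multiplying by a negative constant flips the sign of $r$).
$r$ measures only the direction and strength of a linear relationship.
$r$ is strongly affected by outliers.
Beware of lurking variables: a strong correlation does not by itself show causation.
Beware of extrapolation: predictions far outside the range of the observed x values are unreliable.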
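The invariance of $r$ can be checked numerically. This sketch computes $r = \frac{1}{n-1}\sum z_x z_y$ directly (Python 3.10+ also offers `statistics.correlation`); the data and the helper name `correlation` are illustrative.

```python
import statistics

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of products of z-scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                         # switch x and y
r_rescaled = correlation([10 * a + 3 for a in x], y)  # multiply by 10, add 3
r_negated = correlation([-a for a in x], y)           # negative multiplier flips sign
print(r, r_swapped, r_rescaled, r_negated)
```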
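Coefficient of determination ($r^2$): the proportion of the variation in y that is explained by the least-squares regression line
Residual = observed y - predicted y
Influential points: points whose removal would noticeably change the regression line
Transformations
Trial run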
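The least-squares line, its residuals, and $r^2$ fit together as follows. This sketch reuses the same illustrative data as the correlation example; slope $b = S_{xy}/S_{xx}$ is algebraically the same as $b = r\,s_y/s_x$.

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line y-hat = a + b x  (b = Sxy / Sxx, equivalently r * sy / sx)
mx, my = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx
a = my - b * mx

predicted = [a + b * xi for xi in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed - predicted

sse = sum(e ** 2 for e in residuals)       # unexplained variation
sst = sum((yi - my) ** 2 for yi in y)      # total variation in y
r_squared = 1 - sse / sst                  # fraction of variation explained
print(b, a, residuals, r_squared)
```

The residuals of a least-squares line always sum to zero, and here $r^2 = 0.6$ matches $r^2$ from the correlation $r = 6/\sqrt{60}$.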
Experimental designs: completely randomized, randomized block, and matched pairs
Trial run
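The three designs differ only in where the randomization happens. The sketch below is illustrative: the function names are hypothetical, and a seeded `random.Random` is used only so the assignment can be reproduced.

```python
import random

def assign(units, treatments, rng):
    """Completely randomized design: shuffle the units, then deal them out in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def assign_in_blocks(blocks, treatments, rng):
    """Randomized block design: a separate random assignment inside each block."""
    return {name: assign(members, treatments, rng) for name, members in blocks.items()}

def assign_matched_pairs(pairs, rng):
    """Matched pairs: a coin flip decides which member of each pair is treated.

    Returns (treated, control) tuples.
    """
    return [(a, b) if rng.random() < 0.5 else (b, a) for a, b in pairs]

rng = random.Random(1)  # fixed seed so the assignment can be reproduced
groups = assign(range(1, 21), ["treatment", "control"], rng)
blocked = assign_in_blocks({"men": [1, 2, 3, 4], "women": [5, 6, 7, 8]}, ["A", "B"], rng)
pairs = assign_matched_pairs([(1, 2), (3, 4)], rng)
print(groups, blocked, pairs)
```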
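Probability
Law of large numbers: the long-run relative frequency of an outcome gets closer to its true probability as the number of trials increases.
Disjoint (mutually exclusive) events cannot occur at the same time.
Multiplication rule ("and"): $P(A \text{ and } B) = P(A)\,P(B \mid A)$
Addition rule ("or"): $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Conditional probability: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$
Independence: knowing that one event has occurred does not change the probability of the other, that is, $P(B \mid A) = P(B)$.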
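For equally likely outcomes, these rules can be checked by listing the sample space. This sketch uses two fair dice and exact fractions; the events `A` and `B` are made-up examples.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o):
    return o[0] == 6             # first die shows 6

def B(o):
    return o[0] + o[1] >= 10     # total is at least 10

p_and = prob(lambda o: A(o) and B(o))
p_or = prob(lambda o: A(o) or B(o))
p_B_given_A = p_and / prob(A)    # P(B | A) = P(A and B) / P(A)

# Addition rule holds exactly
assert p_or == prob(A) + prob(B) - p_and
# A and B are NOT independent here: P(B | A) = 1/2 but P(B) = 1/6
print(p_B_given_A, prob(B))
```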
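Probability distributions
A probability distribution matches each possible value of a random variable with the probability that it occurs.
Every probability must be between 0 and 1.
The probabilities must sum to 1.
Mean: $\mu_X = \sum x_i p_i$
Variance: $\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i$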
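A discrete distribution can be stored as a mapping from value to probability. This sketch (the helper name is hypothetical) checks the two requirements and then applies the mean and variance formulas, using the number of heads in two fair coin flips.

```python
from fractions import Fraction

def mean_and_variance(dist):
    """dist maps each value x to P(X = x); returns (mu, sigma^2)."""
    probs = list(dist.values())
    assert all(0 <= p <= 1 for p in probs), "each probability must be in [0, 1]"
    assert sum(probs) == 1, "probabilities must sum to 1"
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Number of heads in two fair coin flips
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mu, var = mean_and_variance(dist)
print(mu, var)
```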
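Binomial Random Variables
Fixed number of trials $n$; each trial is a success or a failure
The probability of success $p$ remains constant on each trial
Each trial is independent
$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$
Mean: $\mu_X = np$
Variance: $\sigma_X^2 = np(1-p)$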
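The binomial formula translates directly into code with `math.comb` (Python 3.8+). As a sketch, the values $n = 10$, $p = 0.3$ are arbitrary; computing the mean and variance from the full distribution reproduces $np$ and $np(1-p)$.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3
pmf = [binomial_pmf(n, p, r) for r in range(n + 1)]

# Mean and variance from the distribution; these match np = 3 and np(1 - p) = 2.1
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
print(mean, variance)
```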
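Geometric Random Variable
Each trial is a success or a failure
$p$ is constant and each trial is independent
How many times until …
$P(X = k) = (1-p)^{k-1} p$: the probability that the first success occurs on trial $k$ (that is, $k - 1$ failures come before it)
Trial run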
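A short sketch of the geometric formula with exact fractions, assuming the illustrative situation of rolling a fair die until the first 6 appears.

```python
from fractions import Fraction

def geometric_pmf(p, k):
    """P(first success on trial k) = (1 - p)**(k - 1) * p, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 6)                                 # chance of a 6 on one roll
first_six_on_third_roll = geometric_pmf(p, 3)      # (5/6)^2 * (1/6)
print(first_six_on_third_roll)
```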
Combining Variables
Mean: $\mu_{X+Y} = \mu_X + \mu_Y$
If X and Y are independent: $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$
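Both rules can be verified by building the distribution of $X + Y$ for two small independent variables. In this sketch $X$ is the number of heads on one fair coin and $Y$ is uniform on {1, 2, 3}; both are made-up examples.

```python
from fractions import Fraction
from itertools import product

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return mu, sum((x - mu) ** 2 * p for x, p in dist.items())

X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Y = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Distribution of X + Y, using independence: P(X = x and Y = y) = P(X = x) P(Y = y)
S = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    S[x + y] = S.get(x + y, 0) + px * py

mu_x, var_x = mean_var(X)
mu_y, var_y = mean_var(Y)
mu_s, var_s = mean_var(S)
print(mu_s, mu_x + mu_y)      # means add
print(var_s, var_x + var_y)   # variances add (because X and Y are independent)
```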
Normal distributions
z-score: $z = \frac{x - \mu}{\sigma}$
To find a probability, standardize the endpoints, then find the area under the standard normal curve.
Trial run
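The standardize-then-find-area procedure can be sketched with `statistics.NormalDist` (Python 3.8+). The N(100, 15) population and the interval 85 to 130 are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 100, 15            # hypothetical N(100, 15) population
low, high = 85, 130

# Standardize the endpoints
z_low = (low - mu) / sigma     # -1.0
z_high = (high - mu) / sigma   # 2.0

# Area under the standard normal curve between the two z-scores
Z = NormalDist()               # mean 0, standard deviation 1
area = Z.cdf(z_high) - Z.cdf(z_low)
print(round(area, 4))
```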
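Sampling distributions
The sampling distribution of the sample mean $\bar{x}$ is the distribution of $\bar{x}$ over all possible random samples of size $n$ from the population.
Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Central Limit Theorem: as the size $n$ of an SRS increases, the shape of the sampling distribution of $\bar{x}$ tends toward normal.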
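For a tiny population you can actually list every possible sample. This sketch draws samples of size 2 with replacement (independent draws), the setting in which $\sigma/\sqrt{n}$ holds exactly; when sampling without replacement it is a close approximation as long as the sample is a small fraction of the population. The population values are made up.

```python
import statistics
from itertools import product

population = [2, 4, 6, 8]
n = 2
sigma = statistics.pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement
sample_means = [statistics.mean(s) for s in product(population, repeat=n)]

# Standard deviation of the sampling distribution of x-bar
sd_of_means = statistics.pstdev(sample_means)
print(sd_of_means, sigma / n ** 0.5)   # the two values agree
```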
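Hypothesis Testing: sample proportion
$H_0$: $p = p_0$
$H_a$: $p < p_0$, $p > p_0$, or $p \neq p_0$
Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
P-value
Assumptions: $\hat{p}$ comes from a random sample; the sample size is large ($np_0 \ge 10$ and $n(1-p_0) \ge 10$); the sample is no more than 10% of the population.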
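A sketch of the large-sample z test for one proportion. The function name and the `alternative` argument are hypothetical conventions chosen here; the P-value comes from the standard normal curve via `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Large-sample z test for H0: p = p0. alternative: 'less', 'greater', 'two-sided'."""
    # Large-sample condition, checked with the hypothesized p0
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("need n*p0 >= 10 and n*(1 - p0) >= 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

# e.g., 60 successes in 100 trials; Ha: p > 0.5
z, p_value = one_prop_z_test(successes=60, n=100, p0=0.5, alternative="greater")
print(z, p_value)
```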
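Hypothesis Testing: sample mean
$H_0$: $\mu = \mu_0$
$H_a$: $\mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with df $= n - 1$
P-value
Assumptions: the data are from a random sample; the sample size is large ($n > 30$) or the population distribution is approximately normal.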
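A sketch of the one-sample t statistic using only the standard library. Python's `statistics` module has no t distribution, so the P-value is left to a table, calculator, or (if SciPy is installed) `scipy.stats.t.sf(t, df)` for the upper tail. The data are made up.

```python
import statistics
from math import sqrt

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation
    return (xbar - mu0) / (s / sqrt(n)), n - 1

t, df = one_sample_t([12, 14, 15, 16, 18], mu0=13)
print(t, df)
```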
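Hypothesis Testing: difference in two population proportions
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 < 0$, $p_1 - p_2 > 0$, or $p_1 - p_2 \neq 0$
Test statistic: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the pooled (combined) proportion
P-value
Assumptions: independently chosen random samples, or treatments assigned at random to individuals; both sample sizes are large ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ must hold for each sample).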
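A sketch of the two-proportion z test with the pooled proportion, returning a two-sided P-value. The counts (45 of 100 versus 30 of 100) are made up, and `two_prop_z` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """z test for H0: p1 - p2 = 0 using the pooled proportion; two-sided P-value."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                      # pooled (combined) proportion
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z(x1=45, n1=100, x2=30, n2=100)
print(round(z, 3), round(p_value, 4))
```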
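Hypothesis Testing: difference in two population means
$H_0$: $\mu_1 - \mu_2 =$ hypothesized value
$H_a$: $\mu_1 - \mu_2 <$, $>$, or $\neq$ hypothesized value
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
P-value
Assumptions: the two samples are independently selected random samples; both sample sizes are large ($n > 30$) or the population distributions are approximately normal.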
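A sketch of the (unpooled) two-sample t statistic. It computes only the statistic; the degrees of freedom and P-value are usually obtained from a calculator or statistical software. The two samples are made up.

```python
import statistics
from math import sqrt

def two_sample_t(a, b, hyp_diff=0):
    """t = ((xbar1 - xbar2) - hyp_diff) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b) - hyp_diff) / se

group1 = [12, 14, 15, 16, 18]   # mean 15, variance 5
group2 = [8, 10, 11, 12, 14]    # mean 11, variance 5
t = two_sample_t(group1, group2)
print(t)
```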
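Hypothesis Testing: paired t test comparing two population means
$H_0$: $\mu_d =$ hypothesized value
$H_a$: $\mu_d <$, $>$, or $\neq$ hypothesized value
Test statistic: $t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$, with df $= n - 1$
P-value
Assumptions: the samples are paired; the differences are a random sample from a population of differences; the sample size is large ($n > 30$) or the population distribution of differences is approximately normal.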
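The paired t test is a one-sample t test on the differences. A sketch with made-up before/after measurements on the same five individuals:

```python
import statistics
from math import sqrt

def paired_t(before, after, hyp=0):
    """Paired t: take differences (after - before), then a one-sample t on them."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = (statistics.mean(diffs) - hyp) / (statistics.stdev(diffs) / sqrt(n))
    return t, n - 1

before = [10, 12, 9, 11, 13]
after = [11, 15, 13, 16, 20]     # differences: 1, 3, 4, 5, 7
t, df = paired_t(before, after)
print(t, df)
```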
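Hypothesis Testing: chi-square goodness-of-fit test
$H_0$: the population category proportions equal the hypothesized values
$H_a$: $H_0$ is not true
Test statistic: $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
P-value
Assumptions: the data come from a random sample; the sample size is large (every expected cell count is at least 5).
Degrees of freedom: $k - 1$, where $k$ is the number of categories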
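A sketch of the goodness-of-fit statistic for a die that is hypothesized to be fair; the observed counts are made up. For the P-value, an upper-tail chi-square area is needed (for example `scipy.stats.chi2.sf(stat, df)` if SciPy is available).

```python
def chi_square_gof(observed, expected_props):
    """X^2 = sum (O - E)^2 / E over k categories, with df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    if min(expected) < 5:
        raise ValueError("every expected count should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# 60 rolls; under H0 each face has proportion 1/6, so each expected count is 10
stat, df = chi_square_gof([8, 9, 12, 11, 6, 14], [1 / 6] * 6)
print(stat, df)
```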
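Hypothesis Testing: chi-square test of homogeneity or independence (two-way table)
$H_0$: there is no relationship between ____ and ____ (for homogeneity: the category proportions are the same for every population)
$H_a$: $H_0$ is not true
Test statistic: $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where expected count $= \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
P-value
Assumptions: independently chosen random samples, or random assignment to groups; all expected cell counts are at least 5.
Degrees of freedom: $(\text{number of rows} - 1)(\text{number of columns} - 1)$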
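A sketch of the two-way table computation: expected counts from row and column totals, the $X^2$ statistic, and its degrees of freedom. The 2 × 2 table is made up; the function name is hypothetical.

```python
def chi_square_two_way(table):
    """Test of independence/homogeneity for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count = (row total)(column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    if min(min(row) for row in expected) < 5:
        raise ValueError("all expected cell counts should be at least 5")
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

stat, df, expected = chi_square_two_way([[20, 30], [30, 20]])
print(stat, df)
```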
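Hypothesis Testing (last one!): t test for the slope of the population regression line
$H_0$: $\beta =$ hypothesized value (often 0)
$H_a$: $\beta <$, $>$, or $\neq$ hypothesized value
Test statistic: $t = \frac{b - \text{hypothesized value}}{s_b}$, where $s_b = \frac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$ and $s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$
P-value
Assumptions: the distribution of $e$ has mean 0; the standard deviation of $e$ does not depend on $x$; the distribution of $e$ is normal; the random deviations $e$ are independent of each other.
Degrees of freedom: $n - 2$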
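A sketch of the slope test statistic, reusing the small illustrative data set from the regression example. Only the statistic and df are computed; the P-value comes from the t curve with $n - 2$ df.

```python
import statistics
from math import sqrt

def slope_t_test(x, y, beta0=0):
    """t = (b - beta0) / s_b, with df = n - 2."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(ss_resid / (n - 2))   # estimate of the standard deviation of e
    s_b = s_e / sqrt(sxx)            # estimated standard deviation of b
    return (b - beta0) / s_b, n - 2

t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(t, df)
```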
Confidence Intervals
Statistic ± margin of error (also called the bound)
The margin of error is the product of two numbers: (critical value)(standard error)
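As one instance of statistic ± (critical value)(standard error), here is a sketch of a large-sample confidence interval for a population proportion. The function name and the example counts are assumptions for illustration; the critical value $z^*$ comes from `NormalDist().inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_interval(successes, n, confidence=0.95):
    """p-hat +/- z* * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                       # standard error
    margin = z_star * se                                     # margin of error
    return p_hat - margin, p_hat + margin

low, high = one_prop_interval(60, 100)
print(round(low, 3), round(high, 3))
```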