June 11, 2008 - Stat 111 - Lecture 10 - Review: Midterm review, Chapters 1-5


1 Midterm review: Chapters 1-5
Statistics 111 - Lecture 10

2 Administrative Notes
Homework 3 is due Monday. It covers material from Chapter 5, so it is worth doing as practice for the midterm!
Exam on Monday: starts exactly at 10:40, so get here early.

3 Some Topics Not Covered on Midterm
Continuity correction for binomial calculations (Chapter 5)
Normal quantile plots (Chapter 1)

4 Experiments
Used to examine the effect of a treatment, e.g. medical trials, education interventions.
Different from an observational study, where no treatment is imposed.
Observational studies can only examine associations between variables, whereas experiments try to establish causal effects.
Experiments can still be biased though!
[Diagram: Population, then Experimental Units, split into a Treatment Group (Treatment) and a Control Group (No Treatment), then Result]

5 Sampling and Surveys
Just like in experiments, we must be cautious of potential sources of bias in our sampling results: voluntary response samples, undercoverage, non-response, untrue response, wording of questions.
Simple Random Sampling: less biased since each individual in the population has an equal chance of being included in the sample.
[Diagram: Sampling leads from the Population (Parameter, unknown) to the Sample (Statistic); Estimation/Inference leads from the Statistic back to the Parameter]
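As a rough illustration of simple random sampling, the sketch below draws a sample in which every individual has the same chance of inclusion and compares the sample statistic to the population parameter. The population of heights, the seed, and the sample size are all made-up assumptions for the example.

```python
import random

rng = random.Random(111)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (invented for illustration).
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = sum(population) / len(population)   # population mean (usually unknown)

# Simple random sample: every individual is equally likely to be chosen.
sample = rng.sample(population, 100)
statistic = sum(sample) / len(sample)           # sample mean, our estimate

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

In practice the parameter is not available; it is computed here only to show that the statistic lands close to it.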

6 Distributions
A distribution describes what values a variable takes and how frequently these values occur.
Boxplots are good for center and spread, but don’t indicate the shape of a distribution.
Histograms are much more effective at displaying the shape of a distribution.

7 Numerical Measures of Center
Mean: x̄ = (x₁ + x₂ + ... + xₙ)/n
Median: “middle number in distribution”
Mean is more affected by large outliers and asymmetry than the median.
Symmetric: Mean ≈ Median
Skewed Left: Mean < Median
Skewed Right: Mean > Median
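A small sketch of how an outlier pulls the mean but not the median, using Python's standard `statistics` module. The data values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical right-skewed data: one large value sits far above the rest.
incomes = [30, 32, 35, 38, 40, 41, 200]

center_mean = mean(incomes)      # pulled upward by the outlier
center_median = median(incomes)  # middle value of the sorted data

print(center_mean, center_median)

# Dropping the outlier moves the mean a lot and the median only a little.
trimmed = incomes[:-1]
print(mean(trimmed), median(trimmed))
```

With the outlier present the mean exceeds the median, which matches the "Skewed Right" pattern on the slide.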

8 Numerical Measures of Spread
Variance: average of the squared deviations of each observation from the mean
Standard Deviation = √Variance
Inter-Quartile Range: IQR = Q3 - Q1
First Quartile (Q1) is the median of the smaller half of the data
Third Quartile (Q3) is the median of the larger half of the data
With outliers or asymmetry, median and IQR are better, but we will use mean and SD more since most distributions we use (e.g. the normal distribution) are symmetric with no outliers
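The spread measures can be sketched as follows. Two assumptions are made here: the quartiles follow the "median of each half" rule from the slide (leaving out the middle observation when n is odd), and the variance is the sample variance with divisor n - 1, which is what `statistics.variance` computes. The data values are invented.

```python
from statistics import median, stdev, variance

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (needs n >= 2)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # smaller half
    upper = xs[-half:]   # larger half; the middle value is left out when n is odd
    return median(lower), median(upper)

data = [2, 4, 4, 5, 7, 9, 10, 12, 40]   # hypothetical data with one outlier

q1, q3 = quartiles(data)
iqr = q3 - q1

s2 = variance(data)   # sample variance (divides by n - 1)
s = stdev(data)       # standard deviation = square root of the variance

print(q1, q3, iqr)
print(s2, s)
```

Software packages use several slightly different quartile conventions, so results from other tools may differ a little from this rule.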

9 Scatterplots of two variables
Positive association vs. negative association
Some associations are not just positive or negative, but also appear to be linear
Correlation r is a measure of the strength of the linear relationship between variables X and Y
r near 1 or -1 means strong linear relationship
r near 0 means weak linear relationship
Negative r means negative association
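To make the behavior of r concrete, here is a minimal sketch that computes the correlation from its usual definition (sum of cross-deviations divided by the square root of the product of the sums of squared deviations). The function name and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

r_up = correlation(x, [2, 4, 6, 8, 10])     # perfectly linear, increasing
r_down = correlation(x, [10, 8, 6, 4, 2])   # perfectly linear, decreasing
r_curve = correlation(x, [4, 1, 0, 1, 4])   # exact quadratic pattern, not linear

print(r_up, r_down, r_curve)
```

The last case is a reminder that r only measures linear association: the points follow a curve exactly, yet r is zero.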

10 Linear Regression
Best fit line between X and Y: Y = a + b·X
The slope b (b = r·s_y/s_x): average change in the Y variable if the X variable is increased by one
The intercept a (a = ȳ - b·x̄): average value of the Y variable when the X variable is equal to zero
Regression equation is used to predict the response variable Y for a value of our explanatory variable X
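A minimal sketch of fitting the least-squares line Y = a + b·X and using it for prediction. The slope is computed as the sum of cross-deviations over the sum of squared X deviations, which is algebraically the same as r times the ratio of the standard deviations of Y and X. The study-hours data and function names are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # slope: average change in y per unit increase in x
    a = my - b * mx      # intercept: predicted y when x is zero
    return a, b

hours = [1, 2, 3, 4, 5]          # explanatory variable X (made up)
score = [52, 58, 61, 70, 74]     # response variable Y (made up)

a, b = fit_line(hours, score)
predicted = a + b * 6            # prediction for x = 6

print(a, b, predicted)
```

Predicting at x = 6 goes slightly beyond the observed range of X, so such a prediction should be treated with caution.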

11 Probability
Random process: outcome not known exactly, but we have a probability distribution of possible outcomes
Event: outcome of a random process, with probability P(A)
Additive Rule for Disjoint Events: P(A or B) = P(A) + P(B) if A and B are disjoint
Multiplication Rule for Independent Events: P(A and B) = P(A) × P(B) if A and B are independent
Need to combine different rules (e.g. Lecture 8)
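Both rules can be checked by brute-force enumeration of equally likely outcomes. The dice example and the helper `prob` are assumptions chosen for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)

def prob(event, outcomes):
    """Probability of an event when all outcomes are equally likely."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Additive rule: on one roll, "roll a 1" and "roll a 2" are disjoint.
p_a = prob(lambda r: r == 1, die)
p_b = prob(lambda r: r == 2, die)
p_a_or_b = prob(lambda r: r in (1, 2), die)

# Multiplication rule: on two rolls, the first and second die are independent.
rolls = list(product(die, die))
p_first_six = prob(lambda r: r[0] == 6, rolls)
p_second_six = prob(lambda r: r[1] == 6, rolls)
p_both_six = prob(lambda r: r == (6, 6), rolls)

print(p_a_or_b, p_a + p_b)
print(p_both_six, p_first_six * p_second_six)
```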

12 Probability and Random Variables
Conditional Probability: P(B | A) = P(A and B) / P(A)
Random variable: numerical outcome or summary of a random process
A discrete random variable has a finite (or countably infinite) number of distinct values
Continuous random variables can have an uncountable number of values
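Conditional probability is commonly defined as P(B | A) = P(A and B) / P(A). The sketch below applies that definition to two dice; the events A and B are invented for the example.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def a(r):
    return r[0] + r[1] >= 10   # A: the total is at least 10

def b(r):
    return r[0] == 6           # B: the first die shows a 6

p_a = prob(a)
p_b = prob(b)
p_a_and_b = prob(lambda r: a(r) and b(r))
p_b_given_a = p_a_and_b / p_a

print(p_b, p_b_given_a)
```

Here P(B | A) differs from P(B), so A and B are not independent and the simple multiplication rule would not apply to them.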

13 Discrete vs. Continuous RVs
Probability histogram is used for the distribution of a discrete r.v.: calculate probabilities by adding up bars of the histogram
Density curve is used for the distribution of a continuous r.v.: calculate probabilities by integrating the area under the curve
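The contrast between adding bars and integrating area can be sketched numerically. The discrete example (sum of two dice) and the continuous example (a standard normal density integrated with the trapezoid rule) are illustrative choices, and `area_under_density` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Discrete: X = sum of two fair dice. P(X <= 4) is the sum of the bars at 2, 3, 4.
sums = [d1 + d2 for d1, d2 in product(range(1, 7), repeat=2)]
pmf = {s: Fraction(sums.count(s), len(sums)) for s in set(sums)}
p_discrete = sum(pmf[s] for s in pmf if s <= 4)

# Continuous: Z ~ N(0, 1). P(-1 < Z < 1) is the area under the density curve.
z = NormalDist(0, 1)

def area_under_density(dist, lo, hi, steps=1000):
    """Trapezoid-rule approximation of the area under dist.pdf on [lo, hi]."""
    width = (hi - lo) / steps
    heights = [dist.pdf(lo + i * width) for i in range(steps + 1)]
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

p_continuous = area_under_density(z, -1.0, 1.0)

print(p_discrete)
print(p_continuous, z.cdf(1) - z.cdf(-1))   # numerical area vs. exact CDF difference
```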

14 Linear Transformations of Variables
Same rules for both data and random variables:
mean(a·X + c) = a·mean(X) + c
variance(a·X + c) = a²·variance(X)
SD(a·X + c) = |a|·SD(X)
Adding constants does not change spread measures
Can also do combinations of more than one variable: if X and Y are variables and Z = a·X + b·Y + c, then
mean(Z) = a·mean(X) + b·mean(Y) + c
If X and Y are also independent, then
variance(Z) = a²·variance(X) + b²·variance(Y)
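These rules can be verified exactly on a small example. The sketch below treats X and Y as two independent fair dice and uses exact fractions; the constants a, b, c and the helper functions are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

def variance(values):
    """Variance over equally likely values (divides by the number of values)."""
    values = list(values)
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

a, b, c = 2, 3, 1

# One variable: a*X + c
shifted = [a * x + c for x in die]
print(mean(shifted), a * mean(die) + c)
print(variance(shifted), a ** 2 * variance(die))   # c does not affect the spread

# Two independent variables: Z = a*X + b*Y + c over all 36 equally likely pairs
z = [a * x + b * y + c for x, y in product(die, die)]
print(mean(z), a * mean(die) + b * mean(die) + c)
print(variance(z), a ** 2 * variance(die) + b ** 2 * variance(die))
```

Enumerating all pairs with equal weight is what makes X and Y independent here; the variance rule for Z would not hold in general for dependent variables.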

15 The Normal Distribution
The Normal distribution has the shape of a “bell curve” with parameters μ and σ², denoted N(μ, σ²)
Standard Normal: μ = 0 and σ² = 1
Normal distribution follows the 68-95-99.7 rule:
68% of observations are between μ - σ and μ + σ
95% of observations are between μ - 2σ and μ + 2σ
99.7% of observations are between μ - 3σ and μ + 3σ
Have tables for any probability from the standard normal distribution
[Figure: density curves for N(0,1), N(2,1), N(0,2), N(-1,2)]
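The 68-95-99.7 rule can be checked against the standard normal CDF using `statistics.NormalDist` from the Python standard library. Note that `NormalDist` takes the standard deviation as its second argument, whereas the slide's N(μ, σ²) notation lists the variance.

```python
from statistics import NormalDist

z = NormalDist(0, 1)   # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

The printed values are close to, but not exactly, 0.68, 0.95, and 0.997; the rule is a rounded approximation.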

16 Standardization
For non-standard normal probabilities, need to transform to a standard normal distribution
If X has a N(μ, σ²) distribution, then we can convert to Z, which follows a N(0,1) distribution: Z = (X - μ)/σ
Can then calculate P(Z < k) using the table
Reverse standardization: converting a standard normal Z into a non-standard normal X: X = σZ + μ
Practice makes perfect!
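A short sketch of standardizing and reverse-standardizing, with `NormalDist` standing in for the printed table. The values μ = 100 and σ = 15 and the function names are assumptions made for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(0, 1)
mu, sigma = 100, 15   # hypothetical mean and standard deviation of X

def standardize(x, mu, sigma):
    """Convert a value of X into a z-score."""
    return (x - mu) / sigma

def unstandardize(z, mu, sigma):
    """Convert a z-score back into the scale of X."""
    return sigma * z + mu

# P(X < 130): standardize, then look up P(Z < z).
z = standardize(130, mu, sigma)
p = standard_normal.cdf(z)

# 90th percentile of X: find the z-value first, then reverse-standardize.
z90 = standard_normal.inv_cdf(0.90)
x90 = unstandardize(z90, mu, sigma)

print(z, p)
print(z90, x90)
```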

17 Inference for Continuous Data
Continuous data is summarized by the sample mean x̄
Sample mean is used as our estimate of the population mean, but how does the sample mean vary between samples?
[Diagram: a population with parameters μ and σ² yields Samples 1 to 6, each of size n and each with its own sample mean x̄. What is the distribution of these values?]

18 Sampling Distribution of Sample Mean
The center of the sampling distribution of the sample mean is the population mean: mean(X̄) = μ
Over all samples, the sample mean will, on average, be equal to the population mean (no guarantees for 1 sample!)
The spread of the sampling distribution of the sample mean is SD(X̄) = σ/√n
As sample size increases, variance of the sample mean decreases!
Central Limit Theorem: if the sample size is large enough, then the sample mean X̄ has an approximately Normal distribution
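A simulation can make the sampling distribution tangible. The sketch below repeatedly draws samples from a skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen purely as an assumption) and looks at the center and spread of the resulting sample means. The seed, sample size, and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2008)   # fixed seed so the sketch is repeatable

n = 50               # size of each sample
num_samples = 5000   # number of repeated samples

# Population: exponential with mu = 1 and sigma = 1 (strongly right-skewed).
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

center = mean(sample_means)    # should be near mu = 1
spread = stdev(sample_means)   # should be near sigma / sqrt(n)

print(center, spread, 1 / sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population is skewed, which is the Central Limit Theorem at work.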

19 Inference for Count Data
Goal for count data is to estimate the population proportion p
From a sample of size n, we can calculate two statistics:
1. sample count Y
2. sample proportion p̂ = Y/n
Use the sample proportion as our estimate of the population proportion p
Sampling Distribution of the Sample Proportion: how does the sample proportion change over different samples?
[Diagram: a population with parameter p yields Samples 1 to 6, each of size n and each with its own sample proportion. What is the distribution of these values?]

20 Sampling Distribution for Proportion
For small samples, use the Binomial distribution to calculate probabilities for the sample count or sample proportion
Definition of “small”: n·p < 10 or n·(1-p) < 10
For large samples, we use the Normal approximation to the Binomial distribution for the sample count or sample proportion
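The choice between the exact Binomial calculation and the Normal approximation can be sketched as below. The approximation uses mean n·p and standard deviation √(n·p·(1-p)) for the sample count and applies no continuity correction. The function names and the example values n = 100, p = 0.3 are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def is_small(n, p):
    """The slide's rule of thumb for when to use the exact Binomial."""
    return n * p < 10 or n * (1 - p) < 10

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for a Binomial(n, p) sample count Y."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(Y <= k), without continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k)

n, p, k = 100, 0.3, 25
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print(is_small(n, p))
print(exact, approx)
```

Probabilities for the sample proportion follow from the count, since the proportion is at most k/n exactly when the count is at most k.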

21 Next Week - Lecture 11
Chapter 6
Good luck on the midterm next Monday!

