Published by Spencer McDowell; modified over 8 years ago
1
Stats 244.3(02) Review
2
Summarizing Data Graphical Methods
3
Histogram Stem-Leaf Diagram Grouped Freq Table Box-whisker Plot
4
Summary Numerical Measures
5
Measure of Central Location
1. Mean – the center of gravity
2. Median – the “middle” observation
6
Measure of Non-Central Location
1. Percentiles
2. Quartiles
   1. Lower quartile (Q1) (25th percentile) (lower mid-hinge)
   2. Median (Q2) (50th percentile) (hinge)
   3. Upper quartile (Q3) (75th percentile) (upper mid-hinge)
7
Measure of Variability (Dispersion, Spread)
1. Range
2. Inter-Quartile Range
3. Variance, standard deviation
4. Pseudo-standard deviation
8
1. Range: R = max – min
2. Inter-Quartile Range: IQR = Q3 – Q1
9
The Sample Variance is defined as the quantity Σ(xi – x̄)²/(n – 1) and is denoted by the symbol s².
10
The Sample Standard Deviation s. Definition: s = √s². Hence the Sample Standard Deviation, s, is the square root of the sample variance.
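As a sketch of the two definitions above (the function names are mine, not from the slides):

```python
import math

def sample_variance(xs):
    # s^2 = sum of squared deviations from the mean, divided by n - 1
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    # s is the square root of the sample variance
    return math.sqrt(sample_variance(xs))

print(sample_variance([2, 4, 6, 8, 10]))  # deviations -4,-2,0,2,4 -> 40/4 = 10.0
```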
11
Interpretations of s. In Normal distributions:
– Approximately 2/3 of the observations will lie within one standard deviation of the mean
– Approximately 95% of the observations lie within two standard deviations of the mean
– In a histogram of the Normal distribution, the standard deviation is approximately the distance from the mode to the inflection point
12
[Figures: the Normal curve, showing s as the distance from the mode to the inflection point, the 2/3 region within s of the mean, and the 95% region within 2s.]
15
Computing formulae for s and s². The sum of squares of deviations from the mean can also be computed using the following identity: Σ(xi – x̄)² = Σxi² – (Σxi)²/n
16
Then: s² = [Σxi² – (Σxi)²/n]/(n – 1), and s is its square root.
18
A quick (rough) calculation of s: s ≈ Range/4. The reason for this is that approximately all (95%) of the observations are between x̄ – 2s and x̄ + 2s. Thus Range ≈ 4s.
19
The Pseudo Standard Deviation (PSD). Definition: PSD = IQR/1.35 (for a Normal distribution the IQR is approximately 1.35 standard deviations).
20
Properties
– For Normal distributions the pseudo standard deviation (PSD) and the standard deviation (s) will be approximately the same value
– For leptokurtic distributions the standard deviation (s) will be larger than the PSD
– For platykurtic distributions the standard deviation (s) will be smaller than the PSD
21
Measures of Shape
22
Skewness Kurtosis
23
Skewness – based on the sum of cubes. Kurtosis – based on the sum of 4th powers.
24
The Measure of Skewness: g1 = [Σ(xi – x̄)³/n]/s³
25
The Measure of Kurtosis: g2 = [Σ(xi – x̄)⁴/n]/s⁴ – 3
26
Interpretations of Measures of Shape
Skewness: g1 > 0 (skewed right), g1 = 0 (symmetric), g1 < 0 (skewed left)
Kurtosis: g2 < 0 (platykurtic), g2 = 0 (normal), g2 > 0 (leptokurtic)
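A hedged sketch of the moment-based shape measures (conventions on divisors vary between texts; this version uses the simple n-divisor moments m2, m3, m4, with g1 = m3/m2^1.5 and g2 = m4/m2² – 3):

```python
def shape_measures(xs):
    # Moment-based skewness g1 and excess kurtosis g2.
    # Note: textbooks differ on divisors; this uses n-divisor moments throughout.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5      # 0 for symmetric data
    g2 = m4 / m2 ** 2 - 3    # 0 for the Normal distribution
    return g1, g2
```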
27
Inferential Statistics: making decisions regarding the population based on a sample.
28
Estimation by Confidence Intervals. Definition: a (100P)% confidence interval for an unknown parameter is a pair of sample statistics (t1 and t2) having the following properties:
1. P[t1 < t2] = 1. That is, t1 is always smaller than t2. (The statistics t1 and t2 are random variables.)
2. P[the unknown parameter lies between t1 and t2] = P.
Property 2 states that the probability that the unknown parameter is bounded by the two statistics t1 and t2 is P.
29
Confidence Interval for a Proportion
30
Determination of Sample Size. The sample size that will estimate p with an Error Bound B and level of confidence P = 1 – α is: n = p*(1 – p*)(zα/2/B)², where:
B is the desired Error Bound
zα/2 is the α/2 critical value for the standard normal distribution
p* is some preliminary estimate of p
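The sample-size formula above can be sketched directly (function name mine; rounding up to a whole subject, and using the conservative p* = 0.5 as the default preliminary estimate):

```python
import math

def n_for_proportion(B, z, p_star=0.5):
    # n = p*(1 - p*) * (z / B)^2, rounded up to the next whole subject.
    # p_star = 0.5 is the worst case, giving the largest required n.
    return math.ceil(p_star * (1 - p_star) * (z / B) ** 2)

print(n_for_proportion(B=0.03, z=1.96))  # 95% confidence, 3-point error bound
```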
31
Confidence Intervals for the mean of a Normal Population, μ: x̄ ± zα/2 (s/√n)
32
Determination of Sample Size. The sample size that will estimate μ with an Error Bound B and level of confidence P = 1 – α is: n = (zα/2 s*/B)², where:
B is the desired Error Bound
zα/2 is the α/2 critical value for the standard normal distribution
s* is some preliminary estimate of s
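The corresponding sketch for estimating a mean (again, the function name is mine and the result is rounded up):

```python
import math

def n_for_mean(B, z, s_star):
    # n = (z * s* / B)^2, rounded up to the next whole subject
    return math.ceil((z * s_star / B) ** 2)

print(n_for_mean(B=2, z=1.96, s_star=15))  # estimate mu to within +/- 2 units
```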
33
Hypothesis Testing An important area of statistical inference
34
Definition: a hypothesis (H) is a statement about the parameters of the population. In hypothesis testing there are two hypotheses of interest: the null hypothesis (H0) and the alternative hypothesis (HA).
35
Type I, Type II Errors
1. Rejecting the null hypothesis when it is true (type I error)
2. Accepting the null hypothesis when it is false (type II error)
36
Decision Table showing types of Error
              H0 is True          H0 is False
Accept H0     Correct Decision    Type II Error
Reject H0     Type I Error        Correct Decision
37
To define a statistical Test we
1. Choose a statistic (called the test statistic)
2. Divide the range of possible values for the test statistic into two parts: the Acceptance Region and the Critical Region
38
To perform a statistical Test we
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the Decision: if the value of the test statistic is in the Acceptance Region we decide to accept H0; if it is in the Critical Region we decide to reject H0.
39
Probability of the two types of error. Definitions: for any statistical testing procedure define
α = P[rejecting the null hypothesis when it is true] = P[type I error]
β = P[accepting the null hypothesis when it is false] = P[type II error]
40
Determining the Critical Region
1. The Critical Region should consist of values of the test statistic that indicate that HA is true (hence H0 should be rejected).
2. The size of the Critical Region is determined so that the probability of making a type I error, α, is at some pre-determined level (usually 0.05 or 0.01). This value is called the significance level of the test. Significance level = α = P[test makes type I error]
41
To find the Critical Region
1. Find the sampling distribution of the test statistic when H0 is true.
2. Locate the Critical Region in the tails (left, right, or both) of the sampling distribution of the test statistic when H0 is true. Whether you locate the critical region in the left tail, the right tail, or both tails depends on which values indicate HA is true. The tails chosen = values indicating HA.
42
3. The size of the Critical Region is chosen so that the area over the critical region, under the sampling distribution of the test statistic when H0 is true, is the desired level of α = P[type I error]. [Figure: sampling distribution of the test statistic when H0 is true; Critical Region – Area = α]
43
The z-tests Testing the probability of success Testing the mean of a Normal Population
44
Critical Regions for testing the probability of success, p. The test statistic is z = (p̂ – p0)/√(p0(1 – p0)/n). For HA: p > p0 reject if z > zα; for HA: p < p0 reject if z < –zα; for HA: p ≠ p0 reject if |z| > zα/2.
45
Critical Regions for testing the mean, μ, of a normal population. The test statistic is z = (x̄ – μ0)/(s/√n). For HA: μ > μ0 reject if z > zα; for HA: μ < μ0 reject if z < –zα; for HA: μ ≠ μ0 reject if |z| > zα/2.
46
You can compare a statistical test to a meter: the value of the test statistic is the reading, and the Critical Region is the red zone of the meter.
47
[Meter figure: the value of the test statistic falls in the Acceptance Region – accept H0.]
48
[Meter figure: the value of the test statistic falls in the Critical Region – reject H0.]
49
[Figure: Acceptance Region with the Critical Region on one side only.] Sometimes the critical region is located on one side. These tests are called one tailed tests.
50
Whether you use a one tailed test or a two tailed test depends on: 1.The hypotheses being tested (H 0 and H A ). 2.The test statistic.
51
If only large positive values of the test statistic indicate HA, then the critical region should be located in the positive tail (1 tailed test). If only large negative values of the test statistic indicate HA, then the critical region should be located in the negative tail (1 tailed test). If both large positive and large negative values of the test statistic indicate HA, then the critical region should be located in both the positive and negative tails (2 tailed test).
52
Usually 1 tailed tests are appropriate if HA is one-sided, and two tailed tests are appropriate if HA is two-sided. But not always.
53
The p-value approach to Hypothesis Testing
54
Definition – once the test statistic has been computed from the data, the p-value is defined to be: p-value = P[the test statistic is as or more extreme than the observed value of the test statistic when H0 is true]. “More extreme” means giving stronger evidence for rejecting H0.
55
Properties of the p-value
1. If the p-value is small (< 0.05 or 0.01) H0 should be rejected.
2. The p-value measures the plausibility of H0.
3. If the test is two tailed the p-value should be two tailed.
4. If the test is one tailed the p-value should be one tailed.
5. It is customary to report p-values when reporting the results. This gives the reader some idea of the strength of the evidence for rejecting H0.
56
Summary A common way to report statistical tests is to compute the p-value. If the p-value is small ( < 0.05 or < 0.01) then H 0 is rejected. If the p-value is extremely small this gives a strong indication that H A is true. If the p-value is marginally above the threshold 0.05 then we cannot reject H 0 but there would be a suspicion that H 0 is false.
57
Student’s t-test
58
The Situation. Let x1, x2, x3, …, xn denote a sample from a normal population with mean μ and standard deviation σ. Both μ and σ are unknown. We want to test if the mean, μ, is equal to some given value μ0.
59
The Test Statistic: t = (x̄ – μ0)/(s/√n). The sampling distribution of the test statistic is the t distribution with n – 1 degrees of freedom.
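The one-sample t statistic can be sketched as follows (function name mine; it returns the statistic and its degrees of freedom, leaving the critical-value lookup to a t table):

```python
import math

def one_sample_t(xs, mu0):
    # t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, n - 1
```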
60
Critical Regions for testing μ. For HA: μ > μ0 reject if t > tα; for HA: μ < μ0 reject if t < –tα; for HA: μ ≠ μ0 reject if |t| > tα/2. tα and tα/2 are critical values under the t distribution with n – 1 degrees of freedom.
61
Critical values for the t-distribution: tα or tα/2
62
Confidence Intervals using the t distribution
63
Confidence Intervals for the mean of a Normal Population, μ, using the Standard Normal distribution: x̄ ± zα/2 (s/√n). Using the t distribution: x̄ ± tα/2 (s/√n).
64
Testing and Estimation of Variances
65
Sampling Theory: the statistic U = (n – 1)s²/σ² has a χ² distribution with n – 1 degrees of freedom.
66
Critical Points of the χ² distribution
67
Confidence intervals for σ² and σ. Hence (1 – α)100% confidence limits for σ² are: (n – 1)s²/χ²α/2 ≤ σ² ≤ (n – 1)s²/χ²1–α/2, and (1 – α)100% confidence limits for σ are the square roots of these limits.
68
Testing Hypotheses for σ² and σ. Suppose we want to test H0: σ² = σ0² against HA: σ² ≠ σ0². The test statistic: U = (n – 1)s²/σ0². If H0 is true the test statistic, U, has a χ² distribution with n – 1 degrees of freedom. Thus we reject H0 if U > χ²α/2 or U < χ²1–α/2.
69
70
One-tailed Tests for σ² and σ. Suppose we want to test H0: σ² = σ0² against HA: σ² > σ0². The test statistic: U = (n – 1)s²/σ0². We reject H0 if U > χ²α.
71
72
Or suppose we want to test H0: σ² = σ0² against HA: σ² < σ0². The test statistic is again U = (n – 1)s²/σ0². We reject H0 if U < χ²1–α.
73
74
Comparing Populations Proportions and means
75
Comparing proportions
76
Comparing two binomial probabilities p1 and p2, where H0: p1 = p2. The test statistic: z = (p̂1 – p̂2)/√(p̂(1 – p̂)(1/n1 + 1/n2)), where p̂ is the pooled estimate of the common value of p1 and p2.
77
The Critical Region: for HA: p1 > p2 reject if z > zα; for HA: p1 < p2 reject if z < –zα; for HA: p1 ≠ p2 reject if |z| > zα/2.
78
100(1 – α)% Confidence Interval for δ = p1 – p2: (p̂1 – p̂2) ± zα/2 √(p̂1(1 – p̂1)/n1 + p̂2(1 – p̂2)/n2)
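This interval can be sketched directly from the counts (function name mine; it takes the successes and sample sizes for the two groups):

```python
import math

def ci_diff_proportions(x1, n1, x2, n2, z):
    # (p1hat - p2hat) +/- z * sqrt(p1hat*q1hat/n1 + p2hat*q2hat/n2)
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se
```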
79
Sample size determination. Confidence Interval for δ = p1 – p2: again we want to choose n1 and n2 to set B at some predetermined level with a fixed level of confidence 1 – α.
80
Special solutions – case 1: n1 = n2 = n. Then n = (zα/2/B)²[p1*(1 – p1*) + p2*(1 – p2*)].
81
Special solutions – case 2: choose n1 and n2 to minimize N = n1 + n2 = the total sample size. Then the optimal allocation takes ni proportional to √(pi*(1 – pi*)).
82
Special solutions – case 3: choose n1 and n2 to minimize C = C0 + c1n1 + c2n2 = the total cost of the study, where:
C0 = fixed (set-up) costs
c1 = cost per unit in population 1
c2 = cost per unit in population 2
83
Comparing Means
84
The z-test (n and m large): z = (x̄ – ȳ)/√(s1²/n + s2²/m)
85
Confidence Interval for δ = μ1 – μ2: (x̄ – ȳ) ± zα/2 √(s1²/n1 + s2²/n2)
86
Sample size determination. The sample sizes required, n1 and n2, to estimate μ1 – μ2 within an error bound B with level of confidence 1 – α are found under three criteria: equal sample sizes; minimizing the total sample size N = n1 + n2; minimizing the total cost C = C0 + c1n1 + c2n2.
87
The t test – for comparing means – small samples (equal variances). Situation: we have two normal populations (1 and 2). Let μ1 and σ1 denote the mean and standard deviation of population 1, and let μ2 and σ2 denote the mean and standard deviation of population 2. Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = σ.
88
The t test for comparing means – small samples (equal variances)
89
The Critical Region: for HA: μ1 > μ2 reject if t > tα; for HA: μ1 < μ2 reject if t < –tα; for HA: μ1 ≠ μ2 reject if |t| > tα/2. tα and tα/2 are critical points under the t distribution with degrees of freedom n + m – 2.
90
Confidence intervals for the difference in two means of normal populations (small sample sizes, equal variances). (1 – α)100% confidence limits for μ1 – μ2: (x̄ – ȳ) ± tα/2 sp√(1/n + 1/m), where sp² = [(n – 1)s1² + (m – 1)s2²]/(n + m – 2) is the pooled estimate of the common variance.
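The pooled (equal-variance) t statistic can be sketched as follows (function name mine; the pooled variance sp² averages the two sample variances weighted by their degrees of freedom):

```python
import math

def pooled_two_sample_t(xs, ys):
    # t = (xbar - ybar) / (sp * sqrt(1/n + 1/m)), df = n + m - 2
    # sp^2 pools the two sample variances under the equal-variance assumption
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    sx2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    sy2 = sum((y - ybar) ** 2 for y in ys) / (m - 1)
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2
```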
91
Tests, Confidence intervals for the difference in two means of normal populations (small sample sizes, unequal variances)
92
The approximate test for comparing two means of Normal Populations (unequal variances). Test statistic: t = (x̄ – ȳ)/√(s1²/n + s2²/m). Null hypothesis H0: μ1 = μ2. Critical Region: for HA: μ1 ≠ μ2 reject if |t| > tα/2; for HA: μ1 > μ2 reject if t > tα; for HA: μ1 < μ2 reject if t < –tα.
93
Confidence intervals for the difference in two means of normal populations (small samples, unequal variances). (1 – α)100% confidence limits for μ1 – μ2: (x̄ – ȳ) ± tα/2 √(s1²/n + s2²/m), with the degrees of freedom for tα/2 given by an approximation formula.
94
The paired t-test An example of improved experimental design
95
The matched pair experimental design (the paired sample experiment). Prior to assigning the treatments, the subjects are grouped into pairs of similar subjects. Suppose that there are n such pairs (a total of 2n = n + n subjects or cases). The two treatments are then randomly assigned within each pair: one member of a pair receives treatment 1, while the other receives treatment 2. The data collected is as follows: (x1, y1), (x2, y2), (x3, y3), …, (xn, yn), where xi = the response for the case in pair i that receives treatment 1 and yi = the response for the case in pair i that receives treatment 2.
96
Let xi = the measurement of the response for the subject in pair i that received treatment 1, and yi = the measurement of the response for the subject in pair i that received treatment 2. The data: (x1, y1), (x2, y2), (x3, y3), …, (xn, yn).
97
To test H0: μ1 = μ2 is equivalent to testing H0: μd = 0 (we have converted the two sample problem into a single sample problem). The test statistic is the single sample t-test on the differences d1, d2, d3, …, dn, namely t = d̄/(sd/√n), df = n – 1.
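The conversion to a one-sample problem can be sketched directly (function name mine; it differences the pairs and applies the one-sample t formula to the differences):

```python
import math

def paired_t(xs, ys):
    # One-sample t on the differences d_i = x_i - y_i, with df = n - 1
    ds = [x - y for x, y in zip(xs, ys)]
    n = len(ds)
    dbar = sum(ds) / n
    sd2 = sum((d - dbar) ** 2 for d in ds) / (n - 1)
    return dbar / math.sqrt(sd2 / n), n - 1
```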
98
Testing for the equality of variances The F test
99
The test statistic: F = s1²/s2². The sampling distribution of the test statistic: if the Null Hypothesis (H0: σ1 = σ2) is true then the sampling distribution of F is called the F-distribution with ν1 = n – 1 degrees of freedom in the numerator and ν2 = m – 1 degrees of freedom in the denominator.
100
The F distribution: ν1 = n – 1 degrees of freedom in the numerator, ν2 = m – 1 degrees of freedom in the denominator; critical value Fα(ν1, ν2).
101
Critical region for the test (two sided alternative): reject H0 if F > Fα/2(ν1, ν2) or 1/F > Fα/2(ν2, ν1).
102
Critical region for the test (one tailed, one sided alternative): reject H0 if F > Fα(ν1, ν2).
103
Summary of Tests
104
One Sample Tests: H0: p = p0 against HA: p > p0, p < p0, or p ≠ p0.
105
Two Sample Tests
106
Two Sample Tests – continued
Situation: two independent Normal samples with unknown means and variances (unequal). H0: μ1 = μ2. Critical Region: HA: μ1 ≠ μ2 – |t| > tα/2 (df = ν*); HA: μ1 > μ2 – t > tα (df = ν*); HA: μ1 < μ2 – t < –tα (df = ν*), where ν* is the approximate degrees of freedom computed from s1²/n1 and s2²/n2.
Situation: two independent Normal samples with unknown variances. H0: σ1 = σ2. Critical Region: HA: σ1 ≠ σ2 – F > Fα/2(n – 1, m – 1) or 1/F > Fα/2(m – 1, n – 1); HA: σ1 > σ2 – F > Fα(n – 1, m – 1); HA: σ1 < σ2 – 1/F > Fα(m – 1, n – 1).
107
The paired t test. Situation: n matched pairs of subjects are treated with two treatments; di = xi – yi has mean μd = μ1 – μ2. H0: μd = 0. Critical Region: HA: μd ≠ 0 – |t| > tα/2 (df = n – 1); HA: μd > 0 – t > tα (df = n – 1); HA: μd < 0 – t < –tα (df = n – 1). [Diagram: independent samples (Treat 1, Treat 2, possibly equal numbers) versus matched pairs (Pair 1, Pair 2, …, Pair n, each pair receiving both Treat 1 and Treat 2).]
108
Comparing k Populations Means – One way Analysis of Variance (ANOVA)
109
The F test
110
The F test – for comparing k means. Situation: we have k normal populations. Let μi and σi denote the mean and standard deviation of population i, i = 1, 2, 3, …, k. Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = … = σk = σ.
111
We want to test H0: μ1 = μ2 = … = μk against HA: at least one pair of means differ.
112
To test H0 against HA use the test statistic F = MSBetween/MSWithin.
113
The statistic SSBetween = Σ ni(x̄i – x̄)² is called the Between Sum of Squares; it measures the variability between samples. k – 1 is known as the Between degrees of freedom, and SSBetween/(k – 1) is called the Between Mean Square, denoted MSBetween.
114
The statistic SSWithin = Σi Σj (xij – x̄i)² is called the Within Sum of Squares. N – k is known as the Within degrees of freedom, and SSWithin/(N – k) is called the Within Mean Square, denoted MSWithin.
115
Then F = MSBetween/MSWithin.
116
The Computing formula for F. Compute:
1) the group totals Ti = Σj xij
2) the grand total G = Σ Ti
3) Σi Σj xij²
4) Σi Ti²/ni
5) G²/N
117
Then:
1) SSTotal = ΣΣ xij² – G²/N
2) SSBetween = Σ Ti²/ni – G²/N
3) SSWithin = SSTotal – SSBetween
118
The critical region for the F test: we reject H0 if F > Fα, where Fα is the critical point under the F distribution with ν1 = k – 1 degrees of freedom in the numerator and ν2 = N – k degrees of freedom in the denominator.
119
The ANOVA Table A convenient method for displaying the calculations for the F-test
120
Anova Table
Source    d.f.     Sum of Squares   Mean Square   F-ratio
Between   k – 1    SSBetween        MSBetween     MSB/MSW
Within    N – k    SSWithin         MSWithin
Total     N – 1    SSTotal
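The whole one-way ANOVA computation above can be sketched as follows (function name mine; it builds SSBetween and SSWithin from their definitions rather than the shortcut formulas, and returns the F-ratio with its two degrees of freedom):

```python
def anova_f(groups):
    # groups: list of samples, one per population
    N = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # SS_Between = sum of n_i * (group mean - grand mean)^2
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # SS_Within = sum of squared deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return ms_between / ms_within, k - 1, N - k
```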
121
Fisher’s LSD (least significant difference) procedure:
1. Test H0: μ1 = μ2 = μ3 = … = μk against HA: at least one pair of means are different, using the ANOVA F-test.
2. If H0 is accepted we conclude that all means are equal (not significantly different), and we stop.
3. If H0 is rejected we conclude that at least one pair of means is significantly different; follow this with two sample t tests to determine which pairs of means are significantly different.
122
Comparing k Populations Proportions: the χ² test for independence
123
The data are arranged in a table with columns 1, 2, …, c and rows 1, 2, …, r: entries xij, row totals R1, …, Rr, column totals C1, …, Cc, and grand total N.
1. The number of populations (columns) is k (or c).
2. The number of categories (rows) can range from 2 to r.
124
The χ² test for independence
125
Situation We have two categorical variables R and C. The number of categories of R is r. The number of categories of C is c. We observe n subjects from the population and count x ij = the number of subjects for which R = i and C = j. R = rows, C = columns
126
The χ² test for independence. Define Eij = RiCj/N = the expected frequency in the (i,j)th cell in the case of independence.
127
Then to test H0: R and C are independent against HA: R and C are not independent, use the test statistic χ² = Σi Σj (xij – Eij)²/Eij, where xij = the observed frequency in the (i,j)th cell and Eij = the expected frequency in the (i,j)th cell in the case of independence.
128
Sampling distribution of the test statistic when H0 is true: the χ² distribution with degrees of freedom ν = (r – 1)(c – 1). Critical and Acceptance Region: reject H0 if χ² > χ²α; accept H0 if χ² ≤ χ²α.
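The expected frequencies and the statistic can be sketched together (function name mine; it takes the table as a list of rows and returns the statistic with its degrees of freedom):

```python
def chi2_independence(table):
    # E_ij = R_i * C_j / N; statistic sums (x_ij - E_ij)^2 / E_ij over all cells
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    N = sum(rows)
    stat = sum((table[i][j] - rows[i] * cols[j] / N) ** 2 / (rows[i] * cols[j] / N)
               for i in range(len(rows)) for j in range(len(cols)))
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df
```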
129
Linear Regression Hypothesis testing and Estimation
130
Assume that we have collected data on two variables X and Y. Let (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).
131
The Statistical Model
132
Each yi is assumed to be randomly generated from a normal distribution with mean μi = α + βxi and standard deviation σ (α, β, and σ are unknown). The line Y = α + βX has slope β.
133
The Data and the Linear Regression Model: the data falls roughly about a straight line, Y = α + βX (unseen).
134
The Least Squares Line Fitting the best straight line to “linear” data
135
Let Y = a + bX denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if X = xi (as for the ith case) then the predicted value of Y is ŷi = a + bxi.
136
The residual ri = yi – ŷi = yi – (a + bxi) can be computed for each case in the sample. The residual sum of squares RSS = Σ(yi – a – bxi)² is a measure of the “goodness of fit” of the line Y = a + bX to the data.
137
The optimal choice of a and b will result in the residual sum of squares attaining a minimum. If this is the case than the line: Y = a + bX is called the Least Squares Line
138
Comments: β and α are the slope and intercept of the regression line (unseen); b and a are the slope and intercept of the least squares line (calculated from the data). They represent the same quantities.
139
The equation for the least squares line. Let Sxx = Σ(xi – x̄)², Syy = Σ(yi – ȳ)², and Sxy = Σ(xi – x̄)(yi – ȳ).
140
Computing Formulae: Sxx = Σxi² – (Σxi)²/n, Syy = Σyi² – (Σyi)²/n, Sxy = Σxiyi – (Σxi)(Σyi)/n.
141
Then the slope of the least squares line can be shown to be b = Sxy/Sxx.
142
and the intercept of the least squares line can be shown to be a = ȳ – bx̄.
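The slope and intercept formulas can be sketched together (function name mine; it computes Sxx and Sxy from their definitions and returns the fitted a and b):

```python
def least_squares(xs, ys):
    # b = Sxy / Sxx, a = ybar - b * xbar
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b
```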
143
The residual sum of Squares: RSS = Σ(yi – a – bxi)². Computing formula: RSS = Syy – Sxy²/Sxx.
144
Estimating σ, the standard deviation in the regression model: s = √(RSS/(n – 2)). This estimate of σ is said to be based on n – 2 degrees of freedom.
145
Sampling distributions of the estimators
146
The sampling distribution of the slope of the least squares line: it can be shown that b has a normal distribution with mean β and standard deviation σ/√Sxx.
147
Thus z = (b – β)/(σ/√Sxx) has a standard normal distribution, and t = (b – β)/(s/√Sxx) has a t distribution with df = n – 2.
148
(1 – α)100% Confidence Limits for the slope β: b ± tα/2 (s/√Sxx), where tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.
149
Testing the slope. To test H0: β = β0 the test statistic is t = (b – β0)/(s/√Sxx), which has a t distribution with df = n – 2 if H0 is true.
150
The Critical Region: reject H0 if |t| > tα/2, df = n – 2. This is a two tailed test; one tailed tests are also possible.
151
The sampling distribution of the intercept of the least squares line: it can be shown that a has a normal distribution with mean α and standard deviation σ√(1/n + x̄²/Sxx).
152
Thus z = (a – α)/(σ√(1/n + x̄²/Sxx)) has a standard normal distribution, and t = (a – α)/(s√(1/n + x̄²/Sxx)) has a t distribution with df = n – 2.
153
(1 – α)100% Confidence Limits for the intercept α: a ± tα/2 s√(1/n + x̄²/Sxx), where tα/2 is the critical value for the t-distribution with n – 2 degrees of freedom.
154
Testing the intercept. To test H0: α = α0 the test statistic is t = (a – α0)/(s√(1/n + x̄²/Sxx)), which has a t distribution with df = n – 2 if H0 is true.
155
The Critical Region: reject H0 if |t| > tα/2, df = n – 2.
156
Confidence Limits for Points on the Regression Line. The intercept α is a specific point on the regression line: it is the y-coordinate of the point on the regression line when x = 0, i.e. the predicted value of y when x = 0. We may also be interested in other points on the regression line, e.g. when x = x0. In this case the y-coordinate of the point on the regression line when x = x0 is α + βx0.
157
[Figure: the point (x0, α + βx0) on the line y = α + βx.]
158
(1 – α)100% Confidence Limits for α + βx0: (a + bx0) ± tα/2 s√(1/n + (x0 – x̄)²/Sxx), where tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom.
159
Prediction Limits for new values of the dependent variable y. An important application of the regression line is prediction: knowing the value of x (x0), what is the value of y? The predicted value of y when x = x0 is α + βx0, which in turn is estimated by ŷ = a + bx0.
160
The predictor ŷ = a + bx0 gives only a single value for y. A more appropriate piece of information would be a range of values: a range that has a fixed probability of capturing the value of y, i.e. a (1 – α)100% prediction interval for y.
161
(1 – α)100% Prediction Limits for y when x = x0: (a + bx0) ± tα/2 s√(1 + 1/n + (x0 – x̄)²/Sxx), where tα/2 is the α/2 critical value for the t-distribution with n – 2 degrees of freedom.
162
Correlation
163
Definition: the statistic r = Sxy/√(SxxSyy) is called Pearson’s correlation coefficient.
164
Properties
1. –1 ≤ r ≤ 1, |r| ≤ 1, r² ≤ 1
2. |r| = 1 (r = +1 or –1) if the points (x1, y1), (x2, y2), …, (xn, yn) lie along a straight line (positive slope for +1, negative slope for –1)
165
The test for independence (zero correlation). H0: X and Y are independent; HA: X and Y are correlated. The test statistic: t = r√(n – 2)/√(1 – r²). The Critical Region: reject H0 if |t| > tα/2 (df = n – 2). This is a two-tailed critical region; the critical region could also be one-tailed.
166
Spearman’s rank correlation coefficient (rho)
167
Spearman’s rank correlation coefficient (ρ) is computed as follows: arrange the observations on X in increasing order and assign them the ranks 1, 2, 3, …, n; arrange the observations on Y in increasing order and assign them the ranks 1, 2, 3, …, n. For any case i let (xi, yi) denote the observations on X and Y and let (ri, si) denote the ranks on X and Y.
168
Spearman’s rank correlation coefficient is defined as follows: for each case let di = ri – si = the difference in the two ranks. Then Spearman’s rank correlation coefficient (ρ) is defined as ρ = 1 – 6Σdi²/(n(n² – 1)).
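The ranking and the d-based formula can be sketched as follows (function name mine; for simplicity this assumes no tied values, since ties require averaged ranks):

```python
def spearman_rho(xs, ys):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); assumes no tied values
    n = len(xs)
    def ranks(v):
        # rank 1 = smallest value, rank n = largest
        order = sorted(range(n), key=lambda i: v[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((rx[i] - ry[i]) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))
```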
169
Properties of Spearman’s rank correlation coefficient
1. The value of ρ is always between –1 and +1.
2. If the relationship between X and Y is positive, then ρ will be positive.
3. If the relationship between X and Y is negative, then ρ will be negative.
4. If there is no relationship between X and Y, then ρ will be zero.
5. The value of ρ will be +1 if the ranks of X completely agree with the ranks of Y.
6. The value of ρ will be –1 if the ranks of X are in reverse order to the ranks of Y.
170
Relationship between Regression and Correlation
171
Recall b = Sxy/Sxx. Also, since r = Sxy/√(SxxSyy), we have b = r√(Syy/Sxx) = r(sy/sx). Thus the slope of the least squares line is simply the ratio of the standard deviations × the correlation coefficient.
172
The coefficient of Determination
173
Sums of Squares associated with Linear Regression: SSTotal = Syy, SSExplained = Sxy²/Sxx, and SSUnexplained = RSS = Syy – Sxy²/Sxx.
174
It can be shown: (Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)
175
It can also be shown that r² = SSExplained/SSTotal = the proportion of variability in Y explained by X. r² is called the coefficient of determination.
176
Further: 1 – r² = the proportion of variability in Y that is unexplained by X.
177
Regression (in general)
178
In many experiments we will have collected data on a single variable Y (the dependent variable) and on p (say) other variables X1, X2, X3, …, Xp (the independent variables). One is interested in determining a model that describes the relationship between Y (the response (dependent) variable) and X1, X2, …, Xp (the predictor (independent) variables). This model can be used for:
– Prediction
– Controlling Y by manipulating X1, X2, …, Xp
179
The Model is an equation of the form Y = f(X1, X2, …, Xp | θ1, θ2, …, θq) + ε, where θ1, θ2, …, θq are unknown parameters of the function f and ε is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation σ).
180
The Multiple Linear Regression Model
181
In Multiple Linear Regression we assume the following model: Y = β0 + β1X1 + β2X2 + … + βpXp + ε. This model is called the Multiple Linear Regression Model. Here β0, β1, β2, …, βp are unknown parameters and ε is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation σ.
182
Summary of the Statistics used in Multiple Regression
183
The Least Squares Estimates: the values of β̂0, β̂1, …, β̂p that minimize Σ(yi – β̂0 – β̂1xi1 – … – β̂pxip)².
184
The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares (SSTotal)
b) Residual Sum of Squares (SSError)
c) Regression Sum of Squares (SSReg)
Note: SSTotal = SSReg + SSError
185
The Analysis of Variance Table
Source       Sum of Squares   d.f.        Mean Square                        F
Regression   SSReg            p           SSReg/p = MSReg                    MSReg/s²
Error        SSError          n – p – 1   SSError/(n – p – 1) = MSError = s²
Total        SSTotal          n – 1
186
Uses:
1. To estimate σ² (the error variance): use s² = MSError to estimate σ².
2. To test the hypothesis H0: β1 = β2 = … = βp = 0: use the test statistic F = MSReg/s², and reject H0 if F > Fα(p, n – p – 1).
187
3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X1, X2, …, Xp (the independent variables).
a) R² = the coefficient of determination = SSReg/SSTotal = the proportion of variance in Y explained by X1, X2, …, Xp; 1 – R² = the proportion of variance in Y that is left unexplained by X1, X2, …, Xp = SSError/SSTotal.
188
b) Ra² = “R² adjusted” for degrees of freedom = 1 – [the proportion of variance in Y that is left unexplained by X1, X2, …, Xp, adjusted for d.f.] = 1 – (SSError/(n – p – 1))/(SSTotal/(n – 1)).
189
c) R = √R² = the multiple correlation coefficient of Y with X1, X2, …, Xp = the maximum correlation between Y and a linear combination of X1, X2, …, Xp. Comment: the statistics F, R², Ra², and R are equivalent statistics.
190
Logistic regression
191
Logistic Regression is used in the situation where the dependent variable y is binary: it takes on two values, “Success” (1) or “Failure” (0). We are interested in predicting y from a continuous independent variable x.
192
The logistic Regression Model. Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio p/(1 – p) is called the odds ratio; this quantity will also increase with the value of x, ranging from zero to infinity. The quantity ln(p/(1 – p)) is called the log odds ratio.
193
The logistic Regression Model assumes the log odds ratio is linearly related to x, i.e. ln(p/(1 – p)) = β0 + β1x. In terms of the odds ratio: p/(1 – p) = e^(β0 + β1x).
194
The logistic Regression Model. Solving for p in terms of x: p = e^(β0 + β1x)/(1 + e^(β0 + β1x)).
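The solved form of the model can be sketched directly (function name mine; note that p/(1 – p) recovers the odds e^(β0 + β1x)):

```python
import math

def logistic_p(b0, b1, x):
    # p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

print(logistic_p(0.0, 1.0, 0.0))  # log odds 0 -> p = 0.5
```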
195
Interpretation of the parameter β0: it determines the intercept of the curve of p against x.
196
Interpretation of the parameter β1: together with β0 it determines where p is 0.50 (p = 0.50 when x = –β0/β1).
197
Interpretation of the parameter β1: it determines the slope of the curve of p against x when p is 0.50.
198
The Multiple Logistic Regression model
199
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2, etc.
200
Nonparametric Statistical Methods
201
Definition: when the data is generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model. Statistical techniques that assume a non-parametric model are called non-parametric.
202
Nonparametric Statistical Methods
203
The sign test A nonparametric test for the central location of a distribution
204
To carry out the Sign test:
1. Compute the test statistic: S = the number of observations that exceed the hypothesized median μ0 = sobserved.
2. Compute the p-value of the test statistic, sobserved: p-value = P[S ≥ sobserved] (= 2 P[S ≥ sobserved] for a 2-tailed test), where S is binomial with n = sample size and p = 0.50.
3. Reject H0 if the p-value is low (< 0.05).
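The exact binomial p-value in step 2 can be sketched as follows (function name mine; it sums Binomial(n, 1/2) tail probabilities with `math.comb`):

```python
from math import comb

def sign_test_p(n, s_observed, two_tailed=True):
    # P[S >= s_observed] for S ~ Binomial(n, 1/2); doubled for a two-tailed test
    upper = sum(comb(n, k) for k in range(s_observed, n + 1)) / 2 ** n
    return min(1.0, 2 * upper) if two_tailed else upper
```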
205
Sign Test for Large Samples
206
If n is large we can use the Normal approximation to the Binomial: S has a Binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a Normal distribution with mean n/2 and standard deviation √n/2.
207
Hence for large n, use as the test statistic (in place of S): z = (S – n/2)/(√n/2). Choose the critical region for z from the Standard Normal distribution, i.e. reject H0 if |z| > zα/2 (two-tailed; a one-tailed test can also be set up).
208
Nonparametric Confidence Intervals
209
Assume that the data x1, x2, x3, …, xn is a sample from an unknown distribution. Now arrange the data in increasing order: x(1) < x(2) < x(3) < … < x(n), where x(1) = the smallest observation, x(2) = the 2nd smallest observation, and x(n) = the largest observation. Consider the kth smallest observation, x(k), and the kth largest observation, x(n – k + 1).
210
Hence P[x(k) < median < x(n – k + 1)] = P[k ≤ the number of observations greater than the median ≤ n – k] = p(k) + p(k + 1) + … + p(n – k) = P, where the p(i)’s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n – k + 1) is a P(100)% confidence interval for the median. Choose k so that P = p(k) + p(k + 1) + … + p(n – k) is close to 0.95 (or 0.99).
211
Summarizing where P = p(k) + p(k + 1) + … + p(n-k) and p(i)’s are binomial probabilities with n = the sample size and p =1/2. x (k) to x (n – k + 1) is a P(100)% confidence interval for the median
212
For large values of n one can use the normal approximation to the Binomial to find the value of k so that x (k) to x (n – k + 1) is a 95% confidence interval for the median.
214
The Wilcoxon Signed Rank Test An Alternative to the sign test
215
For Wilcoxon’s signed-rank test we assign ranks to the absolute values of (x1 – μ0, x2 – μ0, …, xn – μ0): a rank of 1 to the value of xi – μ0 which is smallest in absolute value, and a rank of n to the value of xi – μ0 which is largest in absolute value. W+ = the sum of the ranks associated with positive values of xi – μ0; W– = the sum of the ranks associated with negative values of xi – μ0.
216
To carry out Wilcoxon’s signed rank test we:
1. Compute T = W+ or W– (usually the smaller of the two).
2. Let tobserved = the observed value of T.
3. Compute the p-value = P[T ≤ tobserved] (2 P[T ≤ tobserved] for a two-tailed test): for n ≤ 12 use the table; for n > 12 use the Normal approximation.
4. Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).
217
For sample sizes n > 12 we can use the fact that T (W+ or W–) has approximately a normal distribution with mean n(n + 1)/4 and standard deviation √(n(n + 1)(2n + 1)/24).
218
Comments
1. The t-test:
   i. This test requires the assumption of normality.
   ii. If the data is not normally distributed the test is invalid: the probability of a type I error may not be equal to its desired value (0.05 or 0.01).
   iii. If the data is normally distributed, the t-test commits type II errors with a smaller probability than any other test (in particular Wilcoxon’s signed rank test or the sign test).
2. The sign test:
   i. This test does not require the assumption of normality (true also for Wilcoxon’s signed rank test).
   ii. This test ignores the magnitude of the observations completely; Wilcoxon’s test takes the magnitude into account by ranking them.
219
Two-sample non-parametric tests
220
Mann-Whitney Test A non-parametric two sample test for comparison of central location
221
The Mann-Whitney Test: a non-parametric alternative to the two-sample t-test (or z-test) for independent samples. These tests (t and z) assume the data are normal; the Mann-Whitney test does not make this assumption. Sample of n from population 1: x1, x2, x3, …, xn. Sample of m from population 2: y1, y2, y3, …, ym.
222
The Mann-Whitney test statistics U1 and U2: arrange the observations from the two samples combined in increasing order (retaining sample membership) and assign ranks to the observations. Let W1 = the sum of the ranks for sample 1 and W2 = the sum of the ranks for sample 2. Then U1 = nm + n(n + 1)/2 − W1 and U2 = nm + m(m + 1)/2 − W2.
223
The distribution function of U (U1 or U2) has been tabled for various values of n and m (m < n) when the two samples come from the same distribution. These tables can be used to set up critical regions for the Mann-Whitney U test.
224
The Mann-Whitney test for large samples: for large samples (n > 10 and m > 10) the statistics U1 and U2 have approximately a normal distribution with mean μU = nm/2 and standard deviation σU = √[nm(n + m + 1)/12].
225
Thus we can convert Ui to a standard normal statistic z = (Ui − nm/2)/σU, and reject H0 if |z| > z(α/2) (for a two-tailed test).
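The ranking and the U statistics above can be sketched as follows (a hypothetical helper; tied observations are not averaged):

```python
import math

def mann_whitney_z(xs, ys):
    """Large-sample Mann-Whitney test. Returns (U1, U2, z),
    where z is computed from U1 via the normal approximation."""
    n, m = len(xs), len(ys)
    # pool the samples, sort, and rank while retaining sample membership
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    w1 = sum(r for r, (v, g) in enumerate(combined, start=1) if g == 0)
    w2 = sum(r for r, (v, g) in enumerate(combined, start=1) if g == 1)
    u1 = n * m + n * (n + 1) / 2 - w1
    u2 = n * m + m * (m + 1) / 2 - w2
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    return u1, u2, (u1 - mean_u) / sd_u
```

A useful check: U1 + U2 always equals nm.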
226
The Kruskal-Wallis Test: comparing the central location for k populations. A nonparametric alternative to the one-way ANOVA F-test.
227
Situation: data are collected from k populations. The sample size from population i is ni. The data from population i are xi1, xi2, …, xini.
228
The computation of the Kruskal-Wallis statistic: we group the N = n1 + n2 + … + nk observations from the k populations together and rank these observations from 1 to N. Let rij be the rank associated with the observation xij. Handling of "tied" observations: if a group of observations are equal, the ranks that would have been assigned to those observations are averaged.
229
The Kruskal-Wallis statistic: K = [12/(N(N + 1))] Σ Ri²/ni − 3(N + 1), where Ri = the sum of the ranks for the i-th sample.
230
The Kruskal-Wallis test: reject H0 (the k populations have the same central location) if K exceeds the critical value of the chi-squared distribution with k − 1 degrees of freedom.
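The statistic above can be computed as sketched here (a minimal illustration assuming no ties, so tied ranks are never averaged; compare K against a chi-squared critical value with k − 1 d.f.):

```python
def kruskal_wallis(samples):
    """Kruskal-Wallis K statistic for a list of samples (one list per
    population). Ties are not handled in this sketch."""
    # pool all observations, remembering which group each came from
    pooled = sorted((v, gi) for gi, s in enumerate(samples) for v in s)
    rank_sums = [0.0] * len(samples)
    for rank, (v, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n_total = len(pooled)
    return (12 / (n_total * (n_total + 1))) * sum(
        r * r / len(s) for r, s in zip(rank_sums, samples)
    ) - 3 * (n_total + 1)
```

When the samples are thoroughly interleaved the rank sums are nearly equal and K is small.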
231
Probability Theory Probability – Models for random phenomena
232
Definitions
233
The sample Space, S The sample space, S, for a random phenomenon is the set of all possible outcomes.
234
An Event, E The event, E, is any subset of the sample space, S, i.e. any set of outcomes (not necessarily all outcomes) of the random phenomenon. (Venn diagram: E drawn inside S.)
235
The event, E, is said to have occurred if, after the outcome has been observed, the outcome lies in E.
236
Set operations on Events Union: let A and B be two events; then the union of A and B is the event (denoted by A ∪ B) defined by: A ∪ B = {e | e belongs to A or e belongs to B}
237
The event A ∪ B occurs if the event A occurs or the event B occurs.
238
Intersection: let A and B be two events; then the intersection of A and B is the event (denoted by A ∩ B) defined by: A ∩ B = {e | e belongs to A and e belongs to B}
239
The event A ∩ B occurs if the event A occurs and the event B occurs.
240
Complement: let A be any event; then the complement of A (denoted by Ā) is defined by: Ā = {e | e does not belong to A}
241
The event Ā occurs if the event A does not occur.
242
In problems you will recognize that you are working with: 1.Union if you see the word or, 2.Intersection if you see the word and, 3.Complement if you see the word not.
243
Definition: mutually exclusive Two events A and B are called mutually exclusive if: A ∩ B = ∅
244
If two events A and B are mutually exclusive then: 1. They have no outcomes in common. 2. They can't occur at the same time: the outcome of the random experiment cannot belong to both A and B.
245
Rules of Probability
246
The additive rule P[A ∪ B] = P[A] + P[B] − P[A ∩ B], and if A ∩ B = ∅ then P[A ∪ B] = P[A] + P[B]
247
The Rule for complements: for any event E, P[Ē] = 1 − P[E]
248
Conditional probability: P[A | B] = P[A ∩ B] / P[B]
249
The multiplicative rule of probability P[A ∩ B] = P[A] P[B | A], and P[A ∩ B] = P[A] P[B] if A and B are independent. This is the definition of independence.
250
Counting techniques
251
Summary of counting rules Rule 1 n(A1 ∪ A2 ∪ A3 ∪ …) = n(A1) + n(A2) + n(A3) + … if the sets A1, A2, A3, … are pairwise mutually exclusive (i.e. Ai ∩ Aj = ∅). Rule 2 N = n1 n2 = the number of ways that two operations can be performed in sequence if n1 = the number of ways the first operation can be performed and n2 = the number of ways the second operation can be performed once the first operation has been completed.
252
Rule 3 N = n1 n2 … nk = the number of ways the k operations can be performed in sequence if n1 = the number of ways the first operation can be performed and ni = the number of ways the i-th operation can be performed once the first (i − 1) operations have been completed, i = 2, 3, …, k.
253
Basic counting formulae 1. Orderings: n! = the number of ways to order n objects. 2. Permutations: P(n, k) = n!/(n − k)! = the number of ways that you can choose k objects from n in a specific order. 3. Combinations: C(n, k) = n!/[k!(n − k)!] = the number of ways that you can choose k objects from n (order of selection irrelevant).
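These three formulae map directly onto Python's math module (Python 3.8+):

```python
import math

# Orderings of n objects: n!
assert math.factorial(5) == 120
# Permutations P(n, k) = n! / (n - k)!: ordered choices of k from n
assert math.perm(5, 2) == 20
# Combinations C(n, k) = n! / (k! (n - k)!): unordered choices of k from n
assert math.comb(5, 2) == 10
# each combination corresponds to k! orderings
assert math.perm(5, 2) == math.comb(5, 2) * math.factorial(2)
```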
254
Random Variables Numerical quantities whose values are determined by the outcome of a random experiment
255
Random variables are either Discrete – integer valued; the set of possible values for X is a set of integers – or Continuous – the set of possible values for X is all real numbers; they range over a continuum.
256
The Probability distribution of a random variable A mathematical description of the possible values of the random variable together with the probabilities of those values
257
The probability distribution of a discrete random variable is described by its probability function p(x). p(x) = the probability that X takes on the value x. This can be given in either a tabular form or in the form of an equation. It can also be displayed in a graph.
258
Comments: Every probability function must satisfy: 1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive: 0 ≤ p(x) ≤ 1. 2. The sum of the probabilities assigned to all the values of the random variable must equal 1: Σ p(x) = 1.
259
Probability Distributions of Continuous Random Variables
260
Probability Density Function The probability distribution of a continuous random variable is described by its probability density curve f(x).
261
Notes: The total area under the probability density curve is 1. The area under the probability density curve from a to b is P[a < X < b].
262
Normal Probability Distributions (Bell shaped curve)
263
Mean, Variance and standard deviation of Random Variables Numerical descriptors of the distribution of a Random Variable
264
Mean of a Discrete Random Variable The mean, μ, of a discrete random variable X is found by multiplying each possible value of x by its own probability and then adding all the products together: μ = Σ x p(x). Notes: The mean is a weighted average of the values of X. The mean is the long-run average value of the random variable. The mean is the centre of gravity of the probability distribution of the random variable.
265
Variance of a Discrete Random Variable: The variance, σ², of a discrete random variable X is found by multiplying each possible squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together: σ² = Σ (x − μ)² p(x). Standard Deviation of a Discrete Random Variable: The positive square root of the variance: σ = √σ².
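The two weighted sums above can be sketched in a few lines (the dictionary representation of the distribution is an illustrative choice):

```python
def mean_var(dist):
    """Mean and variance of a discrete random variable given as a
    {value: probability} mapping."""
    mu = sum(x * p for x, p in dist.items())            # mu = sum x p(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())  # sigma^2
    return mu, var

# a fair six-sided die: mu = 3.5, sigma^2 = 35/12
die = {x: 1 / 6 for x in range(1, 7)}
mu, var = mean_var(die)
```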
266
The Binomial distribution An important discrete distribution
267
X is said to have the Binomial distribution with parameters n and p. 1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment. 2. The probability of success is p. 3. The probability function is p(x) = C(n, x) p^x (1 − p)^(n − x) for x = 0, 1, …, n.
268
Mean, Variance and Standard Deviation of the Binomial Distribution The mean, variance and standard deviation of the binomial distribution can be found by using the following three formulas: μ = np, σ² = np(1 − p), and σ = √[np(1 − p)].
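As a check, computing the mean and variance directly from the probability function reproduces the closed forms μ = np and σ² = np(1 − p):

```python
import math

def binom_pmf(n, p, x):
    # p(x) = C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
mean = sum(x * binom_pmf(n, p, x) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(n, p, x) for x in range(n + 1))
# mean should equal np = 3.0 and var should equal np(1 - p) = 2.1
```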
269
Mean of a Continuous Random Variable (uses calculus) The mean, μ, of a continuous random variable X is μ = ∫ x f(x) dx. Notes: The mean is a weighted average of the values of X. The mean is the long-run average value of the random variable. The mean is the centre of gravity of the probability distribution of the random variable.
270
Variance of a Continuous Random Variable: σ² = ∫ (x − μ)² f(x) dx. Standard Deviation of a Continuous Random Variable: The positive square root of the variance: σ = √σ².
271
The Normal Probability Distribution Points of Inflection
272
Main characteristics of the Normal Distribution Bell shaped, symmetric. Points of inflection on the bell-shaped curve are at μ − σ and μ + σ, that is, one standard deviation from the mean. The area under the bell-shaped curve between μ − σ and μ + σ is approximately 2/3. The area under the bell-shaped curve between μ − 2σ and μ + 2σ is approximately 95%.
273
Normal approximation to the Binomial distribution Using the Normal distribution to calculate Binomial probabilities
274
Normal Approximation to the Binomial distribution If X has a Binomial distribution with parameters n and p, then X is approximately distributed as a Normal random variable Y with mean μ = np and standard deviation σ = √[np(1 − p)].
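A quick numerical check of the approximation (the continuity correction of 0.5 is a standard refinement, though the slides do not show it):

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def binom_cdf(n, p, x):
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x + 1))

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
exact = binom_cdf(n, p, 55)                 # exact P[X <= 55]
approx = normal_cdf(55 + 0.5, mu, sigma)    # with continuity correction
```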
275
Sampling Theory Determining the distribution of Sample statistics
276
The distribution of the sample mean
277
Thus if x1, x2, …, xn denote n independent random variables, each coming from the same Normal distribution with mean μ and standard deviation σ, then the sample mean x̄ has a Normal distribution with mean μ and standard deviation σ/√n.
278
The Central Limit Theorem The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from any population with mean μ and finite standard deviation σ are approximately normally distributed with mean μ and standard deviation σ/√n. Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the "approximately normal" shape that requires n to be sufficiently large.
279
Graphical Illustration of the Central Limit Theorem: the original population, and the distributions of x̄ for n = 2, n = 10 and n = 30.
280
Implications of the Central Limit Theorem The conclusion that the sampling distribution of the sample mean is Normal will be true if the sample size is large (> 30), even though the population may be non-normal. When the population can be assumed to be normal, the sampling distribution of the sample mean will be Normal for any sample size. Knowing the sampling distribution of the sample mean allows us to answer probability questions related to the sample mean.
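The CLT can be illustrated by simulation (a sketch; the exponential parent population, seed, and replication counts are illustrative choices, not from the slides):

```python
import math
import random
import statistics

random.seed(1)

def sample_mean(n):
    # one sample mean of n draws from a skewed parent: Exp(1), mean 1, sd 1
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]
# CLT: the mean of x-bar is about 1 and its sd is about 1/sqrt(30) = 0.183,
# and a histogram of `means` is close to bell shaped
```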
281
Sampling Distribution of a Sample Proportion
282
Sampling Distribution for Sample Proportions Let p = the population proportion of interest, or binomial probability of success, and let p̂ = x/n = the sample proportion, or proportion of successes. Then p̂ has approximately a normal distribution with mean p and standard deviation √[p(1 − p)/n].
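A simulation check of the formula (the values of p, n, the seed, and the replication count are illustrative):

```python
import math
import random

random.seed(2)
p, n = 0.4, 200
# simulate many sample proportions p-hat = x / n
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(4000)]
sim_mean = sum(phats) / len(phats)
sim_sd = math.sqrt(sum((q - sim_mean) ** 2 for q in phats) / (len(phats) - 1))
theory_sd = math.sqrt(p * (1 - p) / n)  # about 0.0346
```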
283
Sampling distribution of differences
284
If X and Y are independent normal random variables, then X − Y is normal with mean μX − μY and standard deviation √(σX² + σY²). Note: the variances add even though the means are subtracted.
285
Sampling distribution of a difference in two Sample means
286
Situation We have two normal populations (1 and 2). Let μ1 and σ1 denote the mean and standard deviation of population 1. Let μ2 and σ2 denote the mean and standard deviation of population 2. Let x1, x2, x3, …, xn denote a sample from normal population 1. Let y1, y2, y3, …, ym denote a sample from normal population 2. The objective is to compare the two population means.
287
Then x̄ − ȳ has a Normal distribution with mean μ1 − μ2 and standard deviation √(σ1²/n + σ2²/m).
288
Sampling distribution of a difference in two Sample proportions
289
Situation Suppose we have two Success-Failure experiments. Let p1 = the probability of success for experiment 1 and p2 = the probability of success for experiment 2. Suppose that experiment 1 is repeated n1 times and experiment 2 is repeated n2 times. Let x1 = the number of successes in the n1 repetitions of experiment 1 and x2 = the number of successes in the n2 repetitions of experiment 2.
290
Then p̂1 − p̂2 = x1/n1 − x2/n2 has approximately a Normal distribution with mean p1 − p2 and standard deviation √[p1(1 − p1)/n1 + p2(1 − p2)/n2].
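The mean and standard deviation of this sampling distribution can be packaged as a small helper (the function name is hypothetical):

```python
import math

def diff_prop_params(p1, n1, p2, n2):
    """Mean and standard deviation of the sampling distribution of
    p-hat1 - p-hat2 under the normal approximation."""
    mean = p1 - p2
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd
```

For example, two experiments with p1 = p2 = 0.5 and n1 = n2 = 100 give mean 0 and standard deviation √0.005 ≈ 0.0707.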
291
The Chi-square (χ²) distribution
292
The Chi-squared distribution with ν degrees of freedom Comment: If z1, z2, …, zν are independent random variables, each having a standard normal distribution, then U = z1² + z2² + … + zν² has a chi-squared distribution with ν degrees of freedom.
293
The Chi-squared distribution with ν degrees of freedom (ν = degrees of freedom)
294
(Chi-squared density curves with 2, 3 and 4 degrees of freedom.)
295
Statistics that have the Chi-squared distribution: χ² = Σi Σj (Oij − Eij)²/Eij, the sum over all cells of (observed − expected)²/expected. This statistic is used to detect independence between two categorical variables; d.f. = (r − 1)(c − 1).
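The cell-by-cell computation can be sketched as follows (a minimal illustration; expected counts come from the usual row total × column total / grand total rule):

```python
def chi_square_stat(table):
    """Chi-squared statistic for independence in an r x c table of
    observed counts. Returns (statistic, degrees of freedom)."""
    r, c = len(table), len(table[0])
    row = [sum(table[i]) for i in range(r)]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    total = sum(row)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (r - 1) * (c - 1)
```

A table whose rows are exactly proportional gives a statistic of 0, as it should under perfect independence.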
296
Let x1, x2, …, xn denote a sample from the normal distribution with mean μ and standard deviation σ; then U = (n − 1)s²/σ² has a chi-squared distribution with d.f. = n − 1.