STAT 6304 Final Project Fall, 2016.


Statistical Tools vs. Variable Types
Numerical response (output):
  Numerical predictors (input): Simple and Multiple Regression
  Categorical predictors (input): Analysis of Variance (ANOVA)
  Mixed predictors (input): Analysis of Covariance (ANCOVA)
Categorical response (output): Categorical data analysis

Project Outline
For detail, see
For the analysis part, you should:
  Explore the data. What do the data show? Use tables, graphs, and descriptive statistics.
  Build models: a (one-way) ANOVA model, an SLR model, or both.
  For categorical data, conduct the analysis in Chapter 10.
  Explain how the explanatory variables (X) affect the responses (Y) based on these models/analyses.

More on One-way ANOVA Method
Use the battery data.
Quick review of Lab5-R-one-way.
Now we know the mean lifetimes of the brands are not all the same. What next?
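As a quick refresher, a one-way ANOVA in R might look like the sketch below. The battery data from Lab5-R are not reproduced here, so the `battery` data frame, its `brand` labels, and its `lifetime` values are invented placeholders, not the lab's data.

```r
# Hypothetical stand-in for the battery data (all values are invented).
battery <- data.frame(
  brand = factor(rep(c("A", "B", "C", "D"), each = 5)),
  lifetime = c(42, 45, 41, 44, 43,
               48, 50, 47, 49, 51,
               43, 44, 42, 46, 45,
               55, 53, 56, 54, 57)
)

# Explore: group means and side-by-side boxplots.
brand_means <- tapply(battery$lifetime, battery$brand, mean)
brand_means
boxplot(lifetime ~ brand, data = battery)

# One-way ANOVA: H0 says all brand means are equal.
fit <- aov(lifetime ~ brand, data = battery)
summary(fit)
```

A small p-value in the `brand` row of the ANOVA table says only that the means are not all equal; it does not say which brands differ.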

Multiple Comparison Procedures
Once we reject $H_0: \mu_1 = \mu_2 = \dots = \mu_t$ in favor of $H_1$: NOT all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, only that they’re not all the same. If there are 4 columns (levels), are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? And so on.

These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I): We set $\alpha$ as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has a type I error rate (rejecting $H_0$ when it’s true) of .05. However, given that all 3 null hypotheses are true,
$P(\text{at least one type I error in the 3 tests}) = 1 - P(\text{accept all 3}) = 1 - (.95)^3 \approx .14$
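The arithmetic above can be checked with a short R sketch. The function name `experimentwise_rate` is made up for this illustration, and the formula assumes the tests are independent and every null hypothesis is true.

```r
# Probability of at least one type I error across k independent tests,
# each run at significance level alpha, when every H0 is true.
experimentwise_rate <- function(alpha, k) {
  1 - (1 - alpha)^k
}

experimentwise_rate(0.05, 3)  # about 0.14
experimentwise_rate(0.05, 5)  # about 0.23
```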

In other words, the probability is .14 that at least one type I error is made. For 5 tests, the probability is $1 - (.95)^5 \approx .23$. Question: Should we choose $\alpha = .05$ and suffer (for 5 tests) a .23 experimentwise error rate ($a$ or $a_E$)? OR should we control the overall error rate $a$ at .05 and find the individual-test $\alpha$ from $1 - (1-\alpha)^5 = .05$, which gives $\alpha \approx .010$?
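Solving $1 - (1-\alpha)^k = a$ for the individual-test level gives $\alpha = 1 - (1-a)^{1/k}$. A minimal R sketch follows; the function name `per_test_alpha` is made up here, and the calculation again assumes independent tests.

```r
# Individual-test alpha that yields experimentwise rate a
# across k independent tests.
per_test_alpha <- function(a, k) {
  1 - (1 - a)^(1 / k)
}

alpha5 <- per_test_alpha(0.05, 5)
alpha5                # about 0.0102

# Plugging it back in recovers the target experimentwise rate.
1 - (1 - alpha5)^5    # 0.05
```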

The formula $1 - (1-\alpha)^5 = .05$ would be valid only if the tests are independent; often they’re not. [ e.g., consider the three tests (1) $\mu_1 = \mu_2$, (2) $\mu_2 = \mu_3$, (3) $\mu_1 = \mu_3$. IF (1) is accepted & (2) is rejected, isn’t it more likely that (3) is rejected? ]

Error Rates
When the tests are not independent, it is usually very difficult to find the individual-test $\alpha$ that yields a specified experimentwise error rate (also called the family error rate).
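When no closed-form answer is available, the experimentwise error rate can be estimated by simulation. The sketch below is an illustration under assumptions that are not from the slides: three groups of 10 normal observations with equal means, 2000 replications, and unadjusted pooled pairwise t-tests at .05. The estimate can be compared with the independent-tests value $1 - (.95)^3 \approx .14$.

```r
set.seed(1)            # for reproducibility
n_sim <- 2000          # number of simulated experiments (assumed)
n_per_group <- 10      # observations per group (assumed)
group <- factor(rep(1:3, each = n_per_group))

any_reject <- replicate(n_sim, {
  # All three population means are equal, so every pairwise H0 is true.
  y <- rnorm(3 * n_per_group)
  p <- pairwise.t.test(y, group, p.adjust.method = "none")$p.value
  # TRUE if at least one of the 3 pairwise tests rejects at .05.
  any(p < 0.05, na.rm = TRUE)
})

# Estimated experimentwise error rate for the 3 dependent tests.
mean(any_reject)
```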

There are many multiple comparison procedures. We’ll cover only a few.
Pairwise Comparisons Method 1: (Fisher Test)
Do a series of pairwise t-tests (pooled t-tests), each with a specified $\alpha$ value (for the individual test). This is called “Fisher’s LEAST SIGNIFICANT DIFFERENCE” (LSD). Not recommended!! It does not control the experimentwise error rate.
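In R, unadjusted pooled pairwise t-tests can be run with `pairwise.t.test()` from the built-in stats package. This is a sketch only; the `battery` data frame below is an invented placeholder for the lab's battery data.

```r
# Hypothetical stand-in for the battery data (all values are invented).
battery <- data.frame(
  brand = factor(rep(c("A", "B", "C", "D"), each = 5)),
  lifetime = c(42, 45, 41, 44, 43,
               48, 50, 47, 49, 51,
               43, 44, 42, 46, 45,
               55, 53, 56, 54, 57)
)

# Fisher-style pairwise comparisons: pooled standard deviation,
# no adjustment of the p-values for multiple testing.
lsd <- pairwise.t.test(battery$lifetime, battery$brand,
                       p.adjust.method = "none", pool.sd = TRUE)
lsd
```

Each p-value in the printed table is compared with the individual-test $\alpha$, so the chance of at least one false rejection grows with the number of pairs.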

Pairwise Comparisons Method 2: (Tukey Test)
A procedure that controls the experimentwise error rate is “TUKEY’S HONESTLY SIGNIFICANT DIFFERENCE TEST”. Recommended!! Note that, to avoid being too conservative, the significance level of the Tukey test can be set larger (10%), especially with more than 4 samples.
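A Tukey comparison at the 10% level might be sketched with `TukeyHSD()` from the built-in stats package, as below. The `battery` data frame is again an invented placeholder, and `conf.level = 0.90` corresponds to the 10% experimentwise level mentioned above.

```r
# Hypothetical stand-in for the battery data (all values are invented).
battery <- data.frame(
  brand = factor(rep(c("A", "B", "C", "D"), each = 5)),
  lifetime = c(42, 45, 41, 44, 43,
               48, 50, 47, 49, 51,
               43, 44, 42, 46, 45,
               55, 53, 56, 54, 57)
)

fit <- aov(lifetime ~ brand, data = battery)

# Tukey's HSD with 90% simultaneous confidence intervals.
tukey <- TukeyHSD(fit, conf.level = 0.90)
tukey        # columns: diff, lwr, upr, p adj
plot(tukey)  # intervals that exclude 0 mark significant pairs
```

A pair is declared different when its adjusted p-value is below .10, or equivalently when its 90% interval excludes zero.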

Summarize the Results: Underline Diagram
Summarize the comparison results (Lab5-R):
1. Sort the sample means in ascending order. We begin at the smallest one.
2. Compare the smallest and the largest and check whether the difference is significant (p-value < .10). If no, draw an underline spanning them and stop. If yes, continue by comparing the smallest with the 2nd largest, and so on.
3. Repeat step 2 for the 2nd smallest mean, and so on.
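The underline steps can be automated roughly as follows. This is a sketch under assumptions: the `battery` data frame is an invented placeholder, the helper `pair_p` is made up for this illustration, and brand labels are assumed to contain no hyphen (because `TukeyHSD()` names each pair as "X-Y").

```r
# Hypothetical stand-in for the battery data (all values are invented).
battery <- data.frame(
  brand = factor(rep(c("A", "B", "C", "D"), each = 5)),
  lifetime = c(42, 45, 41, 44, 43,
               48, 50, 47, 49, 51,
               43, 44, 42, 46, 45,
               55, 53, 56, 54, 57)
)
fit <- aov(lifetime ~ brand, data = battery)
p_adj <- TukeyHSD(fit)$brand[, "p adj"]  # named vector, e.g. "B-A"

# Look up the Tukey p-value for a pair of brands, in either order.
pair_p <- function(a, b) {
  key <- intersect(c(paste(a, b, sep = "-"), paste(b, a, sep = "-")),
                   names(p_adj))
  p_adj[[key]]
}

# Step 1: sort the sample means in ascending order.
means <- sort(tapply(battery$lifetime, battery$brand, mean))
brands <- names(means)
k <- length(brands)

underlines <- character(0)  # each entry lists brands joined by one underline
covered_to <- 0             # right end of the most recent underline

# Steps 2 and 3: from each mean, compare against the largest first.
for (i in seq_len(k - 1)) {
  for (j in k:(i + 1)) {
    if (pair_p(brands[i], brands[j]) >= 0.10) {
      # Not significant: underline brands i..j unless already covered.
      if (j > covered_to) {
        underlines <- c(underlines, paste(brands[i:j], collapse = " "))
        covered_to <- j
      }
      break
    }
  }
}

brands      # brands ordered by sample mean
underlines  # groups whose means are not significantly different
```

Brands that share an underline are not significantly different at the 10% level; a brand that appears in no underline differs from all the others.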

More on Linear Regression
Multiple Linear Regression
One response $Y$ and many predictors $X_1, \dots, X_k$.
The regression equation is $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$.
Model diagnostics are the same (see Lab 7-R):
  Normality of residuals
  Equal variance of residuals
  Independence of residuals

U. S. State Public-School Expenditures (Anscombe dataset in R, Car library)
The Anscombe data frame has 51 rows and 4 columns. The observations are the U. S. states plus Washington, D. C. in 1970. This data frame contains the following columns:
  education: Per-capita education expenditures, dollars. (Y)
  income: Per-capita income, dollars. (X1)
  young: Proportion under 18, per 1000. (X2)
  urban: Proportion urban, per 1000. (X3)

# import data
library(car)
data(Anscombe)
education <- Anscombe$education
income <- Anscombe$income
young <- Anscombe$young
urban <- Anscombe$urban

# draw scatterplots: response on the y-axis, each predictor on the x-axis
plot(income, education)
plot(young, education)
plot(urban, education)

# Multiple Linear Regression
result <- lm(education ~ income + young + urban)
summary(result)
rs <- rstandard(result)        # standardized residuals
fits <- fitted.values(result)  # fitted values

# check for normality
qqnorm(rs)
qqline(rs, col = 2)
shapiro.test(rs)

# check for equal variances: residual plots
plot(fits, rs, main = "residuals vs. fitted values")
plot(income, rs)
plot(young, rs)
plot(urban, rs)

# If we add income^2 into the model
income2 <- income^2
result2 <- lm(education ~ income + income2 + young + urban)
summary(result2)
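One way to judge whether an added squared term is worth keeping is a nested-model F-test with `anova()`. The sketch below uses simulated data so that it runs without the car package; the variables `x1`, `x2`, and `y` and all coefficients are invented for illustration and are not the Anscombe data.

```r
set.seed(6304)
n <- 51

# Simulated predictors and a response with a true quadratic effect in x1.
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y <- 2 + 1.5 * x1 + 0.3 * x1^2 + 0.8 * x2 + rnorm(n)

fit_linear <- lm(y ~ x1 + x2)            # without the squared term
fit_quad <- lm(y ~ x1 + I(x1^2) + x2)    # with the squared term

# F-test of H0: the coefficient of x1^2 is zero.
comparison <- anova(fit_linear, fit_quad)
comparison
```

With a single added term, this F-test agrees with the t-test for that term in `summary(fit_quad)`. Writing the term as `I(x1^2)` inside the formula avoids creating a separate squared variable.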
